Age | Commit message (Collapse) | Author |
|
This rebase is a better implementation of the previous ones.
Modifications were made to reduce the total number of clock cycles.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m
Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
|
|
|
|
Since they are used in the encoder only. This commit also reorders includes
for the files that include vp9_extend.h
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
|
|
|
|
This removes a lot of operations in setting partition context...
Change-Id: I365e6f5607ece85190cb21443988816dfa510ce3
|
|
This patch follows the "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit and adds an x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For the 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated the NEON C functions
to call the NEON loopfilters twice.
Tests using the tulip clip showed a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
|
|
on arm until we implement a real vp9_idct32x32_34_add_neon.
This issue is due to commit 47665452f0da3c11427ecb4852535e1787bb0c5b
Merge "Add 32x32 idct function for eob<=34 case".
Change-Id: I56b5f0abc20e7dd1bba521f78a995e85d65ea296
|
|
|
|
|
|
Removes silly operations inside loop.
Change-Id: I9eeab1e914e715a887f86cf1089de508e2364165
|
|
|
|
|
|
Change-Id: If97ae16a4478717933345b6b9d5bc1b417b8dd84
|
|
Change-Id: Ib748eb287520c794631697204da6ebe19523ce95
|
|
Change-Id: Ic6770072f80dfb54d2725ed96370d4f243a9f474
|
|
Change-Id: I9d18f351abe7614107f34f47eeb38a234a9937c9
|
|
Change-Id: I4e2ad4b7342681e6ac236356ef3a4927a54f105b
|
|
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
|
|
|
|
|
|
|
|
|
|
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
|
|
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
|
|
Moving because q_index is used only by encoder.
Change-Id: I0b96175614ed4fd3d76ee56a0ba36258e1e896f6
|
|
|
|
|
|
|
|
|
|
Change-Id: I60e02fa3de930ff1f969687ab5af93dee40d86ad
|
|
As Jim suggested, a 1D array is used to store filter levels instead
of a 2D array. This uses shift_y in setup_mask directly and saves
a few cycles.
Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9
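
The 2D-to-1D change described above can be sketched as follows. This is only an illustration under stated assumptions: the grid size, the names `GRID_LOG2`, `fill_levels_2d` and `fill_levels_1d`, and the fill pattern are all hypothetical and are not taken from the libvpx source. The point is that a flat array lets the row offset be computed once with a shift, after which each row is a plain contiguous run.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8x8 grid of filter levels covering one superblock. */
enum { GRID_LOG2 = 3, GRID = 1 << GRID_LOG2 };

/* 2D layout: every store goes through a two-level index. */
static void fill_levels_2d(uint8_t lfl[GRID][GRID], int row, int col,
                           int h, int w, uint8_t level) {
  for (int r = row; r < row + h; ++r)
    for (int c = col; c < col + w; ++c)
      lfl[r][c] = level;
}

/* 1D layout: the row offset is a single shift, and each row of the
 * rectangle is a contiguous run that memset can fill. */
static void fill_levels_1d(uint8_t *lfl, int row, int col,
                           int h, int w, uint8_t level) {
  uint8_t *p = lfl + (row << GRID_LOG2) + col;
  for (int r = 0; r < h; ++r, p += GRID)
    memset(p, level, (size_t)w);
}
```

Both functions write the same values; only the addressing differs, which is where a few cycles might plausibly be saved.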
|
|
|
|
iOS doesn't recognize B:
bad instruction `B idct32_pass_loop'
Change-Id: I3cf6aede4639f1d9efa97f7962fa287ba6feaaef
|
|
|
|
|
|
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
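
As a rough sketch of the idea in this commit, the snippet below contrasts a conditional position-to-band mapping with a static lookup table. The band boundaries, the table contents, and the function names are hypothetical and chosen only for illustration; they are not claimed to match the tables in the codec.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional version: a chain of comparisons per coefficient.
 * The boundaries here are illustrative assumptions. */
static int band_from_position_branchy(int pos) {
  if (pos == 0) return 0;
  if (pos < 3) return 1;
  if (pos < 6) return 2;
  if (pos < 10) return 3;
  if (pos < 13) return 4;
  return 5;
}

/* Table version: the same mapping precomputed once, so the hot loop
 * does a single indexed load instead of a branch chain. */
static const uint8_t band_map_16[16] = {
  0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5
};

static int band_from_position_table(int pos) {
  return band_map_16[pos];
}
```

The table version assumes `pos` is already within `0..15`; a caller would be responsible for that bound.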
|
|
Removing foreach_predicted_block_visitor and calling build_inter_predictors
directly.
Change-Id: I11bb3c872b99b47c2680b01b0dbcc01c558c4a2b
|
|
Added loop filter mask checking, and made the caller function
ready for implementation of parallel loopfiltering in horizontal
direction.
Next, we need to go through the loopfilter functions (both the C and
optimized versions), and provide 16-byte wide loopfiltering for
each filter type.
Change-Id: Ifef47e7ef9086ebc2fd6ca7ede8f27c9bbf79e66
|
|
We use {sb, mb, b, ab}_index only inside the encoder, so move them into
the appropriate data structure.
Change-Id: Ib5c1036716354d9d321e11a60c1634c1cb8f9716
|
|
Make the macroblockd_plane contain dynamic buffer pointers instead of
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while the encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
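
A minimal sketch of what a plane holding a pointer rather than owning its storage might look like. The struct names, the field name, the buffer size, and `select_buffer` are assumptions made for illustration; the actual macroblockd_plane layout is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

enum { COEFF_COUNT = 64 * 64 };  /* hypothetical buffer size */

/* The plane only points at coefficient storage owned elsewhere. */
struct plane_dynamic {
  int16_t *dqcoeff;
};

/* A hypothetical encoder-side pair of buffers, so one candidate can be
 * kept while another is evaluated during a rate-distortion search. */
struct rd_buffers {
  int16_t buf[2][COEFF_COUNT];
};

/* Retarget the plane at one of the two buffers without copying data. */
static void select_buffer(struct plane_dynamic *pd, struct rd_buffers *b,
                          int idx) {
  pd->dqcoeff = b->buf[idx & 1];
}
```

With this layout a decoder could point the plane at a single buffer once, while an encoder could flip between two by reassigning the pointer.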
|
|
Change-Id: Ic416e3f8a11e82ee298e6f709b2119a9ddf1e2f8
|
|
|
|
Inlining set_contexts_on_border() into set_contexts(). The only difference
is the check that "has_eob != 0" in addition to
"xd->mb_to_right_edge < 0" and "xd->mb_to_bottom_edge < 0". If has_eob == 0
then memset does the right thing and works faster.
Change-Id: I5206f767d729f758b14c667592b7034df4837d0e
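
The reasoning about has_eob can be illustrated with a simplified, one-dimensional sketch. The function `set_context_run` and its parameters are hypothetical: `n` is the number of context entries the block covers and `visible` is how many of them lie inside the frame (assumed to satisfy `0 <= visible <= n`). It is not the real set_contexts().

```c
#include <assert.h>
#include <string.h>

static void set_context_run(unsigned char *ctx, int n, int visible,
                            int has_eob) {
  if (has_eob && visible < n) {
    /* Block crosses the frame edge and has coefficients: flag only the
     * visible entries and clear the ones past the edge. */
    memset(ctx, 1, (size_t)visible);
    memset(ctx + visible, 0, (size_t)(n - visible));
  } else {
    /* Either the block is fully inside the frame, or has_eob == 0.
     * In the second case every entry is zero regardless of the edge,
     * so a single memset is both correct and the cheaper path. */
    memset(ctx, has_eob != 0, (size_t)n);
  }
}
```

Checking has_eob first means the border arithmetic is skipped entirely for blocks without coefficients.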
|
|
|
|
This patch continues the work done in "Rewrite loop_filter_info_n
struct" (commit: 00dbd369c70270428d56da6d15ea5486fc821c52) to further
improve the loopfilter function.
1. Instead of storing pointers to thresholds, store loopfilter
levels within the 64x64 SB;
2. Since loopfilter levels are already calculated in setup_mask,
we don't need to call build_lfi to look them up again. Just save
loopfilter levels in setup_mask.
3. Reorganized and simplified filter_block_plane().
Tests showed a ~0.8% decoder speedup.
Change-Id: I723c7779738bbc2afcb9afa2c6f78580ee6c3af7
|
|
I63df79a13cf62aa2c9360a7a26933c100f9ebda3."
|
|
|
|
SVC multiple layer per frame encoding is invoked with vpx_svc_init and
vpx_svc_encode. These interfaces are designed to be invoked from ffmpeg.
Additional improvements:
- make dummy frame handling a bit more explicit
- fixed bug with single layer encodes
- track individual frame sizes and psnrs instead of averages
- parameterized quantizer, 16th scalefactors, more logging
- enabled single layer encodes to generate baseline
- include new mode for 3 layer I frame with 5 total layers
Change-Id: I46cfa600d102e208c6af8acd6132e0cc25cda8d4
|
|
Change-Id: I04c55daef89bca2b85cb7db0850f9b052abc5a7c
|
|
|