Age | Commit message (Collapse) | Author |
|
Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90
|
|
macroblockd."
|
|
This improve the deocde performance by ~2% on Nexus 7 2013.
Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
|
|
|
|
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.
Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
|
|
Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da
|
|
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.
Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
|
|
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.
Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba)
(cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3)
(cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
|
|
Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38
|
|
|
|
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
|
|
|
|
Also includes block error.
(This patch is mostly cherry picked from
commit db7192e0b014a331a1dcb102c8a1148e9f0e1081)
Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
|
|
It improves the speed performance of vp9_fdct8x8_quant_sse2 by
about 5%.
Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
|
|
|
|
|
|
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.
Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
|
|
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.
Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
|
|
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.
Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
|
|
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
(cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc)
(cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)
|
|
|
|
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f
(cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)
|
|
Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58
(cherry picked from commit a1b726117f5470f227bc90cd030b7d25045dc510)
|
|
|
|
This commit rename a reserved color space entry to BT_2020, it intends
to provide support for VP9 bitstream to pass along the color space
type defined in BT.2020(Rec.2020)
please note this entry does not have any effect on encoding/decoding
behavior, but allow applications to the pass the information along
from encoding end to decoding end.
Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091
|
|
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.
An invalid unit test was added for this issue.
Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
|
|
|
|
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.
Decoding a all key frames 1080p video shows speed improve around 2%.
Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
|
|
For configured with --enable-vp9-highbitdepth
Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
|
|
Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd
|
|
|
|
In the function mb_lpf_horizontal_edge_w_avx2_16 the usage of the intrinsic
_mm256_cvtepu8_epi16 cause a compiler bug in gcc 4.9.1.
until it will be fixed I created a workaround that create the up convert by
using broadcast128+shuffle.
The bug was reported here:
https://code.google.com/p/webm/issues/detail?id=867
Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
|
|
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
|
|
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."
|
|
* changes:
add vp9_loop_filter_data_reset
move LFWorkerData allocation to VP9LfSync
vp9_loop_filter_frame_mt: remove pbi dependency
vp9_loop_filter_frame_mt: pass planes directly
vp9_loop_filter_frame_mt: pass VP9LfSync directly
vp9: store TileWorkerData allocations separately
|
|
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
|
|
|
|
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
|
|
When profiling, gprof can't distinguish between matching labels in
different files.
Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f
|
|
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and
IF_DIFF_REF_FRAME_ADD_MV.
Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb
|
|
|
|
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
|
|
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch.
Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
|
|
this removes an assumption that worker->data1 would be pointing to a
TileWorkerData allocation.
additionally, within the multi-threaded loopfilter pass VP9LfSync as a
parameter to the worker hook, removing the need for a shadow pointer in
LFWorkerData.
Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
|
|
|
|
This is based on the 64-bit ssse3 quantizer.
1.1x speedup for screen content at speed 7.
Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
|
|
|
|
|
|
Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f
|
|
Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0
|