path: root/vp9/common
Age  Commit message  Author
2014-12-08Fix the comments.hkuang
Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90
2014-12-08Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."hkuang
2014-12-05Improve the performance by caching the left_mi and right_mi in macroblockd.hkuang
This improves decode performance by ~2% on Nexus 7 2013. Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
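A minimal sketch of the caching idea, not the libvpx code: neighbour pointers are resolved once per block and stored in the block context, so later lookups read a field instead of re-indexing the mode-info grid. The type names `ModeInfo` and `BlockCtx`, the helper names, and the choice of the left and above neighbours are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the libvpx definitions. */
typedef struct {
  int mode;
} ModeInfo;

typedef struct {
  ModeInfo **mi;            /* current position in a grid of mode-info pointers */
  int mi_stride;            /* grid entries per row */
  int up_available;         /* non-zero if a row exists above this block */
  int left_available;       /* non-zero if a column exists to the left */
  const ModeInfo *above_mi; /* cached neighbour, NULL at the frame edge */
  const ModeInfo *left_mi;  /* cached neighbour, NULL at the frame edge */
} BlockCtx;

/* Resolve the neighbours once per block instead of in every lookup. */
static void cache_neighbours(BlockCtx *xd) {
  xd->above_mi = xd->up_available ? xd->mi[-xd->mi_stride] : NULL;
  xd->left_mi = xd->left_available ? xd->mi[-1] : NULL;
}

/* A later context computation reads the cached pointer directly. */
static int left_mode_or(const BlockCtx *xd, int fallback) {
  return xd->left_mi ? xd->left_mi->mode : fallback;
}
```

The saving comes from code that queries the same neighbour several times per block; with the cache each query is a single pointer read plus a NULL check.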
2014-12-05Merge "Merge set_prev_mi function into encoder function."hkuang
2014-12-04Use the RTC optimizations when in high bitdepth mode.Peter de Rivaz
Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
2014-12-04Merge set_prev_mi function into encoder function.hkuang
Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da
2014-12-03Enable non-rd mode coding on key frame, for speed 6.Marco
For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
2014-12-02Added high bitdepth sse2 transform functionsPeter de Rivaz
Also removes some spurious changes in common/vp9_blockd.h which were introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
2014-11-24Fix a tautological assert.Alex Converse
Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38
2014-11-24Merge "Refactored idct routines and headers"Debargha Mukherjee
2014-11-24Refactored idct routines and headersPeter de Rivaz
This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32-bit sse instructions where possible, but fall back to the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
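The "narrow path with fallback" pattern described above can be sketched in plain C. This is an illustration under assumptions, not the libvpx transform: a single add/subtract butterfly stands in for the transform, `butterfly_16` stands in for a 16-bit SIMD path, and all function names are hypothetical.

```c
#include <stdint.h>

/* Reference path: 32-bit arithmetic, safe for the whole 16-bit input range. */
static void butterfly_c(const int32_t *in, int32_t *out) {
  out[0] = in[0] + in[1];
  out[1] = in[0] - in[1];
}

/* Stand-in for a narrow SIMD path: only valid when results fit in 16 bits. */
static void butterfly_16(const int32_t *in, int32_t *out) {
  const int16_t a = (int16_t)in[0];
  const int16_t b = (int16_t)in[1];
  out[0] = (int16_t)(a + b);
  out[1] = (int16_t)(a - b);
}

/* Conservative check: if both inputs pass, sum and difference fit in int16. */
static int safe_for_16bit(const int32_t *in) {
  const int32_t limit = INT16_MAX / 2;
  return in[0] >= -limit && in[0] <= limit &&
         in[1] >= -limit && in[1] <= limit;
}

/* Returns 1 if the narrow path was used, 0 if it fell back to the C path. */
static int butterfly(const int32_t *in, int32_t *out) {
  if (safe_for_16bit(in)) {
    butterfly_16(in, out);
    return 1;
  }
  butterfly_c(in, out);
  return 0;
}
```

Keeping the reference routine callable from the accelerated one is what the commit means by making the dct routines global: the fallback needs a symbol it can link against.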
2014-11-21Merge "Added highbitdepth sse2 acceleration for quantize"Debargha Mukherjee
2014-11-19Added highbitdepth sse2 acceleration for quantizePeter de Rivaz
Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19Enable ssse3 version of vp9_fdct8x8_quantJingning Han
It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
2014-11-19Merge "Combine fdct8x8 and quantization process"Jingning Han
2014-11-19Merge "Add sse2 version for vp9_quantize_fp"Jingning Han
2014-11-18Combine fdct8x8 and quantization processJingning Han
This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
2014-11-18Add sse2 version for vp9_quantize_fpJingning Han
vp9_quantize_fp is the quantization process used by rtc coding mode. This commit adds a sse2 implementation of it. The implementation is modified based on vp9_quantize_b_sse2. No speed difference from ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
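For orientation, a scalar sketch of this kind of fixed-point quantizer is shown below. It is a simplification under assumptions, not the libvpx routine: it uses one `round`/`quant`/`dequant` triple for every coefficient, ignores scan order, and the function name `quantize_sketch` is hypothetical. The quantizer is expressed as a 16-bit fixed-point reciprocal, so a step size of 16 is passed as `quant = 65536 / 16`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Quantizes n coefficients. Returns the end-of-block position, that is,
 * the index of the last non-zero quantized coefficient plus one. */
static int quantize_sketch(const int16_t *coeff, int n, int round, int quant,
                           int dequant, int16_t *qcoeff, int16_t *dqcoeff) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    const int abs_coeff = abs(coeff[i]);
    /* Multiply by the fixed-point reciprocal, then drop the 16 fraction bits. */
    const int abs_q = (int)(((int64_t)(abs_coeff + round) * quant) >> 16);
    const int q = coeff[i] < 0 ? -abs_q : abs_q;
    qcoeff[i] = (int16_t)q;
    dqcoeff[i] = (int16_t)(q * dequant);
    if (abs_q) eob = i + 1;
  }
  return eob;
}
```

A SIMD version processes several coefficients per instruction but has to produce the same `qcoeff`, `dqcoeff` and end-of-block values as the scalar loop.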
2014-11-17change to call vp9_refining_search_sad() directlyYaowu Xu
The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
2014-11-14Added sse2 acceleration for highbitdepth variancePeter de Rivaz
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc) (cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)
2014-11-12Merge "Added highbitdepth sse2 SAD acceleration and tests"Debargha Mukherjee
2014-11-12Added highbitdepth sse2 SAD acceleration and testsPeter de Rivaz
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)
2014-11-07Iadst transforms to use internal low precisionDeb Mukherjee
Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58 (cherry picked from commit a1b726117f5470f227bc90cd030b7d25045dc510)
2014-11-07Merge "Change the use of a reserved color space entry"Yaowu Xu
2014-11-06Change the use of a reserved color space entryYaowu Xu
This commit renames a reserved color space entry to BT_2020. It is intended to let a VP9 bitstream pass along the color space type defined in BT.2020 (Rec. 2020). Please note that this entry has no effect on encoding/decoding behavior; it only allows applications to pass the information along from the encoding end to the decoding end. Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091
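A small sketch of what "pass the information along" means in practice: the value is stored in a header field by the writer and read back unchanged by the reader, with no other behavior attached to it. The enum names, the numeric values, the 3-bit field width and the helper functions below are illustrative assumptions, not the libvpx definitions.

```c
#include <stdint.h>

/* Illustrative values only; CS_BT_2020 takes a slot that was reserved. */
typedef enum {
  CS_UNKNOWN = 0,
  CS_BT_601 = 1,
  CS_BT_709 = 2,
  CS_BT_2020 = 5,
  CS_SRGB = 7
} ColorSpace;

/* Store the color space in the low 3 bits of a header byte. */
static uint8_t write_color_space(uint8_t header, ColorSpace cs) {
  return (uint8_t)((header & ~0x07u) | ((unsigned)cs & 0x07u));
}

/* Read it back; nothing else in the codec path depends on the value. */
static ColorSpace read_color_space(uint8_t header) {
  return (ColorSpace)(header & 0x07u);
}
```

Because the field already existed and only the meaning of one value changed, older streams that never used the reserved value are unaffected.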
2014-11-06Modify the frame context memory deallocationYunqing Wang
This patch fixes the vpxdec fuzzing3 test failure. When an error occurs, longjmp() returns control to the setjmp() point, which calls the decoder removing routine. In a multi-threaded situation, other threads could try to access frame context memory that is already deallocated, causing a segfault. An invalid unit test was added for this issue. Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
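The error path described here follows the usual setjmp/longjmp pattern, sketched below under assumptions. The sketch is single-threaded and does not model the worker threads that the actual fix is about; it only shows where cleanup runs and why a freed pointer should be cleared. The `Decoder` type and every function name are hypothetical.

```c
#include <setjmp.h>
#include <stdlib.h>

typedef struct {
  jmp_buf jmp;         /* return point for error handling */
  int *frame_contexts; /* hypothetical shared allocation */
  int error_code;
} Decoder;

/* Called deep inside parsing; never returns to its caller. */
static void decoder_error(Decoder *d, int code) {
  d->error_code = code;
  longjmp(d->jmp, 1);
}

static void parse_frame(Decoder *d, int corrupt) {
  if (corrupt) decoder_error(d, -1);
  d->frame_contexts[0] = 42;
}

/* Returns 0 on success, the stored error code otherwise. */
static int decode(Decoder *d, int corrupt) {
  if (setjmp(d->jmp)) {
    /* Reached via longjmp: release the memory and clear the pointer so
     * no later code path reuses a dangling address. */
    free(d->frame_contexts);
    d->frame_contexts = NULL;
    return d->error_code;
  }
  parse_frame(d, corrupt);
  return 0;
}
```

With worker threads in the picture, the free in the error branch is only safe once no worker can still touch `frame_contexts`, which is the ordering problem the patch addresses.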
2014-11-05Merge "Totally remove prev_mi in VP9 decoder."hkuang
2014-11-05Totally remove prev_mi in VP9 decoder.hkuang
This saves memory and improves decode speed by removing the unnecessary memset of the large prev_mi array on every key frame. Decoding an all-key-frame 1080p video shows a speed improvement of around 2%. Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
2014-11-05Fix visual studio 2013 compiler warningsYaowu Xu
For builds configured with --enable-vp9-highbitdepth. Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-11-04Fix the memory leak due to missing free frame_mvs.hkuang
Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd
2014-11-03Merge "WORKAROUND FIX FOR GCC4.9.1"Yunqing Wang
2014-11-01WORKAROUND FIX FOR GCC4.9.1levytamar82
In the function mb_lpf_horizontal_edge_w_avx2_16, the use of the intrinsic _mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1. Until it is fixed, I created a workaround that performs the up-convert using broadcast128+shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
2014-10-31Bind motion vectors with frame buffer structure.hkuang
This saves a lot of memory in the decoder by removing prev_mi. prev_mi is still needed in the encoder, so encoder memory use increases slightly. Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
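One way to picture binding motion vectors to the frame buffer is sketched below: each buffer owns its motion-vector array, so the previous frame's vectors are reached through the previous buffer rather than through a separate prev_mi array. The struct layout and all names are assumptions for illustration, not the libvpx definitions.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  int16_t row, col;
} MotionVec;

typedef struct {
  MotionVec *mvs; /* one entry per mode-info unit, owned by this buffer */
  int mi_rows, mi_cols;
} FrameBuf;

/* Returns 0 on success, -1 if the allocation failed. */
static int frame_buf_alloc_mvs(FrameBuf *fb, int mi_rows, int mi_cols) {
  fb->mvs = calloc((size_t)mi_rows * (size_t)mi_cols, sizeof(*fb->mvs));
  if (!fb->mvs) return -1;
  fb->mi_rows = mi_rows;
  fb->mi_cols = mi_cols;
  return 0;
}

/* Must be paired with every allocation, or the array leaks. */
static void frame_buf_free_mvs(FrameBuf *fb) {
  free(fb->mvs);
  fb->mvs = NULL;
}

/* Look up a co-located vector in the previous frame's buffer. */
static const MotionVec *prev_frame_mv(const FrameBuf *prev, int r, int c) {
  return &prev->mvs[r * prev->mi_cols + c];
}
```

Tying the array's lifetime to the buffer also makes the ownership rule simple: whoever releases the buffer releases its vectors.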
2014-10-30Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."Hui Su
2014-10-24Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9eJames Zern
* changes:
  add vp9_loop_filter_data_reset
  move LFWorkerData allocation to VP9LfSync
  vp9_loop_filter_frame_mt: remove pbi dependency
  vp9_loop_filter_frame_mt: pass planes directly
  vp9_loop_filter_frame_mt: pass VP9LfSync directly
  vp9: store TileWorkerData allocations separately
2014-10-23add vp9_loop_filter_data_resetJames Zern
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
2014-10-22Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct"Yunqing Wang
2014-10-22vp9_ethread: allocate frame contexts outside VP9_COMMON structYunqing Wang
This patch allocates frame contexts outside VP9_COMMON. This allows multiple threads to share the same copy of frame contexts, and reduces the overhead. It also guarantees the correct update of these contexts during bitstream packing. This patch doesn't change encoding result. Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
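The effect of moving the contexts out of the struct can be sketched as follows: once the common struct holds a pointer instead of an embedded array, a per-thread copy of the struct is cheap and every copy sees the same contexts. The type names, the context count and the helper functions are hypothetical stand-ins.

```c
#include <stdlib.h>

#define NUM_FRAME_CONTEXTS 4 /* illustrative count */

typedef struct {
  int probs[8];
} FrameCtx;

typedef struct {
  FrameCtx *frame_contexts; /* allocated outside the struct, shared */
  int frame_context_idx;
} Common;

/* Returns 0 on success, -1 if the allocation failed. */
static int common_alloc_contexts(Common *cm) {
  cm->frame_contexts = calloc(NUM_FRAME_CONTEXTS, sizeof(*cm->frame_contexts));
  cm->frame_context_idx = 0;
  return cm->frame_contexts ? 0 : -1;
}

/* Copies only the pointer, so the contexts themselves are not duplicated. */
static Common thread_copy(const Common *cm) {
  return *cm;
}

static void common_free_contexts(Common *cm) {
  free(cm->frame_contexts);
  cm->frame_contexts = NULL;
}
```

An update made through one copy is visible through all of them, which is what keeps the contexts consistent at bitstream packing time; the sketch leaves out any synchronization a real multi-threaded encoder would need.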
2014-10-22Fix Neon convolve profilingFrank Galligan
When profiling, gprof can't distinguish between matching labels in different files. Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f
2014-10-21Move the definition of switchable filter numbers into enumHui Su
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV. Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb
2014-10-20Merge "SAD32xh and SAD64xh for AVX2"Yunqing Wang
2014-10-19SAD32xh and SAD64xh for AVX2levytamar82
All SAD functions that process at least 32 consecutive elements are optimized for AVX2: vp9_sad64x64, vp9_sad64x32, vp9_sad32x64, vp9_sad32x32, vp9_sad32x16, vp9_sad64x64_avg, vp9_sad64x32_avg, vp9_sad32x64_avg, vp9_sad32x32_avg, vp9_sad32x16_avg. The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64: vp9_sad32x32 was optimized by 68% and vp9_sad64x64 by 90%, and together they gave an overall ~2.3% user-level gain. Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
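For reference, the sum of absolute differences that these functions compute can be written as a plain C loop. This is a generic scalar sketch, not the libvpx source; the names `sad_c` and `sad32x32_c` are used here for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block. */
static unsigned int sad_c(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride,
                          int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride; /* advance one row in each image */
    ref += ref_stride;
  }
  return sad;
}

/* Fixed-size wrapper in the style of the per-block-size entry points. */
static unsigned int sad32x32_c(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride) {
  return sad_c(src, src_stride, ref, ref_stride, 32, 32);
}
```

Rows of 32 or 64 bytes map onto one or two 256-bit loads, which is why these block sizes are the natural candidates for AVX2.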
2014-10-17Add highbitdepth function for vp9_avg_8x8Peter de Rivaz
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/ (a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch. Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
2014-10-16move LFWorkerData allocation to VP9LfSyncJames Zern
this removes an assumption that worker->data1 would be pointing to a TileWorkerData allocation. additionally, within the multi-threaded loopfilter pass VP9LfSync as a parameter to the worker hook, removing the need for a shadow pointer in LFWorkerData. Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
2014-10-14Merge "Add a 32-bit friendly sse2 quantizer."Alex Converse
2014-10-14Add a 32-bit friendly sse2 quantizer.Alex Converse
This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
2014-10-14Merge "Remove extra line."hkuang
2014-10-14Merge "Remove mi_grid_base_array from VP9_COMMON (unused)"Adrian Grange
2014-10-13Use pre increment.hkuang
Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f
2014-10-13Remove mi_grid_base_array from VP9_COMMON (unused)Adrian Grange
Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0