path: root/vpx_dsp
Age  Commit message  Author
2016-02-02  intrapred: protect functions w/CONFIG check x2  James Zern
high-bitdepth version d207e, d63e, d45e are only used with CONFIG_MISC_FIXES Change-Id: I77292e11f51fd76d4127fd0027f876866bcf8675
2016-01-31  Merge "Enable sse2 version of inverse wht for hbd build"  Yaowu Xu
2016-01-30  Merge changes If13946e4,I61a1814d,I2ca9aa3c,I44d91eaa  James Zern
* changes:
  intrapred: protect functions w/CONFIG check
  vp9_noise_estimate: protect copy_frame w/CONFIG check
  vp8_cx_iface: delete 3 unused functions
  vp8: mark intra_prediction_down_copy inline
2016-01-29  Enable sse2 version of inverse wht for hbd build  Yaowu Xu
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
2016-01-29  SSSE3 idct8x8 functions for highbitdepth build  Yaowu Xu
This commit changes the SSSE3-optimized idct8x8 functions to work with the highbitdepth build. With this commit and the previous one that enabled the SSSE3 idct32x32 functions, tests showed virtually no difference in decoding speed for the file fdJc1_IBKJA.248.webm between the build with the --enable-vp9-highbitdepth option and the build without it. Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
2016-01-29  Enable hbd_build to use SSSE3 optimized functions  Yaowu Xu
This commit changes the SSSE3 assembly functions for idct32x32 to support the highbitdepth build. On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference between the hbd and lbd builds from 3-4% to 1-2%. Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
2016-01-26  intrapred: protect functions w/CONFIG check  James Zern
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES Change-Id: If13946e483c4d0ccaa3e1d60dc14216c06d5a219
2016-01-25  Merge "Code clean of sad4xNx4D_sse"  James Zern
2016-01-13  Revert "Merge "Change highbd variance rounding to prevent negative variance.""  Alex Converse
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing changes made to 15939cb2d76c773950cda40988ede89e111872ea. The commit was insufficiently tested and causes failures. Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13  Merge "Change highbd variance rounding to prevent negative variance."  Alex Converse
2015-12-22  Code clean of highbd_tm_predictor_32x32  Jian Zhou
Remove the ARCH_X86_64 constraint. No performance hit on either big-core or small-core machines. Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22  Code clean of highbd_tm_predictor_16x16  Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22  Code clean of highbd_dc_predictor_32x32  Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22  Merge "Code clean of highbd_tm_predictor_4x4"  James Zern
2015-12-22  Merge "Code clean of highbd_dc_predictor_4x4"  James Zern
2015-12-21  Merge "Code clean of highbd_v_predictor_4x4"  Jian Zhou
2015-12-19  Merge "Fix for issue 1114 compile error"  Yunqing Wang
2015-12-18  sad_sse2: fix sad4xN(_avg) on windows  James Zern
Reduce the register count by 1 to avoid xmm6, and to avoid unnecessarily penalizing the other users of the base macro. Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18  Code clean of highbd_tm_predictor_4x4  Jian Zhou
Replace MMX with SSE2, reduce mem access to left neighbor, loop unrolled. Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18  Code clean of highbd_v_predictor_4x4  Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18  Code clean of highbd_dc_predictor_4x4  Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18  Fix for issue 1114 compile error  Peter de Rivaz
In a 32-bit build with --enable-shared, there is a lot of register pressure and the register src_strideq is reused. The code needs to use the stack-based version of src_stride, but this doesn't compile when used in an lea instruction. This patch also fixes a related segmentation fault caused by the implementation using src_strideq even though it has been reused. This patch also fixes the HBD subpel variance tests that fail when compiled without --disable-optimizations. These failures were caused by local variables in the assembler routines colliding with the caller's stack frame. Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
2015-12-17  Code clean of sad4xNx4D_sse  Jian Zhou
Replace MMX with SSE2. Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17  Code clean of sad4xN(_avg)_sse  Jian Zhou
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont. Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
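To illustrate the idea behind the commit above, here is a minimal sketch in C with SSE2 intrinsics. It is not the assembly from this commit. It assumes that one way to reduce psadbw operations is to pack four 4-pixel rows into one XMM register, so a single psadbw covers a whole 4x4 block. The names `load_4x4` and `sad4xn_sse2_sketch` are hypothetical, and `n` is assumed to be a multiple of 4.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: gather four 4-byte rows into one 128-bit register. */
static __m128i load_4x4(const uint8_t *p, ptrdiff_t stride) {
  int32_t r0, r1, r2, r3;
  memcpy(&r0, p + 0 * stride, 4);
  memcpy(&r1, p + 1 * stride, 4);
  memcpy(&r2, p + 2 * stride, 4);
  memcpy(&r3, p + 3 * stride, 4);
  return _mm_set_epi32(r3, r2, r1, r0);
}

/* Hypothetical sketch of a 4xN sum of absolute differences.
 * Assumes n is a multiple of 4 (for example 4x4 or 4x8 blocks). */
static unsigned int sad4xn_sse2_sketch(const uint8_t *src, ptrdiff_t src_stride,
                                       const uint8_t *ref, ptrdiff_t ref_stride,
                                       int n) {
  __m128i acc = _mm_setzero_si128();
  for (int r = 0; r < n; r += 4) {
    const __m128i s = load_4x4(src, src_stride);
    const __m128i t = load_4x4(ref, ref_stride);
    /* One psadbw handles 16 pixels, i.e. four rows of four pixels. */
    acc = _mm_add_epi32(acc, _mm_sad_epu8(s, t));
    src += 4 * src_stride;
    ref += 4 * ref_stride;
  }
  /* psadbw leaves one partial sum in each 64-bit half; add them together. */
  acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
  return (unsigned int)_mm_cvtsi128_si32(acc);
}
```

With this layout a 4x8 block needs two psadbw operations, compared with one per row or per pair of rows when fewer pixels are packed into each register.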
2015-12-15  Merge "move vp9_avg to vpx_dsp"  James Zern
2015-12-14  move vp9_avg to vpx_dsp  James Zern
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14  Merge "Code clean of tm_predictor_32x32"  Jian Zhou
2015-12-11  Merge "Speed up tm_predictor_16x16"  Jian Zhou
2015-12-11  Code clean of tm_predictor_32x32  Jian Zhou
Reallocate the xmm register usage so that ARCH_X86_64 is no longer required. Reduce memory access to the left neighbor by half. Speeds up by a single-digit percentage on big-core machines. Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11  Merge "SSE2 based h_predictor_32x32"  Jian Zhou
2015-12-11  Merge "dc_left_pred[48]: fix pic builds"  James Zern
2015-12-11  Merge "Code clean of dc_left/top_predictor_16x16"  Jian Zhou
2015-12-10  dc_left_pred[48]: fix pic builds  James Zern
GET_GOT modifies the stack pointer, so the offset for left's address will be wrong if it is loaded afterward. Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
2015-12-10  Fix the win32 crash when GET_GOT is not defined  Yunqing Wang
This patch continues to fix the win32 crash issue: https://bugs.chromium.org/p/webm/issues/detail?id=1105 Johann's patch is here: https://chromium-review.googlesource.com/#/c/316446/2 Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10  Code clean of dc_left/top_predictor_16x16  Jian Zhou
Remove some redundant code. Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
2015-12-10  SSE2 based h_predictor_32x32  Jian Zhou
Relocate the function from SSSE3 to SSE2, unroll loop from 16 to 8, and reduce mem access to left. Speeds up by a single-digit percentage in ./test_intra_pred_speed on big-core machines. Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-09  Merge "fix null pointer crash in Win32 because esp register is broken"  Johann Koenig
2015-12-07  Re-enable SSE2 based intra 4x4 prediction  Jian Zhou
4x4 Intra predictor implemented with MMX is replaced with SSE2. Segfault in change 315561 when decoding vp8 is taken care of. Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07  Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()"  Scott LaVarnway
2015-12-07  fix null pointer crash in Win32 because esp register is broken  Sergey Kolomenkin
https://bugs.chromium.org/p/webm/issues/detail?id=1105 Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-05  Revert "MMX in intra 4x4 prediction replaced with SSE2"  James Zern
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a. The reverted commit causes a segfault when decoding vp8, in both 32-bit and 64-bit builds. Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-04  Speed up h_predictor_16x16  Jian Zhou
Relocate the function from SSSE3 to SSE2, unroll loop from 8 to 4, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04  Speed up h_predictor_8x8  Jian Zhou
Relocate the function from SSSE3 to SSE2, unroll loop from 4 to 2, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
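For readers unfamiliar with the horizontal predictor, the following plain-C sketch shows the general shape of the optimization described above. It is not the SSE2 assembly from this commit. It assumes that "from 4 to 2" refers to loop iteration counts, so each iteration writes four rows, and that "reduce mem access to left" means reading the left column once up front. The function name `h_predictor_8x8_sketch` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a plain-C illustration, not the libvpx assembly.
 * The horizontal predictor fills each row of the block with the
 * left-neighbor pixel of that row. */
static void h_predictor_8x8_sketch(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *left) {
  uint8_t l[8];
  memcpy(l, left, sizeof(l)); /* read the whole left column once */

  /* Two iterations, four rows each (assumed meaning of "4 to 2"). */
  for (int i = 0; i < 2; ++i) {
    const uint8_t *lp = l + 4 * i;
    memset(dst + 0 * stride, lp[0], 8);
    memset(dst + 1 * stride, lp[1], 8);
    memset(dst + 2 * stride, lp[2], 8);
    memset(dst + 3 * stride, lp[3], 8);
    dst += 4 * stride;
  }
}
```

Handling more rows per iteration reduces loop overhead, and a SIMD version can broadcast each left pixel across a register instead of calling `memset`.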
2015-12-03  MMX in intra 8x8 prediction replaced with SSE2  Jian Zhou
8x8 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03  MMX in intra 4x4 prediction replaced with SSE2  Jian Zhou
4x4 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-02  Merge "SSE2 speed up of h_predictor_4x4"  Jian Zhou
2015-12-02  VP9: Add ssse3 version of vpx_idct32x32_135_add()  Scott LaVarnway
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-11-30  Speed up tm_predictor_16x16  Jian Zhou
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed with the same instruction size. Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
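As background for the commit above, the TM (TrueMotion) predictor computes each pixel as left + above - top-left, clamped to the 8-bit range. The plain-C sketch below is not the assembly from this commit. It shows one way to touch `left` only once per row, by folding it into a per-row base value. The names `clip_pixel_sketch` and `tm_predictor_sketch` are hypothetical, and `above[-1]` is assumed to hold the top-left pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical sketch of a bs x bs TM predictor.
 * Assumes above[-1] is readable and holds the top-left neighbor. */
static void tm_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                const uint8_t *above, const uint8_t *left) {
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r) {
    /* left[r] is read once per row and reused for every column. */
    const int base = left[r] - top_left;
    for (int c = 0; c < bs; ++c) {
      dst[c] = clip_pixel_sketch(base + above[c]);
    }
    dst += stride;
  }
}
```

For example, with top-left 10, above[0] = 20 and left[0] = 250, the first pixel is 250 + 20 - 10 = 260, which clamps to 255.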
2015-11-30  Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()"  Scott LaVarnway
2015-11-30  SSE2 speed up of h_predictor_4x4  Jian Zhou
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d