path: root/vpx_dsp
Age  Commit message  Author
2016-01-13  Revert "Merge "Change highbd variance rounding to prevent negative variance.""  Alex Converse
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing changes made to 15939cb2d76c773950cda40988ede89e111872ea. The commit was insufficiently tested and causes failures. Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13  Merge "Change highbd variance rounding to prevent negative variance."  Alex Converse
2015-12-22  Code clean of highbd_tm_predictor_32x32  Jian Zhou
Remove the ARCH_X86_64 constraint. No performance hit on either big-core or small-core machines. Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22  Code clean of highbd_tm_predictor_16x16  Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22  Code clean of highbd_dc_predictor_32x32  Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22  Merge "Code clean of highbd_tm_predictor_4x4"  James Zern
2015-12-22  Merge "Code clean of highbd_dc_predictor_4x4"  James Zern
2015-12-21  Merge "Code clean of highbd_v_predictor_4x4"  Jian Zhou
2015-12-19  Merge "Fix for issue 1114 compile error"  Yunqing Wang
2015-12-18  sad_sse2: fix sad4xN(_avg) on windows  James Zern
Reduce the register count by 1 to avoid using xmm6 and to avoid unnecessarily penalizing the other users of the base macro. Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18  Code clean of highbd_tm_predictor_4x4  Jian Zhou
Replace MMX with SSE2, reduce mem access to left neighbor, loop unrolled. Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18  Code clean of highbd_v_predictor_4x4  Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18  Code clean of highbd_dc_predictor_4x4  Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18  Fix for issue 1114 compile error  Peter de Rivaz
In 32-bit build with --enable-shared, there is a lot of register pressure and register src_strideq is reused. The code needs to use the stack based version of src_stride, but this doesn't compile when used in an lea instruction. This patch also fixes a related segmentation fault caused by the implementation using src_strideq even though it has been reused. This patch also fixes the HBD subpel variance tests that fail when compiled without disable-optimizations. These failures were caused by local variables in the assembler routines colliding with the caller's stack frame. Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
2015-12-17  Code clean of sad4xN(_avg)_sse  Jian Zhou
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont. Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
2015-12-15  Merge "move vp9_avg to vpx_dsp"  James Zern
2015-12-14  move vp9_avg to vpx_dsp  James Zern
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14  Merge "Code clean of tm_predictor_32x32"  Jian Zhou
2015-12-11  Merge "Speed up tm_predictor_16x16"  Jian Zhou
2015-12-11  Code clean of tm_predictor_32x32  Jian Zhou
Reallocate the xmm register usage so that ARCH_X86_64 is no longer required. Reduce memory access to the left neighbor by half. Speed up by a single-digit percentage on big-core machines. Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11  Merge "SSE2 based h_predictor_32x32"  Jian Zhou
2015-12-11  Merge "dc_left_pred[48]: fix pic builds"  James Zern
2015-12-11  Merge "Code clean of dc_left/top_predictor_16x16"  Jian Zhou
2015-12-10  dc_left_pred[48]: fix pic builds  James Zern
GET_GOT modifies the stack pointer, so the offset for left's address will be wrong if loaded afterward. Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
2015-12-10  Fix the win32 crash when GET_GOT is not defined  Yunqing Wang
This patch continues to fix the win32 crash issue: https://bugs.chromium.org/p/webm/issues/detail?id=1105 Johann's patch is here: https://chromium-review.googlesource.com/#/c/316446/2 Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10  Code clean of dc_left/top_predictor_16x16  Jian Zhou
Remove some redundant code. Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
2015-12-10  SSE2 based h_predictor_32x32  Jian Zhou
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8, and reduce mem access to left. Speed up by single digit in ./test_intra_pred_speed on big core machines. Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-09  Merge "fix null pointer crash in Win32 because esp register is broken"  Johann Koenig
2015-12-07  Re-enable SSE2 based intra 4x4 prediction  Jian Zhou
4x4 Intra predictor implemented with MMX is replaced with SSE2. Segfault in change 315561 when decoding vp8 is taken care of. Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07  Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()"  Scott LaVarnway
2015-12-07  fix null pointer crash in Win32 because esp register is broken  Sergey Kolomenkin
https://bugs.chromium.org/p/webm/issues/detail?id=1105 Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-05  Revert "MMX in intra 4x4 prediction replaced with SSE2"  James Zern
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a. The reverted change causes a segfault when decoding vp8 in both 32-bit and 64-bit builds. Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-04  Speed up h_predictor_16x16  Jian Zhou
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
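To illustrate the idea behind this change, here is a minimal scalar sketch in C of a horizontal predictor that handles four rows per loop iteration and fetches the corresponding left-neighbor pixels with a single 4-byte read. It is only a model of the technique: the name h_predictor_16x16_sketch and its signature are hypothetical, and the actual commit is x86 assembly that is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 16x16 horizontal intra predictor.
 * Every pixel in row r is set to left[r]. This is not libvpx code. */
static void h_predictor_16x16_sketch(uint8_t *dst, ptrdiff_t stride,
                                     const uint8_t *left) {
  assert(dst != NULL && left != NULL);
  /* Four rows per iteration, so the loop runs 4 times instead of 16. */
  for (int r = 0; r < 16; r += 4) {
    uint8_t l[4];
    /* One 4-byte read of the left column instead of four 1-byte reads. */
    memcpy(l, left + r, sizeof(l));
    memset(dst + 0 * stride, l[0], 16);
    memset(dst + 1 * stride, l[1], 16);
    memset(dst + 2 * stride, l[2], 16);
    memset(dst + 3 * stride, l[3], 16);
    dst += 4 * stride;
  }
}
```

In a SIMD version, each memset would presumably become a broadcast of one byte across a register followed by a 16-byte store; that mapping is an assumption about the general technique, not a description of the committed code.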
2015-12-04  Speed up h_predictor_8x8  Jian Zhou
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-03  MMX in intra 8x8 prediction replaced with SSE2  Jian Zhou
8x8 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03  MMX in intra 4x4 prediction replaced with SSE2  Jian Zhou
4x4 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-02  Merge "SSE2 speed up of h_predictor_4x4"  Jian Zhou
2015-12-02  VP9: Add ssse3 version of vpx_idct32x32_135_add()  Scott LaVarnway
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-11-30  Speed up tm_predictor_16x16  Jian Zhou
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed with the same instruction size. Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
2015-11-30  Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()"  Scott LaVarnway
2015-11-30  SSE2 speed up of h_predictor_4x4  Jian Zhou
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
2015-11-25  VPX: x86 asm version of vpx_idct32x32_1024_add()  Scott LaVarnway
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
2015-11-25  Merge "Speed up tm_predictor_8x8"  Jian Zhou
2015-11-24  Change highbd variance rounding to prevent negative variance.  Alex Converse
Always round sum error and sum square error toward zero in variance calculations. This prevents variance from becoming negative. Avoiding rounding variance at all might be better but would be far more invasive. Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
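The rounding issue described in this message can be reproduced with a small C sketch. It assumes a 10-bit-style normalization in which the summed error is scaled down by 4 and the summed squared error by 16 before the variance is formed; the function name scaled_variance, the flag, and those scale factors are illustrative assumptions, not the libvpx implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: variance = sse - sum^2 / n, computed after scaling
 * sum by 1/4 and sse by 1/16 (assumed 10-bit to 8-bit normalization).
 * round_to_nearest != 0 rounds both quantities to nearest;
 * round_to_nearest == 0 rounds both toward zero. */
static int64_t scaled_variance(const int *diff, int n, int round_to_nearest) {
  int64_t sum_long = 0;
  int64_t sse_long = 0;
  int64_t sum, sse;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    sum_long += diff[i];
    sse_long += (int64_t)diff[i] * diff[i];
  }
  if (round_to_nearest) {
    /* Round half away from zero; sum may be negative, sse never is. */
    sum = sum_long >= 0 ? (sum_long + 2) / 4 : -((-sum_long + 2) / 4);
    sse = (sse_long + 8) / 16;
  } else {
    /* C integer division truncates toward zero. */
    sum = sum_long / 4;
    sse = sse_long / 16;
  }
  return sse - (sum * sum) / n;
}
```

For a 16-sample block with fourteen differences of 3 and two of 2, the sum is 46 and the sum of squares is 134. Rounding to nearest gives 12 and 8, so the result is 8 - 144/16 = -1. Rounding toward zero gives 11 and 8, so the result is 8 - 121/16 = 1 under integer division. With both quantities rounded toward zero, the scaled squared sum divided by n cannot exceed the scaled sum of squares, so the result stays non-negative.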
2015-11-24  Speed up tm_predictor_8x8  Jian Zhou
Left neighbor read from memory only once. Speed up by ~20% in ./test_intra_pred_speed. Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505
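For readers unfamiliar with the TM (TrueMotion) predictor, the following scalar C sketch shows the per-pixel formula and where the left neighbor is read. Each output pixel is left[r] + above[c] - above[-1], clipped to 8 bits, and the sketch reads left[r] once per row into a local. The names tm_predictor_sketch and clip_pixel_sketch are hypothetical, and the code models the idea only, not the SSE2 assembly in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar model of a bs x bs TM predictor.
 * above[-1] must be valid: it holds the top-left neighbor. */
static void tm_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                const uint8_t *above, const uint8_t *left) {
  assert(bs > 0);
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r) {
    /* left[r] is read once per row and folded into a row offset. */
    const int row_offset = left[r] - top_left;
    for (int c = 0; c < bs; ++c) {
      dst[c] = clip_pixel_sketch(row_offset + above[c]);
    }
    dst += stride;
  }
}
```

Hoisting left[r] - above[-1] out of the inner loop is the scalar analogue of keeping the left neighbor in a register rather than reloading it for every pixel.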
2015-11-24  Merge "bitreader/writer: Change shift to signed"  Alex Converse
2015-11-23  VPX: Removed unnecessary pmulhrsw in IDCT32X32_34  Scott LaVarnway
and fixed macro name. Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311
2015-11-20  Revert "Speed up h_predictor_4x4"  James Zern
This reverts commit d76032ae87e535be5b924d9e88bbd67189380534, which breaks 32-bit builds. Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858
2015-11-21  Merge "Speed up h_predictor_4x4"  James Zern
2015-11-20  Merge "Fix a signed shift overflow in vpx_rb_read_inv_signed_literal."  Alex Converse