Age | Commit message | Author |
|
high-bitdepth version
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES
Change-Id: I77292e11f51fd76d4127fd0027f876866bcf8675
|
|
|
|
* changes:
intrapred: protect functions w/CONFIG check
vp9_noise_estimate: protect copy_frame w/CONFIG check
vp8_cx_iface: delete 3 unused functions
vp8: mark intra_prediction_down_copy inline
|
|
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
|
|
This commit changes the SSSE3-optimized idct8x8 functions to work with
the highbitdepth build.
With this commit and the previous one, which enabled the SSSE3 idct32x32
functions, tests showed virtually no difference in decoding speed on the
file fdJc1_IBKJA.248.webm between the build with the
--enable-vp9-highbitdepth option and the build without it.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
|
|
This commit changes the SSSE3 assembly functions for idct32x32 to
support the highbitdepth build.
On the test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between the hbd and lbd builds from 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
|
|
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES
Change-Id: If13946e483c4d0ccaa3e1d60dc14216c06d5a219
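
The guard described above can be sketched in C as follows. This is a minimal illustration, not the actual libvpx change: the helper `only_with_misc_fixes` and the caller `predict_value` are hypothetical names, and the flag is defined locally as a stand-in for a value that a real build would take from its generated configuration header.

```c
/* Hypothetical stand-in for the build-time flag; a real build would get
 * this value from a generated configuration header. */
#ifndef CONFIG_MISC_FIXES
#define CONFIG_MISC_FIXES 0
#endif

#if CONFIG_MISC_FIXES
/* Only compiled when the flag is on, so a build with the flag off never
 * sees an unused static function. */
static int only_with_misc_fixes(int x) { return x + 1; }
#endif

/* Hypothetical caller: uses the guarded helper only when it exists. */
int predict_value(int x) {
#if CONFIG_MISC_FIXES
  return only_with_misc_fixes(x);
#else
  return x;
#endif
}
```

Wrapping both the definition and its only call site in the same check keeps the two configurations consistent: neither build declares a function it does not use.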
|
|
|
|
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing
changes made to 15939cb2d76c773950cda40988ede89e111872ea.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
|
|
|
|
Remove the ARCH_X86_64 constraint. No performance hit on either
big-core or small-core machines.
Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
|
Reduce the register count by 1 to avoid using xmm6 and to avoid
unnecessarily penalizing the other users of the base macro.
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
|
|
Replace MMX with SSE2, reduce memory access to the left neighbor,
and unroll the loop.
Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
|
|
MMX replaced with SSE2, same performance.
Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
|
|
MMX replaced with SSE2, same performance.
Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
|
|
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without --disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
|
|
Replace MMX with SSE2.
Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
|
|
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.
Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
|
|
|
|
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
|
|
|
|
|
|
Reallocate the xmm register usage so that ARCH_X86_64 is no longer required.
Reduce memory access to the left neighbor by half.
Speeds up by a single-digit percentage on big-core machines.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
|
GET_GOT modifies the stack pointer, so the offset for left's address will
be wrong if it is loaded afterward.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
|
|
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2
Change-Id: I7fe191c717e40df8602e229371321efb0d689375
|
|
Remove some redundant code.
Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 16 to 8,
and reduce memory access to left.
Speeds up by a single-digit percentage in ./test_intra_pred_speed on
big-core machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
|
|
|
|
The 4x4 intra predictor implemented with MMX is replaced with SSE2.
The segfault from change 315561 when decoding vp8 is fixed.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
|
|
|
|
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
|
|
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a.
This causes a segfault when decoding vp8, in both 32-bit and 64-bit builds.
Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 8 to 4,
and reduce memory access to left.
Speeds up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 4 to 2,
and reduce memory access to left.
Speeds up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
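
The unrolling mentioned in these commits can be sketched in plain C. The log does not show which predictor each commit touched, so the 8x8 horizontal predictor below is only an assumed example, and the names `h_pred_8x8_ref` and `h_pred_8x8_unrolled` are hypothetical. The real changes are in SSE2 assembly; this scalar version only shows the loop shape.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference shape: one row per iteration, 8 iterations for an 8x8 block.
 * Each output row is filled with that row's left-neighbor pixel. */
void h_pred_8x8_ref(uint8_t *dst, ptrdiff_t stride, const uint8_t *left) {
  for (int r = 0; r < 8; ++r) {
    memset(dst + r * stride, left[r], 8);
  }
}

/* Unrolled shape: two rows per iteration, so the loop body runs 4 times
 * instead of 8 and the loop overhead is halved. */
void h_pred_8x8_unrolled(uint8_t *dst, ptrdiff_t stride, const uint8_t *left) {
  for (int r = 0; r < 8; r += 2) {
    memset(dst, left[r], 8);
    memset(dst + stride, left[r + 1], 8);
    dst += 2 * stride;
  }
}
```

Both functions write identical output; only the number of loop iterations differs.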
|
|
8x8 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
|
|
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
|
|
|
|
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
|
|
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
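
One way to reduce memory access to the left column is to read several left pixels in a single load and then pick them apart in registers. The scalar C sketch below illustrates that idea for a 4x4 horizontal predictor. It is an assumption about the general technique, not the commit's SSE2 code: the function name `h_pred_4x4_one_load` is hypothetical, and the byte extraction assumes a little-endian target such as x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumes a little-endian target (as x86 is): byte r of the loaded word
 * corresponds to left[r]. */
void h_pred_4x4_one_load(uint8_t *dst, ptrdiff_t stride, const uint8_t *left) {
  uint32_t left4;
  /* Single 4-byte read of the left column instead of four 1-byte reads. */
  memcpy(&left4, left, sizeof(left4));
  for (int r = 0; r < 4; ++r) {
    const uint32_t pixel = (left4 >> (8 * r)) & 0xffu;
    /* Replicate the pixel into all four bytes of the row. */
    const uint32_t row = pixel * 0x01010101u;
    memcpy(dst + r * stride, &row, sizeof(row));
  }
}
```

The multiply by 0x01010101 copies one byte into all four byte lanes, which plays the role a shuffle or unpack instruction would play in a SIMD version.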
|
|
|
|
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
|