Age | Commit message | Author |
|
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing
changes made to 15939cb2d76c773950cda40988ede89e111872ea.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
|
|
|
|
Remove the ARCH_X86_64 constraint. No performance hit on either
big-core or small-core machines.
Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
|
|
|
|
|
|
|
|
|
|
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
|
|
Replace MMX with SSE2, reduce memory access to the left neighbor,
and unroll the loop.
Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
|
|
MMX replaced with SSE2, same performance.
Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
|
|
MMX replaced with SSE2, same performance.
Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
|
|
In a 32-bit build with --enable-shared, there is a lot of
register pressure and the register src_strideq is reused.
The code needs to use the stack-based version of src_stride,
but this doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without --disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
|
|
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.
Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
|
|
|
|
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
|
|
|
|
|
|
Reallocate the xmm register usage so that ARCH_X86_64 is no longer required.
Reduce memory access to the left neighbor by half.
Speeds up by a single-digit percentage on big-core machines.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
|
|
|
|
|
|
|
|
GET_GOT modifies the stack pointer, so the offset for left's address will
be wrong if loaded afterward.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
|
|
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2
Change-Id: I7fe191c717e40df8602e229371321efb0d689375
|
|
Remove some redundant code.
Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 16 to 8,
and reduce memory access to left.
Speeds up by a single-digit percentage in ./test_intra_pred_speed on big-core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
|
|
|
|
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
|
|
|
|
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
|
|
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a.
This causes a segfault when decoding vp8 in both 32-bit and 64-bit builds.
Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 8 to 4,
and reduce memory access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 4 to 2,
and reduce memory access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
|
|
8x8 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
|
|
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
|
|
|
|
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
|
|
Reduce memory access to left. Speeds up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
|
|
|
|
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
|
|
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
|
|
|
|
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Not rounding variance at all might be better, but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
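The rounding issue described in this message can be illustrated with a small scalar sketch. This is not the code from the commit: the helper names (scale_round_nearest, scale_toward_zero, scaled_variance) and the shift amounts are assumptions chosen for illustration. It compares scaling the sum and the sum of squares with round-to-nearest against scaling them toward zero before computing sse - sum*sum/n.

```c
#include <assert.h>
#include <stdint.h>

/* Scale v down by 2^bits, rounding to nearest (ties away from zero). */
static int64_t scale_round_nearest(int64_t v, int bits) {
  const int64_t half = (int64_t)1 << (bits - 1);
  return v >= 0 ? (v + half) >> bits : -((-v + half) >> bits);
}

/* Scale v down by 2^bits, rounding toward zero (C division truncates). */
static int64_t scale_toward_zero(int64_t v, int bits) {
  return v / ((int64_t)1 << bits);
}

typedef int64_t (*scale_fn)(int64_t v, int bits);

/* Hypothetical helper: variance of n differences where the sum is scaled
 * by 2^sum_bits and the sum of squares by 2^(2 * sum_bits). */
static int64_t scaled_variance(const int *diff, int n, int sum_bits,
                               scale_fn scale) {
  int64_t sum = 0, sse = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    sum += diff[i];
    sse += (int64_t)diff[i] * diff[i];
  }
  sum = scale(sum, sum_bits);
  sse = scale(sse, 2 * sum_bits);
  return sse - (sum * sum) / n;
}
```

For sixteen differences, fourteen equal to 5 and two equal to 6, with sum_bits = 2, the round-to-nearest version returns -1 because the sum is rounded up while the sum of squares is rounded down. The toward-zero version returns 1, since truncation never increases the magnitude of the scaled sum relative to the scaled sum of squares.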
|
|
Left neighbor read from memory only once.
Speed up by ~20% in ./test_intra_pred_speed.
Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505
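As a rough scalar model of reading the left neighbor only once, the sketch below copies the left column into a local buffer a single time and then fills each row from that copy. The function name h_predictor_4x4_sketch and the 4x4 block size are assumptions for brevity; the actual change is in SSE2 assembly and may target a different block size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a horizontal intra predictor for a 4x4
 * block: every pixel in row r of dst is set to left[r]. */
static void h_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *left) {
  uint8_t l[4];
  /* Read the four left-neighbor pixels from memory once. */
  memcpy(l, left, sizeof(l));
  for (int r = 0; r < 4; ++r) {
    /* Broadcast the cached neighbor value across the row. */
    memset(dst + r * stride, l[r], 4);
  }
}
```

In the SIMD version the local buffer would correspond to a register holding the left pixels, with shuffles or unpacks taking the place of the per-row reload.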
|
|
|
|
and fixed macro name.
Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311
|
|
This reverts commit d76032ae87e535be5b924d9e88bbd67189380534.
This breaks 32-bit builds.
Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858
|
|
|
|
|