Age | Commit message | Author |
|
Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7
|
|
In so doing, this fixes a couple of bugs:
vpx_plane_add_noise.c needed to subtract a clamp value instead of adding it,
and the assembly (MMX, SSE) assumed that its parameters were contiguous
in memory, which was not true.
Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
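A minimal sketch of the clamp-then-add-noise idea, assuming 8-bit pixels, blackclamp + whiteclamp <= 255, and every noise value lying in [-blackclamp, whiteclamp]. add_noise_row and clamp_int are hypothetical names; this is not the vpx_plane_add_noise code itself. The subtract/add/subtract sequence shows why the sign of each clamp step matters: the pixel must end up in [blackclamp, 255 - whiteclamp] before noise is added.

```c
#include <stdint.h>

/* Hypothetical helper: clamp v to [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Sketch only. Assumes noise[j] is in [-blackclamp, whiteclamp] and
 * blackclamp + whiteclamp <= 255, so the final pixel stays in [0, 255]. */
static void add_noise_row(uint8_t *row, const int8_t *noise, int width,
                          int blackclamp, int whiteclamp) {
  const int bothclamp = blackclamp + whiteclamp;
  for (int j = 0; j < width; ++j) {
    int v = row[j];
    v = clamp_int(v - blackclamp, 0, 255); /* now in [0, 255 - black] */
    v = clamp_int(v + bothclamp, 0, 255);  /* now in [black + white, 255] */
    v = clamp_int(v - whiteclamp, 0, 255); /* now in [black, 255 - white] */
    row[j] = (uint8_t)(v + noise[j]);      /* cannot leave [0, 255] */
  }
}
```

Mid-range pixels pass through the three steps unchanged (v - b + b + w - w = v); only values near 0 or 255 are pulled inward.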
|
|
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
|
|
Change-Id: I481eb271b082fa3497b0283f37d9b4d1f6de270c
|
|
The product always fits in uint32_t, but the operands don't.
An optimizing compiler should generate the wraparound code.
(Verified with clang).
Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
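The wraparound argument can be illustrated with a hypothetical helper, square_sum_u32 (not the code from this commit), assuming a platform with 32-bit int. A negative sum does not fit in uint32_t, but converting it is well-defined (modulo 2^32), and unsigned multiplication also wraps modulo 2^32, so the result equals the true square whenever that square is below 2^32. Squaring in int instead would be undefined behavior once |sum| exceeds 46340.

```c
#include <stdint.h>

/* Sketch: square a signed sum whose square is known to fit in 32 bits.
 * Both the conversion and the multiply wrap modulo 2^32 (well-defined for
 * unsigned types), so the wrapped result is exactly sum * sum. */
static uint32_t square_sum_u32(int sum) {
  return (uint32_t)sum * (uint32_t)sum;
}
```

For example, 46341 * 46341 = 2147488281 overflows a 32-bit int but is returned correctly here for both +46341 and -46341.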
|
|
These blocks will never overflow since the maximum sum is +/-255*w*h.
Change-Id: Ia2c630339fd9cfb411b56b6040ff402095f12a2e
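The message does not say which block sizes "these blocks" are. Assuming the concern is whether sum * sum fits in 32 bits, the hypothetical helpers below make the bound concrete: each 8-bit pixel difference lies in [-255, 255], so |sum| <= 255*w*h. For 16x16 that is 65280, whose square (4,261,478,400) is just under 2^32, while 32x32 already exceeds it.

```c
#include <stdint.h>

/* Largest possible |sum| over w*h differences of 8-bit pixels. */
static uint64_t max_abs_sum(unsigned w, unsigned h) {
  return 255ull * w * h;
}

/* Nonzero if the worst-case sum * sum still fits in a uint32_t. */
static int sum_squared_fits_u32(unsigned w, unsigned h) {
  const uint64_t m = max_abs_sum(w, h);
  return m * m <= UINT32_MAX;
}
```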
|
|
load the full row rather than doing 2 8-wide columns
Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
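A sketch of the difference using SSE2 intrinsics rather than the assembly the commit touches, assuming 8-bit pixels and a 16-wide row; the function names are hypothetical. Both produce the same register contents, but the full-row version needs one load and no merge step.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Before: two 8-byte loads (one per 8-wide column), then a merge. */
static __m128i load_row_two_halves(const uint8_t *row) {
  const __m128i lo = _mm_loadl_epi64((const __m128i *)row);
  const __m128i hi = _mm_loadl_epi64((const __m128i *)(row + 8));
  return _mm_unpacklo_epi64(lo, hi);
}

/* After: a single unaligned 16-byte load of the full row. */
static __m128i load_row_full(const uint8_t *row) {
  return _mm_loadu_si128((const __m128i *)row);
}
```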
|
|
only output[0] needs to be set; store_output is more involved than a
movdqa in the high-bitdepth case
Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
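Assuming this refers to a DC-only forward transform (the message does not name the function), the point is that only one coefficient is produced, so a plain scalar write to output[0] suffices. The sketch below is hypothetical: fdct8x8_dc_only is an illustrative name, and the unit scale factor (plain sum) is an assumption.

```c
#include <stdint.h>

/* Assumption: 32-bit coefficients, as in high-bitdepth builds. */
typedef int32_t tran_low_t;

/* Sketch: DC-only 8x8 forward transform. Only the DC term is computed,
 * so only output[0] is written; the scale (1) is illustrative. */
static void fdct8x8_dc_only(const int16_t *input, tran_low_t *output,
                            int stride) {
  tran_low_t sum = 0;
  for (int r = 0; r < 8; ++r)
    for (int c = 0; c < 8; ++c) sum += input[r * stride + c];
  output[0] = sum;
}
```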
|
|
This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61.
Fixes a decoder mismatch with 32-bit PIC builds.
Change-Id: I94717df662834810302fe3594b38c53084a4e284
|
|
This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f.
Breaks 32-bit PIC builds.
Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
|
|
These instructions are unnecessary if the adds
are done in the correct order.
Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
|
|
the filter will be the same in this case
Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
|
|
in non-hbd configurations; any high-bitdepth changes will be done in a
follow-up
Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
|
|
Store result with one 16 byte store instead of
two 8 byte stores.
Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09
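The same idea on the store side, sketched with SSE2 intrinsics (hypothetical function names, not the commit's assembly): the two-store version has to shift the high half down before writing it, while the single unaligned store writes all 16 bytes at once.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Before: two 8-byte stores (low half, then high half shifted down). */
static void store_two_halves(uint8_t *dst, __m128i v) {
  _mm_storel_epi64((__m128i *)dst, v);
  _mm_storel_epi64((__m128i *)(dst + 8), _mm_srli_si128(v, 8));
}

/* After: one unaligned 16-byte store. */
static void store_full(uint8_t *dst, __m128i v) {
  _mm_storeu_si128((__m128i *)dst, v);
}
```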
|
|
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
|
|
Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1
|
|
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
|
|
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
|
|
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
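The shape of this API change (in both the high-bitdepth and regular variants) can be sketched with hypothetical stand-ins. Assuming the old count meant "number of 8-pixel groups", a fixed-width entry point per size removes the runtime loop bound, which leaves each variant free to be specialized. filter_8_pixels is a counting stub, not a real loop filter, and filter thresholds are omitted.

```c
#include <stdint.h>

/* Hypothetical stub for the core filter over 8 pixels; it only counts calls. */
static int filter8_calls = 0;
static void filter_8_pixels(uint8_t *s, int pitch) {
  (void)s;
  (void)pitch;
  ++filter8_calls;
}

/* Before (shape only): a runtime count selects 8 or 16 pixels. */
static void lpf_horizontal_with_count(uint8_t *s, int pitch, int count) {
  for (int i = 0; i < count; ++i) filter_8_pixels(s + 8 * i, pitch);
}

/* After: the width is fixed by the function name, so no count is passed. */
static void lpf_horizontal_edge_8(uint8_t *s, int pitch) {
  filter_8_pixels(s, pitch);
}
static void lpf_horizontal_edge_16(uint8_t *s, int pitch) {
  filter_8_pixels(s, pitch);
  filter_8_pixels(s + 8, pitch);
}
```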
|
|
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
|
|
Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf
|
|
Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774
|
|
Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c
|
|
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
|
|
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
|
|
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
|
|
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
|
|
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
|
|
This commit changes the SSSE3-optimized idct8x8 functions to work with
high-bitdepth builds.
With this commit and the previous one, which enabled the SSSE3 idct32x32
functions, tests showed virtually no difference in decoding speed on
fdJc1_IBKJA.248.webm between a build configured with
--enable-vp9-highbitdepth and one without that option.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
|
|
This commit changes the SSSE3 assembly functions for idct32x32 to
support high-bitdepth builds.
On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between the hbd and lbd builds from 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
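These messages do not say how the SSSE3 transforms were adapted. One common approach, assumed here rather than confirmed by the commits, is to keep the 16-bit SIMD transform and narrow the 32-bit high-bitdepth coefficients on load. The sketch uses SSE2 intrinsics and a hypothetical function name; _mm_packs_epi32 saturates out-of-range values to the int16 limits.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Assumption: high-bitdepth build, so coefficients are 32-bit. */
typedef int32_t tran_low_t;

/* Sketch: load 8 coefficients and pack them into eight int16 lanes
 * with signed saturation, ready for 16-bit transform code. */
static __m128i load_coeffs_as_int16(const tran_low_t *coeff) {
  const __m128i lo = _mm_loadu_si128((const __m128i *)coeff);       /* 0..3 */
  const __m128i hi = _mm_loadu_si128((const __m128i *)(coeff + 4)); /* 4..7 */
  return _mm_packs_epi32(lo, hi);
}
```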
|
|
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing
changes made to 15939cb2d76c773950cda40988ede89e111872ea.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
|
|
Remove the ARCH_X86_64 constraint. No performance hit on either
big or small cores.
Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
|
|
Remove the ARCH_X86_64 constraint.
Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
|
|
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
|
|
Replace MMX with SSE2, reduce memory accesses to the left neighbor,
and unroll the loop.
Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
|
|
MMX replaced with SSE2, same performance.
Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
|
|
MMX replaced with SSE2, same performance.
Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
|
|
In 32-bit builds with --enable-shared, there is a lot of
register pressure and the register src_strideq is reused.
The code needs to use the stack-based version of src_stride,
but that doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even after it had been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without --disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
|
|
Replace MMX with SSE2.
Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
|
|
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.
Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
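The message does not name the function. Assuming it is an intra DC predictor, where psadbw against zero is a common way to sum pixels, the sketch below (hypothetical name, SSE2 intrinsics) packs the 8 above and 8 left pixels into one register so a single psadbw sums all 16 bytes into two 64-bit halves, instead of one psadbw per neighbor row.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Sketch: DC value for an 8x8 block = rounded mean of 8 above + 8 left. */
static uint8_t dc_8x8_value(const uint8_t *above, const uint8_t *left) {
  const __m128i a = _mm_loadl_epi64((const __m128i *)above);
  const __m128i l = _mm_loadl_epi64((const __m128i *)left);
  const __m128i both = _mm_unpacklo_epi64(a, l);                /* above | left */
  const __m128i sums = _mm_sad_epu8(both, _mm_setzero_si128()); /* one psadbw */
  /* Each 64-bit half holds a partial sum (<= 2040); add the halves. */
  const __m128i total = _mm_add_epi16(sums, _mm_srli_si128(sums, 8));
  const int sum = _mm_cvtsi128_si32(total);
  return (uint8_t)((sum + 8) >> 4); /* round(sum / 16) */
}
```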
|