path: root/vpx_dsp/x86
Age | Commit message | Author
2016-05-04 | vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include | James Zern
Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7
2016-05-04 | Merge "libvpx: add a unit test for plane_add_noise." | James Bankoski
2016-05-03 | libvpx: add a unit test for plane_add_noise. | Jim Bankoski
In so doing this fixes a couple of bugs: vpx_plane_add_noise.c needed to subtract a clamp instead of add. And the assembly (mmx sse) had assumptions that parameters were continuous in memory which was not true. Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
2016-05-03 | Merge "Move vpx_add_plane from codec to vpx_dsp and dedup." | James Bankoski
2016-05-02 | Move vpx_add_plane from codec to vpx_dsp and dedup. | Jim Bankoski
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-04-27 | Tweak casts on vpx_sub_pixel_variance to avoid implicit overflow. | Alex Converse
Change-Id: I481eb271b082fa3497b0283f37d9b4d1f6de270c
2016-04-27 | Be explicit about overflow in vpx_variance16x16_sse2. | Alex Converse
The product always fits in uint32_t, but the operands don't. An optimizing compiler should generate the wraparound code. (Verified with clang). Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
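The overflow this commit guards against can be sketched in portable C. For a 16x16 block of 8-bit pixels the sum of differences lies in [-65280, 65280], so its square can reach 4,261,478,400: too large for int32_t, yet within uint32_t. The helper name variance16x16_from_sums below is hypothetical and only illustrates the cast; it is not the libvpx SSE2 function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the libvpx function): variance of a 16x16 block
 * computed from the sum of squared differences (sse) and the signed sum of
 * differences (sum). */
static uint32_t variance16x16_from_sums(uint32_t sse, int sum) {
  /* 256 pixels, each difference in [-255, 255]. */
  assert(sum >= -255 * 256 && sum <= 255 * 256);
  /* Multiplying as uint32_t makes wraparound well defined. The true square
   * is at most 65280 * 65280 = 4261478400, which is below 2^32, so the
   * wrapped result equals the exact product even for negative sums. */
  const uint32_t sum_sq = (uint32_t)sum * (uint32_t)sum;
  /* Dividing by the 256 pixels of the block is a shift by 8. */
  return sse - (sum_sq >> 8);
}
```

Multiplying two plain int operands here would be signed overflow, which is undefined behaviour in C, whereas the unsigned multiply is fully defined and yields the same low 32 bits.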
2016-04-27 | Remove casts on < 16x16 variance. | Alex Converse
These blocks will never overflow since max sum is +/-255*w*h. Change-Id: Ia2c630339fd9cfb411b56b6040ff402095f12a2e
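The bound quoted above can be checked with a few lines of C. This is a sketch that assumes 8-bit pixels, so each difference lies in [-255, 255]; the function name sum_sq_fits_int32 is made up for illustration.

```c
#include <stdint.h>

/* Returns 1 when the square of the largest possible |sum| for a w x h block
 * of 8-bit pixel differences still fits in a signed 32-bit integer. */
static int sum_sq_fits_int32(int w, int h) {
  /* Largest magnitude of the sum: every difference is +/-255. */
  const int64_t max_abs_sum = 255LL * w * h;
  return max_abs_sum * max_abs_sum <= INT32_MAX;
}
```

For a 128-pixel block such as 16x8 or 8x16 the largest square is 32640 * 32640 = 1,065,369,600, which is below INT32_MAX, while 16x16 gives 4,261,478,400 and does not fit.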
2016-04-04 | vpx_fdct16x16_1_sse2: improve load pattern | James Zern
load the full row rather than doing 2 8-wide columns Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
2016-04-04 | vpx_fdctNxN_1_sse2: reduce store size | James Zern
only output[0] needs to be set, store_output is more involved than a movdqa in the high bitdepth case Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
2016-03-08 | VPX: loopfilter_mmx.asm using x86inc 2 | Scott LaVarnway
This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61. Fixes a decoder mismatch with 32bit PIC builds. Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-04 | Revert "VPX: loopfilter_mmx.asm using x86inc" | James Zern
This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f. breaks 32-bit pic builds Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-02-27 | VPX: Remove pmin/pmax from subpixel functions. | Scott LaVarnway
These instructions are unnecessary if the adds are done in the correct order. Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
2016-02-27 | Merge "VPX: vpx_filter_block1d16_(v8, v8_avg)" | Scott LaVarnway
2016-02-25 | x86/convolve.h: remove redundant check in FUN_CONV_2D | James Zern
the filter will be the same in this case Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
2016-02-25 | x86/convolve.h: replace while w/if for w < 16 | James Zern
in non-hbd configurations; any high-bitdepth changes will be done in a follow-up Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
2016-02-25 | VPX: vpx_filter_block1d16_(v8, v8_avg) | Scott LaVarnway
Store result with one 16 byte store instead of two 8 byte stores. Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09
2016-02-24 | x86/convolve.h: change filter[] || chains to | | James Zern
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
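The commit message gives no rationale, so the following is an assumption: replacing a chain of || with | removes the short-circuit evaluation, letting the compiler OR the taps together and test once instead of branching per tap. A minimal sketch with hypothetical function names and a three-tap check chosen only for illustration:

```c
#include <stdint.h>

/* Short-circuit form: each || may become its own compare-and-branch. */
static int any_tap_logical(const int16_t *filter) {
  return filter[0] || filter[1] || filter[2];
}

/* Bitwise form: the taps are OR-ed together and tested once. The OR of
 * several integers is non-zero exactly when at least one of them is. */
static int any_tap_bitwise(const int16_t *filter) {
  return (filter[0] | filter[1] | filter[2]) != 0;
}
```

Both forms return the same truth value for every input; the difference is only in how many branches the generated code may contain.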
2016-02-23 | BUG FIX: vpx_filter_block1d(8,4)_(v8, v8_avg) | Scott LaVarnway
Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1
2016-02-18 | VPX: loopfilter_mmx.asm using x86inc | Scott LaVarnway
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
2016-02-16 | split vpx_highbd_lpf_horizontal_16 in two | James Zern
replace with vpx_highbd_lpf_horizontal_edge_16 and vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
2016-02-16 | split vpx_lpf_horizontal_16 in two | James Zern
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
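The shape of this refactoring can be sketched as follows. All names and the per-column operation are hypothetical stand-ins, not the libvpx signatures; the real loop filter takes more parameters and does far more work per column.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-column filtering work. */
static void filter_columns(uint8_t *s, int n) {
  for (int i = 0; i < n; ++i) s[i] = (uint8_t)(s[i] >> 1);
}

/* Before: one entry point, width selected at run time (count is 1 or 2). */
static void lpf_edge_with_count(uint8_t *s, int count) {
  filter_columns(s, 8 * count);
}

/* After: one entry point per width, so no count argument is passed and
 * each variant can be specialised for its fixed width. */
static void lpf_edge_8(uint8_t *s) { filter_columns(s, 8); }
static void lpf_edge_16(uint8_t *s) { filter_columns(s, 16); }
```

A caller that previously passed count = 2 would call the 16-wide variant instead, and one that passed count = 1 would call the 8-wide variant.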
2016-02-16 | vpx_highbd_lpf_horizontal_4: remove unused count param | James Zern
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
2016-02-16 | vpx_highbd_lpf_horizontal_8: remove unused count param | James Zern
Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf
2016-02-16 | vpx_highbd_lpf_vertical_4: remove unused count param | James Zern
Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774
2016-02-16 | vpx_highbd_lpf_vertical_8: remove unused count param | James Zern
Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c
2016-02-16 | vpx_lpf_horizontal_4: remove unused count param | James Zern
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
2016-02-16 | vpx_lpf_horizontal_8: remove unused count param | James Zern
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
2016-02-16 | vpx_lpf_vertical_4: remove unused count param | James Zern
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
2016-02-16 | vpx_lpf_vertical_8: remove unused count param | James Zern
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-01-29 | Enable sse2 version of inverse wht for hbd build | Yaowu Xu
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
2016-01-29 | SSSE3 idct8x8 functions for highbitdpeth build | Yaowu Xu
This commit changes the SSSE3-optimized idct8x8 functions to work with the highbitdepth build. With this commit and the previous one that enabled SSSE3 idct32x32 functions, tests showed virtually no difference in decoding speed for file fdJc1_IBKJA.248.webm between the build with the --enable-vp9-highbitdepth option and the build without it. Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
2016-01-29 | Enable hbd_build to use SSSE3optimized functions | Yaowu Xu
This commit changes the SSSE3 assembly functions for idct32x32 to support highbitdepth build. On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference between hbd and lbd build from between 3-4% to 1-2%. Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
2016-01-25 | Merge "Code clean of sad4xNx4D_sse" | James Zern
2016-01-13 | Revert "Merge "Change highbd variance rounding to prevent negative variance."" | Alex Converse
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing changes made to 15939cb2d76c773950cda40988ede89e111872ea. The commit was insufficiently tested and causes failures. Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13 | Merge "Change highbd variance rounding to prevent negative variance." | Alex Converse
2015-12-22 | Code clean of highbd_tm_predictor_32x32 | Jian Zhou
Remove the ARCH_X86_64 constraint. No performance hit on both big core and small core. Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22 | Code clean of highbd_tm_predictor_16x16 | Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22 | Code clean of highbd_dc_predictor_32x32 | Jian Zhou
Remove the ARCH_X86_64 constraint. Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22 | Merge "Code clean of highbd_tm_predictor_4x4" | James Zern
2015-12-22 | Merge "Code clean of highbd_dc_predictor_4x4" | James Zern
2015-12-21 | Merge "Code clean of highbd_v_predictor_4x4" | Jian Zhou
2015-12-19 | Merge "Fix for issue 1114 compile error" | Yunqing Wang
2015-12-18 | sad_sse2: fix sad4xN(_avg) on windows | James Zern
reduce the register count by 1 to avoid xmm6 and unnecessarily penalizing the other users of the base macro Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18 | Code clean of highbd_tm_predictor_4x4 | Jian Zhou
Replace MMX with SSE2, reduce mem access to left neighbor, loop unrolled. Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18 | Code clean of highbd_v_predictor_4x4 | Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18 | Code clean of highbd_dc_predictor_4x4 | Jian Zhou
MMX replaced with SSE2, same performance. Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18 | Fix for issue 1114 compile error | Peter de Rivaz
In 32-bit build with --enable-shared, there is a lot of register pressure and register src_strideq is reused. The code needs to use the stack based version of src_stride, but this doesn't compile when used in an lea instruction. This patch also fixes a related segmentation fault caused by the implementation using src_strideq even though it has been reused. This patch also fixes the HBD subpel variance tests that fail when compiled without disable-optimizations. These failures were caused by local variables in the assembler routines colliding with the caller's stack frame. Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
2015-12-17 | Code clean of sad4xNx4D_sse | Jian Zhou
Replace MMX with SSE2. Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17 | Code clean of sad4xN(_avg)_sse | Jian Zhou
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont. Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d