libvpx.git

Age	Commit message	Author
2013-12-02	Improve idct16x16: _256_add_sse2(x1.107)&_10_add_sse2(x1.012)	Abo Talib Mahfoodh
	The performance gain of idct16x16_10_add_sse2 function is not noticeable. However since both functions use the IDCT16_1D, idct16x16_10_add_sse2 should be modified as well. Tested with: park_joy_420_720p50.y4m Change-Id: I02b957e36fcf997c677d15baf496533895271bff
2013-12-02	Merge "improve vp9_idct32x32_34(x1.472)&1024(x1.032)_add_sse2"	Yunqing Wang

2013-11-26	improve vp9_idct32x32_34(x1.472)&1024(x1.032)_add_sse2	Abo Talib Mahfoodh
	vp9_idct32x32_34_add_sse2: speedup 1.472. IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized based on the fact that only the upper-left 8x8 has non-zero values. vp9_idct32x32_1024_add_sse2: speedup 1.032. Tested with: park_joy_420_720p50.y4m Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
2013-11-22	Do vertical loopfiltering in parallel	Yunqing Wang
	This patch followed "Add filter_selectively_vert_row2 to enable parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. For other optimizations (neon and dspr2), current 16-pixel functions were done by calling 8-pixel functions twice, and real 16-pixel functions could be added later. Decoder speedup: tulip clip: 2% speed gain; old_town_cross: 1.2% speed gain; bus: 2% speed gain. Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
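The "8-pixel function called twice" fallback mentioned above can be sketched in C as follows. This is only an illustration under stated assumptions: `edge_filter_8`, `edge_filter_16`, and the averaging kernel are hypothetical stand-ins, not libvpx functions, and the masks and thresholds of the real loop filter are omitted.

```c
#include <stdint.h>

/* Hypothetical 8-row kernel (not a libvpx function): for each of 8 rows,
 * replace the two pixels on either side of a vertical edge with their
 * rounded average. s points at the first pixel to the right of the edge. */
static void edge_filter_8(uint8_t *s, int pitch) {
  for (int r = 0; r < 8; ++r) {
    const int avg = (s[-1] + s[0] + 1) >> 1;
    s[-1] = (uint8_t)avg;
    s[0] = (uint8_t)avg;
    s += pitch;
  }
}

/* 16-row version built by calling the 8-row kernel twice, the second
 * time starting 8 rows further down. A true 16-pixel SIMD routine would
 * handle all 16 rows in a single pass instead. */
static void edge_filter_16(uint8_t *s, int pitch) {
  edge_filter_8(s, pitch);
  edge_filter_8(s + 8 * pitch, pitch);
}
```

Calling `edge_filter_16(buf + 2, 4)` on a 16-row buffer with a pitch of 4 filters the edge between columns 1 and 2 in every row and leaves columns 0 and 3 untouched.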
2013-11-20	Correct ssse3 8/16-pixel wide sub-pixel filter calculation	Yunqing Wang
	Although no mismatch was indicated for 8/16 wide sub-pixel filters in issue 661, they had similar problems that could potentially cause mismatches. This patch fixed the calculations in HORIZx8/16 and VERTx8/16. Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
2013-11-20	Fix stack pointer in sub-pixel filters	Yunqing Wang
	In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack pointer was modified while aligning the stack, and it needed to be popped at the end. Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
2013-11-19	Fix decoder mismatch with ssse3 enabled	Yunqing Wang
	This patch fixed issue 661: "Decoder produces mismatched outputs with ssse3 enabled and disabled." In sub-pixel filters, each pixel value was multiplied by a filter coefficient, and the results were added up. The order in which these products were added had to be arranged carefully to prevent incorrect overflow. Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
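Why the order of the additions matters can be shown with a scalar C model of 16-bit saturating accumulation. This is a sketch under stated assumptions, not the patch itself: the function names, the taps (an illustrative 8-tap kernel summing to 128), and the pixel values are chosen for the example, and the actual SSSE3 code may arrange the sums differently.

```c
#include <stdint.h>

/* Signed 16-bit saturating add, modelling one lane of a saturating SIMD add. */
static int16_t sat_add16(int a, int b) {
  int s = a + b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

static uint8_t clip_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Reference: accumulate in full precision, round, shift by 7, clip. */
static uint8_t filter_ref(const uint8_t *p, const int8_t *k) {
  int sum = 0;
  for (int i = 0; i < 8; ++i) sum += p[i] * k[i];
  return clip_u8((sum + 64) >> 7);
}

/* Sum of products for adjacent tap pairs, each saturated to 16 bits. */
static void pair_products(const uint8_t *p, const int8_t *k, int pair[4]) {
  for (int j = 0; j < 4; ++j)
    pair[j] = sat_add16(p[2 * j] * k[2 * j], p[2 * j + 1] * k[2 * j + 1]);
}

/* Left-to-right saturating accumulation: the two large middle pairs can
 * saturate before the remaining negative outer pair is added. */
static uint8_t filter_naive(const uint8_t *p, const int8_t *k) {
  int pair[4];
  pair_products(p, k, pair);
  int16_t acc = sat_add16(pair[0], pair[1]);
  acc = sat_add16(acc, pair[2]);
  acc = sat_add16(acc, pair[3]);
  acc = sat_add16(acc, 64);
  return clip_u8(acc >> 7);
}

/* Reordered accumulation (one possible arrangement): outer pairs first,
 * then the smaller middle pair, then the larger one last. */
static uint8_t filter_reordered(const uint8_t *p, const int8_t *k) {
  int pair[4];
  pair_products(p, k, pair);
  const int lo = pair[1] < pair[2] ? pair[1] : pair[2];
  const int hi = pair[1] < pair[2] ? pair[2] : pair[1];
  int16_t acc = sat_add16(pair[0], pair[3]);
  acc = sat_add16(acc, lo);
  acc = sat_add16(acc, hi);
  acc = sat_add16(acc, 64);
  return clip_u8(acc >> 7);
}
```

With taps `{-1, 6, -19, 78, 78, -19, 6, -1}` and pixels `{255, 0, 0, 255, 255, 255, 0, 255}`, the reference and the reordered version both produce 255, while the left-to-right version produces 254 because an intermediate sum saturates before the last negative term is added. The sketch shows this for one input only and does not prove the reordered form correct in general.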
2013-11-18	Improve vp9_iht4x4_16_add_sse2 (x1.341)	Abo Talib Mahfoodh
	This rebase is a better implementation than the previous ones. Modifications were made to reduce the total number of clock cycles. Speedup: 1.341. Compiled with -O3. Tested with: park_joy_420_720p50.y4m Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
2013-11-15	Do horizontal loopfiltering in parallel	Yunqing Wang
	This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-08	Merge "Improve vp9_idct4x4_1_add_sse2"	Yunqing Wang

2013-11-01	vp9 ssse3 d207_predictor_32x32: add missing GLOBAL()	James Zern
	removes a textrel for sh_b23456789abcdefff Change-Id: I80cb9dfd8e49a0fe884c8ff76472275b3a00cb57
2013-10-31	mb_lpf_horizontal_edge AVX2 optimization	Tamar Levy
	This CL contains two AVX2 optimized loop filter functions, mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16. Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-25	Merge "Add 32x32 idct function for eob<=34 case"	Yunqing Wang

2013-10-24	Add 32x32 idct function for eob<=34 case	Yunqing Wang
	When only the upper-left 8x8 area has non-zero dct coefficients, we could skip the 1D IDCT for the 9th to 32nd rows to save operations. This function is called when eob <= 34. Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
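The row-skipping idea can be sketched in C with a small separable transform. This is a minimal illustration, not the libvpx implementation: the block size `N`, the `transform_1d` running sum (a hypothetical stand-in for the real 1D IDCT), and the `rows` parameter are all assumptions made for the example. It relies only on the fact that a linear transform maps an all-zero row to zeros.

```c
#include <string.h>

#define N 8 /* sketch block size; the commit concerns 32x32 blocks */

/* Hypothetical linear 1D transform (running sum), standing in for the
 * real 1D IDCT. Any linear transform maps an all-zero input to zeros. */
static void transform_1d(const int *in, int *out) {
  int acc = 0;
  for (int i = 0; i < N; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}

/* Separable 2D transform on a row-major N x N block: a row pass over the
 * first `rows` rows only, then a column pass over all N columns. Rows at
 * index >= rows are assumed to be all-zero, so their row-pass output is
 * simply left as zeros instead of being computed. */
static void transform_2d(const int *in, int *out, int rows) {
  int tmp[N * N];
  int col_in[N], col_out[N];
  memset(tmp, 0, sizeof(tmp));
  for (int r = 0; r < rows; ++r)
    transform_1d(in + r * N, tmp + r * N);
  for (int c = 0; c < N; ++c) {
    for (int r = 0; r < N; ++r) col_in[r] = tmp[r * N + c];
    transform_1d(col_in, col_out);
    for (int r = 0; r < N; ++r) out[r * N + c] = col_out[r];
  }
}
```

When the non-zero coefficients are confined to the first two rows, `transform_2d(in, out, 2)` gives the same output as `transform_2d(in, out, N)` while running the row pass on 2 rows instead of 8.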
2013-10-23	Renaming vp9_short_fdct8x8 to vp9_fdct8x8.	Dmitry Kovalev
	For consistency with idct function names. Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
2013-10-22	Improve vp9_idct4x4_1_add_sse2	Abo Talib Mahfoodh
	Simple modification to reduce the number of cycles in the function. Original function: 973 cycles. Modified function: 835 cycles. Improvement factor: 1.165. Tested with: park_joy_420_720p50.y4m Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
2013-10-18	Fix d207 intra prediction SSSE3 functions	Yunqing Wang
	This patch fixed a bug that caused 32bit PIC build mismatch. The stack pointer was modified after "GET_GOT". Loading left pointer from a hard-coded position gave wrong result. Change-Id: Iea0aec6f917b12a6b3393ffc986bad74510248cc
2013-10-15	Merge "Fix a few indent format issues in buffer defs"	Jingning Han

2013-10-15	Fix a few indent format issues in buffer defs	Jingning Han
	Change-Id: Iac55891ac9e6f13718c9f822aa099b5ca491832a
2013-10-11	Making input pointer of any inverse transform constant.	Dmitry Kovalev
	Also renaming dest_stride to stride in some places. Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
2013-10-11	Consistent names for inverse hybrid transforms (1 of 2).	Dmitry Kovalev
	Renames:
	  vp9_short_iht4x4_add -> vp9_iht4x4_16_add
	  vp9_short_iht8x8_add -> vp9_iht8x8_64_add
	  vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add
	Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
2013-10-11	Merge "Removing vp9_idct4_1d_sse2 function."	Dmitry Kovalev

2013-10-11	Code cleanup	Yunqing Wang
	Minor code cleanup. Change-Id: I47c1f794842d4570bb39cfd23b80f54f5606bba6
2013-10-11	Merge "SSE2 8-tap sub-pixel filter optimization"	Yunqing Wang

2013-10-10	Removing vp9_idct4_1d_sse2 function.	Dmitry Kovalev
	We have two SSE2-optimized functions for idct4_1d:
	  vp9_idct4_1d_sse2 <-- removing this one
	  idct4_1d_sse2
	vp9_idct4_1d_sse2 was used only by the following functions, which already have SSE2-optimized variants:
	  vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_sse2
	  idct8_1d -> vp9_idct8x8_{16, 10, 1}_sse2
	  vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
	Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
2013-10-10	d207 intra prediction ssse3 using bytes	Scott LaVarnway
	Byte version of Ronald's d207 ssse3 optimizations (commit: f891f84d3ba9345b0074e682f0fea09b8ddf4f1e) Change-Id: If15f71a589ea16f78ac86a501b0c5c6231dc9af1
2013-10-10	Merge "Giving consistent names to IDCT 32x32 functions."	Dmitry Kovalev

2013-10-10	Merge "d153 intra prediction (32x32) ssse3 using bytes"	Yunqing Wang

2013-10-10	SSE2 8-tap sub-pixel filter optimization	Yunqing Wang
	To ensure fast encoding/decoding on devices without ssse3 support, SSE2 optimization of sub-pixel filters was done. A test using a 1080p clip showed decoder speeds of ~70fps with ssse3 filters, ~60fps with sse2 filters, and ~15fps with C filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10	Giving consistent names to IDCT 32x32 functions.	Dmitry Kovalev
	Renames:
	  vp9_short_idct32x32_add -> vp9_idct32x32_1024_add
	  vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
	  vp9_idct_add_32x32 -> vp9_idct32x32_add
	Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
2013-10-07	Giving consistent names to IDCT 16x16 functions.	Dmitry Kovalev
	Renames:
	  vp9_short_idct16x16_add -> vp9_idct16x16_256_add
	  vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
	  vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add
	  vp9_idct_add_16x16 -> vp9_idct16x16_add
	Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
2013-10-07	Merge "Giving consistent names to IDCT 8x8 functions."	Dmitry Kovalev

2013-10-07	d153 intra prediction (32x32) ssse3 using bytes	Scott LaVarnway
	Change-Id: Ie2c0d84ff9f6294084d65f4380e1f30c09e681c9
2013-10-06	Merge changes I8a106dd6,Iec442603	Jim Bankoski
	* changes:
	  d153 intra prediction (16x16) ssse3 using bytes
	  d153 intra prediction ssse3 using bytes
2013-10-06	Giving consistent names to IDCT 8x8 functions.	Dmitry Kovalev
	Renames:
	  vp9_short_idct8x8_add -> vp9_idct8x8_64_add
	  vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add
	  vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
	  vp9_idct_add_8x8 -> vp9_idct8x8_add
	Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
2013-10-04	Giving consistent names to IDCT/IWHT functions.	Dmitry Kovalev
	The idea is to have the following names for each transform size:
	  vp9_idct4x4_add
	  vp9_idct4x4_1_add
	  vp9_idct4x4_10_add
	  vp9_idct4x4_16_add
	  vp9_idct8x8_add
	  vp9_idct8x8_1_add
	  vp9_idct8x8_10_add
	  vp9_idct8x8_64_add
	  etc. for 16x16, 32x32
	The actual list of renames in this patch:
	  vp9_idct_add_lossless -> vp9_iwht4x4_add
	  vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
	  vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
	  vp9_idct_add -> vp9_idct4x4_add
	  vp9_short_idct4x4_add -> vp9_idct4x4_16_add
	  vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
	Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
2013-10-03	Merge "Rewrite HORIZx4 and HORIZx8 in subpixel filter functions"	Yunqing Wang

2013-10-03	Rewrite HORIZx4 and HORIZx8 in subpixel filter functions	Yunqing Wang
	In subpixel filters, prefetched source data, unrolled loops, and interleaved instructions. In HORIZx4, integrated the idea in Scott's CL (commit: d22a504d11a15dc3eab666859db0046b5a7d75c5), which was suggested by Erik/Tamar from Intel. Further tweaking was done to combine row 0, 2, and row 1, 3 in registers to do more 2-row-in-1 operations until the last add. Test showed a ~2% decoder speedup. Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
2013-10-02	d153 intra prediction (16x16) ssse3 using bytes	Scott LaVarnway
	Change-Id: I8a106dd61b0a2520fae792d87d6348e662649b2d
2013-10-01	Adding SSE2 optimized vp9_short_idct32x32_1_add function.	Dmitry Kovalev
	Change-Id: I4b1c6bb9ff615f5872b96ed07dbf0f5e18e63643
2013-10-01	Merge "Modify HORIZx16 macro in subpixel filter functions"	Yunqing Wang

2013-10-01	Modify HORIZx16 macro in subpixel filter functions	Yunqing Wang
	Interleaved the instructions, reduced register dependency, and prefetched the source data. This improved the decoder speed by 0.6% - 2%. Change-Id: I568067aa0c629b2e58219326899c82aedf7eccca
2013-10-01	d153 intra prediction ssse3 using bytes	Scott LaVarnway
	Byte version of Ronald's d153 ssse3 optimizations for 4x4 and 8x8 (commit: fc91a2a112238a1aee568f3b840585de4e928fca) Change-Id: Iec4426032311483f615fd9e0dceba3ee85ddebd7
2013-09-29	fixed cpp lint issue in vp9_postproc_x86	Jim Bankoski
	Change-Id: I2b2af1dd9f5c29c05e28a4fd51fa58ccc4071477
2013-09-29	nolintify intrinsic idct file	Jim Bankoski
	Change-Id: Id2cc5c829399a2afdf7a8a82615a4e272c814986
2013-09-27	Renaming vp9_short_idct10_8x8_add to vp9_short_idct8x8_10_add.	Dmitry Kovalev
	Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1. Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
2013-09-27	Merge "Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10."	Dmitry Kovalev

2013-09-26	Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10.	Dmitry Kovalev
	Making function name consistent with vp9_short_idct16x16 and vp9_short_idct16x16_1. Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
2013-09-25	d63 intra prediction ssse3 using bytes	Scott LaVarnway
	Byte version of Ronald's d63 ssse3 optimizations (commit: c5a1c8cf3541cf3665fee981b36d22c9fbd4191e) Change-Id: Ifd3e6d454a2246085f23eabb38518a930321e807
2013-09-18	Fix x86inc.asm to build PIC code correctly	Yunqing Wang
	The current x86inc.asm didn't handle 32-bit PIC builds properly: TEXTRELs were seen in the built library. The PIC macros from libvpx's x86_abi_support.asm were used to fix this problem, and the assembly code was modified to use the macros. Note: we need this fix for building the decoder. Functions in the encoder will be fixed later. Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838