libvpx.git

Age	Commit message	Author
2014-02-14	Replace vqshrun by vqmovun if shift #0 bit	James Yu
	Change-Id: Ifabb8c7ec0c327fea9d6739cab10addb060ff435 Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-14	Merge "Remove redundant arm neon instructions."	Johann

2014-02-14	Merge "minor spelling cleanup in comments"	Yaowu Xu

2014-02-12	Fix neon wide loopfilter for filter8 only branch	Frank Galligan
	The current code removed the check to only perform the filter8. Change-Id: Ie54e19a77745042a5660eab986d9ef1c42e82410
2014-02-12	minor spelling cleanup in comments	Andrew Russell
	Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-11	Remove redundant arm neon instructions.	James Yu
	Change-Id: I1fabad59747eb5f68c64275a36c3a1d94daf32a3 Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-05	arm: Consistently use braces around doubleword arguments to vld	Martin Storsjo
	This isn't strictly necessary, but makes the file more consistent with the other arm assembly source files. Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
2014-02-05	arm: Use {} around quadword arguments to vld	Martin Storsjo
	This fixes building for iOS. Change-Id: Ice082648c02a3faf93891f7ddc122875e2bdc9cb
2014-01-31	Removing "_short" suffix from arm transform file names.	Dmitry Kovalev
	Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-27	Add vp9_tm_predictor_32x32 neon implementation	hkuang
	which is 7.8 times faster than C. Change-Id: I858ef4ec09202a07d445da8db702783d6d9d7321
2014-01-27	Fix the vp9_tm_predictor_8x8_neon.	hkuang
	Change-Id: I832cf83871044bfee7b7e57dbd31bae05cbd53e9
2014-01-24	Merge "Optimize vp9_tm_predictor_8x8_neon function"	Frank Galligan

2014-01-24	Optimize vp9_tm_predictor_8x8_neon function	Frank Galligan
	Change-Id: Ia12aae491202098ff66366145aa0c3da38dc97e5
2014-01-24	Add vp9_tm_predictor_16x16 neon implementation	hkuang
	which is 3.5 times faster than C. Change-Id: I24439ba7a2971829c11620f34848facf2c916678
2014-01-22	Add tm_predictor_8x8 neon implementation.	hkuang
	Change-Id: I76c2720546b737cb63018a8ab6a3ff62a291786d
2014-01-16	Merge "Add vp9_tm_predictor_4x4 neon implementation"	hkuang

2014-01-15	Add vp9_tm_predictor_4x4 neon implementation	hkuang
	Change-Id: I10c423bde7ea5a3bac9f14f35c73b6bc31c8f3e3
2014-01-08	Merge "Add initial intra frame neon optimization. 1~2% gain."	hkuang

2014-01-08	Add initial intra frame neon optimization. 1~2% gain.	hkuang
	More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2013-12-17	rename loop filter functions	Jim Bankoski
	This renames all the loop filter functions so that they no longer refer to mb Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
2013-11-26	Fix 16 wide neon horz loopfilter.	Frank Galligan
	Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors. Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
2013-11-22	Do vertical loopfiltering in parallel	Yunqing Wang
	This patch followed "Add filter_selectively_vert_row2 to enable parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. For other optimizations (neon and dspr2), current 16-pixel functions were done by calling 8-pixel functions twice, and real 16-pixel functions could be added later. Decoder speedup: tulip clip: 2% speed gain; old_town_cross: 1.2% speed gain; bus: 2% speed gain. Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
2013-11-21	Revert "Add 16 wide neon horz loopfilter."	Frank Galligan
	The change caused mismatches with some test vectors on neon. Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/ Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
2013-11-21	Add 16 wide neon horz loopfilter.	Frank Galligan
	Add support to do 16 pixel horizontal filtering in Neon. Nexus devices saw about 0.5% decode speed increase. Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
2013-11-15	Do horizontal loopfiltering in parallel	Yunqing Wang
	This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-12	Use lowercase 'b' to branch	Johann
	iOS doesn't recognize B: bad instruction `B idct32_pass_loop' Change-Id: I3cf6aede4639f1d9efa97f7962fa287ba6feaaef
2013-11-11	Fix a bug in the assembly code.	hkuang
	Change-Id: Ic416e3f8a11e82ee298e6f709b2119a9ddf1e2f8
2013-11-05	Add back vp9_short_idct32x32_1_add_neon which is deleted in	hkuang
	cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3. Change-Id: I034848cf05031618818f7df2e7f9c35102686948
2013-10-11	Making input pointer of any inverse transform constant.	Dmitry Kovalev
	Also renaming dest_stride to stride in some places. Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
2013-10-11	Consistent names for inverse hybrid transforms (1 of 2).	Dmitry Kovalev
	Renames: vp9_short_iht4x4_add -> vp9_iht4x4_16_add; vp9_short_iht8x8_add -> vp9_iht8x8_64_add; vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add. Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
2013-10-10	Giving consistent names to IDCT 32x32 functions.	Dmitry Kovalev
	Renames: vp9_short_idct32x32_add -> vp9_idct32x32_1024_add; vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add; vp9_idct_add_32x32 -> vp9_idct32x32_add. Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
2013-10-07	Giving consistent names to IDCT 16x16 functions.	Dmitry Kovalev
	Renames: vp9_short_idct16x16_add -> vp9_idct16x16_256_add; vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add; vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add; vp9_idct_add_16x16 -> vp9_idct16x16_add. Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
2013-10-06	Giving consistent names to IDCT 8x8 functions.	Dmitry Kovalev
	Renames: vp9_short_idct8x8_add -> vp9_idct8x8_64_add; vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add; vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add; vp9_idct_add_8x8 -> vp9_idct8x8_add. Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
2013-10-04	Giving consistent names to IDCT/IWHT functions.	Dmitry Kovalev
	The idea is to have the following names for each transform size: vp9_idct4x4_add, vp9_idct4x4_1_add, vp9_idct4x4_10_add, vp9_idct4x4_16_add, vp9_idct8x8_add, vp9_idct8x8_1_add, vp9_idct8x8_10_add, vp9_idct8x8_64_add, etc. for 16x16, 32x32. The actual list of renames in this patch: vp9_idct_add_lossless -> vp9_iwht4x4_add; vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add; vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add; vp9_idct_add -> vp9_idct4x4_add; vp9_short_idct4x4_add -> vp9_idct4x4_16_add; vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add. Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
2013-09-27	Renaming vp9_short_idct10_8x8_add to vp9_short_idct8x8_10_add.	Dmitry Kovalev
	Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1. Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
2013-09-27	Properly save neon registers.	Christian Duvivier
	Replace the current code, which corrupts the stack, with a duplicate of the vp8 code to save and restore neon registers. Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
2013-09-27	Merge "Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10."	Dmitry Kovalev

2013-09-26	Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10.	Dmitry Kovalev
	Making function name consistent with vp9_short_idct16x16 and vp9_short_idct16x16_1. Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
2013-09-25	Fix a bunch of TODO from vp9_short_idct32x32_add_neon.	Christian Duvivier
	- full ASM version, no more C gateway file. - integrate combine-add with last step of 2nd pass. - remove a few push/pop pairs. - some instruction reordering to hide latency. Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
2013-09-20	Use lowercase instruction in assembly	Johann
	The iOS compiler does not recognize BLE: bad instruction `BLE idct32_transpose_pair_loop' Change-Id: I7426694c66bc31caf939a2d5000968da1222c15b
2013-09-16	Speed up iht8x8 by rearranging instructions.	hkuang
	Speed improves from 282% to 302% faster based on assembly-perf. Change-Id: I08c5c1a542d43361611198f750b725e4303d19e2
2013-09-12	Merge "Add neon optimize iht8x8 which is 282% faster than C."	hkuang

2013-09-12	Add neon optimize iht8x8 which is 282% faster than C.	hkuang
	Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
2013-09-11	First draft of vp9_short_idct32x32_add_neon.	Christian Duvivier
	Lots of TODOs, which will be taken care of in upcoming changes. As is, about 6x faster than the C version. Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
2013-09-09	Speed up idct16x16 by rearrange instructions.	hkuang
	Speed improves from 376% to 400% faster based on assembly-perf. Change-Id: If0b2eccc39d5793dc101ce9feb7fcadf88396ea2
2013-09-04	Speed up idct8x8 by rearrange instructions.	hkuang
	Speed improves from 264% ~ 270% to 280% ~ 300% based on assembly-perf. Change-Id: I3e2cc818ec14b432204ff43732f39b6438db685d
2013-09-04	Add neon optimize vp9_short_iht4x4_add.	hkuang
	Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
2013-08-27	Add neon optimize vp9_short_idct16x16_1_add.	hkuang
	Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
2013-08-26	Add neon optimize vp9_short_idct8x8_1_add.	hkuang
	Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
2013-08-26	Add neon optimize vp9_short_idct4x4_1_add.	hkuang
	Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5