libvpx.git

Age	Commit message	Author
2014-05-23	Merge "Inverse 16x16 2D-DCT SSSE3 implementation"	Jingning Han

2014-05-23	Inverse 16x16 2D-DCT SSSE3 implementation	Jingning Han
	This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
2014-05-22	Removing vp9_pragmas.h.	Dmitry Kovalev
	Change-Id: I9120a87e27e73e496932d11716937e2fad246521
2014-05-21	Renames x86_64 specific asm files	Deb Mukherjee
	Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
2014-05-12	Only build neon assembly for armv7 targets	Johann
	Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
2014-05-05	SSSE3 implementation of full inverse 8x8 2D-DCT	Jingning Han
	This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT and reconstruction. It brings the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
2014-03-05	Removing vp9_onyx.h and moving its content to the encoder.	Dmitry Kovalev
	Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2
2014-03-03	build: convert rtcd.sh to perl	James Zern
	significantly speeds up file generation. the goal of this change is to convert rtcd.sh to perl as directly as possible to allow for simple comparison. future changes can make it more perl-like. --- Linux [CREATE] vpx_scale_rtcd.h real 0m0.485s -> 0m0.022s [CREATE] vp8_rtcd.h real 0m4.619s -> 0m0.060s [CREATE] vp9_rtcd.h real 0m10.102s -> 0m0.087s Windows [CREATE] vpx_scale_rtcd.h real 0m8.360s -> 0m0.080s [CREATE] vp8_rtcd.h real 1m8.083s -> 0m0.160s [CREATE] vp9_rtcd.h real 2m6.489s -> 0m0.233s Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-02-26	Removing vp9_systemdependent.c.	Dmitry Kovalev
	Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271
2014-02-14	SSSE3 convolution optimization	levytamar82
	Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2 The optimizations include: - processing 2x8 elements in one 128-bit register instead of 8 elements in one 128-bit register. - removing unnecessary loads. This optimization gives a user-level gain of 2.4% for 480p input and 1.6% for 720p input. This optimization is done only for 64-bit. Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
2014-02-12	AVX2 Convolve Optimization	levytamar82
	Two convolve functions were optimized for AVX2: 1. vp9_filter_block1d16_h8 2. vp9_filter_block1d16_v8 vp9_filter_block1d16_v8 was optimized for AVX2 by halving the number of loop strides: two strides are processed in parallel. vp9_filter_block1d16_v8 was also optimized in the same way; in addition, some of the loads are now done outside the loop, which avoids redundant loads. This optimization gives a 43% function-level gain and a 1.3% user-level gain. It can now be compiled on Windows. Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
2014-02-11	Merge "Add get release decoder frame buffer functions."	Frank Galligan

2014-02-10	Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/"	James Zern

2014-02-10	Add get release decoder frame buffer functions.	Frank Galligan
	This CL changes libvpx to call a function when a frame buffer is needed for decode. Libvpx will call a release callback when no other frames reference the frame buffer. This CL adds a default implementation of the frame buffer callbacks. Currently only VP9 is supported. A future CL will add support for applications to supply their own frame buffer callbacks. Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
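	The get/release scheme described above can be sketched roughly as follows. This is a minimal illustration, not the libvpx API: the type frame_buffer, the callback typedefs get_fb_fn and release_fb_fn, and the helpers fb_acquire, fb_ref and fb_unref are all hypothetical names chosen for this example. It assumes a simple reference count per buffer, with the release callback invoked only when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the libvpx API. */
typedef struct {
  uint8_t *data;   /* backing memory, owned by the callbacks */
  size_t size;     /* current allocation size in bytes */
  int ref_count;   /* number of frames referencing this buffer */
} frame_buffer;

typedef int (*get_fb_fn)(void *priv, size_t min_size, frame_buffer *fb);
typedef int (*release_fb_fn)(void *priv, frame_buffer *fb);

/* Default "get" callback: make sure the buffer holds at least min_size bytes. */
static int default_get_fb(void *priv, size_t min_size, frame_buffer *fb) {
  (void)priv;
  if (fb->size < min_size) {
    uint8_t *p = realloc(fb->data, min_size);
    if (p == NULL) return -1;
    fb->data = p;
    fb->size = min_size;
  }
  return 0;
}

/* Default "release" callback: give the memory back. */
static int default_release_fb(void *priv, frame_buffer *fb) {
  (void)priv;
  free(fb->data);
  fb->data = NULL;
  fb->size = 0;
  return 0;
}

/* Decoder side: ask for a buffer when a frame needs one. */
static int fb_acquire(frame_buffer *fb, size_t min_size, get_fb_fn get,
                      void *priv) {
  assert(fb->ref_count == 0);
  if (get(priv, min_size, fb) != 0) return -1;
  fb->ref_count = 1;
  return 0;
}

/* Another frame starts referencing the same buffer. */
static void fb_ref(frame_buffer *fb) { ++fb->ref_count; }

/* Drop one reference; call release only when no frame references the buffer. */
static void fb_unref(frame_buffer *fb, release_fb_fn release, void *priv) {
  assert(fb->ref_count > 0);
  if (--fb->ref_count == 0) release(priv, fb);
}
```

	In this sketch the decoder never frees memory itself; it only calls the two callbacks, which is what would let an application substitute its own allocator later.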
2014-02-04	*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/	James Zern
	CONFIG_USE_X86INC is available to every makefile, there's no need to duplicate its value with USE_X86INC Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
2014-02-04	Optimize bilinear sub-pixel filters in ssse3	Yunqing Wang
	This patch added ssse3 optimization of bilinear sub-pixel filters. The real-time encoder was sped up by ~1%. Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
2014-02-03	Merge "Removing "_short" suffix from arm transform file names."	Dmitry Kovalev

2014-02-03	Optimize bilinear sub-pixel filters in sse2	Yunqing Wang
	Using bilinear filters could speed up the codec in real-time mode. This patch added sse2 optimizations of bilinear filters that operate on different-sized blocks. Tests showed that the real-time encoder was sped up by 3%. Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
2014-01-31	static function convert to inline or global vp9_blockd.h	Jim Bankoski
	Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe
2014-01-31	Removing "_short" suffix from arm transform file names.	Dmitry Kovalev
	Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-16	Revert "Revert "Revert "SSSE3 convolution optimization"""	Yunqing Wang
	This reverts commit f9404f240642222775a371acde8fc0721b3812df. This patch caused some ASAN error. Change-Id: If15b7e581310e19061d111c69f2931809662ed19
2014-01-13	Revert "Revert "SSSE3 convolution optimization""	Yunqing Wang
	This reverts commit b645257121da20b422dbbebf02aae0fc6dff95d4. Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
2014-01-10	Revert "SSSE3 convolution optimization"	Paul Wilkins
	This reverts commit 511d218c60b9b6c1ab9383db746815e907af0359. In current form intrinsics break borg build. Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
2014-01-09	Merge "SSSE3 convolution optimization"	Yunqing Wang

2014-01-09	SSSE3 convolution optimization	levytamar82
	Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2 The optimizations include: - processing 2x8 elements in one 128-bit register instead of 8 elements in one 128-bit register. - removing unnecessary loads. This optimization gives a user-level gain of 2.4% for 480p input and 1.6% for 720p input. This optimization is done only for 64-bit. Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
2014-01-08	Merge "Add initial intra frame neon optimization. 1~2% gain."	hkuang

2014-01-08	Add initial intra frame neon optimization. 1~2% gain.	hkuang
	More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2013-12-19	Removing vp9_findnearmv.{h, c} files.	Dmitry Kovalev
	Moving all code from those files to vp9_mvref_common.{h, c}. Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
2013-12-16	Converting vp9_treecoder.h to vp9_prob.{h, c}	Dmitry Kovalev
	Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
2013-12-05	Moving vp9_tree_probs_from_distribution() to encoder.	Dmitry Kovalev
	Writing custom coeff branch count calculation (which is much clearer) in adapt_coef_probs() function. Removing vp9_treecoder.c file. Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
2013-12-04	Removing vp9_default_coef_probs.h file.	Dmitry Kovalev
	Moving all probability tables from removed file to vp9_entropy.c. Change-Id: I12846f1da778c3016d96b82e53384d4634883430
2013-11-26	Fix 16 wide neon horz loopfilter.	Frank Galligan
	Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors. Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
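	Why the lane width matters here can be shown with a scalar model. The sketch below is not the NEON loop filter code; mul3_u8, mul3_u16 and mul3_lanes_u16 are hypothetical helpers that only mimic what happens to one vector lane when a value is multiplied by 3 with and without widening first.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 3 inside an 8-bit lane: the product wraps modulo 256. */
static uint8_t mul3_u8(uint8_t x) { return (uint8_t)(x * 3); }

/* Widen to 16 bits first: the product of any 8-bit value and 3 fits. */
static uint16_t mul3_u16(uint8_t x) { return (uint16_t)((uint16_t)x * 3); }

/* Apply the widened multiply to n lanes, as a vector instruction would. */
static void mul3_lanes_u16(const uint8_t *in, uint16_t *out, int n) {
  for (int i = 0; i < n; ++i) {
    out[i] = mul3_u16(in[i]);
    assert(out[i] == 3 * in[i]); /* never truncated in 16-bit lanes */
  }
}
```

	For an input of 100, the 8-bit version yields 44 (300 modulo 256) while the 16-bit version yields 300, which is the kind of silent truncation a too-narrow vector type produces.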
2013-11-21	Revert "Add 16 wide neon horz loopfilter."	Frank Galligan
	The change caused mismatches with some test vectors on neon. Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/ Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
2013-11-21	Merge "Add 16 wide neon horz loopfilter."	Frank Galligan

2013-11-21	Add 16 wide neon horz loopfilter.	Frank Galligan
	Add support to do 16 pixel horizontal filtering in Neon. Nexus devices saw about 0.5% decode speed increase. Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
2013-11-19	Move vp9_sadmxn.h from common to encoder	Yaowu Xu
	Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
2013-11-18	Merge "Move vp9_extend.{h,c} from common to encoder"	Yaowu Xu

2013-11-18	Move vp9_extend.{h,c} from common to encoder	Yaowu Xu
	Since they are used in the encoder only. This commit also reorders the includes for the files that include vp9_extend.h. Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
2013-11-15	Do horizontal loopfiltering in parallel	Yunqing Wang
	This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-13	Merge "mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)"	Johann

2013-11-13	mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)	Parag Salasakar
	Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
2013-11-13	mips dsp-ase r2 vp9 decoder loopfilter module optimizations (rebase)	Parag Salasakar
	Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
2013-11-07	Merge "Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3."	hkuang

2013-11-05	Add back vp9_short_idct32x32_1_add_neon which is deleted in	hkuang
	cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3. Change-Id: I034848cf05031618818f7df2e7f9c35102686948
2013-10-31	mb_lpf_horizontal_edge AVX2 optimization	Tamar Levy
	This CL contains two AVX2 optimized loop filter functions, mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16. Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-24	mips dsp-ase r2 vp9 decoder idct module optimizations (rebase)	Parag Salasakar
	Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
2013-10-11	Merge "SSE2 8-tap sub-pixel filter optimization"	Yunqing Wang

2013-10-10	SSE2 8-tap sub-pixel filter optimization	Yunqing Wang
	To ensure fast encoding/decoding on devices without ssse3 support, SSE2 optimization of sub-pixel filters was done. Test using 1080p clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps with sse2 filters, and ~15fps with c filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10	Merge "Moving all scan/iscan code into separate vp9_scan.{h, c} files."	Dmitry Kovalev

2013-10-09	mips dsp-ase r2 vp9 decoder bilinear convolve optimizations	Parag Salasakar
	Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0