libvpx.git

Age	Commit message	Author
2011-03-11	Merge "Align SAD output array to be 16-byte aligned"	Yunqing Wang

2011-03-11	vp8cx- alternate ssim function with optimizations	Jim Bankoski
	Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11	Align SAD output array to be 16-byte aligned	Yunqing Wang
	Use aligned store. Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-09	Add vp8_sub_pixel_variance16x8_ssse3 function	Yunqing Wang
	Added SSSE3 function Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09	Remove unused functions	Yunqing Wang
	Removed some unused functions Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-08	Improve SSE2 half-pixel filter functions	Yunqing Wang
	Rewrote these functions to process 16 pixels at once instead of 8. Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08	Add zero offset checking in SSE2 sub-pixel filter function	Yunqing Wang
	Skip filter at zero offset. Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08	Write SSSE3 sub-pixel filter function	Yunqing Wang
	1. Process 16 pixels at one time instead of 8. 2. Add check for both xoffset=0 and yoffset=0, which happens during motion search. This change gave the encoder a 1%~3% performance gain. Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
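The zero-offset shortcut mentioned in this commit can be illustrated with a minimal scalar sketch. This is not the SSSE3 assembly: the function name `bilinear_row`, the eighth-pixel offset range, and the tap weights (summing to 128, with rounding) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-D bilinear filter. offset is in eighths of a pixel (0..7).
 * When offset != 0, src must have n + 1 readable samples. */
static void bilinear_row(const uint8_t *src, uint8_t *dst, int n, int offset) {
    assert(n >= 0 && offset >= 0 && offset < 8);

    if (offset == 0) {
        /* Zero offset: the output equals the input, so skip the filter. */
        memcpy(dst, src, (size_t)n);
        return;
    }

    /* Assumed tap weights: they sum to 128, hence the shift by 7. */
    const int w1 = offset * 16;
    const int w0 = 128 - w1;
    for (int i = 0; i < n; i++) {
        dst[i] = (uint8_t)((src[i] * w0 + src[i + 1] * w1 + 64) >> 7);
    }
}
```

With both offsets at zero, a two-pass (horizontal then vertical) filter of this shape degenerates to a plain copy, which is why checking for that case up front can save work during motion search.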
2011-02-28	Merge "Add prefetch before variance calculation"	Yunqing Wang

2011-02-28	Add prefetch before variance calculation	Yunqing Wang
	This improved encoding performance by 0.5% (good, speed 1) to 1.5% (good, speed 5). Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-22	Remove temporal alt ref from realtime only build	Attila Nagy
	It is not used in realtime mode. Reduces memory footprint. Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-18	Revert "use unaligned load"	Johann
	This reverts commit f50f2fd2a73f2c5ee3f10ad077e780398df17cd7. Change Ib7506e3e aligns the buffer Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-17	Merge "Fix relative include paths"	John Koleszar

2011-02-14	Improve vp8_sad16x16_sse3 function	Yunqing Wang
	In real-time mode, vp8_sad16x16 function is called heavily in motion search part. Improvement of this function gives 1.2% encoding performance gain (real-time mode, tulip clip). Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
2011-02-10	Fix relative include paths	John Koleszar
	Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-01-25	Merge "update sse2 regular quantizer"	Johann

2011-01-21	Modify sub-pixel filters to eliminate unnecessary calculations	Yunqing Wang
	In sub-pixel calculation, xoffset and yoffset mostly take some specific values. Modified sub-pixel filter functions according to these possible values to improve performance. Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-18	Fix encoder real-time only configuration.	Attila Nagy
	Remove allocation/deallocation of stats storage. Remove full search functions in machine specific encoder inits. Remove last pass validation in validate_config. Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-14	update sse2 regular quantizer	Johann
	About 5% gain on 32bit. Disabled for 64bit. Unset executable bit on ssse3 version (cosmetic). Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-11	use unaligned load	Johann
	source buffer is not guaranteed to be aligned for odd size buffers Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-06	x86 sse2 temporal_filter_apply	Johann
	count can be reduced to short because the max number of filtered frames is set to 15. The max value for any frame is 32 (modifier = 16, filter_weight = 2). 15*32 = 480, which requires 9 bits. This function goes from about 7000 us / 1000 iterations for the C code to < 275 us / 1000 iterations for sse2 for block_size = 16, and from about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8. Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
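The range argument in this commit message (15 frames, at most 32 per frame) can be checked with a small sketch. The helper names `bits_needed` and `max_filter_count` are hypothetical and do not come from libvpx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent v (0 needs 0 bits). */
static unsigned bits_needed(unsigned v) {
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Worst-case accumulated count: every filtered frame contributes
 * modifier * filter_weight to the same position. */
static unsigned max_filter_count(unsigned frames, unsigned modifier,
                                 unsigned filter_weight) {
    assert(frames > 0);
    return frames * modifier * filter_weight;
}
```

For the values quoted above, `max_filter_count(15, 16, 2)` is 480 and `bits_needed(480)` is 9, comfortably inside a 16-bit short.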
2010-12-28	Use the fast quantizer for inter mode selection	Scott LaVarnway
	Use the fast quantizer for inter mode selection and the regular quantizer for the rest of the encode for good quality, speed 1. Both performance and quality were improved. The quality gains will make up for the quality loss mentioned in I9dc089007ca08129fb6c11fe7692777ebb8647b0. Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-13	remove unused temporal preproc code	John Koleszar
	This code is unused, as the current preproc implementation uses the same spatial filter that postproc uses. Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-09	vp8 fast quantizer sse2 optimizations for eob.	Fritz Koenig
	Changed the end of block computation to use pmaxw. Removed additional pushing and popping of registers that was not needed. Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-11-15	Remove stack shadowing for x86-x64 for SAD functions.	Fritz Koenig
	x86-64 passes arguments in registers. There is no need to push them to the stack before using them. This fixes 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851 where ebx was not getting preserved on x86. Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
2010-11-11	Revert "Remove stack shadowing for x86-64"	Fritz Koenig
	This reverts commit 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851. Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
2010-11-10	Merge "Remove stack shadowing for x86-64"	Fritz Koenig

2010-11-10	FDCT optimizations.	Fritz Koenig
	Fixed up the fdct for mmx and 8x4 sse2 to match the most recent changes. Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
2010-11-01	SSSE3 version of fast quantizer	Scott LaVarnway
	(test clip: tulip) For good quality mode with speed=1, this gave the encoder a small (2 - 3%) performance boost. Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
2010-10-28	Save XMM registers in asm functions	Yunqing Wang
	XMM6/7 are used in these functions, and need to be saved. Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
2010-10-28	Fix full-search SAD function crash in Visual Studio	Yunqing Wang
	Unlike GCC, the Visual Studio compiler doesn't allocate the SAD output array 16-byte aligned, which causes a crash in Visual Studio. Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
2010-10-27	Full search SAD function optimization in SSE4.1	Yunqing Wang
	Use mpsadbw, and calculate 8 SADs at once. Function list: vp8_sad16x16x8_sse4, vp8_sad16x8x8_sse4, vp8_sad8x16x8_sse4, vp8_sad8x8x8_sse4, vp8_sad4x4x8_sse4. (test clip: tulip) For best quality mode, this gave encoder a 5% performance boost. For good quality mode with speed=1, this gave encoder a 3% performance boost. Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
2010-10-27	Fix half-pixel variance RTCD functions	John Koleszar
	This patch fixes the system dependent entries for the half-pixel variance functions in both the RTCD and non-RTCD cases:
	- The generic C versions of these functions are now correct. Before all three cases called the hv code.
	- Wire up the ARM functions in RTCD mode
	- Created stubs for x86 to call the optimized subpixel functions with the correct parameters, rather than falling back to C code.
	Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
2010-10-25	add missing GET_GOT/RESTORE_GOT pairs	John Koleszar
	These functions made global references but did not set up the GOT, causing compilation failures in PIC mode. Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
2010-10-21	Convert [4][4] matrices to [16] arrays.	Timothy B. Terriberry
	Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
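The indexing equivalence behind this conversion can be sketched as follows. The names `flat_index` and `sum_row` are hypothetical illustrations, not libvpx functions; the sketch only shows how a (row, column) pair of a 4x4 matrix maps onto a single [16] array.

```c
#include <assert.h>

/* Hypothetical helper: position of element (row, col) of a 4x4 matrix
 * when the matrix is stored as one contiguous 16-element array. */
static int flat_index(int row, int col) {
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return row * 4 + col;
}

/* Sum one row of a flat [16] array. Every access stays inside the declared
 * bounds of the array, so a static analyzer has nothing to flag. */
static int sum_row(const short coeffs[16], int row) {
    int total = 0;
    for (int col = 0; col < 4; col++) {
        total += coeffs[flat_index(row, col)];
    }
    return total;
}
```

Code that walks all 16 entries in order can then use a single index from 0 to 15, without the extra address and indexing operators a [4][4] declaration would require.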
2010-10-21	Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm	Yunqing Wang
	Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
2010-10-21	Remove stack shadowing for x86-64	Fritz Koenig
	x86-64 passes most arguments in registers. There is no need to push them to the stack before using them. Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
2010-10-21	Rewrite vp8_short_walsh4x4_sse2()	Yunqing Wang
	This rewriting reflects changes made in commit "Improve the accuracy of forward walsh-hadamard transform". Since this function is not called much, only a small encoder performance gain (~0.5% ) is seen. Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
2010-10-18	Add SSE2 subtract functions	Yunqing Wang
	Instead of doing 8-bit data unpack and 16-bit subtraction, use psubb to do 16 8-bit subtractions and pcmpgtb to preserve the sign information. This does not bring a noticeable gain since these functions are not called frequently. Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
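A scalar model of one lane may help show why a byte-wise subtract plus a compare is enough to recover a signed 16-bit difference. This is an illustrative sketch, not the SSE2 assembly from the commit: the function names are hypothetical, and the compare is written as a plain unsigned comparison, whereas how the actual code arranges its operands for pcmpgtb is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane: the wrapping 8-bit subtract supplies the low
 * byte, and a compare supplies the sign-extension byte (0xFF or 0x00). */
static int16_t sub_u8_to_s16(uint8_t src, uint8_t pred) {
    uint8_t low = (uint8_t)(src - pred);          /* models psubb */
    uint8_t high = (src < pred) ? 0xFF : 0x00;    /* models the compare mask */
    uint16_t bits = (uint16_t)(((unsigned)high << 8) | low);
    /* Reinterpret the 16 bits as a signed two's complement value. */
    return (int16_t)(bits >= 0x8000u ? (int)bits - 0x10000 : (int)bits);
}

/* Apply the lane model to a whole row of pixels. */
static void subtract_row(const uint8_t *src, const uint8_t *pred,
                         int16_t *diff, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        diff[i] = sub_u8_to_s16(src[i], pred[i]);
    }
}
```

Interleaving the low bytes with the mask bytes, as an unpack instruction would, yields the same 16-bit results as widening both inputs first and subtracting afterwards.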
2010-10-13	Fix compiler warning about vp8_fast_quantize_b_impl_ssse2.	Fritz Koenig
	Typo had function defined as _ssse2 and prototyped as _sse2. Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
2010-10-13	Correct QWORD usage in assembly files	Fritz Koenig
	QWORD was being undefined because it was being used incorrectly. Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
2010-10-12	Merge "Add const qualifiers to variance/SAD functions."	John Koleszar

2010-10-12	Add const qualifiers to variance/SAD functions.	Timothy B. Terriberry
	These functions should never change their input, and there's no reason not to declare that. This allows them to be passed static const data. Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
2010-10-11	Merge "Added vp8_fast_quantize_b_sse2"	Scott LaVarnway

2010-10-07	Remove unused file in encoder	Yunqing Wang
	Remove vp8/encoder/x86/csystemdependent.c Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
2010-10-07	Added vp8_fast_quantize_b_sse2	Scott LaVarnway
	Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into quantize_sse2.asm and renamed. Updated the assembly code to match the C version. Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
2010-10-04	nasm: address labels 'rel label' vice 'wrt rip'	Jan Kratochvil
	nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04	nasm: match instruction length (movd/movq) to parameters	Jan Kratochvil
	nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-09	Use WebM in copyright notice for consistency	John Koleszar
	Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-08-02	nasm: end labels with colon (':')	Jan Kratochvil
	Labels should end with a colon (':'), nasm requires it. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f