libvpx.git

Age	Commit message	Author
2011-02-14	Improve vp8_sad16x16_sse3 function	Yunqing Wang
	In real-time mode, the vp8_sad16x16 function is called heavily in the motion search. Improving this function gives a 1.2% encoding performance gain (real-time mode, tulip clip). Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
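For context, a 16x16 sum of absolute differences (SAD) can be sketched in plain C as below. This is only a reference illustration of what such a function computes, not the SSE3 code changed by the commit; the name `sad16x16_ref` and its signature are assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical reference SAD over a 16x16 block (not libvpx's actual code).
 * src/ref point at the top-left pixel; strides are in bytes per row. */
unsigned int sad16x16_ref(const unsigned char *src, int src_stride,
                          const unsigned char *ref, int ref_stride)
{
    unsigned int sad = 0;

    for (int r = 0; r < 16; r++) {
        for (int c = 0; c < 16; c++)
            sad += (unsigned int)abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}
```

A motion search would call a function like this once per candidate position, which is why its cost dominates in real-time mode.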
2011-01-25	Merge "update sse2 regular quantizer"	Johann

2011-01-21	Modify sub-pixel filters to eliminate unnecessary calculations	Yunqing Wang
	In sub-pixel calculation, xoffset and yoffset mostly take some specific values. Modified sub-pixel filter functions according to these possible values to improve performance. Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-18	Fix encoder real-time only configuration.	Attila Nagy
	Remove allocation/deallocation of stats storage. Remove full search functions in machine specific encoder inits. Remove last pass validation in validate_config. Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-14	update sse2 regular quantizer	Johann
	About 5% gain on 32-bit; disabled for 64-bit. Unset executable bit on ssse3 version (cosmetic). Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-11	use unaligned load	Johann
	source buffer is not guaranteed to be aligned for odd size buffers Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-06	x86 sse2 temporal_filter_apply	Johann
	count can be reduced to short because the max number of filtered frames is set to 15. The max value for any frame is 32 (modifier = 16, filter_weight = 2). 15*32 = 480, which requires 9 bits. This function goes from about 7000 us / 1000 iterations for the C code to < 275 us / 1000 iterations for sse2 for block_size = 16, and from about 1800 us / 1000 iterations to < 100 us / 1000 iterations for block_size = 8. Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
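The range argument in this message can be checked with a few lines of C. The constants come from the commit message; the enum and function names are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Values quoted in the commit message; the names are illustrative only. */
enum {
    MAX_FILTERED_FRAMES = 15, /* max number of filtered frames */
    MAX_MODIFIER = 16,        /* largest per-pixel modifier */
    MAX_FILTER_WEIGHT = 2     /* largest filter weight */
};

/* Worst-case count accumulated for one pixel across all frames. */
unsigned int max_accumulated_count(void)
{
    return MAX_FILTERED_FRAMES * MAX_MODIFIER * MAX_FILTER_WEIGHT;
}

/* Number of bits needed to represent v as an unsigned value. */
unsigned int bits_needed(unsigned int v)
{
    unsigned int n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Nonzero if the worst case fits in a signed 16-bit value. */
int count_fits_in_short(void)
{
    return max_accumulated_count() <= (unsigned int)INT16_MAX;
}
```

Since 256 <= 480 < 512, nine bits suffice, which leaves ample headroom in a 16-bit lane.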
2010-12-28	Use the fast quantizer for inter mode selection	Scott LaVarnway
	Use the fast quantizer for inter mode selection and the regular quantizer for the rest of the encode for good quality, speed 1. Both performance and quality were improved. The quality gains will make up for the quality loss mentioned in I9dc089007ca08129fb6c11fe7692777ebb8647b0. Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-13	remove unused temporal preproc code	John Koleszar
	This code is unused, as the current preproc implementation uses the same spatial filter that postproc uses. Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-09	vp8 fast quantizer sse2 optimizations for eob.	Fritz Koenig
	Changed the end of block computation to use pmaxw. Removed additional pushing and popping of registers that was not needed. Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-11-15	Remove stack shadowing for x86-x64 for SAD functions.	Fritz Koenig
	x86-64 passes arguments in registers. There is no need to push them to the stack before using them. This fixes 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851 where ebx was not getting preserved on x86. Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
2010-11-11	Revert "Remove stack shadowing for x86-64"	Fritz Koenig
	This reverts commit 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851. Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
2010-11-10	Merge "Remove stack shadowing for x86-64"	Fritz Koenig

2010-11-10	FDCT optimizations.	Fritz Koenig
	Fixed up the fdct for mmx and 8x4 sse2 to match the most recent changes. Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
2010-11-01	SSSE3 version of fast quantizer	Scott LaVarnway
	(test clip: tulip) For good quality mode with speed=1, this gave the encoder a small (2 - 3%) performance boost. Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
2010-10-28	Save XMM registers in asm functions	Yunqing Wang
	XMM6/7 are used in these functions, and need to be saved. Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
2010-10-28	Fix full-search SAD function crash in Visual Studio	Yunqing Wang
	Unlike GCC, the Visual Studio compiler doesn't allocate the SAD output array 16-byte aligned, which causes a crash in Visual Studio. Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
2010-10-27	Full search SAD function optimization in SSE4.1	Yunqing Wang
	Use mpsadbw, and calculate 8 SADs at once. Function list: vp8_sad16x16x8_sse4, vp8_sad16x8x8_sse4, vp8_sad8x16x8_sse4, vp8_sad8x8x8_sse4, vp8_sad4x4x8_sse4. (test clip: tulip) For best quality mode, this gave the encoder a 5% performance boost. For good quality mode with speed=1, this gave the encoder a 3% performance boost. Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
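A scalar sketch of what an "x8" SAD function produces may help here. It assumes the eight candidates are consecutive horizontal offsets in the reference frame, which matches how mpsadbw slides across a row; the names `block_sad16x16` and `sad16x16x8_ref` are hypothetical and this is not the SSE4.1 code itself.

```c
#include <stdlib.h>

/* Hypothetical helper: plain 16x16 SAD between src and ref. */
unsigned int block_sad16x16(const unsigned char *src, int src_stride,
                            const unsigned char *ref, int ref_stride)
{
    unsigned int sad = 0;

    for (int r = 0; r < 16; r++) {
        for (int c = 0; c < 16; c++)
            sad += (unsigned int)abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}

/* Fill sad_array[i] with the SAD at horizontal offset i, for i = 0..7.
 * Assumption: candidates are consecutive positions along one row. */
void sad16x16x8_ref(const unsigned char *src, int src_stride,
                    const unsigned char *ref, int ref_stride,
                    unsigned int sad_array[8])
{
    for (int i = 0; i < 8; i++)
        sad_array[i] = block_sad16x16(src, src_stride, ref + i, ref_stride);
}
```

The scalar version repeats the whole block comparison eight times; the appeal of a SIMD version is producing all eight results in a single pass over the data.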
2010-10-27	Fix half-pixel variance RTCD functions	John Koleszar
	This patch fixes the system dependent entries for the half-pixel variance functions in both the RTCD and non-RTCD cases: - The generic C versions of these functions are now correct. Before, all three cases called the hv code. - Wire up the ARM functions in RTCD mode. - Created stubs for x86 to call the optimized subpixel functions with the correct parameters, rather than falling back to C code. Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
2010-10-25	add missing GET_GOT/RESTORE_GOT pairs	John Koleszar
	These functions made global references but did not set up the GOT, causing compilation failures in PIC mode. Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
2010-10-21	Convert [4][4] matrices to [16] arrays.	Timothy B. Terriberry
	Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
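The indexing relationship behind this change can be sketched as follows. The function names are hypothetical; the point is only that element (row, col) of a 4x4 block lives at flat index row * 4 + col, so code that walks all 16 entries reads more naturally against a [16] array.

```c
/* Map a (row, col) position in a 4x4 block to its flat index. */
int flat_index(int row, int col)
{
    return row * 4 + col;
}

/* Sum all 16 coefficients, addressing the block as one flat array.
 * With a [4][4] declaration, the same loop would index past the static
 * bounds of the first row, which is what the analysis tool reported. */
int sum_coeffs(const short coeffs[16])
{
    int sum = 0;

    for (int i = 0; i < 16; i++)
        sum += coeffs[i];
    return sum;
}
```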
2010-10-21	Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm	Yunqing Wang
	Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
2010-10-21	Remove stack shadowing for x86-64	Fritz Koenig
	x86-64 passes most arguments in registers. There is no need to push them to the stack before using them. Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
2010-10-21	Rewrite vp8_short_walsh4x4_sse2()	Yunqing Wang
	This rewrite reflects changes made in commit "Improve the accuracy of forward walsh-hadamard transform". Since this function is not called much, only a small encoder performance gain (~0.5%) is seen. Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
2010-10-18	Add SSE2 subtract functions	Yunqing Wang
	Instead of doing 8-bit data unpack and 16-bit subtraction, use psubb to do 16 8-bit subtractions and pcmpgtb to preserve the sign information. This does not bring a noticeable gain since these functions are not called frequently. Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
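The idea of recovering a 16-bit difference from a wrapped 8-bit subtraction plus a comparison mask can be shown for a single byte lane in scalar C. This is a sketch under stated assumptions, not the assembly from the commit: `widen_diff` is a hypothetical name, and the comparison here is done directly on unsigned values, whereas pcmpgtb compares signed bytes, so real SIMD code has to account for that.

```c
#include <stdint.h>

/* Scalar stand-in for one byte lane: returns src - pred as a 16-bit value
 * built from a wrapped 8-bit difference and an all-ones/all-zeros mask. */
int16_t widen_diff(uint8_t src, uint8_t pred)
{
    /* Like psubb: the difference wraps modulo 256. */
    uint8_t low = (uint8_t)(src - pred);
    /* Like a compare mask: 0xFF when the true difference is negative. */
    uint8_t high = (pred > src) ? 0xFF : 0x00;
    /* Interleave mask and difference into one 16-bit pattern. */
    unsigned int word = ((unsigned int)high << 8) | low;

    /* Reinterpret the 16-bit pattern as a two's complement value. */
    return (int16_t)((int)word - (word >= 0x8000u ? 0x10000 : 0));
}
```

The mask serves as the high byte of each result, so the sign extension comes from the comparison instead of from widening both inputs before subtracting.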
2010-10-13	Fix compiler warning about vp8_fast_quantize_b_impl_ssse2.	Fritz Koenig
	Typo had function defined as _ssse2 and prototyped as _sse2. Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
2010-10-13	Correct QWORD usage in assembly files	Fritz Koenig
	QWORD was being undefined because it was being used incorrectly. Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
2010-10-12	Merge "Add const qualifiers to variance/SAD functions."	John Koleszar

2010-10-12	Add const qualifiers to variance/SAD functions.	Timothy B. Terriberry
	These functions should never change their input, and there's no reason not to declare that. This allows them to be passed static const data. Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
2010-10-11	Merge "Added vp8_fast_quantize_b_sse2"	Scott LaVarnway

2010-10-07	Remove unused file in encoder	Yunqing Wang
	Remove vp8/encoder/x86/csystemdependent.c Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
2010-10-07	Added vp8_fast_quantize_b_sse2	Scott LaVarnway
	Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into quantize_sse2.asm and renamed. Updated the assembly code to match the C version. Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
2010-10-04	nasm: address labels 'rel label' vice 'wrt rip'	Jan Kratochvil
	nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04	nasm: match instruction length (movd/movq) to parameters	Jan Kratochvil
	nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-09	Use WebM in copyright notice for consistency	John Koleszar
	Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-08-02	nasm: end labels with colon (':')	Jan Kratochvil
	Labels should end with a colon (':'); nasm requires it. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f
2010-08-02	nasm: use OWORD vs DQWORD	Jan Kratochvil
	nasm knows only OWORD. yasm knows both OWORD and DQWORD. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I62151390089e90df9a7667822fa594ac20b00e78
2010-07-27	x86/sse2: disable asm quantizer	Johann
	Follow-up to Change I0e51492d: "neon: disable asm quantizer". Now x86 doesn't segfault with --disable-runtime-cpu-detect and -p=2. Change-Id: I8ca127bb299198efebbcbd5a661e81788361933f
2010-07-23	Make the quantizer exact.	Timothy B. Terriberry
	This replaces the approximate division-by-multiplication in the quantizer with an exact one that costs just one add and one shift extra. The asm versions have not been updated in this patch, and thus have been disabled, since the new method requires different multipliers which are not compatible with the old method. Change-Id: I53ac887af0f969d906e464c88b1f4be69c6b1206
2010-06-28	Improve the accuracy of forward walsh-hadamard transform	Yaowu Xu
	Besides the slight improvement in round trip error, this also fixes a sign bias in the forward transform, so the round trip errors are evenly distributed between +1s and -1s. The old bias seemed to work well with the dc sign bias in the old fdct, which no longer exists in the improved fdct. Change-Id: I8635e7be16c69e69a8669eca5438550d23089cef
2010-06-24	Added first-pass sse2 version of Yaowu's new fdct.	Scott LaVarnway
	Change-Id: Ib479210067510162879c368428b92690591120b2
2010-06-24	Redo the forward 4x4 dct	Yaowu Xu
	The new fdct lowers the round trip sum squared error for a 4x4 block to ~0.12, or ~0.008/pixel. For reference, the old matrix multiply version has an average round trip error of 1.46 for a 4x4 block. Thanks to "derf" for his suggestions and references. Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
2010-06-18	vp8_block_error_xmm: remove unnecessary instructions	Jim Bankoski
	Remove a couple instructions from this function which weren't necessary for correct execution. Change-Id: Ib649674f140689f7e5c1530c35686241688a3151
2010-06-18	cosmetics: trim trailing whitespace	John Koleszar
	When the license headers were updated, they accidentally contained trailing whitespace, so unfortunately we have to touch all the files again. Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-14	sse2 version of vp8_regular_quantize_b	Scott LaVarnway
	Added sse2 version of vp8_regular_quantize_b, which improved encode performance (for the clip used) by ~10% for 32 bit builds and ~3% for 64 bit builds. Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments. Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
2010-06-11	Enable vp8_sad16x16x4d_sse3 in non-RTCD case	John Koleszar
	Typo caused C version of 16x16x4 SAD to be called when built with --disable-runtime-cpu-detect. Change-Id: I0fe6fa67280b3a5f13acb3c8ed914f039aaaf316
2010-06-04	LICENSE: update with latest text	John Koleszar
	Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-05-18	Initial WebM release	John Koleszar