libvpx.git

Age	Commit message	Author
2011-05-25	Return sse value in vp8_variance SSE2 functions	Yunqing Wang
	Minor modification. Change-Id: I09511d38fd1451d5c4106a48acdb3f766ce59cb7
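The message does not say how the value is returned. As a reminder of how a block variance relates to its sum of squared errors (SSE), here is a minimal C sketch. `block_variance` and its parameters are hypothetical names for illustration, not the SSE2 functions touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libvpx function): returns the variance term
 * SSE - sum^2 / N for a w x h block and also writes the raw SSE to *sse. */
static unsigned int block_variance(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, unsigned int *sse)
{
    int sum = 0;          /* sum of differences */
    unsigned int sq = 0;  /* sum of squared differences */

    assert(w > 0 && h > 0);

    for (int r = 0; r < h; r++) {
        for (int c = 0; c < w; c++) {
            int d = src[r * src_stride + c] - ref[r * ref_stride + c];
            sum += d;
            sq += (unsigned int)(d * d);
        }
    }

    *sse = sq;
    /* Integer division truncates, as is usual for this kind of metric. */
    return sq - (unsigned int)(((int64_t)sum * sum) / (w * h));
}
```

A caller that needs both numbers can then get them from one pass over the block instead of computing the SSE separately.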
2011-05-19	Merge "changed configure option name to reduce confusion"	John Koleszar

2011-05-10	Merge "Use diamond search to replace full search in full-pixel refining search"	Yunqing Wang

2011-05-09	Use diamond search to replace full search in full-pixel refining search	Yunqing Wang
	In NEWMV mode, full search is currently used as the refining search after the n-step search. Replacing it with an iterative diamond search of radius 1 greatly reduces the computational complexity while maintaining the same encoding quality, since the refining search is now done for every macroblock instead of only the small percentage of macroblocks refined with full search. Tests on the test set showed a 3.4% encoding speed increase with no PSNR or SSIM loss. Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
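An iterative radius-1 diamond refinement can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `mv_t`, `cost_fn`, `refine_diamond` and `toy_cost` are hypothetical names, and the cost callback stands in for whatever block-matching metric the encoder uses.

```c
#include <assert.h>

typedef struct { int row, col; } mv_t;

/* Hypothetical cost callback: lower is better. */
typedef unsigned int (*cost_fn)(mv_t mv, void *ctx);

/* Repeatedly test the four radius-1 diamond neighbours of the current best
 * position and move to the cheapest one; stop when none improves or after
 * max_iters steps. */
static mv_t refine_diamond(mv_t start, int max_iters, cost_fn cost, void *ctx)
{
    static const mv_t neighbors[4] = { {-1, 0}, {0, -1}, {0, 1}, {1, 0} };
    mv_t best = start;
    unsigned int best_cost;

    assert(cost != 0 && max_iters >= 0);
    best_cost = cost(best, ctx);

    for (int i = 0; i < max_iters; i++) {
        int best_site = -1;

        for (int j = 0; j < 4; j++) {
            mv_t cand = { best.row + neighbors[j].row,
                          best.col + neighbors[j].col };
            unsigned int c = cost(cand, ctx);
            if (c < best_cost) {
                best_cost = c;
                best_site = j;
            }
        }

        if (best_site < 0)
            break;  /* local minimum reached */

        best.row += neighbors[best_site].row;
        best.col += neighbors[best_site].col;
    }
    return best;
}

/* Toy cost for demonstration only: squared distance to a target vector. */
static unsigned int toy_cost(mv_t mv, void *ctx)
{
    const mv_t *target = ctx;
    int dr = mv.row - target->row;
    int dc = mv.col - target->col;
    return (unsigned int)(dr * dr + dc * dc);
}
```

Each iteration evaluates only four candidates, which is why this kind of refinement is cheap enough to run on every macroblock.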
2011-05-09	clean up unused variable warnings	Johann
	Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-04-29	changed configure option name to reduce confusion	Yaowu Xu
	Renamed configure option "enable-psnr" to "enable-internal-stats" to better reflect the purpose of the option and eliminate the confusion reported in http://code.google.com/p/webm/issues/detail?id=35 Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-25	Merge "keep values in registers during quantization"	Johann

2011-04-22	Fix overflow in temporal_filter_apply_sse2().	Ronald S. Bultje
	The accumulator array is an integer array, so use paddd instead of paddw to add values to it. Fixes overflows when using large --arnr-maxframes (>8) values. Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
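The difference between a 16-bit and a 32-bit add on a 32-bit accumulator can be modelled in scalar C. This is a sketch, not the SSE2 routine: the function names are hypothetical, and the per-frame contribution of 8160 assumes a weight of at most 32 applied to an 8-bit pixel (32 * 255).

```c
#include <assert.h>
#include <stdint.h>

/* Models a 16-bit lane add applied to a 32-bit accumulator: the carry out
 * of the low 16 bits is lost instead of propagating into the high half. */
static uint32_t accumulate_16bit_lanes(int frames, uint16_t per_frame)
{
    uint32_t acc = 0;
    assert(frames >= 0);
    for (int i = 0; i < frames; i++) {
        uint16_t lo = (uint16_t)(acc & 0xFFFFu);
        uint16_t hi = (uint16_t)(acc >> 16);
        lo = (uint16_t)(lo + per_frame);  /* wraps modulo 65536 */
        acc = ((uint32_t)hi << 16) | lo;
    }
    return acc;
}

/* Models a full 32-bit add: the carry propagates normally. */
static uint32_t accumulate_32bit(int frames, uint16_t per_frame)
{
    uint32_t acc = 0;
    assert(frames >= 0);
    for (int i = 0; i < frames; i++)
        acc += per_frame;
    return acc;
}
```

Under that assumption, eight frames sum to 65280 and still fit in 16 bits, while a ninth pushes the total to 73440 and the 16-bit version wraps.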
2011-04-21	keep values in registers during quantization	Johann
	add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm registers instead of proxying through the stack. and as long as we're bumping up, use some ssse3 instructions in the EOB detection (see ssse3 fast quantizer). picks up about a percent on 32bit and about two on 64bit. Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-19	modify SAVE_XMM for potential 64bit use	Johann
	the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and whether the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19	Merge "Add save/restore xmm registers in x86 assembly code"	Johann

2011-04-18	Add save/restore xmm registers in x86 assembly code	Johann
	Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67]. Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18	Merge "store quant_shift as an unsigned char"	Johann

2011-04-18	Merge "fixed an overflow in ssim calculation"	Yaowu Xu

2011-04-13	store quant_shift as an unsigned char	Johann
	in encodeframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant. only use 8 bits to store this, instead of 16. will allow saving an xmm register in an updated version of the regular quantize. Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-11	Set cpu_used range to [-16, 16] in real-time mode	Yunqing Wang
	Remove encoding speed limitation in real-time mode. Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-07	fixed an overflow in ssim calculation	Jim Bankoski
	This commit fixed an overflow in the SSIM calculation and added register save and restore so that the assembly code works on x64 platforms. It also changed the sampling points to every 4x4 instead of 8x8 and adjusted the constants in the SSIM calculation to match the scale of the previous VPXSSIM. Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
2011-04-07	use asm_offsets with vp8_fast_quantize_b_sse3	Johann Koenig
	on the same order as the sse2 fast quantize change: ~2% except for 32bit. only a slight improvement there. Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
2011-04-07	Use correct 32 bit comparisons for SAD breakout.	James Berry
	Rax updated to eax to avoid uninitialized memory usage. Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
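Why the operand width matters for such a comparison can be shown with a scalar C model. This is a sketch under assumptions, not the assembly from this commit: the function names are hypothetical, and the upper 32 bits of the 64-bit value stand in for whatever stale data sits next to a 32-bit quantity.

```c
#include <stdint.h>

/* Compares all 64 bits, so junk in the upper half influences the result. */
static int exceeds_limit_64(uint64_t reg, uint32_t limit)
{
    return reg > (uint64_t)limit;
}

/* Compares only the low 32 bits, which is where the real value lives. */
static int exceeds_limit_32(uint64_t reg, uint32_t limit)
{
    return (uint32_t)reg > limit;
}
```

With a value of 5 in the low half and arbitrary bits above it, the 64-bit comparison against a limit of 10 reports "exceeds" while the 32-bit comparison does not.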
2011-04-04	use asm_offsets with vp8_fast_quantize_b_sse2	Johann
	on the same order as the regular quantize change: ~2% Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86
2011-04-01	tweak vp8_regular_quantize_b_sse2	Johann
	rather than look up rc in the zig zag table, embed it in the macro. this also allows us to shuffle some values in the macro and keep *d in rsi. gains of about the same order as the obj_int_extract implementation: ~2% Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
2011-03-29	Merge "Fix a crash while enabling shared (--enable-shared)"	Yunqing Wang

2011-03-29	Fix a crash while enabling shared (--enable-shared)	Yunqing Wang
	Fixed a bug in SSSE3 sub-pixel filter functions. Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a
2011-03-29	use GLOBAL correctly on 32bit shared libraries	Johann
	http://code.google.com/p/webm/issues/detail?id=309 Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c
2011-03-24	use asm_offsets with vp8_regular_quantize_b_sse2	Johann
	remove helper function and avoid shadowing all the arguments to the stack on 64bit systems. when running with --good --cpu-used=0: ~2% on linux x86 and x86_64; ~2% on win32 x86 msys and visual studio; more on darwin10 x86_64; significantly more on x86_64-win64-vs9. Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
2011-03-21	Remove unused vp8_get4x4sse_cs_mmx declaration	John Koleszar
	This declaration did not match the prototype_sad() prototype, but was unused in this translation unit, so it is removed instead. Fixes issue 290. Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
2011-03-17	Increase static linkage, remove unused functions	John Koleszar
	A large number of functions were defined with external linkage, even though they were only used from within one file. This patch changes their linkage to static and removes the vp8_ prefix from their names, which should make it more obvious to the reader that the function is contained within the current translation unit. Functions that were not referenced were removed. These symbols were identified by: $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' | sort | grep '^ *1 ' Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-11	Merge "Align SAD output array to be 16-byte aligned"	Yunqing Wang

2011-03-11	vp8cx- alternate ssim function with optimizations	Jim Bankoski
	Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11	Align SAD output array to be 16-byte aligned	Yunqing Wang
	Use aligned store. Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-09	Add vp8_sub_pixel_variance16x8_ssse3 function	Yunqing Wang
	Added SSSE3 function Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09	Remove unused functions	Yunqing Wang
	Removed some unused functions Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-08	Improve SSE2 half-pixel filter functions	Yunqing Wang
	Rewrote these functions to process 16 pixels at a time instead of 8. Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08	Add zero offset checking in SSE2 sub-pixel filter function	Yunqing Wang
	Skip filter at zero offset. Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08	Write SSSE3 sub-pixel filter function	Yunqing Wang
	1. Process 16 pixels at a time instead of 8. 2. Add a check for both xoffset=0 and yoffset=0, which happens during motion search. This change gave the encoder a 1%~3% performance gain. Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
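The zero-offset shortcut can be sketched for one row of a horizontal pass. This is an illustration under assumptions, not the SSSE3 code: it assumes a 2-tap bilinear filter with eighth-pel offsets whose taps sum to 128, and `bilinear_taps` and `filter_row16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 2-tap bilinear taps for offsets 0..7; each pair sums to 128. */
static const int bilinear_taps[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80},  {32, 96}, {16, 112}
};

/* Filters 16 pixels of one row. With xoffset == 0 the filter is the
 * identity, so the row is copied and no multiplies are done. For any
 * other offset src must have 17 readable bytes. */
static void filter_row16(const uint8_t *src, uint8_t *dst, int xoffset)
{
    assert(xoffset >= 0 && xoffset < 8);

    if (xoffset == 0) {
        memcpy(dst, src, 16);
        return;
    }

    for (int i = 0; i < 16; i++) {
        int v = src[i] * bilinear_taps[xoffset][0] +
                src[i + 1] * bilinear_taps[xoffset][1];
        dst[i] = (uint8_t)((v + 64) >> 7);  /* round, then divide by 128 */
    }
}
```

The same test applies to the vertical pass with yoffset; when both offsets are zero the whole filter stage reduces to a copy.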
2011-02-28	Merge "Add prefetch before variance calculation"	Yunqing Wang

2011-02-28	Add prefetch before variance calculation	Yunqing Wang
	This improved encoding performance by 0.5% (good, speed 1) to 1.5% (good, speed 5). Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-22	Remove temporal alt ref from realtime only build	Attila Nagy
	It is not used in realtime mode. Reduces memory footprint. Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-18	Revert "use unaligned load"	Johann
	This reverts commit f50f2fd2a73f2c5ee3f10ad077e780398df17cd7. Change Ib7506e3e aligns the buffer Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-17	Merge "Fix relative include paths"	John Koleszar

2011-02-14	Improve vp8_sad16x16_sse3 function	Yunqing Wang
	In real-time mode, vp8_sad16x16 function is called heavily in motion search part. Improvement of this function gives 1.2% encoding performance gain (real-time mode, tulip clip). Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
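For reference, the quantity such a function computes is the sum of absolute differences over a 16x16 block. A plain C sketch follows; `sad16x16_ref` is a hypothetical name and this is not the SSE3 implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two 16x16 blocks, each addressed
 * with its own row stride. */
static unsigned int sad16x16_ref(const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride)
{
    unsigned int sad = 0;

    assert(src_stride >= 16 && ref_stride >= 16);

    for (int r = 0; r < 16; r++) {
        for (int c = 0; c < 16; c++)
            sad += (unsigned int)abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}
```

Motion search evaluates this metric once per candidate position, which is why speeding it up shows in overall encode time.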
2011-02-10	Fix relative include paths	John Koleszar
	Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-01-25	Merge "update sse2 regular quantizer"	Johann

2011-01-21	Modify sub-pixel filters to eliminate unnecessary calculations	Yunqing Wang
	In sub-pixel calculation, xoffset and yoffset mostly take some specific values. Modified sub-pixel filter functions according to these possible values to improve performance. Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-18	Fix encoder real-time only configuration.	Attila Nagy
	Remove allocation/deallocation of stats storage. Remove full search functions in machine specific encoder inits. Remove last pass validation in validate_config. Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-14	update sse2 regular quantizer	Johann
	~5% gain on 32bit. disabled for 64bit. unset executable bit on ssse3 version (cosmetic). Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-11	use unaligned load	Johann
	source buffer is not guaranteed to be aligned for odd size buffers Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-06	x86 sse2 temporal_filter_apply	Johann
	count can be reduced to short because the max number of filtered frames is set to 15. the max value for any frame is 32 (modifier = 16, filter_weight = 2). 15*32 = 480, which requires 9 bits. this function goes from about 7000 us / 1000 iterations for the C code to < 275 us / 1000 iterations for sse2 for block_size = 16, and from about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8. Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
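The bit-width arithmetic in the message can be checked with a few lines of C. `bits_needed` is a hypothetical helper written for this check only.

```c
/* Number of bits needed to represent v as an unsigned value. */
static int bits_needed(unsigned int v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

With 15 frames and a per-frame maximum of 32, the largest count is 480, which needs 9 bits and therefore fits comfortably in a 16-bit short.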
2010-12-28	Use the fast quantizer for inter mode selection	Scott LaVarnway
	Use the fast quantizer for inter mode selection and the regular quantizer for the rest of the encode for good quality, speed 1. Both performance and quality were improved. The quality gains will make up for the quality loss mentioned in I9dc089007ca08129fb6c11fe7692777ebb8647b0. Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-13	remove unused temporal preproc code	John Koleszar
	This code is unused, as the current preproc implementation uses the same spatial filter that postproc uses. Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7