libvpx.git

Age	Commit message	Author
2013-08-06	variance x86inc guards	Jim Bankoski
	also fixed bug in sad calcs Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06	sad + miscellaneous updates	Jim Bankoski
	Enable use_x86inc as a commandline option. Fix Bug with sse2 when x86inc is disabled. Adds Sad asm protection to x86inc protection Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
2013-08-06	Merge "Move fdct32x32 SSE2 implementation in separate file."	Jingning Han

2013-08-06	Move fdct32x32 SSE2 implementation in separate file.	Christian Duvivier
	This is in preparation for the SSE2 version of the high-precision 32x32 forward DCT which will share a lot of code with the existing low precision version used for rate-distortion search. Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0
2013-08-06	block error / x86inc mods	Jim Bankoski
	Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40
2013-08-05	reworked config for use_x86_inc	Jim Bankoski
	Support enabling it or disabling it. Moved read out to configure.sh so that it's done once instead of in make and in config. Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
2013-07-10	Remove unused fwalsh/fdct x86 SIMD implementations.	Ronald S. Bultje
	Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
2013-07-01	Merge "Quantize (64-bit only, for now) SSSE3 SIMD."	Yaowu Xu

2013-07-01	Quantize (64-bit only, for now) SSSE3 SIMD.	Ronald S. Bultje
	Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-06-29	Moving encoder subexp encoding functions to subexp.{h, c}.	Dmitry Kovalev
	Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca
2013-06-21	Implement SSE2 block_error.	Ronald S. Bultje
	Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21	Add subtract_block SSE2 version and unit test.	Ronald S. Bultje
	3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-20	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.	Ronald S. Bultje
	Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available):
	sse2 4x4: 99 -> 82 cycles
	sse2 4x8: 128 cycles
	sse2 8x4: 121 cycles
	sse2 8x8: 149 -> 129 cycles
	sse2 8x16: 235 -> 245 cycles (?)
	sse2 16x8: 269 -> 203 cycles
	sse2 16x16: 441 -> 349 cycles
	sse2 16x32: 641 cycles
	sse2 32x16: 643 cycles
	sse2 32x32: 1733 -> 1154 cycles
	sse2 32x64: 2247 cycles
	sse2 64x32: 2323 cycles
	sse2 64x64: 6984 -> 4442 cycles
	ssse3 4x4: 100 cycles (?)
	ssse3 4x8: 103 cycles
	ssse3 8x4: 71 cycles
	ssse3 8x8: 147 cycles
	ssse3 8x16: 158 cycles
	ssse3 16x8: 188 -> 162 cycles
	ssse3 16x16: 316 -> 273 cycles
	ssse3 16x32: 535 cycles
	ssse3 32x16: 564 cycles
	ssse3 32x32: 973 cycles
	ssse3 32x64: 1930 cycles
	ssse3 64x32: 1922 cycles
	ssse3 64x64: 3760 cycles
	Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-17	Move subpixel variance function from common/ to encoder/.	Ronald S. Bultje
	This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b
2013-05-28	Compressed/uncompressed frame header changes.	Dmitry Kovalev
	Adding API to read/write uncompressed frame header bits (it is not final yet). Separate functions to read/write uncompressed header. Moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode, frame_context_idx from compressed partition to uncompressed frame header. Change-Id: Id3ed8a387980c652ae147549412f4ec24a0a5bd0
2013-05-28	Revert "Adding API to read/write uncompressed frame header bits." because of bitstream mismatches.	Dmitry Kovalev
	This reverts commit df037b615fcc0196386977faae060fdfd9a887a8 Change-Id: I1a529f2590df7bc912f5035d22311268933e3dd6
2013-05-21	Adding API to read/write uncompressed frame header bits.	Dmitry Kovalev
	The API is not final yet and can be changed. Actual layout of uncompressed frame part will be finalized later. Right now moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode from first compressed partition to uncompressed frame part. Change-Id: I3afc5d4ea92c5a114f4c3d88f96858cccc15b76e
2013-05-03	Automatically flag intrinsic files	Johann
	Change-Id: Iee9894615265d42aa23c43a4183924953aedb0c6
2013-04-30	Remove unused quantize optimizations.	Johann
	Files were copied from vp8 and never maintained. Change-Id: I9659a8755985da73e8c19c3c984423b6666d8871
2013-04-26	Merge branch 'master' into experimental	Johann
	Conflicts:
	vp9/common/vp9_findnearmv.c
	vp9/common/vp9_rtcd_defs.sh
	vp9/decoder/vp9_decodframe.c
	vp9/decoder/x86/vp9_dequantize_sse2.c
	vp9/encoder/vp9_rdopt.c
	vp9/vp9_common.mk
	Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
2013-04-25	Normalize more intrinsic filenames	Johann
	vp9_dequantize_x86 has only sse2 functions. vp9_dct_sse2_intrinsics has no namespace collision and can drop _intrinsics. vp9_idct_mmx.h is unused. Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec
2013-04-25	Move dequant from BLOCKD to per-plane MACROBLOCKD	John Koleszar
	This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
2013-04-16	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4.	Christian Duvivier
	Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
2013-02-28	mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision	Jim Bankoski
	Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df
2013-02-27	Faster vp9_short_fdct8x8.	Christian Duvivier
	Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
2013-02-27	Merge "Remove unused vp9_copy32xn" into experimental	John Koleszar

2013-02-27	Move eob from BLOCKD to MACROBLOCKD.	Ronald S. Bultje
	Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
2013-02-27	Remove unused vp9_copy32xn	John Koleszar
	This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d
2013-02-15	Remove some Y2-related code.	Ronald S. Bultje
	Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
2013-02-08	Port sadNxNx4d functions to x86inc.asm.	Ronald S. Bultje
	Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8
2013-02-08	Add sad64x64 and sad32x32 SSE2 versions.	Ronald S. Bultje
	Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this makes them all slightly faster, particularly on x86-64. Remove SSE3 sad16x16 version, since the SSE2 version is now faster. About 1.5% overall encoding speedup. Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797
2012-12-26	Build fixes to merge vp9-preview into master	John Koleszar
	Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-05	Remove ARM optimizations from VP9	Johann
	Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b
2012-12-04	Fix the build with MSVC	Yaowu Xu
	1. remove the dependency on non existing "vp9_temporal_filter_x86.h"
	2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the change of the actual filenames.
	Change-Id: Ib1b4d96ac41788f76917764a6722d8461c857302
2012-11-28	more rtcd cleanup	Jim Bankoski
	Change-Id: Ieefd76e164ca4aa87597da0412977614ddfbacb7
2012-11-27	Add vp9_ prefix to all vp9 files	John Koleszar
	Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-07	Rough merge of master into experimental	John Koleszar
	Creates a merge between the master and experimental branches. Fixes a number of conflicts in the build system to allow either VP8 or VP9 to be built. Specifically either:
	$ configure --disable-vp9
	$ configure --disable-vp8 --disable-unit-tests
	VP9 still exports its symbols and files as VP8, so that will be resolved in the next commit. Unit tests are broken in VP9, but this isn't a new issue. They are fixed upstream on origin/experimental as of this writing, but rebasing this merge proved difficult, so will tackle that in a second merge commit. Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
2012-11-01	Rename vp8/ codec directory to vp9/.	Ronald S. Bultje
	Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4