libvpx.git

Age	Commit message	Author
2014-02-14	AVX2 SubPixel Variance Optimization	levytamar82
	Optimizes two functions to process 32 elements in parallel instead of 16: vp9_sub_pixel_variance64x64 and vp9_sub_pixel_variance32x32. Both previously called vp9_sub_pixel_variance16xh_ssse3; they now call vp9_sub_pixel_variance32xh_avx2, which is written in AVX2 and processes 32 elements in parallel. This optimization gave a 70% function-level gain and a 2% user-level gain. Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
2014-02-10	Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/"	James Zern

2014-02-06	Finally removing "short" from transform names.	Dmitry Kovalev
	Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d
2014-02-05	Renaming vp9_sad_c.c to vp9_sad.c.	Dmitry Kovalev
	Change-Id: I0beb01b0209cf4ae849b4c67d72107b631f46c0d
2014-02-04	*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/	James Zern
	CONFIG_USE_X86INC is available to every makefile; there's no need to duplicate its value with USE_X86INC. Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
2014-02-04	Renaming vp9_variance_c.c to vp9_variance.c.	Dmitry Kovalev
	Change-Id: I7b29cb18ad36d79e1c6329c7de88496059f49db4
2014-01-21	Adds a non-normative resize library to vp9 encoder	Deb Mukherjee
	Adds an arbitrary-size resize library for use in scaling of input frames in a non-normative manner in the vp9 encoder. The method used is as follows: Downsampling - Uses an 8-tap filter for factor-of-2 decimation up to a size just higher than the desired size, then interpolates pixels at a precision of 1/32 pel using a set of 8-tap filters. Upsampling - Interpolates pixels at a precision of 1/32 pel using a set of 8-tap filters. There is no assembly optimization yet. Change-Id: Ib5b81e174fc139da322bb97c8214d52289d60d8a
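The downsampling step described above decimates by 2 until the size is just above the target, then hands over to the 1/32-pel interpolation. The planning part can be sketched as follows. This is only an illustration: `resize_plan` and `plan_downsample` are hypothetical names that do not come from the libvpx sources, and the rounding rule (round up on odd sizes, stop once a further halving would fall below the target) is an assumption.

```c
/* Hypothetical planning helper; not taken from the libvpx sources. */
typedef struct {
  int halvings;     /* number of factor-of-2 decimation passes */
  int intermediate; /* size left for the 1/32-pel interpolation stage */
} resize_plan;

static resize_plan plan_downsample(int in_size, int out_size) {
  resize_plan plan = { 0, in_size };
  /* Assumed rule: keep halving (rounding up) while the halved size
   * is still at least the desired output size. */
  while (out_size > 0 && plan.intermediate > 1 &&
         (plan.intermediate + 1) / 2 >= out_size) {
    plan.intermediate = (plan.intermediate + 1) / 2;
    plan.halvings++;
  }
  return plan;
}
```

Under these assumptions, any remaining difference between `intermediate` and the desired size would be covered by the interpolation filters mentioned in the commit message.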
2014-01-21	Removing duplicated SAD calculation code.	Dmitry Kovalev
	Change-Id: I8d693371a29103769d5bed9d5f9cfe4f58ca3189
2014-01-16	Inter-frame non-RD mode decision	Jingning Han
	This commit sets up a test framework for real-time coding. It enables a light motion search for non-RD mode decision purposes. Change-Id: I8bec656331539e963c2b685a70e43e0ae32a6e9d
2014-01-08	AVX2 Variance Optimization	levytamar82
	Optimizes the variance functions vp9_variance16x16, vp9_variance32x32, vp9_variance64x64, vp9_variance32x16, vp9_variance64x32 and vp9_mse16x16 by migrating them to AVX2. Some of the functions were optimized by processing 32 elements instead of 16; others by processing 2 loop strides of 16 elements in a single 256-bit register. This optimization gives a 2.4% - 2.7% user-level performance gain and a 42% function-level gain. Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
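For orientation, a scalar sketch of what a block variance function of this kind computes is shown below. It assumes the usual definition (sum of squared differences minus the squared sum of differences divided by the pixel count). The name `block_variance` and its signature are illustrative and are not the libvpx API. A SIMD version would compute the same two accumulators for 16 or 32 pixels per step.

```c
#include <stdint.h>

/* Illustrative scalar reference; name and signature are assumptions. */
static unsigned int block_variance(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, unsigned int *sse) {
  int64_t sum = 0;  /* sum of pixel differences */
  uint64_t sq = 0;  /* sum of squared pixel differences */
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += d;
      sq += (uint64_t)(d * d);
    }
  }
  *sse = (unsigned int)sq;
  /* variance scaled by the pixel count: sse - sum^2 / (w * h) */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```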
2013-12-20	Renaming vp9_boolcoder.{h, c} to vp9_writer.{h, c}.	Dmitry Kovalev
	Change-Id: I9b9a5fcce8530284df0f270706ee060a0edc1517
2013-11-27	Merge "vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2"	Yaowu Xu

2013-11-25	Removing vp9_modecosts.{c, h} files.	Dmitry Kovalev
	Renaming vp9_init_mode_costs() to fill_mode_costs() and moving it to vp9_rdopt.c. Change-Id: Ib2542d216458f6dced9f4b7ccbdd2cd98176aa5a
2013-11-21	vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2	levytamar82
	Change-Id: I6366e84490883b72362f762369d7e5bccb64f02f
2013-11-19	Move vp9_sadmxn.h from common to encoder	Yaowu Xu
	Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
2013-11-18	Move vp9_extend.{h,c} from common to encoder	Yaowu Xu
	They are used in the encoder only. This commit also reorders includes for the files that include vp9_extend.h. Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
2013-11-15	Removing vp9_encodeintra.{h, c} files.	Dmitry Kovalev
	There was only one function in *.c file, so moving it to vp9_encodemb.c. Change-Id: I728859d08b3d6c05c33c1c5b21f0ea1d0e0f83af
2013-10-25	Adding fht{4x4, 8x8, 16x16} functions.	Dmitry Kovalev
	Adding these functions to encapsulate the tx_type check. Changing TX_TYPE to int to match the declaration in vp9_rtcd.h. Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
2013-10-16	Implement variance-based adaptive quantization	Guillaume Martres
	This should be similar to what x264 does with --aq-mode 1. It works well with clips like parkjoy and touhou (http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv). At low bitrates, the segmentation signaling overhead may negate the benefits of this feature. (PGW) Default changed to feature OFF to allow provisional merge. Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
2013-09-04	make vp9 postproc a config option	Jim Bankoski
	VP9 postproc is disabled for now, as it has not been shown to help and may be merged with VP8. Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
2013-08-06	variance x86inc guards	Jim Bankoski
	also fixed bug in sad calcs Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06	sad + miscellaneous updates	Jim Bankoski
	Enable use_x86inc as a command-line option. Fix bug with SSE2 when x86inc is disabled. Add SAD asm protection to the x86inc protection. Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
2013-08-06	Merge "Move fdct32x32 SSE2 implementation in separate file."	Jingning Han

2013-08-06	Move fdct32x32 SSE2 implementation in separate file.	Christian Duvivier
	This is in preparation for the SSE2 version of the high-precision 32x32 forward DCT which will share a lot of code with the existing low precision version used for rate-distortion search. Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0
2013-08-06	block error / x86inc mods	Jim Bankoski
	Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40
2013-08-05	reworked config for use_x86_inc	Jim Bankoski
	Support enabling or disabling it. Moved the read out to configure.sh so that it is done once instead of in both make and config. Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
2013-07-10	Remove unused fwalsh/fdct x86 SIMD implementations.	Ronald S. Bultje
	Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
2013-07-01	Merge "Quantize (64-bit only, for now) SSSE3 SIMD."	Yaowu Xu

2013-07-01	Quantize (64-bit only, for now) SSSE3 SIMD.	Ronald S. Bultje
	Total encoding time for the first 50 frames of bus (speed 0) @ 1500kbps goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only; it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-06-29	Moving encoder subexp encoding functions to subexp.{h, c}.	Dmitry Kovalev
	Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca
2013-06-21	Implement SSE2 block_error.	Ronald S. Bultje
	Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
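The overflow concern behind the 64-bit return value can be illustrated with a minimal scalar sketch. The function name `block_error_sketch`, its signature and the use of `int16_t` coefficients are assumptions made for illustration, not the actual libvpx declaration.

```c
#include <stdint.h>

/* Sum of squared differences between original and dequantized
 * coefficients, accumulated in 64 bits (hypothetical signature). */
static int64_t block_error_sketch(const int16_t *coeff,
                                  const int16_t *dqcoeff, int n) {
  int64_t error = 0;
  for (int i = 0; i < n; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    /* Widen before multiplying so each term and the running total
     * stay exact even for large blocks. */
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With 1024 coefficients (a 32x32 block) and large per-coefficient differences, the total can exceed what a 32-bit accumulator holds, which is the case a 64-bit return value covers.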
2013-06-21	Add subtract_block SSE2 version and unit test.	Ronald S. Bultje
	3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-20	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.	Ronald S. Bultje
	Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available):
	sse2 4x4: 99 -> 82 cycles
	sse2 4x8: 128 cycles
	sse2 8x4: 121 cycles
	sse2 8x8: 149 -> 129 cycles
	sse2 8x16: 235 -> 245 cycles (?)
	sse2 16x8: 269 -> 203 cycles
	sse2 16x16: 441 -> 349 cycles
	sse2 16x32: 641 cycles
	sse2 32x16: 643 cycles
	sse2 32x32: 1733 -> 1154 cycles
	sse2 32x64: 2247 cycles
	sse2 64x32: 2323 cycles
	sse2 64x64: 6984 -> 4442 cycles
	ssse3 4x4: 100 cycles (?)
	ssse3 4x8: 103 cycles
	ssse3 8x4: 71 cycles
	ssse3 8x8: 147 cycles
	ssse3 8x16: 158 cycles
	ssse3 16x8: 188 -> 162 cycles
	ssse3 16x16: 316 -> 273 cycles
	ssse3 16x32: 535 cycles
	ssse3 32x16: 564 cycles
	ssse3 32x32: 973 cycles
	ssse3 32x64: 1930 cycles
	ssse3 64x32: 1922 cycles
	ssse3 64x64: 3760 cycles
	Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-17	Move subpixel variance function from common/ to encoder/.	Ronald S. Bultje
	This seems to be used only in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b
2013-05-28	Compressed/uncompressed frame header changes.	Dmitry Kovalev
	Adding API to read/write uncompressed frame header bits (it is not final yet). Separate functions to read/write uncompressed header. Moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode, frame_context_idx from compressed partition to uncompressed frame header. Change-Id: Id3ed8a387980c652ae147549412f4ec24a0a5bd0
2013-05-28	Revert "Adding API to read/write uncompressed frame header bits." because of bitstream mismatches.	Dmitry Kovalev
	This reverts commit df037b615fcc0196386977faae060fdfd9a887a8. Change-Id: I1a529f2590df7bc912f5035d22311268933e3dd6
2013-05-21	Adding API to read/write uncompressed frame header bits.	Dmitry Kovalev
	The API is not final yet and can be changed. Actual layout of uncompressed frame part will be finalized later. Right now moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode from first compressed partition to uncompressed frame part. Change-Id: I3afc5d4ea92c5a114f4c3d88f96858cccc15b76e
2013-05-03	Automatically flag intrinsic files	Johann
	Change-Id: Iee9894615265d42aa23c43a4183924953aedb0c6
2013-04-30	Remove unused quantize optimizations.	Johann
	Files were copied from vp8 and never maintained. Change-Id: I9659a8755985da73e8c19c3c984423b6666d8871
2013-04-26	Merge branch 'master' into experimental	Johann
	Conflicts:
	vp9/common/vp9_findnearmv.c
	vp9/common/vp9_rtcd_defs.sh
	vp9/decoder/vp9_decodframe.c
	vp9/decoder/x86/vp9_dequantize_sse2.c
	vp9/encoder/vp9_rdopt.c
	vp9/vp9_common.mk
	Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
2013-04-25	Normalize more intrinsic filenames	Johann
	vp9_dequantize_x86 has only sse2 functions. vp9_dct_sse2_intrinsics has no namespace collision and can drop _intrinsics. vp9_idct_mmx.h is unused. Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec
2013-04-25	Move dequant from BLOCKD to per-plane MACROBLOCKD	John Koleszar
	This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
2013-04-16	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4.	Christian Duvivier
	Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
2013-04-16	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4.	Christian Duvivier
	Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
2013-02-28	mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision	Jim Bankoski
	Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df
2013-02-27	Faster vp9_short_fdct8x8.	Christian Duvivier
	Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
2013-02-27	Merge "Remove unused vp9_copy32xn" into experimental	John Koleszar

2013-02-27	Move eob from BLOCKD to MACROBLOCKD.	Ronald S. Bultje
	Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
2013-02-27	Remove unused vp9_copy32xn	John Koleszar
	This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d
2013-02-15	Remove some Y2-related code.	Ronald S. Bultje
	Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78