libvpx.git

Age	Commit message	Author
2015-04-01	Merge "Optimize quantization simd implementation"	Jingning Han

2015-04-01	Refactor block_yrd function for RTC coding mode	Jingning Han
	This commit separates the Hadamard transform/quantization operations from the rate and distortion computation in block_yrd. This allows one to skip the SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips the repeated computation of the sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
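A minimal sketch of the two kinds of block error function contrasted above, assuming 32-bit coefficients and hypothetical names (`block_error_with_ssz`, `block_error_only`); it is not the libvpx code. The first accumulates both the squared coefficient error and the sum of squared coefficients (`ssz`); the second drops the `ssz` accumulation for a caller that already knows the residual energy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "full" variant: returns sum((coeff - dqcoeff)^2) and also
 * writes sum(coeff^2) to *ssz. */
static int64_t block_error_with_ssz(const int32_t *coeff,
                                    const int32_t *dqcoeff, size_t n,
                                    int64_t *ssz) {
  int64_t err = 0, energy = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    err += diff * diff;
    energy += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = energy;
  return err;
}

/* Hypothetical reduced variant: same error value, but it skips the ssz
 * accumulation entirely. */
static int64_t block_error_only(const int32_t *coeff, const int32_t *dqcoeff,
                                size_t n) {
  int64_t err = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    err += diff * diff;
  }
  return err;
}
```

When every coefficient of a block quantizes to zero, `dqcoeff` is all zeros and the error simply equals the coefficient energy, which the caller may already have.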
2015-04-01	Optimize quantization simd implementation	Jingning Han
	This commit allows the quantizer to compare the AC coefficients to the quantization step size to determine whether further multiplication operations are needed. It makes the quantization process 20% faster without changing the coding statistics. Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
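A scalar C sketch of the early-out idea, under stated assumptions: the commit compares AC coefficients against the step size, whereas this sketch derives an exact zero threshold from hypothetical `rnd`/`quant` parameters and checks groups of 8 coefficients at a time to mimic one SIMD register. If no coefficient in a group can produce a nonzero level, the multiplications for that group are skipped. All names and the parameter layout (index 0 for DC, 1 for AC) are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* ((a + rnd) * quant) >> 16 == 0 exactly when a < ceil(65536 / quant) - rnd.
 * Assumes quant > 0 and rnd >= 0. */
static int64_t zero_threshold(int rnd, int quant) {
  return (65536 + quant - 1) / quant - rnd;
}

/* Hypothetical quantizer with a per-group early out. n must be a multiple
 * of 8; index 0 uses the DC parameters, all others the AC parameters. */
static void quantize_group_skip(const int16_t *coeff, int n, const int rnd[2],
                                const int quant[2], const int dequant[2],
                                int32_t *qcoeff, int32_t *dqcoeff) {
  const int64_t thresh[2] = {zero_threshold(rnd[0], quant[0]),
                             zero_threshold(rnd[1], quant[1])};
  for (int g = 0; g < n; g += 8) {
    int any_large = 0;
    for (int i = g; i < g + 8 && !any_large; ++i) {
      const int a = coeff[i] < 0 ? -coeff[i] : coeff[i];
      any_large = a >= thresh[i != 0];
    }
    if (!any_large) { /* whole group quantizes to zero: skip the multiplies */
      memset(qcoeff + g, 0, 8 * sizeof(*qcoeff));
      memset(dqcoeff + g, 0, 8 * sizeof(*dqcoeff));
      continue;
    }
    for (int i = g; i < g + 8; ++i) {
      const int k = i != 0;
      const int a = coeff[i] < 0 ? -coeff[i] : coeff[i];
      int32_t q = (int32_t)(((int64_t)(a + rnd[k]) * quant[k]) >> 16);
      if (coeff[i] < 0) q = -q; /* restore the sign */
      qcoeff[i] = q;
      dqcoeff[i] = q * dequant[k];
    }
  }
}
```

Because the threshold is exact, the skipped path produces the same output as the full computation, consistent with the "no coding statistics change" claim.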
2015-03-31	Use aligned copy in 8x8 Hadamard transform SSE2	Jingning Han
	This reduces the 8x8 Hadamard transform cycles by 20%. Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b
2015-03-30	Fix 8x8 Hadamard SSE2 implementation	Jingning Han
	This commit fixes the alignment in the SSE2 version of the 8x8 Hadamard transform and makes it consistent with the C version. Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
2015-03-30	Enable 16x16 Hadamard transform in SATD based mode decision	Jingning Han
	This commit replaces the 16x16 2D-DCT transform with the Hadamard transform for RTC coding mode. It reduces the CPU cycles spent on the 16x16 transform by 5x. Overall it makes speed -6 encoding 1.5% faster without compromising compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30	Hadamard transform based coding mode decision process	Jingning Han
	This commit uses a Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates, for example 5.5% for jimredvga, 6.7% for mmmoving, and 6.1% for niklas720p. At this point it introduces extra encoding cycle costs. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
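SATD (sum of absolute Hadamard-transformed differences) is a common multiplication-free proxy for coding cost. The message does not say how the rate-distortion estimate is derived from the transform, so this sketch computes only the SATD term for an n x n block (n = 8 or 16), using a hypothetical unnormalized fast Walsh-Hadamard butterfly and 32-bit arithmetic rather than libvpx's actual functions.

```c
#include <stdint.h>

/* In-place unnormalized fast Walsh-Hadamard transform of n values
 * (n a power of two) spaced `step` elements apart: additions only. */
static void fwht_1d(int32_t *v, int n, int step) {
  for (int len = 1; len < n; len <<= 1)
    for (int i = 0; i < n; i += 2 * len)
      for (int j = i; j < i + len; ++j) {
        const int32_t a = v[j * step], b = v[(j + len) * step];
        v[j * step] = a + b;
        v[(j + len) * step] = a - b;
      }
}

/* SATD of an n x n block (n <= 16): Hadamard-transform the residual
 * src - pred along rows and columns, then sum |coefficients|. */
static int64_t satd_nxn(const uint8_t *src, int src_stride,
                        const uint8_t *pred, int pred_stride, int n) {
  int32_t c[16 * 16];
  for (int r = 0; r < n; ++r)
    for (int k = 0; k < n; ++k)
      c[r * n + k] = src[r * src_stride + k] - pred[r * pred_stride + k];
  for (int r = 0; r < n; ++r) fwht_1d(c + r * n, n, 1); /* rows */
  for (int k = 0; k < n; ++k) fwht_1d(c + k, n, n);     /* columns */
  int64_t satd = 0;
  for (int i = 0; i < n * n; ++i) satd += c[i] < 0 ? -c[i] : c[i];
  return satd;
}
```

In this sketch a 16-point 1-D transform costs 4 stages x 8 butterflies x 2 = 64 additions/subtractions, so the full 16x16 transform (16 rows + 16 columns) takes 2048 additions/subtractions and no multiplications. A mode decision could call `satd_nxn` once per candidate prediction and prefer the smaller value.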
2015-03-18	vp9_fdct8x8_quant_ssse3: quiet a static analysis warning	James Zern
	add an assert to validate 'in' array size Change-Id: Ie5a24275c066d9dd59714f6104510abbd4850dc5
2015-03-18	vp9_fdct8x8_quant_sse2: quiet a static analysis warning	James Zern
	add an assert to validate 'in' array size Change-Id: Ib72946a86f34e1ce8a69954e8e3e4fe1a0f18a91
2015-03-16	Refactor column integral projection computation	Jingning Han
	Move the scaling factor outside the column projection. This avoids repeated calculation of the same scaling factor. Profiling shows that vp9_int_pro_col_sse2's share of overall cycles goes down from 2.29% to 1.88%. Change-Id: I5ac4e324ab2d7f33ba2de66dd2a12e04e04dfd66
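A sketch of the refactoring pattern, assuming hypothetical names and a normalization of log2(width) (so each entry becomes a row mean); the actual libvpx scaling factor may differ. The per-row projection returns a raw sum, and the caller computes the shift once per block instead of once per call.

```c
#include <stdint.h>

/* Hypothetical column projection: raw sum of `width` pixels in one row.
 * No normalization happens here. */
static int int_pro_col_raw(const uint8_t *ref, int width) {
  int sum = 0;
  for (int i = 0; i < width; ++i) sum += ref[i];
  return sum;
}

/* One projection entry per row of a width x height block. The shift is
 * computed once for the whole block (width must be a power of two). */
static void project_block_rows(const uint8_t *ref, int stride, int width,
                               int height, int16_t *vbuf) {
  int shift = 0;
  while ((1 << shift) < width) ++shift; /* log2(width), hoisted out */
  for (int r = 0; r < height; ++r)
    vbuf[r] = (int16_t)(int_pro_col_raw(ref + r * stride, width) >> shift);
}
```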
2015-03-12	Fix fdct8x8_quant ssse3 overflow issue	Jingning Han
	This resolves webm issue 968. Change-Id: Ieb363129b1e135a561141c68211d413226aba754
2015-03-11	Apply fast motion search to golden reference frame	Jingning Han
	This commit enables the rtc coding mode to run integral projection based motion search for the golden reference frame. It improves speed -6 compression performance by 1.1% on average, 3.46% for jimred_vga, 6.46% for tacomascmvvga, and 0.5% for vidyo clips. Speed -6 is about 6% slower. Change-Id: I0fe402ad2edf0149d0349ad304ab9b2abdf0c804
2015-03-03	Scale the normalization factor depending on the block size	Jingning Han
	Change-Id: I0a26994bf65ea224e496b09af2ce71e1a4210433
2015-03-01	Use variance metric for integral projection vector match	Jingning Han
	This commit replaces SAD with variance as the metric for the integral projection vector match. It improves search accuracy in the presence of slight lighting changes. The average speed -6 compression performance for the rtc set improves by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
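Why variance helps with lighting changes: a uniform brightness offset shifts every element of the difference vector by the same amount, which SAD counts in full but variance (sse - sum^2 / n) cancels. A minimal sketch with hypothetical names; libvpx's exact formula (for example its use of shifts instead of division) is not reproduced here.

```c
#include <stdint.h>

/* Sum of absolute differences between two 1-D projection vectors. */
static int64_t vector_sad(const int16_t *ref, const int16_t *src, int n) {
  int64_t sad = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)ref[i] - src[i];
    sad += d < 0 ? -d : d;
  }
  return sad;
}

/* Variance of the difference vector: sse - sum^2 / n. A constant offset c
 * gives sse = n*c^2 and sum^2 / n = n*c^2, so it cancels exactly. */
static int64_t vector_var(const int16_t *ref, const int16_t *src, int n) {
  int64_t sum = 0, sse = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)ref[i] - src[i];
    sum += d;
    sse += d * d;
  }
  return sse - (sum * sum) / n;
}
```

For a source vector compared against a copy brightened by 5 and against a copy with alternating +/-1 noise, SAD prefers the noisy candidate (8 vs 40) while variance prefers the brightened one (0 vs 8).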
2015-02-26	Refactor integral projection based motion estimation	Jingning Han
	Support variable block size integral projection based motion estimation. Change-Id: Iee6d65e44df4480aa13fb7b84b9c91914b89caa1
2015-02-25	Merge "Fix ssse3 quantize_fp functions while skip=1"	Yunqing Wang

2015-02-24	Fix fwd transform sse2 build issue on older gcc version	Jingning Han
	Change-Id: I3e0e53d129552babf29e6c5d047483733983973c
2015-02-24	Fix ssse3 quantize_fp functions while skip=1	Yunqing Wang
	In the ssse3 functions, the DEFINE_ARGS macro hard-codes qcoeff and dqcoeff to r3 and r4. If skip is 1, qcoeff and dqcoeff need to be loaded from the stack, which does not work because of the above definitions. Currently, the skip=1 case is not used in the encoder. This patch fixes the issue so it can be turned on later. Change-Id: I998d696b1a7a85dca2b3bcee790b21c21e039147
2015-02-19	Integral projection based motion estimation	Jingning Han
	This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region are projected onto horizontal and vertical 1-D vectors. It then runs a vector match, instead of a block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process runs per 64x64 block to align the reference before choosing the partitioning in speed 6. The additional 64x64 block match (SSE2 version) accounts for around 2% of overall CPU cycles at low bit-rate rtc speed 6. When strong motion activity exists in the video sequence, it substantially improves partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were run in the RTC speed -6 setting:
	cloud 1080p 500 kbps: 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9 dB gain and 6% faster)
	pedestrian_area 1080p 500 kbps: 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
	blue_sky 1080p 500 kbps: 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
	jimred 400 kbps: 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
	Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
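A minimal C sketch of the search described in this commit: project the block and the surrounding reference region onto 1-D vectors, then slide each source vector along the matching reference vector. The function and macro names are hypothetical, only integer-pel offsets from the two 1-D matches are returned, and the 1-D cost is the variance-of-difference metric that the 2015-03-01 commit in this log switched to (this commit itself used SAD).

```c
#include <stdint.h>

#define MAX_BLOCK 64 /* hypothetical limits for the fixed-size buffers */
#define MAX_RANGE 32

/* Column sums: out[x] = sum of p[y * stride + x] over y < h, for x < w. */
static void project_cols(const uint8_t *p, int stride, int w, int h, int *out) {
  for (int x = 0; x < w; ++x) {
    int s = 0;
    for (int y = 0; y < h; ++y) s += p[y * stride + x];
    out[x] = s;
  }
}

/* Row sums: out[y] = sum of p[y * stride + x] over x < w, for y < h. */
static void project_rows(const uint8_t *p, int stride, int w, int h, int *out) {
  for (int y = 0; y < h; ++y) {
    int s = 0;
    for (int x = 0; x < w; ++x) s += p[y * stride + x];
    out[y] = s;
  }
}

/* 1-D vector match: returns the offset d in [-range, range] whose slice
 * ref_vec[d + range .. d + range + n) best matches src_vec, using the
 * variance of the difference (insensitive to a constant offset). */
static int match_1d(const int *src_vec, int n, const int *ref_vec, int range) {
  int best_d = -range;
  int64_t best_cost = INT64_MAX;
  for (int d = -range; d <= range; ++d) {
    int64_t sum = 0, sse = 0;
    for (int i = 0; i < n; ++i) {
      const int64_t diff = (int64_t)ref_vec[i + d + range] - src_vec[i];
      sum += diff;
      sse += diff * diff;
    }
    const int64_t cost = sse - sum * sum / n;
    if (cost < best_cost) {
      best_cost = cost;
      best_d = d;
    }
  }
  return best_d;
}

/* Integral projection motion search for one bw x bh block (bw, bh <=
 * MAX_BLOCK, range <= MAX_RANGE). `ref` points at the co-located block in
 * the reference frame, which must have `range` valid pixels on every side. */
static void int_pro_motion_search(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride, int bw,
                                  int bh, int range, int *mv_x, int *mv_y) {
  int hsrc[MAX_BLOCK], vsrc[MAX_BLOCK];
  int href[MAX_BLOCK + 2 * MAX_RANGE], vref[MAX_BLOCK + 2 * MAX_RANGE];
  project_cols(src, src_stride, bw, bh, hsrc); /* horizontal 1-D vector */
  project_rows(src, src_stride, bw, bh, vsrc); /* vertical 1-D vector */
  /* Reference projections span the block plus the search margin. */
  project_cols(ref - range, ref_stride, bw + 2 * range, bh, href);
  project_rows(ref - range * ref_stride, ref_stride, bw, bh + 2 * range, vref);
  *mv_x = match_1d(hsrc, bw, href, range);
  *mv_y = match_1d(vsrc, bh, vref, range);
}
```

The reference vectors are summed over the co-located rows or columns only, so each 1-D match is an approximation when the block moves in both directions; the search cost is O(range x block size) instead of O(range^2 x block area) for a full 2-D block match.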
2015-02-05	Fix high bit depth assembly function bugs	Yunqing Wang
	The high bit depth build failed when building for a 32-bit target. The bugs were in the vp9_highbd_subpel_variance.asm and vp9_highbd_sad4d_sse2.asm functions. This patch fixes the bugs and makes the 32-bit build work. Change-Id: Idc8e5e1b7965bb70d4afba140c6583c5d9666b75
2015-01-27	Fix issues in 32bit PIC enabled build	Yunqing Wang
	This patch fixes issue 924: https://code.google.com/p/webm/issues/detail?id=924 The SECTION_RODATA macro was modified to support the macho32 format. The sub-pixel functions were modified to pass in 2 more parameters to handle the global offsets for the PIC build. Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4
2014-12-22	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value.""	Jingning Han
	This reverts commit 9946ee23e0a4c158e26a505b162a072f81b8a3be. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07
2014-12-19	Revert "Removal of legacy zbin_extra / zbin_oq_value."	Paul Wilkins
	This reverts commit e9b586e21bb899e247346e82bccf5afb42604910. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4
2014-12-18	Removal of legacy zbin_extra / zbin_oq_value.	Paul Wilkins
	zbin_extra / zbin_oq_value was widely passed around, hence its removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5
2014-12-08	Merge "Changes to assembler for NASM on mac."	James Zern

2014-12-03	sse2 visual studio build fix	Deb Mukherjee
	Change-Id: Id8c8c3be882bcd92afea3ccec6ebdf3f208d28ef
2014-12-03	Enable non-rd mode coding on key frame, for speed 6.	Marco
	For key frames at speed 6: enable the non-rd mode selection in the speed settings and use the (non-rd) variance-based partition. Adjust some logic/thresholds in variance partition selection for key frames only (no change to delta frames), mainly to bias toward selecting smaller prediction blocks, and also set the max tx size to 16x16. Key frame quality drops (~0.6-0.7 dB) compared to rd coding, but key frame encoding is at least 6x faster. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
2014-12-02	Added high bitdepth sse2 transform functions	Peter de Rivaz
	Also removes some spurious changes in common/vp9_blockd.h which were introduced by a rebase issue between the nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
2014-11-24	Changes to assembler for NASM on mac.	John Stark
	fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a
2014-11-21	Merge "Added highbitdepth sse2 acceleration for quantize"	Debargha Mukherjee

2014-11-19	Added highbitdepth sse2 acceleration for quantize	Peter de Rivaz
	Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19	Enable ssse3 version of vp9_fdct8x8_quant	Jingning Han
	It is about 5% faster than vp9_fdct8x8_quant_sse2. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
2014-11-18	Combine fdct8x8 and quantization process	Jingning Han
	This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall, speed -6 is slightly faster (around 1%). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
2014-11-18	Add sse2 version for vp9_quantize_fp	Jingning Han
	vp9_quantize_fp is the quantization process used by the rtc coding mode. This commit adds an SSE2 implementation of it, adapted from vp9_quantize_b_sse2. It runs at the same speed as the ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
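The message does not show the scalar reference that the SSE2 code accelerates. The following is a minimal sketch assuming an fp-style structure: separate DC/AC round, quant and dequant values, coefficients visited in scan order, and an end-of-block (eob) position returned. The function name and parameter layout are hypothetical, and any clamping a production quantizer applies is omitted.

```c
#include <stdint.h>

/* For each position in scan order: q = ((|c| + round) * quant) >> 16 with
 * the sign restored, dq = q * dequant. Index [0] holds the DC parameters,
 * [1] the AC ones. Returns 1 + the last nonzero scan index (0 if none). */
static int quantize_fp_sketch(const int16_t *coeff, int n, const int16_t *scan,
                              const int rnd[2], const int quant[2],
                              const int dequant[2], int32_t *qcoeff,
                              int32_t *dqcoeff) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    const int rc = scan[i]; /* raster position of the i-th scanned coeff */
    const int k = rc != 0;  /* choose DC or AC parameters */
    const int c = coeff[rc];
    const int a = c < 0 ? -c : c;
    int32_t q = (int32_t)(((int64_t)(a + rnd[k]) * quant[k]) >> 16);
    if (c < 0) q = -q;
    qcoeff[rc] = q;
    dqcoeff[rc] = q * dequant[k];
    if (q != 0) eob = i + 1;
  }
  return eob;
}
```

Tracking eob in scan order rather than raster order lets later stages stop at the last significant coefficient.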
2014-11-14	Added sse2 acceleration for highbitdepth variance	Peter de Rivaz
	Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc) (cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)
2014-11-12	Added highbitdepth sse2 SAD acceleration and tests	Peter de Rivaz
	Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)
2014-11-05	Fix visual studio 2013 compiler warnings	Yaowu Xu
	For builds configured with --enable-vp9-highbitdepth. Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-10-28	vp9_denoiser_sse2: refactor the code.	JackyChen
	Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one function, vp9_denoiser_NxM_sse2_small, which passes the bit-exact test. Renamed vp9_denoiser_64_32_16xM_sse2 to vp9_denoiser_NxM_sse2_big. Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451
2014-10-22	Merge "vp9_denoiser_sse2.c: improve code style."	JackyChen

2014-10-22	vp9_denoiser_sse2.c: improve code style.	JackyChen
	denoiser_sse2.c: fix typos in comment. Change-Id: Ic0fb102331b0e533c058da3cab1fbc30de9a0070
2014-10-20	Merge "SAD32xh and SAD64xh for AVX2"	Yunqing Wang

2014-10-19	SAD32xh and SAD64xh for AVX2	levytamar82
	All SAD functions that process 32 or more consecutive elements are optimized for AVX2: vp9_sad64x64, vp9_sad64x32, vp9_sad32x64, vp9_sad32x32, vp9_sad32x16, vp9_sad64x64_avg, vp9_sad64x32_avg, vp9_sad32x64_avg, vp9_sad32x32_avg, vp9_sad32x16_avg. The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64. vp9_sad32x32 was optimized by 68% and vp9_sad64x64 by 90%; together they gave an overall ~2.3% user-level gain. Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
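For reference, the operation these AVX2 kernels vectorize is a plain sum of absolute differences; a 32-pixel row of 8-bit samples fills exactly one 256-bit register. The scalar sketch below uses hypothetical names, and its `_avg` variant assumes the compound prediction is the rounded average of `ref` and a second predictor stored contiguously with stride `w`.

```c
#include <stdint.h>

/* Scalar w x h SAD between a source block and a reference block. */
static unsigned sad_wxh(const uint8_t *src, int src_stride, const uint8_t *ref,
                        int ref_stride, int w, int h) {
  unsigned sad = 0;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      const int d = src[y * src_stride + x] - ref[y * ref_stride + x];
      sad += (unsigned)(d < 0 ? -d : d);
    }
  return sad;
}

/* "_avg" variant: SAD against the rounded average of ref and second_pred. */
static unsigned sad_wxh_avg(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred, int w, int h) {
  unsigned sad = 0;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      const int avg = (ref[y * ref_stride + x] + second_pred[y * w + x] + 1) >> 1;
      const int d = src[y * src_stride + x] - avg;
      sad += (unsigned)(d < 0 ? -d : d);
    }
  return sad;
}
```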
2014-10-17	vp9_denoiser_sse2.c: solve windows build error.	JackyChen
	Change-Id: Ib5df91c8580d5dbeb0b3554edc9c2ca906ba4c4d
2014-10-17	Merge "vp9_denoiser_sse2.c: eliminate gcc warnings"	James Zern

2014-10-17	vp9_denoiser_sse2.c: eliminate gcc warnings	JackyChen
	Change-Id: I5f63f48e11e31ea9951223c5b18f42a2471e4560
2014-10-14	Add a 32-bit friendly sse2 quantizer.	Alex Converse
	This is based on the 64-bit ssse3 quantizer. It gives a 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
2014-10-10	vp9_avg_intrin_sse2: correct intrinsics include	James Zern
	immintrin.h -> emmintrin.h fixes build where newer intrinsics are unavailable Change-Id: I79311b39bfa782fc2abeb45884ecb417050cb9f8
2014-10-07	experimental : partition using 1/8 x 1/8 image	Jim Bankoski
	The concept: there is too much noise in the source pixels for variance, and at low bitrate the reconstruction looks nothing like the source, so we have problems getting good partitionings with either. This skirts the issue by using a box-blurred, scaled-down version of the image for variance calculations. To compare against source_var_, the keyframe was moved to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
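The message only says "box blur scaled down"; a minimal sketch, assuming each output sample is the rounded mean of one 8x8 block of input pixels. The function name and layout are hypothetical. Averaging 64 pixels per sample is what suppresses pixel-level noise before any variance is computed on the reduced image.

```c
#include <stdint.h>

/* Hypothetical 1/8 x 1/8 box-blur downscale. w and h must be multiples of
 * 8; dst receives (w / 8) x (h / 8) samples with stride w / 8. */
static void downscale_box8(const uint8_t *src, int stride, int w, int h,
                           uint8_t *dst) {
  for (int by = 0; by < h / 8; ++by)
    for (int bx = 0; bx < w / 8; ++bx) {
      int sum = 0;
      for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x)
          sum += src[(by * 8 + y) * stride + bx * 8 + x];
      dst[by * (w / 8) + bx] = (uint8_t)((sum + 32) >> 6); /* /64, rounded */
    }
}
```

A checkerboard of +/-1 noise around a flat value of 10 downscales to exactly 10, so the reduced image reports zero variance where the noisy source would not.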
2014-10-06	Add SSE2 code and unit test for VP9 denoiser.	JackyChen
	This SSE2 code is based on the VP8 denoiser's SSE2 code. In VP8 the denoiser only uses 16x16 blocks, while in VP9 there are 13 different block sizes. With this SSE2 code the encoder is around 20% faster than with the C code, varying by clip. The unit test for the VP9 denoiser confirms that the SSE2 code is bit-exact with the C code, and it covers all block sizes. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
2014-09-06	Replacing vp9_get_mb_ss_sse2 asm implementation with intrinsics.	Dmitry Kovalev
	Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0