libvpx.git

Age	Commit message	Author
2018-08-29	Merge "Skip unnecessary motion search"	Hui Su

2018-08-28	Skip unnecessary motion search	Hui Su
	If a ref frame is masked out, we do not need to do motion search for it. It makes speed 0 a little faster. Change-Id: I68f71255b2798b24fd1d5b28ed24a2ef87251413
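The check described above can be sketched as a bitmask test in front of the search loop. This is a minimal illustration, not the libvpx code: the reference-frame indices, the mask layout (bit r set means frame r is masked out), and both function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference-frame indices, for illustration only. */
enum { REF_LAST = 1, REF_GOLDEN = 2, REF_ALTREF = 3, REF_COUNT = 4 };

/* Assumed layout: bit `ref` of skip_mask is set when that frame is masked out. */
static int is_ref_masked(uint32_t skip_mask, int ref) {
  assert(ref >= 0 && ref < 32);
  return (skip_mask & (1u << ref)) != 0;
}

/* Returns how many motion searches would actually run for a given mask. */
static int count_motion_searches(uint32_t skip_mask) {
  int searches = 0;
  for (int ref = REF_LAST; ref < REF_COUNT; ++ref) {
    if (is_ref_masked(skip_mask, ref)) continue; /* masked out: skip the search */
    ++searches; /* stand-in for the expensive motion search */
  }
  return searches;
}
```

With no frame masked, all three references are searched; masking one frame removes exactly one search.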
2018-08-22	Rework the ref_frame_skip_mask feature in RDO	Hui Su
	Previously we often skipped all compound inter prediction modes, causing large coding loss. This patch modifies how we set the ref_frame_skip_mask so that compound modes are considered in RDO. This affects speed>=1.
	Coding gains (overall psnr):
	         lowres  midres  hdres   average
	speed 1  0.54%   0.43%   0.64%   0.53%
	speed 2  0.59%   0.48%   0.60%   0.56%
	Tested encoding speed on 10 HD sequences: average speed loss is 5% for speed 1 and 2% for speed 2.
	Change-Id: Ib8758af7ee7c9812022bd21c5fe61631e2bb8e5c
2018-08-16	Improve enhanced_full_pixel_motion_search	Hui Su
	Do full pixel MV search around all 3 MV candidates.
	Coding gains for speed 0:
	        avg_psnr  ovr_psnr  ssim
	lowres  -0.088%   -0.095%   -0.117%
	midres  -0.175%   -0.177%   -0.148%
	hdres   -0.115%   -0.146%   -0.146%
	Coding gains for speed 1:
	        avg_psnr  ovr_psnr  ssim
	lowres  -0.089%   -0.104%   -0.124%
	midres  -0.151%   -0.171%   -0.195%
	hdres   -0.110%   -0.105%   -0.132%
	Tested encoding speed with speed 1 QP=30,40 over 10 midres sequences, average speed loss is about 1%.
	Change-Id: I9e6de035f4ed2e814e6494aefc2f84aae333a6b4
2018-08-14	Make Sharpness parameter affect visual sharpness	Jim Bankoski
	1: Lower rdmult used in trellis optimization.
	2: Shut off the end of block optimization that tries end of block at every sub position if any of the coefficients are > 1.
	3: Change the rounding and zbin factor according to sharpness.
	4: Disable the skip block check that calculates RD using SSE from predictor.
	Change-Id: I247b61a26fa22f12f8b684e7cd6d4e368de7c3e4
2018-08-09	Use the pred_mv feature for speed 0	Hui Su
	Before this patch, pred_mv is used only when the adaptive_motion_search speed feature is on (speed>=1). This patch enables pred_mv for speed 0 as well.
	Coding gains:
	        avg_psnr  ovr_psnr  ssim
	lowres  -0.31%    -0.32%    -0.38%
	midres  -0.37%    -0.41%    -0.42%
	hdres   -0.30%    -0.31%    -0.29%
	Tested encoding speed over 18 midres sequences with QP=40. The overall speed loss is about 0.6%.
	Change-Id: I8987e9efb5a70d2bf8779fc2a43838009f9bbd8a
2018-08-07	Add enhanced_full_pixel_motion_search feature	Hui Su
	Do some extra full pixel search to improve motion vector quality. Currently it is enabled for speed 1 only; disabled for real time mode.
	Coding gain for speed 1:
	        avg_psnr  ovr_psnr  ssim
	lowres  -0.23%    -0.23%    -0.35%
	midres  -0.33%    -0.35%    -0.38%
	hdres   -0.28%    -0.29%    -0.28%
	Tested encoding time over 10 HD sequences. Overall speed overhead is 1.5% for QP=30; 0.6% for QP=40.
	Change-Id: Ic2ea4d78c4979de9d5090c9d7c702944f155f8af
2018-07-24	Fix typos in txfm_rd_in_plane()	Hui Su
	Change-Id: I1c62e51f5ccd33ff74abc3385410525bcae2fedd
2018-07-23	Add prune_ref_frame_for_rect_partitions feature	Hui Su
	Add a speed feature to prune reference frames for rectangular partitions. Rectangular partition RD search happens after square partition RD search. With this feature, we keep a record of the ref frames picked by square partitions, and only consider those ref frames during rect partition RD search. With this feature on, the computation cost of rect partition RD search is greatly reduced, so we can afford to skip rect partition RD search less aggressively. Overall, both compression and encoding speed are improved. Only speed 0 is affected.
	Coding gains:
	          lowres  midres  hdres
	ovr psnr  0.00%   -0.36%  -0.37%
	avg psnr  0.00%   -0.36%  -0.36%
	Tested encoding speed with QP=40 on about 30 sequences.
	Speed gains:
	         lowres  midres  hdres
	average  13.4%   7.1%    6.1%
	max      28.0%   12.0%   9.8%
	Change-Id: Id5f36dd2ac75028ae98550d67b0a524aa251b692
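The record-then-filter idea in this commit might look roughly like the sketch below. It assumes the record is a small bitmask of reference frames; the struct, the frame indices, and the function names are hypothetical and are not taken from the libvpx sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference-frame indices, for illustration only. */
enum { PR_LAST = 1, PR_GOLDEN = 2, PR_ALTREF = 3, PR_REF_COUNT = 4 };

/* Hypothetical per-block record of refs picked by square partitions. */
typedef struct {
  uint8_t square_ref_mask;
} PruneCtx;

/* Called whenever a square partition settles on a reference frame. */
static void record_square_ref(PruneCtx *ctx, int ref) {
  assert(ref >= PR_LAST && ref < PR_REF_COUNT);
  ctx->square_ref_mask |= (uint8_t)(1u << ref);
}

/* Fills `out` with the refs the rectangular search should still consider
 * and returns how many there are. */
static int rect_refs_to_search(const PruneCtx *ctx, int *out) {
  int n = 0;
  for (int ref = PR_LAST; ref < PR_REF_COUNT; ++ref) {
    if (ctx->square_ref_mask & (1u << ref)) out[n++] = ref;
  }
  return n;
}
```

If the square search only ever picked two of the three references, the rectangular search in this sketch loops over those two and never touches the third.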
2018-05-16	Merge "Use the updated best rd cost for transform block search"	Jingning Han

2018-05-15	Use the updated best rd cost for transform block search	Jingning Han
	The compression performance change is +/-0.01% for both speed 0/1. Locally tested the encoding speed:
	ped_1080p 150 frames
	speed 0  79544 b/f  41.339 dB  503072 ms -> 79566 b/f  41.338 dB  493009 ms
	speed 1  79789 b/f  41.152 dB  104583 ms -> 79770 b/f  41.153 dB  102607 ms
	Change-Id: Ief200b613608643e5708cebe979982eb4a84831b
2018-05-14	Make a config time flag	Yaowu Xu
	This commit replaces a hard-coded macro with a macro defined by a configure command. Change-Id: Ib31354d61865314ed43e2c429c72b4ef2c8fa2a7
2018-05-14	Fixes for consistent encoding across recodes of a frame	Ranjit Kumar Tulabandu
	Change-Id: I094bca857f0fc2c067a4d08d1b36370fe61c25aa
2018-05-04	Don't use transform domain distortion when eob is 0	Hui Su
	When eob is 0, pixel domain distortion is more accurate and efficient. This mainly affects speed >= 2. Speed 0 always uses pixel domain distortion; speed 1 uses it most of the time.
	Compression impact (negative means gain):
	        speed 2  speed 3  speed 4
	lowres  -0.04%   -0.06%   -0.06%
	midres  -0.10%   -0.10%   -0.20%
	hdres   -0.01%   -0.03%   -0.06%
	Encoding speed is about neutral.
	Change-Id: I77b957658deeaad57381fd13afc11bacdec8c08f
2018-05-01	Clean switch cases in vp9 encoder	Linfeng Zhang
	To save a branch. Change-Id: Ifa2be7583e95c6991784731c654bbd4cce31e993
2018-04-26	Merge "Do one level less of transform search for large blocks"	Hui Su

2018-04-25	Do one level less of transform search for large blocks	Hui Su
	If block size is larger than 32x32, search transform size for one level less than the other blocks. This mainly affects speed 0 and 1, as speed >= 2 uses largest transform size (except for keyframes and alt-ref frames).
	Compression (negative means gain):
	        speed 0  speed 1
	lowres  -0.007%  0.00%
	midres  0.023%   -0.011%
	hdres   0.002%   -0.016%
	Encoder speed: tested on crowd_run_1080p, 30 frames, fixed QP = 30
	speed 0: 582.5s -> 564.6s
	speed 1: 75.0s -> 73.3s
	Change-Id: I46622efafe0e88d502efa1480a5324ead1d1e8d0
2018-04-25	Merge "Calculate transform size cost once per frame"	Hui Su

2018-04-25	Merge "Add speed feture to control tx size search depth"	Hui Su

2018-04-24	Calculate transform size cost once per frame	Hui Su
	Instead of doing it in every transform search loop. Change-Id: I12dc402a6633d1a27d32cb6b58710b8c0ebf0fd4
2018-04-23	Remove get_tx_probs2()	Hui Su
	This function is redundant. Change-Id: I7651fc34787c09e59cb1366495f6b525dec8510d
2018-04-23	Add speed feture to control tx size search depth	Hui Su
	Set the max depth as 2 for speed 0.
	Compression (negative means gain):
	        speed 0  speed 1
	lowres  -0.01%   0.00%
	midres  0.05%    -0.01%
	hdres   -0.01%   0.01%
	Encoding speed gain: tested on crowd_run_1080p, 30 frames
	Fixed QP = 20:
	speed 0: 669.7s -> 656.1s
	speed 1: 104.5s -> 101.5s
	Fixed QP = 40:
	speed 0: 440.7s -> 435.8s
	speed 1: 47.7s -> 45.1s
	Change-Id: I61bc13818c72317b9f1d596727d54a906b20c012
2018-03-28	Shrink size of mode_map in struct TileDataEnc	Linfeng Zhang
	To reduce the memcpy() cycles in vp9_rd_pick_inter_mode_sb(). The maximum value of mode_map is (MAX_MODES - 1) = 29. Change-Id: I5704bd66838ea0b075f0afb001f5cbebfd3f1602
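Why a smaller element type cuts memcpy() work can be shown with a short sketch. The commit only states that the largest stored value is MAX_MODES - 1 = 29; the element types below (a 32-bit "before" and an 8-bit "after"), the struct names, and the helpers are assumptions for illustration and do not reproduce struct TileDataEnc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_MODES 30 /* per the commit: stored values go up to MAX_MODES - 1 */

/* Hypothetical "before" and "after" layouts. */
typedef struct { int32_t mode_map[MAX_MODES]; } WideMap;
typedef struct { uint8_t mode_map[MAX_MODES]; } NarrowMap;

/* Fill the map with the identity ordering 0..MAX_MODES-1. */
static void init_identity(NarrowMap *m) {
  for (int i = 0; i < MAX_MODES; ++i) {
    assert(i <= UINT8_MAX); /* every mode index fits in one byte */
    m->mode_map[i] = (uint8_t)i;
  }
}

/* The copy cost scales with sizeof(mode_map), so the narrow map is cheaper. */
static void copy_map(NarrowMap *dst, const NarrowMap *src) {
  memcpy(dst->mode_map, src->mode_map, sizeof(dst->mode_map));
}
```

Under these assumed types the copied region shrinks from 120 bytes to 30 bytes per map.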
2018-01-18	clang-format v5.0.0 vp9/	Johann
	Remove trailing commas to keep multiple elements on one line. Add blank lines to prevent comments from being treated as blocks. clang-format guards for struct with a comment in the middle. Change-Id: I3bcb8313ae8aaf69179249a13b4087b1272cdbc0
2017-11-13	New content type to improve grain retention.	paulwilkins
	For the new VP9-only content type, adjust the rate distortion and ARF filter based on the relative spatial variance of the source and reconstruction.
	With regard to the RD loop, the method favors modes where the reconstruction variance is similar to the source variance. However, it is currently only applied to regions where the source variance is quite low. For very low variance blocks it applies a further bias against intra coding and large prediction block sizes (the latter in particular limit the usefulness of the loop filter).
	The final part of this change is to lower the strength of the ARF filter for blocks where the source has very low spatial variance, to encourage some low amplitude texture or noise to pass through the filter.
	This change improves the retention of film grain and fine noise / texture in spatially flat regions, but as expected causes a significant drop in PSNR on many clips, because similar but misaligned noise or texture will give a lower PSNR than a flat noise free reconstruction. However, it is worth noting that most clips show a strong gain in FAST SSIM.
	The features are enabled on the vpxenc command line by setting --tune-content=film.
	VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
	Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
2017-09-08	Fix bug in intra mode rd penalty.	paulwilkins
	The intra mode rd penalty was implemented as a rate penalty. Code was added to scale the penalty according to block size but this was not done correctly for the SB level or sub 8x8. The code did a weird double scaling in regard to bit depth that has been removed. Given that it is a rate penalty the bit depth should not matter. This bug fix improves average metrics on our standard test sets by about 0.1% Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
2017-09-05	Remove get_filter_base() and get_filter_offset() in convolve	Linfeng Zhang
	so that the convolve functions are independent of table alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-08-21	Remove skip_block from quantize	Johann
	This condition is handled before this code is reached. The ssse3 version of the function has always crashed when attempting to handle the skip_block condition. Add assert() and comments regarding the usage of skip_block. Removing the parameter is a fairly involved process so leave it be for the moment. Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
2017-07-06	cosmetics,vp9/: normalize inv/fwd_txfm naming	James Zern
	+ vpx_dsp/, test/ itxfm -> inv_txfm, ftxfm -> fwd_txfm Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e
2017-06-29	cosmetics,vp9/encoder: s/txm/txfm/	James Zern
	txfm is more commonly used as an abbreviation through the codebase Change-Id: I86fd90ef132468f9da270091c05daa1f5a49ece2
2017-05-03	Update highbd idct functions arguments to use uint16_t dst	Linfeng Zhang
	BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03	Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct	Linfeng Zhang
	BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-01	Merge "Clean vp9_highbd_build_inter_predictor() and highbd_inter_predictor()"	Linfeng Zhang

2017-04-26	Merge "Make the row based multi-threaded encoder deterministic"	Yunqing Wang

2017-04-25	Clean vp9_highbd_build_inter_predictor() and highbd_inter_predictor()	Linfeng Zhang
	BUG=webm:1388 Change-Id: I7ee32e0c08f0fb41712a8cc640b2c5bba872421d
2017-04-25	Update highbd convolve functions arguments to use uint16_t src/dst	Linfeng Zhang
	BUG=webm:1388 Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-24	Make the row based multi-threaded encoder deterministic	Yunqing Wang
	This patch followed allow_exhaustive_searches feature modification and continued to modify the encoder to achieve the determinism in the row based multi-threaded encoding. While row-mt = 1 and using multiple threads, the adaptive feature in encoder was disabled, which gave BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%), but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at speed 2). These speed losses were acceptable considering the speed gains obtained from row-mt. Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
2017-04-19	Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve	Linfeng Zhang
	Replace by CAST_TO_BYTEPTR/SHORTPTR. The rule is: if a short ptr is cast to a byte ptr, any offset operation on the byte ptr must be doubled. We do this by casting to short ptr first, adding offset, then casting back to byte ptr. BUG=webm:1388 Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
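The offset rule can be illustrated with plain reinterpreting casts. The macro bodies below are an assumption written for this example (the commit message does not show the real definitions), and advance_samples() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definitions: plain pointer reinterpretation, named after the
 * macros mentioned in the commit message. */
#define CAST_TO_SHORTPTR(x) ((uint16_t *)((void *)(x)))
#define CAST_TO_BYTEPTR(x) ((uint8_t *)((void *)(x)))

/* Advance a byte pointer that really addresses 16-bit samples by n samples.
 * The pointer must originate from a uint16_t buffer so that it is aligned. */
static uint8_t *advance_samples(uint8_t *p, ptrdiff_t n) {
  assert(p != NULL);
  uint16_t *s = CAST_TO_SHORTPTR(p); /* 1. cast to short ptr */
  s += n;                            /* 2. add the offset in samples */
  return CAST_TO_BYTEPTR(s);         /* 3. cast back to byte ptr */
}
```

Moving by n samples this way is the same as moving the byte pointer by 2 * n bytes, which is the doubling the commit message refers to.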
2017-04-06	VP9 motion vector unit test	Yunqing Wang
	To prevent the motion vector out of range bug, added a motion vector unit test in VP9. In the 4k video encoding, always forced to use extreme motion vectors and also encouraged to use INTER modes. In the decoding, checked if the motion vector was valid, and also checked the encoder/decoder mismatch. The tests showed that this unit test could reveal the issue we saw before. Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4
2017-03-22	vp9_rdopt: correct size to vpx_sum_squares_2d_i16	James Zern
	the current implementations expect pixel size, not the block type BUG=webm:1392 Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-20	Merge "Record the sum of tx block eobs in the partition block"	Yunqing Wang

2017-03-20	Record the sum of tx block eobs in the partition block	Yunqing Wang
	The sum of tx block eobs is needed in the machine learning based partition early termination. The eobs are first accumulated during tx search, and then the value associated with the best tx_size is copied to ctx for later use. After the sum of eobs is calculated correctly, re-enabled ml_partition_search_early_termination speed feature. Re-did the quality/speed test to check the impact of the fix.
	1. Borg test BDRATE result:
	4k set: PSNR: +0.183%; SSIM: +0.100%;
	hdres set: PSNR: +0.168%; SSIM: +0.256%;
	midres set: PSNR: +0.186%; SSIM: +0.326%;
	2. Average speed gain result:
	4k clips: 21%; hd clips: 26%; midres clips: 15%.
	The result is in line with the original result.
	Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
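The accumulate-then-copy flow described in this commit could be sketched as follows. The context struct, the arrays of per-candidate costs and sums, and both function names are hypothetical; the sketch only shows the bookkeeping, not the actual transform search.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition context that keeps the value for later use. */
typedef struct {
  int sum_eobs;
} PartitionCtx;

/* Accumulate the eobs of all transform blocks for one candidate tx size. */
static int sum_block_eobs(const int *eobs, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += eobs[i];
  return sum;
}

/* Pick the candidate with the lowest RD cost and copy its eob sum to ctx.
 * rd_cost[i] and eob_sum[i] describe candidate tx size i. */
static int pick_best_tx(const int64_t *rd_cost, const int *eob_sum, int n,
                        PartitionCtx *ctx) {
  assert(n > 0);
  int best = 0;
  for (int i = 1; i < n; ++i) {
    if (rd_cost[i] < rd_cost[best]) best = i;
  }
  ctx->sum_eobs = eob_sum[best]; /* only the winner's sum is kept */
  return best;
}
```

The point of the sketch is that the stored sum must belong to the winning tx size, not to whichever candidate happened to be evaluated last.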
2017-03-16	Add a vector form of routine vp9_model_rd_from_var_lapndz	Gabriel Marin
	Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb to model the rate and distortion for MAX_MB_PLANE Laplacian sources in parallel. The caller ensures that all sources have non-zero variance. Measured an 18% to 25% reduction in retired instructions, and a 17% to 24% reduction in instruction execution cost with different compilers for the Laplacian modeling. No change in behavior. TEST=Verified that encoded files match bit for bit, with and without this change. BUG=b/33678225 Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
2017-03-03	Merge "Narrow cat6_high_cost tables to uint16_t"	Alex Converse

2017-03-03	Narrow cat6_high_cost tables to uint16_t	Alex Converse
	Saves 2688 bytes of rodata. Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
2017-02-27	vp9: Rename new_mt to row_mt	Vignesh Venkatasubramanian
	new_mt is a very generic name that will become obsolete soon enough. Since this is exposed as a codec control, renaming it to row_mt to signify row level parallelism. Also renaming the ETHREAD_BIT_MATCH codec control to ROW_MT_BIT_EXACT. Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-24	consolidate block_error functions	Johann
	vp9_highbd_block_error_8bit_c was a very simple wrapper around vp9_block_error_c. The SSE2 implementation was practically identical to the non-HBD one. It was missing some minor improvements which only went into the original version. In quick speed tests, the AVX implementation showed minimal improvement over SSE2 when it does not detect overflow. However, when overflow is detected the function is run a second time. The OperationCheck test seems to trigger this case and reverses any speed benefits by running ~60% slower. AVX2 on the other hand is always 30-40% faster. Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-16	Structured the mode ordering code to avoid redundant memcpy	Ranjit Kumar Tulabandu
	Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803
2017-02-15	Row based multi-threading of encoding stage	Ranjit Kumar Tulabandu
	(Yunqing Wang) This patch implements the row-based multi-threading within tiles in the encoding pass, and substantially speeds up the multi-threaded encoder in VP9. Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the average speedup of the encoding pass (second pass in the 2-pass encoding) is 7% while using 2 threads, 16% while using 4 threads, 85% while using 8 threads, and 116% while using 16 threads. Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
2017-02-01	Merge "Changes to facilitate row based multi-threading of ARNR filtering"	Yunqing Wang