Age | Commit message | Author |
|
|
|
If a ref frame is masked out, we do not need to do motion search for it.
It makes speed 0 a little faster.
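A minimal sketch of this check, assuming a per-block bitmask where a set bit means that reference frame is masked out. The enum values and the functions `run_motion_search()` and `search_unmasked_refs()` are hypothetical, not libvpx code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference-frame indices (libvpx's real enum differs). */
enum { REF_LAST = 0, REF_GOLDEN = 1, REF_ALTREF = 2, NUM_REFS = 3 };

/* Stand-in for an expensive motion search; returns 1 to count the call. */
static int run_motion_search(int ref) {
  (void)ref;
  return 1;
}

/* Runs motion search only for reference frames whose bit is clear in
 * skip_mask and returns how many searches were performed. */
int search_unmasked_refs(uint32_t skip_mask) {
  int searches = 0;
  for (int ref = 0; ref < NUM_REFS; ++ref) {
    if (skip_mask & (1u << ref)) continue; /* masked out: skip the search */
    searches += run_motion_search(ref);
  }
  return searches;
}
```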
Change-Id: I68f71255b2798b24fd1d5b28ed24a2ef87251413
|
|
Previously we often skipped all compound inter prediction modes,
causing a large coding loss. This patch modifies how we set the
ref_frame_skip_mask so that compound modes are considered in RDO.
This affects speed >= 1.
Coding gains (overall PSNR):
          lowres  midres  hdres   average
speed 1   0.54%   0.43%   0.64%   0.53%
speed 2   0.59%   0.48%   0.60%   0.56%
Tested encoding speed on 10 HD sequences; the average speed loss is
5% for speed 1 and 2% for speed 2.
Change-Id: Ib8758af7ee7c9812022bd21c5fe61631e2bb8e5c
|
|
Do full pixel MV search around all 3 MV candidates.
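One way to picture a full-pixel search around several MV candidates: evaluate every full-pel position within a small window around each candidate and keep the cheapest. This is a sketch only; `FullMv`, `search_around_candidates()`, the window radius, and the toy cost function are assumptions, not the encoder's actual search:

```c
#include <assert.h>
#include <limits.h>

typedef struct { int row, col; } FullMv;

/* Cost of placing the block at full-pel position (row, col), e.g. SAD. */
typedef int (*MvCostFn)(int row, int col, void *ctx);

/* Evaluates every full-pel position within +/-range of each candidate
 * and returns the cheapest position found (first one wins on ties). */
FullMv search_around_candidates(const FullMv *cands, int num_cands,
                                int range, MvCostFn cost, void *ctx) {
  assert(num_cands > 0 && range >= 0);
  FullMv best = cands[0];
  int best_cost = INT_MAX;
  for (int i = 0; i < num_cands; ++i) {
    for (int dr = -range; dr <= range; ++dr) {
      for (int dc = -range; dc <= range; ++dc) {
        const int r = cands[i].row + dr, c = cands[i].col + dc;
        const int this_cost = cost(r, c, ctx);
        if (this_cost < best_cost) {
          best_cost = this_cost;
          best.row = r;
          best.col = c;
        }
      }
    }
  }
  return best;
}

/* Toy cost for demonstration: squared distance to a target position
 * passed through ctx. A real encoder would compute SAD or variance. */
int toy_bowl_cost(int row, int col, void *ctx) {
  const FullMv *t = (const FullMv *)ctx;
  const int dr = row - t->row, dc = col - t->col;
  return dr * dr + dc * dc;
}
```

Searching around all candidates costs more evaluations than searching around one, which matches the small speed loss reported below.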
Coding gains for speed 0:
          avg_psnr  ovr_psnr  ssim
lowres    -0.088%   -0.095%   -0.117%
midres    -0.175%   -0.177%   -0.148%
hdres     -0.115%   -0.146%   -0.146%
Coding gains for speed 1:
          avg_psnr  ovr_psnr  ssim
lowres    -0.089%   -0.104%   -0.124%
midres    -0.151%   -0.171%   -0.195%
hdres     -0.110%   -0.105%   -0.132%
Tested encoding speed at speed 1 with QP=30 and QP=40 over 10 midres
sequences; the average speed loss is about 1%.
Change-Id: I9e6de035f4ed2e814e6494aefc2f84aae333a6b4
|
|
1: Lower the rdmult used in trellis optimization.
2: Shut off the end-of-block optimization that tries an end of block
at every sub-position if any of the coefficients is > 1.
3: Change the rounding and zbin factor according to sharpness.
4: Disable the skip-block check that calculates RD using the SSE from
the predictor.
Change-Id: I247b61a26fa22f12f8b684e7cd6d4e368de7c3e4
|
|
Before this patch, pred_mv was used only when the
adaptive_motion_search speed feature was on (speed >= 1).
This patch enables pred_mv for speed 0 as well.
Coding gains:
          avg_psnr  ovr_psnr  ssim
lowres    -0.31%    -0.32%    -0.38%
midres    -0.37%    -0.41%    -0.42%
hdres     -0.30%    -0.31%    -0.29%
Tested encoding speed over 18 midres sequences with QP=40. The
overall speed loss is about 0.6%.
Change-Id: I8987e9efb5a70d2bf8779fc2a43838009f9bbd8a
|
|
Do some extra full pixel search to improve motion vector quality.
Currently it is enabled for speed 1 only and disabled for real-time mode.
Coding gain for speed 1:
          avg_psnr  ovr_psnr  ssim
lowres    -0.23%    -0.23%    -0.35%
midres    -0.33%    -0.35%    -0.38%
hdres     -0.28%    -0.29%    -0.28%
Tested encoding time over 10 HD sequences. The overall speed overhead is
1.5% for QP=30 and 0.6% for QP=40.
Change-Id: Ic2ea4d78c4979de9d5090c9d7c702944f155f8af
|
|
Change-Id: I1c62e51f5ccd33ff74abc3385410525bcae2fedd
|
|
Add a speed feature to prune reference frames for rectangular
partitions. Rectangular partition RD search happens after square
partition RD search. With this feature, we keep a record of the ref
frames picked by square partitions, and only consider those ref
frames during rect partition RD search.
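The bookkeeping can be sketched with a bitmask of reference frames picked by square partitions. `PartitionRefRecord` and both functions are hypothetical, and treating an empty record as "search every ref frame" is an assumption of this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-block record: bit i set means reference frame i was
 * picked by some square-partition RD search. */
typedef struct {
  uint32_t square_ref_mask;
} PartitionRefRecord;

/* Called after each square-partition RD search with the winning ref. */
void record_square_ref(PartitionRefRecord *rec, int ref_frame) {
  assert(ref_frame >= 0 && ref_frame < 32);
  rec->square_ref_mask |= 1u << ref_frame;
}

/* During rect-partition RD search, evaluate a ref frame only if a square
 * partition picked it. Assumption: with no record yet, allow all refs. */
int rect_should_search_ref(const PartitionRefRecord *rec, int ref_frame) {
  if (rec->square_ref_mask == 0) return 1;
  return (int)((rec->square_ref_mask >> ref_frame) & 1u);
}
```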
With this feature on, the computation cost of rect partition RD
search is greatly reduced, so we can afford to skip rect partition
RD search less aggressively.
Overall, both compression and encoding speed are improved. Only
speed 0 is affected.
Coding gains:
          lowres  midres  hdres
ovr psnr  0.00%   -0.36%  -0.37%
avg psnr  0.00%   -0.36%  -0.36%
Tested encoding speed with QP=40 on about 30 sequences.
Speed gains:
          lowres  midres  hdres
average   13.4%   7.1%    6.1%
max       28.0%   12.0%   9.8%
Change-Id: Id5f36dd2ac75028ae98550d67b0a524aa251b692
|
|
|
|
The compression performance change is +/-0.01% for both speed 0 and speed 1.
Locally tested the encoding speed on ped_1080p (150 frames):
speed 0: 79544 b/f 41.339 dB 503072 ms -> 79566 b/f 41.338 dB 493009 ms
speed 1: 79789 b/f 41.152 dB 104583 ms -> 79770 b/f 41.153 dB 102607 ms
Change-Id: Ief200b613608643e5708cebe979982eb4a84831b
|
|
This commit replaces a hard-coded macro with a macro defined by
a configure command.
Change-Id: Ib31354d61865314ed43e2c429c72b4ef2c8fa2a7
|
|
Change-Id: I094bca857f0fc2c067a4d08d1b36370fe61c25aa
|
|
When eob is 0, pixel-domain distortion is more accurate and efficient.
This mainly affects speed >= 2. Speed 0 always uses pixel-domain
distortion; speed 1 uses it most of the time.
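Why eob == 0 makes pixel-domain distortion cheap: with no nonzero coefficients the reconstruction is just the prediction, so the exact distortion is SSE(src, pred) with no inverse transform. A sketch; `block_sse()` and `pick_distortion()` are hypothetical names, and the transform-domain estimate is simply passed in by the caller:

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences between two 8-bit blocks. */
static int64_t block_sse(const uint8_t *a, int a_stride, const uint8_t *b,
                         int b_stride, int w, int h) {
  int64_t sse = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = a[r * a_stride + c] - b[r * b_stride + c];
      sse += d * d;
    }
  }
  return sse;
}

/* When eob == 0 every quantized coefficient is zero, so the reconstruction
 * equals the prediction and the exact distortion is SSE(src, pred).
 * Otherwise this sketch returns the caller's transform-domain estimate. */
int64_t pick_distortion(int eob, int64_t tx_domain_dist, const uint8_t *src,
                        int src_stride, const uint8_t *pred, int pred_stride,
                        int w, int h) {
  if (eob == 0) return block_sse(src, src_stride, pred, pred_stride, w, h);
  return tx_domain_dist;
}
```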
Compression impact (negative means gain):
          speed 2  speed 3  speed 4
lowres    -0.04%   -0.06%   -0.06%
midres    -0.10%   -0.10%   -0.20%
hdres     -0.01%   -0.03%   -0.06%
Encoding speed is about neutral.
Change-Id: I77b957658deeaad57381fd13afc11bacdec8c08f
|
|
To save a branch.
Change-Id: Ifa2be7583e95c6991784731c654bbd4cce31e993
|
|
|
|
If the block size is larger than 32x32, search one fewer transform-size
level than for other blocks.
This mainly affects speed 0 and 1, as speed >= 2 uses the largest transform
size (except for keyframes and alt-ref frames).
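A sketch of limiting the transform-size search, assuming sizes are searched downward from the largest allowed size for a given depth. `TxSize`, `tx_search_range()`, and the depth parameter are hypothetical; with a depth of 2, for example, a 32x32 maximum searches down to 8x8 normally but only to 16x16 for blocks larger than 32x32:

```c
#include <assert.h>

/* Hypothetical transform sizes, ordered from smallest to largest. */
typedef enum { TX_4X4 = 0, TX_8X8, TX_16X16, TX_32X32 } TxSize;

/* Computes the inclusive range [end_tx, start_tx] of transform sizes to
 * search, walking down from max_tx by at most `depth` levels. Blocks
 * larger than 32x32 search one level less, but always try max_tx. */
void tx_search_range(TxSize max_tx, int depth, int larger_than_32x32,
                     TxSize *start_tx, TxSize *end_tx) {
  int end = (int)max_tx - depth;
  if (end < 0) end = 0;
  if (larger_than_32x32 && end < (int)max_tx) end += 1;
  *start_tx = max_tx;
  *end_tx = (TxSize)end;
}
```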
Compression (negative means gain):
          speed 0   speed 1
lowres    -0.007%   0.00%
midres    0.023%    -0.011%
hdres     0.002%    -0.016%
Encoder speed:
Tested on crowd_run_1080p 30 frames
Fixed QP = 30, speed 0: 582.5s -> 564.6s
speed 1: 75.0s -> 73.3s
Change-Id: I46622efafe0e88d502efa1480a5324ead1d1e8d0
|
|
|
|
|
|
Instead of doing it in every transform search loop.
Change-Id: I12dc402a6633d1a27d32cb6b58710b8c0ebf0fd4
|
|
This function is redundant.
Change-Id: I7651fc34787c09e59cb1366495f6b525dec8510d
|
|
Set the max depth as 2 for speed 0.
Compression (negative means gain):
          speed 0  speed 1
lowres    -0.01%   0.00%
midres    0.05%    -0.01%
hdres     -0.01%   0.01%
Encoding speed gain:
Tested on crowd_run_1080p 30 frames
Fixed QP = 20, speed 0: 669.7s -> 656.1s
speed 1: 104.5s -> 101.5s
Fixed QP = 40, speed 0: 440.7s -> 435.8s
speed 1: 47.7s -> 45.1s
Change-Id: I61bc13818c72317b9f1d596727d54a906b20c012
|
|
To reduce the memcpy() cycles in vp9_rd_pick_inter_mode_sb().
The maximum value of mode_map is (MAX_MODES - 1) = 29.
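Because every mode_map entry is at most 29, a byte per entry is enough, which is presumably how the memcpy() cost shrinks. A sketch with hypothetical type names (`ModeMapU8`, `ModeMapInt`, `copy_mode_map()`); `_Static_assert` requires C11:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From the commit: the maximum mode_map value is MAX_MODES - 1 = 29. */
#define MAX_MODES 30

/* Every index fits in a byte, so uint8_t storage is sufficient. */
_Static_assert(MAX_MODES - 1 <= UINT8_MAX, "mode index must fit in uint8_t");

/* Hypothetical mode map types: byte-wide storage copies MAX_MODES bytes
 * per memcpy() instead of MAX_MODES * sizeof(int) bytes. */
typedef struct { uint8_t map[MAX_MODES]; } ModeMapU8;
typedef struct { int map[MAX_MODES]; } ModeMapInt;

void copy_mode_map(ModeMapU8 *dst, const ModeMapU8 *src) {
  memcpy(dst->map, src->map, sizeof(src->map));
}
```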
Change-Id: I5704bd66838ea0b075f0afb001f5cbebfd3f1602
|
|
Remove trailing commas to keep multiple elements on one line.
Add blank lines to prevent comments from being treated as blocks.
Add clang-format guards around a struct with a comment in the middle.
Change-Id: I3bcb8313ae8aaf69179249a13b4087b1272cdbc0
|
|
For the new VP9-only content type, adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.
In regard to the RD loop, the method favors modes where the
reconstruction variance is similar to the source variance. However, it
is currently only applied to regions where the source variance is quite
low.
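A hypothetical illustration of this kind of RD bias: leave distortion alone unless the source block is flat, and otherwise penalize reconstructions whose variance differs from the source's. The threshold, the linear penalty, and all names here are assumptions, not the formula used by --tune-content=film:

```c
#include <assert.h>
#include <stdint.h>

/* Variance of n 8-bit samples, rounded down:
 * (sum(x^2) - sum(x)^2 / n) / n. */
static uint32_t sample_variance(const uint8_t *p, int n) {
  int64_t sum = 0, sum_sq = 0;
  for (int i = 0; i < n; ++i) {
    sum += p[i];
    sum_sq += (int64_t)p[i] * p[i];
  }
  return (uint32_t)((sum_sq - sum * sum / n) / n);
}

/* Hypothetical bias: only when the source is flat (variance below
 * low_var_thresh), inflate the distortion in proportion to how far the
 * reconstruction variance strays from the source variance. */
int64_t variance_biased_dist(int64_t dist, const uint8_t *src,
                             const uint8_t *recon, int n,
                             uint32_t low_var_thresh) {
  assert(n > 0);
  const uint32_t sv = sample_variance(src, n);
  if (sv >= low_var_thresh) return dist;
  const uint32_t rv = sample_variance(recon, n);
  const uint32_t diff = sv > rv ? sv - rv : rv - sv;
  return dist + dist * (int64_t)diff / ((int64_t)sv + 1);
}
```

A perfectly flat reconstruction of a slightly noisy flat source gets penalized, which is the effect the text describes: low-amplitude texture is favored over a noise-free result.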
For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the latter in particular limit
the usefulness of the loop filter).
The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.
This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but causes a significant drop in
PSNR on many clips. This is to be expected, because similar but
misaligned noise or texture gives a lower PSNR than a flat, noise-free
reconstruction. However, most clips show a strong gain in FAST SSIM.
The features are enabled on the vpxenc command line by setting
--tune-content=film.
VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
|
|
The intra mode RD penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size, but
this was not done correctly for the SB level or for sub-8x8 blocks.
The code also applied an odd double scaling with respect to bit depth;
this has been removed. Since it is a rate penalty, the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%.
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
|
|
so that the convolve functions are independent of table alignment.
Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
|
|
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
|
|
+ vpx_dsp/, test/
itxfm -> inv_txfm, ftxfm -> fwd_txfm
Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e
|
|
txfm is more commonly used as an abbreviation throughout the codebase.
Change-Id: I86fd90ef132468f9da270091c05daa1f5a49ece2
|
|
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
|
|
BUG=webm:1388
Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
|
|
|
|
|
|
BUG=webm:1388
Change-Id: I7ee32e0c08f0fb41712a8cc640b2c5bba872421d
|
|
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
|
|
This patch follows the allow_exhaustive_searches feature modification and
continues modifying the encoder to achieve determinism in row-based
multi-threaded encoding. With row-mt = 1 and multiple threads, the
adaptive feature in the encoder is disabled. This gave a BD-rate gain
(-0.6% to -0.7% at speed 1; -0.46% to -0.59% at speed 2) but some
encoder speed loss (7% to 10% at speed 1 and 3% to 6% at speed 2).
These speed losses are acceptable considering the speed gains obtained
from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
|
|
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to a
short ptr first, adding the offset, then casting back to a byte ptr.
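The offset rule can be checked with a small sketch. The macro bodies below are assumed to be plain pointer casts through `uintptr_t`; `advance_hbd()` is a hypothetical helper. Adding the offset to the byte pointer directly (`p + offset`) would move only `offset` bytes and land in the middle of a 16-bit sample:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions for illustration: plain pointer casts through
 * uintptr_t, with no address scaling. */
#define CAST_TO_SHORTPTR(x) ((uint16_t *)((uintptr_t)(x)))
#define CAST_TO_BYTEPTR(x) ((uint8_t *)((uintptr_t)(x)))

/* Advances a byte pointer that actually addresses uint16_t samples by
 * `offset` samples. Adding through the short ptr makes the offset count
 * 2-byte samples, so the byte address moves by 2 * offset. */
static uint8_t *advance_hbd(uint8_t *p, int offset) {
  return CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(p) + offset);
}
```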
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
|
|
To guard against the motion vector out-of-range bug, added a motion vector
unit test to VP9. In 4K video encoding, the test always forces extreme
motion vectors and encourages INTER modes. During decoding, it checks
whether each motion vector is valid and also checks for encoder/decoder
mismatch. Tests showed that this unit test can reveal the issue we saw before.
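A sketch of the kind of validity check the decoder side of such a test relies on. The bound `MV_LIMIT`, the `Mv` struct, and `mv_is_valid()` are assumptions for illustration; the real limits are defined by the VP9 codebase:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit for illustration: each component must lie strictly
 * inside (-MV_LIMIT, MV_LIMIT), in 1/8-pel units. */
#define MV_LIMIT (1 << 14)

typedef struct { int16_t row, col; } Mv;

/* Returns 1 if both components are within the assumed legal range. */
int mv_is_valid(Mv mv) {
  return mv.row > -MV_LIMIT && mv.row < MV_LIMIT &&
         mv.col > -MV_LIMIT && mv.col < MV_LIMIT;
}
```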
Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4
|
|
the current implementations expect pixel size, not the block type
BUG=webm:1392
Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
|
|
|
|
The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
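The accumulate-then-copy flow can be sketched as follows; `TxSearchStats`, `PickContext`, and both functions are hypothetical names, and `NUM_TX_SIZES` is an assumed constant:

```c
#include <assert.h>

#define NUM_TX_SIZES 4 /* hypothetical count of candidate transform sizes */

/* Per-tx-size eob totals gathered while the transform search runs. */
typedef struct { int sum_eobs[NUM_TX_SIZES]; } TxSearchStats;

/* Hypothetical block context that keeps the total for the chosen size. */
typedef struct { int sum_eobs; } PickContext;

/* Called once per transform block evaluated at tx_size. */
void tx_stats_add_eob(TxSearchStats *s, int tx_size, int eob) {
  assert(tx_size >= 0 && tx_size < NUM_TX_SIZES);
  s->sum_eobs[tx_size] += eob;
}

/* After the search, copy only the total belonging to the winning size. */
void tx_stats_commit(const TxSearchStats *s, int best_tx, PickContext *ctx) {
  assert(best_tx >= 0 && best_tx < NUM_TX_SIZES);
  ctx->sum_eobs = s->sum_eobs[best_tx];
}
```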
After the sum of eobs is calculated correctly, re-enabled the
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
|
|
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured an 18% to 25% reduction in retired instructions, and a 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
|
|
|
|
Saves 2688 bytes of rodata.
Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
|
|
new_mt is a very generic name that will become obsolete soon enough.
Since this is exposed as a codec control, rename it to row_mt to
signify row-level parallelism. Also rename the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
|
|
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one, but was missing some minor improvements that only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
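For reference, a scalar sketch of what a block-error kernel computes: the squared error between original and dequantized coefficients, plus the coefficient energy. `block_error_ref()` is a hypothetical name and the `int32_t` coefficient type is an assumption; the overflow detection and re-run described above apply to the SIMD versions, not to this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Returns sum((coeff - dqcoeff)^2) and stores sum(coeff^2) in *ssz.
 * 64-bit products and accumulators give ample headroom for typical
 * coefficient ranges, so this sketch does no overflow detection. */
int64_t block_error_ref(const int32_t *coeff, const int32_t *dqcoeff,
                        intptr_t n, int64_t *ssz) {
  int64_t error = 0, sqcoeff = 0;
  for (intptr_t i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
    sqcoeff += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```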
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
|
|
Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803
|
|
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedups of the encoding pass (the second pass in 2-pass
encoding) are 7% with 2 threads, 16% with 4 threads,
85% with 8 threads, and 116% with 16 threads.
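A sketch of the wavefront dependency that row-based multi-threading within a tile typically uses. The rule, `can_encode_sb()`, and the `lag` parameter are assumptions for illustration, not taken from this patch:

```c
#include <assert.h>

/* Hypothetical wavefront rule: a thread may encode superblock (row, col)
 * once the row above has finished at least col + lag + 1 superblocks
 * (capped at the row width). With lag = 1 the above-right neighbor must
 * be done. cols_done[r] counts finished superblocks in row r. */
int can_encode_sb(const int *cols_done, int row, int col, int num_cols,
                  int lag) {
  assert(col >= 0 && col < num_cols && lag >= 0);
  if (row == 0) return 1; /* the top row has no dependency */
  int need = col + lag + 1;
  if (need > num_cols) need = num_cols;
  return cols_done[row - 1] >= need;
}
```

Each row thread polls this condition before each superblock, so several rows of one tile can progress concurrently instead of one tile per thread.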
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
|
|
|