Age | Commit message (Collapse) | Author |
|
1: Lower rdmult used in trellis optimization
2: Shut off the end of block optimization that tries end of block
at every sub position if any of the coefficients are > 1.
3: Change the rounding and zbin factor according to sharpness.
4: Disable the skip block check that calculates RD using SSE from
predictor.
Change-Id: I247b61a26fa22f12f8b684e7cd6d4e368de7c3e4
|
|
|
|
Reset segment to base (segment#0) on spatially flat
stationary blocks (source_variance = 0). Also increase
dc_skip threshold for these blocks.
Reduces artifacts on flat areas in screen content mode.
Change-Id: I7ee0c80d37536db7896fa74a83f75799f1dcf73d
|
|
Adapt the Lagrangian multiplier based on the spatial variance in
the temporal dependency model. The functionality is disabled by
default. To turn on, set enable_tpl_model to 1.
Change-Id: I1b50606d9e2c8eb9c790c49eacc12c00d3d7c211
|
|
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.
Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.
Only affects when sf->use_altref_onepass is enabled
(currently off by default).
Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
|
|
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
|
|
When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.
For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.
Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b
|
|
+ vpx_dsp/, test/
itxfm -> inv_txfm, ftxfm -> fwd_txfm
Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e
|
|
txfm is more commonly used as an abbreviation through the codebase
Change-Id: I86fd90ef132468f9da270091c05daa1f5a49ece2
|
|
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
|
|
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
|
|
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't make
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
|
|
Refactor to split the 1 pass source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
|
|
|
|
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turn off GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPSNR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
|
|
The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs is calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
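
The accumulate-then-copy flow described in the commit above can be sketched as follows. This is a minimal illustration, not the libvpx code: the types `sketch_tx_candidate` and `sketch_pick_ctx`, the field `sum_y_eobs`, and the function `sketch_tx_search` are hypothetical names, and picking the best transform size by lowest `rd_cost` is an assumption about how "best" is decided.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TX_SIZES 4 /* hypothetical number of transform sizes */

/* Hypothetical context that outlives the tx search. */
typedef struct {
  int sum_y_eobs;
} sketch_pick_ctx;

/* Hypothetical result of trying one transform size on a block. */
typedef struct {
  const uint16_t *eobs; /* end-of-block position of each transform block */
  size_t num_blocks;    /* number of transform blocks at this size */
  int64_t rd_cost;      /* RD cost of coding the block at this size */
} sketch_tx_candidate;

/* Accumulate the eob sum for every transform size while searching, then
 * copy only the sum that belongs to the winning size into ctx. */
static int sketch_tx_search(const sketch_tx_candidate cands[SKETCH_TX_SIZES],
                            sketch_pick_ctx *ctx) {
  int sum_eobs[SKETCH_TX_SIZES] = { 0 };
  int best = 0;
  for (int t = 0; t < SKETCH_TX_SIZES; ++t) {
    for (size_t i = 0; i < cands[t].num_blocks; ++i) {
      sum_eobs[t] += cands[t].eobs[i];
    }
    if (cands[t].rd_cost < cands[best].rd_cost) best = t;
  }
  ctx->sum_y_eobs = sum_eobs[best];
  return best;
}
```

Keeping one running sum per transform size means the value copied to the context always matches the size that was finally chosen, rather than whichever size happened to be evaluated last.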
|
|
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative value, resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
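
A compiler-gated member reordering of the kind described above might look like the sketch below. The struct `sketch_macroblock`, its other members, and the array length are hypothetical stand-ins, not the real macroblock struct. The one fact relied on is that `_MSC_VER` is 1900 for Visual Studio 2015, so `_MSC_VER < 1900` selects the older compilers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MB_PLANE 3 /* hypothetical plane count */

struct sketch_macroblock {
#if defined(_MSC_VER) && _MSC_VER < 1900
  /* Visual Studio before 2015: keep bsse[] at the front of the struct so
   * the compiler never needs a large offset to reach it. */
  int64_t bsse[SKETCH_MAX_MB_PLANE << 2];
#endif
  int skip;   /* hypothetical neighbouring members */
  int rdmult;
#if !defined(_MSC_VER) || _MSC_VER >= 1900
  /* Every other toolchain keeps the original position, so the member
   * layout (and any cache behaviour tied to it) is unchanged there. */
  int64_t bsse[SKETCH_MAX_MB_PLANE << 2];
#endif
};
```

Exactly one of the two declarations is compiled in, so code that uses `bsse` does not need to know which layout is active.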
|
|
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedup of the encoding pass (second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
|
|
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
|
|
Only for speed >= 7, and affects skipping of intra modes.
Threshold is set low for now, needs to be tuned.
Small/no difference in metrics on rtc clips.
Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424
|
|
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.
This patch selectively sets these features at other speed settings
based on block complexity.
For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.
The gains are biggest at low Q so some further optimization
may be possible.
Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
|
|
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
|
|
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
|
|
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set; no obvious visual quality regression found.
Change-Id: Ib43b5b20e67809d03c5a6890818ddff59e1fc94a
|
|
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance got from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad
|
|
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than the original gains from 8 to
9 bit costing.)
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
|
|
If a superblock contains a lot of "skin" then force split
of 64x64 partition, and make some adjustments in mode selection.
This helps to reduce artifacts on moving face/skin areas at low bitrates.
Little/no change in metrics: avgPSNR/SSIM down by ~0.12%.
Small encoding time increase < 1%.
Change-Id: Ic57f52148c3716f391419fab0530d916e4c1d186
|
|
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
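
The threshold-gated, multi-stage mesh search described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the encoder's implementation: all `sketch_` names are hypothetical, the cost callback stands in for whatever distortion metric the encoder uses, and motion vector range clamping is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { int row, col; } sketch_mv;
typedef struct { int range, interval; } sketch_mesh_stage;
typedef unsigned int (*sketch_cost_fn)(sketch_mv mv, void *ctx);

/* Toy cost: squared distance to a target position passed through ctx. */
static unsigned int sketch_toy_cost(sketch_mv mv, void *ctx) {
  const sketch_mv *t = (const sketch_mv *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}

/* Each stage scans a square grid centred on the best position so far.
 * Later stages use a smaller range and a finer interval. */
static sketch_mv sketch_mesh_search(sketch_mv start,
                                    const sketch_mesh_stage *stages,
                                    int num_stages, sketch_cost_fn cost,
                                    void *ctx, unsigned int *best_cost_out) {
  sketch_mv best = start;
  unsigned int best_cost = cost(start, ctx);
  for (int s = 0; s < num_stages; ++s) {
    const sketch_mv centre = best;
    const int range = stages[s].range;
    const int step = stages[s].interval;
    assert(range > 0 && step > 0);
    for (int r = -range; r <= range; r += step) {
      for (int c = -range; c <= range; c += step) {
        const sketch_mv cand = { centre.row + r, centre.col + c };
        const unsigned int this_cost = cost(cand, ctx);
        if (this_cost < best_cost) {
          best_cost = this_cost;
          best = cand;
        }
      }
    }
  }
  if (best_cost_out != NULL) *best_cost_out = best_cost;
  return best;
}

/* Run the mesh only when the preceding step search left the distortion
 * above a threshold; otherwise keep the step search result. */
static sketch_mv sketch_refine_after_step_search(
    sketch_mv step_best, unsigned int step_cost, unsigned int thresh,
    const sketch_mesh_stage *stages, int num_stages, sketch_cost_fn cost,
    void *ctx) {
  if (step_cost <= thresh) return step_best;
  return sketch_mesh_search(step_best, stages, num_stages, cost, ctx, NULL);
}
```

With the example stages from the message, `{ {64, 4}, {32, 2}, {15, 1} }`, the three stages evaluate 33x33, 33x33 and 31x31 candidates (3139 in total), compared with 129x129 = 16641 for a single +/- 64 search at interval 1.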
|
|
This is using a define instead of an enum to keep byte packing.
Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792
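
The packing concern behind choosing a define over an enum can be illustrated with the sketch below. The commit does not say which type was changed, so every name here is hypothetical. The underlying point is that the size of an enum type is implementation-defined and is commonly that of an `int`, whereas a `uint8_t` typedef with `#define`d values occupies a single byte.

```c
#include <stdint.h>

/* Enum-typed field: the storage size is chosen by the compiler. */
typedef enum { SKETCH_ENUM_A, SKETCH_ENUM_B, SKETCH_ENUM_C } sketch_enum_mode;
struct sketch_info_enum {
  sketch_enum_mode mode;
  uint8_t flag;
};

/* Byte-sized typedef plus defines: the field is always one byte wide. */
typedef uint8_t sketch_mode;
#define SKETCH_MODE_A 0
#define SKETCH_MODE_B 1
#define SKETCH_MODE_C 2
struct sketch_info_packed {
  sketch_mode mode;
  uint8_t flag;
};
```

On typical ABIs `sizeof(struct sketch_info_enum)` is 8 while `sizeof(struct sketch_info_packed)` is 2, which adds up when the struct is stored once per block.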
|
|
Change-Id: Iefc2eb78e71472ecf51802ec59ff32caef4bd0f4
|
|
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
|
|
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.
The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.
On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.
Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
|
|
This reverts commit 9946ee23e0a4c158e26a505b162a072f81b8a3be.
Fix the ssse3 asm function.
Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07
|
|
This reverts commit e9b586e21bb899e247346e82bccf5afb42604910.
Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4
|
|
zbin extra / zbin_oq_value was widely passed around,
hence removal touches a lot of code.
Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5
|
|
The min_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
|
|
This commit removes the cyclic aq mode dependency on
in_static_area and reworks the corresponding cut-off thresholds.
It improves the compression performance of speed -5 by 1.47% in
PSNR and 2.07% in SSIM, and the compression performance of speed
-6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster
in both settings at high bit-rates.
Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9
|
|
Uses highbd_ prefix convention consistently.
Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
|
|
This commit removes unused header file vp9_onyxc_int.h and repeatedly
included file vpx_ports/mem.h from vp9_block.h
Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1
|
|
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform coefficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map to int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
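
The typedef mapping described above can be sketched as follows. The macro `SKETCH_HIGHBITDEPTH` is a hypothetical stand-in for the configure flag, and `sketch_scale_coeff` is an illustrative helper rather than a function from the codebase. Only the type names and their widths come from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-in for the high bitdepth configure flag. */
#ifndef SKETCH_HIGHBITDEPTH
#define SKETCH_HIGHBITDEPTH 0
#endif

#if SKETCH_HIGHBITDEPTH
typedef int64_t tran_high_t; /* intermediate stages */
typedef int32_t tran_low_t;  /* final coefficients */
#else
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif

/* Multiply a coefficient by a fixed-point constant in the wide type, then
 * round and narrow the result. Assumes shift >= 1 and a result that fits
 * in tran_low_t. */
static tran_low_t sketch_scale_coeff(tran_low_t coeff, int16_t constant,
                                     int shift) {
  const tran_high_t wide = (tran_high_t)coeff * constant;
  const tran_high_t rounding = (tran_high_t)1 << (shift - 1);
  return (tran_low_t)((wide + rounding) >> shift);
}
```

Because the transform code is written against the two typedefs, the same source compiles to 16/32-bit arithmetic in the default build and to 32/64-bit arithmetic when the extra precision is needed.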
|
|
This commit allows the encoder to store outcomes of single reference
frame modes and compares them to decide if the inter prediction
filter, forward transform, and quantization can be skipped.
The compression performance of speed 3 is down
derf -0.364%
stdhd -0.198%
For test sequences, the speed 3 runtime is reduced
highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up
stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up
pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up
Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44
|
|
This commit extends the sse and forward transform computation flag
to support the case of 64x64 blocks, where there are 4 32x32 2D-DCT
blocks.
Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473
|
|
The mv cost table set is maintained at frame level, hence moved to
VP9_COMP.
Change-Id: Icb3d0185d47443590bd11357de729aa4ba5c5e5e
|
|
|
|
|
|
This commit enables the encoder to select fast forward transform and
quantization path according to the prediction residual sse/variance,
in the rate-distortion optimization scheme.
Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
|
|
Change-Id: Ieae182d72d625d0d3fd4ed7c7d24cb521a0f21b0
|
|
Change-Id: Idd1327852f0df0eab0ea3b33959f2b8292b77301
|
|
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
|
|
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Compared with fixed 16x16 partitioning, the rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.
Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
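
One way to derive a per-frame threshold T from a histogram of block variances is sketched below. The commit message does not say how T is chosen, so the percentile rule used here is an assumption. The bin count, the bin width, and all `sketch_` names are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_BINS 16  /* hypothetical histogram size */
#define SKETCH_BIN_WIDTH 64 /* hypothetical variance units per bin */

/* Build a histogram of the per-16x16 source-difference variances and
 * return the upper edge of the first bin at which at least `percent` of
 * the blocks have been counted (assumed rule, see above). */
static unsigned int sketch_partition_threshold(const unsigned int *block_var,
                                               size_t num_blocks,
                                               unsigned int percent) {
  size_t hist[SKETCH_NUM_BINS] = { 0 };
  size_t cum = 0;
  assert(percent <= 100);
  for (size_t i = 0; i < num_blocks; ++i) {
    size_t bin = block_var[i] / SKETCH_BIN_WIDTH;
    if (bin >= SKETCH_NUM_BINS) bin = SKETCH_NUM_BINS - 1; /* clamp the tail */
    ++hist[bin];
  }
  for (size_t b = 0; b < SKETCH_NUM_BINS; ++b) {
    cum += hist[b];
    if (cum * 100 >= num_blocks * percent) {
      return (unsigned int)((b + 1) * SKETCH_BIN_WIDTH);
    }
  }
  return SKETCH_NUM_BINS * SKETCH_BIN_WIDTH;
}

/* Partition decision: blocks whose variance is below T are treated as
 * static enough to stay in a larger partition; others are split. */
static int sketch_keep_large_block(unsigned int block_var, unsigned int t) {
  return block_var < t;
}
```

Computing T once per frame from that frame's own statistics lets the split decision adapt to content, in contrast to a fixed 16x16 partitioning.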
|