Age | Commit message (Collapse) | Author |
|
This patch puts in an adjustment to the maximum gf/arf
interval based on the active q range. It sets a fixed
baseline maximum of 16 but can drop this down to 12 at
lower q. This required some re-ordering in the first pass
code to ensure we have a Q range estimate before defining
the first gf sequence.
The main gains seen are in the STD HD set on 50fps clips
where previously the interval could rise as high as 25.
On the std hd clip the gains are around 2.8% with limit set
to 300 frames.
When combined with the one shot rate control flags we get
combined gains of:
derf 1.55% (limit300), yt 7.25%, hd 5.17%, std-hd 5.84% (limit300)
Change-Id: Ib380d51354511f2ff0f171a8df4e74291c0421f9
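A minimal sketch of the kind of q-dependent cap described above, not the patch's actual code. The function name, the linear mapping and the divisor of 32 are illustrative assumptions; only the bounds 12 and 16 come from the message.

```c
/* Bounds taken from the commit message; the names are hypothetical. */
#define BASELINE_MAX_GF_INTERVAL 16
#define LOW_Q_MAX_GF_INTERVAL 12

/* Assumed mapping: start from the low-q cap, grow linearly with the
 * estimated worst-case q, and saturate at the baseline maximum.
 * The divisor of 32 is an illustrative choice, not the real tuning. */
static int active_max_gf_interval(double active_worst_q) {
  int interval = LOW_Q_MAX_GF_INTERVAL + (int)(active_worst_q / 32.0);
  if (interval > BASELINE_MAX_GF_INTERVAL) interval = BASELINE_MAX_GF_INTERVAL;
  if (interval < LOW_Q_MAX_GF_INTERVAL) interval = LOW_Q_MAX_GF_INTERVAL;
  return interval;
}
```

Because the cap depends on a Q estimate, the estimate has to exist before the first gf group is defined, which is why the first pass code needed re-ordering.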
|
|
The automatic merge result was incomplete.
Change-Id: I8976318bfc346d867660a013a302c80edb25fc29
|
|
|
|
Remove the temporary branch count arrays and build the adapted probabilities
while walking the tree. Gives an additional 1.5% or so on CIF.
Change-Id: I875d61e5e0ec778e5d2f7f9d0837b989a91cf3a3
|
|
|
|
|
|
Adds a check to exit from the increment_nmv_count function when the
increment is 0.
Change-Id: I99c1e342d351f7800e23590f9c2419881bf1d708
|
|
The previous implementation visited each node in the tree multiple times
because it used each symbol's encoding to revisit the branches taken and
increment its count. Instead, we can traverse the tree depth first and
calculate the probabilities and branch counts as we walk back up. The
complexity goes from somewhere between O(nlogn) and O(n^2) (depending on
how balanced the tree is) to O(n).
Only tested on one clip (256kbps, CIF); saw a 13% decoding perf improvement.
Note that this optimization should port trivially to VP8 as well. In VP8,
the decoder doesn't use this function, but it does routinely show up
on the profile for realtime encoding.
Change-Id: I4f2848e4f41dc9a7694f73f3e75034bce08d1b12
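The single-pass traversal described above can be sketched as follows. This is an illustration, not the committed function: the names walk_tree and binary_prob are hypothetical, and the tree layout (node pairs, values <= 0 are leaves holding the negated symbol, positive values index the child pair) is an assumption modelled on the usual token-tree arrays.

```c
#include <stdint.h>

typedef int8_t tree_index;

/* Probability (1..255) of taking the 0 branch, given both branch counts.
 * Returns 128 when no events were observed. */
static uint8_t binary_prob(unsigned int n0, unsigned int n1) {
  const uint64_t total = (uint64_t)n0 + n1;
  uint64_t p;
  if (total == 0) return 128;
  p = ((uint64_t)n0 * 256 + (total >> 1)) / total;
  if (p < 1) p = 1;
  if (p > 255) p = 255;
  return (uint8_t)p;
}

/* Depth-first walk: each node is visited exactly once. The return value is
 * the number of events in the subtree rooted at pair i, so branch counts and
 * probabilities are filled in while unwinding the recursion. */
static unsigned int walk_tree(unsigned int i, const tree_index *tree,
                              uint8_t probs[], unsigned int branch_ct[][2],
                              const unsigned int symbol_counts[]) {
  const unsigned int left =
      tree[i] <= 0 ? symbol_counts[-tree[i]]
                   : walk_tree((unsigned int)tree[i], tree, probs, branch_ct,
                               symbol_counts);
  const unsigned int right =
      tree[i + 1] <= 0 ? symbol_counts[-tree[i + 1]]
                       : walk_tree((unsigned int)tree[i + 1], tree, probs,
                                   branch_ct, symbol_counts);
  probs[i >> 1] = binary_prob(left, right);
  branch_ct[i >> 1][0] = left;
  branch_ct[i >> 1][1] = right;
  return left + right;
}
```

Calling walk_tree(0, ...) on the root fills every internal node's entry in one pass, which is where the O(n) bound comes from.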
|
|
Adds probability updates for extra bits for the nzcs, code for
getting nzc stats, plus some minor cleanups and fixes.
Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
|
|
experimental
|
|
|
|
Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also,
modified the parameter type passed to vp9_short_iht functions to
make it work with rtcd prototype.
Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe
|
|
|
|
|
|
Increase the motion search range by 4x. Change MV_CLASS tree of the
entropy coding to allow two additional mv classes to cover the
extended motion vector limit. The codec determines the effective
motion search range conditioned on the actual frame dimension.
It provides coding gains:
stdhd 0.39%
yt 0.56%
hd 0.47%
Major coding performance gains are packed in several sequences with
intense motion activities, e.g., ped_1080p gains 7% at high bit-rates,
and on average 3%.
TODO: Need to further tune the rate control and motion search units.
Change-Id: Ib842540a6796fbee5a797809433ef6a477c6d78d
|
|
Also enable tx_select for keyframes.
Change-Id: Iadb1231d9fa7af0c8dce3d9b41830b93a302479e
|
|
Optimized adding constant diff to predictor, which gave about
2% decoder performance gain.
Change-Id: I47db20c31428e8c4a8f16214a85cbe386a6e9303
|
|
|
|
This was done based on John's suggestion.
Change-Id: I62516a513c31fe3dbea0d6cd063df79d9e819ec8
|
|
Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
|
|
Change-Id: Ic9b336486774c95ffbb92adcb110cc0fc2a83cc5
|
|
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).
Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
|
|
Yaowu found this function had a compiling issue with MSVC because
of using _mm_storel_pi((__m64 *)(dest + 0 * stride), (__m128)p0).
To be safe, changed back to using an integer store instruction.
Also, in some builds diff is not guaranteed to be 16-byte aligned;
changed the code to handle that.
Change-Id: I9995e5446af15dad18f3c5c0bad1ae68abef6c0d
|
|
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic programming approach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
|
|
|
|
|
|
Added a variant of the one shot maxQ flag
for two pass that forces a fixed Q for the
normal inter frames. Disabled by default.
Also small adjustment to the Bits per MB
estimation.
Change-Id: I87efdfb2d094fe1340ca9ddae37470d7b278c8b8
|
|
|
|
Optimized adding diff to predictor, which gave 0.8% decoder
performance gain.
Change-Id: Ic920f0baa8cbd13a73fa77b7f9da83b58749f0f8
|
|
Removing redundant 'extern' keywords, fixing formatting and #include order,
code simplification.
Change-Id: I0e5fdc8009010f3f885f13b5d76859b9da511758
|
|
* changes:
vpxenc: actually report mismatch on stderr.
Make superblocks independent of macroblock code and data.
|
|
experimental
|
|
Because ctx->err is not set in that case, it will not report the error
on stderr.
Change-Id: Ifacbf5a03e676fd56522b03c0281d6c723c563ee
|
|
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.
In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.
Lastly, add a trellis optimize function for 32x32 transform blocks.
HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are
a few negative points here and there that I might want to analyze
a little closer.
Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
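The unified chroma rule above can be sketched as below. The enum and function names are hypothetical, and 4:2:0 subsampling (chroma predictor half the luma width) is an assumption; the sketch only restates "same size as luma, as long as it fits the predictor".

```c
/* Hypothetical transform-size enum, ordered from smallest to largest. */
typedef enum { TX_4X4 = 0, TX_8X8, TX_16X16, TX_32X32 } tx_size;

/* Largest square transform that fits a predictor of the given width. */
static tx_size largest_tx_for_width(int width) {
  if (width >= 32) return TX_32X32;
  if (width >= 16) return TX_16X16;
  if (width >= 8) return TX_8X8;
  return TX_4X4;
}

/* Chroma uses the luma transform size unless that would exceed the chroma
 * predictor, which is assumed to be half the luma block width (4:2:0). */
static tx_size chroma_tx_size(int luma_block_width, tx_size luma_tx) {
  const tx_size cap = largest_tx_for_width(luma_block_width / 2);
  return luma_tx < cap ? luma_tx : cap;
}
```

Under these assumptions a 32x32 superblock with the 16x16 luma transform gets a 16x16 chroma transform, and a 64x64 superblock with the 32x32 luma transform gets a 32x32 chroma transform, matching the cases listed in the message.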
|
|
|
|
|
|
collision." into experimental
|
|
Change-Id: I5637d491eb6a9b7633f72e03fd9df72131eeb121
|
|
Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder
performance.
Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
|
|
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.
The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.
Tested coding up to 16400x16400 dimension.
Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
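A minimal sketch of a five-byte size field of the kind described above. The byte order, the nibble assignment and all names are assumptions made for illustration; the message only states two bytes per dimension plus one extra byte carrying two scaling factors in the range 0 to 15.

```c
#include <stdint.h>

/* Hypothetical container for the values carried in the header. */
typedef struct {
  unsigned int width;       /* 0..65535 */
  unsigned int height;      /* 0..65535 */
  unsigned int horiz_scale; /* 0..15 */
  unsigned int vert_scale;  /* 0..15 */
} frame_size;

/* Assumed layout: little-endian width, little-endian height, then one byte
 * with the horizontal factor in the high nibble and the vertical factor in
 * the low nibble. */
static void write_frame_size(uint8_t out[5], const frame_size *fs) {
  out[0] = (uint8_t)(fs->width & 0xff);
  out[1] = (uint8_t)((fs->width >> 8) & 0xff);
  out[2] = (uint8_t)(fs->height & 0xff);
  out[3] = (uint8_t)((fs->height >> 8) & 0xff);
  out[4] = (uint8_t)(((fs->horiz_scale & 0xf) << 4) | (fs->vert_scale & 0xf));
}

static void read_frame_size(const uint8_t in[5], frame_size *fs) {
  fs->width = (unsigned int)in[0] | ((unsigned int)in[1] << 8);
  fs->height = (unsigned int)in[2] | ((unsigned int)in[3] << 8);
  fs->horiz_scale = (in[4] >> 4) & 0xf;
  fs->vert_scale = in[4] & 0xf;
}
```

Keeping the scaling factors in their own byte leaves all 16 bits of each dimension field available, so a 16400x16400 frame round-trips without truncation.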
|
|
* changes:
Add unit test for x4 multi-SAD functions
Add VP9 1 block SAD functions to unit test
Merge master branch into experimental
|
|
Update the function prototypes to match between VP9 and VP8.
Change-Id: If58965073989e87df3b62b67a030ec6ce23ca04f
|
|
Change-Id: I06b5ba5c457944cfa4cd9f53c3bd8cda132439c2
|
|
Change-Id: Iab0176f058045181821ded95ff1cf423af1625f9
|
|
Removing redundant 'extern' keyword, lowercase variable names.
Change-Id: I608e8d8579aba8981f5fac3493f77b4481b13808
|
|
Change-Id: I7977694223521404fc69f29ae2cff03e36e87299
|
|
Picks up some build system changes, compiler warning fixes, etc.
Change-Id: I2712f99e653502818a101a72696ad54018152d4e
|
|
|
|
|
|
to be a fixed value of 15.
Test results:
cif: .124%, .068%, .081%
std-hd: 2.809%, 3.174%, 2.705%
Change-Id: I380c8152c973506094da15eab59e3aa22b75a983
|