Age | Commit message | Author |
|
|
|
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
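For readers unfamiliar with the block error computation discussed above, the following is a minimal scalar sketch, not the libvpx source. It assumes the function returns the sum of squared differences between the original and dequantized coefficients and also reports the sum of squared original coefficients. The names `coeff_t` and `block_error_sketch` are hypothetical. Because the accumulators here are 64 bits wide, no overflow check or second run is needed; a SIMD version that accumulates in narrower lanes would have to detect overflow and redo the work, which is the slow path the message describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the encoder's coefficient type. */
typedef int32_t coeff_t;

/*
 * Scalar block error sketch.
 * Returns the sum of squared differences between coeff and dqcoeff,
 * and stores the sum of squared coeff values in *ssz.
 * All accumulation is done in 64 bits, so no overflow re-run is needed.
 */
static int64_t block_error_sketch(const coeff_t *coeff, const coeff_t *dqcoeff,
                                  size_t n, int64_t *ssz) {
  int64_t error = 0;
  int64_t sqcoeff = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - (int64_t)dqcoeff[i];
    error += diff * diff;
    sqcoeff += (int64_t)coeff[i] * (int64_t)coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```

A caller would pass the transform coefficients of one block, the matching dequantized coefficients, and the block's coefficient count.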
|
|
|
|
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
|
|
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
|
|
|
|
|
|
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.
Only affects 1 pass CBR.
Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
|
|
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.
Disabled for now.
Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
|
|
|
|
This prevents a possible reduction of cyclic refresh after a key frame.
Change-Id: Idd4e49b69cd95476e7eccfa31b2bd8669569e9e8
|
|
Only affects speed 8. No obvious quality regression. Systematic speed
ups by ~1% on Nexus 6.
Change-Id: Ia904ca28ea041c3281c532911ec38fb7d7f46a17
|
|
|
|
|
|
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPSNR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
|
|
|
|
Re-enable the affected test.
BUG=webm:1374
Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
|
|
|
|
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.
For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).
This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.
Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
|
|
Modified the code to facilitate bit-match tests in the first pass.
Added unit tests to check the row-based multi-threading behavior for bit-exactness.
Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
|
|
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here is that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.
The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)
Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
|
|
|
|
|
|
agressive -> aggressive
after:
ce7b38459 Aggressive VBR method.
Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5
|
|
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
|
|
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
|
|
* changes:
quantize_fp_32x32 highbd ssse3: enable existing function
quantize_fp highbd ssse3: use tran_low_t for coeff
quantize_fp highbd sse2: use tran_low_t for coeff
|
|
|
|
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
|
|
Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa
|
|
The previous implementation confused bits, bytes, and elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
|
|
Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803
|
|
|
|
|
|
|
|
|
|
The weight segment only needs to be computed once per frame,
so remove it from the function vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.
Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
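The change above is a loop-invariant hoist. The sketch below illustrates the pattern only and does not reproduce the libvpx code: the struct, its fields, and both function names are hypothetical, and the weighting formula is an assumed blend of a base rate and a boosted rate by the fraction of refreshed blocks. The weight is computed once in a per-frame setup step, and the per-q function just reads the cached value.

```c
/* Hypothetical per-frame cyclic refresh state. */
typedef struct {
  int refresh_blocks;    /* blocks in the boosted segment (assumed) */
  int total_blocks;      /* all blocks in the frame (assumed) */
  double weight_segment; /* cached once per frame */
} cyclic_refresh_sketch;

/* Called once per frame: computes the loop-invariant weight. */
static void frame_setup_sketch(cyclic_refresh_sketch *cr) {
  cr->weight_segment = (double)cr->refresh_blocks / cr->total_blocks;
}

/*
 * Called for every candidate q inside the rate-control search loop.
 * It only reads the cached weight and never recomputes it.
 */
static int rc_bits_per_mb_sketch(const cyclic_refresh_sketch *cr,
                                 int base_bits, int boosted_bits) {
  return (int)((1.0 - cr->weight_segment) * base_bits +
               cr->weight_segment * boosted_bits);
}
```

With this split, the cost of the division is paid once per frame regardless of how many q values the search loop tries.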
|
|
Only affects real-time mode.
Change-Id: Iba836f110c4da936f5173cc0f54424d5b6121bff
|
|
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.
Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
|
|
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.
The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.
The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.
Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
|
|
Removed unnecessary scaling factor to simplify.
Change-Id: I3fc9c5975a2597e72f1324e09dd586dea1facfa7
|
|
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.
SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.
Hence the threshold used for the test should also be scaled up.
Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
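As a rough illustration of the scaling described above, here is a minimal sketch. It assumes the threshold grows by a factor of 4 per extra bit of sample precision, since each extra bit doubles pixel differences and SSE squares them; the commit message only states that the threshold is scaled up, so the exact factors are an assumption. The function name and its plain integer `bit_depth` parameter are hypothetical and do not match the real signature.

```c
#include <stdint.h>

/*
 * Scale an SSE threshold defined for 8-bit content.
 * Assumed factors: 2^(2 * (bit_depth - 8)), i.e. <<4 for 10 bit
 * and <<8 for 12 bit. Other depths return the baseline unchanged.
 */
static int64_t scale_sse_threshold_sketch(int bit_depth,
                                          int64_t sse_threshold) {
  switch (bit_depth) {
    case 10: return sse_threshold << 4; /* 2 extra bits, squared */
    case 12: return sse_threshold << 8; /* 4 extra bits, squared */
    default: return sse_threshold;      /* 8 bit baseline */
  }
}
```

Under these assumptions, an 8-bit threshold of 100 becomes 1600 for 10-bit input and 25600 for 12-bit input.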
|
|
Added counts that split the intra coded blocks into low and high variance.
Change-Id: Ic540144b34d5141659081bb22f7ee16fd6861f14
|
|
|
|
|
|
|
|
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedups of the encoding pass (the second pass in 2-pass
encoding) are 7% with 2 threads, 16% with 4 threads,
85% with 8 threads, and 116% with 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
|
|
Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c
|
|
|
|
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
|