Age | Commit message (Collapse) | Author |
|
|
|
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
|
|
* changes:
vp9_sub_pixel_*variance*: disable avx2 variants
vp9_sad*x4d: disable avx2 variants
vp9_f(dct|ht): disable avx2 variants
convolve: disable avx2 variants
fdct8x8_test: add missing avx2 functions
dct4x4_test: add missing avx2 functions
|
|
tests failing under Win32/Win64
+ variance_test: add missing avx2 functions (partially disabled)
Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
|
|
tests failing under Win32/Win64
+ sad_test: add missing avx2 functions (disabled)
Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
|
|
In non frame-parallel decoding, this works the same way as
current decoding scheme. Every time after decoder finish
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.
In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
|
|
tests failing under Win32/Win64
+ dct16x16_test: add missing avx2 functions (partially disabled)
exercises the forward transforms
no idct/iht implementations, so the c-code is used
Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
|
|
tests failing under Win32/Win64
Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
|
|
|
|
|
|
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.
Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
|
|
|
|
This reverts commit 916550428db803c54c993ff9d3c34b9b0bcebb7c
Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb
|
|
|
|
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
|
|
|
|
Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377
|
|
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
|
|
|
|
Need to include math.h before tmmintrin.h in some versions of MSVS.
Change-Id: Ia6b83ae599316887ecf30c4e4b9e4355fb8a4219
|
|
This reverts commit e8bbb3d9db797dab7c2f947cc43e8d0f168e4953.
Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc
|
|
|
|
|
|
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.
Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
|
|
|
|
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
|
|
|
|
|
|
In 8-tap filtering, to guarantee the intermediate results fit in
16 bits, the order of accumulating the products needs to be done
correctly, and the largest product should be added last. This
patch fixed the problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
|
|
|
|
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
|
|
As mismatchs were found between the intrinsic version and c only. The
commit temporarily revert to use the matching assembly version to
allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
|
|
|
|
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
|
|
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
|
|
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
|
|
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
|
|
|
|
This is needed for profiles 1 and 2.
Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
|
|
The current decode_tiles decodes the frame one tile by one tile
and then loopfilter the whole frame or use another worker thread to
do loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tile video has one row and four cols, decode_tiles
will decode the Tile1, then Tile2, then Tile3, then Tile4.
And during decode each tile, decode_tile will decode row by row in
each tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
|
|
|
|
Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce
|
|
|
|
|
|
|
|
|
|
Change-Id: I7ed7fecc959c6598ff98895f1a5cf7e11ac1615f
|
|
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the vizualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
|
|
This reverts commit 7ab9a9587b96db4edce6be916c1f02297a9555ff
Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console
Failed
This patch did not address all the assembly issues
some of the vp8 assembly counts on 5 arguments being passed in to this function:
one example : vp8_sad8x16_wmt
Please address or split this into vp9 and vp8 patches.
Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
|
|
|