Age | Commit message | Author |
|
|
|
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Fixes a bug in the original patch
(https://gerrit.chromium.org/gerrit/#/c/70163/8),
which was reverted due to a nightly test failure.
Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
|
|
|
|
This reverts commit 916550428db803c54c993ff9d3c34b9b0bcebb7c
Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb
|
|
|
|
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
|
|
|
|
Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377
|
|
|
|
Need to include math.h before tmmintrin.h in some versions of MSVS.
Change-Id: Ia6b83ae599316887ecf30c4e4b9e4355fb8a4219
|
|
This reverts commit e8bbb3d9db797dab7c2f947cc43e8d0f168e4953.
Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc
|
|
|
|
|
|
The SSSE3 subpixel code was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
The equivalent AVX2 code is fixed in the same way.
Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
|
|
|
|
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime drops from 1642 cycles to 1519 cycles,
a speed-up of about 7%.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
|
|
|
|
|
|
In 8-tap filtering, to guarantee that the intermediate results fit in
16 bits, the products must be accumulated in the right order, with
the largest product added last. This patch fixes the problem using
the method from commit "Correct ssse3 8/16-pixel wide sub-pixel
filter calculation".
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
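The accumulation-order point in the commit above can be illustrated with a small scalar sketch. It is not the libvpx code: `sat_add16`, `pair`, `filter8_naive` and `filter8_ordered` are hypothetical names, the taps in the usage note are illustrative values that sum to 128, and the saturating add only imitates what a 16-bit saturating SIMD add does per lane.

```c
#include <stdint.h>

/* Saturating 16-bit add, imitating a per-lane saturating SIMD add. */
static int16_t sat_add16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

/* Sum of two pixel*tap products. Assumption: the result fits in 16 bits
 * for the inputs used here, so no saturation is modelled at this step. */
static int16_t pair(const uint8_t *px, const int8_t *k) {
  return (int16_t)(px[0] * k[0] + px[1] * k[1]);
}

/* Naive order: accumulate the four pair sums left to right. */
static int16_t filter8_naive(const uint8_t *px, const int8_t *k) {
  int16_t sum = pair(px, k);
  sum = sat_add16(sum, pair(px + 2, k + 2));
  sum = sat_add16(sum, pair(px + 4, k + 4));
  sum = sat_add16(sum, pair(px + 6, k + 6));
  return sum;
}

/* Ordered: outer pairs first, then the smaller middle pair,
 * and the largest middle pair last. */
static int16_t filter8_ordered(const uint8_t *px, const int8_t *k) {
  const int16_t p23 = pair(px + 2, k + 2);
  const int16_t p45 = pair(px + 4, k + 4);
  const int16_t lo = p23 < p45 ? p23 : p45;
  const int16_t hi = p23 < p45 ? p45 : p23;
  int16_t sum = sat_add16(pair(px, k), pair(px + 6, k + 6));
  sum = sat_add16(sum, lo);
  sum = sat_add16(sum, hi);
  return sum;
}
```

With eight pixels of value 255 and the illustrative taps {-3, -1, 32, 64, 38, 1, -3, 0}, the exact sum is 255 * 128 = 32640. `filter8_ordered` returns 32640, while `filter8_naive` saturates at the third add and returns 32002.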
|
|
|
|
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
|
|
Mismatches were found between the intrinsics version and the C-only
version. This commit temporarily reverts to the matching assembly
version to allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
|
|
|
|
This applies the same correction as commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
|
|
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
|
|
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
|
|
The eventual goal is to get rid of both itxm_add and fwd_txm4x4.
This patch does so in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
|
|
|
|
This is needed for profiles 1 and 2.
Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
|
|
The current decode_tiles decodes the frame one tile at a time
and then either loopfilters the whole frame or uses another worker
thread to do the loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tiled video has one tile row and four tile columns,
decode_tiles will decode Tile1, then Tile2, then Tile3, then Tile4.
While decoding each tile, decode_tile decodes row by row within
that tile.
For frame parallel decoding, decode_tiles will decode the video in row
order across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"
-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
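The two decode orders described in the commit above can be sketched as follows. This is only an illustration of the ordering, not the decoder: `DecodeStep`, `order_tile_major` and `order_row_major` are hypothetical names, tiles and rows are zero-indexed here, and loopfiltering is left out.

```c
#include <stddef.h>

/* One decode step: which row of which tile is decoded. */
typedef struct {
  int tile;
  int row;
} DecodeStep;

/* Current behaviour as described: finish every row of one tile
 * before moving on to the next tile. */
static size_t order_tile_major(int tiles, int rows, DecodeStep *out) {
  size_t n = 0;
  for (int t = 0; t < tiles; ++t) {
    for (int r = 0; r < rows; ++r) {
      out[n].tile = t;
      out[n].row = r;
      ++n;
    }
  }
  return n;
}

/* Frame parallel behaviour as described: decode the same row in every
 * tile before moving on to the next row. */
static size_t order_row_major(int tiles, int rows, DecodeStep *out) {
  size_t n = 0;
  for (int r = 0; r < rows; ++r) {
    for (int t = 0; t < tiles; ++t) {
      out[n].tile = t;
      out[n].row = r;
      ++n;
    }
  }
  return n;
}
```

For four tiles with two rows each, `order_row_major` yields row 0 of tiles 0..3 followed by row 1 of tiles 0..3, matching the sequence quoted in the commit message.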
|
|
|
|
Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce
|
|
|
|
|
|
|
|
|
|
Change-Id: I7ed7fecc959c6598ff98895f1a5cf7e11ac1615f
|
|
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the visualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
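A minimal sketch of the conditional-compilation pattern the commit above describes. Only the macro name CONFIG_VP9_POSTPROC comes from the commit message; `postproc_frame` and `output_frame` are hypothetical, and the sketch assumes the build system defines the macro to 0 or 1.

```c
/* Assumption: the build defines CONFIG_VP9_POSTPROC as 0 or 1.
 * Default to 0 so this sketch compiles on its own. */
#ifndef CONFIG_VP9_POSTPROC
#define CONFIG_VP9_POSTPROC 0
#endif

#if CONFIG_VP9_POSTPROC
/* Hypothetical post-processing step, compiled only when enabled. */
static void postproc_frame(unsigned char *buf, int n) {
  for (int i = 0; i < n; ++i) buf[i] = (unsigned char)(buf[i] / 2 + 64);
}
#endif

/* Hypothetical output path. Returns 1 if post-processing was applied,
 * 0 if it was compiled out. */
static int output_frame(unsigned char *buf, int n) {
#if CONFIG_VP9_POSTPROC
  postproc_frame(buf, n);
  return 1;
#else
  (void)buf;
  (void)n;
  return 0;
#endif
}
```

Guarding both the definition and the call site keeps a build with the feature disabled free of references to post-processor code.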
|
|
This reverts commit 7ab9a9587b96db4edce6be916c1f02297a9555ff
The nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console
failed.
The patch did not address all the assembly issues:
some of the VP8 assembly counts on 5 arguments being passed in to this function.
One example: vp8_sad8x16_wmt
Please address this or split the patch into VP9 and VP8 patches.
Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
|
|
|
|
Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4
|
|
vp9_block_error_sse2 can only handle 16 bytes at a time, but
the function needs to process a sequence of 32 bytes at a time,
so each 16-byte half is handled in a separate register.
With the AVX2 optimization the 32 bytes can be handled in one register
instead of the two used by SSE2.
The vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%.
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
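For orientation, a scalar sketch of the kind of computation the commit above vectorizes. It rests on assumptions not stated in the commit message: that coefficients are 16-bit, so 32 bytes hold 16 of them, and that the function accumulates squared differences between coefficients and dequantized coefficients plus the sum of squared coefficients. `block_error_sketch` is a hypothetical name, not the libvpx function.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch (hypothetical). n must be a multiple of 16.
 * The outer loop steps 16 coefficients (32 bytes) at a time, which is
 * the amount one 256-bit register could hold in a SIMD version. */
static int64_t block_error_sketch(const int16_t *coeff,
                                  const int16_t *dqcoeff, size_t n,
                                  int64_t *ssz) {
  int64_t error = 0;
  int64_t sqcoeff = 0;
  for (size_t i = 0; i < n; i += 16) {
    for (size_t j = 0; j < 16; ++j) {
      const int32_t c = coeff[i + j];
      const int32_t d = c - dqcoeff[i + j];
      error += (int64_t)d * d;   /* squared quantization error */
      sqcoeff += (int64_t)c * c; /* squared source coefficient */
    }
  }
  *ssz = sqcoeff;
  return error;
}
```

The inner loop is the part a 32-byte-wide register would cover in one step, where a 16-byte-wide register would need two.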
|
|
|
|
Change-Id: I0315cea6a5e58182bc2556e9825ec2ef0b1480c3
|
|
|
|
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower,
but given that it is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
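A minimal sketch of a SAD (sum of absolute differences) routine without a max_sad parameter, to show what the commit above leaves in place. `sad_sketch` and its signature are hypothetical. The sketch assumes max_sad served as an early-exit threshold, so dropping it means every row is always summed.

```c
#include <stdlib.h>

/* Hypothetical plain SAD over a width x height block.
 * No max_sad argument: the loop always runs over every row. */
static unsigned int sad_sketch(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride,
                               int width, int height) {
  unsigned int sad = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      sad += (unsigned int)abs(src[c] - ref[c]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

Without an early exit, the result no longer depends on a caller-supplied threshold, which fits the stated goal of sharing SAD tests between VP8 and VP9.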
|
|
|
|
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as a pointer array.
This commit silences such warnings.
Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
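The design described in the commit above, with one signature for all prediction modes so they fit in a pointer array, can be sketched as below. All names (`PredictFn`, `pred_v`, `pred_h`, `kPredictors`) are hypothetical, each predictor fills a single row for brevity, and the `(void)` cast is one common way to silence unused-argument warnings; the commit may have used a different mechanism.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared signature so every mode can be called through one table. */
typedef void (*PredictFn)(uint8_t *dst, size_t n, const uint8_t *above,
                          const uint8_t *left);

/* Vertical-style prediction: copies the row above; `left` is unused. */
static void pred_v(uint8_t *dst, size_t n, const uint8_t *above,
                   const uint8_t *left) {
  (void)left; /* intentionally unused, kept for the shared signature */
  memcpy(dst, above, n);
}

/* Horizontal-style prediction: repeats the left pixel; `above` is unused. */
static void pred_h(uint8_t *dst, size_t n, const uint8_t *above,
                   const uint8_t *left) {
  (void)above; /* intentionally unused, kept for the shared signature */
  memset(dst, left[0], n);
}

/* Table indexed by a (hypothetical) mode number. */
static const PredictFn kPredictors[] = { pred_v, pred_h };
```

A caller can then dispatch with `kPredictors[mode](dst, n, above, left)` without knowing which arguments a given mode actually reads.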
|
|
Change-Id: I04930aca2293ebbaeb96dfedd2f9c5a55762fd2e
|
|
The inline loopfilter is already handled in vp9_decode_frame().
This collects all similar code in one place.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
|