Age | Commit message | Author |
|
|
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
average improvement ~4x-6x
Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91
|
|
|
|
This macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. It is used
with calls to assembly, while generic C code doesn't rely on it, so in
a C-only build without an alignment attribute the code will function
as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
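The alignment point above can be sketched as follows. This is a minimal illustration, not the libvpx header: the macro is re-created here from the common compiler attributes, and HAVE_ALIGN_ATTR and fill_and_sum_aligned are hypothetical names introduced only for this example. The last branch of the macro is the "alignment attribute is unavailable" case, where plain C code still works because it never assumes the alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative alignment macro (an assumption, not copied from libvpx).
 * The final branch is the case with no alignment attribute available. */
#if defined(__GNUC__) || defined(__clang__)
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#define HAVE_ALIGN_ATTR 1
#elif defined(_MSC_VER)
#define DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#define HAVE_ALIGN_ATTR 1
#else
#define DECLARE_ALIGNED(n, typ, val) typ val
#define HAVE_ALIGN_ATTR 0
#endif

/* Hypothetical helper: fills a buffer declared through the macro and
 * sums it with generic C code that does not depend on alignment. */
int fill_and_sum_aligned(void) {
  DECLARE_ALIGNED(16, int16_t, buf[16]);
  int i, sum = 0;
#if HAVE_ALIGN_ATTR
  /* Only assembly or SIMD callers would rely on this property. */
  assert(((uintptr_t)buf & 15) == 0);
#endif
  for (i = 0; i < 16; ++i) buf[i] = (int16_t)i;
  for (i = 0; i < 16; ++i) sum += buf[i];
  return sum; /* 0 + 1 + ... + 15 */
}
```

With or without the attribute, the C loop produces the same sum; only code that issues aligned loads would notice the difference.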
|
|
average improvement ~4x-6x
Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17
|
|
average improvement ~4x-6x
Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022
|
|
average improvement ~3x-5x
Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb
|
|
average improvement ~6x-8x
Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18
|
|
average improvement ~5x-8x
Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b
|
|
average improvement ~6x-10x
Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa
|
|
Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223
|
|
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
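The selection between the partial and the full inverse transform could look roughly like the sketch below. All names here (idct8x8_path, choose_idct8x8_path) are hypothetical, and the threshold is an assumption: it treats eob as the number of coefficients up to the end of block, so any eob of at most 12 is taken to fit in the top left 4x4 block. The actual boundary used by the patch may differ.

```c
#include <assert.h>

/* Hypothetical labels for the two code paths described above. */
typedef enum {
  IDCT8X8_PARTIAL, /* only the top left 4x4 coefficients are nonzero */
  IDCT8X8_FULL     /* all 64 coefficients may be nonzero */
} idct8x8_path;

/* Assumed rule: eob counts coefficients in scan order (1..64), and the
 * first 12 scan positions lie inside the top left 4x4 block. */
idct8x8_path choose_idct8x8_path(int eob) {
  assert(eob >= 1 && eob <= 64);
  return (eob <= 12) ? IDCT8X8_PARTIAL : IDCT8X8_FULL;
}
```

A caller would use the returned path to pick the cheaper routine whenever the block is sparse enough.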
|
|
Unifying transform function names across libvpx, 1d is a redundant suffix.
Change-Id: I077c19f3bc7d4842ed7ca5814d77b3dce1728e13
|
|
Change-Id: Ic334da9aee968e33762c2b25d9fbad24c844b411
|
|
This renames all the loop filter functions so that they no
longer refer to mb.
Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
|
|
Change-Id: If4ddbdcfb3ab387cbca6910b42cf4df8111e6879
|
|
This patch follows the "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit and adds an x86 SSE2 optimization
that filters 16 pixels in parallel. For the other optimizations
(neon and dspr2), the current 16-pixel functions call the 8-pixel
functions twice; real 16-pixel functions can be added later.
Decoder speedup:
tulip clip: 2% speed gain;
old_town_cross: 1.2% speed gain;
bus: 2% speed gain.
Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
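The "call the 8-pixel function twice" fallback mentioned above can be sketched as below. This is a toy illustration under stated assumptions: toy_filter_8, toy_filter_16 and toy_filter_16_checksum are hypothetical names, and the per-pixel operation (halving) is a stand-in, not the real loop filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an 8-pixel filter: halves each of 8 pixels. */
void toy_filter_8(uint8_t *s) {
  int i;
  assert(s != NULL);
  for (i = 0; i < 8; ++i) s[i] = (uint8_t)(s[i] >> 1);
}

/* 16-pixel version built from two 8-pixel calls, as the fallback
 * for platforms without a dedicated 16-pixel routine. */
void toy_filter_16(uint8_t *s) {
  toy_filter_8(s);
  toy_filter_8(s + 8);
}

/* Hypothetical check: filter the row 0, 2, 4, ..., 30 and sum it. */
int toy_filter_16_checksum(void) {
  uint8_t row[16];
  int i, sum = 0;
  for (i = 0; i < 16; ++i) row[i] = (uint8_t)(2 * i);
  toy_filter_16(row);
  for (i = 0; i < 16; ++i) sum += row[i];
  return sum; /* each pixel is halved back to i, so 0 + 1 + ... + 15 */
}
```

A dedicated 16-pixel routine would process both halves in one pass, which is where a SIMD version gains its speed.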
|
|
|
|
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
|
|
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
|
|
Change-Id: I2ba9467525b87a8e4a58f0c546e63031b4e38a4e
|
|
Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
|
|
|
|
|
|
Change-Id: I0a0f9c07e774450896abc9455728b97fd38ef00c
|
|
Change-Id: Iac55891ac9e6f13718c9f822aa099b5ca491832a
|
|
Change-Id: Idd7bdb0c364d94c5a0d24c87bb8574292e4c840c
|
|
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
|
|
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
|