Age | Commit message | Author |
|
Include vpx_ports/msvc.h to avoid snprintf issues with MSVC.
Change-Id: Ida09cff8ee3b84e09fd61de131f84b32c113fa1a
|
|
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 C time = 93.1 ms (±0.4 ms), VSX time = 6.5 ms (±0.2 ms) [14.4x]
Change-Id: I7f1fd0fc987af86baf2b74147a25aee811289112
|
|
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
|
|
functions: upper camelcase
members: lowercase with trailing '_'
decl order: functions (overrides marked virtual), members
after:
656e8ac61 VSX version of vpx_post_proc_down_and_across_mb_row
766d875b9 VSX version of vpx_mbpost_proc_ip
35e98a70b VSX version of vpx_mbpost_proc_down
b2898a9ad Bench Class For More Robust Speed Tests
Change-Id: Ib257bd607c5c1248d30e619ec9e8a47cc629825b
|
|
To make speed testing more robust, the AbstractBench runs the
desired code multiple times and reports the median run time with
the mean absolute deviation around the median.
To use the AbstractBench, simply add it as a parent to your test
class, and implement the run() method (with the code you want to
benchmark).
Sample output for VP9QuantizeTest
[ BENCH ] Bypass calculations 4x4 165.8 ms ( ±1.0 ms )
[ BENCH ] Full calculations 4x4 165.8 ms ( ±0.9 ms )
[ BENCH ] Bypass calculations 8x8 129.7 ms ( ±0.9 ms )
[ BENCH ] Full calculations 8x8 130.3 ms ( ±1.4 ms )
[ BENCH ] Bypass calculations 16x16 110.3 ms ( ±1.4 ms )
[ BENCH ] Full calculations 16x16 110.1 ms ( ±0.9 ms )
Change-Id: I1dd649754cb8c4c621eee2728198ea6a555f38b3
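The median-plus-deviation reporting described above can be sketched roughly as follows. This is a minimal illustration, not the actual AbstractBench code: the class name BenchSketch, the method names RunBatches, Median and MeanAbsDev, and the use of std::chrono are all assumptions made for the example.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the bench helper described in the commit message.
// Names and timing mechanism are assumptions, not the libvpx interface.
class BenchSketch {
 public:
  virtual ~BenchSketch() {}

  // Times `repeats` batches, each consisting of `calls_per_batch` calls to Run().
  void RunBatches(int repeats, int calls_per_batch) {
    times_ms_.clear();
    for (int r = 0; r < repeats; ++r) {
      const auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < calls_per_batch; ++i) Run();
      const auto end = std::chrono::steady_clock::now();
      times_ms_.push_back(
          std::chrono::duration<double, std::milli>(end - start).count());
    }
  }

  // Prints the median batch time and the mean absolute deviation around it.
  void PrintMedian(const char *title) const {
    const double median = Median(times_ms_);
    std::printf("[ BENCH ] %s %.1f ms ( ±%.1f ms )\n", title, median,
                MeanAbsDev(times_ms_, median));
  }

  // Median of the samples; takes a copy so the caller's order is preserved.
  static double Median(std::vector<double> v) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return (v.size() % 2) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
  }

  // Mean of |x - center| over all samples.
  static double MeanAbsDev(const std::vector<double> &v, double center) {
    if (v.empty()) return 0.0;
    double sum = 0.0;
    for (double x : v) sum += std::fabs(x - center);
    return sum / v.size();
  }

 protected:
  // Derived classes put the code to benchmark here.
  virtual void Run() = 0;

 private:
  std::vector<double> times_ms_;
};
```

A test class would derive from BenchSketch, override Run() with the call under test, then call RunBatches() followed by PrintMedian(). The median is less sensitive to a single slow batch than the mean, which is presumably why the commit reports it.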
|
|
Low bit depth version only. Passes the VP9QuantizeTest.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
Full calculations:
C time = 1456 ms, VSX time = 80 ms (18x)
Change-Id: I1b1d6d03b1aeff63640efbdeb222cab857ddd95e
|
|
Low bit depth version only. Passes the VP9QuantizeTest.
Change-Id: I6546f872864bd404a7e353348b0554aab1de5bf0
|
|
googletest imports tuple into the testing namespace for compatibility across
C++ versions where tuple may be in std::tr1 or std. Fixes deprecation
warnings under Visual Studio 2017.
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
|
|
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.
Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
|
|
This C version uses the shortcuts found in the
vp9_quantize_fp_32x32_ssse3 function.
Change-Id: I2e983adb00064e070b7f2b1ac088cc58cf778137
|
|
This C version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830
|
|
This reverts commit 86842855d30d6ca6befdcf5108003e027d90daa9.
SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under
visual studio due to a #if within another macro.
Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1
|
|
This C version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b
|
|
This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
|
|
This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.
Reason for revert: Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
|
|
Ensure avx and ssse3 stay in sync by testing them against each other.
Change-Id: I699f3b48785c83260825402d7826231f475f697c
|
|
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
|
|
Change-Id: I1d93698bc27529b0544d79dd7b9fe37afa51ef87
|
|
Change-Id: I77be617c7d7c64929dd51c6077322f4f8ad23897
|
|
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
|
|
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
|
|
None of the x86 optimizations pass the tests.
Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909
|
|
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
|
|
Promote the result of RandRange to signed
Change-Id: I89313cace3bcbe9af96946bef00b6857fc48b128
|
|
* changes:
quantize test: check skip_block
quantize test: use negative input
|
|
This test fails with a configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics
BUG=webm:1458
Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
|
|
Not all sizes were tested previously, only 4x4 and 32x32.
Change-Id: I4b4beab1b92a810a097a7306de04cc9e0e260315
|
|
coeff contains signed values.
Change-Id: I02f74decf30379a28122169ab3e844d0f3bd7d23
|
|
With skip block the neon version is about twice as fast as C.
The neon version has no shortcut for coeff < zbin, so it always takes
the same amount of time. Even when the C version can take the shortcut,
neon is over twice as fast. When it cannot, the gap grows to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
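The "coeff < zbin" shortcut mentioned above can be pictured with a deliberately simplified scalar sketch. It is not the libvpx quantizer: the function name QuantizeBlockSketch, the single zbin and step values, and the plain division are assumptions made to keep the example short, and it assumes step > 0.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Simplified illustration of a zero-bin shortcut. Real quantizers use
// per-position round/quant/shift tables and a scan order; this sketch uses
// one threshold (zbin) and one step size (assumed positive) for all positions.
// Returns one past the index of the last nonzero output, in array order.
inline int QuantizeBlockSketch(const std::vector<int> &coeff, int zbin,
                               int step, std::vector<int> *qcoeff) {
  qcoeff->assign(coeff.size(), 0);
  int eob = 0;
  for (std::size_t i = 0; i < coeff.size(); ++i) {
    const int abs_coeff = std::abs(coeff[i]);
    // Shortcut: values inside the zero bin stay zero and skip the arithmetic.
    if (abs_coeff < zbin) continue;
    // Quantize the magnitude, then restore the sign of the input.
    const int q = abs_coeff / step;
    (*qcoeff)[i] = coeff[i] < 0 ? -q : q;
    if (q != 0) eob = static_cast<int>(i) + 1;
  }
  return eob;
}
```

In a scalar loop like this the branch saves work whenever most coefficients are small. A vector implementation processes several lanes at once, so a per-element branch is not available in the same form, which is consistent with the constant run time reported for the neon version.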
|
|
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
|
|
Pass a max txfm size parameter and combine the base quantize
test with the 32x32 test.
Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b
|
|
Test some possible scenarios.
Change-Id: I1a612e7153b31756be66390ceea55877856d5a33
|
|
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
|
|
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
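The warning quoted above comes from subtracting two unsigned values whose difference is negative. A small sketch of the two behaviors, using the numbers from the warning; the helper names are hypothetical, and the signed version assumes both inputs fit in an int:

```cpp
#include <cstdint>

// Unsigned subtraction wraps modulo 2^32 when the result would be negative.
inline std::uint32_t WrappedDifference(std::uint32_t a, std::uint32_t b) {
  return a - b;
}

// Converting each operand to a signed type first keeps the arithmetic signed,
// so a negative difference is represented directly.
// Assumes a and b are small enough to fit in an int.
inline int SignedDifference(std::uint32_t a, std::uint32_t b) {
  return static_cast<int>(a) - static_cast<int>(b);
}
```

With the values from the warning, the unsigned form yields 4294954567 while the signed form yields -12729.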
|
|
qcoeff output looks OK but dqcoeff is no good.
BUG=webm:1448
Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd
|
|
eob values are generated by the function.
Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3
|
|
ssse3 does not pass either of the tests.
avx 32x32 does not pass.
Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44
|
|
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.
If all the optimizations were unified, the structure sizes could be
greatly reduced.
Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
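The 8-element layout described above, one dc entry followed by seven copies of the ac entry, can be sketched as follows. The helper name MakeQuantRow and the use of std::array are assumptions for illustration only.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds one 8-entry row with the dc value in slot 0
// and the ac value repeated in slots 1 through 7.
inline std::array<std::int16_t, 8> MakeQuantRow(std::int16_t dc,
                                                std::int16_t ac) {
  std::array<std::int16_t, 8> row;
  row.fill(ac);  // all eight slots start as ac
  row[0] = dc;   // slot 0 is then overwritten with dc
  return row;
}
```

Eight 16-bit values occupy 128 bits, the width of one xmm register, so a row in this layout can be loaded in a single step.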
|
|
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.
This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.
Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e
|
|
Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.
Simplify buffer usage with Buffer and remove unnecessarily aligned
variables.
eob is a single element and never written using aligned instructions.
BUG=webm:1426
Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35
|
|
Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f
|
|
This commit removes all uses of the vp9_ prefix in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.
Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
|
|
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
|
|
Various header/test files had to be re-worked in order to
build "Remove cm parameter from vp9_decode_block_tokens()".
This patch reverts the "Remove cm" part and only contains
the re-worked header files.
Change-Id: I520958a88d1991fee988a3c784d0eac40e117a32
|
|
This macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. It is used
with calls to assembly, while generic C code doesn't rely on it, so a
C-only build without an alignment attribute will function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
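For readers unfamiliar with alignment-declaring macros, here is a minimal sketch of the idea. It is not the libvpx definition: SKETCH_DECLARE_ALIGNED and IsAligned are hypothetical names, and the sketch relies on C++11 alignas rather than a compiler-specific attribute.

```cpp
#include <cstdint>

// Hypothetical macro in the spirit of an alignment-declaring macro.
// Expands to a declaration whose storage is aligned to n bytes.
#define SKETCH_DECLARE_ALIGNED(n, type, name) alignas(n) type name

// Returns true when pointer p is a multiple of n bytes.
inline bool IsAligned(const void *p, std::uintptr_t n) {
  return (reinterpret_cast<std::uintptr_t>(p) % n) == 0;
}
```

A buffer declared as SKETCH_DECLARE_ALIGNED(32, std::int16_t, buf[64]) can be passed to code that uses aligned loads and stores. Plain C code that only uses ordinary loads does not depend on the alignment, which matches the reasoning in the commit message.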
|