Age | Commit message | Author |
|
BUG=webm:1270
Change-Id: I7d56667d946196bbbe355303de805422e40b0763
|
|
Change-Id: Ib8ca37f1b58e9903e7efa29689a0a49f14b4d73a
|
|
Exclude low bit depth optimizations from high bit depth builds.
BUG=webm:1584
Change-Id: I86a7ebafa557d262257358e1e055a06d52659977
|
|
|
|
BUG=webm:1584
Change-Id: Ifdebf33356abcc6869f695d129165ba17e042dcd
|
|
BUG=webm:1584
Change-Id: I4cbfafe8ea72b3d4523aabcaed4848fa29bb19fe
|
|
Reduces the number of rows calculated for 2D 4-tap interpolation filter
from h+7 rows to h+3 rows.
Also fixes a bug in the avx2 function for 4-tap filters where the last
row is computed incorrectly.
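
The row counts above follow from the filter length: a separable 2D filter whose vertical pass has T taps needs h + T - 1 intermediate rows from the horizontal pass to produce h output rows. A minimal sketch of that arithmetic, assuming a hypothetical helper named `intermediate_rows` (it is not a libvpx function):

```c
/* Hypothetical helper, not part of libvpx: number of rows the horizontal
 * pass must produce so that a vertical filter with `taps` taps can emit
 * `h` output rows. */
static int intermediate_rows(int h, int taps) {
  return h + taps - 1;
}
```

With h = 16, an 8-tap filter needs 23 intermediate rows (h + 7) and a 4-tap filter needs 19 (h + 3), which is the saving the commit describes.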
Performance:
            | Baseline | Result   | Pct Gain |
bitdepth lo | 4.00 fps | 4.02 fps | 0.5%     |
bitdepth 10 | 1.90 fps | 1.91 fps | 0.5%     |
Performance was evaluated at speed 1 on jets.y4m at a bitrate (br) of 500
over 100 frames.
No BDBR loss is observed.
Change-Id: I90b0d4d697319b7bba599f03c5dc01abd85d13b1
|
|
BUG=webm:1584
Change-Id: I596f5f0e1a1c152493cd8177b32d416cc79937e0
|
|
Add rtcd headers and make local functions static.
BUG=webm:1584
Change-Id: Ic19aec1dc90703b0b89d1092baee487d0fd0cb4e
|
|
|
|
|
|
BUG=webm:1584
Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3
|
|
BUG=webm:1584
Change-Id: I1be768446b9304123da7b1ea0aed0db056db31c5
|
|
The vp8_norm table has 256 elements, but the index into it can be larger on
fuzzed data. Casting the index to unsigned char keeps it in the valid range
and lets the proper error be triggered later. Also declare "shift" as
unsigned char to avoid a UB sanitizer warning.
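
The cast described above can be sketched as follows. This is an illustration only: `norm_table`, `init_norm_table`, and `lookup_shift` are hypothetical names, and the table contents are a plausible stand-in, not the real vp8_norm data.

```c
/* Hypothetical 256-entry lookup table standing in for vp8_norm. */
static unsigned char norm_table[256];

/* Fill the stand-in table: for each value, count the left shifts needed
 * to set bit 7. Entry 0 is left at 0. */
static void init_norm_table(void) {
  for (int i = 0; i < 256; ++i) {
    unsigned char shift = 0;
    int v = i;
    while (v != 0 && v < 128) {
      v <<= 1;
      ++shift;
    }
    norm_table[i] = shift;
  }
}

/* Casting the index to unsigned char reduces it modulo 256, so the lookup
 * can never read past the end of the table, even for corrupt input. */
static unsigned char lookup_shift(unsigned int range) {
  return norm_table[(unsigned char)range];
}
```

An out-of-range index such as 300 is reduced to 44 instead of reading past the table. The value returned is then meaningless for corrupt input, but the access is safe and later validation can report the error.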
BUG=b/122373286,b/122373822,b/122371119
Change-Id: I3cef1d07f107f061b1504976a405fa0865afe9f5
|
|
|
|
|
|
BUG=webm:1584
Change-Id: Iaba854952534a95e710a985acfcab46e093872c2
|
|
BUG=webm:1584
Change-Id: Ia2d9fcbccbad0c2142a3759e610670b86af0fef4
|
|
BUG=webm:1584
Change-Id: I5990c0100af83d13f7a4800147473bc997f5e5d1
|
|
|
|
BUG=webm:1584
Change-Id: I48b9a9cdcfe52536f685c41fb2d3c0f3e9192d34
|
|
vpx_asm_stubs.c only references these sse2 functions. Combine the files
similar to the way the ssse3/avx2 files are set up.
Mark the intrinsics as static because they are only used within the
macros here. It is unfortunate that the assembly functions cannot be
marked static as well.
BUG=webm:1584
Change-Id: I342687a1046ae6ca46ae58644a7c170440de1dfb
|
|
BUG=webm:1584
Change-Id: I92504ed4a2e54129c981b7380249962afb7966df
|
|
|
|
BUG=webm:1584
Change-Id: Ia3f152bf2a37f8a1ea4178eeb1a6a262ea034a8d
|
|
The optimizations were accidentally disabled during the move from vp9 in:
  commit c3bdffb0a508ad08d5dfa613c029f368d4293d4c
  author Johann <johannkoenig@google.com> Fri May 15 18:52:03 2015
  Move variance functions to vpx_dsp
  subpel functions will be moved in another patch.
BUG=webm:1584
Change-Id: Ia7899ee0cfad13a0e1516b89756552064846e81c
|
|
|
|
Speed test:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/2
Hadamard32x32[ 10 runs]: 9 us
Hadamard32x32[ 10000 runs]: 8914 us
Hadamard32x32[ 10000000 runs]: 8991776 us
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/2
Hadamard32x32[ 10 runs]: 5 us
Hadamard32x32[ 10000 runs]: 4582 us
Hadamard32x32[ 10000000 runs]: 4548203 us
Change-Id: Ied1b38b510bd033299f05869216d394e3b7f70f1
|
|
Speed Test:
C/SatdHighbdTest
blocksize: 16 time: 138 us
blocksize: 64 time: 315 us
blocksize: 256 time: 1120 us
blocksize: 1024 time: 3955 us
AVX2/SatdHighbdTest
blocksize: 16 time: 89 us
blocksize: 64 time: 189 us
blocksize: 256 time: 590 us
blocksize: 1024 time: 1912 us
Change-Id: I6357174462fccd589a475b13d8114b853cab5383
|
|
Speed test:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/1
Hadamard16x16[ 10 runs]: 2 us
Hadamard16x16[ 10000 runs]: 1836 us
Hadamard16x16[ 10000000 runs]: 1829451 us
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/1
Hadamard16x16[ 10 runs]: 1 us
Hadamard16x16[ 10000 runs]: 1009 us
Hadamard16x16[ 10000000 runs]: 984856 us
Change-Id: I89b9cdbe19350815576d66e627df87e5025ed0a4
|
|
|
|
|
|
Example internal stats
Before the fix:
Bitrate AVGPsnr GLBPsnr AVPsnrP GLPsnrP VPXSSIM VPSSIMP FASTSIM PSNRHVS WstPsnr WstSsim WstFast WstHVS AVPsnrY APsnrCb APsnrCr Block WstBlck Consist WstCons Time RcErr AbsErr
153.39 37.131 36.420 37.151 36.437 716.077 817.445 10.422 34.347 32.980 0.916 9.281 30.208 36.024 41.830 40.581 0.000 0.000 100.000 100.000 55006 2.26 2.26
No mismatch detected in recon buffers
After the fix:
Bitrate AVGPsnr GLBPsnr AVPsnrP GLPsnrP VPXSSIM VPSSIMP FASTSIM PSNRHVS WstPsnr WstSsim WstFast WstHVS AVPsnrY APsnrCb APsnrCr Block WstBlck Consist WstCons Time RcErr AbsErr
153.39 37.131 36.420 37.151 36.437 69.808 70.023 10.422 34.347 32.980 0.910 9.281 30.208 36.024 41.830 40.581 0.000 0.000 100.000 100.000 55067 2.26 2.26
No mismatch detected in recon buffers
Change-Id: I820abc498c1543548f193874046582b50afd0238
|
|
BUG=webm:1448
Change-Id: I2140fb9b6ce92716d2d9509f3031244088a62127
|
|
Speed tests:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/0
Hadamard8x8[ 10 runs]: 0 us
Hadamard8x8[ 10000 runs]: 316 us
Hadamard8x8[ 10000000 runs]: 311749 us
[ OK ] C/HadamardHighbdTest.DISABLED_Speed/0 (371 ms)
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/0
Hadamard8x8[ 10 runs]: 0 us
Hadamard8x8[ 10000 runs]: 161 us
Hadamard8x8[ 10000000 runs]: 156910 us
[ OK ] AVX2/HadamardHighbdTest.DISABLED_Speed/0 (160 ms)
Change-Id: I94f7324be20405ff55f8a02ad4651c4ab4c10202
|
|
This slows down low bitdepth builds but is necessary to obtain correct
values.
BUG=webm:1448
Change-Id: I4ca9145f576089bb8496fcfeedeb556dc8fe6574
|
|
|
|
Change-Id: I2f04937d8a4e171d42b25ee6c6555ccad29eb192
|
|
Calculate the high bits of dqcoeff and store them appropriately in high
bit depth builds.
Low bit depth builds still do not pass. C truncates the results after
division. X86 only supports packing with saturation at this step.
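
The mismatch described above comes from two different ways of narrowing a 32-bit value to 16 bits. The scalar sketch below contrasts them; the function names are hypothetical and the code is not taken from libvpx. On x86, `_mm_packs_epi32` is an example of a pack instruction that saturates.

```c
#include <stdint.h>

/* Truncation: keep only the low 16 bits of the value, reinterpreted as a
 * signed 16-bit number. Written to avoid implementation-defined behavior. */
static int16_t narrow_truncate(int32_t v) {
  const uint16_t low = (uint16_t)v; /* modulo 65536, well defined */
  return (int16_t)(low >= 0x8000 ? (int32_t)low - 0x10000 : (int32_t)low);
}

/* Saturation: clamp the value to the representable int16_t range. */
static int16_t narrow_saturate(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}
```

The two agree for values that fit in 16 bits and differ otherwise. For 40000, truncation gives -25536 while saturation gives 32767, so a test that compares C output against SIMD output can fail only on overflowing coefficients.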
BUG=webm:1448
Change-Id: Ic80def575136c7ca37edf18d21e26925b475da98
|
|
Calculate the high bits of dqcoeff in high bit depth builds and store
them appropriately.
BUG=webm:1448
Change-Id: I61a2f8bfcf2e30765f10a94073c4d58321d2fa24
|
|
Pave the way for new quantize_OPT.h helper files.
Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
|
|
|
|
The array index was not checked against the array bounds.
BUG=webm:1572
Change-Id: I55a93c024af77a4fd904b0e992d5587a142d66a4
|
|
Simplify max value calculation on aarch64 by using vmaxv. Much
faster for 4x4 but diminishing returns as the block size grows.
Only the vp9 quantize has a speed test hooked up. Anticipate
similar results for the other quantize versions.
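
As an illustration of the across-vector maximum mentioned above, the sketch below finds the largest of eight 16-bit values. It assumes a hypothetical helper `max_of_8`, uses the AArch64 intrinsic `vmaxvq_s16` where available, and falls back to a scalar loop elsewhere. It is not the quantize code from this commit.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* AArch64: a single across-lanes reduction returns the maximum lane. */
static int16_t max_of_8(const int16_t *values) {
  return vmaxvq_s16(vld1q_s16(values));
}
#else
/* Portable fallback with the same result, for other targets. */
static int16_t max_of_8(const int16_t *values) {
  int16_t m = values[0];
  for (int i = 1; i < 8; ++i) {
    if (values[i] > m) m = values[i];
  }
  return m;
}
#endif
```

Without an across-lanes instruction, the reduction typically takes several pairwise max steps, and that fixed cost matters most for small blocks. This is consistent with the larger gain reported for 4x4.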
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
|
|
|
|
|
|
|
|
Change-Id: Ibec078c80ca1dfe6fbbc4288db89d719dac453a7
|
|
BUG=webm:1444
Change-Id: Iee19be068afc6c81396c79218a89c469d2e66207
|
|
Always use src/ref and _ptr/_stride suffixes.
Normalize to [xy]_offset and second_pred.
Drop some stray source/recon_strides.
BUG=webm:1444
Change-Id: I32362a50988eb84464ab78686348610ea40e5c80
|