path: root/vpx_dsp
Age | Commit message | Author

2019-02-06 | Use wide integer to avoid overflow | Yaowu Xu
BUG=webm:1270
Change-Id: I7d56667d946196bbbe355303de805422e40b0763
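The commit title does not show the code involved, so the following is only a generic sketch of the technique it names: widening an operand to 64 bits before multiplying, so that neither the product nor the running sum can overflow a 32-bit type. The function name `sum_squares_wide` is hypothetical and is not taken from libvpx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sum of squares accumulated in a 64-bit integer.
 * Casting one operand to int64_t BEFORE the multiply makes the product
 * 64 bits wide; casting the result afterwards would be too late. */
int64_t sum_squares_wide(const int32_t *v, int n) {
  int64_t acc = 0;
  int i;
  assert(v && n >= 0);
  for (i = 0; i < n; ++i) {
    acc += (int64_t)v[i] * v[i];
  }
  return acc;
}
```

With two inputs of 65536 each square is 2^32, which no longer fits in a 32-bit integer, while the 64-bit accumulator holds the sum without trouble.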
2019-02-05 | ppc: use c89 loop declaration | Johann
Change-Id: Ib8ca37f1b58e9903e7efa29689a0a49f14b4d73a

2019-01-23 | mips: resolve missing declarations | Johann
Exclude low bit depth optimizations from high bit depth builds.
BUG=webm:1584
Change-Id: I86a7ebafa557d262257358e1e055a06d52659977

2019-01-16 | Merge "mips highbd: resolve missing declarations" | Johann Koenig

2019-01-15 | mips: add rtcd.h to resolve missing declarations | Johann
BUG=webm:1584
Change-Id: Ifdebf33356abcc6869f695d129165ba17e042dcd

2019-01-15 | mips highbd: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I4cbfafe8ea72b3d4523aabcaed4848fa29bb19fe

2019-01-15 | Remove unnecessary calculation in 4-tap interpolation filter | chiyotsai
Reduces the number of rows calculated for 2D 4-tap interpolation filter from h+7 rows to h+3 rows. Also fixes a bug in the avx2 function for 4-tap filters where the last row is computed incorrectly.

Performance:
            | Baseline | Result   | Pct Gain |
bitdepth lo | 4.00 fps | 4.02 fps | 0.5%     |
bitdepth 10 | 1.90 fps | 1.91 fps | 0.5%     |

The performance is evaluated on speed 1 on jets.y4m br 500 over 100 frames. No BDBR loss is observed.
Change-Id: I90b0d4d697319b7bba599f03c5dc01abd85d13b1
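The h+7 and h+3 figures in the commit above follow from how a separable 2D filter is usually organised: the horizontal pass has to produce every row that the vertical pass will read, which is h + taps - 1 rows for h output rows. A minimal sketch of that arithmetic, assuming the unscaled case and that h+7 corresponds to an 8-tap kernel; the function names are hypothetical:

```c
#include <assert.h>

/* Rows the horizontal pass must produce so that a vertical filter with
 * `taps` coefficients can emit `h` output rows: h + taps - 1. */
int intermediate_rows(int h, int taps) {
  assert(h > 0 && taps > 0);
  return h + taps - 1;
}

/* Rows saved per block when the kernel has 4 taps instead of 8. */
int rows_saved(int h) {
  return intermediate_rows(h, 8) - intermediate_rows(h, 4);
}
```

The saving is a constant 4 rows per block, so it matters most for small block heights, which may be why the measured gain over a whole encode is modest.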
2019-01-09 | highbd idct: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I596f5f0e1a1c152493cd8177b32d416cc79937e0

2019-01-08 | ppc: resolve missing declarations | Johann
Add rtcd headers and make local functions static.
BUG=webm:1584
Change-Id: Ic19aec1dc90703b0b89d1092baee487d0fd0cb4e

2019-01-07 | Merge "vpx_filter: resolve missing declarations" | Johann Koenig

2019-01-07 | Merge "arm neon: resolve missing declarations" | Johann Koenig

2019-01-07 | arm neon: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3

2019-01-07 | vpx_filter: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I1be768446b9304123da7b1ea0aed0db056db31c5

2019-01-07 | Fix OOB memory access on fuzzed data | kyslov
vp8_norm table has 256 elements while index to it can be higher on fuzzed data. Typecasting it to unsigned char will ensure valid range and will trigger proper error later. Also declaring "shift" as unsigned char to avoid UB sanitizer warning
BUG=b/122373286,b/122373822,b/122371119
Change-Id: I3cef1d07f107f061b1504976a405fa0865afe9f5
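The fix described above can be illustrated with a small sketch: a 256-entry table is indexed through a cast to unsigned char, so a corrupt index wraps into range rather than reading past the end. The table here is a hypothetical stand-in for vp8_norm, and the names `norm_table`, `init_norm_table` and `lookup_shift` are invented for the illustration; 8-bit chars are assumed.

```c
#include <assert.h>

/* Hypothetical stand-in for a 256-entry lookup table such as vp8_norm. */
static unsigned char norm_table[256];

/* Fill each entry with the left shift needed to bring the index up to at
 * least 128; entry 0 is left as 0. */
void init_norm_table(void) {
  int i;
  assert(sizeof(norm_table) == 256);
  norm_table[0] = 0;
  for (i = 1; i < 256; ++i) {
    unsigned char shift = 0;
    int v = i;
    while (v < 128) {
      v <<= 1;
      ++shift;
    }
    norm_table[i] = shift;
  }
}

/* `range` may exceed 255 on corrupt input. The cast keeps only the low
 * 8 bits, so the index always falls inside the table. */
unsigned char lookup_shift(unsigned int range) {
  return norm_table[(unsigned char)range];
}
```

The value returned for an out-of-range index is not meaningful; the cast only guarantees a valid read, leaving detection of the corrupt stream to later checks, as the commit message describes.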
2018-12-24 | Merge "fwd_dct32x32 avx2: resolve missing declarations" | Johann Koenig

2018-12-21 | Merge "fwd_dct32x32 sse2: resolve missing declarations" | Johann Koenig

2018-12-21 | fwd_dct32x32 avx2: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: Iaba854952534a95e710a985acfcab46e093872c2

2018-12-21 | fwd_dct32x32 sse2: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: Ia2d9fcbccbad0c2142a3759e610670b86af0fef4

2018-12-21 | convolve avx2: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I5990c0100af83d13f7a4800147473bc997f5e5d1

2018-12-21 | Merge "subpixel_8t sse2: resolve missing declarations" | Johann Koenig

2018-12-21 | subpixel_8t ssse3: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I48b9a9cdcfe52536f685c41fb2d3c0f3e9192d34

2018-12-21 | subpixel_8t sse2: resolve missing declarations | Johann
vpx_asm_stubs.c only references these sse2 functions. Combine the files similar to the way the ssse3/avx2 files are set up. Mark the intrinsics as static because they are only used within the macros here. It is unfortunate that the assembly functions can not be marked static as well.
BUG=webm:1584
Change-Id: I342687a1046ae6ca46ae58644a7c170440de1dfb
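The "resolve missing declarations" commits in this log apply two remedies: include the header that declares an exported function, or mark a file-local function static. The sketch below shows both in one file. It assumes the warning involved is of the kind produced by GCC's -Wmissing-prototypes (the log does not name the flag), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a prototype that would normally come from a generated
 * header such as an rtcd header; with it in scope, the definition below
 * no longer lacks a prior declaration. */
int example_sum_abs_c(const int16_t *coeff, int length);

/* Helper used only in this file. `static` gives it internal linkage, so
 * no separate declaration is expected for it. */
static int abs_int(int v) { return v < 0 ? -v : v; }

int example_sum_abs_c(const int16_t *coeff, int length) {
  int sum = 0;
  int i;
  assert(coeff && length >= 0);
  for (i = 0; i < length; ++i) {
    sum += abs_int(coeff[i]);
  }
  return sum;
}
```

Functions implemented in assembly have no C definition to annotate, which fits the commit's remark that they cannot be marked static.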
2018-12-21 | subpixel_8t avx2: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: I92504ed4a2e54129c981b7380249962afb7966df

2018-12-21 | Merge "highbd quantize: resolve missing declarations" | Johann Koenig

2018-12-20 | highbd quantize: resolve missing declarations | Johann
BUG=webm:1584
Change-Id: Ia3f152bf2a37f8a1ea4178eeb1a6a262ea034a8d

2018-12-20 | highbd variance: resolve missing declarations | Johann
The optimizations were accidentally disabled during the move from vp9

commit c3bdffb0a508ad08d5dfa613c029f368d4293d4c
author Johann <johannkoenig@google.com> Fri May 15 18:52:03 2015

Move variance functions to vpx_dsp

subpel functions will be moved in another patch.

BUG=webm:1584
Change-Id: Ia7899ee0cfad13a0e1516b89756552064846e81c
2018-12-08 | Merge "Add satd avx2 implementation" | Sai Deng

2018-12-07 | Add high bit Hadamard 32x32 avx2 implementation | sdeng
Speed test:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/2
Hadamard32x32[ 10 runs]: 9 us
Hadamard32x32[ 10000 runs]: 8914 us
Hadamard32x32[ 10000000 runs]: 8991776 us
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/2
Hadamard32x32[ 10 runs]: 5 us
Hadamard32x32[ 10000 runs]: 4582 us
Hadamard32x32[ 10000000 runs]: 4548203 us
Change-Id: Ied1b38b510bd033299f05869216d394e3b7f70f1

2018-12-06 | Add satd avx2 implementation | sdeng
Speed Test:
C/SatdHighbdTest
blocksize: 16 time: 138 us
blocksize: 64 time: 315 us
blocksize: 256 time: 1120 us
blocksize: 1024 time: 3955 us
AVX2/SatdHighbdTest
blocksize: 16 time: 89 us
blocksize: 64 time: 189 us
blocksize: 256 time: 590 us
blocksize: 1024 time: 1912 us
Change-Id: I6357174462fccd589a475b13d8114b853cab5383

2018-12-05 | Add high bit Hadamard 16x16 avx2 implementation | sdeng
Speed test:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/1
Hadamard16x16[ 10 runs]: 2 us
Hadamard16x16[ 10000 runs]: 1836 us
Hadamard16x16[ 10000000 runs]: 1829451 us
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/1
Hadamard16x16[ 10 runs]: 1 us
Hadamard16x16[ 10000 runs]: 1009 us
Hadamard16x16[ 10000000 runs]: 984856 us
Change-Id: I89b9cdbe19350815576d66e627df87e5025ed0a4
2018-12-05 | Merge "quantize neon: fix hbd builds" | Johann Koenig

2018-12-05 | Merge "Fix overflow in calculating highbd SSIM" | Sai Deng

2018-12-05 | Fix overflow in calculating highbd SSIM | sdeng
Example internal stats

Before the fix:
Bitrate AVGPsnr GLBPsnr AVPsnrP GLPsnrP VPXSSIM VPSSIMP FASTSIM PSNRHVS WstPsnr WstSsim WstFast WstHVS  AVPsnrY APsnrCb APsnrCr Block   WstBlck Consist WstCons Time    RcErr   AbsErr
153.39  37.131  36.420  37.151  36.437  716.077 817.445 10.422  34.347  32.980  0.916   9.281   30.208  36.024  41.830  40.581  0.000   0.000   100.000 100.000 55006   2.26    2.26
No mismatch detected in recon buffers

After the fix:
Bitrate AVGPsnr GLBPsnr AVPsnrP GLPsnrP VPXSSIM VPSSIMP FASTSIM PSNRHVS WstPsnr WstSsim WstFast WstHVS  AVPsnrY APsnrCb APsnrCr Block   WstBlck Consist WstCons Time    RcErr   AbsErr
153.39  37.131  36.420  37.151  36.437  69.808  70.023  10.422  34.347  32.980  0.910   9.281   30.208  36.024  41.830  40.581  0.000   0.000   100.000 100.000 55067   2.26    2.26
No mismatch detected in recon buffers

Change-Id: I820abc498c1543548f193874046582b50afd0238
2018-12-03 | quantize neon: fix hbd builds | Johann
BUG=webm:1448
Change-Id: I2140fb9b6ce92716d2d9509f3031244088a62127

2018-12-03 | Add high bit Hadamard 8x8 avx2 implementation | sdeng
Speed tests:
[ RUN ] C/HadamardHighbdTest.DISABLED_Speed/0
Hadamard8x8[ 10 runs]: 0 us
Hadamard8x8[ 10000 runs]: 316 us
Hadamard8x8[ 10000000 runs]: 311749 us
[ OK ] C/HadamardHighbdTest.DISABLED_Speed/0 (371 ms)
[ RUN ] AVX2/HadamardHighbdTest.DISABLED_Speed/0
Hadamard8x8[ 10 runs]: 0 us
Hadamard8x8[ 10000 runs]: 161 us
Hadamard8x8[ 10000000 runs]: 156910 us
[ OK ] AVX2/HadamardHighbdTest.DISABLED_Speed/0 (160 ms)
Change-Id: I94f7324be20405ff55f8a02ad4651c4ab4c10202

2018-11-30 | quantize 32x32: saturate dqcoeff on x86 | Johann
This slows down low bitdepth builds but is necessary to obtain correct values.
BUG=webm:1448
Change-Id: I4ca9145f576089bb8496fcfeedeb556dc8fe6574

2018-11-30 | Merge "Use 16 bit ints in Hadamard highbd col8 first pass" | Sai Deng

2018-11-29 | Use 16 bit ints in Hadamard highbd col8 first pass | sdeng
Change-Id: I2f04937d8a4e171d42b25ee6c6555ccad29eb192

2018-11-28 | quantize 32x32: fix dqcoeff | Johann
Calculate the high bits of dqcoeff and store them appropriately in high bit depth builds. Low bit depth builds still do not pass. C truncates the results after division. X86 only supports packing with saturation at this step.
BUG=webm:1448
Change-Id: Ic80def575136c7ca37edf18d21e26925b475da98
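The mismatch described above comes from two different ways of narrowing a 32-bit value to 16 bits. The sketch below contrasts them in portable C. It is a generic illustration rather than the libvpx quantizer, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Truncation: keep the low 16 bits and reinterpret them as signed. */
int16_t narrow_truncate(int32_t v) {
  const uint32_t low = (uint32_t)v & 0xFFFFu;
  return (low >= 0x8000u) ? (int16_t)((int32_t)low - 0x10000)
                          : (int16_t)low;
}

/* Saturation: clamp to the int16_t range, which is the behaviour of
 * saturating pack instructions such as SSE2 _mm_packs_epi32. */
int16_t narrow_saturate(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* The two agree for in-range values and differ once the value no longer
 * fits in 16 bits. */
void narrowing_demo(void) {
  assert(narrow_truncate(1234) == narrow_saturate(1234));
  assert(narrow_truncate(40000) == -25536);
  assert(narrow_saturate(40000) == 32767);
}
```

Because the two only diverge for out-of-range values, a test suite that compares a C path against a SIMD path can pass on typical inputs and still fail on extreme coefficients.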
2018-11-28 | quantize: fix x86 hbd builds | Johann
Calculate the high bits of dqcoeff in high bit depth builds and store them appropriately.
BUG=webm:1448
Change-Id: I61a2f8bfcf2e30765f10a94073c4d58321d2fa24

2018-11-27 | rename quantize_x86.h | Johann
Pave the way for new quantize_OPT.h helper files.
Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f

2018-11-20 | Merge "Fix oob in vpx_setup_noise" | Jerome Jiang

2018-11-16 | Fix oob in vpx_setup_noise | Jerome Jiang
Array index wasn't checked on boundary.
BUG=webm:1572
Change-Id: I55a93c024af77a4fd904b0e992d5587a142d66a4

2018-11-12 | quantize: use aarch64 vmaxv | Johann
Simplify max value calculation on aarch64 by using vmaxv. Much faster for 4x4 but diminishing returns as the block size grows. Only the vp9 quantize has a speed test hooked up. Anticipate similar results for the other quantize versions.

Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )

After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )

Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
2018-11-05 | Merge "clang-tidy: fix vpx_dsp parameters" | Johann Koenig

2018-11-02 | Merge "vpx postproc: rewrite in intrinsics" | Johann Koenig

2018-11-02 | Merge "Add highbd Hadamard transform C implementations" | Sai Deng

2018-11-01 | Add highbd Hadamard transform C implementations | sdeng
Change-Id: Ibec078c80ca1dfe6fbbc4288db89d719dac453a7

2018-11-01 | clang-tidy: fix vpx_dsp parameters | Johann
BUG=webm:1444
Change-Id: Iee19be068afc6c81396c79218a89c469d2e66207

2018-10-31 | clang-tidy: normalize variance functions | Johann
Always use src/ref and _ptr/_stride suffixes. Normalize to [xy]_offset and second_pred. Drop some stray source/recon_strides.
BUG=webm:1444
Change-Id: I32362a50988eb84464ab78686348610ea40e5c80