path: root/vpx_dsp/vpx_dsp.mk
Age | Commit message | Author
2018-07-26 | Add New Neon Assemblies for Motion Compensation | Venkatarama NG. Avadhani
Commit adds neon assemblies for motion compensation which show an improvement over the existing neon code.
Performance Improvement -
Platform    Resolution  1 Thread  4 Threads
Nexus 6     720p        12.16%    7.21%
@2.65 GHz   1080p       18.00%    15.28%
Change-Id: Ic0b0412eeb01c8317642b20bb99092c2f5baba37
2018-06-27 | VSX Version of fdct32x32_rd | Luc Trudeau
Low bit depth version only. Passes the Trans32x32Test test suite.
Trans32x32Test Speed Test (POWER9 Model 2.2)
32x32 C time = 212.7 ms (±0.1 ms), VSX time = 82.3 ms (±0.0 ms) [2.6x]
Change-Id: If906ec9b56ce3818cae0cc462c7277284ab29859
2018-06-08 | Implement subtract_block for VSX | Luca Barbato
~2x speedup or better.
[ RUN ] C/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 365.1 ms ( ±2.2 ms )
[ BENCH ] 8x4 258.5 ms ( ±0.3 ms )
[ BENCH ] 4x8 202.7 ms ( ±0.2 ms )
[ BENCH ] 8x8 162.2 ms ( ±0.5 ms )
[ BENCH ] 16x8 138.8 ms ( ±0.3 ms )
[ BENCH ] 8x16 121.5 ms ( ±0.4 ms )
[ BENCH ] 16x16 110.2 ms ( ±0.5 ms )
[ BENCH ] 32x16 104.8 ms ( ±0.1 ms )
[ BENCH ] 16x32 32.7 ms ( ±0.1 ms )
[ BENCH ] 32x32 30.0 ms ( ±0.0 ms )
[ BENCH ] 64x32 28.7 ms ( ±0.0 ms )
[ BENCH ] 32x64 20.1 ms ( ±0.0 ms )
[ BENCH ] 64x64 19.3 ms ( ±0.0 ms )
[ RUN ] VSX/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 155.3 ms ( ±0.9 ms )
[ BENCH ] 8x4 99.3 ms ( ±0.4 ms )
[ BENCH ] 4x8 77.2 ms ( ±0.1 ms )
[ BENCH ] 8x8 45.7 ms ( ±0.0 ms )
[ BENCH ] 16x8 34.1 ms ( ±0.0 ms )
[ BENCH ] 8x16 29.5 ms ( ±0.0 ms )
[ BENCH ] 16x16 19.9 ms ( ±0.0 ms )
[ BENCH ] 32x16 15.1 ms ( ±0.0 ms )
[ BENCH ] 16x32 16.7 ms ( ±0.0 ms )
[ BENCH ] 32x32 14.1 ms ( ±0.0 ms )
[ BENCH ] 64x32 12.6 ms ( ±0.0 ms )
[ BENCH ] 32x64 12.0 ms ( ±0.0 ms )
[ BENCH ] 64x64 11.2 ms ( ±0.0 ms )
Change-Id: I89ce12b6475871dc9e8fde84d0b6fe5c420c28c7
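For readers unfamiliar with the operation being benchmarked here, block subtraction computes the per-pixel residual between a source block and its prediction. The following is only a minimal scalar sketch of that idea; the name `subtract_block_sketch` and its parameter layout are assumptions for illustration, not the code this commit adds, and a SIMD version would process many pixels per instruction instead of one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: diff[r][c] = src[r][c] - pred[r][c].
 * Strides are given in elements and may exceed cols. */
void subtract_block_sketch(int rows, int cols,
                           int16_t *diff, ptrdiff_t diff_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           const uint8_t *pred, ptrdiff_t pred_stride) {
  assert(rows > 0 && cols > 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      /* 8-bit inputs widen to int, so the difference fits in 16 bits. */
      diff[c] = (int16_t)(src[c] - pred[c]);
    }
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```

Calling it on a 2x2 block with `src = {10, 20, 30, 40}` and `pred = {12, 20, 25, 255}` (all strides 2) fills `diff` with the signed residuals, which is why the output type is wider than the input.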
2018-05-29 | VSX version of vpx_mbpost_proc_down | Luc Trudeau
Low bit depth version only. Passes the VpxMbPostProcDownTest.
VpxMbPostProcDownTest Speed Test (POWER8 Model 2.1)
Full calculations: C time = 195.4 ms, VSX time = 33.7 ms (5.8x)
Change-Id: If1aca7c135de036a1ab7923c0d1e6733bfe27ef7
2018-05-09 | VSX version of vpx_quantize_b_vsx | Luc Trudeau
Low bit depth version only. Passes the VP9QuantizeTest. Change-Id: I6546f872864bd404a7e353348b0554aab1de5bf0
2018-05-07 | Add vpx_sum_squares_2d_i16_neon() | Linfeng Zhang
Perf shows CPU time of this function dropped from 0.81% to 0.15%. Change-Id: I8a7649ca5c15af2fc65cfb848f5befa0cc5e64f2
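As a rough illustration of what a 2D sum of squares over 16-bit samples computes, here is a scalar sketch. The function name `sum_squares_2d_i16_sketch`, its square-block assumption, and its parameter order are hypothetical and are not taken from this commit; the Neon version referred to above would vectorize the inner loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch: sum of v*v over a size x size block of
 * int16_t samples stored with the given row stride (in elements). */
uint64_t sum_squares_2d_i16_sketch(const int16_t *src, int stride, int size) {
  assert(size > 0 && stride >= size);
  uint64_t sum = 0;
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      /* The square of any int16_t value fits in 32 bits. */
      const int32_t v = src[r * stride + c];
      sum += (uint64_t)(v * v);
    }
  }
  return sum;
}
```

The accumulator is 64 bits wide so that large blocks of large-magnitude samples cannot overflow it.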
2018-03-28 | vp9: [loongson] optimize vpx_convolve8 with mmi. | gxw
1. vpx_convolve8_vert_mmi
2. vpx_convolve8_horiz_mmi
3. vpx_convolve8_mmi
4. vpx_convolve8_avg_mmi
5. vpx_convolve8_avg_vert_mmi
Change-Id: I41a6b3b4f327d6b67d282e0163cfa0aee8648abe
2018-02-05 | Add vp9_highbd_iht4x4_16_add_neon() | Linfeng Zhang
BUG=webm:1403 Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
2017-11-27 | quantize x86: dedup some parts | Johann
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
2017-11-03 | Support building AVX-512 and implement sadx4 for AVX-512 | Kyle Siefring
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X. Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-10-19 | Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()" | Scott LaVarnway
2017-10-18 | vpx: [x86] add vpx_hadamard_16x16_avx2() | Scott LaVarnway
This version is ~1.91x faster than the sse2 version. When highbitdepth is enabled, it is ~1.74x. Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-17 | Refactor x86/vpx_subpixel_8t_intrin_avx2.c | Kyle Siefring
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
2017-10-03 | Refactor x86/vpx_subpixel_8t_intrin_ssse3.c | Linfeng Zhang
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 | Add vpx_dsp/x86/mem_sse2.h | Linfeng Zhang
Add some load and store sse2 inline functions. Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
2017-09-26 | Add vpx_scaled_2d_neon() | Linfeng Zhang
BUG=webm:1419 Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-12 | Revert "Revert "quantize avx: copy 32x32 implementation"" | Johann
This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-11 | vpxdsp: [x86] add highbd_d207_predictor functions | Scott LaVarnway
C vs SSE2 speed gains:
_4x4 : ~2.31x
C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x
BUG=webm:1411
Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-08 | Merge "vpxdsp: [loongson] optimize sad functions with mmi" | Shiyou Yin
2017-09-06 | Refactor convolve8 NEON functions | Linfeng Zhang
Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c
2017-09-02 | vpxdsp: [loongson] optimize sad functions with mmi | Shiyou Yin
1. vpx_sadWxH_c
2. vpx_sadWxH_avg_c
3. vpx_sadWxHx3_c
4. vpx_sadWxHx8_c
5. vpx_sadWxHx4d_c
Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a
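The functions listed above are variants of the sum of absolute differences (SAD) between a source block and a reference block. A minimal scalar sketch of the basic WxH case follows; the name `sad_sketch` and its signature are assumptions made for illustration and do not reproduce the listed functions or the MMI code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch: sum of |src - ref| over a width x height
 * block. Strides are in elements and may exceed width. */
unsigned int sad_sketch(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride,
                        int width, int height) {
  assert(width > 0 && height > 0);
  unsigned int sad = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      /* uint8_t operands promote to int, so the difference can be negative. */
      sad += (unsigned int)abs(src[c] - ref[c]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

For a 2x2 block with `src = {1, 2, 3, 4}` and `ref = {4, 2, 1, 10}`, the per-pixel absolute differences are 3, 0, 2 and 6.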
2017-08-28 | vpxdsp: [x86] add highbd_h_predictor functions | Scott LaVarnway
C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x
BUG=webm:1422
Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993
2017-08-25 | Revert "quantize avx: copy 32x32 implementation" | Marco Paniconi
This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.
Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-24 | quantize avx: copy 32x32 implementation | Johann
Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 | quantize ssse3: copy implementation to intrinsics | Johann
Still does not pass tests. Does match the previous assembly, although saving the sign before multiplying is dubious. Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 | Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi." | Shiyou Yin
2017-08-23 | quantize avx: copy implementation to intrinsics | Johann
Adds an early exit based on ptest. Slightly slower than ssse3 in the full case because of the extra check, but potentially faster if lots of rows can be skipped. Very close in speed to the assembly. Can run in 32 bit, unlike the assembly. Allows reworking the function prototype to use structs. Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 | vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi. | Shiyou Yin
Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2
2017-08-22 | Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi." | Shiyou Yin
2017-08-18 | vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi. | Shiyou Yin
Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e
2017-08-14 | Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} | Linfeng Zhang
BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
2017-08-10 | Merge "quantize: copy ssse3 optimizations to intrinsics" | Johann Koenig
2017-08-08 | quantize: copy ssse3 optimizations to intrinsics | Johann
Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
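To make the role of pabsw and psignw concrete, the sketch below emulates one 16-bit lane of each instruction in plain C and uses them in a toy "take magnitude, scale, restore sign" step. The helper names and the shift-based `quantize_lane_sketch` are hypothetical and deliberately much simpler than the real quantizer; they only illustrate why a cheap absolute value and sign transfer are useful in this kind of code.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of pabsw: absolute value. INT16_MIN has no positive
 * counterpart in 16 bits and is returned unchanged, as the instruction does. */
int16_t abs16_lane(int16_t x) {
  if (x == INT16_MIN) return INT16_MIN;
  return (int16_t)(x < 0 ? -x : x);
}

/* One 16-bit lane of psignw: negate a if b < 0, zero it if b == 0,
 * otherwise pass a through. */
int16_t sign16_lane(int16_t a, int16_t b) {
  if (b == 0) return 0;
  if (b > 0) return a;
  if (a == INT16_MIN) return INT16_MIN;
  return (int16_t)-a;
}

/* Hypothetical toy quantizer (NOT the libvpx algorithm): scale the
 * magnitude with a right shift, then copy the original sign back. */
int16_t quantize_lane_sketch(int16_t coeff, int shift) {
  assert(shift >= 0 && shift < 15);
  assert(coeff != INT16_MIN);
  const int16_t mag = abs16_lane(coeff);
  const int16_t q = (int16_t)(mag >> shift);
  return sign16_lane(q, coeff);
}
```

Working on the magnitude makes the result symmetric around zero: with a shift of 1, an input of -5 becomes -2, just as 5 becomes 2.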
2017-08-04 | Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 | Linfeng Zhang
BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-04 | vpx_dsp: merge avx2 variance files | Scott LaVarnway
BUG=webm:1404 Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
2017-07-31 | neon: vpx_quantize_b | Johann
With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-10 | neon: consolidate horizontal adds | Johann
Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8
2017-07-01 | ppc: Add vpx_idct4x4_16_add_vsx | Alexandra Hájková
Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8
2017-06-30 | Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7 | Linfeng Zhang
* changes:
  Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
  sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
  Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
  Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-29 | Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1 | Linfeng Zhang
BUG=webm:1412 Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 | Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h | Linfeng Zhang
Also clean highbd_inv_txfm_sse2.h
BUG=webm:1412
Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 | Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h | Linfeng Zhang
BUG=webm:1412 Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-28 | partial fdct neon: move 8x8_1 and enable hbd tests | Johann
The function was originally written with HBD in mind. Enable it and configure the tests. BUG=webm:1424 Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
2017-06-23 | Add vpx_highbd_idct4x4_16_add_sse4_1() | Linfeng Zhang
BUG=webm:1412 Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-22 | fdct32x32 neon implementation | Johann
Almost 3x faster in constrained loop testing. Over 10x faster in HBD builds. BUG=webm:1424 Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9
2017-06-08 | Merge "Merge skin detection code in vp8/9." | Jerome Jiang
2017-06-08 | Merge "fdct16x16 neon optimization" | Johann Koenig
2017-06-07 | Merge skin detection code in vp8/9. | Jerome Jiang
BUG=webm:1438 Change-Id: Ie3dc034c7dbb498a0b088a767b1936ddeed4df14
2017-06-07 | fdct16x16 neon optimization | Johann
Roughly 2x speedup. Since the only change for HBD is to store(), the improvement appears to hold there as well. BUG=webm:1424 Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-05-30 | comp_avg_pred neon: used by sub pixel avg variance | Johann
BUG=webm:1423 Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804