path: root/vpx_dsp/vpx_dsp.mk
Age | Commit message | Author
2016-11-08 | Optimize idct32x32_135_add for NEON | Johann
BUG=webm:1295 Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 | Merge "Add high bitdepth intra prediction NEON optimization (mode dc)" | Linfeng Zhang
2016-11-04 | Extract high bit depth helper functions | Johann
These can be used in the vp9 fdct as well.
Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-01 | Add high bitdepth intra prediction NEON optimization (mode dc) | Linfeng Zhang
BUG=webm:1316 Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-10-31 | idct,NEON: add a tran_low_t->s16 load adapter | James Zern
enable idct4x4* and idct8x8*, which are compatible with 8-bit decodes in high-bitdepth mode. the adapter narrows 32-bit input to 16 bits; whether the expansion can be avoided at all in this case remains a TODO. roughly matches sse2.
BUG=webm:1294 Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
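The narrowing step can be modeled in scalar C. This is a hedged sketch, not libvpx's code: the function name is hypothetical, tran_low_t is assumed to be int32_t (its width in high-bitdepth builds), and a NEON version would presumably load with vld1q_s32 and narrow with vmovn_s32 instead of looping.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit coefficients, as in high-bitdepth builds. */
typedef int32_t tran_low_t;

/* Hypothetical scalar model of a tran_low_t -> s16 load adapter: each
 * 32-bit coefficient is narrowed to int16_t so existing 16-bit idct code
 * can consume it unchanged. The narrowing is only lossless when every
 * value already fits in 16 bits, which is assumed (and asserted) here,
 * matching the 8-bit-decode case the commit targets. */
static void load_tran_low_to_s16(const tran_low_t *src, int16_t *dst, int n) {
  for (int i = 0; i < n; ++i) {
    assert(src[i] >= INT16_MIN && src[i] <= INT16_MAX);
    dst[i] = (int16_t)src[i];
  }
}
```

The extra pass over the coefficients is the "expansion" cost the commit leaves as a TODO: the data is widened to 32 bits by the high-bitdepth build and immediately narrowed again.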
2016-10-25 | Optimize idct32x32_34_add for NEON | Johann
Approximately 3 times faster than the 1024 version, which was used previously.
BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-20 | remove idct32x32*_add_neon.asm | James Zern
the intrinsics are neutral to ~20% faster on cros/android devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the r13 ndk. neutral results typically came with gcc-4.9 while larger positive gains were achieved with clang 3.8.x.
BUG=webm:1303 Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-17 | add vpx high bitdepth convolve8 NEON intrinsics optimization | Linfeng Zhang
BUG=webm:1299 Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-13 | add vpx_highbd_convolve_{copy,avg}_neon() | Linfeng Zhang
BUG=webm:1299 Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-06 | [vpx highbd lpf NEON 1/6] horizontal 4 | Linfeng Zhang
BUG=webm:1300 Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-05 | enable idct*_1_add_neon in high-bitdepth builds | James Zern
these are compatible because they only load one element of the input, so the larger size of tran_low_t makes no difference in little-endian builds. note the asm is incompatible with big-endian, but there are other points of failure there, so big-endian is currently considered unsupported.
BUG=webm:1294 Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
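The one-element argument can be sketched in C: a DC-only ("_1") 4x4 inverse transform reads nothing but the first coefficient, and a 16-bit load from a 32-bit buffer returns the low half of that coefficient only on a little-endian host. All names here are hypothetical; the constants (cospi_16_64 = 11585, 14-bit rounding, final rounding by 16) are assumed to mirror libvpx's C reference, WRAPLOW wrapping is omitted, and the example assumes a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 32-bit coefficients (high-bitdepth layout), 8-bit pixels. */
typedef int32_t tran_low_t;
typedef int64_t tran_high_t;

#define DCT_CONST_BITS 14
static const tran_high_t cospi_16_64 = 11585; /* round(2^14 * cos(pi / 4)) */

static tran_high_t dct_const_round_shift(tran_high_t x) {
  return (x + ((tran_high_t)1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}

static uint8_t clip_pixel_add(uint8_t dest, tran_high_t delta) {
  const tran_high_t v = (tran_high_t)dest + delta;
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Read the DC coefficient the way 16-bit-coefficient code would: a 16-bit
 * load from the start of the buffer. On little-endian this is the low half
 * of the 32-bit input[0]; on big-endian it would be the high half. */
static int16_t load_dc_as_s16(const tran_low_t *input) {
  int16_t dc;
  memcpy(&dc, input, sizeof(dc));
  return dc;
}

/* Hypothetical DC-only 4x4 inverse DCT + add: only the first coefficient
 * is read, and one rounded DC value is added to all 16 pixels. */
static void idct4x4_1_add_sketch(const tran_low_t *input, uint8_t *dest,
                                 int stride) {
  assert(stride >= 4);
  tran_high_t out =
      dct_const_round_shift(load_dc_as_s16(input) * cospi_16_64);
  out = dct_const_round_shift(out * cospi_16_64);
  const tran_high_t a1 = (out + 8) >> 4; /* ROUND_POWER_OF_TWO(out, 4) */
  for (int r = 0; r < 4; ++r, dest += stride)
    for (int c = 0; c < 4; ++c) dest[c] = clip_pixel_add(dest[c], a1);
}
```

With a DC of 64, the two rounded multiplies give 45 and then 32, so every pixel is raised by 2.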
2016-10-01 | Merge "Refactor vpx lpf NEON files (step 2/2)" | Linfeng Zhang
2016-10-01 | Merge "Refactor vpx lpf NEON files (step 1/2)" | Linfeng Zhang
2016-09-30 | Merge changes from topic '8bit-hbd-idct' | James Zern
* changes:
  *idct*_neon.c: add missing rtcd include
  idct,msa/neon: exclude idct files from hbd build
  *rtcd_defs.pl: remove empty specialize calls
2016-09-30 | idct,msa/neon: exclude idct files from hbd build | James Zern
these functions are currently incompatible and unreferenced in rtcd; exclude them from the build.
BUG=webm:1294 Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
2016-09-30 | Refactor vpx lpf NEON files (step 2/2) | Linfeng Zhang
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 | Refactor vpx lpf NEON files (step 1/2) | Linfeng Zhang
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-29 | Refine vpx convolve8 NEON intrinsics optimization | Linfeng Zhang
BUG=webm:1290 Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-19 | Refactor lpf (size 4 and 8) NEON intrinsics optimization | Linfeng Zhang
Also check in the 8x8 8-bit transpose NEON intrinsics optimization transpose_u8_8x8().
Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-08-23 | Remove halfpix specialization | Johann
This function only exists as a shortcut to subpixel variance with predefined offsets: xoffset = 4 for horizontal, yoffset = 4 for vertical, and both for "hv". Removing it allows the existing optimizations for the variance functions to be called: instead of having only sse2 optimizations, this gives sse2, ssse3, msa and neon.
BUG=webm:1273 Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
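A minimal C sketch of the relationship described above: a generic two-pass bilinear sub-pixel variance, plus "halfpix" wrappers that do nothing but fix the offsets. The 1/8-pel bilinear table, FILTER_BITS = 7 rounding, and the horizontal-then-vertical structure are assumed to mirror libvpx's C reference; every function name is hypothetical, block size is capped at 16x16, and no SIMD path is shown.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7

/* Two-tap bilinear kernels indexed by offset in 1/8-pel steps;
 * offset 4 is the half-pel kernel {64, 64}. */
static const uint8_t bilinear_filters[8][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
};

/* Filter src horizontally by xoffset, then vertically by yoffset, and return
 * the variance of (filtered - ref). src must supply w + 1 columns and h + 1
 * rows because each tap pair reads one pixel past the block. */
static uint32_t subpel_variance(const uint8_t *src, int src_stride,
                                int xoffset, int yoffset, const uint8_t *ref,
                                int ref_stride, int w, int h, uint32_t *sse) {
  assert(w > 0 && h > 0 && w <= 16 && h <= 16);
  assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);
  uint16_t first[17 * 16]; /* (h + 1) rows of horizontally filtered pixels */
  const uint8_t *hf = bilinear_filters[xoffset];
  const uint8_t *vf = bilinear_filters[yoffset];
  for (int r = 0; r < h + 1; ++r)
    for (int c = 0; c < w; ++c)
      first[r * w + c] = (uint16_t)((src[r * src_stride + c] * hf[0] +
                                     src[r * src_stride + c + 1] * hf[1] +
                                     (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c) {
      const int pred = (first[r * w + c] * vf[0] +
                        first[(r + 1) * w + c] * vf[1] +
                        (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
      const int diff = pred - ref[r * ref_stride + c];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  *sse = (uint32_t)sq;
  /* variance = SSE - sum^2 / N */
  return (uint32_t)(sq - (uint64_t)((sum * sum) / (w * h)));
}

/* "halfpix" variants are just fixed-offset calls (hypothetical names). */
static uint32_t halfpix_variance_h(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride, int w,
                                   int h, uint32_t *sse) {
  return subpel_variance(src, src_stride, 4, 0, ref, ref_stride, w, h, sse);
}

static uint32_t halfpix_variance_v(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride, int w,
                                   int h, uint32_t *sse) {
  return subpel_variance(src, src_stride, 0, 4, ref, ref_stride, w, h, sse);
}

static uint32_t halfpix_variance_hv(const uint8_t *src, int src_stride,
                                    const uint8_t *ref, int ref_stride, int w,
                                    int h, uint32_t *sse) {
  return subpel_variance(src, src_stride, 4, 4, ref, ref_stride, w, h, sse);
}
```

Because the wrappers add no logic of their own, any optimized sub-pixel variance kernel serves the half-pel cases too, which is why dropping the dedicated halfpix versions loses nothing.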
2016-08-12 | NEON intrinsics for 4 loopfilter functions | Linfeng Zhang
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()
BUG=webm:1262, webm:1263, webm:1264, webm:1265.
Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-04 | Merge "Remove armv6 target" | Johann Koenig
2016-08-04 | Remove armv6 target | Johann
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-08-04 | Extract neon transpose for re-use | Johann
Change-Id: I5e1c7f4c80d1c6f7fd582ac468c6eaaa3603a06c
2016-07-13 | postproc - move filling of noise buffer to vpx_dsp. | Jim Bankoski
Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635
2016-07-12 | deblock filter : moved from vp8 code branch | Jim Bankoski
The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9.
Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-07-07 | Merge "Support measure distortion in the pixel domain" | Jingning Han
2016-07-06 | Support measure distortion in the pixel domain | Jingning Han
Use pixel domain distortion metric in speed 0. This improves the compression performance by 0.3% for both low and high resolution test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-06-29 | vpx_dsp: remove x86inc.asm distinction | Johann
BUG=b:29583530 Change-Id: I397d77536b0d3cee0a92cdfe8b76bc4e434d0720
2016-06-24 | Port metric computation changes from nextgenv2 | Yaowu Xu
Change-Id: I4aceffcdf7af59ffeb51984f0345c3a4c7e76a9f
2016-06-17 | remove vp10 | James Zern
development has moved to the nextgenv2 branch, and a snapshot from here was used to seed aomedia.
BUG=b/29457125 Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
2016-06-02 | vpx_dsp,add_noise: remove mmx implementation | James Zern
an sse2 version exists; this is a reasonable modern baseline.
Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33
2016-06-02 | vpx_dsp: remove mmx variance implementations | James Zern
there are sse2 equivalents for all remaining variance implementations.
Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2
2016-05-27 | Merge "Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2" | Linfeng Zhang
2016-05-26 | Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 | Linfeng Zhang
Followed the code style of other lpf functions. These 2 functions put 2 rows of data in a single xmm register, so they have similar but not identical filter operations, and cannot share the same macros.
Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
2016-05-18 | Merge "neon hadamard 8x8" | Johann Koenig
2016-05-16 | neon hadamard 8x8 | Johann
Runs about 30% faster than the C version.
BUG=webm:1021 Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
2016-05-11 | remove mmx sad functions | Linfeng Zhang
there are sse2 equivalents, and sse2 is a reasonable modern baseline.
Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
2016-05-10 | vpx_dsp: Rename postproc.c add_noise. | Jim Bankoski
Change-Id: I4906d1b79a2951e659995202b9fa97e2ea5cfba0
2016-05-02 | Move vpx_add_plane from codec to vpx_dsp and dedup. | Jim Bankoski
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-03-08 | VPX: loopfilter_mmx.asm using x86inc 2 | Scott LaVarnway
This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61. Fixes a decoder mismatch with 32bit PIC builds.
Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-04 | Revert "VPX: loopfilter_mmx.asm using x86inc" | James Zern
This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f.
breaks 32-bit pic builds.
Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-02-18 | VPX: loopfilter_mmx.asm using x86inc | Scott LaVarnway
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
2015-12-14 | move vp9_avg to vpx_dsp | James Zern
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-10-20 | Optimize vpx_quantize_{b,b_32x32} assembler. | Geza Lore
Added optimization of the 8-bit assembly quantizer routines. This makes these functions up to 100% faster, depending on encoding parameters. This patch makes the encoder faster in both the high bitdepth and 8-bit configurations. In the high bitdepth configuration, it affects profile 0 only. Based on my profiling using 1080p input, the net gain is between 1% and 3% for the 8-bit config, and around 2.5-4.5% for the high bitdepth config, depending on target bitrate. The difference between the 8-bit and high bitdepth configurations for the same encoder run is reduced by 1% in all cases I have profiled.
Change-Id: I86714a6b7364da20cd468cd784247009663a5140
2015-09-30 | vp8: change build_intra_predictors_mby_s to use vpx_dsp. | Ronald S. Bultje
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
2015-09-03 | VPX: subpixel_8t_ssse3 asm using x86inc | Scott LaVarnway
This is based on the original patch optimized for 32bit platforms by Tamar/Ilya and now uses the x86inc style asm. The assembly was also modified to support 64bit platforms.
Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
2015-08-27 | Add sse2 versions of halfpix variance | Johann
These were lost in the great sub pixel variance move of 6a82f0d7fb9ee908c389e8d55444bbaed3d54e9c. Not having these functions caused a ~10% performance regression in some realtime vp8 encodes.
Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
2015-08-19 | Merge "Rename inv_txfm_sse2.asm to inv_wht_sse2.asm" | Jingning Han
2015-08-19 | Rename inv_txfm_sse2.asm to inv_wht_sse2.asm | Jingning Han
Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70