path: root/vpx_dsp
Age | Commit message | Author
2016-10-31 | idct,NEON: add a tran_low_t->s16 load adapter | James Zern
enable idct4x4* and idct8x8*, which are compatible with 8-bit decodes in high-bitdepth mode. the adapter narrows 32-bit input to 16; whether the expansion can be avoided at all in this case remains a TODO. roughly matches sse2. BUG=webm:1294 Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
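The narrowing step this commit message describes can be sketched in portable scalar C. This is only an illustration, not the code from the commit: the name `load_tran_low_to_s16` is hypothetical, `tran_low_t` is assumed to be a 32-bit integer as in high-bitdepth builds, and the actual adapter presumably does the equivalent with NEON loads and narrowing instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: in high-bitdepth builds tran_low_t is a 32-bit integer. */
typedef int32_t tran_low_t;

/* Hypothetical scalar stand-in for the load adapter: copy n coefficients,
 * narrowing each one from 32 bits to 16. For 8-bit decodes the coefficients
 * are expected to fit in int16_t already, so no information is lost. */
static void load_tran_low_to_s16(const tran_low_t *in, int16_t *out,
                                 size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) {
    out[i] = (int16_t)in[i];
  }
}
```

A 16-bit idct kernel could then run on `out` unchanged, which is the point of an adapter: the widening to 32 bits is undone at load time instead of rewriting the kernel.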
2016-10-27 | Refine 8-bit intra prediction NEON optimization (mode d45 and d135) | Linfeng Zhang
dst += stride behaving better with gcc/clang. Unroll loops. Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-26 | Merge "Refine 8-bit intra prediction NEON optimization (mode dc)" | Linfeng Zhang
2016-10-25 | Optimize idct32x32_34_add for NEON | Johann
Approximately 3 times faster than the 1024 version which was used previously. BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-24 | Refine 8-bit intra prediction NEON optimization (mode dc) | Linfeng Zhang
dst += stride behaving better with gcc/clang. Expanding inline function dc_SIZExSIZE() saves instructions for vpx_dc_predictor_SIZExSIZE_neon(). Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-22 | Merge "remove idct32x32*_add_neon.asm" | James Zern
2016-10-22 | Merge "vpx_highbd_convolve_copy_neon: use multi reg loads" | James Zern
2016-10-20 | remove idct32x32*_add_neon.asm | James Zern
the intrinsics are neutral to ~20% faster on cros/android devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the r13 ndk. neutral results typically came with gcc-4.9 while larger positive gains were achieved with clang 3.8.x. BUG=webm:1303 Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-18 | Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory" | James Zern
2016-10-18 | Merge "Optimize sad_64width_x4d_msa function" | Kaustubh Raste
2016-10-18 | Optimize sad_64width_x4d_msa function | Kaustubh Raste
Reduced HADD_UH_U32 macro calls Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa
2016-10-17 | Fix warnings reported by -Wshadow: Part1: vpx_dsp directory | Urvang Joshi
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 | vpx_highbd_convolve_copy_neon: use multi reg loads | James Zern
for copy16/32/64 BUG=webm:1299 Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 | add vpx high bitdepth convolve8 NEON intrinsics optimization | Linfeng Zhang
BUG=webm:1299 Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-13 | add vpx_highbd_convolve_{copy,avg}_neon() | Linfeng Zhang
BUG=webm:1299 Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 | Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/" | James Zern
2016-10-13 | Merge "Optimize vpx_mbpost_proc_across_ip_msa function" | Kaustubh Raste
2016-10-13 | Merge "Optimize vpx_get4x4sse_cs_msa function" | Kaustubh Raste
2016-10-12 | cosmetics,*loopfilter_neon.c: s/tranpose/transpose/ | James Zern
Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1
2016-10-11 | [vpx highbd lpf NEON 6/6] vertical 16 | Linfeng Zhang
BUG=webm:1300 Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 | [vpx highbd lpf NEON 5/6] horizontal 16 | Linfeng Zhang
BUG=webm:1300 Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 | [vpx highbd lpf NEON 4/6] vertical 8 | Linfeng Zhang
BUG=webm:1300 Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-12 | [vpx highbd lpf NEON 3/6] horizontal 8 | Linfeng Zhang
BUG=webm:1300 Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-10 | Merge "[vpx highbd lpf NEON 2/6] vertical 4" | Linfeng Zhang
2016-10-10 | Merge "[vpx highbd lpf NEON 1/6] horizontal 4" | Linfeng Zhang
2016-10-10 | Optimize vpx_mbpost_proc_across_ip_msa function | Kaustubh Raste
Removed HADD_SW_S32 calculation Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875
2016-10-10 | Optimize vpx_get4x4sse_cs_msa function | Kaustubh Raste
Reuse CALC_MSE_B macro Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648
2016-10-07 | Optimize vp9 loopfilter msa functions | Kaustubh Raste
Updated code to process in 8bit as saturation/clipping takes care of overflow. Removed unused macro. Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
2016-10-06 | [vpx highbd lpf NEON 2/6] vertical 4 | Linfeng Zhang
BUG=webm:1300 Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 | [vpx highbd lpf NEON 1/6] horizontal 4 | Linfeng Zhang
BUG=webm:1300 Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-05 | vpx_dsp/idct*_neon.asm: simplify immediate loads | James Zern
mov supports 0-65535 Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd
2016-10-05 | enable idct*_1_add_neon in high-bitdepth builds | James Zern
these are compatible as they only load one element of the input so the larger size of tran_low_t makes no difference in little endian builds. note the asm is incompatible with big-endian, but there are other points of failure there so currently it's considered unsupported. BUG=webm:1294 Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
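The endianness argument in this message can be illustrated with a small C sketch. It assumes the asm reads the single coefficient with a 16-bit load from the start of a 32-bit `tran_low_t`; both helper names are hypothetical and nothing here is taken from the commit itself.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int is_little_endian(void) {
  const uint16_t probe = 1;
  uint8_t first_byte;
  memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

/* Hypothetical helper: read only the first 16 bits stored at the address of
 * a 32-bit coefficient, the way a 16-bit load would. */
static int16_t load_first_s16(const int32_t *coeff) {
  int16_t v;
  memcpy(&v, coeff, sizeof(v));
  return v;
}
```

On a little-endian host the first two bytes hold the low half of the value, so any coefficient that fits in 16 bits is read back unchanged. On a big-endian host the same load would return the high half, which matches the message's note that the asm is incompatible there.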
2016-10-04 | Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst" | Angie Chiang
2016-10-04 | Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch" | Kaustubh Raste
2016-10-03 | Move highbd txfm input range check from 2d iht transform to 1d idct/iadst | Angie Chiang
This change will make the highbd txfm input range check more comprehensive. The 25-bit highbd input range is composed of 12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for 1D inverse transform amplification + 1 bit for contingency in rounding and quantizing. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286 BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625 Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
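The bit budget above adds up as 12 + 7 + 5 + 1 = 25. A range check built on that figure might look like the following sketch. The constant and function names are hypothetical, and reading "25-bit range" as a magnitude below 2^25 is an assumption, not something the message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit budget quoted in the commit message. */
enum {
  SIGNAL_BITS = 12,      /* signal input */
  FWD_2D_GAIN_BITS = 7,  /* 2D forward transform amplification */
  INV_1D_GAIN_BITS = 5,  /* 1D inverse transform amplification */
  CONTINGENCY_BITS = 1,  /* rounding and quantizing */
  HIGHBD_INPUT_BITS = SIGNAL_BITS + FWD_2D_GAIN_BITS + INV_1D_GAIN_BITS +
                      CONTINGENCY_BITS
};

/* Hypothetical check: returns 1 if any coefficient has a magnitude of
 * 2^25 or more (assumed interpretation of the range), 0 otherwise. */
static int has_out_of_range_input(const int32_t *input, size_t size) {
  const int64_t limit = (int64_t)1 << HIGHBD_INPUT_BITS;
  size_t i;
  for (i = 0; i < size; ++i) {
    const int64_t v = input[i];
    if (v >= limit || v <= -limit) return 1;
  }
  return 0;
}
```

Running a check like this at the start of each 1D idct/iadst, rather than once in the 2D wrapper, would also cover the intermediate values passed between the row and column passes, which seems to be the sense in which the commit calls the check more comprehensive.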
2016-10-03 | Merge "cosmetics,*_neon.c: rm redundant return from void fns" | James Zern
2016-10-03 | Fix vpx_plane_add_noise_msa functionality bit-mismatch | Kaustubh Raste
Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7
2016-10-01 | Merge "vpx_convolve8_neon,load/store*: correct param type" | James Zern
2016-10-01 | vpx_convolve8_neon,load/store*: correct param type | James Zern
stride/pitch in convolve is expressed with a ptrdiff_t Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71
2016-10-01 | Remove a stray END declaration in loopfilter_4_neon.asm | Martin Storsjo
Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69
2016-10-01 | Merge "Refactor vpx lpf NEON files (step 2/2)" | Linfeng Zhang
2016-10-01 | Merge "Refactor vpx lpf NEON files (step 1/2)" | Linfeng Zhang
2016-09-30 | cosmetics,*_neon.c: rm redundant return from void fns | James Zern
+ a couple of 'break's after a return Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015
2016-09-30 | Merge changes from topic '8bit-hbd-idct' | James Zern
* changes:
  *idct*_neon.c: add missing rtcd include
  idct,msa/neon: exclude idct files from hbd build
  *rtcd_defs.pl: remove empty specialize calls
2016-09-30 | *idct*_neon.c: add missing rtcd include | James Zern
+ correct declarations as necessary BUG=webm:1294 Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d
2016-09-30 | idct,msa/neon: exclude idct files from hbd build | James Zern
these functions are incompatible currently and unreferenced in rtcd, exclude them from the build. BUG=webm:1294 Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
2016-09-30 | Refactor vpx lpf NEON files (step 2/2) | Linfeng Zhang
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 | Refactor vpx lpf NEON files (step 1/2) | Linfeng Zhang
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-30 | Merge "Unify loopfilter function names" | Linfeng Zhang
2016-09-30 | Merge "Refine vpx convolve8 NEON intrinsics optimization" | Linfeng Zhang