path: root/vpx_dsp/x86/convolve.h
Age  Commit message  (Author)
2020-02-14  move common attribute defs to compiler_attributes.h  (James Zern)
BUG=b/148271109
Change-Id: I620e26ff1233fcd34ebe0723cb913e82eb58271c
2019-11-23  Disable -ftrivial-auto-var-init= for hot code  (Vitaly Buka)
This helps to improve some benchmarks by 10%, e.g. decode_time PCFullStackTest.VP9SVC_3SL_Low
Bug: 1020220, 977230
Change-Id: Ic992f1eec369f46a08e19eb33bc3a7c15c1e7c87
2019-01-15  Remove unnecessary calculation in 4-tap interpolation filter  (chiyotsai)
Reduces the number of rows calculated for 2D 4-tap interpolation filter from h+7 rows to h+3 rows. Also fixes a bug in the avx2 function for 4-tap filters where the last row is computed incorrectly.
Performance:
            | Baseline | Result   | Pct Gain |
bitdepth lo | 4.00 fps | 4.02 fps | 0.5%     |
bitdepth 10 | 1.90 fps | 1.91 fps | 0.5%     |
The performance is evaluated on speed 1 on jets.y4m br 500 over 100 frames. No BDBR loss is observed.
Change-Id: I90b0d4d697319b7bba599f03c5dc01abd85d13b1
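The row counts in this message follow from the usual arithmetic for a separable 2D filter: a vertical pass with N taps reads N - 1 rows beyond the h output rows, so the horizontal pass has to produce h + N - 1 rows. A minimal sketch in C, assuming unscaled filtering; the helper name intermediate_rows is hypothetical and does not appear in convolve.h:

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: number of horizontally
 * filtered rows that an unscaled vertical pass with `taps` taps needs
 * in order to produce `h` output rows. */
static int intermediate_rows(int h, int taps) {
  assert(h > 0 && taps > 0);
  return h + taps - 1;
}
```

For a 64-row block this gives 71 rows with an 8-tap filter (h+7) and 67 rows with a 4-tap filter (h+3), so the 4-tap path computes four fewer intermediate rows.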
2018-11-01  clang-tidy: fix vpx_dsp parameters  (Johann)
BUG=webm:1444
Change-Id: Iee19be068afc6c81396c79218a89c469d2e66207
2018-10-29  Add AVX2 support for hbd 4-tap interpolation filter.  (chiyotsai)
Speed gain:
BIT DEPTH | 8TAP FPS | 4TAP FPS | PCT INC |
10        | 1.69     | 1.85     | 9.46%   |
12        | 1.64     | 1.78     | 8.54%   |
Speed test is done on jet.y4m on speed 1 profile 2 over 100 frames with br=500.
Change-Id: I411e122553e2c466be7a26e64b4dd144efb884a9
2018-10-17  Adds SSE2 support for interpolation filter for width 4 and 8  (chiyotsai)
Performance: The chart below shows the speed relative to baseline (baseline_time/new_time)
_____| 4X4 | 8X8 |16X16|64X64|
2 DIM|1.889|1.780|1.811|1.963|
 HORZ|2.266|1.834|1.617|1.595|
 VERI|2.043|2.190|2.373|2.485|
Change-Id: Ic4262222db78f013b94a8c61b46efb8520722927
2018-10-17  Add SSE2 support for 4-tap interpolation filter for width 16.  (chiyotsai)
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c
2018-09-15  cosmetics: normalize include guards  (James Zern)
use the recommended format [1] of: <PROJECT>_<PATH>_<FILE>_H_
[1] https://google.github.io/styleguide/cppguide.html#The__define_Guard
"All header files should have #define guards to prevent multiple inclusion. The format of the symbol name should be <PROJECT>_<PATH>_<FILE>_H_."
Change-Id: I2e8ab0b32fb23c30fa43cff5fec12d043c0d2037
2017-09-05  Remove get_filter_base() and get_filter_offset() in convolve  (Linfeng Zhang)
so that the convolve functions are independent of table alignment.
Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-04-25  Update highbd convolve functions arguments to use uint16_t src/dst  (Linfeng Zhang)
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-19  Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve  (Linfeng Zhang)
Replace by CAST_TO_BYTEPTR/SHORTPTR. The rule is: if a short ptr is cast to a byte ptr, any offset operation on the byte ptr must be doubled. We do this by casting to short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
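The offset rule in this message can be illustrated with a short C sketch. The two macros below are local stand-ins, assumed to be plain casts as their names suggest rather than copied from libvpx, and advance_pixels is a hypothetical helper:

```c
#include <stdint.h>

/* Local stand-ins, assumed to be plain pointer casts (not copied from
 * the libvpx headers). */
#define CAST_TO_SHORTPTR(x) ((uint16_t *)(x))
#define CAST_TO_BYTEPTR(x) ((uint8_t *)(x))

/* Hypothetical helper: advance a byte pointer that really addresses
 * 16-bit pixels by n pixels. Casting to a short pointer first lets the
 * pointer arithmetic do the doubling (n pixels == 2 * n bytes). */
static uint8_t *advance_pixels(uint8_t *p8, int n) {
  return CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(p8) + n);
}
```

Adding n directly to the byte pointer would move only n bytes, which is half the intended distance and lands mid-pixel when n is odd.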
2016-09-27  Add compiler warning flag -Wextra and fix related warnings.  (Urvang Joshi)
Note: some of these warnings are enabled by a combination of -Wunused (added earlier) and -Wextra. Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16 Expands use of (void)x; on unused variables. AOM only supports one codec in codec_factory.h Does not include changes to HandleDecodeResult. AOM removed invalid_file_test.cc which does use the video parameter. Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-07-25  vpx_dsp: apply clang-format  (clang-format)
Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975
2016-02-25  x86/convolve.h: remove redundant check in FUN_CONV_2D  (James Zern)
the filter will be the same in this case
Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
2016-02-25  x86/convolve.h: replace while w/if for w < 16  (James Zern)
in non-hbd configurations; any high-bitdepth changes will be done in a follow-up
Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
2016-02-24  x86/convolve.h: change filter[] || chains to |  (James Zern)
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
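A chain of || and a chain of | give the same answer when the only question is whether any tap is nonzero. The bitwise form evaluates every operand and has no short-circuit branches. The commit message does not state the motivation, so reading it as a branch-reduction change is an assumption. A minimal sketch in C with hypothetical function names and a three-tap check chosen only for illustration:

```c
#include <stdint.h>

/* Logical form: stops at the first nonzero tap (short-circuit). */
static int any_tap_logical(const int16_t *filter) {
  return filter[0] || filter[1] || filter[2];
}

/* Bitwise form: ORs all taps together, then tests the result once.
 * The OR of several integers is nonzero exactly when at least one of
 * them is nonzero, so the truth value matches the logical form. */
static int any_tap_bitwise(const int16_t *filter) {
  return (filter[0] | filter[1] | filter[2]) != 0;
}
```

The two functions agree for every input, including negative tap values, because any nonzero operand contributes at least one set bit to the OR.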
2015-08-10  VPX: remove step == 16 and filter[3] != 128 checks  (Scott LaVarnway)
from FUN_CONV_1D and FUN_CONV_2D macros. The functions will not be called with these inputs.
Change-Id: I67ec75e4edafc0acee70190521a80ea85dfa521b
2015-08-05  VPX: remove scaled calls from FUN_CONV_1D  (Scott LaVarnway)
and FUN_CONV_2D macros. The predict lut now handles this case. The encoder now calls vpx_scaled_2d() instead of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
2015-08-03  VPX: Add rtcd support for scaling.  (Scott LaVarnway)
Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2
2015-07-31  Code refactor on InterpKernel  (Zoe Liu)
This change refactors the code for both interpolation filtering and convolution. It moves all the files and renames the vp9_ prefix to vpx_ accordingly, for the underlying architectures: (1) x86; (2) arm/neon; and (3) mips/msa. The work on mips/dspr2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46