path: root/vpx_dsp
Age | Commit message | Author
2015-10-14 | Upstream Mozilla fix for older Apple clang builds | Johann
Also use the _mm_broadcastsi128_si256 intrinsic for Apple clang versions 4.[012] https://bugzilla.mozilla.org/show_bug.cgi?id=1085607 https://code.google.com/p/webm/issues/detail?id=1082 Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88
2015-10-13 | Fix compiler warnings | hui su
Change-Id: I761256a8100d83abf1b937f3739580237e3fad2a
2015-10-09 | Add vpx_highbd_convolve_{copy,avg}_sse2 | Alex Converse
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall
Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
2015-10-09 | Remove 4 mova insts from quantize_ssse3_x86_64.asm | Geza Lore
Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb
2015-10-06 | SSSE3 optimisation for quantize in high bit depth | Julia Robson
When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSSE3 optimisations. Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
2015-10-06 | Merge "VPX: refactor vpx_idct32x32_1_add_sse2()" | Scott LaVarnway
2015-10-05 | SSE2 optimisation for quantize in high bit depth | Julia Robson
When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSE2 optimisation (but not the SSSE3 optimisation). Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
2015-10-05 | VPX: refactor vpx_idct32x32_1_add_sse2() | Scott LaVarnway
Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3
2015-10-02 | Merge "vp10: reimplement d45/4x4 to match vp8 instead of vp9." | Ronald S. Bultje
2015-10-02 | Merge "Accelerated transform in high bit depth" | Debargha Mukherjee
2015-10-01 | vp10: reimplement d45/4x4 to match vp8 instead of vp9. | Ronald S. Bultje
This is more a proof of concept than anything else. The problem here isn't so much how to code it, but rather where to place the resulting code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10 specific intra pred functions to live there, or in vp10/? See issue 1015. Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
2015-09-30 | vp8: change build_intra4x4_predictors() to use vpx_dsp. | Ronald S. Bultje
I've added a few new functions (d45e, d63e, he, ve) to cover the filtered h/v 4x4 predictors that are vp8-specific, the "correct" d45 with the correctly filtered bottom-right pixel (as opposed to the unfiltered version in vp9), and the "broken" d63 with weirdly filtered bottom-right pixels (which is correctly filtered in vp9). There may be a minor performance impact on all systems because we have to do an extra copy of the Above pixel array to incorporate the topleft pixel in the same array (thus fitting the vpx_dsp API). In addition, armv6 will have a more serious performance impact b/c I removed the armv6/vp8-specific assembly. I'm not sure anyone cares... Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
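The extra copy described in this message can be pictured with a small C sketch. The helper name `build_above_row`, the buffer layout, and the 8-pixel row length are assumptions made for illustration, not code taken from the commit; the only point is that the top-left pixel is stored immediately before the above row, so a predictor can read it at index -1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed row length: 4 above pixels plus 4 above-right pixels. */
#define ABOVE_LEN 8

/* Hypothetical helper: writes top_left followed by the above row into buf
 * and returns a pointer to the first above pixel, so that result[-1] is
 * the top-left pixel. buf must hold ABOVE_LEN + 1 bytes. */
static const uint8_t *build_above_row(uint8_t *buf, uint8_t top_left,
                                      const uint8_t *above) {
  assert(buf != NULL && above != NULL);
  buf[0] = top_left;                 /* becomes index -1 of the returned row */
  memcpy(buf + 1, above, ABOVE_LEN); /* the extra copy the message mentions */
  return buf + 1;
}
```

The cost is one small `memcpy` per block, which is in line with the "minor performance impact" the message anticipates.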
2015-09-30 | vp8: change build_intra_predictors_mby_s to use vpx_dsp. | Ronald S. Bultje
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
2015-09-28 | Accelerated transform in high bit depth | Julia Robson
When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-18 | Remove vpx_filter_block1d16_v8_intrin_ssse3 | Johann
This was rewritten and moved to vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm in 195883023bb39b5ee5c6811a316ab96d9225034d Change-Id: I117ce983dae12006e302679ba7f175573dd9e874
2015-09-17 | vpx_subpixel_8t_ssse3: fix reg counts/access | James Zern
fixes build on windows x64; previously 'heightq' i.e., the 64-bit register was accessed when only the 32-bit value was needed. given this is from a stack variable the upper bits were undefined. + bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in 64-bit builds and filter_block1d16_v* uses one extra temp variable Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
2015-09-16 | vp10: code sign bit before absolute value in non-arithcoded header. | Ronald S. Bultje
For reading, this makes the operation branchless, although it still requires two shifts. For writing, this makes the operation as fast as writing an unsigned value, branchlessly. This is also how other codecs typically code signed, non-arithcoded bitstream elements. See issue 1039. Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead
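One way to get a branchless read that costs two shifts is plain sign extension. The sketch below assumes a two's-complement field of `bits + 1` bits whose leading bit is the sign. The commit's actual bit layout is not shown on this page and may differ, and both function names are hypothetical. The decode also relies on arithmetic right shift of negative values, which is implementation-defined in C, although mainstream compilers behave this way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs value into a (bits + 1)-wide field.
 * A single mask, no branch, same cost as writing an unsigned value. */
static uint32_t encode_signed_field(int32_t value, int bits) {
  assert(bits >= 1 && bits <= 30);
  return (uint32_t)value & ((1u << (bits + 1)) - 1u);
}

/* Hypothetical helper: recovers the signed value from the raw field.
 * Shift the sign bit up to bit 31, then shift back down so it is
 * replicated into the upper bits (assumes arithmetic right shift). */
static int32_t decode_signed_field(uint32_t raw, int bits) {
  assert(bits >= 1 && bits <= 30);
  const int shift = 32 - (bits + 1);
  return (int32_t)(raw << shift) >> shift;
}
```

With `bits` set to 4, the value -3 is written as the 5-bit pattern 11101 and decodes back to -3 without any conditional.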
2015-09-08 | Remove some trailing whitespaces | Debargha Mukherjee
Change-Id: Icf06d35ca347713253d1eba341a894b51efa81a9
2015-09-03 | VPX: subpixel_8t_ssse3 asm using x86inc | Scott LaVarnway
This is based on the original patch optimized for 32bit platforms by Tamar/Ilya and now uses the x86inc style asm. The assembly was also modified to support 64bit platforms. Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
2015-08-31 | Include vpx_dsp_common.h when using VPXMIN/MAX | Johann
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
2015-08-28 | Expand the idct4_c() function in idct8_c() | Angie Chiang
Change-Id: I5afa3c351ba7c5e7deb3889f7471619ac60af255
2015-08-27 | Merge changes I53b5bdc5,Ib81168a7,Ie0113945 | Johann Koenig
* changes:
  Only build ssse3 filter functions on 64 bit
  Clean up unused function warnings in vp8 encoder
  Clean up unused function warnings in vp8 onyx_if.c
2015-08-27 | Merge "Add sse2 versions of halfpix variance" | Johann Koenig
2015-08-27 | Add sse2 versions of halfpix variance | Johann
These were lost in the great sub pixel variance move of 6a82f0d7fb9ee908c389e8d55444bbaed3d54e9c. Not having these functions caused a ~10% performance regression in some realtime vp8 encodes. Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
2015-08-26 | vpx_dsp_common: add VPX prefix to MIN/MAX | James Zern
prevents redeclaration warnings; vp8 has its own define which will be resolved in a future commit Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
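As an illustration of why the prefix helps, here is a minimal C sketch. The macro bodies are the conventional ternary definitions and are assumed rather than copied from vpx_dsp_common.h; `clamp_int` is a hypothetical helper added only to show usage.

```c
#include <assert.h>

/* Prefixed names cannot collide with a MIN/MAX that another header
 * (or vp8's own define) has already introduced. */
#define VPXMIN(x, y) (((x) < (y)) ? (x) : (y))
#define VPXMAX(x, y) (((x) > (y)) ? (x) : (y))

/* Hypothetical helper: clamps value to the range [low, high]. */
static int clamp_int(int value, int low, int high) {
  assert(low <= high);
  return VPXMAX(low, VPXMIN(value, high));
}
```

As with any function-like macro of this shape, arguments may be evaluated twice, so callers should avoid passing expressions with side effects.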
2015-08-26 | Only build ssse3 filter functions on 64 bit | Johann
Avoid an unused function warning by only building the functions when they will be used. Change-Id: I53b5bdc5a180c79d63b34e4c8921d679bbc54009
2015-08-21 | Merge "VPX: scaled convolve : fix windows build errors" | Scott LaVarnway
2015-08-20 | VPX: scaled convolve : fix windows build errors | Scott LaVarnway
Change-Id: Ic81d435ea928183197040cdf64b6afd7dbaf57e4
2015-08-19 | Merge "VPX ssse3 scaled convolve" | Scott LaVarnway
2015-08-19 | Merge "Rename inv_txfm_sse2.asm to inv_wht_sse2.asm" | Jingning Han
2015-08-19 | Rename inv_txfm_sse2.asm to inv_wht_sse2.asm | Jingning Han
Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70
2015-08-18 | VPX ssse3 scaled convolve | Scott LaVarnway
Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd
2015-08-17 | Turn on dspr2 loop filter functions in vpx_dsp | Jingning Han
Add the dspr2 files to vpx_dsp.mk and enable these functions in vpx_dsp_rtcd_defs.pl file. Change-Id: I79feb5af24f174f4a0788dc6f3b6df7f4e1fa467
2015-08-14 | Merge changes I2fe52bfb,I5e5084eb | James Zern
* changes:
  VPX: removed filter == 128 checks from mips convolve code
  VPX: removed step checks from mips convolve code
2015-08-14 | Merge "VPX: removed step checks from neon convolve code" | James Zern
2015-08-14 | vpx_highbd_ssim_parms_8x8: make parameter types consistent | Yaowu Xu
Change-Id: Ie1fe6603232adc22dbe4d51bd1008c856a6d40ca
2015-08-13 | VPX: removed filter == 128 checks from mips convolve code | Scott LaVarnway
The check is handled by the predictor table. Change-Id: I2fe52bfbbfccb2edd13ba250986e3a4b4b589459
2015-08-13 | VPX: removed step checks from mips convolve code | Scott LaVarnway
The check is handled by the predictor table. Change-Id: I5e5084ebb46be8087c8c9d80b5f76e919a1cd05b
2015-08-12 | VPX: removed step checks from neon convolve code | Scott LaVarnway
The check is handled by the predictor table. Change-Id: I42479f843e77a2d40cdcdfc9e2e6c48a05a36561
2015-08-12 | Merge "VPX: remove step == 16 and filter[3] != 128 checks" | Scott LaVarnway
2015-08-12 | Merge "fix build w/only mmx+sse enabled" | James Zern
2015-08-11 | Fork VP9 and VP10 codebase | Jingning Han
This commit forks the VP9 and VP10 codebase and makes libvpx support VP8, VP9, and VP10. Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356
2015-08-11 | fix build w/only mmx+sse enabled | James Zern
many _sse2.asm have sse implementations as well Change-Id: Idfa1f5cab593e4913aaad37f7223e8430188c44a
2015-08-11 | Merge "VPX: remove scaled calls from FUN_CONV_1D" | Scott LaVarnway
2015-08-11 | Merge "VPX: Add rtcd support for scaling." | Scott LaVarnway
2015-08-11 | Merge "Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h" | Aℓex Converse
2015-08-10 | Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h | Alex Converse
Use system_state.h in vpx_dsp and remove unneeded includes of vp9_systemdependent.h. Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
2015-08-10 | Merge changes from topic 'x86inc' | James Zern
* changes:
  Only use .text sections for aout
  Use newer x86inc.asm
  Use .text instead of .rodata on macho
  Copy PIC handling code from x86_abi_support
  Set 'private_extern' visibility for macho targets
  Avoid 'amdnop' when building with nasm
  Catch all elf formats
  Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
  Use libvpx defines to set name mangling rules
  Customize x86inc.asm for libvpx
2015-08-10 | VPX: remove step == 16 and filter[3] != 128 checks | Scott LaVarnway
from FUN_CONV_1D and FUN_CONV_2D macros. The functions will not be called with these inputs. Change-Id: I67ec75e4edafc0acee70190521a80ea85dfa521b
2015-08-10 | fastssim: Add some missing consts | Alex Converse
Change-Id: Id36f180032c8a92c686da6f716a7468332b23b94