path: root/vp9/common/x86
Age | Commit message | Author
2020-04-13 | simplify x86_abi_support.asm symbol declaration | Johann
Define LIBVPX_{ELF,MACHO} to simplify blocks. Create new globalsym macro and include logic for PRIVATE. BUG=webm:1679 Change-Id: I303ba1492a2813f685de51155ccef7e4831e1881
2020-04-01 | x86_abi_support: use correct hidden syntax | Johann
Chromium needs :function hidden and the space between the symbol and the colon removed, at least for nasm. This matches x86inc.asm. BUG=webm:1679 Change-Id: Ie47bb75d44d3130791639cbf4e2ebe019e2d686e
2018-02-05 | Update tx_type switch code in idct | Linfeng Zhang
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-02-05 | Add vp9_highbd_iht4x4_16_add_neon() | Linfeng Zhang
BUG=webm:1403 Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
2018-01-23 | Add vp9_highbd_iht16x16_256_add_sse4_1() | Linfeng Zhang
BUG=webm:1413 Change-Id: I8d7eeae1bd219eb848c1a86071046a477f7a91af
2018-01-23 | Add "vpx_" prefix to 2 idct x86 functions | Linfeng Zhang
Change-Id: I4f3052d8748e16b06e9155f8daf22f867dfaa7a3
2018-01-18 | Add vp9_highbd_iht8x8_64_add_sse4_1() | Linfeng Zhang
BUG=webm:1413 Change-Id: Id9038226902b2d793fc6c17ac81bb104c1a18988
2018-01-08 | Add vp9_highbd_iht4x4_16_add_sse4_1() | Linfeng Zhang
BUG=webm:1413 Change-Id: I14930d0af24370a44ab359de5bba5512eef4e29f
2017-12-01 | explicitly label .text sections | Johann
nasm should infer .text but does not for Windows: https://bugzilla.nasm.us/show_bug.cgi?id=3392451 Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb
2017-08-14 | Update 32x32 idct sse2 and ssse3 optimizations. | Linfeng Zhang
Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70
2017-06-26 | Update load_input_data() in x86 | Linfeng Zhang
Split to load_input_data4() and load_input_data8(). Use pack with signed saturation instruction for high bitdepth. Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
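As a rough illustration of what "pack with signed saturation" means, here is a scalar sketch of the per-lane behaviour of a signed-saturating 32-bit to 16-bit pack (the behaviour of the SSE2 `packssdw` instruction). It assumes the high-bitdepth coefficients are 32-bit values being narrowed to 16 bits; the helper names `pack_lane_saturate` and `pack_coeffs_saturate` are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one lane of a signed-saturating 32-to-16-bit pack.
 * Values outside the int16_t range clamp to the nearest limit
 * instead of wrapping around. */
static int16_t pack_lane_saturate(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Hypothetical helper: narrow n 32-bit coefficients to 16 bits. */
static void pack_coeffs_saturate(const int32_t *in, int16_t *out, int n) {
  assert(in != NULL && out != NULL && n >= 0);
  for (int i = 0; i < n; ++i) out[i] = pack_lane_saturate(in[i]);
}
```

A SIMD version would process several lanes per instruction; the scalar loop only shows the clamping rule.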
2017-06-13 | Convert 8x8 idct x86 macros to inline functions | Linfeng Zhang
Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce
2017-05-08 | Update 4x4 idct sse2 functions | Linfeng Zhang
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2() Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2016-08-02 | vp9/common: apply clang-format | clang-format
Change-Id: Ie0f150fdcfcbf7c4db52d3a08bc8238ed1c72e3b
2016-07-12 | deblock filter : moved from vp8 code branch | Jim Bankoski
The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9. Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-06-27 | *.asm: normalize label format | James Zern
add a trailing ':', though it's optional with the tools we support, it's more common to use it to mark a label. this also quiets the orphan-labels warning with nasm/yasm. BUG=b/29583530 Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
2016-05-04 | vp9_idct_intrin_sse2: add missing vp9_rtcd.h include | James Zern
Change-Id: I39a67ffea7b0a55b45cdf935986439537b65601f
2016-05-02 | Move vpx_add_plane from codec to vpx_dsp and dedup. | Jim Bankoski
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2015-09-28 | Accelerated transform in high bit depth | Julia Robson
When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-07-31 | Factor inverse transform functions into vpx_dsp | Jingning Han
This commit moves the module inverse transform functions from the vp9 folder to the vpx_dsp folder. The hybrid transform wrapper functions stay in the vp9 folder, since they involve codec-specific data structures. Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 | Code refactor on InterpKernel | Zoe Liu
In essence, this refactors the code for both the interpolation filtering and the convolution. The change moves all the files and renames the code from the vp9_ prefix to the vpx_ prefix accordingly, for the underlying architectures: (1) x86; (2) arm/neon; and (3) mips/msa. The work on mips/dspr2 will be done in a separate change list. Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-28 | Merge "Move intra prediction functions from vp9/common/ to vpx_dsp/" | Hui Su
2015-07-27 | Move intra prediction functions from vp9/common/ to vpx_dsp/ | hui su
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
2015-07-26 | Refactor vp9_idct.h file | Jingning Han
Separate the common coefficient constants into vpx_dsp/txfm_common.h. Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h. This removes the use of vp9_idct.h from the vpx_dsp folder. Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
2015-07-16 | Migrate loop filter functions from vp9/ to vpx_dsp/ | Jingning Han
The various tap loop filter operations are common functions across codecs. This commit moves them, along with their SIMD optimizations, to the vpx_dsp folder. Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
2015-07-14 | Add an SSE2 version of vp9_iwht4x4_16_add | Alex Converse
Roughly half as many cycles as plain C. Change-Id: I8c16c29940b76d54ee7e4fb874c328ce90bff5d4
2015-07-13 | Revert "Add an SSE2 version of vp9_iwht4x4_16_add." | Yaowu Xu
This reverts commit f8d35016408f3957c67945160d65be467ca97fdc. Change-Id: If8c7af403c091b7fb447a6f0c73fecdbccbc51b3
2015-07-08 | Add an SSE2 version of vp9_iwht4x4_16_add. | Alex Converse
80% fewer cycles than C Change-Id: I841bde1e268ddd33ae2ee75eee94737a400e2cde
2015-07-02 | VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization | levytamar82
The vp9_lpf_vertical_16_dual function was optimized for the x86 32-bit target. The hot code in that function was the call to transpose8x16: the gcc-generated assembly created unneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer instruction closer to the producer instructions, we eliminated most of the fills and spills and improved the function-level performance by 17%. Credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com). Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
2015-06-03 | Optimize the idct assembly code. | hkuang
Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e
2015-05-22 | vp9: move ssse3 convolve fns to intrinsics file | James Zern
+ synchronize filter function signatures. This makes any intrinsics filters available for inlining and has the side effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46
2015-05-22 | vp9: move avx2 convolve fns to intrinsics file | James Zern
+ synchronize filter function signatures. This makes any intrinsics filters available for inlining and has the side effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2
2015-05-22 | add vp9/common/x86/convolve.h | James Zern
collect the vp9_convolve function definition macros there; this will allow some relocation of functions from vp9_asm_stubs.c Change-Id: Idadd117fa256dd48748379856973fd985b8204e8
2015-05-22 | vp9_subpixel_8t_intrin_ssse3: quiet vs9 warning | James Zern
reorder includes to avoid: warning C4985: 'ceil': attributes not present on previous declaration. this is the same workaround used in vp9/common/vp9_systemdependent.h Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6
2015-05-15 | vp9 intrinsics: add vp9_rtcd include | James Zern
silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
2015-05-13 | Relocate memory operations for common code | Johann
With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-08 | Merge "Add more sse2 code for intra prediction." | hkuang
2015-05-07 | replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED | James Zern
this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
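The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the macro from the source tree: `SKETCH_DECLARE_ALIGNED`, `SKETCH_HAVE_ALIGN`, and `aligned_buffer_ok` are hypothetical names, and the sketch assumes a GCC-compatible or MSVC compiler for the attribute branches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: request n-byte alignment where the compiler offers
 * an attribute, otherwise fall back to a plain declaration. */
#if defined(__GNUC__)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#define SKETCH_HAVE_ALIGN 1
#elif defined(_MSC_VER)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#define SKETCH_HAVE_ALIGN 1
#else
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val
#define SKETCH_HAVE_ALIGN 0
#endif

/* Returns 1 if a local buffer declared with 16-byte alignment really is
 * aligned. Without an alignment attribute nothing is promised, so the
 * check is skipped and 1 is returned. */
static int aligned_buffer_ok(void) {
  SKETCH_DECLARE_ALIGNED(16, int16_t, buf[64]);
  buf[0] = 0;
  if (!SKETCH_HAVE_ALIGN) return 1;
  return ((uintptr_t)buf % 16) == 0;
}
```

In a C-only build the plain declaration is enough, because only code that hands the buffer to aligned SIMD loads and stores depends on the alignment.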
2015-05-06 | Add more sse2 code for intra prediction. | hkuang
vp9_dc_left_predictor_16x16 vp9_dc_top_predictor_32x32 vp9_dc_left_predictor_32x32 vp9_dc_128_predictor_32x32 Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6
2015-05-05 | fix and enable vp9_dc_128_predictor_16x16 | James Zern
widen the loads and stores to 128-bit. this was added, but not enabled in: 493a857 Add some sse2 code for intra prediction. Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
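For context, a DC-128 predictor fills the block with the mid-grey value 128 for 8-bit content. The sketch below is a plain-C reference under that assumption, with the hypothetical name `dc_128_predictor_16x16_sketch`. Each row is 16 bytes, which is why a single 128-bit store per row is the natural width for the SSE2 version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: fill a 16x16 block with the value 128.
 * One row is 16 bytes, i.e. exactly one 128-bit store in a SIMD version. */
static void dc_128_predictor_16x16_sketch(uint8_t *dst, ptrdiff_t stride) {
  assert(dst != NULL);
  for (int r = 0; r < 16; ++r) {
    memset(dst, 128, 16); /* write one full row */
    dst += stride;        /* move to the next row */
  }
}
```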
2015-05-05 | Merge "Add some sse2 code for intra prediction." | hkuang
2015-05-01 | vp9_idct_intrin_sse2: cosmetics: reindent | James Zern
+ fix some whitespace Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2
2015-04-30 | vp9: RECON_AND_STORE4X4: remove dest offset | James Zern
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
2015-04-30 | vp9_idct_intrin_*: RECON_AND_STORE: remove dest offset | James Zern
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly. additionally reroll 16x16/32x32 loops to reduce register spill with this new format Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
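The shape of the two RECON_AND_STORE changes can be sketched in scalar C. All names here are hypothetical and the real code uses SSE2 intrinsics inside macros. The sketch only contrasts the two addressing styles and shows that they produce identical output; it does not reproduce the code-generation difference reported in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a reconstructed value to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add one row of 4 residuals to the prediction stored at dst. */
static void recon_and_store_row(uint8_t *dst, const int16_t *res) {
  assert(dst != NULL && res != NULL);
  for (int c = 0; c < 4; ++c) dst[c] = clip_pixel_sketch(dst[c] + res[c]);
}

/* Before: each row address is computed as dst + r * stride. */
static void recon4x4_offset(uint8_t *dst, ptrdiff_t stride,
                            const int16_t *res) {
  for (int r = 0; r < 4; ++r)
    recon_and_store_row(dst + r * stride, res + 4 * r);
}

/* After: the row helper takes no offset; the caller advances dst. */
static void recon4x4_advance(uint8_t *dst, ptrdiff_t stride,
                             const int16_t *res) {
  for (int r = 0; r < 4; ++r) {
    recon_and_store_row(dst, res + 4 * r);
    dst += stride;
  }
}
```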
2015-04-30 | Add some sse2 code for intra prediction. | hkuang
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
2015-04-30 | Remove vp9_idct16x16_10_add_ssse3() | Yaowu Xu
The rotation computation using 2X of cos(pi/16) can overflow 32 bits. This commit disables the function to allow further investigation and optimization. Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4" | Jingning Han
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3" | Jingning Han
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2" | Jingning Han
2015-02-26 | Fix high bit-depth loop-filter sse2 compiling issue - part 3 | Jingning Han
Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69