path: root/vp9/common/x86
Age Commit message Author
2015-06-03Optimize the idct assembly code.hkuang
Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e
2015-05-22vp9: move ssse3 convolve fns to intrinsics fileJames Zern
+ synchronize filter function signatures. This makes any intrinsics filters available for inlining and has the side effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46
2015-05-22vp9: move avx2 convolve fns to intrinsics fileJames Zern
+ synchronize filter function signatures. This makes any intrinsics filters available for inlining and has the side effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2
2015-05-22add vp9/common/x86/convolve.hJames Zern
collect the vp9_convolve function definition macros there; this will allow some relocation of functions from vp9_asm_stubs.c Change-Id: Idadd117fa256dd48748379856973fd985b8204e8
2015-05-22vp9_subpixel_8t_intrin_ssse3: quiet vs9 warningJames Zern
reorder includes to avoid: warning C4985: 'ceil': attributes not present on previous declaration. this is the same workaround used in vp9/common/vp9_systemdependent.h Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6
2015-05-15vp9 intrinsics: add vp9_rtcd includeJames Zern
silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
2015-05-13Relocate memory operations for common codeJohann
With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-08Merge "Add more sse2 code for intra prediction."hkuang
2015-05-07replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNEDJames Zern
this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
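A minimal sketch of how an alignment macro of this kind is commonly defined, assuming GCC/Clang and MSVC attribute syntax. It is not copied from the tree; the helpers is_aligned and aligned_buffer_demo are hypothetical and exist only to show the macro in use. The last branch is the case the commit message describes, where no alignment attribute is available and the declaration is left plain.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of an alignment macro; the three branches are assumptions
 * about typical compiler support, not the project's exact header. */
#if defined(__GNUC__)
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
/* No alignment attribute: plain declaration, acceptable for C-only builds. */
#define DECLARE_ALIGNED(n, typ, val) typ val
#endif

/* Hypothetical helper: returns 1 when p is a multiple of n bytes. */
static int is_aligned(const void *p, uintptr_t n) {
  assert(n != 0);
  return ((uintptr_t)p % n) == 0;
}

/* Hypothetical example: a 16-byte-aligned buffer of 64 coefficients,
 * the kind of storage that would be handed to SSE2 assembly. */
static int aligned_buffer_demo(void) {
  DECLARE_ALIGNED(16, int16_t, coeffs[64]);
  coeffs[0] = 0;
  return is_aligned(coeffs, 16);
}
```

With an alignment attribute present, aligned_buffer_demo() returns 1; in the fallback branch the result depends on where the compiler happens to place the array.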
2015-05-06Add more sse2 code for intra prediction.hkuang
vp9_dc_left_predictor_16x16, vp9_dc_top_predictor_32x32, vp9_dc_left_predictor_32x32, vp9_dc_128_predictor_32x32. Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6
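For orientation, the DC predictor variants named above can be sketched in scalar C. This is an illustrative reference, not the SSE2 code from the commit: the function names ending in _sketch, the fill_block helper, and the generic block-size parameter bs are all assumptions made for this example. Each variant fills the whole block with one value: a constant 128, or the rounded average of the left column or the row above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a bs x bs block with one value, row by row. */
static void fill_block(uint8_t *dst, ptrdiff_t stride, int bs, int value) {
  int r;
  assert(bs > 0);
  for (r = 0; r < bs; ++r) {
    memset(dst, value, (size_t)bs);
    dst += stride;
  }
}

/* dc_128: no neighbours are used, every pixel becomes mid-grey. */
static void dc_128_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs) {
  fill_block(dst, stride, bs, 128);
}

/* dc_left: every pixel becomes the rounded mean of the left column. */
static void dc_left_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                     const uint8_t *left) {
  int i, sum = 0;
  for (i = 0; i < bs; ++i) sum += left[i];
  fill_block(dst, stride, bs, (sum + (bs >> 1)) / bs);
}

/* dc_top: every pixel becomes the rounded mean of the row above. */
static void dc_top_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                    const uint8_t *above) {
  int i, sum = 0;
  for (i = 0; i < bs; ++i) sum += above[i];
  fill_block(dst, stride, bs, (sum + (bs >> 1)) / bs);
}
```

A SIMD version would typically compute the same sum with a handful of vector adds and then store whole rows at once, which is where wide loads and stores pay off.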
2015-05-05fix and enable vp9_dc_128_predictor_16x16James Zern
widen the loads and stores to 128-bit. this was added, but not enabled in: 493a857 Add some sse2 code for intra prediction. Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
2015-05-05Merge "Add some sse2 code for intra prediction."hkuang
2015-05-01vp9_idct_intrin_sse2: cosmetics: reindentJames Zern
+ fix some whitespace Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2
2015-04-30vp9: RECON_AND_STORE4X4: remove dest offsetJames Zern
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
2015-04-30vp9_idct_intrin_*: RECON_AND_STORE: remove dest offsetJames Zern
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly. additionally reroll 16x16/32x32 loops to reduce register spill with this new format Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
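What a reconstruct-and-store step computes can be sketched in scalar C: the inverse-transform residual is added to the prediction already in the destination, the sum is clamped to 8 bits, and the result is written back. The names below are hypothetical, and the row addressing shown (the caller passes each row pointer to a helper that does no pointer arithmetic of its own) is an assumption for illustration, not the exact shape of the change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: reconstruct one row in place.
 * dest holds the prediction on entry and the reconstruction on exit. */
static void recon_and_store_row(uint8_t *dest, const int16_t *residual,
                                int width) {
  int c;
  assert(width > 0);
  for (c = 0; c < width; ++c) {
    dest[c] = clip_pixel_sketch(dest[c] + residual[c]);
  }
}

/* Hypothetical driver: each row address is computed independently from
 * the block origin, so no row depends on a pointer updated by the last. */
static void recon_block(uint8_t *dest, ptrdiff_t stride,
                        const int16_t *residual, int width, int height) {
  int r;
  for (r = 0; r < height; ++r) {
    recon_and_store_row(dest + r * stride, residual + r * width, width);
  }
}
```

The SSE2 equivalent of the inner loop is an unpack to 16 bits, an add, and a pack with unsigned saturation, which performs the clamp for free.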
2015-04-30Add some sse2 code for intra prediction.hkuang
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
2015-04-30Remove vp9_idct16x16_10_add_ssse3()Yaowu Xu
The rotation computation using 2X of cos(pi/16) has the potential to overflow 32 bits; this commit disables the function to allow further investigation and optimization. Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
2015-02-27Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4"Jingning Han
2015-02-27Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3"Jingning Han
2015-02-27Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2"Jingning Han
2015-02-26Fix high bit-depth loop-filter sse2 compiling issue - part 3Jingning Han
Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69
2015-02-26Fix high bit-depth loop-filter sse2 compiling issue - part 2Jingning Han
Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4
2015-02-25Fix high bit-depth loop-filter sse2 compiling issue - part 1Jingning Han
The intrinsic statement _mm_subs_epi16() should take an immediate. Feeding a variable as its input argument will cause a compile failure in older versions of gcc. Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4
2015-02-24Fix high bit-depth loop-filter sse2 compiling issue - part 4Jingning Han
Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb
2015-02-10vp9_highbd_tm_predictor_16x16: fix win64James Zern
by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb
2015-01-18SSE2 code for the filter in MFQE.JackyChen
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
2014-12-12Merge "Remove redundant loads on 1d16_v8 filter."James Zern
2014-12-12Merge "Remove redundant loads on 1d8_v8 filter."James Zern
2014-12-12Remove redundant loads on 1d16_v8 filter.Frank Galligan
This CL showed about a 3% gain in performance on some systems. Change-Id: Id27e7e0b8e69068aa364e67859436da852669250
2014-12-12Remove redundant loads on 1d8_v8 filter.Frank Galligan
This CL showed a modest gain in performance on some systems. Change-Id: Iad636a89a1a9804ab7a0dea302bf2c6a4d1653a4
2014-12-12vp9_loopfilter_mmx: remove some unused tablesJames Zern
Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb
2014-12-11Corrected optimization of 8x8 DCT codePeter de Rivaz
The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit fd05fb0c21e253b4d6f92d7e0b752850ff8ab188)
2014-12-08Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"Yunqing Wang
2014-12-08Merge "Changes to assembler for NASM on mac."James Zern
2014-12-08SSSE3 Optimization for Atom processors using new instruction selection and orderinglevytamar82
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction, which has a 3-cycle latency and slows execution when done in blocks of 5 or more on Atom processors. Replacing the PSHUFB instructions with more efficient single-cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) improves performance. In the original code, the PSHUFB uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
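The equivalence claimed in this commit message can be checked with a small byte-level model in portable C. This is a sketch, not the assembly from the commit: the functions below are plain-C stand-ins for PSHUFB, PUNPCKLBW, PUNPCKHBW and PALIGNR operating on 16-byte arrays, and all of their names are chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of PSHUFB: out[i] = src[mask[i] & 15], or 0 if mask bit 7 is set. */
static void pshufb_model(uint8_t out[16], const uint8_t src[16],
                         const uint8_t mask[16]) {
  uint8_t tmp[16];
  int i;
  for (i = 0; i < 16; ++i) {
    tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
  }
  memcpy(out, tmp, 16);
}

/* Model of PUNPCKLBW: interleave the low 8 bytes of a and b. */
static void punpcklbw_model(uint8_t out[16], const uint8_t a[16],
                            const uint8_t b[16]) {
  uint8_t tmp[16];
  int i;
  for (i = 0; i < 8; ++i) {
    tmp[2 * i] = a[i];
    tmp[2 * i + 1] = b[i];
  }
  memcpy(out, tmp, 16);
}

/* Model of PUNPCKHBW: interleave the high 8 bytes of a and b. */
static void punpckhbw_model(uint8_t out[16], const uint8_t a[16],
                            const uint8_t b[16]) {
  uint8_t tmp[16];
  int i;
  for (i = 0; i < 8; ++i) {
    tmp[2 * i] = a[8 + i];
    tmp[2 * i + 1] = b[8 + i];
  }
  memcpy(out, tmp, 16);
}

/* Model of PALIGNR: concatenate hi:lo into 32 bytes, shift right by
 * `shift` bytes, keep the low 16. Only shifts of 0..16 are modelled. */
static void palignr_model(uint8_t out[16], const uint8_t hi[16],
                          const uint8_t lo[16], int shift) {
  uint8_t cat[32];
  assert(shift >= 0 && shift <= 16);
  memcpy(cat, lo, 16);
  memcpy(cat + 16, hi, 16);
  memcpy(out, cat + shift, 16);
}

/* Returns 1 when the unpack + align sequence reproduces the shuffle. */
static int shuffle_matches_unpack_align(const uint8_t reg[16]) {
  static const uint8_t filter[16] = {0, 1, 1, 2, 2, 3, 3, 4,
                                     4, 5, 5, 6, 6, 7, 7, 8};
  uint8_t want[16], reg1[16], reg2[16], got[16];
  pshufb_model(want, reg, filter);
  punpcklbw_model(reg1, reg, reg);
  punpckhbw_model(reg2, reg, reg);
  palignr_model(got, reg2, reg1, 1);
  return memcmp(want, got, 16) == 0;
}
```

The check holds for any register contents, because both sequences select the same source byte for every output position; only the latency of the instructions differs.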
2014-12-02Added high bitdepth sse2 transform functionsPeter de Rivaz
Also removes some spurious changes in common/vp9_blockd.h which were introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
2014-11-24Changes to assembler for NASM on mac.John Stark
fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a
2014-11-05Fix visual studio 2013 compiler warningsYaowu Xu
For builds configured with --enable-vp9-highbitdepth. Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-11-01WORKAROUND FIX FOR GCC4.9.1levytamar82
In the function mb_lpf_horizontal_edge_w_avx2_16, the use of the intrinsic _mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1. Until that is fixed, this workaround performs the up-conversion using broadcast128 + shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
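How a broadcast followed by a byte shuffle can stand in for an 8-bit to 16-bit zero extension is sketched below in portable C. The commit does not show its shuffle mask, so the mask here is an assumption, and every function name is hypothetical. The model follows the AVX2 rule that a 256-bit byte shuffle works within each 128-bit lane separately, which is why the 16 input bytes are first copied into both lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit broadcast: copy the 16 input bytes into both lanes. */
static void broadcast128_model(uint8_t out[32], const uint8_t in[16]) {
  memcpy(out, in, 16);
  memcpy(out + 16, in, 16);
}

/* Model of a 256-bit byte shuffle: each 128-bit lane is shuffled on its
 * own, and a mask byte with bit 7 set produces zero. */
static void shuffle256_model(uint8_t out[32], const uint8_t src[32],
                             const uint8_t mask[32]) {
  uint8_t tmp[32];
  int lane, i;
  for (lane = 0; lane < 2; ++lane) {
    for (i = 0; i < 16; ++i) {
      const uint8_t m = mask[lane * 16 + i];
      tmp[lane * 16 + i] = (m & 0x80) ? 0 : src[lane * 16 + (m & 0x0f)];
    }
  }
  memcpy(out, tmp, 32);
}

/* Zero-extend 16 bytes to 16 words using broadcast + shuffle.
 * Lane 0 picks source bytes 0..7, lane 1 picks bytes 8..15; every
 * second output byte is forced to zero (assumed mask layout). */
static void zero_extend_sketch(uint16_t out[16], const uint8_t in[16]) {
  uint8_t mask[32], wide[32], shuffled[32];
  int i;
  for (i = 0; i < 8; ++i) {
    mask[2 * i] = (uint8_t)i;
    mask[2 * i + 1] = 0x80;
    mask[16 + 2 * i] = (uint8_t)(8 + i);
    mask[16 + 2 * i + 1] = 0x80;
  }
  broadcast128_model(wide, in);
  shuffle256_model(shuffled, wide, mask);
  /* Read the result back as little-endian 16-bit values. */
  for (i = 0; i < 16; ++i) {
    out[i] = (uint16_t)(shuffled[2 * i] | (shuffled[2 * i + 1] << 8));
  }
}
```

The result matches a direct widening of each byte, which is the behaviour the replaced intrinsic is documented to have.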
2014-10-09Rename highbitdepth functions to use highbd prefixDeb Mukherjee
Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-09-23Merge "High bit-depth loop/arf/postproc filter functions"Deb Mukherjee
2014-09-23High bit-depth loop/arf/postproc filter functionsDeb Mukherjee
Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3
2014-09-18Merge "FIX: vp9_loopfilter_intrin_sse2.c"Frank Galligan
2014-09-18FIX: vp9_loopfilter_intrin_sse2.cScott LaVarnway
Fixes Visual Studio build failures Change-Id: I233719cd63b3ad0db16e2834bf1d7ea1df805880
2014-09-18Merge "Adds high bitdepth convolve, interpred & scaling"Deb Mukherjee
2014-09-18Adds high bitdepth convolve, interpred & scalingDeb Mukherjee
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
2014-09-17Merge "Improved mb_lpf_horizontal_edge_w_sse2_16() #2"Frank Galligan
2014-09-17Improved mb_lpf_horizontal_edge_w_sse2_16() #2Scott LaVarnway
The decoder performance improved up to 1% for the test clips used. Change-Id: I4621112bdccfba01640322facfa4ba8da8290ea5
2014-09-16Adding high-bitdepth intra prediction functionsDeb Mukherjee
Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111
2014-09-11Allow specifying opt dependenciesJohann
If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18