path: root/vp9/encoder/x86
Age | Commit message | Author
2019-09-30 | namespace ARCH_* defines | James Zern
this prevents redefinition warnings if a toolchain sets one BUG=b/117240165 Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-09-10 | vp9_quantize_sse2: quiet clang-7 integer sanitizer warning | James Zern
nzflag is used as a boolean; it doesn't need to be a sized type. int is enough (and _mm_movemask_epi8 returns one).
fixes:
vp9_quantize_sse2.c:136:16: implicit conversion from type 'int' of value 65535 (32-bit, signed) to type 'int16_t' (aka 'short') changed the value to -1 (16-bit, signed)
BUG=webm:1649 Change-Id: I0e3f5278af49d84760f3dfb607f28099cf02f21d
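The value change reported by the sanitizer can be reproduced in plain C. The following is a minimal sketch, assuming a two's complement target where narrowing `int` to `int16_t` wraps (the conversion is implementation-defined). `mask` stands in for the value `_mm_movemask_epi8` would return, and both helper names are hypothetical, not taken from vp9_quantize_sse2.c.

```c
#include <stdint.h>

/* Hypothetical helper: keeps the mask in a 16-bit variable, as the old code
 * did. The narrowing conversion is implementation-defined; on common two's
 * complement targets 65535 becomes -1. */
static int flag_as_int16(int mask) {
  const int16_t nzflag = (int16_t)mask;
  return nzflag;
}

/* Hypothetical helper: keeps the mask in a plain int, so the value is
 * preserved and no conversion takes place. */
static int flag_as_int(int mask) {
  const int nzflag = mask;
  return nzflag;
}
```

Both results are nonzero for a mask of 0xFFFF, so a boolean test behaves the same either way; the `int` version simply avoids the implicit conversion that the sanitizer flags.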
2019-04-30 | cast ambiguous _mm_set1_epiNN() constants | Johann
clang 7 integer sanitizer warns on unsigned->signed conversions when the highest bit is 1. BUG=webm:1615 Change-Id: I6381efaff9233254b40cb78f7bcf87090e0ad353
2019-03-29 | update .clang-format for version clang-7.0.1 update. | Hien Ho
added files that are affected by clang-format version 7. BUG=b/120815481 Change-Id: I40662ce962e4f4b1fcdf183b700f85cc5c0f9f82
2019-03-25 | Remove deprecated code for vp9_fdct8x8_quant() | Jingning Han
Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
2019-03-07 | Optimize SSE4_1 lowbd temporal filter implementation | chiyotsai
- Change some unaligned loads to aligned loads
- Preload filter weights
BUG=webm:1591 Change-Id: I4e5e755e1fa5613d1c14191265bf80b0bfd0b75c
2019-03-04 | Add SSE4_1 highbd version of temporal filter | chiyotsai
The SSE4_1 version of temporal filter does not distinguish between bd 10 and bd 12.
Speed up:
Function Level:
      | !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
 SS_Y | 6.56X | 6.63X
Video Level:
2.5% speed up on basketballpass_240p over 150 frames on speed 1, bitdepth 10, auto-alt-ref=1
BUG=webm:1591 Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
2019-02-04 | Fix an inline variable declaration in temporal filter | chiyotsai
bug=webm:1595 Change-Id: I7fbb16444a8526eb9479007772fbf52b09ff8338
2019-02-04 | Some cosmetic fixes to temporal filter | chiyotsai
BUG=webm:1591 Change-Id: I34fd7e6cbe6f3d5486a669d0895402fd21de7641
2019-02-01 | Remove old version of temporal_filter_apply | chiyotsai
BUG=webm:1591 Change-Id: I926566ac1bf4bac8cb1ce1c6ded9ba940109283e
2019-01-28 | Fix mismatch between SIMD/C version of vp9_apply_temporal_filter | chiyotsai
Change-Id: I6503ebc79beaac2947992437ac133f3ac4379019
2019-01-24 | Add SSE4 version of new apply_temporal_filter | chiyotsai
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1. This patch merely adds the function and does not enable it yet.
Speed Up:
       | ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591 Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
2019-01-07 | fix vp9 fdct_quant | Johann
Values in [q]coeff1 were not correctly stored. This caused a segfault in the sse2 libvpx__nightly_optimization jobs.
Broken in:
commit 85032bac388917916f7a149173db8b34e93e8f6e
Author: Johann <johannkoenig@google.com>
Date: Fri Dec 21 00:27:00 2018 +0000
fdct_quant: resolve missing declarations
BUG=webm:1584 Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
2018-12-21 | vp9_highbd_block_error_sse2: resolve missing declarations | Johann
BUG=webm:1584 Change-Id: I43d051c538bf4a6f6210eefa398dc0901ab8d157
2018-12-21 | fdct_quant: resolve missing declarations | Johann
Store outputs using store_tran_low() BUG=webm:1584 Change-Id: I213abe047e14625c5ef80df7fa6fdc2a31e40fb6
2018-11-27 | rename quantize_x86.h | Johann
Pave the way for new quantize_OPT.h helper files. Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
2018-10-30 | clang-tidy: fix vp9/encoder parameters | Johann
BUG=webm:1444 Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
2018-08-08 | Simplify temporal filter strength calculation | Jingning Han
Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
2018-02-05 | Update tx_type switch code in idct | Linfeng Zhang
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-01-18 | vp9_quantize_fp_avx2() | Scott LaVarnway
Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-12-21 | vp9_quantize_ssse3_x86_64: fix out of bounds write | James Zern
eob is a pointer to a uint16_t. Previously the code would store 64 bits, causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
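The effect of an over-wide store can be shown without touching memory outside an object. This is a minimal sketch, not the assembly from the commit: the `EobSlot` struct, its guard words, and all function names are hypothetical, and the guard words merely stand in for whatever happens to sit next to `eob` on the stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three guard words sit right after eob. */
typedef struct {
  uint16_t eob;
  uint16_t guard[3];
} EobSlot;

/* Mimics the bug: writes all 8 bytes of a 64-bit value at eob's address.
 * The write stays inside the 8-byte struct, so the sketch itself is safe. */
static void store_eob_64(EobSlot *slot, uint64_t value) {
  memcpy((unsigned char *)slot + offsetof(EobSlot, eob), &value,
         sizeof(value));
}

/* Mimics the fix: narrows the value first, then writes only 2 bytes. */
static void store_eob_16(EobSlot *slot, uint64_t value) {
  const uint16_t narrow = (uint16_t)value;
  memcpy(&slot->eob, &narrow, sizeof(narrow));
}

/* Returns 1 if every guard word still holds the 0xAAAA pattern. */
static int guards_intact(const EobSlot *slot) {
  return slot->guard[0] == 0xAAAA && slot->guard[1] == 0xAAAA &&
         slot->guard[2] == 0xAAAA;
}
```

With the guards initialized to 0xAAAA, `store_eob_16` leaves them untouched while `store_eob_64` overwrites them, which is the kind of neighbor corruption that only surfaces "with the right stack layout".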
2017-12-14 | fix typo in boilerplate | Johann
The extra 'e' was causing the chromium license check to flag this file. BUG=chromium:98319 Change-Id: Ic875ba66370298bf998438d14ff5f7e760293706
2017-11-29 | Remove unnecessary includes of emmintrin_compat.h | Kyle Siefring
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
2017-11-09 | vpx: [x86] add vp9_block_error_fp_avx2() | Scott LaVarnway
SSE2 asm vs AVX2 intrinsics speed gains:
blocksize 16: ~1.00
blocksize 64: ~1.17
blocksize 256: ~1.67
blocksize 1024: ~1.81
Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
2017-10-16 | Add 4 to 3 scaling SSSE3 optimization | Linfeng Zhang
Note this change will trigger the different C version on SSSE3 and generate different scaled output. Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-10 | Add 4 to 1 scaling x86 optimization | Linfeng Zhang
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-04 | Generalize 2:1 vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 | Update vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-09-20 | Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 | Linfeng Zhang
BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-06 | Add ScaleFrameTest | Linfeng Zhang
Move class VpxScaleBase to new file test/vpx_scale_test.h. Add new file test/vp9_scale_test.cc with ScaleFrameTest. BUG=webm:1419 Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
2017-08-21 | quantize fp: ignore skip_block in x86 | Johann
Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78
2017-06-21 | vp[89],vpx_dsp: add missing includes | James Zern
quiets -Wmissing-prototypes Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5
2017-06-13 | Clean array_transpose_{4X8,16x16,16x16_2} in x86 | Linfeng Zhang
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 | Remove array_transpose_8x8() in x86 | Linfeng Zhang
Duplicate of transpose_16bit_8x8() Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-05-09 | vp9: SVC: Add option to set downsampling filter type. | Marco
Add option in SVC to set the filter type and phase for the frame level downsampling filters. For 3 spatial layers: set downsampling filter type to bilinear and set phase to 8, for lowest spatial layer. Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
2017-05-02 | Merge "block error sse2: sum in 32 bits when possible" | Johann Koenig
2017-05-01 | block error avx2: rename variables | Johann
Change-Id: I2b8a9253f2c3d1fd85304c2970ebe70213870fe9
2017-05-01 | block error avx2: sum in 32 bits when possible | Kyle Siefring
Add 31bit pairs before unpacking in x86 block error code. AVX2 code provides a very minor performance improvement. BUG=webm:1210 Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
2017-05-01 | block error sse2: sum in 32 bits when possible | Kyle Siefring
Add 31bit pairs before unpacking in x86 block error code BUG=webm:1210 Change-Id: I5ca8c7f7775585a17fe09d6bbfc25e1f2955eb0a
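The idea of summing in 32 bits before widening can be sketched in scalar C. This is a stand-in for illustration only, not the SSE2 or AVX2 code. It assumes the inputs are `int16_t` and every difference has magnitude below 2^15, so each square is below 2^30 and a pair of squares fits in 31 bits; `block_error_pairs` is a hypothetical name.

```c
#include <stdint.h>

/* Sum of squared differences between coeff and dqcoeff.
 * Assumptions for this sketch: n is even, and every difference has a
 * magnitude below 2^15, so each square is below 2^30 and the sum of two
 * squares is below 2^31. */
static int64_t block_error_pairs(const int16_t *coeff, const int16_t *dqcoeff,
                                 int n) {
  int64_t error = 0;
  int i;
  for (i = 0; i < n; i += 2) {
    const int32_t d0 = coeff[i] - dqcoeff[i];
    const int32_t d1 = coeff[i + 1] - dqcoeff[i + 1];
    /* The pair sum stays in 32 bits; widening to 64 bits happens once per
     * pair instead of once per element. */
    const uint32_t pair = (uint32_t)(d0 * d0) + (uint32_t)(d1 * d1);
    error += pair;
  }
  return error;
}
```

In a vectorized version the saving would come from doing half as many widening (unpack) steps; the scalar loop only shows where the 32-bit sum sits relative to the 64-bit accumulator.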
2017-05-01 | move vp9_error_intrin_avx2.c | Johann
There is only one avx2 implementation. Drop '_intrin' Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
2017-04-28 | Use uint32_t for accumulator | Johann
Be specific about the data type size. Use convenience macro vp9_zero_array. Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
2017-04-26 | vp9 temporal filter: sse4 implementation | Johann
Approximates division using multiply and shift. Speeds up both sizes (8x8 and 16x16) by 30 times. Fix the call sites to use the RTCD function. Delete sse2 and mips implementation. They were based on a previous implementation of the filter. It was changed in Dec 2015: ece4fd5d2247c9512b31a93dd593de567beaf928 BUG=webm:1378 Change-Id: I0818e767a802966520b5c6e7999584ad13159276
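Division by a small constant can be replaced by a multiply and a shift, as the message describes. The sketch below shows the general technique with a 16-bit fixed-point reciprocal table; the divisor range 1..9, the table, and the function name are assumptions for illustration and are not copied from the libvpx temporal filter.

```c
#include <stdint.h>

#define RECIP_SHIFT 16

/* Precomputed ceil(2^16 / d) for d = 1..9 (index 0 is unused). */
static const uint32_t kReciprocalQ16[10] = { 0,     65536, 32768, 21846, 16384,
                                             13108, 10923, 9363,  8192,  7282 };

/* Approximates x / d as (x * reciprocal) >> 16 with no division at run time.
 * For this table the result equals x / d for every x below 4096 and every
 * d in 1..9. */
static uint32_t div_by_mul_shift(uint32_t x, uint32_t d) {
  return (x * kReciprocalQ16[d]) >> RECIP_SHIFT;
}
```

Rounding the reciprocal up keeps the quotient from falling one short, and the bound on `x` keeps the accumulated rounding error below one unit. A wider input range would need a larger shift or a wider product.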
2017-04-18 | vp9: Add phase to get averaging filter for 1:2 downsampling. | Marco
The scaling filter with zero shift will give sub-sampling for 2x downsampling. Allow for a phase shift to get an averaging filter. Usage is for source scaling in 1 pass SVC mode for 1:2 downscale. Reduces aliasing in downsampled image. Keep the phase to 0/off for now. Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
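The difference between zero phase and a half-sample phase can be illustrated with a two-tap bilinear kernel. This is a simplified sketch: it assumes phase is expressed in 1/16-sample units, so that 8 is the midpoint, and `downsample_2to1` is a hypothetical helper, not the libvpx scaler.

```c
#include <stdint.h>

/* 2:1 horizontal downsample with a two-tap bilinear kernel.
 * phase is in 1/16-sample units and must be in 0..15:
 *   phase = 0 copies src[2 * i] (pure sub-sampling);
 *   phase = 8 averages src[2 * i] and src[2 * i + 1].
 * src must hold 2 * out_len samples. */
static void downsample_2to1(const uint8_t *src, uint8_t *dst, int out_len,
                            int phase) {
  int i;
  for (i = 0; i < out_len; ++i) {
    const int a = src[2 * i];
    const int b = src[2 * i + 1];
    /* Weighted sum with rounding, then scale back down by 16. */
    dst[i] = (uint8_t)(((16 - phase) * a + phase * b + 8) >> 4);
  }
}
```

With phase 0 every second input sample is dropped, which lets high frequencies alias; with phase 8 each output mixes both neighbors, which acts as a mild low-pass filter before decimation.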
2017-02-24 | consolidate block_error functions | Johann
vp9_highbd_block_error_8bit_c was a very simple wrapper around vp9_block_error_c. The SSE2 implementation was practically identical to the non-HBD one. It was missing some minor improvements which only went into the original version. In quick speed tests, the AVX implementation showed minimal improvement over SSE2 when it does not detect overflow. However, when overflow is detected the function is run a second time. The OperationCheck test seems to trigger this case and reverses any speed benefits by running ~60% slower. AVX2 on the other hand is always 30-40% faster. Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 | Merge "block error sse2: use tran_low_t" | Johann Koenig
2017-02-24 | block error sse2: use tran_low_t | Johann
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
2017-02-21 | Merge "Drop zbin_ptr and quant_shift_ptr" | Johann Koenig
2017-02-16 | Merge "block error avx2: use tran_low_t" | Johann Koenig
2017-02-16 | Drop zbin_ptr and quant_shift_ptr | Johann
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use of these parameters. scan is used for C code and iscan is used for SIMD implementations. Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
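One way to see why either table suffices is to compute the end-of-block position both ways. The sketch below is an illustration under assumptions, not the libvpx code: `scan` maps scan position to raster index, `iscan` is its inverse, and both function names are hypothetical.

```c
#include <stdint.h>

/* C-style: walk the coefficients in scan order and remember the position
 * just past the last nonzero one. */
static int eob_from_scan(const int16_t *qcoeff, const int16_t *scan, int n) {
  int eob = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (qcoeff[scan[i]] != 0) eob = i + 1;
  }
  return eob;
}

/* SIMD-style: walk the coefficients in raster (memory) order and take the
 * maximum of iscan[rc] + 1 over the nonzero ones. */
static int eob_from_iscan(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  int rc;
  for (rc = 0; rc < n; ++rc) {
    if (qcoeff[rc] != 0 && iscan[rc] + 1 > eob) eob = iscan[rc] + 1;
  }
  return eob;
}
```

The raster-order variant reads contiguous memory and reduces with a max, which maps naturally onto vector lanes; the scan-order variant needs an indexed load per coefficient. Both return the same value whenever `iscan` is the inverse of `scan`.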
2017-02-16 | block error avx2: use tran_low_t | Johann
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c