path: root/vp9/encoder/x86
Age | Commit message | Author
2019-02-01 | Remove old version of temporal_filter_apply | chiyotsai
BUG=webm:1591 Change-Id: I926566ac1bf4bac8cb1ce1c6ded9ba940109283e
2019-01-28 | Fix mismatch between SIMD/C version of vp9_apply_temporal_filter | chiyotsai
Change-Id: I6503ebc79beaac2947992437ac133f3ac4379019
2019-01-24 | Add SSE4 version of new apply_temporal_filter | chiyotsai
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1. This patch merely adds the function and does not enable it yet.
Speed Up:
       | ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591 Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
2019-01-07 | fix vp9 fdct_quant | Johann
Values in [q]coeff1 were not correctly stored. This caused a segfault in the sse2 libvpx__nightly_optimization jobs.
Broken in:
commit 85032bac388917916f7a149173db8b34e93e8f6e
Author: Johann <johannkoenig@google.com>
Date: Fri Dec 21 00:27:00 2018 +0000
    fdct_quant: resolve missing declarations
BUG=webm:1584 Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
2018-12-21 | vp9_highbd_block_error_sse2: resolve missing declarations | Johann
BUG=webm:1584 Change-Id: I43d051c538bf4a6f6210eefa398dc0901ab8d157
2018-12-21 | fdct_quant: resolve missing declarations | Johann
Store outputs using store_tran_low() BUG=webm:1584 Change-Id: I213abe047e14625c5ef80df7fa6fdc2a31e40fb6
2018-11-27 | rename quantize_x86.h | Johann
Pave the way for new quantize_OPT.h helper files. Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
2018-10-30 | clang-tidy: fix vp9/encoder parameters | Johann
BUG=webm:1444 Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
2018-08-08 | Simplify temporal filter strength calculation | Jingning Han
Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
2018-02-05 | Update tx_type switch code in idct | Linfeng Zhang
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-01-18 | vp9_quantize_fp_avx2() | Scott LaVarnway
Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-12-21 | vp9_quantize_ssse3_x86_64: fix out of bounds write | James Zern
eob is a pointer to a uint16_t. Previously the code would store 64 bits, causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
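The failure mode described in this message can be sketched in plain C. This is only an illustration, not the assembly from the commit: the 8-byte buffer models the `uint16_t` slot plus six neighboring bytes, and `reg64`, `store_eob_wide`, `store_eob_narrow`, `clobbered_neighbors`, and `demo` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a register whose low 16 bits hold eob. */
typedef uint64_t reg64;

/* Buggy pattern: writes all 8 bytes of the register to the destination. */
static void store_eob_wide(uint8_t *dst, reg64 reg) {
  memcpy(dst, &reg, sizeof(reg));
}

/* Fixed pattern: writes only the 2 bytes that a uint16_t occupies. */
static void store_eob_narrow(uint8_t *dst, reg64 reg) {
  const uint16_t eob = (uint16_t)reg;
  memcpy(dst, &eob, sizeof(eob));
}

/* Counts how many of the 6 bytes after the eob slot changed from 0xAA. */
static int clobbered_neighbors(const uint8_t *mem) {
  int count = 0;
  for (int i = 2; i < 8; ++i) count += (mem[i] != 0xAA);
  return count;
}

/* Bytes 0-1 model *eob; bytes 2-7 model whatever sits next to it. */
static int demo(int use_wide_store) {
  uint8_t mem[8];
  memset(mem, 0xAA, sizeof(mem));
  if (use_wide_store) {
    store_eob_wide(mem, 37);
  } else {
    store_eob_narrow(mem, 37);
  }
  return clobbered_neighbors(mem);
}
```

Here `demo(1)` reports that all six neighboring bytes were overwritten, while `demo(0)` leaves them intact. In the real function the neighbors are other stack contents, which is why the symptom depended on the stack layout.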
2017-12-14 | fix typo in boilerplate | Johann
The extra 'e' was causing the chromium license check to flag this file. BUG=chromium:98319 Change-Id: Ic875ba66370298bf998438d14ff5f7e760293706
2017-11-29 | Remove unnecessary includes of emmintrin_compat.h | Kyle Siefring
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
2017-11-09 | vpx: [x86] add vp9_block_error_fp_avx2() | Scott LaVarnway
SSE2 asm vs AVX2 intrinsics speed gains:
blocksize 16: ~1.00
blocksize 64: ~1.17
blocksize 256: ~1.67
blocksize 1024: ~1.81
Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
2017-10-16 | Add 4 to 3 scaling SSSE3 optimization | Linfeng Zhang
Note this change will trigger the different C version on SSSE3 and generate different scaled output. Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-10 | Add 4 to 1 scaling x86 optimization | Linfeng Zhang
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-04 | Generalize 2:1 vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 | Update vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-09-20 | Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 | Linfeng Zhang
BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-06 | Add ScaleFrameTest | Linfeng Zhang
Move class VpxScaleBase to new file test/vpx_scale_test.h. Add new file test/vp9_scale_test.cc with ScaleFrameTest. BUG=webm:1419 Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
2017-08-21 | quantize fp: ignore skip_block in x86 | Johann
Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78
2017-06-21 | vp[89],vpx_dsp: add missing includes | James Zern
quiets -Wmissing-prototypes Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5
2017-06-13 | Clean array_transpose_{4X8,16x16,16x16_2} in x86 | Linfeng Zhang
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 | Remove array_transpose_8x8() in x86 | Linfeng Zhang
Duplicate of transpose_16bit_8x8() Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-05-09 | vp9: SVC: Add option to set downsampling filter type. | Marco
Add option in SVC to set the filter type and phase for the frame level downsampling filters. For 3 spatial layers: set downsampling filter type to bilinear and set phase to 8, for lowest spatial layer. Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
2017-05-02 | Merge "block error sse2: sum in 32 bits when possible" | Johann Koenig
2017-05-01 | block error avx2: rename variables | Johann
Change-Id: I2b8a9253f2c3d1fd85304c2970ebe70213870fe9
2017-05-01 | block error avx2: sum in 32 bits when possible | Kyle Siefring
Add 31bit pairs before unpacking in x86 block error code. AVX2 code provides a very minor performance improvement. BUG=webm:1210 Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
2017-05-01 | block error sse2: sum in 32 bits when possible | Kyle Siefring
Add 31bit pairs before unpacking in x86 block error code BUG=webm:1210 Change-Id: I5ca8c7f7775585a17fe09d6bbfc25e1f2955eb0a
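A scalar sketch of the idea in this message, not the SSE2 code itself: if every sum of two squared differences fits in 31 bits, two such sums can be added in 32 bits before widening to 64. The function names are hypothetical, and the bound on the differences (at most 32767 in magnitude) is an assumption made so that the 31-bit claim holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: widen every squared difference to 64 bits before adding. */
static uint64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                                size_t n) {
  uint64_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    const int32_t d = (int32_t)coeff[i] - dqcoeff[i];
    total += (uint64_t)((int64_t)d * d);
  }
  return total;
}

/* Sketch: add pairs in 32 bits, widen once per group of four.
 * Assumes n % 4 == 0 and |coeff[i] - dqcoeff[i]| <= 32767. */
static uint64_t block_error_sum32(const int16_t *coeff, const int16_t *dqcoeff,
                                  size_t n) {
  uint64_t total = 0;
  for (size_t i = 0; i < n; i += 4) {
    uint32_t sq[4];
    for (int k = 0; k < 4; ++k) {
      const int32_t d = (int32_t)coeff[i + k] - dqcoeff[i + k];
      sq[k] = (uint32_t)(d * d); /* at most 32767^2, below 2^30 */
    }
    const uint32_t pair0 = sq[0] + sq[1]; /* below 2^31 */
    const uint32_t pair1 = sq[2] + sq[3]; /* below 2^31 */
    const uint32_t sum32 = pair0 + pair1; /* below 2^32, no wraparound */
    total += sum32;                       /* single widening add */
  }
  return total;
}
```

Under the stated assumption both functions return the same value; the second performs one 64-bit addition per four coefficients instead of four.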
2017-05-01 | move vp9_error_intrin_avx2.c | Johann
There is only one avx2 implementation. Drop '_intrin' Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
2017-04-28 | Use uint32_t for accumulator | Johann
Be specific about the data type size. Use convenience macro vp9_zero_array. Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
2017-04-26 | vp9 temporal filter: sse4 implementation | Johann
Approximates division using multiply and shift. Speeds up both sizes (8x8 and 16x16) by 30 times. Fix the call sites to use the RTCD function. Delete sse2 and mips implementation. They were based on a previous implementation of the filter. It was changed in Dec 2015: ece4fd5d2247c9512b31a93dd593de567beaf928 BUG=webm:1378 Change-Id: I0818e767a802966520b5c6e7999584ad13159276
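The "multiply and shift" approximation mentioned here can be sketched in scalar C as follows. This is a generic illustration rather than the SSE4 code from the commit: the Q16 reciprocal, the function names, and the sample divisors 4, 6, and 9 are all assumptions.

```c
#include <stdint.h>

/* Rounded-up reciprocal of d in Q16 fixed point: ceil(65536 / d). */
static uint32_t reciprocal_q16(uint32_t d) { return (65536u + d - 1) / d; }

/* Approximates x / d as (x * recip) >> 16, with no division instruction. */
static uint32_t div_mul_shift(uint32_t x, uint32_t recip_q16) {
  return (x * recip_q16) >> 16;
}

/* Counts inputs in [0, max_x] where the approximation differs from x / d. */
static uint32_t count_mismatches(uint32_t d, uint32_t max_x) {
  const uint32_t recip = reciprocal_q16(d);
  uint32_t bad = 0;
  for (uint32_t x = 0; x <= max_x; ++x) {
    bad += (div_mul_shift(x, recip) != x / d);
  }
  return bad;
}
```

For the divisors 4, 6, and 9 the result matches integer division for every input up to 10000, which `count_mismatches` checks exhaustively. The rounding error grows with `x`, so a wider input range would need a larger shift or a separate check.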
2017-04-18 | vp9: Add phase to get averaging filter for 1:2 downsampling. | Marco
The scaling filter with zero shift will give sub-sampling for 2x downsampling. Allow for a phase shift to get an averaging filter. Usage is for source scaling in 1 pass SVC mode for 1:2 downscale. Reduces aliasing in downsampled image. Keep the phase to 0/off for now. Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
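The difference between zero phase and a half-sample phase can be sketched with a two-tap bilinear filter. This is a simplified one-dimensional illustration, not the libvpx scaler: `downscale_2to1` is a hypothetical name, and treating the phase as a count of 1/16-sample steps is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* 2:1 downscale of one row with a 2-tap bilinear filter.
 * phase is assumed to be in 1/16-sample units, in the range [0, 16):
 *   phase 0 copies src[2*i] (plain sub-sampling),
 *   phase 8 averages src[2*i] and src[2*i + 1].
 * src_len must be even; dst must hold src_len / 2 samples. */
static void downscale_2to1(const uint8_t *src, size_t src_len, int phase,
                           uint8_t *dst) {
  for (size_t i = 0; i < src_len / 2; ++i) {
    const int a = src[2 * i];
    const int b = src[2 * i + 1];
    /* Weighted sum with rounding, then divide by 16. */
    dst[i] = (uint8_t)(((16 - phase) * a + phase * b + 8) >> 4);
  }
}
```

With the input row {10, 20, 30, 40}, phase 0 yields {10, 30} and phase 8 yields {15, 35}. The averaged output uses every input sample, which is what reduces aliasing compared with dropping every other one.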
2017-02-24 | consolidate block_error functions | Johann
vp9_highbd_block_error_8bit_c was a very simple wrapper around vp9_block_error_c. The SSE2 implementation was practically identical to the non-HBD one. It was missing some minor improvements which only went into the original version. In quick speed tests, the AVX implementation showed minimal improvement over SSE2 when it does not detect overflow. However, when overflow is detected the function is run a second time. The OperationCheck test seems to trigger this case and reverses any speed benefits by running ~60% slower. AVX2 on the other hand is always 30-40% faster. Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 | Merge "block error sse2: use tran_low_t" | Johann Koenig
2017-02-24 | block error sse2: use tran_low_t | Johann
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
2017-02-21 | Merge "Drop zbin_ptr and quant_shift_ptr" | Johann Koenig
2017-02-16 | Merge "block error avx2: use tran_low_t" | Johann Koenig
2017-02-16 | Drop zbin_ptr and quant_shift_ptr | Johann
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use of these parameters. scan is used for C code and iscan is used for SIMD implementations. Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
2017-02-16 | block error avx2: use tran_low_t | Johann
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 | quantize_fp highbd ssse3: use tran_low_t for coeff | Johann
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 | quantize_fp highbd sse2: use tran_low_t for coeff | Johann
Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa
2017-02-16 | bitdepth conversion: really use num elements | Johann
The previous implementation confused bit/bytes/elements. It was using '32' as the multiplier but that was mistakenly adopted because a 32x32 transform embedded the stride. Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-07 | block_error_fp highbd sse2: use tran_low_t for coeff | Johann
BUG=webm:1365 Change-Id: Id2ed3ebaaaa6a4b68628c23e08b64ea5f1341761
2017-02-06 | highbd x86: consolidate tran_low_t conversions | Johann
Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
2016-11-05 | Update vp9_fdct8x8_quant_ssse3 for highbitdepth | Johann
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2 BUG=webm:1304 Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-08-08 | Refactor mv limits. | Alex Converse
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
2016-08-02 | vp9/encoder: apply clang-format | clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-06-28 | remove visual studio < 2010 workarounds | James Zern
BUG=b/29583530 Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a