path: root/vp9/encoder/x86
Age | Commit message | Author
2018-02-05 | Update tx_type switch code in idct | Linfeng Zhang
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-01-18 | vp9_quantize_fp_avx2() | Scott LaVarnway
Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-12-21 | vp9_quantize_ssse3_x86_64: fix out of bounds write | James Zern
eob is a pointer to a uint16_t. Previously the code stored 64 bits, causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
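The class of bug described above can be sketched in C. This is only an illustration of a too-wide store through a narrow pointer; store_eob is a hypothetical helper, not code from the assembly file.

```c
#include <stdint.h>

/* Hypothetical stand-in for the final store of the end-of-block value.
 * eob_reg models a 64-bit register that holds the result. */
void store_eob(uint16_t *eob_ptr, uint64_t eob_reg) {
  /* Write only the low 16 bits. Storing all 8 bytes of eob_reg through
   * eob_ptr would also overwrite the 6 bytes that follow it in memory. */
  *eob_ptr = (uint16_t)eob_reg;
}
```

Whether the wide store is noticed depends on what happens to sit next to the destination, which is consistent with the message's remark about the stack layout.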
2017-12-14 | fix typo in boilerplate | Johann
The extra 'e' was causing the chromium license check to flag this file. BUG=chromium:98319 Change-Id: Ic875ba66370298bf998438d14ff5f7e760293706
2017-11-29 | Remove unnecessary includes of emmintrin_compat.h | Kyle Siefring
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
2017-11-09 | vpx: [x86] add vp9_block_error_fp_avx2() | Scott LaVarnway
SSE2 asm vs AVX2 intrinsics speed gains: blocksize 16: ~1.00; blocksize 64: ~1.17; blocksize 256: ~1.67; blocksize 1024: ~1.81. Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
2017-10-16 | Add 4 to 3 scaling SSSE3 optimization | Linfeng Zhang
Note that this change triggers a different C version on SSSE3 and generates different scaled output. It runs at 2x the speed of the version calling vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-10 | Add 4 to 1 scaling x86 optimization | Linfeng Zhang
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-04 | Generalize 2:1 vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 | Update vp9_scale_and_extend_frame_ssse3() | Linfeng Zhang
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-09-20 | Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 | Linfeng Zhang
BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-06 | Add ScaleFrameTest | Linfeng Zhang
Move class VpxScaleBase to new file test/vpx_scale_test.h. Add new file test/vp9_scale_test.cc with ScaleFrameTest. BUG=webm:1419 Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
2017-08-21 | quantize fp: ignore skip_block in x86 | Johann
Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78
2017-06-21 | vp[89],vpx_dsp: add missing includes | James Zern
quiets -Wmissing-prototypes Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5
2017-06-13 | Clean array_transpose_{4X8,16x16,16x16_2} in x86 | Linfeng Zhang
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 | Remove array_transpose_8x8() in x86 | Linfeng Zhang
Duplicate of transpose_16bit_8x8() Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-05-09 | vp9: SVC: Add option to set downsampling filter type. | Marco
Add option in SVC to set the filter type and phase for the frame level downsampling filters. For 3 spatial layers: set downsampling filter type to bilinear and set phase to 8, for lowest spatial layer. Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
2017-05-02 | Merge "block error sse2: sum in 32 bits when possible" | Johann Koenig
2017-05-01 | block error avx2: rename variables | Johann
Change-Id: I2b8a9253f2c3d1fd85304c2970ebe70213870fe9
2017-05-01 | block error avx2: sum in 32 bits when possible | Kyle Siefring
Add 31-bit pairs before unpacking in x86 block error code. The AVX2 code provides a very minor performance improvement. BUG=webm:1210 Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
2017-05-01 | block error sse2: sum in 32 bits when possible | Kyle Siefring
Add 31-bit pairs before unpacking in x86 block error code. BUG=webm:1210 Change-Id: I5ca8c7f7775585a17fe09d6bbfc25e1f2955eb0a
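The idea of adding pairs before widening can be shown in scalar C. This is a sketch under stated assumptions, not the SIMD code: it assumes every partial sum is below 2^31 and that the count is even, and sum_partials is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar illustration only. Assumes each partial[i] < 2^31 and n is even. */
uint64_t sum_partials(const uint32_t *partial, size_t n) {
  uint64_t total = 0;
  for (size_t i = 0; i + 1 < n; i += 2) {
    /* Two values below 2^31 sum to less than 2^32, so this cannot wrap. */
    const uint32_t pair = partial[i] + partial[i + 1];
    total += pair; /* widen to 64 bits once per pair instead of once per value */
  }
  return total;
}
```

In a vector implementation the saving would come from doing half as many 32-to-64-bit unpack steps; the scalar loop only demonstrates why the 32-bit addition is safe.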
2017-05-01 | move vp9_error_intrin_avx2.c | Johann
There is only one avx2 implementation. Drop '_intrin' Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
2017-04-28 | Use uint32_t for accumulator | Johann
Be specific about the data type size. Use convenience macro vp9_zero_array. Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
2017-04-26 | vp9 temporal filter: sse4 implementation | Johann
Approximates division using multiply and shift. Speeds up both sizes (8x8 and 16x16) by 30 times. Fix the call sites to use the RTCD function. Delete the sse2 and mips implementations, which were based on a previous implementation of the filter that was changed in Dec 2015: ece4fd5d2247c9512b31a93dd593de567beaf928 BUG=webm:1378 Change-Id: I0818e767a802966520b5c6e7999584ad13159276
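Division by multiply and shift can be sketched as follows. The divisor 9, the constant 58255 and the shift of 19 are chosen for this illustration and are not taken from the commit; div9_approx is a hypothetical name.

```c
#include <stdint.h>

/* Divide a 16-bit value by 9 without a division instruction.
 * 58255 is ceil(2^19 / 9). For any x below 2^16 the product fits in
 * 32 bits and the shifted result equals x / 9. */
uint16_t div9_approx(uint16_t x) {
  return (uint16_t)(((uint32_t)x * 58255u) >> 19);
}
```

The same pattern maps well to SIMD because a multiply and a shift are available per lane, while integer division generally is not.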
2017-04-18 | vp9: Add phase to get averaging filter for 1:2 downsampling. | Marco
The scaling filter with zero shift will give sub-sampling for 2x downsampling. Allow for a phase shift to get an averaging filter. Usage is for source scaling in 1 pass SVC mode for 1:2 downscale. Reduces aliasing in downsampled image. Keep the phase to 0/off for now. Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
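The effect of the phase can be sketched with a two-tap bilinear kernel. This is a simplified illustration, not the libvpx scaler: it assumes the phase is given in 1/16-pel units, that the taps are (128 - 8 * phase, 8 * phase) with 7-bit rounding, and downsample_2to1 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 2:1 horizontal downsampler. phase is 0..15 in 1/16-pel
 * units; src must hold 2 * out_w samples. */
void downsample_2to1(const uint8_t *src, uint8_t *dst, size_t out_w, int phase) {
  const int w1 = 8 * phase; /* weight of the right-hand neighbour */
  const int w0 = 128 - w1;  /* weight of the co-located sample */
  for (size_t i = 0; i < out_w; ++i) {
    const int sum = w0 * src[2 * i] + w1 * src[2 * i + 1];
    dst[i] = (uint8_t)((sum + 64) >> 7); /* round and drop the 7 filter bits */
  }
}
```

With phase 0 every output is a copy of an even-indexed input (plain sub-sampling). With phase 8 both weights are 64, so each output is the average of two neighbours, which is what reduces aliasing.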
2017-02-24 | consolidate block_error functions | Johann
vp9_highbd_block_error_8bit_c was a very simple wrapper around vp9_block_error_c. The SSE2 implementation was practically identical to the non-HBD one. It was missing some minor improvements which only went into the original version. In quick speed tests, the AVX implementation showed minimal improvement over SSE2 when it does not detect overflow. However, when overflow is detected the function is run a second time. The OperationCheck test seems to trigger this case and reverses any speed benefits by running ~60% slower. AVX2 on the other hand is always 30-40% faster. Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 | Merge "block error sse2: use tran_low_t" | Johann Koenig
2017-02-24 | block error sse2: use tran_low_t | Johann
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
2017-02-21 | Merge "Drop zbin_ptr and quant_shift_ptr" | Johann Koenig
2017-02-16 | Merge "block error avx2: use tran_low_t" | Johann Koenig
2017-02-16 | Drop zbin_ptr and quant_shift_ptr | Johann
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use of these parameters. scan is used for C code and iscan is used for SIMD implementations. Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
2017-02-16 | block error avx2: use tran_low_t | Johann
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 | quantize_fp highbd ssse3: use tran_low_t for coeff | Johann
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 | quantize_fp highbd sse2: use tran_low_t for coeff | Johann
Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa
2017-02-16 | bitdepth conversion: really use num elements | Johann
The previous implementation confused bits/bytes/elements. It was using '32' as the multiplier, but that was mistakenly adopted because a 32x32 transform embedded the stride. Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-07 | block_error_fp highbd sse2: use tran_low_t for coeff | Johann
BUG=webm:1365 Change-Id: Id2ed3ebaaaa6a4b68628c23e08b64ea5f1341761
2017-02-06 | highbd x86: consolidate tran_low_t conversions | Johann
Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
2016-11-05 | Update vp9_fdct8x8_quant_ssse3 for highbitdepth | Johann
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2 BUG=webm:1304 Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-08-08 | Refactor mv limits. | Alex Converse
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
2016-08-02 | vp9/encoder: apply clang-format | clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-06-28 | remove visual studio < 2010 workarounds | James Zern
BUG=b/29583530 Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a
2016-06-17 | remove vp10 | James Zern
development has moved to the nextgenv2 branch and a snapshot from here was used to seed aomedia BUG=b/29457125 Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
2016-06-10 | vp9_diamond_search_sad_avx cosmetics | Scott LaVarnway
Fixed cosmetic issues noted in Change 349854. Change-Id: I1d94070e4066fa920173013c5a36a30dd1cb357d
2016-06-07 | Revert "remove vp9_diamond_search_sad_avx.c" | Scott LaVarnway
This reverts commit be12fefa4b7d224e9f39275a6bb4fab01b8bae3b and commit 057c1c4034ba5b9bf360c5c1f600ebc6d0718c3a. Also, the mismatch between the avx version and the c version has been fixed. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168 For a rt encode using 1080p@60fps material, up to 11% performance improvement overall was seen. Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
2016-05-27 | Merge "Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10." | Linfeng Zhang
2016-05-27 | Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10. | Linfeng Zhang
Function level timing test shows about 27% time saving on a Xeon E5-2680 v2 desktop. Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid duplicate basenames. Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2() are identical. TODO: They should be unified later if there is no intention to keep a duplicate. Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-24 | remove vp9_diamond_search_sad_avx.c | James Zern
vp9_diamond_search_sad_avx was disabled in: 057c1c4 disable vp9_diamond_search_sad_avx. This removes a missing prototype warning, as the prototype is no longer included in vp9_rtcd.h. The file can be restored if someone gets around to fixing the issue. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168 Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
2016-05-04 | vp9_frame_scale_ssse3.c: make 2 functions static | James Zern
downsample_2_to_1_ssse3/upsample_1_to_2_ssse3() are local to this module Change-Id: I78a9de8e1eca475ba1bf137102580c531aa3f7dd
2016-05-02 | vp9: Refactor vp9_denoiser_NxM_sse2. | JackyChen
Denoiser is ~1.5% faster in speed 6~8. Change-Id: I7b350f3c50cce6773d9c4eded4c0c1b722d0a5fc
2016-04-26 | vp9: Simplify the logic in denoiser SSE2 code. | JackyChen
The block size passed into the denoiser filter is always >= BLOCK_8X8 (in vp9_pick_inter_mode), so it is not necessary to check smaller block sizes. Passed the bitexact test on clips with different resolutions and noise levels. Change-Id: I19fa3195d18c27d9e5de60dc11cff1522ef3714e