path: root/vp9/encoder/x86
Age  Commit message  Author
2023-06-06  Add comments in vp9_diamond_search_sad_avx()  Deepa K G
Added comments related to re-arranging the elements of the SAD vector to find the minimum. Change-Id: I58b702d304a6cdd32f04775fba603e39c19a8947
2023-06-05  Fix c vs avx mismatch of diamond_search_sad()  Deepa K G
In the function vp9_diamond_search_sad_avx(), arranged the cost vector in a specific order. This ensures that the motion vector with the least index is selected, when there exists more than one candidate motion vector with the minimum cost, thus resolving the c vs avx mismatch. STATS_CHANGED Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
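The tie-breaking rule described in this message can be illustrated with a scalar sketch. This is not the libvpx code: both helper names are hypothetical, and packing the index into the low bits of the comparison key is only one common way to make an order-independent minimum agree with a scalar loop. It is an assumption here, not necessarily what the commit does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the smallest index holding the minimum cost.
 * The strict '<' keeps the earliest candidate when costs tie, which is the
 * behaviour a scalar reference loop naturally has. */
static size_t min_cost_index(const uint32_t *cost, size_t n) {
  size_t best = 0;
  size_t i;
  assert(n > 0);
  for (i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}

/* Hypothetical order-independent variant: fold the index into the low 32
 * bits of a 64-bit key, then take a plain minimum. Equal costs are then
 * ordered by index, so the visiting order no longer matters.
 * Assumes n fits in 32 bits. */
static size_t min_cost_index_keyed(const uint32_t *cost, size_t n) {
  uint64_t best_key = UINT64_MAX;
  size_t i;
  assert(n > 0);
  /* Iterate backwards on purpose to show the order is irrelevant. */
  for (i = n; i-- > 0;) {
    const uint64_t key = ((uint64_t)cost[i] << 32) | (uint64_t)i;
    if (key < best_key) best_key = key;
  }
  return (size_t)(best_key & 0xffffffffu);
}
```

With costs {7, 3, 3, 9} both helpers select index 1, the first of the two tied candidates.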
2023-04-18  Merge "Downsample SAD computation in motion search" into main  Yunqing Wang
2023-04-12  vp9_frame_scale_ssse3: clear -Wshadow warnings  James Zern
Bug: webm:1793 Change-Id: I85608ac7bb6d3a61649ba342c13c3bf6a39a5dea
2023-04-11  Merge "vp9_quantize_avx2,highbd_get_max_lane_eob: fix mask" into main  James Zern
2023-04-11  Downsample SAD computation in motion search  Deepa K G
Added a speed feature to skip every other row in SAD computation during motion search.

                 Instruction Count  BD-Rate Loss(%)
cpu  Resolution  Reduction(%)       avg.psnr  ovr.psnr  ssim
0    LOWRES2     0.958              0.0204    0.0095    0.0275
0    MIDRES2     1.891              -0.0636   0.0032    0.0247
0    HDRES2      2.869              0.0434    0.0345    0.0686
0    Average     1.905              0.0000    0.0157    0.0403

STATS_CHANGED
Change-Id: I1a8692757ed0cbcb2259729b3ecfb0436cdf49ce
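Skipping every other row in a SAD computation can be sketched in scalar C as follows. The function names are hypothetical and this is an illustration under stated assumptions rather than the code added by the commit: only even rows are visited by doubling the strides, and the partial sum is doubled so it stays on the same scale as a full SAD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of absolute differences over a full block. */
static unsigned int sad_full(const uint8_t *src, int src_stride,
                             const uint8_t *ref, int ref_stride, int width,
                             int height) {
  unsigned int sad = 0;
  int r, c;
  for (r = 0; r < height; ++r) {
    for (c = 0; c < width; ++c) sad += (unsigned int)abs(src[c] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Hypothetical downsampled estimate: doubling both strides makes the inner
 * routine see only rows 0, 2, 4, ...; the result is multiplied by 2 so it
 * remains comparable with a full-block SAD. Assumes an even height. */
static unsigned int sad_skip_rows(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  int width, int height) {
  assert(height % 2 == 0);
  return 2 * sad_full(src, 2 * src_stride, ref, 2 * ref_stride, width,
                      height / 2);
}
```

The estimate is exact when odd and even rows contribute equally and is blind to differences confined to odd rows, which is the quality trade-off the BD-rate columns above measure.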
2023-04-11  Avoid redundant start MV SAD calculation  Deepa K G
Avoided repeated calculation of start MV SAD during full pixel motion search.

                 Instruction Count
cpu  Resolution  Reduction(%)
0    LOWRES2     0.162
0    MIDRES2     0.246
0    HDRES2      0.325
0    Average     0.245

Change-Id: I2b4786901f254ce32ee8ca8a3d56f1c9f112f1d4
2023-04-10  vp9_quantize_avx2,highbd_get_max_lane_eob: fix mask  James Zern
Pack nz_mask with zero. After the result is permuted this has the effect of ignoring the upper half of the iscan register which is only loaded with 128-bits. Depending on the optimization level and the load used the upper half of the ymm register may contain undefined values which can produce an incorrect eob. If this is large enough it can cause a crash. Bug: chromium:1431729 Change-Id: I4ebae9fa39f228bdd29dcc19935f3f07759d75f5
2023-03-14  [NEON] Add temporal filter functions, 8-bit and highbd  Konstantinos Margaritis
Both are around 3x faster than original C version. 8-bit gives a small 0.5% speed increase, whereas highbd gives ~2.5%. Change-Id: I71d75ddd2757b19aa201e879fd9fa8f3a25431ad
2023-03-02  [SSE4_1] Fix overflow in highbd temporal_filter  Konstantinos Margaritis
While porting this function to NEON, using SSE4_1 implementation as base I noticed that both were producing files with different checksums to the C reference implementation. After investigating further I found that this saturating pack was the culprit. Doing the multiplication on the 32-bit values, leads to producing the correct results with the C implementation. Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
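The kind of mismatch described in this message can be modelled in scalar C. The sketch assumes a signed 32-to-16-bit saturating pack (the per-lane behaviour of `_mm_packs_epi32`); the commit does not say which pack instruction was involved, and all function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of a signed saturating 32->16 bit pack. */
static int16_t sat_pack_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Narrow first, then multiply: any value above 32767 has already been
 * clamped, so the product no longer matches a 32-bit C reference. */
static int32_t weight_after_pack(int32_t v, int32_t w) {
  return (int32_t)sat_pack_s16(v) * w;
}

/* Multiply while still in 32 bits: matches the C reference as long as the
 * product itself fits in 32 bits. */
static int32_t weight_in_32bit(int32_t v, int32_t w) { return v * w; }
```

For v = 40000 and w = 3 the first helper yields 98301 while the second yields 120000; for values that fit in 16 bits the two agree, which is why such a bug can go unnoticed on most inputs.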
2022-10-25  Merge "quantize: consolidate sse2 conditionals" into main  Johann Koenig
2022-10-17  quantize: consolidate sse2 conditionals  Johann
Change-Id: I43de579e30f2967b97064063e29676e0af1a498f
2022-10-17  vp9 quantize: rewrite ssse3 in intrinsics  Johann
Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e
2022-10-01  vp9 quantize: change index  Johann
In assembly it made sense to iterate using n_coeffs. In intrinsics it's just as fast to use index and easier to read. Change-Id: I403c959709309dad68123d0a3d0efe183874543d
2022-09-26  quantize: standardize vp9_quantize_fp_sse2  Johann
Match style for vpx_quantize_b_sse2 and prepare to rewrite ssse3 version in intrinsics. Need to evaluate the value of threshold breakout before going further. Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
2022-09-23  quantize: increase iscan by 1  Johann
All of the assembly adds 1 to iscan to convert from a 0 based array to the EOB value. Add 1 to all iscan values and remove the extra instructions from the assembly. Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
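The effect of storing iscan values already incremented by 1 can be sketched in scalar C. The two functions and the sample tables are hypothetical and only illustrate the idea that the per-coefficient "+ 1" moves out of the loop and into the table.

```c
#include <assert.h>
#include <stdint.h>

/* Before: iscan holds 0-based scan positions, so each nonzero coefficient
 * needs "+ 1" to turn its position into an end-of-block (EOB) count. */
static int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan,
                          int n) {
  int eob = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* After: the table already stores position + 1, so the loop only takes a
 * maximum over the entries that belong to nonzero coefficients. */
static int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1,
                         int n) {
  int eob = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

Both functions return the same EOB for the same coefficients; an all-zero block still gives 0 because no table entry is consulted.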
2022-09-02  x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256  James Zern
over *_set1_*(0) Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2
2022-08-09  VPX: Fix vp9_quantize_fp_avx2() VS build error.  Scott LaVarnway
Add build fix for _mm256_extract_epi16() being undefined. Bug: b/237714063 Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
2022-08-03  VPX: Add vp9_highbd_quantize_fp_32x32_avx2().  Scott LaVarnway
~4x faster than vp9_highbd_quantize_fp_32x32_c() for full calculations. Bug: b/237714063 Change-Id: Iff2182b8e7b1ac79811e33080d1f6cac6679382d
2022-08-03  VPX: Add vp9_highbd_quantize_fp_avx2().  Scott LaVarnway
Up to 5.37x faster than vp9_highbd_quantize_fp_c() for full calculations. ~1.6% overall encoder improvement for the test clip used. Bug: b/237714063 Change-Id: I584fd1f60a3e02f1ded092de98970725fc66c5b8
2022-08-01  VPX: Add vp9_quantize_fp_32x32_avx2().  Scott LaVarnway
Up to 1.80x faster than vp9_quantize_fp_32x32_ssse3() for full calculations. Bug: b/237714063 Change-Id: Ic4ae4724fce7ac85c7a089535b16a999e02f0a10
2022-07-27  VPX: vp9_quantize_fp_avx2() cleanup.  Scott LaVarnway
No change in performance. Bug: b/237714063 Change-Id: I8ea42759cc4dc57be6a29c23784997cb90ad4090
2022-07-26  highbd_temporal_filter_sse4: remove unused function params  James Zern
this clears warnings under clang-13 of the form:
vp9/encoder/x86/highbd_temporal_filter_sse4.c|196 col 63| warning: parameter 'v_pre' set but not used [-Wunused-but-set-parameter]

this is the high-bitdepth version of:
73b8aade8 temporal_filter_sse4: remove unused function params

Change-Id: I9b2c9bf27c16975e4855df6a2c967da4c8c63a3a
2022-06-28  rtc-svc: Fix to make SVC work for Profile 1  Marco Paniconi
Added datarate unittest for 4:4:4 and 4:2:2 input, for spatial and temporal layers.

Fix is needed in vp9_set_literal_size(): the sampling_x/y should be passed into update_inital_width(), otherwise sampling_x/y = 1/1 (4:2:0) was forced. vp9_set_literal_size() is only called by the svc and on dynamic resize.

Fix issue with the normative optimized scaler: UV width/height was assumed to be 1/2 of Y, for the ssse and neon code.

Also fix to assert for the scaled width/height: in case scaled width/height is odd it should be incremented by 1 (make it even).

Change-Id: I3a2e40effa53c505f44ef05aaa3132e1b7f57dd5
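Two of the points in this message, rounding an odd scaled dimension up to even and deriving the UV size from the subsampling factor instead of assuming half of Y, can be sketched as below. Both helpers are hypothetical and are not taken from libvpx.

```c
#include <assert.h>

/* Hypothetical helper: make a dimension even by incrementing odd values. */
static int round_up_to_even(int v) {
  assert(v >= 0);
  return v + (v & 1);
}

/* Hypothetical helper: chroma plane dimension for a given luma dimension.
 * subsampling is 1 when the axis is halved (as in 4:2:0) and 0 when it is
 * not (as in 4:4:4), so the UV size is not hard-coded to half of Y. */
static int chroma_dim(int luma_dim, int subsampling) {
  assert(subsampling == 0 || subsampling == 1);
  return (luma_dim + subsampling) >> subsampling;
}
```

For example, a scaled width of 5 becomes 6, and an 8-pixel luma width maps to a chroma width of 4 with subsampling and 8 without.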
2022-06-01  vp9,encoder: fix some integer sanitizer warnings  James Zern
the issues fixed in this change are related to implicit conversions between int / unsigned int:

vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion from type 'int' of value -9 (32-bit, signed) to type 'unsigned int' changed the value to 4294967287 (32-bit, unsigned)
vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 4290576316 (32-bit, unsigned) to type 'int' changed the value to -4390980 (32-bit, signed)
vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka 'unsigned short') changed the value to 64512 (16-bit, unsigned)

unsigned is forced for masks and int is used with intel intrinsics

Bug: webm:1767
Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee
Fixed: webm:1767
2022-04-18  temporal_filter_sse4,cosmetics: fix some typos  James Zern
Change-Id: If8318068a32da52d15c0ba595f80092611f4c847
2022-04-14  temporal_filter_sse4: remove unused function params  James Zern
this clears warnings under clang-13 of the form:
../vp9/encoder/x86/temporal_filter_sse4.c:275:39: warning: parameter 'u_pre' set but not used [-Wunused-but-set-parameter]

Change-Id: I21519b5b0b9c21b04b174327415e0e73b56bdfda
2022-03-30  remove skip_block from quantize  Johann
Whether a block is skipped is handled by mi->skip. x->skip_block is kept exclusively to verify that the quantize functions are not called for skip blocks. Finishes the cleanup in 13eed991f Bug: libvpx:1612 Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
2021-12-09  vp9_diamond_search_sad_avx: quiet -Wmaybe-uninitialized warning  James Zern
w/gcc-11

v_these_mv_w is always initialized in this block with _mm_add_epi16(); converting this to a _mm_storeu_si32(tmp) call also works, but introduces more stack usage

|| ../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function ‘vp9_diamond_search_sad_avx’:
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning: ‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
|| 285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
|| | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note: ‘v_these_mv_w’ declared here
|| 149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
|| | ^~~~~~~~~~~~

Change-Id: I1cd2fcb41030db16f51c94f3a70eb8eb2a526401
2019-09-30  namespace ARCH_* defines  James Zern
this prevents redefinition warnings if a toolchain sets one BUG=b/117240165 Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-09-10  vp9_quantize_sse2: quiet clang-7 integer sanitizer warning  James Zern
nzflag is used as a boolean, it doesn't need to be a sized type, int is enough (and _mm_movemask_epi8 returns one)

fixes:
vp9_quantize_sse2.c:136:16: implicit conversion from type 'int' of value 65535 (32-bit, signed) to type 'int16_t' (aka 'short') changed the value to -1 (16-bit, signed)

BUG=webm:1649
Change-Id: I0e3f5278af49d84760f3dfb607f28099cf02f21d
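A scalar stand-in shows why keeping the flag as a plain int avoids the conversion. `movemask16` below is a hypothetical model of what `_mm_movemask_epi8` returns (one bit per byte sign, as an int in the range 0..65535), and `any_lane_set` is likewise illustrative rather than the libvpx code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_movemask_epi8: bit i of the result is
 * the sign bit of byte i, so the result is an int between 0 and 65535. */
static int movemask16(const int8_t bytes[16]) {
  int mask = 0;
  int i;
  for (i = 0; i < 16; ++i) {
    if (bytes[i] < 0) mask |= 1 << i;
  }
  return mask;
}

/* The mask is only tested against zero, so it is kept in an int. Storing
 * 65535 in an int16_t would change the value, which is what the integer
 * sanitizer reports. */
static int any_lane_set(const int8_t bytes[16]) {
  const int nzflag = movemask16(bytes);
  return nzflag != 0;
}
```

When all sixteen bytes have their sign bit set the mask is 65535, which fits an int unchanged and still tests as nonzero.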
2019-04-30  cast ambiguous _mm_set1_epiNN() constants  Johann
clang 7 integer sanitizer warns on unsigned->signed conversions when the highest bit is 1. BUG=webm:1615 Change-Id: I6381efaff9233254b40cb78f7bcf87090e0ad353
2019-03-29  update .clang-format for version clang-7.0.1 update.  Hien Ho
added files that are affected by clang-format version 7. BUG=b/120815481 Change-Id: I40662ce962e4f4b1fcdf183b700f85cc5c0f9f82
2019-03-25  Remove deprecated code for vp9_fdct8x8_quant()  Jingning Han
Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
2019-03-07  Optimize SSE4_1 lowbd temporal filter implementation  chiyotsai
- Change some unaligned loads to aligned loads
- Preload filter weights

BUG=webm:1591
Change-Id: I4e5e755e1fa5613d1c14191265bf80b0bfd0b75c
2019-03-04  Add SSE4_1 highbd version of temporal filter  chiyotsai
The SSE4_1 version of temporal filter does not distinguish between bd 10 and bd 12.

Speed up:
Function Level:
      | !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
SS_Y  | 6.56X | 6.63X

Video Level: 2.5% speed up on basketballpass_240p over 150 frames on speed 1, bitdepth 10, auto-alt-ref=1

BUG=webm:1591
Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
2019-02-04  Fix an inline variable declaration in temporal filter  chiyotsai
bug=webm:1595 Change-Id: I7fbb16444a8526eb9479007772fbf52b09ff8338
2019-02-04  Some cosmetic fixes to temporal filter  chiyotsai
BUG=webm:1591 Change-Id: I34fd7e6cbe6f3d5486a669d0895402fd21de7641
2019-02-01  Remove old version of temporal_filter_apply  chiyotsai
BUG=webm:1591 Change-Id: I926566ac1bf4bac8cb1ce1c6ded9ba940109283e
2019-01-28  Fix mismatch between SIMD/C version of vp9_apply_temporal_filter  chiyotsai
Change-Id: I6503ebc79beaac2947992437ac133f3ac4379019
2019-01-24  Add SSE4 version of new apply_temporal_filter  chiyotsai
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1. This patch merely adds the function and does not enable it yet.

Speed Up:
       | ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |

BUG=webm:1591
Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
2019-01-07  fix vp9 fdct_quant  Johann
Values in [q]coeff1 were not correctly stored. This caused a segfault in the sse2 libvpx__nightly_optimization jobs.

Broken in:
commit 85032bac388917916f7a149173db8b34e93e8f6e
Author: Johann <johannkoenig@google.com>
Date: Fri Dec 21 00:27:00 2018 +0000

    fdct_quant: resolve missing declarations

BUG=webm:1584
Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
2018-12-21  vp9_highbd_block_error_sse2: resolve missing declarations  Johann
BUG=webm:1584 Change-Id: I43d051c538bf4a6f6210eefa398dc0901ab8d157
2018-12-21  fdct_quant: resolve missing declarations  Johann
Store outputs using store_tran_low() BUG=webm:1584 Change-Id: I213abe047e14625c5ef80df7fa6fdc2a31e40fb6
2018-11-27  rename quantize_x86.h  Johann
Pave the way for new quantize_OPT.h helper files. Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
2018-10-30  clang-tidy: fix vp9/encoder parameters  Johann
BUG=webm:1444 Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
2018-08-08  Simplify temporal filter strength calculation  Jingning Han
Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
2018-02-05  Update tx_type switch code in idct  Linfeng Zhang
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-01-18  vp9_quantize_fp_avx2()  Scott LaVarnway
Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-12-21  vp9_quantize_ssse3_x86_64: fix out of bounds write  James Zern
eob is a pointer to a uint16_t. previously the code would store 64-bits causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
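The width issue described here can be shown in C terms, even though the original fix was in assembly. `store_eob` is a hypothetical helper: the point is that a store through a `uint16_t *` must write exactly two bytes, whereas a 64-bit store to the same address would overwrite the six bytes that follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value may have been computed in a 64-bit
 * register, but it is narrowed before the store so that only
 * sizeof(uint16_t) bytes at eob_ptr are written. */
static void store_eob(uint16_t *eob_ptr, uint64_t eob_wide) {
  assert(eob_wide <= UINT16_MAX);
  *eob_ptr = (uint16_t)eob_wide;
}
```

Whether a wider store crashes depends on what happens to sit next to the destination, which matches the "with the right stack layout" remark in the message.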