libvpx.git

Age	Commit message	Author
2023-06-06	Add comments in vp9_diamond_search_sad_avx()	Deepa K G
	Added comments related to re-arranging the elements of the SAD vector to find the minimum. Change-Id: I58b702d304a6cdd32f04775fba603e39c19a8947
2023-06-05	Fix c vs avx mismatch of diamond_search_sad()	Deepa K G
	In the function vp9_diamond_search_sad_avx(), arranged the cost vector in a specific order. This ensures that the motion vector with the least index is selected, when there exists more than one candidate motion vector with the minimum cost, thus resolving the c vs avx mismatch. STATS_CHANGED Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
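The tie-breaking rule described above can be sketched in scalar C. This is not the AVX code from the commit; `min_cost_index` is a hypothetical helper that only illustrates "the least index wins when several candidates share the minimum cost".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not part of libvpx.
 * Returns the index of the smallest cost. A later candidate replaces the
 * current best only when it is strictly smaller, so on a tie the lowest
 * index is kept. */
size_t min_cost_index(const uint32_t *cost, size_t n) {
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}
```

A vectorized search has to reproduce this ordering explicitly, since a horizontal minimum over lanes does not by itself say which of several equal lanes is reported.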
2023-04-18	Merge "Downsample SAD computation in motion search" into main	Yunqing Wang

2023-04-12	vp9_frame_scale_ssse3: clear -Wshadow warnings	James Zern
	Bug: webm:1793 Change-Id: I85608ac7bb6d3a61649ba342c13c3bf6a39a5dea
2023-04-11	Merge "vp9_quantize_avx2,highbd_get_max_lane_eob: fix mask" into main	James Zern

2023-04-11	Downsample SAD computation in motion search	Deepa K G
	Added a speed feature to skip every other row in SAD computation during motion search.
	cpu  Resolution  Instruction Count  BD-Rate Loss (%)
	                 Reduction (%)      avg.psnr  ovr.psnr  ssim
	0    LOWRES2     0.958               0.0204   0.0095    0.0275
	0    MIDRES2     1.891              -0.0636   0.0032    0.0247
	0    HDRES2      2.869               0.0434   0.0345    0.0686
	0    Average     1.905               0.0000   0.0157    0.0403
	STATS_CHANGED
	Change-Id: I1a8692757ed0cbcb2259729b3ecfb0436cdf49ce
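A minimal sketch of a SAD that skips every other row, assuming 8-bit samples. `sad_skip_rows` is a hypothetical name, and doubling the partial sum so it stays comparable to a full-block SAD is an assumption of this sketch, not something the commit message states.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the libvpx API.
 * Sums absolute differences over rows 0, 2, 4, ... only, then doubles the
 * result (assumption) to approximate the SAD of the whole block. */
unsigned int sad_skip_rows(const uint8_t *src, int src_stride,
                           const uint8_t *ref, int ref_stride,
                           int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; y += 2) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[y * src_stride + x] -
                               ref[y * ref_stride + x]);
    }
  }
  return 2 * sad;
}
```

The estimate is exact only when the skipped rows differ by the same amount as the sampled ones, which is why the commit reports a BD-rate change alongside the instruction count reduction.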
2023-04-11	Avoid redundant start MV SAD calculation	Deepa K G
	Avoided repeated calculation of start MV SAD during full pixel motion search.
	cpu  Resolution  Instruction Count Reduction (%)
	0    LOWRES2     0.162
	0    MIDRES2     0.246
	0    HDRES2      0.325
	0    Average     0.245
	Change-Id: I2b4786901f254ce32ee8ca8a3d56f1c9f112f1d4
2023-04-10	vp9_quantize_avx2,highbd_get_max_lane_eob: fix mask	James Zern
	Pack nz_mask with zero. After the result is permuted this has the effect of ignoring the upper half of the iscan register which is only loaded with 128-bits. Depending on the optimization level and the load used the upper half of the ymm register may contain undefined values which can produce an incorrect eob. If this is large enough it can cause a crash. Bug: chromium:1431729 Change-Id: I4ebae9fa39f228bdd29dcc19935f3f07759d75f5
2023-03-14	[NEON] Add temporal filter functions, 8-bit and highbd	Konstantinos Margaritis
	Both are around 3x faster than original C version. 8-bit gives a small 0.5% speed increase, whereas highbd gives ~2.5%. Change-Id: I71d75ddd2757b19aa201e879fd9fa8f3a25431ad
2023-03-02	[SSE4_1] Fix overflow in highbd temporal_filter	Konstantinos Margaritis
	While porting this function to NEON, using the SSE4_1 implementation as a base, I noticed that both produced files whose checksums differed from those of the C reference implementation. After investigating further I found that the saturating pack was the culprit. Doing the multiplication on the 32-bit values instead produces results that match the C implementation. Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
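The effect of packing before multiplying can be shown in scalar C. This is only an illustration: the function names are hypothetical, signed 16-bit saturation is assumed, and the operand values are made up, since the commit message does not give the actual ones.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range, as a signed saturating pack
 * would do per lane (assumption for this sketch). */
int16_t saturate_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Narrow first, then multiply: large inputs are clipped before scaling. */
int32_t scale_after_pack(int32_t v, int32_t w) {
  return (int32_t)saturate_s16(v) * w;
}

/* Multiply on the 32-bit value: no clipping for inputs in this range. */
int32_t scale_in_32bit(int32_t v, int32_t w) {
  return v * w;
}
```

For an input of 40000 and a weight of 2, the first variant yields 65534 and the second 80000, so only the 32-bit path agrees with a plain C computation.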
2022-10-25	Merge "quantize: consolidate sse2 conditionals" into main	Johann Koenig

2022-10-17	quantize: consolidate sse2 conditionals	Johann
	Change-Id: I43de579e30f2967b97064063e29676e0af1a498f
2022-10-17	vp9 quantize: rewrite ssse3 in intrinsics	Johann
	Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e
2022-10-01	vp9 quantize: change index	Johann
	In assembly it made sense to iterate using n_coeffs. In intrinsics it's just as fast to use index and easier to read. Change-Id: I403c959709309dad68123d0a3d0efe183874543d
2022-09-26	quantize: standardize vp9_quantize_fp_sse2	Johann
	Match style for vpx_quantize_b_sse2 and prepare to rewrite ssse3 version in intrinsics. Need to evaluate the value of threshold breakout before going further. Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
2022-09-23	quantize: increase iscan by 1	Johann
	All of the assembly adds 1 to iscan to convert from a 0 based array to the EOB value. Add 1 to all iscan values and remove the extra instructions from the assembly. Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
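A scalar sketch of why pre-incrementing iscan removes work from the inner loop. Both functions are hypothetical and only mirror the idea in the commit message: with a 0-based table the EOB is the largest scan position of a nonzero coefficient plus 1, and with a table that already stores position + 1 the addition disappears.

```c
#include <stdint.h>

/* Hypothetical: iscan holds 0-based scan positions, so 1 is added per hit. */
int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* Hypothetical: iscan_plus1 already holds position + 1, no extra add. */
int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1, int n) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

Both variants return the same EOB as long as the second table is the first with 1 added to every entry.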
2022-09-02	x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256	James Zern
	over _set1_(0) Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2
2022-08-09	VPX: Fix vp9_quantize_fp_avx2() VS build error.	Scott LaVarnway
	Add build fix for _mm256_extract_epi16() being undefined. Bug: b/237714063 Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
2022-08-03	VPX: Add vp9_highbd_quantize_fp_32x32_avx2().	Scott LaVarnway
	~4x faster than vp9_highbd_quantize_fp_32x32_c() for full calculations. Bug: b/237714063 Change-Id: Iff2182b8e7b1ac79811e33080d1f6cac6679382d
2022-08-03	VPX: Add vp9_highbd_quantize_fp_avx2().	Scott LaVarnway
	Up to 5.37x faster than vp9_highbd_quantize_fp_c() for full calculations. ~1.6% overall encoder improvement for the test clip used. Bug: b/237714063 Change-Id: I584fd1f60a3e02f1ded092de98970725fc66c5b8
2022-08-01	VPX: Add vp9_quantize_fp_32x32_avx2().	Scott LaVarnway
	Up to 1.80x faster than vp9_quantize_fp_32x32_ssse3() for full calculations. Bug: b/237714063 Change-Id: Ic4ae4724fce7ac85c7a089535b16a999e02f0a10
2022-07-27	VPX: vp9_quantize_fp_avx2() cleanup.	Scott LaVarnway
	No change in performance. Bug: b/237714063 Change-Id: I8ea42759cc4dc57be6a29c23784997cb90ad4090
2022-07-26	highbd_temporal_filter_sse4: remove unused function params	James Zern
	this clears warnings under clang-13 of the form:
	vp9/encoder/x86/highbd_temporal_filter_sse4.c|196 col 63| warning: parameter 'v_pre' set but not used [-Wunused-but-set-parameter]
	this is the high-bitdepth version of: 73b8aade8 temporal_filter_sse4: remove unused function params
	Change-Id: I9b2c9bf27c16975e4855df6a2c967da4c8c63a3a
2022-06-28	rtc-svc: Fix to make SVC work for Profile 1	Marco Paniconi
	Added datarate unittest for 4:4:4 and 4:2:2 input, for spatial and temporal layers. Fix is needed in vp9_set_literal_size(): the sampling_x/y should be passed into update_initial_width(), otherwise sampling_x/y = 1/1 (4:2:0) was forced. vp9_set_literal_size() is only called by the svc and on dynamic resize. Fix issue with the normative optimized scaler: UV width/height was assumed to be 1/2 of Y, for the ssse3 and neon code. Also a fix to the assert for the scaled width/height: in case scaled width/height is odd it should be incremented by 1 (make it even). Change-Id: I3a2e40effa53c505f44ef05aaa3132e1b7f57dd5
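The two dimension rules mentioned above can be sketched as follows. Both helpers are hypothetical; rounding the chroma size up with `(dim + ss) >> ss` is a common convention assumed here, not a formula taken from the commit.

```c
/* Hypothetical helper: chroma dimension derived from the luma dimension and
 * the subsampling shift (1 for 4:2:0 in that direction, 0 for 4:4:4),
 * instead of assuming UV is always half of Y. Rounds up (assumption). */
int chroma_dim(int luma_dim, int subsampling) {
  return (luma_dim + subsampling) >> subsampling;
}

/* Hypothetical helper: bump an odd scaled dimension to the next even value. */
int make_even(int dim) {
  return dim + (dim & 1);
}
```

With a subsampling shift of 0 the chroma plane keeps the full luma size, which is the case the half-size assumption got wrong for 4:4:4 input.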
2022-06-01	vp9,encoder: fix some integer sanitizer warnings	James Zern
	the issues fixed in this change are related to implicit conversions between int / unsigned int:
	vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion from type 'int' of value -9 (32-bit, signed) to type 'unsigned int' changed the value to 4294967287 (32-bit, unsigned)
	vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
	vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
	vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 4290576316 (32-bit, unsigned) to type 'int' changed the value to -4390980 (32-bit, signed)
	vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka 'unsigned short') changed the value to 64512 (16-bit, unsigned)
	unsigned is forced for masks and int is used with intel intrinsics
	Bug: webm:1767 Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee Fixed: webm:1767
2022-04-18	temporal_filter_sse4,cosmetics: fix some typos	James Zern
	Change-Id: If8318068a32da52d15c0ba595f80092611f4c847
2022-04-14	temporal_filter_sse4: remove unused function params	James Zern
	this clears warnings under clang-13 of the form:
	../vp9/encoder/x86/temporal_filter_sse4.c:275:39: warning: parameter 'u_pre' set but not used [-Wunused-but-set-parameter]
	Change-Id: I21519b5b0b9c21b04b174327415e0e73b56bdfda
2022-03-30	remove skip_block from quantize	Johann
	Whether a block is skipped is handled by mi->skip. x->skip_block is kept exclusively to verify that the quantize functions are not called for skip blocks. Finishes the cleanup in 13eed991f Bug: libvpx:1612 Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
2021-12-09	vp9_diamond_search_sad_avx: quiet -Wmaybe-uninitialized warning	James Zern
	w/gcc-11
	v_these_mv_w is always initialized in this block with _mm_add_epi16(); converting this to a _mm_storeu_si32(tmp) call also works, but introduces more stack usage
	../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function ‘vp9_diamond_search_sad_avx’:
	vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning: ‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
	  285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
	      | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note: ‘v_these_mv_w’ declared here
	  149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
	      | ^~~~~~~~~~~~
	Change-Id: I1cd2fcb41030db16f51c94f3a70eb8eb2a526401
2019-09-30	namespace ARCH_* defines	James Zern
	this prevents redefinition warnings if a toolchain sets one BUG=b/117240165 Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-09-10	vp9_quantize_sse2: quiet clang-7 integer sanitizer warning	James Zern
	nzflag is used as a boolean, it doesn't need to be a sized type, int is enough (and _mm_movemask_epi8 returns one)
	fixes:
	vp9_quantize_sse2.c:136:16: implicit conversion from type 'int' of value 65535 (32-bit, signed) to type 'int16_t' (aka 'short') changed the value to -1 (16-bit, signed)
	BUG=webm:1649 Change-Id: I0e3f5278af49d84760f3dfb607f28099cf02f21d
2019-04-30	cast ambiguous _mm_set1_epiNN() constants	Johann
	clang 7 integer sanitizer warns on unsigned->signed conversions when the highest bit is 1. BUG=webm:1615 Change-Id: I6381efaff9233254b40cb78f7bcf87090e0ad353
2019-03-29	update .clang-format for version clang-7.0.1	Hien Ho
	added files that are affected by clang-format version 7. BUG=b/120815481 Change-Id: I40662ce962e4f4b1fcdf183b700f85cc5c0f9f82
2019-03-25	Remove deprecated code for vp9_fdct8x8_quant()	Jingning Han
	Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
2019-03-07	Optimize SSE4_1 lowbd temporal filter implementation	chiyotsai
	- Change some unaligned loads to aligned loads
	- Preload filter weights
	BUG=webm:1591 Change-Id: I4e5e755e1fa5613d1c14191265bf80b0bfd0b75c
2019-03-04	Add SSE4_1 highbd version of temporal filter	chiyotsai
	The SSE4_1 version of temporal filter does not distinguish between bd 10 and bd 12.
	Speed up:
	Function Level:
	      | !SS_X | SS_X
	!SS_Y | 6.44X | 6.37X
	SS_Y  | 6.56X | 6.63X
	Video Level: 2.5% speed up on basketballpass_240p over 150 frames on speed 1, bitdepth 10, auto-alt-ref=1
	BUG=webm:1591 Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
2019-02-04	Fix an inline variable declaration in temporal filter	chiyotsai
	bug=webm:1595 Change-Id: I7fbb16444a8526eb9479007772fbf52b09ff8338
2019-02-04	Some cosmetic fixes to temporal filter	chiyotsai
	BUG=webm:1591 Change-Id: I34fd7e6cbe6f3d5486a669d0895402fd21de7641
2019-02-01	Remove old version of temporal_filter_apply	chiyotsai
	BUG=webm:1591 Change-Id: I926566ac1bf4bac8cb1ce1c6ded9ba940109283e
2019-01-28	Fix mismatch between SIMD/C version of vp9_apply_temporal_filter	chiyotsai
	Change-Id: I6503ebc79beaac2947992437ac133f3ac4379019
2019-01-24	Add SSE4 version of new apply_temporal_filter	chiyotsai
	This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1. This patch merely adds the function and does not enable it yet.
	Speed Up:
	       | ss_x=1 | ss_x=0 |
	ss_y=1 | 19.80X | 19.04X |
	ss_y=0 | 21.09X | 20.21X |
	BUG=webm:1591 Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
2019-01-07	fix vp9 fdct_quant	Johann
	Values in [q]coeff1 were not correctly stored. This caused a segfault in the sse2 libvpx__nightly_optimization jobs.
	Broken in:
	commit 85032bac388917916f7a149173db8b34e93e8f6e
	Author: Johann <johannkoenig@google.com>
	Date: Fri Dec 21 00:27:00 2018 +0000
	fdct_quant: resolve missing declarations
	BUG=webm:1584 Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
2018-12-21	vp9_highbd_block_error_sse2: resolve missing declarations	Johann
	BUG=webm:1584 Change-Id: I43d051c538bf4a6f6210eefa398dc0901ab8d157
2018-12-21	fdct_quant: resolve missing declarations	Johann
	Store outputs using store_tran_low() BUG=webm:1584 Change-Id: I213abe047e14625c5ef80df7fa6fdc2a31e40fb6
2018-11-27	rename quantize_x86.h	Johann
	Pave the way for new quantize_OPT.h helper files. Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
2018-10-30	clang-tidy: fix vp9/encoder parameters	Johann
	BUG=webm:1444 Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
2018-08-08	Simplify temporal filter strength calculation	Jingning Han
	Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
2018-02-05	Update tx_type switch code in idct	Linfeng Zhang
	Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
2018-01-18	vp9_quantize_fp_avx2()	Scott LaVarnway
	Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-12-21	vp9_quantize_ssse3_x86_64: fix out of bounds write	James Zern
	eob is a pointer to a uint16_t. previously the code would store 64-bits causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
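The width mismatch described above can be reproduced safely in plain C. This sketch is not the assembly from the commit: the function names are hypothetical, and the 64-bit store is modeled with `memcpy` into a four-element array so that the stray bytes land in memory the example owns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the bug: writes 8 bytes through a pointer that is
 * only meant to address a single uint16_t. */
void store_eob_64(uint16_t *eob_ptr, uint64_t eob) {
  memcpy(eob_ptr, &eob, sizeof(eob));
}

/* Hypothetical model of the fix: writes exactly 2 bytes. */
void store_eob_16(uint16_t *eob_ptr, uint64_t eob) {
  *eob_ptr = (uint16_t)eob;
}
```

Called on the first element of a `uint16_t[4]`, the 16-bit store leaves the neighbouring elements untouched, while the 64-bit store overwrites all four. When the real eob sits next to other stack data, that is the crash or test failure the commit refers to.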