[SSE4_1] Fix overflow in highbd temporal_filter

While porting this function to NEON, using the SSE4_1 implementation as a base, I noticed that both produced files whose checksums differed from those of the C reference implementation. After investigating further, I found that multiplying the saturating-packed 16-bit sum by the predictor was the culprit: _mm_mullo_epi16 keeps only the low 16 bits of each product, which overflows for high-bitdepth input. Doing the multiplication on the 32-bit values instead produces results that match the C implementation.

Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
author: Konstantinos Margaritis <konstantinos@vectorcamp.gr> 2023-03-01 23:54:51 +0000
committer: Konstantinos Margaritis <konstantinos@vectorcamp.gr> 2023-03-02 00:02:16 +0000
commit: 817248e1be1548af10f3d4f0922e01e372d10cea
tree: 38f8e09b6f7185f14fd1bac53d72d28c8d2fc6c3 /vp9/encoder/x86
parent: 0e7804ca30c367eefb17594a0c5096f2f26de732
1 file changed, 3 insertions, 2 deletions
diff --git a/vp9/encoder/x86/highbd_temporal_filter_sse4.c b/vp9/encoder/x86/highbd_temporal_filter_sse4.c
index a7f5117cf..bcbf6d77e 100644
--- a/vp9/encoder/x86/highbd_temporal_filter_sse4.c
+++ b/vp9/encoder/x86/highbd_temporal_filter_sse4.c
@@ -141,11 +141,12 @@ static INLINE void highbd_accumulate_and_store_8(const __m128i sum_first_u32,
   count_u16 = _mm_adds_epu16(count_u16, sum_u16);
   _mm_storeu_si128((__m128i *)count, count_u16);
 
-  pred_u16 = _mm_mullo_epi16(sum_u16, pred_u16);
-
   pred_0_u32 = _mm_cvtepu16_epi32(pred_u16);
   pred_1_u32 = _mm_unpackhi_epi16(pred_u16, zero);
 
+  pred_0_u32 = _mm_mullo_epi32(sum_first_u32, pred_0_u32);
+  pred_1_u32 = _mm_mullo_epi32(sum_second_u32, pred_1_u32);
+
   accum_0_u32 = _mm_loadu_si128((const __m128i *)accumulator);
   accum_1_u32 = _mm_loadu_si128((const __m128i *)(accumulator + 4));
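The overflow fixed above can be illustrated with a minimal scalar sketch of one lane of the accumulate step, mirroring the intrinsics in the diff. The function names lane_c_reference and lane_old_sse41 are hypothetical and do not exist in libvpx. The sketch assumes that the C reference forms a full 32-bit product and that the code after the shown hunk adds the product to the loaded accumulator.

```c
#include <stdint.h>

/* Hypothetical scalar models of one lane of highbd_accumulate_and_store_8.
 * Neither function exists in libvpx; they only mirror the intrinsics. */

/* C reference (assumed) and fixed SSE4_1 path: the 32-bit sum is multiplied
 * by the zero-extended predictor, keeping the low 32 bits of the product as
 * _mm_mullo_epi32 does, and the result is added to the accumulator. */
static uint32_t lane_c_reference(uint32_t accum, uint32_t sum, uint16_t pred) {
  return accum + sum * (uint32_t)pred;
}

/* Old SSE4_1 path: the (non-negative) sum is saturated to 16 bits as
 * _mm_packus_epi32 does, multiplied in a 16-bit lane keeping only the low
 * 16 bits of the product (_mm_mullo_epi16), zero-extended back to 32 bits
 * (_mm_cvtepu16_epi32) and then added to the accumulator (assumed). */
static uint32_t lane_old_sse41(uint32_t accum, uint32_t sum, uint16_t pred) {
  const uint16_t sum_u16 = sum > 0xFFFFu ? 0xFFFFu : (uint16_t)sum;
  const uint16_t prod_u16 = (uint16_t)((uint32_t)sum_u16 * pred);
  return accum + (uint32_t)prod_u16;
}
```

For example, with a 12-bit pixel value of 4095 and an illustrative sum of 32, lane_c_reference(0, 32, 4095) returns 131040, whereas lane_old_sse41(0, 32, 4095) returns 65504 (131040 mod 65536). With a pixel value of 4095 the two agree for sums up to 16 (16 * 4095 = 65520) and diverge from 17 onwards (17 * 4095 = 69615), which is why the 16-bit multiply is only safe for small weights or low bit depths.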