Keep vp8 sixtap read within bounds

The sixtap filter needs 6 pixels for each output: 2 before the source pixel, the source pixel itself, and 3 after it. Filtering a row 16 pixels wide therefore needs 21 pixels, [-2] through [18].

To get them, the SSE2 code reads [-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading in groups of 8 is easy). It then shifts the last group into the top half of the register and uses 'or' to combine it with the previous group.

Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd

Note: we only need pixels [16], [17], and [18] as context for [15].

To fix this, the code now reads 8 bytes starting at [11]. This re-loads [11] through [13] but stops at [18], so nothing is over-read. The result is shifted left by 5 bytes and 'or'd with xmm1. Although the overlapping bytes of xmm2 are not cleared, they hold [11] through [13], the same values already in xmm1, so 'or' produces the correct result.

Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
author: Johann <johannkoenig@google.com> 2016-09-21 15:55:45 -0700
committer: Johann <johannkoenig@google.com> 2016-09-21 16:17:07 -0700
commit: 2bed8b6acd60f6e3db768e06170364c43a92faa2 (patch)
tree: 54b0762e25f6676784be3eacf6036d59bdecd849 /vp8
parent: 35ebc1cddf3542692acf3690302dd2028ce251fb (diff)
download: libvpx-2bed8b6acd60f6e3db768e06170364c43a92faa2.tar
libvpx-2bed8b6acd60f6e3db768e06170364c43a92faa2.tar.gz
libvpx-2bed8b6acd60f6e3db768e06170364c43a92faa2.tar.bz2
libvpx-2bed8b6acd60f6e3db768e06170364c43a92faa2.zip
1 files changed, 6 insertions, 2 deletions
diff --git a/vp8/common/x86/subpixel_sse2.asm b/vp8/common/x86/subpixel_sse2.asm
index 69f8d103c..ca00583ca 100644
--- a/vp8/common/x86/subpixel_sse2.asm
+++ b/vp8/common/x86/subpixel_sse2.asm
@@ -181,8 +181,12 @@ sym(vp8_filter_block1d16_h6_sse2):
         movq        xmm3,       MMWORD PTR [rsi - 2]
         movq        xmm1,       MMWORD PTR [rsi + 6]
 
-        movq        xmm2,       MMWORD PTR [rsi +14]
-        pslldq      xmm2,       8
+        ; Load from 11 to avoid reading out of bounds.
+        movq        xmm2,       MMWORD PTR [rsi +11]
+        ; The lower bits are not cleared before 'or'ing with xmm1,
+        ; but that is OK because the values in the overlapping positions
+        ; are already equal to the ones in xmm1.
+        pslldq      xmm2,       5
 
         por         xmm2,       xmm1
         prefetcht2  [rsi+rax-2]
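
The byte arithmetic behind this change can be checked with a small scalar model. The sketch below is not libvpx code: it models an XMM register as a 16-byte array, and the helper names (load_low8, shift_left_bytes, or_into, combine_old, combine_new) are hypothetical stand-ins for movq, pslldq and por as used in the hunk above. It assumes the usual little-endian byte order of the register, where pslldq moves byte i to byte i+n.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { REG_BYTES = 16 }; /* one XMM register, modelled as bytes */

/* Model of movq: load 8 bytes into the low half, zero the high half. */
static void load_low8(uint8_t reg[REG_BYTES], const uint8_t *src) {
    memset(reg, 0, REG_BYTES);
    memcpy(reg, src, 8);
}

/* Model of pslldq: byte i moves to byte i + n, zeros fill the low bytes. */
static void shift_left_bytes(uint8_t reg[REG_BYTES], int n) {
    assert(n >= 0 && n <= REG_BYTES);
    memmove(reg + n, reg, (size_t)(REG_BYTES - n));
    memset(reg, 0, (size_t)n);
}

/* Model of por: bytewise 'or' of src into dst. */
static void or_into(uint8_t dst[REG_BYTES], const uint8_t src[REG_BYTES]) {
    for (int i = 0; i < REG_BYTES; ++i) dst[i] |= src[i];
}

/* Old sequence: load [14..21], shift by 8, 'or' with [6..13]. */
static void combine_old(uint8_t out[REG_BYTES], const uint8_t *px) {
    uint8_t xmm1[REG_BYTES];
    load_low8(xmm1, px + 6);
    load_low8(out, px + 14); /* touches px[19], px[20], px[21] */
    shift_left_bytes(out, 8);
    or_into(out, xmm1);
}

/* New sequence: load [11..18], shift by 5, 'or' with [6..13]. */
static void combine_new(uint8_t out[REG_BYTES], const uint8_t *px) {
    uint8_t xmm1[REG_BYTES];
    load_low8(xmm1, px + 6);
    load_low8(out, px + 11); /* last byte read is px[18] */
    shift_left_bytes(out, 5);
    or_into(out, xmm1);
}

/* Fill a test row covering px[-2]..px[21] with distinct values. */
static void fill_row(uint8_t row[24]) {
    for (int i = 0; i < 24; ++i) row[i] = (uint8_t)(37 * i + 11);
}

/* Returns 1 if both sequences agree on bytes 0..12, which hold px[6..18]. */
int variants_agree(void) {
    uint8_t row[24], a[REG_BYTES], b[REG_BYTES];
    fill_row(row);
    combine_old(a, row + 2); /* row + 2 is px[0] */
    combine_new(b, row + 2);
    return memcmp(a, b, 13) == 0;
}

/* Returns 1 if the new sequence yields px[6..18] followed by three zeros. */
int new_variant_is_px6_to_px18(void) {
    uint8_t row[24], b[REG_BYTES];
    fill_row(row);
    combine_new(b, row + 2);
    if (memcmp(b, row + 2 + 6, 13) != 0) return 0;
    return b[13] == 0 && b[14] == 0 && b[15] == 0;
}
```

In this model the two sequences differ only in bytes 13 through 15, which held [19] through [21] before the change and are zero after it. Per the commit message, those pixels are not needed as context for [15].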