author:    H.J. Lu <hjl.tools@gmail.com>  2017-05-19 10:46:29 -0700
committer: H.J. Lu <hjl.tools@gmail.com>  2017-05-19 10:48:45 -0700
commit:    402bf0695218bbe290418b9486b1dd5fe284d903
tree:      0107d383f8a38c75076dae69996b15b46e13b04a /sysdeps/x86_64/memchr.S
parent:    1d71a6315396f6e1cc79a1d7ecca0a559929230a
x86: Optimize SSE2 memchr overflow calculation
SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16. Use
"edx - (16 - ecx)" instead of saturated math to avoid possible addition
overflow. This replaces
add %ecx, %edx
sbb %eax, %eax
or %eax, %edx
sub $16, %edx
with
neg %ecx
add $16, %ecx
sub %ecx, %edx
It is the same for x86_64, except for rcx/rdx, instead of ecx/edx.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
"edx + ecx - 16" to avoid possible addition overflow.
* sysdeps/x86_64/memchr.S (memchr): Likewise.
Diffstat (limited to 'sysdeps/x86_64/memchr.S')
-rw-r--r--  sysdeps/x86_64/memchr.S  14
1 file changed, 6 insertions, 8 deletions
diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S
index a205a25998..f82e1c5bf7 100644
--- a/sysdeps/x86_64/memchr.S
+++ b/sysdeps/x86_64/memchr.S
@@ -76,14 +76,12 @@ L(crosscache):
 	.p2align 4
 L(unaligned_no_match):
-	/* Calculate the last acceptable address and check for possible
-	   addition overflow by using satured math:
-	   rdx = rcx + rdx
-	   rdx |= -(rdx < rcx)  */
-	add	%rcx, %rdx
-	sbb	%rax, %rax
-	or	%rax, %rdx
-	sub	$16, %rdx
+	/* "rcx" is less than 16.  Calculate "rdx + rcx - 16" by using
+	   "rdx - (16 - rcx)" instead of "(rdx + rcx) - 16" to void
+	   possible addition overflow.  */
+	neg	%rcx
+	add	$16, %rcx
+	sub	%rcx, %rdx
 	jbe	L(return_null)
 	add	$16, %rdi
 	sub	$64, %rdx