author | H.J. Lu <hjl.tools@gmail.com> | 2017-06-09 05:45:43 -0700 |
---|---|---|
committer | H.J. Lu <hjl.tools@gmail.com> | 2017-06-09 05:45:52 -0700 |
commit | d2538b91568af2a63c9d8649ce11756d4dfbdac3 (patch) | |
tree | 22cc8602e6ab159f296651224be8a6c3460f2581 /sysdeps/x86_64/multiarch/Makefile | |
parent | 5ac7aa1d7cce8580f8225c33c819991abca102b9 (diff) | |
x86-64: Optimize strrchr/wcsrchr with AVX2
Optimize strrchr/wcsrchr with AVX2 to check 32 bytes at a time with
vector instructions. It is as fast as the SSE2 version for small data
sizes and up to 1X faster for large data sizes on Haswell. Select the
AVX2 version on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.
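
To make the "check 32 bytes with vector instructions" idea concrete, here is a minimal portable sketch of the per-block mask logic. It is not the code in strrchr-avx2.S: the function name `strrchr_block32` and the helper `highest_bit` are hypothetical, and the inner loop is a scalar stand-in for the byte-compare and move-mask step that AVX2 would do in a few instructions (for example vpcmpeqb and vpmovmskb). It only shows how a match mask and a NUL mask for one 32-byte block combine to give the last occurrence.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32 /* bytes per step, the width of one 256-bit AVX2 register */

/* Hypothetical helper: index of the highest set bit; mask must be nonzero. */
static int highest_bit(uint32_t mask)
{
    int i = 31;
    while (!(mask & (UINT32_C(1) << i)))
        i--;
    return i;
}

/* Sketch only: strrchr semantics, computed one 32-byte block at a time. */
char *strrchr_block32(const char *s, int c)
{
    const char *last = NULL;
    const char ch = (char)c;

    for (;; s += BLOCK) {
        uint32_t match = 0; /* bit i set: s[i] == ch   */
        uint32_t nul = 0;   /* bit i set: s[i] == '\0' */

        /* Scalar stand-in for a vector compare plus move-mask.  Unlike a
           real vector load, it stops at the terminator, so it never reads
           past the end of the string.  */
        for (int i = 0; i < BLOCK; i++) {
            if (s[i] == ch)
                match |= UINT32_C(1) << i;
            if (s[i] == '\0') {
                nul |= UINT32_C(1) << i;
                break;
            }
        }

        if (nul) {
            /* nul ^ (nul - 1) sets bits 0..k, where k is the first NUL,
               so matches after the terminator are discarded.  */
            match &= nul ^ (nul - 1);
            if (match)
                last = s + highest_bit(match);
            return (char *)last;
        }

        /* No terminator in this block: remember its last match, if any. */
        if (match)
            last = s + highest_bit(match);
    }
}
```

A real vector implementation also has to make sure a 32-byte load near the end of the string cannot fault, typically by controlling alignment around page boundaries. The sketch sidesteps that by reading one byte at a time.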
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add tests for __strrchr_avx2,
__strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2.
* sysdeps/x86_64/multiarch/strrchr-avx2.S: New file.
* sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strrchr.c: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.
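
The selection rule stated in the commit message can be sketched as a small decision function. This is an illustration under stated assumptions, not the contents of strrchr.c: `struct cpu_traits`, its field names, the enum, and `select_strrchr` are all hypothetical. Only the two implementation names, `__strrchr_avx2` and `__strrchr_sse2`, come from the ChangeLog above.

```c
#include <stdbool.h>

/* Hypothetical CPU description; glibc keeps its own feature bits. */
struct cpu_traits {
    bool has_avx2;                /* CPU supports AVX2 */
    bool prefers_vzeroupper;      /* vzeroupper is the preferred cleanup */
    bool avx_fast_unaligned_load; /* unaligned AVX loads are fast */
};

enum strrchr_impl { STRRCHR_SSE2, STRRCHR_AVX2 };

/* Choose the AVX2 variant only when all three conditions from the
   commit message hold; otherwise fall back to SSE2.  */
static enum strrchr_impl select_strrchr(const struct cpu_traits *cpu)
{
    if (cpu->has_avx2 && cpu->prefers_vzeroupper
        && cpu->avx_fast_unaligned_load)
        return STRRCHR_AVX2;
    return STRRCHR_SSE2;
}

/* Map the choice to the symbol names listed in the ChangeLog. */
static const char *strrchr_impl_name(enum strrchr_impl impl)
{
    return impl == STRRCHR_AVX2 ? "__strrchr_avx2" : "__strrchr_sse2";
}
```

The same rule would apply to the wide-character pair, `__wcsrchr_avx2` and `__wcsrchr_sse2`.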
Diffstat (limited to 'sysdeps/x86_64/multiarch/Makefile')
-rw-r--r-- | sysdeps/x86_64/multiarch/Makefile | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 2fa390b3dd..c901704b11 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -15,6 +15,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
 		   memmove-ssse3-back \
 		   memmove-avx512-no-vzeroupper strcasecmp_l-ssse3 \
 		   strchr-sse2 strchrnul-sse2 strchr-avx2 strchrnul-avx2 \
+		   strrchr-sse2 strrchr-avx2 \
 		   strlen-sse2 strnlen-sse2 strlen-avx2 strnlen-avx2 \
 		   strncase_l-ssse3 strcat-ssse3 strncat-ssse3\
 		   strcpy-ssse3 strncpy-ssse3 stpcpy-ssse3 stpncpy-ssse3 \
@@ -40,6 +41,7 @@ sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
 		   wmemchr-sse2 wmemchr-avx2 \
 		   wcscpy-ssse3 wcscpy-c \
 		   wcschr-sse2 wcschr-avx2 \
+		   wcsrchr-sse2 wcsrchr-avx2 \
 		   wcsnlen-sse4_1 wcsnlen-c \
 		   wcslen-sse2 wcslen-avx2 wcsnlen-avx2
 endif