author     Andrew Senkevich <andrew.senkevich@intel.com>   2014-12-29 14:39:46 +0300
committer  H.J. Lu <hjl.tools@gmail.com>                   2014-12-30 07:19:38 -0800
commit     8b4416d83c79ba77b0669203741c712880a09ae4
tree       c0701090d02b9e3c9ddcb840e2ad62084c498b4a /sysdeps/i386/i686/multiarch/memmove.S
parent     6d6d7fde04c8ef830205a9900bf101597a2f4b18
i386: memcpy functions with SSE2 unaligned load/store
These new memcpy functions are the 32-bit version of the x86_64 SSE2 unaligned
memcpy. The average memcpy performance benefit is 18% on Silvermont; the other
platforms improved by about 35%. Benchmarked on Silvermont, Haswell, Ivy
Bridge, Sandy Bridge and Westmere; performance results are attached in
https://sourceware.org/ml/libc-alpha/2014-07/msg00157.html
* sysdeps/i386/i686/multiarch/bcopy-sse2-unaligned.S: New file.
* sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/bcopy.S: Select the sse2_unaligned
version if bit_Fast_Unaligned_Load is set.
* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/Makefile (sysdep_routines): Add
bcopy-sse2-unaligned, memcpy-sse2-unaligned,
memmove-sse2-unaligned and mempcpy-sse2-unaligned.
* sysdeps/i386/i686/multiarch/ifunc-impl-list.c (MAX_IFUNC): Set
to 4.
(__libc_ifunc_impl_list): Test __bcopy_sse2_unaligned,
__memmove_chk_sse2_unaligned, __memmove_sse2_unaligned,
__memcpy_chk_sse2_unaligned, __memcpy_sse2_unaligned,
__mempcpy_chk_sse2_unaligned, and __mempcpy_sse2_unaligned.
Diffstat (limited to 'sysdeps/i386/i686/multiarch/memmove.S')
-rw-r--r--  sysdeps/i386/i686/multiarch/memmove.S | 10
1 file changed, 10 insertions, 0 deletions
diff --git a/sysdeps/i386/i686/multiarch/memmove.S b/sysdeps/i386/i686/multiarch/memmove.S
index d8de7c6894..29644dd916 100644
--- a/sysdeps/i386/i686/multiarch/memmove.S
+++ b/sysdeps/i386/i686/multiarch/memmove.S
@@ -35,6 +35,11 @@ ENTRY(memmove)
 	jne	1f
 	call	__init_cpu_features
 1:	leal	__memmove_ia32@GOTOFF(%ebx), %eax
+	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	jz	2f
+	leal	__memmove_sse2_unaligned@GOTOFF(%ebx), %eax
+	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	jnz	2f
 	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
 	jz	2f
 	leal	__memmove_ssse3@GOTOFF(%ebx), %eax
@@ -63,6 +68,11 @@ ENTRY(memmove)
 	jne	1f
 	call	__init_cpu_features
 1:	leal	__memmove_ia32, %eax
+	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
+	jz	2f
+	leal	__memmove_sse2_unaligned, %eax
+	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features
+	jnz	2f
 	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
 	jz	2f
 	leal	__memmove_ssse3, %eax
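The new hunks place the sse2_unaligned check ahead of the existing SSSE3 check, so the SSSE3 variants are only reachable when bit_Fast_Unaligned_Load is not set. The following is a minimal, self-contained C sketch of that selection order; select_memmove, the enum and the boolean parameters are hypothetical stand-ins for the bit_SSE2, bit_Fast_Unaligned_Load and bit_SSSE3 tests against __cpu_features, not glibc APIs.

#include <stdbool.h>
#include <stdio.h>

typedef enum { MEMMOVE_IA32, MEMMOVE_SSE2_UNALIGNED, MEMMOVE_SSSE3 } memmove_impl;

static memmove_impl
select_memmove (bool has_sse2, bool has_ssse3, bool fast_unaligned_load)
{
  /* Default, mirroring "leal __memmove_ia32, %eax".  */
  memmove_impl impl = MEMMOVE_IA32;

  if (!has_sse2)                    /* testl $bit_SSE2, ...; jz 2f */
    return impl;

  impl = MEMMOVE_SSE2_UNALIGNED;    /* leal __memmove_sse2_unaligned, %eax */
  if (fast_unaligned_load)          /* testl $bit_Fast_Unaligned_Load, ...; jnz 2f */
    return impl;

  if (!has_ssse3)                   /* testl $bit_SSSE3, ...; jz 2f */
    return impl;

  /* "leal __memmove_ssse3, %eax"; later checks outside this hunk may
     still refine the SSSE3 choice.  */
  return MEMMOVE_SSSE3;
}

int
main (void)
{
  /* A CPU with SSE2 and fast unaligned loads picks the new variant.  */
  printf ("%d\n", select_memmove (true, true, true));   /* prints 1 */
  return 0;
}

Note that with this ordering a CPU that has SSE2 but neither Fast_Unaligned_Load nor SSSE3 now resolves to __memmove_sse2_unaligned rather than falling back to the ia32 version.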