author     H.J. Lu <hjl.tools@gmail.com>    2016-03-31 10:05:51 -0700
committer  H.J. Lu <hjl.tools@gmail.com>    2016-03-31 10:06:07 -0700
commit     830566307f038387ca0af3fd327706a8d1a2f595 (patch)
tree       22d89ebf426a8799ec13913fd6591a53d4663973 /sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
parent     88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb (diff)
Add x86-64 memset with unaligned store and rep stosb
Implement x86-64 memset with unaligned store and rep stosb. Support
16-byte, 32-byte and 64-byte vector register sizes. A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb. They share the same code when the size is between 2
times the vector register size and REP_STOSB_THRESHOLD, which defaults
to 2KB.
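
For orientation, here is a rough C sketch of the dispatch described above.
It is not the glibc code: the names memset_erms_sketch and vec_store are
hypothetical, and the 16-byte vector size and 2KB threshold are simply the
values quoted in this message.

/* Illustrative sketch only -- not the glibc implementation.  */
#include <stddef.h>

#define VEC_SIZE            16     /* SSE2 vector register size.  */
#define REP_STOSB_THRESHOLD 2048   /* Assumed 2KB default.  */

/* Stand-in for the shared vector-store path used by both variants
   for sizes up to REP_STOSB_THRESHOLD.  */
static void
vec_store (unsigned char *p, unsigned char c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = c;
}

/* The "_erms" variant differs from the plain variant only in using
   rep stosb once the size exceeds the threshold; below it, both
   variants run the same vector-store code.  */
static void *
memset_erms_sketch (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  if (n > REP_STOSB_THRESHOLD)
    {
      size_t cnt = n;
      __asm__ volatile ("rep stosb"
                        : "+D" (p), "+c" (cnt)
                        : "a" ((unsigned char) c)
                        : "memory");
    }
  else
    vec_store (p, (unsigned char) c, n);
  return dst;
}

The non-erms variant would keep using vector stores above the threshold
instead of issuing rep stosb.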
Key features:
1. Use overlapping stores to avoid branches (sketched below).
2. For sizes <= 4 times the vector register size, fully unroll the loop.
3. For sizes > 4 times the vector register size, store 4 vector registers'
   worth of data at a time.
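
A minimal intrinsics sketch of feature 1, assuming SSE2 and a length n with
16 <= n <= 32: one unaligned 16-byte store at the start and one at the end
cover the whole range without branching on the exact length, because the two
stores may overlap. The function name memset_16_to_32 is hypothetical.

#include <emmintrin.h>
#include <stddef.h>

/* Fill p[0..n-1] with byte c for 16 <= n <= 32 using two possibly
   overlapping unaligned 16-byte stores instead of a size branch.  */
static void
memset_16_to_32 (unsigned char *p, int c, size_t n)
{
  __m128i v = _mm_set1_epi8 ((char) c);             /* broadcast byte  */
  _mm_storeu_si128 ((__m128i *) p, v);              /* first 16 bytes  */
  _mm_storeu_si128 ((__m128i *) (p + n - 16), v);   /* last 16 bytes   */
}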
[BZ #19881]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
memset-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
__memset_sse2_unaligned_erms, __memset_erms,
__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
Likewise.
Diffstat (limited to 'sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S')
-rw-r--r--   sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S   16
1 file changed, 16 insertions(+), 0 deletions(-)
diff --git a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
new file mode 100644
index 0000000000..437a858dab
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
@@ -0,0 +1,16 @@
+#define VEC_SIZE 16
+#define VEC(i) xmm##i
+#define VMOVU movdqu
+#define VMOVA movdqa
+
+#define VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
+  movd d, %xmm0; \
+  movq r, %rax; \
+  punpcklbw %xmm0, %xmm0; \
+  punpcklwd %xmm0, %xmm0; \
+  pshufd $0, %xmm0, %xmm0
+
+#define SECTION(p) p
+#define MEMSET_SYMBOL(p,s) p##_sse2_##s
+
+#include "memset-vec-unaligned-erms.S"
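
As a reading aid: the VDUP_TO_VEC0_AND_SET_RETURN sequence above (movd,
punpcklbw, punpcklwd, pshufd $0) broadcasts the fill byte into all 16 lanes
of %xmm0 while %rax is loaded with the destination for the return value.
Expressed with SSE2 intrinsics purely for illustration (the real macro stays
in assembly so it can set %rax in the same place):

#include <emmintrin.h>

/* Broadcast the low byte of c to all 16 byte lanes, mirroring the
   unpack/shuffle sequence in VDUP_TO_VEC0_AND_SET_RETURN.  */
static __m128i
broadcast_fill_byte (int c)
{
  __m128i v = _mm_cvtsi32_si128 (c);   /* movd: byte in lane 0           */
  v = _mm_unpacklo_epi8 (v, v);        /* punpcklbw: byte -> 16-bit      */
  v = _mm_unpacklo_epi16 (v, v);       /* punpcklwd: 16-bit -> 32-bit    */
  return _mm_shuffle_epi32 (v, 0);     /* pshufd $0: copy dword 0 to all */
}

This is equivalent to _mm_set1_epi8 at the intrinsics level; the explicit
sequence just mirrors the instructions the macro emits.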