author     Noah Goldstein <goldstein.w.n@gmail.com>   2021-11-01 00:49:51 -0500
committer  Noah Goldstein <goldstein.w.n@gmail.com>   2021-11-06 16:18:03 -0500
commit     a6b7502ec0c2da89a7437f43171f160d713e39c6 (patch)
tree       d2ff01bb7c3ea8b1e1415542a50913a27bbb3707 /sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
parent     ac759b1fbf28a82d99afde9046f8b72c7cba5dae (diff)
x86: Optimize memmove-vec-unaligned-erms.S
No bug.
The optimizations are as follows:
1) Always align entry to 64 bytes. This makes behavior more
predictable and makes other frontend optimizations easier.
2) Make the L(more_8x_vec) cases aware of 4k aliasing. This can have
significant benefits when 0 < (dst - src) < [256, 512] (sketched below).
3) Align the destination before `rep movsb`. For ERMS this is roughly a
[0%, 30%] improvement and for FSRM a [-10%, 25%] change (sketched below).
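To make points 2 and 3 concrete, here is a minimal standalone sketch in
x86-64 (GAS/AT&T) assembly. It is not glibc's implementation: the function
names, the fixed 64-byte vector size, and the 512-byte aliasing window are
illustrative assumptions. is_4k_alias_risk shows one plausible detection
check for the layout the message describes, where dst sits a small positive
distance ahead of src so the copy loop's loads and stores land at nearly the
same 4 KiB page offset and the CPU speculates false dependencies; what glibc
does once that case is detected lives in memmove-vec-unaligned-erms.S and is
not shown here. copy_align_rep_movsb shows the point-3 idea: copy one
unaligned 64-byte vector, round the destination up to a 64-byte boundary,
then hand the remainder to `rep movsb`.

	.text

/* int is_4k_alias_risk (const void *dst, const void *src)
   Hypothetical helper: return 1 when dst is ahead of src by a small
   amount modulo the 4 KiB page size, the layout where a copy loop's
   stores and loads share a page offset and can trigger false
   store-to-load dependencies.  The 512-byte window is illustrative,
   not glibc's threshold.  */
	.globl	is_4k_alias_risk
	.type	is_4k_alias_risk, @function
is_4k_alias_risk:
	xorl	%eax, %eax
	movq	%rdi, %rcx
	subq	%rsi, %rcx		/* rcx = dst - src.  */
	jbe	1f			/* dst <= src: not the case above.  */
	andl	$4095, %ecx		/* Offset within one 4 KiB page.  */
	cmpl	$512, %ecx
	setb	%al			/* 1 if the page offsets nearly collide.  */
1:	ret
	.size	is_4k_alias_risk, .-is_4k_alias_risk

/* void *copy_align_rep_movsb (void *dst, const void *src, size_t len)
   Forward copy only; assumes len >= 64 and non-overlapping buffers.  */
	.globl	copy_align_rep_movsb
	.type	copy_align_rep_movsb, @function
	.p2align 6			/* Point 1: 64-byte-aligned entry.  */
copy_align_rep_movsb:
	movq	%rdi, %rax		/* memcpy-style return value.  */
	/* Copy the first 64 bytes unaligned so the alignment step below
	   cannot skip any data.  A high EVEX register (zmm16) sidesteps
	   the need for VZEROUPPER, as in the evex/avx512 variants.  */
	vmovdqu64 (%rsi), %zmm16
	vmovdqu64 %zmm16, (%rdi)
	/* Round dst up to the next 64-byte boundary and advance src and
	   the remaining length by the same 1..64 byte adjustment.  */
	leaq	64(%rdi), %r8
	andq	$-64, %r8		/* r8 = ALIGN_UP (dst, 64).  */
	subq	%rdi, %r8		/* r8 = adjustment in [1, 64].  */
	addq	%r8, %rdi
	addq	%r8, %rsi
	movq	%rdx, %rcx
	subq	%r8, %rcx
	rep	movsb			/* Destination is now 64-byte aligned.  */
	ret
	.size	copy_align_rep_movsb, .-copy_align_rep_movsb

Linked against a C driver, copy_align_rep_movsb behaves like memcpy for
non-overlapping buffers of at least 64 bytes; glibc's real `rep movsb` path
additionally handles small sizes, overlap, and the non-temporal threshold.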
In addition to these primary changes there is general cleanup
throughout to optimize the aligning routines and control flow logic.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Diffstat (limited to 'sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S')
-rw-r--r--   sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S   | 2
1 file changed, 1 insertion, 1 deletion
diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
index 848848ab39..0fa7126830 100644
--- a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
@@ -25,7 +25,7 @@
 # define VMOVU		vmovdqu64
 # define VMOVA		vmovdqa64
 # define VZEROUPPER
-
+# define MOV_SIZE	6
 # define SECTION(p)		p##.evex512
 # define MEMMOVE_SYMBOL(p,s)	p##_avx512_##s