author | Wilco Dijkstra <wdijkstr@arm.com> | 2021-12-02 18:33:26 +0000 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2021-12-02 18:36:03 +0000 |
commit | b31bd11454fade731e5158b1aea40b133ae19926 | |
tree | d6d25ad11615c9f2c91a607f7d7a7cdc958bb5d7 /nss/proto-lookup.c | |
parent | b51eb35c572b015641f03e3682c303f7631279b7 | |
AArch64: Improve A64FX memcpy
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
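The message names two techniques: aligning large copies and running a single unrolled loop. The sketch below is a minimal, portable C illustration of those ideas, written under stated assumptions. It is not the committed code, which is hand-written AArch64 assembly using SVE. `CHUNK`, `UNROLL`, `BLOCK`, `copy_chunk` and `sketch_memcpy` are hypothetical names, and the 16-byte chunk and 4x unroll are assumed values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning parameters for this sketch; the A64FX code works on
   SVE vectors whose width is only known at run time. */
#define CHUNK  16                 /* bytes per "vector" step (assumed) */
#define UNROLL 4                  /* chunk copies per loop iteration (assumed) */
#define BLOCK  (CHUNK * UNROLL)   /* bytes per loop iteration */

/* One unaligned CHUNK-sized load/store; a fixed-size memcpy is the portable
   C idiom that compilers lower to plain load/store instructions. */
static inline void copy_chunk(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, CHUNK);
}

/* Hypothetical name: copies n bytes between non-overlapping buffers. */
void *sketch_memcpy(void *restrict dstv, const void *restrict srcv, size_t n)
{
    unsigned char *d = dstv;
    const unsigned char *s = srcv;

    if (n < BLOCK) {
        /* Small copies: a plain byte loop (an assumption of this sketch;
           the real code handles small sizes with SVE predication). */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dstv;
    }

    unsigned char *dend = d + n;        /* one past the last dst byte */
    const unsigned char *send = s + n;  /* one past the last src byte */

    /* Copy the first chunk unaligned, then advance both pointers so that
       d is CHUNK-aligned. skew is in 1..CHUNK, so every skipped byte was
       already written by the first chunk copy. */
    copy_chunk(d, s);
    size_t skew = CHUNK - ((uintptr_t)d & (CHUNK - 1));
    d += skew;
    s += skew;
    n -= skew;

    /* Single unrolled main loop: every store starts at an aligned address. */
    while (n > BLOCK) {
        assert(((uintptr_t)d & (CHUNK - 1)) == 0);
        for (int i = 0; i < UNROLL; i++)
            copy_chunk(d + i * CHUNK, s + i * CHUNK);
        d += BLOCK;
        s += BLOCK;
        n -= BLOCK;
    }

    /* Tail: copy the last BLOCK bytes. This may rewrite bytes that were
       already copied, which is harmless because src and dst do not overlap. */
    for (int i = 0; i < UNROLL; i++)
        copy_chunk(dend - BLOCK + i * CHUNK, send - BLOCK + i * CHUNK);
    return dstv;
}
```

Ending with one overlapping block copy removes the need for a separate remainder loop. This is one way a single loop can serve every large size. Once the destination is aligned, the loop's stores never straddle an alignment boundary.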
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Diffstat (limited to 'nss/proto-lookup.c')
0 files changed, 0 insertions, 0 deletions