glibc.git - Mirror of https://sourceware.org/git/glibc.git

author	Krzysztof Koch <Krzysztof.Koch@arm.com>	2020-06-08 14:06:15 +0100
committer	Wilco Dijkstra <wdijkstr@arm.com>	2020-06-08 14:13:05 +0100
commit	d1f75e964484504e4f30f4623569d5889a97ac18 (patch)
tree	5b123eeec64573bb9fa690b3f0186b66bb53ae6a /libio/iofwide.c
parent	f112dcc506a6ec0aac5c34891736eec3c4f5dad6 (diff)

AArch64: Merge Falkor memcpy and memmove implementations

Falkor's memcpy and memmove share some implementation details, so the two routines are moved to a single source file for code reuse. The two routines now share code for small and medium copies (up to and including 128 bytes). Large copies in memcpy do not handle overlap correctly, so the loops for moving/copying more than 128 bytes stay separate for memcpy and memmove.

To increase code reuse, a number of small modifications were made:

1. The old implementation of memcpy copied the first 16 bytes as soon as the size of the data was determined to be greater than 32 bytes. For the memcpy code to also work when copying small/medium overlapping data, the first load and store were moved to the large copy case.
2. The medium memcpy case no longer assumes that 16 bytes were already copied, and uses 8 registers to copy up to 128 bytes.
3. The small case for memmove was enlarged to that of memcpy, which is less than or equal to 32 bytes.
4. The medium case for memmove was enlarged to that of memcpy, which is less than or equal to 128 bytes.

Other changes include:

1. Improve alignment of existing loop bodies.
2. 'Delouse' memmove and memcpy input arguments. Make sure that the upper 32 bits of input registers are zeroed if unused.
3. Do one more iteration in memmove loops and reduce the number of copies made from the start/end of the buffer, depending on the direction of the memmove loop.

Benchmarking:

Looking at the results from bench-memcpy-random.out, we can see that memmove_falkor is now about 5% faster than memcpy_falkor_old, while memmove_falkor_old was more than 15% slower. The memcpy implementation remained largely unmodified, so there is no significant performance change.

The reason for such a significant memmove performance gain is the increase of the upper bound on the small copy case to 32 bytes and the increase of the upper bound on the medium copy case to 128 bytes.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
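
The shared small-copy path described above relies on the copy being safe for overlapping buffers. A minimal C sketch of one common way to achieve this follows: read every byte that will be needed into temporaries before writing anything. This is an illustration only, not the Falkor assembly. The helper name copy_upto32 and the exact size split (16..32, 8..15, 4..7, 1..3 bytes) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not glibc code): copy n <= 32 bytes from src to dst.
   All loads are done before any store, so the result is correct even when
   the two buffers overlap, in either direction.  The head and tail chunks
   may overlap each other; together they always cover all n bytes.  */
static void
copy_upto32 (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (n >= 16)
    {
      /* 16..32 bytes: first 16 and last 16 bytes.  */
      unsigned char head[16], tail[16];
      memcpy (head, src, 16);
      memcpy (tail, src + n - 16, 16);
      memcpy (dst, head, 16);
      memcpy (dst + n - 16, tail, 16);
    }
  else if (n >= 8)
    {
      /* 8..15 bytes: first 8 and last 8 bytes.  */
      uint64_t head, tail;
      memcpy (&head, src, 8);
      memcpy (&tail, src + n - 8, 8);
      memcpy (dst, &head, 8);
      memcpy (dst + n - 8, &tail, 8);
    }
  else if (n >= 4)
    {
      /* 4..7 bytes: first 4 and last 4 bytes.  */
      uint32_t head, tail;
      memcpy (&head, src, 4);
      memcpy (&tail, src + n - 4, 4);
      memcpy (dst, &head, 4);
      memcpy (dst + n - 4, &tail, 4);
    }
  else if (n > 0)
    {
      /* 1..3 bytes: first, middle and last byte (they may coincide).  */
      unsigned char first = src[0];
      unsigned char middle = src[n >> 1];
      unsigned char last = src[n - 1];
      dst[0] = first;
      dst[n >> 1] = middle;
      dst[n - 1] = last;
    }
}
```

The memcpy calls in the sketch only ever move data between a caller buffer and a local temporary, so they never see overlapping arguments themselves; a compiler will typically lower them to plain loads and stores. Because no store happens before the last load, the same code can serve both a memcpy and a memmove entry point for these sizes, which is the kind of sharing the commit message describes for copies of up to 128 bytes.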

