path: root/ChangeLog.18
author    Wilco Dijkstra <wdijkstr@arm.com>  2017-08-10 17:00:38 +0100
committer Wilco Dijkstra <wdijkstr@arm.com>  2017-08-10 17:00:38 +0100
commit  922369032c604b4dcfd535e1bcddd4687e7126a5 (patch)
tree    82779a2afc66f4ef2f2c9006f90a412bffaad23e /ChangeLog.18
parent  2449ae7b2da24c9940962304a3e44bc80e389265 (diff)
[AArch64] Optimized memcmp.
This is an optimized memcmp for AArch64.  It is a complete rewrite using a
different algorithm.  The previous version split into separate cases for
aligned inputs, mutually aligned inputs, and unaligned inputs handled by a
byte loop.  The new version combines all these cases, while small inputs of
less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads, since there
are now at least 8 bytes to be compared.  After the first 8 bytes, align the
first input.  This ensures each iteration does at most one unaligned access,
and mutually aligned inputs behave as aligned.  After the main loop, process
the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and of
unaligned cases by more than 500% (yes, more than 6 times faster) on large
inputs.

	* sysdeps/aarch64/memcmp.S (memcmp): Rewrite of optimized memcmp.
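The actual change is AArch64 assembly, but the structure the message describes (small inputs via a byte loop, word-sized unaligned loads, aligning the first input after the first 8 bytes, and an overlapping unaligned compare of the last 8 bytes) can be sketched in portable C.  This is only an illustration of the algorithm, not the committed code; `memcmp_sketch`, `load64`, and `bytediff` are hypothetical names, and `memcpy` is used to express an unaligned load:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes at an arbitrary address; memcpy compiles to a single
   unaligned load on targets that permit unaligned access. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-wise compare of an 8-byte chunk; only reached after a word
   mismatch, so it runs at most once per call. */
static int bytediff(const unsigned char *a, const unsigned char *b)
{
    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return a[i] - b[i];
    return 0;
}

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs of less than 8 bytes are handled separately. */
    if (n < 8) {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return a[i] - b[i];
        return 0;
    }

    /* First 8 bytes: one possibly-unaligned word compare. */
    if (load64(a) != load64(b))
        return bytediff(a, b);

    /* Align the first input.  The skip is applied to both pointers;
       re-comparing a few already-checked bytes is harmless. */
    size_t skip = 8 - ((uintptr_t)a & 7);
    a += skip; b += skip; n -= skip;

    /* Main loop: a is aligned, so each iteration does at most one
       unaligned access (the load from b). */
    while (n >= 8) {
        if (load64(a) != load64(b))
            return bytediff(a, b);
        a += 8; b += 8; n -= 8;
    }

    /* Tail: process the last 8 bytes with overlapping unaligned loads. */
    if (n > 0) {
        a += n - 8; b += n - 8;
        if (load64(a) != load64(b))
            return bytediff(a, b);
    }
    return 0;
}
```

Note the tail trick: instead of a byte loop, the last `n < 8` bytes are covered by re-reading the final 8 bytes of each buffer, which is safe because the total length is at least 8.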
Diffstat (limited to 'ChangeLog.18')
0 files changed, 0 insertions, 0 deletions