author | Siddhesh Poyarekar <siddhesh@sourceware.org> | 2018-03-13 23:57:03 +0530 |
---|---|---|
committer | Siddhesh Poyarekar <siddhesh@sourceware.org> | 2018-03-13 23:57:04 +0530 |
commit | 7108f1f944792ac68332967015d5e6418c5ccc88 (patch) | |
tree | f29c0d34fcaead85b891a02f9ea1f13507ad4e78 /include/fpu_control.h | |
parent | 2cc7bad0ae0a412e75270be5ed41d45c03e7a931 (diff) | |
aarch64: Improve strncmp for mutually misaligned inputs
Mutually misaligned inputs on aarch64 are currently compared one byte
at a time, which is not very efficient. Speed up the comparison, as
strcmp already does, by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
* sysdeps/aarch64/strncmp.S (count): New macro.
(strncmp): Store misaligned length in SRC1 in COUNT.
(mutual_align): Adjust.
(misaligned8): Load dword at a time when it is safe.
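
The idea behind the misaligned8 change can be illustrated with a small C sketch. This is not the assembly from sysdeps/aarch64/strncmp.S; the names sketch_strncmp, has_zero_byte and PAGE_SIZE_ASSUMED are hypothetical, the 4 KiB page size is an assumption, and "safe" is taken here to mean that an 8-byte load from the second string does not cross a page boundary. Like the assembly, the sketch may read a few bytes past the terminating NUL within the same page, which strictly conforming C does not permit, so treat it as an illustration of the technique rather than a drop-in routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages. The real routine uses its own constants. */
#define PAGE_SIZE_ASSUMED 4096u

/* Nonzero if any byte of x is zero (classic bit trick). */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical sketch: strncmp for inputs whose alignments differ. */
static int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Compare bytes until p1 is 8-byte aligned; p2 stays misaligned. */
    while (n > 0 && ((uintptr_t)p1 & 7) != 0) {
        if (*p1 != *p2 || *p1 == '\0')
            return (int)*p1 - (int)*p2;
        p1++; p2++; n--;
    }

    for (;;) {
        size_t off2 = (uintptr_t)p2 & (PAGE_SIZE_ASSUMED - 1);

        /* An aligned 8-byte load from p1 never crosses a page, so only
           the unaligned load from p2 needs the page-boundary check. */
        if (n >= 8 && off2 <= PAGE_SIZE_ASSUMED - 8) {
            uint64_t w1, w2;
            memcpy(&w1, p1, 8);   /* memcpy avoids alignment/aliasing issues */
            memcpy(&w2, p2, 8);
            if (w1 == w2 && !has_zero_byte(w1)) {
                p1 += 8; p2 += 8; n -= 8;
                continue;
            }
            /* A mismatch or NUL lies in this chunk: locate it bytewise. */
        }

        /* Byte fallback for the current chunk (at most 8 bytes). */
        size_t chunk = n < 8 ? n : 8;
        for (size_t i = 0; i < chunk; i++) {
            if (p1[i] != p2[i] || p1[i] == '\0')
                return (int)p1[i] - (int)p2[i];
        }
        p1 += chunk; p2 += chunk; n -= chunk;
        if (n == 0)
            return 0;
    }
}
```

The sketch resolves the final mismatching or NUL byte with a byte loop for simplicity; an assembly routine would more likely derive the result from the loaded double-words directly.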