path: root/localedata/mr_IN.UTF-8.in
author: Siddhesh Poyarekar <siddhesh@sourceware.org> 2018-05-11 00:11:52 +0530
committer: Siddhesh Poyarekar <siddhesh@sourceware.org> 2018-05-11 00:11:52 +0530
commit: db725a458e1cb0e17204daa543744faf08bb2e06 (patch)
tree: fc19e9be431ff0128b7bdd6ea3f46609ec0cf303 /localedata/mr_IN.UTF-8.in
parent: 70c97f8493ab2a215c2543d78f212abb23f151ed (diff)
aarch64,falkor: Ignore prefetcher tagging for smaller copies

For small and medium-sized copies, the effect of hardware prefetching is not as dominant as instruction-level parallelism. Hence it makes more sense to load data into multiple registers than to try to route the loads to the same prefetch unit. This is also the case for the loop exit, where we are unable to latch on to the same prefetch unit anyway, so it makes more sense to have data loaded in parallel.

The performance results are mixed with memcpy-random, with numbers jumping between -1% and +3%, i.e. the numbers don't seem repeatable. memcpy-walk sees a 70% improvement (i.e. > 2x) for 128 bytes, and that improvement diminishes as the impact of the tail copy decreases in comparison to the loop.

* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor): Use multiple registers to copy data in loop tail.

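
The idea of loading the tail into multiple registers can be sketched in C. This is only an illustration and not the code in memcpy_falkor.S: the `chunk16` type and the `copy_tail64` function are hypothetical names, the 64-byte tail size is an assumption, and a C compiler is free to schedule the loads differently from how they are written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte chunk standing in for one 128-bit register. */
typedef struct { uint64_t lo, hi; } chunk16;

/* Illustrative only: copy an assumed 64-byte tail by issuing four
   independent loads into separate temporaries before any store, so
   that no load has to wait on a register reused by an earlier one. */
static void copy_tail64(unsigned char *dst, const unsigned char *src)
{
    chunk16 a, b, c, d;

    /* Four loads with no dependencies between them. */
    memcpy(&a, src,      sizeof a);
    memcpy(&b, src + 16, sizeof b);
    memcpy(&c, src + 32, sizeof c);
    memcpy(&d, src + 48, sizeof d);

    /* Stores follow once all the data is in flight. */
    memcpy(dst,      &a, sizeof a);
    memcpy(dst + 16, &b, sizeof b);
    memcpy(dst + 32, &c, sizeof c);
    memcpy(dst + 48, &d, sizeof d);
}
```

Using a distinct temporary per load is what exposes the parallelism; funnelling every load through one temporary would serialize each load behind the store that consumes the previous value.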
Diffstat (limited to 'localedata/mr_IN.UTF-8.in')
0 files changed, 0 insertions, 0 deletions