path: root/sysdeps/x86_64/strspn.c
author Noah Goldstein <goldstein.w.n@gmail.com> 2022-10-18 17:44:04 -0700
committer Noah Goldstein <goldstein.w.n@gmail.com> 2022-10-19 17:31:03 -0700
commit 69717709ec5c2769322678e96a7672d1e270de3a
tree c0c9b1aeb7dc4058a38f9b3ab4f9e419b440ea8b /sysdeps/x86_64/strspn.c
parent 330881763efff626d6b1cdf8de9ffee4ed7a1ba1
x86: Shrink / minorly optimize strchr-evex and implement with VMM headers

Size Optimizations:
1. Condense the hot path for better cache locality.
    - This has the most impact for strchrnul, where the logic for
      strings with len <= VEC_SIZE or with a match in the first VEC
      now fits entirely in the first cache line.
2. Reuse common targets in the first 4x VEC and after the loop.
3. Don't align targets so aggressively if it doesn't change the number
   of fetch blocks required, and take more care to avoid the case
   where targets unnecessarily split cache lines.
4. Align the loop better for DSB/LSD.
5. Use more code-size efficient instructions.
    - tzcnt ...     -> bsf ...
    - vpcmpb $0 ... -> vpcmpeq ...
6. Align labels less aggressively, especially if it doesn't save fetch
   blocks or causes the basic block to span extra cache lines.

Code Size Changes:
strchr-evex.S    : -63 bytes
strchrnul-evex.S : -48 bytes

Net perf changes:
Reported as the geometric mean of all improvements / regressions from
N=10 runs of the benchtests. Values are New Time / Old Time, so < 1.0
is an improvement and > 1.0 is a regression.

strchr-evex.S (Fixed) : 0.971
strchr-evex.S (Rand)  : 0.932
strchrnul-evex.S      : 0.965

Full results attached in email.

Full check passes on x86-64.
Diffstat (limited to 'sysdeps/x86_64/strspn.c')
0 files changed, 0 insertions, 0 deletions