Age | Commit message (Collapse) | Author |
|
In most cases the simple/stupid/builtin functions were in there to
benchmark optimized implementations against. Only in some cases the
functions are used to check expected results.
Remove these tests from IMPL() and only keep them in wherever they're
used for a specific purpose, e.g. to generate expected results.
This improves timing of `make subdirs=string` by over a minute and a
half (over 15%) on a Whiskey Lake laptop.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Noah Goldstein <libc-alpha@sourceware.org>
|
|
Looks like an oversight in memcpy tests resulted in s2 and s1 not being
swapped for the second iteration of the memcpy test. Fix it. Also fix
a formatting nit.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
Checked on x86_64-linux-gnu.
|
|
The __closefrom_fallback tries to get a available file descriptor
if the initial open ("/proc/self/fd/", ...) fails. It assumes the
failure would be only if procfs is not mount (ENOENT), however if
the the proc file is not accessible (due some other kernel filtering
such apparmor) it will iterate over a potentially large file set
issuing close calls.
It should only try the close fallback if open returns EMFILE,
ENFILE, or ENOMEM.
Checked on x86_64-linux-gnu.
|
|
-z combreloc has been the default regadless of the architecture since
binutils commit f4d733664aabd7bd78c82895e030ec9779a92809 (2002). The
configure check added in commit fdde83499a05 (2001) has long been
unneeded.
We can therefore treat HAVE_Z_COMBRELOC as always 1 and delete dead code
paths in dl-machine.h files (many were copied from commit a711b01d34ca
and ee0cb67ec238).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
For sparc64 is the same as the generic implementation, while for
sparc32 the builtin generates the same code.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
|
|
The generic implementation fixes 5 fabs tests on ia64-linux-gnu:
math/test-double-fabs
math/test-float-fabs
math/test-float32-fabs
math/test-float32x-fabs
math/test-float64-fabs
Checked on ia64-linux-gnu.
|
|
For x86_64 is the same as the generic implementation, while for i686
the builtin generates the same code.
|
|
The generic implementation already uses builtins.
|
|
If the build itself is run in a container, we may not be able to
fully set up a nested container for test-container testing.
Notably is the mounting of /proc, since it's critical that it
be mounted from within the same PID namespace as its users, and
thus cannot be bind mounted from outside the container like other
mounts.
This patch defaults to using the parent's PID namespace instead of
creating a new one, as this is more likely to be allowed.
If the test needs an isolated PID namespace, it should add the "pidns"
command to its init script.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Recent changes in test-strncasecmp and test-strncmp pushed the run time
of the tests above the 4 minute limit specified in test-string.h on an
arm tester machine.
|
|
The GNU extension for realpath states that if the path resolution fails
with ENOENT or EACCES and the resolved buffer is non-NULL, it will
contain part of the path that failed resolution.
commit 949ad78a189194048df8a253bb31d1d11d919044 broke this when it
omitted the copy on failure. Bring it back partially to continue
supporting this GNU extension.
Resolves: BZ #28996
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
|
|
The idea is to check if the up sizeof (buf) are equal, not only
the first byte.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
And also use libsupport.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
To take in consideration the extra '\0'.
Checked on x86_64-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
Just a few QOL changes.
1. Prefer `add` > `lea` as it has high execution units it can run
on.
2. Don't break macro-fusion between `test` and `jcc`
3. Reduce code size by removing gratuitous padding bytes (-90
bytes).
geometric_mean(N=20) of all benchmarks New / Original: 0.959
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Just a few small QOL changes.
1. Prefer `add` > `lea` as it has high execution units it can run
on.
2. Don't break macro-fusion between `test` and `jcc`
geometric_mean(N=20) of all benchmarks New / Original: 0.973
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
It is not a "buffer overflow detected" but an out of range
bit on fd_set
Signed-off-by: Cristian RodrÃguez <crrodriguez@opensuse.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
Add the new HWCAP2_AFP and HWCAP2_RPRES constants from Linux 5.17.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
|
|
The rational is:
1. SSE42 has nearly identical logic so any benefit is minimal (3.4%
regression on Tigerlake using SSE42 versus AVX across the
benchtest suite).
2. AVX2 version covers the majority of targets that previously
prefered it.
3. The targets where AVX would still be best (SnB and IVB) are
becoming outdated.
All in all the saving the code size is worth it.
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
geometric_mean(N=40) of all benchmarks EVEX / SSE42: .621
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
geometric_mean(N=40) of all benchmarks AVX2 / SSE42: .702
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Test cases for when both `s1` and `s2` are near the end of a page
where previously missing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Test cases for when both `s1` and `s2` are near the end of a page
where previously missing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Slightly faster method of doing TOLOWER that saves an
instruction.
Also replace the hard coded 5-byte no with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.
geometric_mean(N=40) of all benchmarks New / Original: .920
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Slightly faster method of doing TOLOWER that saves an
instruction.
Also replace the hard coded 5-byte no with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.
geometric_mean(N=40) of all benchmarks New / Original: .894
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Add more robust tests that cover all the page cross edge cases.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Add more robust tests that cover all the page cross edge cases.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Just QOL change to make parsing the output of the benchtests more
consistent.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Just QOL change to make parsing the output of the benchtests more
consistent.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Overflow case for __wcsncmp_avx2_rtm should be __wcscmp_avx2_rtm not
__wcscmp_avx2.
commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Jan 9 16:02:21 2022 -0600
x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]
Set the wrong fallback function for `__wcsncmp_avx2_rtm`. It was set
to fallback on to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm` which
can cause spurious aborts.
This change will need to be backported.
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The generic implementation is faster.
geometric_mean(N=20) of all benchmarks New / Original: .710
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The generic implementation is faster (see strcspn commit).
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The generic implementation is faster.
geometric_mean(N=20) of all benchmarks New / Original: .678
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of
_mm_cmpistri. Also change offset to unsigned to avoid unnecessary
sign extensions.
geometric_mean(N=20) of all benchmarks that dont fallback on
sse2; New / Original: .901
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of
_mm_cmpistri. Also change offset to unsigned to avoid unnecessary
sign extensions.
geometric_mean(N=20) of all benchmarks that dont fallback on
sse2/strlen; New / Original: .928
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Just QOL change to make parsing the output of the benchtests more
consistent.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Just QOL change to make parsing the output of the benchtests more
consistent.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Small code cleanup for size: -81 bytes.
Add comment justifying using a branch to do NULL/non-null return.
All string/memory tests pass and no regressions in benchtests.
geometric_mean(N=20) of all benchmarks New / Original: .985
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Small code cleanup for size: -53 bytes.
Add comment justifying using a branch to do NULL/non-null return.
All string/memory tests pass and no regressions in benchtests.
geometric_mean(N=20) of all benchmarks Original / New: 1.00
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Add benchmark that randomizes whether return should be NULL or pointer
to CHAR. The rationale is on many architectures there is a choice
between a predicate execution option (i.e cmovcc on x86) or a branch.
On x86 the results for cmovcc vs branch are something along the lines
of the following:
perc-zero, Br On Result, Time Br / Time cmov
0.10, 1, ,0.983
0.10, 0, ,1.246
0.25, 1, ,1.035
0.25, 0, ,1.49
0.33, 1, ,1.016
0.33, 0, ,1.579
0.50, 1, ,1.228
0.50, 0, ,1.739
0.66, 1, ,1.039
0.66, 0, ,1.764
0.75, 1, ,0.996
0.75, 0, ,1.642
0.90, 1, ,1.071
0.90, 0, ,1.409
1.00, 1, ,0.937
1.00, 0, ,0.999
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|