summaryrefslogtreecommitdiff
path: root/sysdeps
AgeCommit message (Collapse)Author
2022-07-26LoongArch: Thread-Local Storage Supportcaiyinyu
2022-07-26LoongArch: ABI Implementationcaiyinyu
2022-07-25struct stat is not posix conformant on microblaze with __USE_FILE_OFFSET64Arnout Vandecappelle (Essensium/Mind)
Commit a06b40cdf5ba0d2ab4f9b4c77d21e45ff284fac7 updated stat.h to use __USE_XOPEN2K8 instead of __USE_MISC to add the st_atim, st_mtim and st_ctim members to struct stat. However, for microblaze, there are two definitions of struct stat, depending on the __USE_FILE_OFFSET64 macro. The second one was not updated. Change __USE_MISC to __USE_XOPEN2K8 in the __USE_FILE_OFFSET64 version of struct stat for microblaze.
2022-07-25Linux: dirent/tst-readdir64-compat needs to use TEST_COMPAT (bug 27654)Florian Weimer
The hppa port starts libc at GLIBC_2.2, but has earlier symbol versions in other shared objects. This means that the compat symbol for readdir64 is not actually present in libc even though have-GLIBC_2.1.3 is defined as yes at the make level. Fixes commit 15e50e6c966fa0f26612602a95f0129543d9f9d5 ("Linux: dirent/tst-readdir64-compat can be a regular test") by mostly reverting it.
2022-07-22s390x: Add optimized chacha20Adhemerval Zanella Netto
It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-s390x.S. The final state register clearing is omitted. On a z15 it shows the following improvements (using formatted bench-arc4random data): GENERIC MB/s ----------------------------------------------- arc4random [single-thread] 198.92 arc4random_buf(16) [single-thread] 244.49 arc4random_buf(32) [single-thread] 282.73 arc4random_buf(48) [single-thread] 286.64 arc4random_buf(64) [single-thread] 320.06 arc4random_buf(80) [single-thread] 297.43 arc4random_buf(96) [single-thread] 310.96 arc4random_buf(112) [single-thread] 308.10 arc4random_buf(128) [single-thread] 309.90 ----------------------------------------------- VX. MB/s ----------------------------------------------- arc4random [single-thread] 430.26 arc4random_buf(16) [single-thread] 735.14 arc4random_buf(32) [single-thread] 1029.99 arc4random_buf(48) [single-thread] 1206.76 arc4random_buf(64) [single-thread] 1311.92 arc4random_buf(80) [single-thread] 1378.74 arc4random_buf(96) [single-thread] 1445.06 arc4random_buf(112) [single-thread] 1484.32 arc4random_buf(128) [single-thread] 1517.30 ----------------------------------------------- Checked on s390x-linux-gnu.
2022-07-22powerpc64: Add optimized chacha20Adhemerval Zanella Netto
It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-ppc.c. It targets POWER8 and it is used on default for LE. On a POWER8 it shows the following improvements (using formatted bench-arc4random data): POWER8 GENERIC MB/s ----------------------------------------------- arc4random [single-thread] 138.77 arc4random_buf(16) [single-thread] 174.36 arc4random_buf(32) [single-thread] 228.11 arc4random_buf(48) [single-thread] 252.31 arc4random_buf(64) [single-thread] 270.11 arc4random_buf(80) [single-thread] 278.97 arc4random_buf(96) [single-thread] 287.78 arc4random_buf(112) [single-thread] 291.92 arc4random_buf(128) [single-thread] 295.25 POWER8 MB/s ----------------------------------------------- arc4random [single-thread] 198.06 arc4random_buf(16) [single-thread] 278.79 arc4random_buf(32) [single-thread] 448.89 arc4random_buf(48) [single-thread] 551.09 arc4random_buf(64) [single-thread] 646.12 arc4random_buf(80) [single-thread] 698.04 arc4random_buf(96) [single-thread] 756.06 arc4random_buf(112) [single-thread] 784.12 arc4random_buf(128) [single-thread] 808.04 ----------------------------------------------- Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2022-07-22x86: Add AVX2 optimized chacha20Adhemerval Zanella Netto
It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-amd64-avx2.S. It is used only if AVX2 is supported and enabled by the architecture. As for generic implementation, the last step that XOR with the input is omited. The final state register clearing is also omitted. On a Ryzen 9 5900X it shows the following improvements (using formatted bench-arc4random data): SSE MB/s ----------------------------------------------- arc4random [single-thread] 704.25 arc4random_buf(16) [single-thread] 1018.17 arc4random_buf(32) [single-thread] 1315.27 arc4random_buf(48) [single-thread] 1449.36 arc4random_buf(64) [single-thread] 1511.16 arc4random_buf(80) [single-thread] 1539.48 arc4random_buf(96) [single-thread] 1571.06 arc4random_buf(112) [single-thread] 1596.16 arc4random_buf(128) [single-thread] 1613.48 ----------------------------------------------- AVX2 MB/s ----------------------------------------------- arc4random [single-thread] 922.61 arc4random_buf(16) [single-thread] 1478.70 arc4random_buf(32) [single-thread] 2241.80 arc4random_buf(48) [single-thread] 2681.28 arc4random_buf(64) [single-thread] 2913.43 arc4random_buf(80) [single-thread] 3009.73 arc4random_buf(96) [single-thread] 3141.16 arc4random_buf(112) [single-thread] 3254.46 arc4random_buf(128) [single-thread] 3305.02 ----------------------------------------------- Checked on x86_64-linux-gnu.
2022-07-22x86: Add SSE2 optimized chacha20Adhemerval Zanella Netto
It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-amd64-ssse3.S. It replaces the ROTATE_SHUF_2 (which uses pshufb) by ROTATE2 and thus making the original implementation SSE2. As for generic implementation, the last step that XOR with the input is omited. The final state register clearing is also omitted. On a Ryzen 9 5900X it shows the following improvements (using formatted bench-arc4random data): GENERIC MB/s ----------------------------------------------- arc4random [single-thread] 443.11 arc4random_buf(16) [single-thread] 552.27 arc4random_buf(32) [single-thread] 626.86 arc4random_buf(48) [single-thread] 649.81 arc4random_buf(64) [single-thread] 663.95 arc4random_buf(80) [single-thread] 674.78 arc4random_buf(96) [single-thread] 675.17 arc4random_buf(112) [single-thread] 680.69 arc4random_buf(128) [single-thread] 683.20 ----------------------------------------------- SSE MB/s ----------------------------------------------- arc4random [single-thread] 704.25 arc4random_buf(16) [single-thread] 1018.17 arc4random_buf(32) [single-thread] 1315.27 arc4random_buf(48) [single-thread] 1449.36 arc4random_buf(64) [single-thread] 1511.16 arc4random_buf(80) [single-thread] 1539.48 arc4random_buf(96) [single-thread] 1571.06 arc4random_buf(112) [single-thread] 1596.16 arc4random_buf(128) [single-thread] 1613.48 ----------------------------------------------- Checked on x86_64-linux-gnu.
2022-07-22aarch64: Add optimized chacha20Adhemerval Zanella Netto
It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-aarch64.S. It is used as default and only little-endian is supported (BE uses generic code). As for generic implementation, the last step that XOR with the input is omited. The final state register clearing is also omitted. On a virtualized Linux on Apple M1 it shows the following improvements (using formatted bench-arc4random data): GENERIC MB/s ----------------------------------------------- arc4random [single-thread] 380.89 arc4random_buf(16) [single-thread] 500.73 arc4random_buf(32) [single-thread] 552.61 arc4random_buf(48) [single-thread] 566.82 arc4random_buf(64) [single-thread] 574.01 arc4random_buf(80) [single-thread] 581.02 arc4random_buf(96) [single-thread] 591.19 arc4random_buf(112) [single-thread] 592.29 arc4random_buf(128) [single-thread] 596.43 ----------------------------------------------- OPTIMIZED MB/s ----------------------------------------------- arc4random [single-thread] 569.60 arc4random_buf(16) [single-thread] 825.78 arc4random_buf(32) [single-thread] 987.03 arc4random_buf(48) [single-thread] 1042.39 arc4random_buf(64) [single-thread] 1075.50 arc4random_buf(80) [single-thread] 1094.68 arc4random_buf(96) [single-thread] 1130.16 arc4random_buf(112) [single-thread] 1129.58 arc4random_buf(128) [single-thread] 1137.91 ----------------------------------------------- Checked on aarch64-linux-gnu.
2022-07-22stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)Adhemerval Zanella Netto
The implementation is based on scalar Chacha20 with per-thread cache. It uses getrandom or /dev/urandom as fallback to get the initial entropy, and reseeds the internal state on every 16MB of consumed buffer. To improve performance and lower memory consumption the per-thread cache is allocated lazily on first arc4random functions call, and if the memory allocation fails getentropy or /dev/urandom is used as fallback. The cache is also cleared on thread exit iff it was initialized (so if arc4random is not called it is not touched). Although it is lock-free, arc4random is still not async-signal-safe (the per thread state is not updated atomically). The ChaCha20 implementation is based on RFC8439 [1], omitting the final XOR of the keystream with the plaintext because the plaintext is a stream of zeros. This strategy is similar to what OpenBSD arc4random does. The arc4random_uniform is based on previous work by Florian Weimer, where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete Uniform Generation from Coin Flips, and Applications (2013) [2], who credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform random number generation (1976), for solving the general case. The main advantage of this method is the that the unit of randomness is not the uniform random variable (uint32_t), but a random bit. It optimizes the internal buffer sampling by initially consuming a 32-bit random variable and then sampling byte per byte. Depending of the upper bound requested, it might lead to better CPU utilization. Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu. Co-authored-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> [1] https://datatracker.ietf.org/doc/html/rfc8439 [2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-19linux: return UNSUPPORTED from tst-mount if entering mount namespace failsMichael Hudson-Doyle
Before this the test fails if run in a chroot by a non-root user: warning: could not become root outside namespace (Operation not permitted) ../sysdeps/unix/sysv/linux/tst-mount.c:36: numeric comparison failure left: 1 (0x1); from: errno right: 19 (0x13); from: ENODEV error: ../sysdeps/unix/sysv/linux/tst-mount.c:39: not true: fd != -1 error: ../sysdeps/unix/sysv/linux/tst-mount.c:46: not true: r != -1 error: ../sysdeps/unix/sysv/linux/tst-mount.c:48: not true: r != -1 ../sysdeps/unix/sysv/linux/tst-mount.c:52: numeric comparison failure left: 1 (0x1); from: errno right: 9 (0x9); from: EBADF error: ../sysdeps/unix/sysv/linux/tst-mount.c:55: not true: mfd != -1 ../sysdeps/unix/sysv/linux/tst-mount.c:58: numeric comparison failure left: 1 (0x1); from: errno right: 2 (0x2); from: ENOENT error: ../sysdeps/unix/sysv/linux/tst-mount.c:61: not true: r != -1 ../sysdeps/unix/sysv/linux/tst-mount.c:65: numeric comparison failure left: 1 (0x1); from: errno right: 2 (0x2); from: ENOENT error: ../sysdeps/unix/sysv/linux/tst-mount.c:68: not true: pfd != -1 error: ../sysdeps/unix/sysv/linux/tst-mount.c:75: not true: fd_tree != -1 ../sysdeps/unix/sysv/linux/tst-mount.c:88: numeric comparison failure left: 1 (0x1); from: errno right: 38 (0x26); from: ENOSYS error: 12 test failures Checking that the test can enter a new mount namespace is more correct than just checking the return value of support_become_root() as the test code changes the mount namespace it runs in so running it as root on a system that does not support mount namespaces should still skip. Also change the test to remove the unnecessary fork. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-16x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA levelNoah Goldstein
1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e strcpy-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcpy-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-16x86: Add support to build wcscpy with explicit ISA levelNoah Goldstein
1. Add ISA level build guards to different implementations. - wcscpy-ssse3.S is used as ISA level 2/3/4. - wcscpy-generic.c is only used at ISA level 1 and will only build if compiled with ISA level == 1. Otherwise there is no reason to include it as we will always use wcscpy-ssse3.S 2. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-16x86: Add support to build strcmp/strlen/strchr with explicit ISA levelNoah Goldstein
1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e strcmp-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcmp-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-14S390: Define SINGLE_THREAD_BY_GLOBAL only on s390xStefan Liebler
Starting with commit e070501d12b47e88c1ff8c313f887976fb578938 "Replace __libc_multiple_threads with __libc_single_threaded" the testcases nptl/tst-cancel-self and nptl/tst-cancel-self-cancelstate are failing. This is fixed by only defining SINGLE_THREAD_BY_GLOBAL on s390x, but not on s390. Starting with commit 09c76a74099826f4c6e1c4c431d7659f78112862 "Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition", SINGLE_THREAD_BY_GLOBAL was defined in sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h. Lateron the commit 9a973da617772eff1f351989f8995f4305a2e63c "s390: Consolidate Linux syscall definition" consolidates the sysdep.h files from s390-32/s390-64 subdirectories. Unfortunately the macro is now always defined instead of only on s390-64. As information: TLS_MULTIPLE_THREADS_IN_TCB is also only defined for s390. See: sysdeps/s390/nptl/tls.h
2022-07-13x86: Add missing rtm tests for strcmp familyNoah Goldstein
Add new tests for: strcasecmp strncasecmp strcmp wcscmp These functions all have avx2_rtm implementations so should be tested.
2022-07-13x86: Remove unneeded rtld-wmemcmpNoah Goldstein
wmemcmp isn't used by the dynamic loader so their no need to add an RTLD stub for it. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-13x86: Move wcslen SSE2 implementation to multiarch/wcslen-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move wcschr SSE2 implementation to multiarch/wcschr-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strcat SSE2 implementation to multiarch/strcat-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strchr SSE2 implementation to multiarch/strchr-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strrchr SSE2 implementation to multiarch/strrchr-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move memrchr SSE2 implementation to multiarch/memrchr-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strcpy SSE2 implementation to multiarch/strcpy-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strlen SSE2 implementation to multiarch/strlen-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strcmp SSE42 implementation to multiarch/strcmp-sse4_2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move wcscmp SSE2 implementation to multiarch/wcscmp-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Move strcmp SSE2 implementation to multiarch/strcmp-sse2.SNoah Goldstein
This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Because strcmp-sse2.S implements so many functions (more from avx2/evex/sse42) add a new file 'strcmp-naming.h' to assist in getting the correct symbol name for all the function across multiarch/non-multiarch builds. Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13x86: Rename STRCASECMP_NONASCII macro to STRCASECMP_L_NONASCIINoah Goldstein
The previous macro name can be confusing given that both `__strcasecmp_l_nonascii` and `__strcasecmp_nonascii` are functions and we use the `_l` version.
2022-07-12x86: Remove __mmask intrinsics in strstr-avx512.cNoah Goldstein
The intrinsics are not available before GCC7 and using standard operators generates code of equivalent or better quality. Removed: _cvtmask64_u64 _kshiftri_mask64 _kand_mask64 Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958
2022-07-12x86: Remove generic strncat, strncpy, and stpncpy implementationsNoah Goldstein
These functions all have optimized versions: __strncat_sse2_unaligned, __strncpy_sse2_unaligned, and stpncpy_sse2_unaligned which are faster than their respective generic implementations. Since the sse2 versions can run on baseline x86_64, we should use these as the baseline implementation and can remove the generic implementations. Geometric mean of N=20 runs of the entire benchmark suite on: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake) __strncat_sse2_unaligned / __strncat_generic: .944 __strncpy_sse2_unaligned / __strncpy_generic: .726 __stpncpy_sse2_unaligned / __stpncpy_generic: .650 Tested build with and without multiarch and full check with multiarch.
2022-07-12i386: Remove -Wa,-mtune=i686Fangrui Song
gas -mtune= may change NOP generating patterns but -mtune=i686 has no difference from the default by inspecting .o and .os files. Note: Clang doesn't support -Wa,-mtune=i686.
2022-07-08x86-64: Remove redundant strcspn-generic/strpbrk-generic/strspn-genericH.J. Lu
Remove redundant strcspn-generic, strpbrk-generic and strspn-generic from sysdep_routines in sysdeps/x86_64/multiarch/Makefile added by commit c69f960b017b2cdf39335739009526a72fb20379 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jul 3 21:28:07 2022 -0700 x86: Add support for building str{c|p}{brk|spn} with explicit ISA level since they have been added to sysdep_routines in sysdeps/x86_64/Makefile.
2022-07-07x86-64: Don't mark symbols as hidden in strcmp-XXX.SH.J. Lu
Don't mark symbols as hidden in strcmp-avx2.S, strcmp-evex.S and strcmp-sse42.S since they are marked as hidden in the IFUNC selectors.
2022-07-06stdlib: Implement mbrtoc8, c8rtomb, and the char8_t typedef.Tom Honermann
This change provides implementations for the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14 N2653. It also provides the char8_t typedef from WG14 N2653. The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined. The char8_t typedef is declared in uchar.h in C2X mode or when the _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-07-06aarch64: Optimize string functions with shrn instructionDanila Kutenin
We found that string functions were using AND+ADDP to find the nibble/syndrome mask but there is an easier opportunity through `SHRN dst.8b, src.8h, 4` (shift right every 2 bytes by 4 and narrow to 1 byte) and has same latency on all SIMD ARMv8 targets as ADDP. There are also possible gaps for memcmp but that's for another patch. We see 10-20% savings for small-mid size cases (<=128) which are primary cases for general workloads.
2022-07-05x86: Add support for building {w}memcmp{eq} with explicit ISA levelNoah Goldstein
1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-05x86: Add support for building {w}memset{_chk} with explicit ISA levelNoah Goldstein
1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memset sse2 from memset.S to multiarch/memset-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memset-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memset-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memset.S that just include the non-multiarch memset.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-05x86: Add support for building {w}memmove{_chk} with explicit ISA levelNoah Goldstein
1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memmove sse2 from memmove.S to multiarch/memmove-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memmove-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memmove-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memmove.S that just include the non-multiarch memmove.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch. isa raising memmove
2022-07-05x86: Add support for building str{c|p}{brk|spn} with explicit ISA levelNoah Goldstein
The changes for these functions are different than the others because the best implementation (sse4_2) requires the generic implementation as a fallback to be built as well. Changes are: 1. Add non-multiarch functions for str{c|p}{brk|spn}.c to statically select the best implementation based on the configured ISA build level. 2. Add stubs for str{c|p}{brk|spn}-generic and varshift.c to in the sysdeps/x86_64 directory so that the the sse4 implementation will have all of its dependencies for the non-multiarch / rtld build when ISA level >= 2. 3. Add new multiarch/rtld-strcspn.c that just include the non-multiarch strcspn.c which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
2022-07-05x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2Noah Goldstein
Just for clarities sake and so that if a future implementation is added we remember to add the check.
2022-07-05Replace __libc_multiple_threads with __libc_single_threadedAdhemerval Zanella
And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL, since header inclusion single-thread.h is in the wrong order, the define needs to come before including sysdeps/unix/sysdep.h. The macro is now moved to a per-arch single-threade.h header. The SINGLE_THREAD_P is used on some more places. Checked on aarch64-linux-gnu and x86_64-linux-gnu.
2022-07-05linux: Add mount_setattrAdhemerval Zanella
It was added on Linux 5.12 (2a1867219c7b27f928e2545782b86daaf9ad50bd) to allow change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05linux: Add tst-mount to check for Linux new mount APIAdhemerval Zanella
The new mount API was added on Linux 5.2 with six new syscalls: fsopen, fsconfig, fsmount, move_mount, fspick, and open_tree. The new test verifies minimal functionality along with error paths for specific arguments and their corner cases. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05linux: Add open_treeAdhemerval Zanella
It was added on Linux 5.2 (a07b20004793d8926f78d63eb5980559f7813404) to return a O_PATH-opened file descriptor to an existing mountpoint. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05linux: Add fspickAdhemerval Zanella
It was added on Linux 5.2 (cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb) that can be used to pick an existing mountpoint into an filesystem context which can thereafter be used to reconfigure a superblock with fsconfig syscall. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05linux: Add fsconfigAdhemerval Zanella
It was added on Linux 5.2 (ecdab150fddb42fe6a739335257949220033b782) as a way to a configure filesystem creation context and trigger actions upon it, to be used in conjunction with fsopen, fspick and fsmount. The fsconfig_command commands are currently only defined as an enum, so they can't be checked on tst-mount-consts.py with current test support. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05AArch64: Reset HWCAP2_AFP bits in FPCR for default fenvTejas Belagod
The AFP feature (Alternate floating-point behavior) was added in armv8.7 and introduced new FPCR bits. Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is set to default environment. This is a deviation from standard behaviour. Clear these bits when setting the fenv to default. There is no libc API to modify the new FPCR bits. Restoring those bits matters if the user changed them directly.
2022-07-04Fix hurd namespace issues for internal signal functionsAdhemerval Zanella
It was introduced by "Refactor internal-signals.h (a1bdd81664aa681364d)". Use the internal symbols instead. Checked with a build for i686-gnu.
2022-06-30Refactor internal-signals.hAdhemerval Zanella
The main drive is to optimize the internal usage and required size when sigset_t is embedded in other data structures. On Linux, the current supported signal set requires up to 8 bytes (16 on mips), was lower than the user defined sigset_t (128 bytes). A new internal type internal_sigset_t is added, along with the functions to operate on it similar to the ones for sigset_t. The internal-signals.h is also refactored to remove unused functions Besides small stack usage on some functions (posix_spawn, abort) it lower the struct pthread by about 120 bytes (112 on mips). Checked on x86_64-linux-gnu. Reviewed-by: Arjun Shankar <arjun@redhat.com>