aboutsummaryrefslogtreecommitdiff
path: root/sysdeps
AgeCommit message (Collapse)Author
2021-12-28hurd: let csu initialize tlsSamuel Thibault
Since 9cec82de715b ("htl: Initialize later"), we let csu initialize pthreads. We can thus let it initialize tls later too, to better align with the generic order. Initialization however accesses ports which links/unlinks into the sigstate for unwinding. We can however easily skip that during initialization.
2021-12-27hurd: Fix XFAIL-ing mallocfork2 testsSamuel Thibault
They are using setpshared but are outside the htl directory.
2021-12-27hurd: XFAIL more tests that require setpshared supportSamuel Thibault
2021-12-27x86: Optimize L(less_vec) case in memcmpeq-evex.SNoah Goldstein
No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change controlflow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to the the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-27x86: Optimize L(less_vec) case in memcmp-evex-movbe.SNoah Goldstein
No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change controlflow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to the the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-23Set default __TIMESIZE default to 64Adhemerval Zanella
This is expected size for newer ABIs.
2021-12-22x86-64: Add vector acos/acosf implementation to libmvecSunil K Pandey
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-21s_sincosf.h: Change pio4 type to float [BZ #28713]H.J. Lu
s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-12-21Linux: Fix 32-bit vDSO for clock_gettime on powerpc32maminjie
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID, on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully ( because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S does not support these two IDs, the 32-bit time_t syscall will be used), but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used, resulting in two system calls. Fix commit 72e84d1db22203e01a43268de71ea8669eca2863. Signed-off-by: maminjie <maminjie2@huawei.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-12-20Regenerate ulps on x86_64 with GCC 12H.J. Lu
Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12.
2021-12-20Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.hJoseph Myers
Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol numbers for PF_CAN") but apparently missed for glibc at the time. Tested for x86_64.
2021-12-17Remove ununsed tcb-offsetAdhemerval Zanella
Some architectures do not use the auto-generated tcb-offsets.h.
2021-12-17riscv: align stack before calling _dl_init [BZ #28703]Aurelien Jarno
Align the stack pointer to 128 bits during the call to _dl_init() as specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test. Fixes bug 28703. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
2021-12-17riscv: align stack in clone [BZ #28702]Aurelien Jarno
The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to a 128-bit boundary upon procedure entry". This as not the case in clone. This fixes the misc/tst-misalign-clone-internal and misc/tst-misalign-clone tests. Fixes bug 28702. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
2021-12-17elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]Aurelien Jarno
On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704.
2021-12-17powerpc64[le]: Allocate extra stack frame on syscall.SMatheus Castanho
The syscall function does not allocate the extra stack frame for scv like other assembly syscalls using DO_CALL_SCV. So after commit d120fb9941 changed the offset that is used to save LR, syscall ended up using an invalid offset, causing regressions on powerpc64. So make sure the extra stack frame is allocated in syscall.S as well to make it consistent with other uses of DO_CALL_SCV and avoid similar issues in the future. Tested on powerpc, powerpc64, and powerpc64le (with and without scv) Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-12-17arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macroFlorian Weimer
Due to PIE-by-default, PIC is now defined in more cases. libc.a does not have _rtld_global_ro, and statically linking setcontext fails. SHARED is the right condition to use, so that libc.a references _dl_hwcap instead of _rtld_global_ro. For static PIE support, the !SHARED case would still have to be made PIC. This patch does not achieve that. Fixes commit 23645707f12f2dd9d80b51effb2d9618a7b65565 ("Replace --enable-static-pie with --disable-default-pie"). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-15malloc: Add Huge Page support for mmapAdhemerval Zanella
With the morecore hook removed, there is not easy way to provide huge pages support on with glibc allocator without resorting to transparent huge pages. And some users and programs do prefer to use the huge pages directly instead of THP for multiple reasons: no splitting, re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means and specific page size that is matched against the supported ones by the system. Currently only memory allocated on sysmalloc() is handled, the arenas still uses the default system page size. To test is a new rule is added tests-malloc-hugetlb2, which run the addes tests with the required GLIBC_TUNABLE setting. On systems without a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is required to create a pool with some allocated pages. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2021-12-15malloc: Add madvise support for Transparent Huge PagesAdhemerval Zanella
Linux Transparent Huge Pages (THP) current supports three different states: 'never', 'madvise', and 'always'. The 'never' is self-explanatory and 'always' will enable THP for all anonymous pages. However, 'madvise' is still the default for some system and for such case THP will be only used if the memory range is explicity advertise by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting to a value diffent than 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disable is system does not support THP or if it has the mode set to "never" and on Linux only support one page size for THP, even if the architecture supports multiple sizes. To test is a new rule is added tests-malloc-hugetlb1, which run the addes tests with the required GLIBC_TUNABLE setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2021-12-15powerpc: Use global register variable in <thread_pointer.h>Florian Weimer
A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit 8dbeb0561eeb876f557ac9eef5721912ec074ea5 ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-12-14Support target specific ALIGN for variable alignment test [BZ #28676]H.J. Lu
Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-12-14hurd: Do not set PIE_UNSUPPORTEDSamuel Thibault
This is now supported.
2021-12-13sysdeps: Simplify sin Taylor Series calculationAkila Welihinda
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5*da*a^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-12-13math: Remove the error handling wrapper from hypot and hypotfAdhemerval Zanella
The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
2021-12-13math: Use fmin/fmax on hypotWilco Dijkstra
It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.
2021-12-13aarch64: Add math-use-builtins-f{max,min}.hAdhemerval Zanella
It allows to remove the arch-specific implementations.
2021-12-13math: Add math-use-builtinds-fmin.hAdhemerval Zanella
It allows the architecture to use the builtin instead of generic implementation.
2021-12-13math: Add math-use-builtinds-fmax.hAdhemerval Zanella
It allows the architecture to use the builtin instead of generic implementation.
2021-12-13math: Remove powerpc e_hypotAdhemerval Zanella
The generic implementation is shows only slight worse performance: POWER10 reciprocal-throughput latency master 8.28478 13.7253 new hypot 7.21945 13.1933 POWER9 reciprocal-throughput latency master 13.4024 14.0967 new hypot 14.8479 15.8061 POWER8 reciprocal-throughput latency master 15.5767 16.8885 new hypot 16.5371 18.4057 One way to improve might to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it onl does for -ffast-math, clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).
2021-12-13i386: Move hypot implementation to CAdhemerval Zanella
The generic hypotf is slight slower, mostly due the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu.
2021-12-13math: Use an improved algorithm for hypotl (ldbl-128)Adhemerval Zanella
This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13math: Use an improved algorithm for hypotl (ldbl-96)Adhemerval Zanella
This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13math: Improve hypot performance with FMAWilco Dijkstra
Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.
2021-12-13math: Use an improved algorithm for hypot (dbl-64)Wilco Dijkstra
This implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with a random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc current implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance result are also only slight worse than current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13math: Simplify hypotf implementationAdhemerval Zanella
Use a more optimized comparison for check for NaN and infinite and add an inlined issignaling implementation for float. With gcc it results in 2 FP comparisons. The file Copyright is also changed to use GPL, the implementation was completely changed by 7c10fd3515f to use double precision instead of scaling and this change removes all the GET_FLOAT_WORD usage. Checked on x86_64-linux-gnu.
2021-12-13Cleanup encoding in commentsSiddhesh Poyarekar
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8 equivalents so that files don't end up with mixed encodings. With this, all files (except tests that actually test different encodings) have a single encoding. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-13Replace --enable-static-pie with --disable-default-pieSiddhesh Poyarekar
Build glibc programs and tests as PIE by default and enable static-pie automatically if the architecture and toolchain supports it. Also add a new configuration option --disable-default-pie to prevent building programs as PIE. Only the following architectures now have PIE disabled by default because they do not work at the moment. hppa, ia64, alpha and csky don't work because the linker is unable to handle a pcrel relocation generated from PIE objects. The microblaze compiler is currently failing with an ICE. GNU hurd tries to enable static-pie, which does not work and hence fails. All these targets have default PIE disabled at the moment and I have left it to the target maintainers to enable PIE on their targets. build-many-glibcs runs clean for all targets. I also tested x86_64 on Fedora and Ubuntu, to verify that the default build as well as --disable-default-pie work as expected with both system toolchains. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-12-12hurd: Add rules for static PIE buildSamuel Thibault
This fixes [BZ #28671].
2021-12-12hurd: Fix gmon-staticSamuel Thibault
We need to use crt0 for gmon-static too.
2021-12-10x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]H.J. Lu
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-10nptl: Add one more barrier to nptl/tst-create1Florian Weimer
Without the bar_ctor_finish barrier, it was possible that thread2 re-locked user_lock before ctor had a chance to lock it. ctor then blocked in its locking operation, xdlopen from the main thread did not return, and thread2 was stuck waiting in bar_dtor: thread 1: started. thread 2: started. thread 2: locked user_lock. constructor started: 0. thread 1: in ctor: started. thread 3: started. thread 3: done. thread 2: unlocked user_lock. thread 2: locked user_lock. Fixes the test in commit 83b5323261bb72313bffcf37476c1b8f0847c736 ("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGNFlorian Weimer
TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely introduced to support a configuration where the thread pointer has not the same alignment as THREAD_SELF. Only ia64 seems to use that, but for the stack/pointer guard, not for storing tcbhead_t. Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift the thread pointer, potentially landing in a different residue class modulo the alignment, but the changes should not impact that. In general, given that TLS variables have their own alignment requirements, having different alignment for the (unshifted) thread pointer and struct pthread would potentially result in dynamic offsets, leading to more complexity. hppa had different values before: __alignof__ (tcbhead_t), which seems to be 4, and __alignof__ (struct pthread), which was 8 (old default) and is now 32. However, it defines THREAD_SELF as: /* Return the thread descriptor for the current thread. */ # define THREAD_SELF \ ({ struct pthread *__self; \ __self = __get_cr27(); \ __self - 1; \ }) So the thread pointer points after struct pthread (hence __self - 1), and they have to have the same alignment on hppa as well. Similarly, on ia64, the definitions were different. We have: # define TLS_PRE_TCB_SIZE \ (sizeof (struct pthread) \ + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \ ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \ & ~(__alignof__ (struct pthread) - 1)) \ : 0)) # define THREAD_SELF \ ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE)) And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment (confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c). On m68k, we have a larger gap between tcbhead_t and struct pthread. But as far as I can tell, the port is fine with that. The definition of TCB_OFFSET is sufficient to handle the shifted TCB scenario. This fixes commit 23c77f60181eb549f11ec2f913b4270af29eee38 ("nptl: Increase default TCB alignment to 32"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-09nptl: Add public rseq symbols and <sys/rseq.h>Florian Weimer
The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09nptl: Add glibc.pthread.rseq tunable to control rseq registrationFlorian Weimer
This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09Linux: Use rseq to accelerate sched_getcpuFlorian Weimer
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09nptl: Add rseq registrationFlorian Weimer
The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09nptl: Introduce THREAD_GETMEM_VOLATILEFlorian Weimer
This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09nptl: Introduce <tcb-access.h> for THREAD_* accessorsFlorian Weimer
These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09nptl: Add <thread_pointer.h> for defining __thread_pointerFlorian Weimer
<tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-06x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNIH.J. Lu
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.