path: root/sysdeps/aarch64
Age    Commit message    Author
2021-09-03  Remove "Contributed by" lines  (Siddhesh Poyarekar)
We stopped adding "Contributed by" or similar lines in sources in 2012 in favour of git logs and keeping the Contributors section of the glibc manual up to date. Removing these lines makes the license header a bit more consistent across files and also removes the possibility of error in attribution when license blocks or files are copied across since the contributed-by lines don't actually reflect reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by, etc.) into a new file CONTRIBUTED-BY to retain record of these contributions. These contributors are also mentioned in manual/contrib.texi, so we just maintain this additional record as a courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in place and to clean up the CONTRIBUTED-BY file respectively. These were not added to the glibc sources because they're not expected to be of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03  AArch64: Update A64FX memset not to degrade at 16KB  (Naohiro Tamura via Libc-alpha)
This patch updates the unroll8 code so that it does not degrade peak performance at 16KB on both FX1000 and FX700. The two instructions inserted at the beginning of the unroll8 loop, a cmp and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2021-08-18  Remove sysdeps/*/tls-macros.h  (Fangrui Song)
They provide TLS_GD/TLS_LD/TLS_IE/TLS_LE macros for TLS testing. Now that we have migrated to __thread and tls_model attributes, these macros are unused and the tls-macros.h files can be retired.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
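For illustration, this is the kind of construct that replaced the assembler macros; a minimal sketch assuming a GCC-compatible compiler (the variable names are hypothetical):

  /* Each TLS access model is now requested via the tls_model attribute
     instead of hand-written assembler macros.  */
  __thread int gd_var;  /* global-dynamic is the default for shared code */
  __thread int ld_var __attribute__ ((tls_model ("local-dynamic")));
  __thread int ie_var __attribute__ ((tls_model ("initial-exec")));
  __thread int le_var __attribute__ ((tls_model ("local-exec")));

  int
  get_ie_var (void)
  {
    return ie_var;  /* the compiler emits the initial-exec access sequence */
  }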
2021-08-11  aarch64: Make elf_machine_{load_address,dynamic} robust [BZ #28203]  (Fangrui Song)
The AArch64 ABI is largely platform agnostic and does not specify _GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the only user of _GLOBAL_OFFSET_TABLE_[0], and GNU ld defines its value to be the link-time address of _DYNAMIC. [2]

In 2012, __ehdr_start was implemented in GNU ld and gold in binutils 2.23. Using adrp+add (or adr with -mcmodel=tiny) to access __ehdr_start/_DYNAMIC gives us a robust way to get the load address and the link-time address of _DYNAMIC.

[1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
[2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the link-time address of _DYNAMIC. LLD is widely used on aarch64 Android and ChromeOS devices. Software just works without the need for _GLOBAL_OFFSET_TABLE_[0].

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
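As a sketch of the idea (not the exact glibc source), the two functions can be written against __ehdr_start and _DYNAMIC like this:

  #include <link.h>

  /* The ELF header sits at the very start of the load image, so the
     runtime address of __ehdr_start is the load address.  */
  extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility ("hidden")));
  extern ElfW(Dyn) _DYNAMIC[] __attribute__ ((visibility ("hidden")));

  static inline ElfW(Addr)
  elf_machine_load_address (void)
  {
    return (ElfW(Addr)) &__ehdr_start;
  }

  /* Link-time address of _DYNAMIC: runtime address minus load address.  */
  static inline ElfW(Addr)
  elf_machine_dynamic (void)
  {
    return (ElfW(Addr)) _DYNAMIC - elf_machine_load_address ();
  }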
2021-08-10  [5/5] AArch64: Improve A64FX memset medium loops  (Wilco Dijkstra)
Simplify the code for memsets smaller than L1. Improve the unroll8 and L1_prefetch loops. Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10  [4/5] AArch64: Improve A64FX memset by removing unroll32  (Wilco Dijkstra)
Remove unroll32 code since it doesn't improve performance. Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10  [3/5] AArch64: Improve A64FX memset for remaining bytes  (Wilco Dijkstra)
Simplify handling of remaining bytes. Avoid lots of taken branches and complex whilelo computations; instead, unconditionally write vectors from the end.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10  [2/5] AArch64: Improve A64FX memset for large sizes  (Wilco Dijkstra)
Improve performance of large memsets. Simplify alignment code. For zero memset use DC ZVA, which almost doubles performance. For non-zero memsets use the unroll8 loop which is about 10% faster. Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10  [1/5] AArch64: Improve A64FX memset for small sizes  (Wilco Dijkstra)
Improve performance of small memsets by reducing instruction counts and improving code alignment. Bench-memset shows 35-45% performance gain for small sizes. Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-07-22  glibc.malloc.check: Wean away from malloc hooks  (Siddhesh Poyarekar)
The malloc-check debugging feature is tightly integrated into glibc malloc, so thanks to an idea from Florian Weimer, much of the malloc implementation has been moved into libc_malloc_debug.so to support malloc-check.

Due to this, glibc malloc and malloc-check can no longer work together; they use altogether different (but identical) structures for heap management. This should not make a difference, though, since the malloc-check hook is not disabled anywhere; malloc_set_state does disable it, but it does so early enough that it shouldn't cause any problems.

The malloc-check tunable is now in the debug DSO and has no effect when the DSO is not preloaded.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
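In practice this means malloc checking now requires preloading the debug DSO; a hypothetical invocation (the DSO path depends on the installation) might look like:

  $ GLIBC_TUNABLES=glibc.malloc.check=3 \
    LD_PRELOAD=/usr/lib64/libc_malloc_debug.so ./myapp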
2021-07-01  AArch64: Add hp-timing.h  (Wilco Dijkstra)
Add hp-timing.h using the cntvct_el0 counter. Return timing in nanoseconds so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE in the dynamic linker since it adds unnecessary overhead and some ancient kernels may not handle emulating cntvct_el0 correctly. Currently cntvct_el0 is only used for timing in the benchtests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
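A minimal sketch of a nanosecond timer built on the architected counter registers (assuming a userspace-accessible counter; the actual glibc header differs in detail):

  #include <stdint.h>

  /* Read the virtual counter and its frequency.  */
  static inline uint64_t
  arch_counter_ticks (void)
  {
    uint64_t t;
    __asm__ __volatile__ ("mrs %0, cntvct_el0" : "=r" (t));
    return t;
  }

  static inline uint64_t
  arch_counter_freq (void)
  {
    uint64_t f;
    __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (f));
    return f;
  }

  /* Convert ticks to nanoseconds, splitting the computation so the
     multiplication cannot overflow 64 bits on long uptimes.  */
  static inline uint64_t
  arch_counter_ns (void)
  {
    uint64_t t = arch_counter_ticks (), f = arch_counter_freq ();
    return (t / f) * 1000000000ull + (t % f) * 1000000000ull / f;
  }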
2021-07-01  AArch64: Improve strnlen performance  (Wilco Dijkstra)
Optimize strnlen by avoiding UMINV which is slow on most cores. On Neoverse N1 large strings are 1.8x faster than the current version, and bench-strnlen is 50% faster overall. This version is MTE compatible. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-06-27  Update math: redirect roundeven function  (H.J. Lu)
Redirect target specific roundeven functions for aarch64, ldbl-128ibm and riscv.
2021-06-08  AArch64: Add support for roundeven[f]  (Wilco Dijkstra)
Add inline assembler for the roundeven functions. Passes glibc regression tests. Note GCC does not inline the builtin (PR100966), so it cannot be used for now.
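The FRINTN instruction implements exactly round-to-nearest-ties-to-even, so the core is a one-liner; a sketch of the kind of inline assembler involved (the exact glibc source may differ):

  /* roundeven maps directly to FRINTN (round to nearest, ties to even).  */
  static inline double
  roundeven_sketch (double x)
  {
    double r;
    __asm__ ("frintn %d0, %d1" : "=w" (r) : "w" (x));
    return r;
  }

  static inline float
  roundevenf_sketch (float x)
  {
    float r;
    __asm__ ("frintn %s0, %s1" : "=w" (r) : "w" (x));
    return r;
  }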
2021-05-27  aarch64: Added optimized memset for A64FX  (Naohiro Tamura)
This patch optimizes the performance of memset for A64FX [1], which implements ARMv8-A SVE and has a 64KB L1 cache per core and an 8MB L2 cache per NUMA node.

The optimization makes use of the Scalable Vector Registers with several techniques such as loop unrolling, memory access alignment, cache zero fill and prefetch.

The SVE assembler code for memset is implemented as Vector Length Agnostic code, so theoretically it can run on any SoC which supports the ARMv8-A SVE standard.

We confirmed that all test cases pass by running 'make check' and 'make xcheck', not only on A64FX but also on ThunderX2. We also confirmed by running 'make bench' that the 512-bit SVE vector register performance is roughly 4 times better than Advanced SIMD 128-bit registers and 8 times better than scalar 64-bit registers.

[1] https://github.com/fujitsu/A64FX

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
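To illustrate what Vector Length Agnostic means here, this is a minimal C sketch of an SVE memset using ACLE intrinsics; the actual glibc routine is hand-written assembly with unrolling, alignment and prefetch layered on top of this core loop:

  #include <arm_sve.h>   /* requires -march=armv8.2-a+sve */
  #include <stddef.h>
  #include <stdint.h>

  void *
  sve_memset_sketch (void *dst, int c, size_t n)
  {
    uint8_t *p = dst;
    svuint8_t v = svdup_n_u8 ((uint8_t) c);

    /* WHILELT builds a predicate covering only the remaining bytes, so
       the same loop works for any hardware vector length.  */
    for (size_t i = 0; i < n; i += svcntb ())
      {
        svbool_t pg = svwhilelt_b8_u64 ((uint64_t) i, (uint64_t) n);
        svst1_u8 (pg, p + i, v);
      }
    return dst;
  }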
2021-05-27  aarch64: Added optimized memcpy and memmove for A64FX  (Naohiro Tamura)
This patch optimizes the performance of memcpy/memmove for A64FX [1], which implements ARMv8-A SVE and has a 64KB L1 cache per core and an 8MB L2 cache per NUMA node.

The optimization makes use of the Scalable Vector Registers with several techniques such as loop unrolling, memory access alignment, cache zero fill, and software pipelining.

The SVE assembler code for memcpy/memmove is implemented as Vector Length Agnostic code, so theoretically it can run on any SoC which supports the ARMv8-A SVE standard.

We confirmed that all test cases pass by running 'make check' and 'make xcheck', not only on A64FX but also on ThunderX2. We also confirmed by running 'make bench' that the 512-bit SVE vector register performance is roughly 4 times better than Advanced SIMD 128-bit registers and 8 times better than scalar 64-bit registers.

[1] https://github.com/fujitsu/A64FX

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
2021-05-26  aarch64: define BTI_C and BTI_J macros as NOP unless HAVE_AARCH64_BTI  (Naohiro Tamura)
This patch defines the BTI_C and BTI_J macros conditionally for performance. If HAVE_AARCH64_BTI is true, BTI_C and BTI_J are defined as the corresponding HINT instructions for ARMv8.5 BTI (Branch Target Identification). If HAVE_AARCH64_BTI is false, both BTI_C and BTI_J are defined as NOP.
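A sketch of the conditional definitions in an assembler-visible header (the exact glibc header may differ); BTI instructions live in the HINT space, with BTI C encoded as HINT #34 and BTI J as HINT #36, so pre-BTI assemblers can still encode them:

  /* Used from .S files: expand to an instruction mnemonic.  */
  #if HAVE_AARCH64_BTI
  # define BTI_C   hint 34   /* bti c */
  # define BTI_J   hint 36   /* bti j */
  #else
  # define BTI_C   nop
  # define BTI_J   nop
  #endif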
2021-05-26  config: Added HAVE_AARCH64_SVE_ASM for aarch64  (Naohiro Tamura)
This patch checks whether the assembler supports '-march=armv8.2-a+sve' to generate SVE code or not, and then defines the HAVE_AARCH64_SVE_ASM macro accordingly.
2021-04-21  elf: Remove lazy tlsdesc relocation related code  (Szabolcs Nagy)
Remove generic tlsdesc code related to lazy tlsdesc processing since lazy tlsdesc relocation is no longer supported. This includes removing GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at load time when that lock is already held. Added a documentation comment too. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-04-08  aarch64: update libm test ulps  (Szabolcs Nagy)
Update after commit 43576de04afc6a0896a3ecc094e1581069a0652a.
2021-04-06  aarch64: free tlsdesc data on dlclose [BZ #27403]  (Szabolcs Nagy)
DL_UNMAP_IS_SPECIAL and DL_UNMAP were not defined. The definitions are now copied from arm, since the same is needed on aarch64. The cleanup of tlsdesc data is handled by the custom _dl_unmap. Fixes bug 27403.
2021-04-02  Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]  (Paul Zimmermann)
For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not give any visible slowdown on average.

Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
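For context, the textbook leading-order large-argument behaviour of the order-zero Bessel functions (not necessarily the exact expansion used in [1]) is:

  J_0(x) \sim \sqrt{2/(\pi x)} \, \cos(x - \pi/4),
  Y_0(x) \sim \sqrt{2/(\pi x)} \, \sin(x - \pi/4),
  \qquad x \to \infty.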
2021-03-26  aarch64: Optimize __libc_mtag_tag_zero_region  (Szabolcs Nagy)
This is a target hook for memory tagging; the original was a naive implementation. This version uses the same algorithm as __libc_mtag_tag_region, but with instructions that also zero the memory. This was not benchmarked on a real CPU, but it is expected to be faster than the naive implementation.
2021-03-26  aarch64: Optimize __libc_mtag_tag_region  (Szabolcs Nagy)
This is a target hook for memory tagging; the original was a naive implementation. The optimized version relies on "dc gva" to tag 64 bytes at a time for large allocations and optimizes small cases without adding too many branches. This was not benchmarked on a real CPU, but it is expected to be faster than the naive implementation.
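The core of the large-allocation path can be sketched as follows, assuming a 64-byte block size for DC GVA and a pointer that already carries the desired tag (illustrative only; the real routine is assembly and also handles small sizes and alignment):

  #include <stddef.h>

  /* Tag SIZE bytes starting at the granule-aligned, tagged pointer P.
     DC GVA sets the allocation tags of one ZVA-sized block (64 bytes
     here) from the tag bits in the address.  */
  static void
  tag_region_sketch (void *p, size_t size)
  {
    char *cur = p;
    char *end = cur + size;
    while (cur < end)
      {
        __asm__ __volatile__ ("dc gva, %0" : : "r" (cur) : "memory");
        cur += 64;
      }
  }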
2021-03-26  aarch64: inline __libc_mtag_new_tag  (Szabolcs Nagy)
This is a common operation when heap tagging is enabled, so inline the instructions instead of using an extern call.
2021-03-26  aarch64: inline __libc_mtag_address_get_tag  (Szabolcs Nagy)
This is a common operation when heap tagging is enabled, so inline the instruction instead of using an extern call. The .inst directive is used instead of the name of the instruction (or ACLE intrinsics) because malloc.c is not compiled for the armv8.5-a+memtag architecture; runtime CPU support detection is used. Prototypes are removed from the comments as they were not always correct.
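For comparison, the ACLE intrinsic expresses the same operation but requires building the file with MTE enabled, which is exactly what malloc.c avoids; a sketch of the intrinsic form (assuming GCC or Clang with -march=armv8.5-a+memtag):

  #include <arm_acle.h>   /* requires -march=armv8.5-a+memtag */

  /* Read the allocation tag stored for P and merge it into the
     pointer; this compiles to an LDG instruction.  */
  void *
  address_get_tag_sketch (void *p)
  {
    return __arm_mte_get_tag (p);
  }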
2021-03-26  malloc: Only support zeroing and not arbitrary memset with mtag  (Szabolcs Nagy)
The memset API is suboptimal and does not provide much benefit. Memory tagging only needs a zeroing memset (and only for memory that's sized and aligned to multiples of the tag granule), so change the internal API and the target hooks accordingly. This simplifies the implementation of the target hook.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-03-11  math: Remove slow paths from asin and acos [BZ #15267]  (Wilco Dijkstra)
This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-01  aarch64: update ulps.  (Szabolcs Nagy)
For new test cases in commit 5a051454a9b50c27984bbc499ee1297de48e2dc8
2021-02-25  Reduce the statically linked startup code [BZ #23323]  (Florian Weimer)
It turns out the startup code in csu/elf-init.c has a perfect pair of ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A New Method to Bypass 64-bit Linux ASLR"). These functions are not needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY are already processed by the dynamic linker. However, the dynamic linker skipped the main program for some reason. For maximum backwards compatibility, this is not changed, and instead, the main map is consulted from __libc_start_main if the init function argument is a NULL pointer.

For statically linked binaries, the old approach based on linker symbols is still used because there is nothing else available.

A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because new binaries running on an old libc would not run their ELF constructors, leading to difficult-to-debug issues.
2021-01-25  aarch64: Fix the list of tested IFUNC variants [BZ #26818]  (Szabolcs Nagy)
Some IFUNC variants are not compatible with BTI and MTE, so don't set them as usable for testing and benchmarking on a BTI or MTE enabled system. As far as IFUNC selectors are concerned, a system is BTI enabled if the CPU supports it and glibc was built with BTI branch protection. Most IFUNC variants are BTI compatible, but the thunderx2 memcpy and memmove use a jump table with an indirect jump, without a BTI j.

Fixes bug 26818.
2021-01-25  aarch64: Move and update the definition of MTE_ENABLED  (Szabolcs Nagy)
The hwcap value is now in Linux 5.10 and in glibc bits/hwcap.h, so use that definition. Move the definition to init-arch.h so all ifunc selectors can use it, and expose an "mte" shorthand for an MTE-enabled runtime.

For now we allow user code to enable tag checks and use PROT_MTE mappings without libc involvement; this is not guaranteed ABI, but can be useful for testing and debugging with MTE.
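The check itself boils down to a hwcap test; a user-space sketch of the same idea (glibc reads the hwcaps from its internal GLRO state rather than calling getauxval):

  #include <stdbool.h>
  #include <sys/auxv.h>

  #ifndef HWCAP2_MTE
  # define HWCAP2_MTE (1UL << 18)   /* from Linux asm/hwcap.h, 5.10+ */
  #endif

  static inline bool
  mte_enabled (void)
  {
    return (getauxval (AT_HWCAP2) & HWCAP2_MTE) != 0;
  }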
2021-01-21  aarch64: revert memcpy optimization for kunpeng to avoid performance degradation  (Shuo Wang)
In commit 863d775c481704baaa41855fc93e5a1ca2dc6bf6, kunpeng920 was added to the default memcpy version; however, there is a performance degradation when the copy size is large, e.g. 100KB.

This is the result, tested in glibc-2.28:

             before backport  after backport  Performance improvement
memcpy_1k    0.005            0.005             0.00%
memcpy_10k   0.032            0.029            10.34%
memcpy_100k  0.356            0.429           -17.02%
memcpy_1m    7.470           11.153           -33.02%

This is the demo:

#include "stdio.h"
#include "string.h"
#include "stdlib.h"

char a[1024*1024] = {12};
char b[1024*1024] = {13};

int main(int argc, char *argv[])
{
    int i = atoi(argv[1]);
    int j;
    int size = atoi(argv[2]);

    for (j = 0; j < i; j++)
        memcpy(b, a, size*1024);
    return 0;
}

# gcc -g -O0 memcpy.c -o memcpy
# time taskset -c 10 ./memcpy 100000 1024

Co-authored-by: liqingqing <liqingqing3@huawei.com>
2021-01-21  configure: Check for static PIE support  (Szabolcs Nagy)
Add SUPPORT_STATIC_PIE that targets can define if they support static PIE. This requires PI_STATIC_AND_HIDDEN support and various linker features, as described in commit

  9d7a3741c9e59eba87fb3ca6b9f979befce07826
  Add --enable-static-pie configure option to build static PIE [BZ #19574]

Currently defined on x86_64, i386 and aarch64 where static PIE is known to work.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-01-08  aarch64: define PI_STATIC_AND_HIDDEN  (Szabolcs Nagy)
AArch64 always uses pc relative access to static and hidden object symbols, but the config setting was previously missing. This affects ld.so start up code.
2021-01-07  Remove dbl-64/wordsize-64 (part 2)  (Wilco Dijkstra)
Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-01-05  aarch64: push the set of rules before falling into slow path  (Shuo Wang)
It is supposed to save the unwind rules for these instructions before falling into the slow path. Tested in glibc-2.28 before fixing:

Thread 2 "xxxxxxx" hit Breakpoint 1, _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149
149             stp     x1, x2, [sp, #-32]!
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.0-20190804.h24.aarch64
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150
150             stp     x3, x4, [sp, #16]
(gdb)
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157
157             mrs     x4, tpidr_el0
(gdb)
158             ldr     PTR_REG (1), [x0,#TLSDESC_ARG]
(gdb)
159             ldr     PTR_REG (0), [x4,#TCBHEAD_DTV]
(gdb)
160             ldr     PTR_REG (3), [x1,#TLSDESC_GEN_COUNT]
(gdb)
161             ldr     PTR_REG (2), [x0,#DTV_COUNTER]
(gdb)
162             cmp     PTR_REG (3), PTR_REG (2)
(gdb)
163             b.hi    2f
(gdb)
165             ldp     PTR_REG (2), PTR_REG (3), [x1,#TLSDESC_MODID]
(gdb)
166             add     PTR_REG (0), PTR_REG (0), PTR_REG (2), lsl #(PTR_LOG_SIZE + 1)
(gdb)
167             ldr     PTR_REG (0), [x0]       /* Load val member of DTV entry.  */
(gdb)
168             cmp     PTR_REG (0), #TLS_DTV_UNALLOCATED
(gdb)
169             b.eq    2f
(gdb) bt
#0  _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:169
#1  0x0000ffffbe4fbb44 in OurFunction (threadId=4294967295) at /home/test/test_function.c:30
#2  0x0000000000400c08 in initaaa () at thread.c:58
#3  0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71
#4  0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486
#5  0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:184
184             stp     x29, x30, [sp,#-16*NSAVEXREGPAIRS]!
(gdb) bt
#0  _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:184
#1  0x0000ffffbe4fbb44 in OurFunction (threadId=4294967295) at /home/test/test_function.c:30
#2  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Co-authored-by: liqingqing <liqingqing3@huawei.com>
2021-01-04  aarch64: fix stack missing after sp is updated  (Shuo Wang)
After sp is updated, the CFA offset should be set before the next instruction. Tested in glibc-2.28:

Thread 2 "xxxxxxx" hit Breakpoint 1, _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149
149             stp     x1, x2, [sp, #-32]!
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.0-20190804.h24.aarch64
(gdb) bt
#0  _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149
#1  0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30
#2  0x0000000000400c08 in initaaa () at thread.c:58
#3  0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71
#4  0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486
#5  0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150
150             stp     x3, x4, [sp, #16]
(gdb) bt
#0  _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150
#1  0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30
#2  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157
157             mrs     x4, tpidr_el0
(gdb) bt
#0  _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157
#1  0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30
#2  0x0000000000400c08 in initaaa () at thread.c:58
#3  0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71
#4  0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486
#5  0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Signed-off-by: liqingqing <liqingqing3@huawei.com>
Signed-off-by: Shuo Wang <wangshuo47@huawei.com>
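The fix is to adjust the CFA immediately after the stack pointer changes, so there is no window in which an unwinder computes a stale CFA; schematically (glibc's assembler sources wrap the raw directives in cfi_* macros):

  stp   x1, x2, [sp, #-32]!
  .cfi_adjust_cfa_offset 32      /* must directly follow the sp update */
  stp   x3, x4, [sp, #16]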
2021-01-02  Update copyright dates with scripts/update-copyrights  (Paul Eggert)
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO.

I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah:

remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2020-12-31  aarch64: use PTR_ARG and SIZE_ARG instead of DELOUSE  (Szabolcs Nagy)
DELOUSE was added to asm code to make it compatible with non-LP64 ABIs, but it is an unfortunate name and the code was not compatible with ABIs where pointer and size_t are different. Glibc currently only supports the LP64 ABI, so these macros are not really needed or tested, but for now the name is changed to be more meaningful instead of removing them completely.

Some DELOUSE macros were dropped: clone, strlen and strnlen used it unnecessarily.

The out of tree ILP32 patches are currently not maintained and will likely need a rework to rebase them on top of the time64 changes.
2020-12-21  aarch64: update ulps.  (Szabolcs Nagy)
For new test cases in commit cad5ad81d2f7f58a7ad0d8afa8c1b7101a0301fb
2020-12-21  aarch64: Add aarch64-specific files for memory tagging support  (Richard Earnshaw)
This final patch provides the architecture-specific implementation of the memory-tagging support hooks for aarch64.
2020-12-15  aarch64: remove the strlen_asimd symbol  (Szabolcs Nagy)
This symbol is not in the implementation reserved namespace for static linking and it was never used: it seems it was mistakenly added in the original strlen_asimd commit 436e4d5b965abe592d26150cb518accf9ded8fe4.
2020-12-15  aarch64: fix static PIE start code for BTI [BZ #27068]  (Guillaume Gardet)
A bti c was missing from rcrt1.o, which made all -static-pie binaries fail at program startup on BTI enabled systems.

Fixes bug 27068.
2020-12-11  aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]  (Szabolcs Nagy)
Re-mmap executable segments if possible instead of using mprotect to add PROT_BTI. This allows using BTI protection with security policies that prevent mprotect with PROT_EXEC. If the fd of the ELF module is not available because it was kernel mapped, then mprotect is used and failures are ignored. To protect the main executable even when mprotect is filtered, the linux kernel will have to be changed to add PROT_BTI to it.

The delayed failure reporting is mainly needed because currently _dl_process_gnu_properties does not propagate failures such that the required cleanups happen. Using the link_map_machine struct for error propagation is not ideal, but this seemed to be the least intrusive solution.

Fixes bug 26831.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
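A sketch of the remapping step, assuming the segment's address, length, file offset and protections are already known from the program headers (error handling elided; function name is hypothetical):

  #include <sys/mman.h>

  #ifndef PROT_BTI
  # define PROT_BTI 0x10   /* aarch64 Linux, since 5.8 */
  #endif

  /* Re-map one executable LOAD segment with PROT_BTI added.  START, LEN
     and OFF describe the existing mapping; FD is the ELF module, or -1
     if the kernel mapped it and no fd is available.  */
  static int
  enable_bti_segment (void *start, size_t len, off_t off, int fd, int prot)
  {
    if (fd != -1
        && mmap (start, len, prot | PROT_BTI,
                 MAP_PRIVATE | MAP_FIXED, fd, off) != MAP_FAILED)
      return 0;
    /* Kernel-mapped module (or mmap failure): fall back to mprotect
       and let the caller ignore failures, as the commit describes.  */
    return mprotect (start, len, prot | PROT_BTI);
  }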
2020-12-11  elf: Pass the fd to note processing  (Szabolcs Nagy)
To handle GNU property notes on aarch64 some segments need to be mmaped again, so the fd of the loaded ELF module is needed. When the fd is not available (kernel loaded modules), then -1 is passed. The fd is passed to both _dl_process_pt_gnu_property and _dl_process_pt_note for consistency. Target specific note processing functions are updated accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-12-11  aarch64: align address for BTI protection [BZ #26988]  (Szabolcs Nagy)
Handle unaligned executable load segments (the bfd linker is not expected to produce such binaries, but other linkers may). Computing the mapping bounds follows _dl_map_object_from_fd more closely now. Fixes bug 26988. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-12-11  aarch64: Fix missing BTI protection from dependencies [BZ #26926]  (Szabolcs Nagy)
The _dl_open_check and _rtld_main_check hooks are not called on the dependencies of a loaded module, so BTI protection was missed on every module other than the main executable and directly dlopened libraries. The fix just iterates over dependencies to enable BTI. Fixes bug 26926.
2020-11-16  nptl: Move stack list variables into _rtld_global  (Florian Weimer)
Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-11-09  aarch64: Add unwind information to _start (bug 26853)  (Florian Weimer)
This adds CFI directives which communicate that the stack ends with this function. Fixes bug 26853.
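Concretely, marking the return-address register as undefined tells unwinders that there is no caller frame; a schematic of the idea (the exact directives in glibc may differ):

  _start:
  	.cfi_startproc
  	/* The return address has no meaningful value in _start: mark
  	   x30 undefined so unwinders stop here.  */
  	.cfi_undefined x30
  	mov	x29, #0          /* clear frame pointer */
  	mov	x30, #0          /* clear link register */
  	...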