aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-13Merge branch release/2.28/master into ibm/2.28/masteribm/2.28/masterTulio Magno Quites Machado Filho
2021-01-27gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256)Florian Weimer
The conversion loop to the internal encoding does not follow the interface contract that __GCONV_FULL_OUTPUT is only returned after the internal wchar_t buffer has been filled completely. This is enforced by the first of the two asserts in iconv/skeleton.c: /* We must run out of output buffer space in this rerun. */ assert (outbuf == outerr); assert (nstatus == __GCONV_FULL_OUTPUT); This commit solves this issue by queuing a second wide character which cannot be written immediately in the state variable, like other converters already do (e.g., BIG5-HKSCS or TSCII). Reported-by: Tavis Ormandy <taviso@gmail.com> (cherry picked from commit 7d88c6142c6efc160c0ee5e4f85cde382c072888)
2021-01-13x86: Check IFUNC definition in unrelocated executable [BZ #20019]H.J. Lu
Calling an IFUNC function defined in unrelocated executable also leads to segfault. Issue a fatal error message when calling IFUNC function defined in the unrelocated executable from a shared library. On x86, ifuncmain6pie failed with: [hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct ./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency. [hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo 00003ff4 00000706 R_386_GLOB_DAT 0000400c foo_ptr 00003ff8 00000406 R_386_GLOB_DAT 00000000 foo 0000400c 00000401 R_386_32 00000000 foo [hjl@gnu-cfl-2 build-i686-linux]$ Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which trigger the circular IFUNC dependency, and build ifuncmain6pie with -Wl,-z,lazy. (cherry picked from commits 6ea5b57afa5cdc9ce367d2b69a2cebfb273e4617 and 7137d682ebfcb6db5dfc5f39724718699922f06c)
2021-01-13x86: Set header.feature_1 in TCB for always-on CET [BZ #27177]H.J. Lu
Update dl_cet_check() to set header.feature_1 in TCB when both IBT and SHSTK are always on. (cherry picked from commit 2ef23b520597f4ea1790a669b83e608f24f4cf12)
2021-01-12x86-64: Avoid rep movsb with short distance [BZ #27130]H.J. Lu
When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB. (cherry picked from commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5)
2020-11-04aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798]Szabolcs Nagy
The variant PCS support was ineffective because in the common case linkmap->l_mach.plt == 0 but then the symbol table flags were ignored and normal lazy binding was used instead of resolving the relocs early. (This was a misunderstanding about how GOT[1] is setup by the linker.) In practice this mainly affects SVE calls when the vector length is more than 128 bits, then the top bits of the argument registers get clobbered during lazy binding. Fixes bug 26798. (cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)
2020-10-14AArch64: Use __memcpy_simd on Neoverse N2/V1Wilco Dijkstra
Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as the memcpy/memmove ifunc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit e11ed9d2b4558eeacff81557dc9557001af42a6b)
2020-10-14AArch64: Rename IS_ARES to IS_NEOVERSE_N1Wilco Dijkstra
Rename IS_ARES to IS_NEOVERSE_N1 since that is a bit clearer. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 0f6278a8793a5d04ea31878119eccf99f469a02d)
2020-10-14[AArch64] Improve integer memcpyWilco Dijkstra
Further optimize integer memcpy. Small cases now include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Comments have been rewritten. (cherry picked from commit 700065132744e0dfa6d4d9142d63f6e3a1934726)
2020-10-14aarch64: Increase small and medium cases for __memcpy_genericKrzysztof Koch
Increase the upper bound on medium cases from 96 to 128 bytes. Now, up to 128 bytes are copied unrolled. Increase the upper bound on small cases from 16 to 32 bytes so that copies of 17-32 bytes are not impacted by the larger medium case. Benchmarking: The attached figures show relative timing difference with respect to 'memcpy_generic', which is the existing implementation. 'memcpy_med_128' denotes the the version of memcpy_generic with only the medium case enlarged. The 'memcpy_med_128_small_32' numbers are for the version of memcpy_generic submitted in this patch, which has both medium and small cases enlarged. The figures were generated using the script from: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html Depending on the platform, the performance improvement in the bench-memcpy-random.c benchmark ranges from 6% to 20% between the original and final version of memcpy.S Tested against GLIBC testsuite and randomized tests. (cherry picked from commit b9f145df85145506f8e61bac38b792584a38d88f)
2020-10-14AArch64: Improve backwards memmove performanceWilco Dijkstra
On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses. So change the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit bd394d131c10c9ec22c6424197b79410042eed99)
2020-10-14AArch64: Add optimized Q-register memcpyWilco Dijkstra
Add a new memcpy using 128-bit Q registers - this is faster on modern cores and reduces codesize. Similar to the generic memcpy, small cases include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Large copies align the source rather than the destination. bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so make this memcpy the default on N1 (on Centriq it is 15% faster than memcpy_falkor). Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 4a733bf375238a6a595033b5785cea7f27d61307)
2020-10-14AArch64: Align ENTRY to a cachelineWilco Dijkstra
Given almost all uses of ENTRY are for string/memory functions, align ENTRY to a cacheline to simplify things. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 34f0d01d5e43c7dedd002ab47f6266dfb5b79c22)
2020-10-03Merge branch release/2.28/master into ibm/2.28/masterTulio Magno Quites Machado Filho
2020-07-04NEWS: Mention BZ 25933 fixH.J. Lu
2020-07-04Fix avx2 strncmp offset compare condition check [BZ #25933]Sunil K Pandey
strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size. (cherry picked from commit 75870237ff3bb363447b03f4b0af100227570910)
2020-03-23Merge branch release/2.28/master into ibm/2.28/masterTulio Magno Quites Machado Filho
2020-03-20Fix use-after-free in glob when expanding ~user (bug 25414)Andreas Schwab
The value of `end_name' points into the value of `dirname', thus don't deallocate the latter before the last use of the former. (cherry picked from commit ddc650e9b3dc916eab417ce9f79e67337b05035c)
2020-03-20Fix array overflow in backtrace on PowerPC (bug 25423)Andreas Schwab
When unwinding through a signal frame the backtrace function on PowerPC didn't check array bounds when storing the frame address. Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines"). (cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)
2020-01-21riscv: Do not use __has_include__Florian Weimer
The user-visible preprocessor construct is called __has_include. (cherry picked from commit 28dd3939221ab26c6774097e9596e30d9753f758)
2019-12-05misc/test-errno-linux: Handle EINVAL from quotactlFlorian Weimer
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags argument, causing the test to fail due to too restrictive test expectations. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 1f7525d924b608a3e43b10fcfb3d46b8a6e9e4f9)
2019-12-05<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]Kamlesh Kumar
Without the asm redirects, strchr et al. are not const-correct. libc++ has a wrapper header that works with and without __CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when Clang is used with libstdc++ or just C headers, the overloaded functions with the correct types are not declared. This change does not impact current GCC (with libstdc++ or libc++). (cherry picked from commit 953ceff17a4a15b10cfdd5edc3c8cae4884c8ec3)
2019-12-04Merge branch release/2.28/master into ibm/2.28/masterTulio Magno Quites Machado Filho
2019-12-03x86: Assume --enable-cet if GCC defaults to CET [BZ #25225]Florian Weimer
This links in CET support if GCC defaults to CET. Otherwise, __CET__ is defined, yet CET functionality is not compiled and linked into the dynamic loader, resulting in a linker failure due to undefined references to _dl_cet_check and _dl_open_check. (cherry picked from commit 9fb8139079ef0bb1aa33a4ae418cbb113b9b9da7)
2019-11-28libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]Florian Weimer
Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable vtable validation in case of interposition [BZ #23313]") only covered the interposable glibc 2.1 handles, in libio/stdfiles.c. The parallel code in libio/oldstdfiles.c needs similar detection logic. Fixes (again) commit db3476aff19b75c4fdefbe65fcd5f0a90588ba51 ("libio: Implement vtable verification [BZ #20191]"). Change-Id: Ief6f9f17e91d1f7263421c56a7dc018f4f595c21 (cherry picked from commit cb61630ed712d033f54295f776967532d3f4b46a)
2019-11-22Merge branch release/2.28/master into ibm/2.28/masterTulio Magno Quites Machado Filho
2019-11-22rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC ↵Marcin Kościelnicki
(CVE-2019-19126) [BZ #25204] The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9 ("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT"). (cherry picked from commit d5dfad4326fc683c813df1e37bbf5cf920591c8e)
2019-11-06Fix alignment of TLS variables for tls variant TLS_TCB_AT_TP [BZ #23403]Stefan Liebler
The alignment of TLS variables is wrong if accessed from within a thread for architectures with tls variant TLS_TCB_AT_TP. For the main thread the static tls data is properly aligned. For other threads the alignment depends on the alignment of the thread pointer as the static tls data is located relative to this pointer. This patch adds this alignment for TLS_TCB_AT_TP variants in the same way as it is already done for TLS_DTV_AT_TP. The thread pointer is also already properly aligned if the user provides its own stack for the new thread. This patch extends the testcase nptl/tst-tls1.c in order to check the alignment of the tls variables and it adds a pthread_create invocation with a user provided stack. The test itself is migrated from test-skeleton.c to test-driver.c and the missing support functions xpthread_attr_setstack and xposix_memalign are added. ChangeLog: [BZ #23403] * nptl/allocatestack.c (allocate_stack): Align pointer pd for TLS_TCB_AT_TP tls variant. * nptl/tst-tls1.c: Migrate to support/test-driver.c. Add alignment checks. * support/Makefile (libsupport-routines): Add xposix_memalign and xpthread_setstack. * support/support.h: Add xposix_memalign. * support/xthread.h: Add xpthread_attr_setstack. * support/xposix_memalign.c: New File. * support/xpthread_attr_setstack.c: Likewise. (cherry picked from commit bc79db3fd487daea36e7c130f943cfb9826a41b4)
2019-11-05mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernelsDragan Mladjenovic
Linux/Mips kernels prior to 4.8 could potentially crash the user process when doing FPU emulation while running on non-executable user stack. Currently, gcc doesn't emit .note.GNU-stack for mips, but that will change in the future. To ensure that glibc can be used with such future gcc, without silently resulting in binaries that might crash in runtime, this patch forces RWX stack for all built objects if configured to run against minimum kernel version less than 4.8. * sysdeps/unix/sysv/linux/mips/Makefile (test-xfail-check-execstack): Move under mips-has-gnustack != yes. (CFLAGS-.o*, ASFLAGS-.o*): New rules. Apply -Wa,-execstack if mips-force-execstack == yes. * sysdeps/unix/sysv/linux/mips/configure: Regenerated. * sysdeps/unix/sysv/linux/mips/configure.ac (mips-force-execstack): New var. Set to yes for hard-float builds with minimum_kernel < 4.8.0 or minimum_kernel not set at all. (mips-has-gnustack): New var. Use value of libc_cv_as_noexecstack if mips-force-execstack != yes, otherwise set to no. (cherry picked from commit 33bc9efd91de1b14354291fc8ebd5bce96379f12)
2019-11-01elf: Refuse to dlopen PIE objects [BZ #24323]Florian Weimer
Another executable has already been mapped, so the dynamic linker cannot perform relocations correctly for the second executable. (cherry picked from commit 2c75b545de6fe3c44138799c68217a94bc669a88) (test omitted due to indirect dependency on test-in-container)
2019-10-31nss_db: fix endent wrt NULL mappings [BZ #24695] [BZ #24696]DJ Delorie
nss_db allows for getpwent et al to be called without a set*ent, but it only works once. After the last get*ent a set*ent is required to restart, because the end*ent did not properly reset the module. Resetting it to NULL allows for a proper restart. If the database doesn't exist, however, end*ent erroniously called munmap which set errno. The test case runs "makedb" inside the testroot, so needs selinux DSOs installed. (cherry picked from commit 99135114ba23c3110b7e4e650fabdc5e639746b7) (note: tests excluded as test-in-container infrastructure missing)
2019-10-31Call _dl_open_check after relocation [BZ #24259]H.J. Lu
This is a workaround for [BZ #20839] which doesn't remove the NODELETE object when _dl_open_check throws an exception. Move it after relocation in dl_open_worker to avoid leaving the NODELETE object mapped without relocation. [BZ #24259] * elf/dl-open.c (dl_open_worker): Call _dl_open_check after relocation. * sysdeps/x86/Makefile (tests): Add tst-cet-legacy-5a, tst-cet-legacy-5b, tst-cet-legacy-6a and tst-cet-legacy-6b. (modules-names): Add tst-cet-legacy-mod-5a, tst-cet-legacy-mod-5b, tst-cet-legacy-mod-5c, tst-cet-legacy-mod-6a, tst-cet-legacy-mod-6b and tst-cet-legacy-mod-6c. (CFLAGS-tst-cet-legacy-5a.c): New. (CFLAGS-tst-cet-legacy-5b.c): Likewise. (CFLAGS-tst-cet-legacy-mod-5a.c): Likewise. (CFLAGS-tst-cet-legacy-mod-5b.c): Likewise. (CFLAGS-tst-cet-legacy-mod-5c.c): Likewise. (CFLAGS-tst-cet-legacy-6a.c): Likewise. (CFLAGS-tst-cet-legacy-6b.c): Likewise. (CFLAGS-tst-cet-legacy-mod-6a.c): Likewise. (CFLAGS-tst-cet-legacy-mod-6b.c): Likewise. (CFLAGS-tst-cet-legacy-mod-6c.c): Likewise. ($(objpfx)tst-cet-legacy-5a): Likewise. ($(objpfx)tst-cet-legacy-5a.out): Likewise. ($(objpfx)tst-cet-legacy-mod-5a.so): Likewise. ($(objpfx)tst-cet-legacy-mod-5b.so): Likewise. ($(objpfx)tst-cet-legacy-5b): Likewise. ($(objpfx)tst-cet-legacy-5b.out): Likewise. (tst-cet-legacy-5b-ENV): Likewise. ($(objpfx)tst-cet-legacy-6a): Likewise. ($(objpfx)tst-cet-legacy-6a.out): Likewise. ($(objpfx)tst-cet-legacy-mod-6a.so): Likewise. ($(objpfx)tst-cet-legacy-mod-6b.so): Likewise. ($(objpfx)tst-cet-legacy-6b): Likewise. ($(objpfx)tst-cet-legacy-6b.out): Likewise. (tst-cet-legacy-6b-ENV): Likewise. * sysdeps/x86/tst-cet-legacy-5.c: New file. * sysdeps/x86/tst-cet-legacy-5a.c: Likewise. * sysdeps/x86/tst-cet-legacy-5b.c: Likewise. * sysdeps/x86/tst-cet-legacy-6.c: Likewise. * sysdeps/x86/tst-cet-legacy-6a.c: Likewise. * sysdeps/x86/tst-cet-legacy-6b.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-5.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-5a.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-5b.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-5c.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-6.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-6a.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-6b.c: Likewise. * sysdeps/x86/tst-cet-legacy-mod-6c.c: Likewise. (cherry picked from commit d0093c5cefb7f7a4143f3bb03743633823229cc6)
2019-10-31Base max_fast on alignment, not width, of bins (Bug 24903)DJ Delorie
set_max_fast sets the "impossibly small" value based on, eventually, MALLOC_ALIGNMENT. The comparisons for the smallest chunk used is, eventually, MIN_CHUNK_SIZE. Note that i386 is the only platform where these are the same, so a smallest chunk *would* be put in a no-fastbins fastbin. This change calculates the "impossibly small" value based on MIN_CHUNK_SIZE instead, so that we can know it will always be impossibly small. (cherry picked from commit ff12e0fb91b9072800f031cb21fb2651ee7b6251)
2019-10-30malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]Niklas Hambüchen
Fixes `<total type="rest" size="..."> incorrectly showing as 0 most of the time. The rest value being wrong is significant because to compute the actual amount of memory handed out via malloc, the user must subtract it from <system type="current" size="...">. That result being wrong makes investigating memory fragmentation issues like <https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to impossible. (cherry picked from commit b6d2c4475d5abc05dd009575b90556bdd3c78ad0)
2019-10-30malloc: Remove unwanted leading whitespace in malloc_info [BZ #24867]Florian Weimer
It was introduced in commit 6c8dbf00f536d78b1937b5af6f57be47fd376344 ("Reformat malloc to gnu style."). Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit b0f6679bcd738ea244a14acd879d974901e56c8e)
2019-10-30malloc: Various cleanups for malloc/tst-mxfastFlorian Weimer
(cherry picked from commit f9769a239784772453d595bc2f4bed8739810e06)
2019-10-30Add glibc.malloc.mxfast tunableDJ Delorie
* elf/dl-tunables.list: Add glibc.malloc.mxfast. * manual/tunables.texi: Document it. * malloc/malloc.c (do_set_mxfast): New. (__libc_mallopt): Call it. * malloc/arena.c: Add mxfast tunable. * malloc/tst-mxfast.c: New. * malloc/Makefile: Add it. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit c48d92b430c480de06762f80c104922239416826)
2019-10-30nscd: avoid assertion failure during persistent db checkAndreas Schwab
nscd should not abort when it finds inconsistencies in the persistent db. (cherry picked from commit 61595e3d36ded374f97961503e843a314b0203c2)
2019-10-30Small tcache improvementsWilco Dijkstra
Change the tcache->counts[] entries to uint16_t - this removes the limit set by char and allows a larger tcache. Remove a few redundant asserts. bench-malloc-thread with 4 threads is ~15% faster on Cortex-A72. Reviewed-by: DJ Delorie <dj@redhat.com> * malloc/malloc.c (MAX_TCACHE_COUNT): Increase to UINT16_MAX. (tcache_put): Remove redundant assert. (tcache_get): Remove redundant asserts. (__libc_malloc): Check tcache count is not zero. * manual/tunables.texi (glibc.malloc.tcache_count): Update maximum. (cherry picked from commit 1f50f2ad854c84ead522bfc7331b46dbe6057d53)
2019-10-30Fix assertion in malloc.c:tcache_get.Joseph Myers
One of the warnings that appears with -Wextra is "ordered comparison of pointer with integer zero" in malloc.c:tcache_get, for the assertion: assert (tcache->entries[tc_idx] > 0); Indeed, a "> 0" comparison does not make sense for tcache->entries[tc_idx], which is a pointer. My guess is that tcache->counts[tc_idx] is what's intended here, and this patch changes the assertion accordingly. Tested for x86_64. * malloc/malloc.c (tcache_get): Compare tcache->counts[tc_idx] with 0, not tcache->entries[tc_idx]. (cherry picked from commit 77dc0d8643aa99c92bf671352b0a8adde705896f)
2019-09-13Improve performance of memmemWilco Dijkstra
This patch significantly improves performance of memmem using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly. By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear. Small needles up to size 2 use a dedicated linear search. Very long needles use the Two-Way algorithm (to avoid increasing stack size or slowing down the common case, inlining is disabled). The performance gain is 6.6 times on English text on AArch64 using random needles with average size 8. Tested against GLIBC testsuite and randomized tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> * string/memmem.c (__memmem): Rewrite to improve performance. (cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)
2019-09-13Improve performance of strstrWilco Dijkstra
This patch significantly improves performance of strstr using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly. By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear. Small needles up to size 3 use a dedicated linear search. Very long needles use the Two-Way algorithm. The performance gain using the improved bench-strstr on Cortex-A72 is 5.8 times basic_strstr and 3.7 times twoway_strstr. Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> * string/str-two-way.h (two_way_short_needle): Add inline to avoid warning. (two_way_long_needle): Block inlining. * string/strstr.c (strstr2): Add new function. (strstr3): Likewise. (STRSTR): Completely rewrite strstr to improve performance. (cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)
2019-09-13Speedup first memmem matchRajalakshmi Srinivasaraghavan
As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp can be used after memchr to avoid the initialization overhead of the two-way algorithm for the first match. This has shown improvement >40% for first match. (cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)
2019-09-13Simplify and speedup strstr/strcasestr first matchWilco Dijkstra
Looking at the benchtests, both strstr and strcasestr spend a lot of time in a slow initialization loop handling one character per iteration. This can be simplified and use the much faster strlen/strnlen/strchr/memcmp. Read ahead a few cachelines to reduce the number of strnlen calls, which improves performance by ~3-4%. This patch improves the time taken for the full strstr benchtest by >40%. * string/strcasestr.c (STRCASESTR): Simplify and speedup first match. * string/strstr.c (AVAILABLE): Likewise. (cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)
2019-09-06[AArch64] Add ifunc support for AresWilco Dijkstra
Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares supports 2 128-bit loads/stores, use Neon registers for memcpy by selecting __memcpy_falkor by default (we should rename this to __memcpy_simd or similar). * manual/tunables.texi (glibc.cpu.name): Add ares tunable. * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use __memcpy_falkor for ares. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES): Add new define. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add ares cpu. (cherry picked from commit 02f440c1ef5d5d79552a524065aa3e2fabe469b9)
2019-07-12posix: Fix large mmap64 offset for mips64n32 (BZ#24699)Adhemerval Zanella
The fix for BZ#21270 (commit 158d5fa0e19) added a mask to avoid offset larger than 1^44 to be used along __NR_mmap2. However mips64n32 users __NR_mmap, as mips64n64, but still defines off_t as old non-LFS type (other ILP32, such x32, defines off_t being equal to off64_t). This leads to use the same mask meant only for __NR_mmap2 call for __NR_mmap, thus limiting the maximum offset it can use with mmap64. This patch fixes by setting the high mask only for __NR_mmap2 usage. The posix/tst-mmap-offset.c already tests it and also fails for mips64n32. The patch also change the test to check for an arch-specific header that defines the maximum supported offset. Checked on x86_64-linux-gnu, i686-linux-gnu, and I also tests tst-mmap-offset on qemu simulated mips64 with kernel 3.2.0 kernel for both mips-linux-gnu and mips64-n32-linux-gnu. [BZ #24699] * posix/tst-mmap-offset.c: Mention BZ #24699. (do_test_bz21270): Rename to do_test_large_offset and use mmap64_maximum_offset to check for maximum expected offset value. * sysdeps/generic/mmap_info.h: New file. * sysdeps/unix/sysv/linux/mips/mmap_info.h: Likewise. * sysdeps/unix/sysv/linux/mmap64.c (MMAP_OFF_HIGH_MASK): Define iff __NR_mmap2 is used. (cherry picked from commit a008c76b56e4f958cf5a0d6f67d29fade89421b7)
2019-07-12aarch64: handle STO_AARCH64_VARIANT_PCSSzabolcs Nagy
Backport of commit 82bc69c012838a381c4167c156a06f4598f34227 and commit 30ba0375464f34e4bf8129f3d3dc14d0c09add17 without using DT_AARCH64_VARIANT_PCS for optimizing the symbol table check. This is needed so the internal abi between ld.so and libc.so is unchanged. Avoid lazy binding of symbols that may follow a variant PCS with different register usage convention from the base PCS. Currently the lazy binding entry code does not preserve all the registers required for AdvSIMD and SVE vector calls. Saving and restoring all registers unconditionally may break existing binaries, even if they never use vector calls, because of the larger stack requirement for lazy resolution, which can be significant on an SVE system. The solution is to mark all symbols in the symbol table that may follow a variant PCS so the dynamic linker can handle them specially. In this patch such symbols are always resolved at load time, not lazily. So currently LD_AUDIT for variant PCS symbols are not supported, for that the _dl_runtime_profile entry needs to be changed e.g. to unconditionally save/restore all registers (but pass down arg and retval registers to pltentry/exit callbacks according to the base PCS). This patch also removes a __builtin_expect from the modified code because the branch prediction hint did not seem useful. * sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Check STO_AARCH64_VARIANT_PCS and bind such symbols at load time.
2019-07-10aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCSSzabolcs Nagy
STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking symbols that reference functions that may follow a variant PCS with different register usage convention from the base PCS. DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that have R_*_JUMP_SLOT relocations for symbols marked with STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT). * elf/elf.h (STO_AARCH64_VARIANT_PCS): Define. (DT_AARCH64_VARIANT_PCS): Define.
2019-07-09io: Remove copy_file_range emulation [BZ #24744]Florian Weimer
The kernel is evolving this interface (e.g., removal of the restriction on cross-device copies), and keeping up with that is difficult. Applications which need the function should run kernels which support the system call instead of relying on the imperfect glibc emulation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 5a659ccc0ec217ab02a4c273a1f6d346a359560a)
2019-06-27Merge branch release/2.28/master into ibm/2.28/masterTulio Magno Quites Machado Filho