Age | Commit message (Collapse) | Author |
|
The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed
that REP MOVSB became faster after 2112 bytes:
Vector Move REP MOVSB
length=2112, align1=0, align2=0: 24.20 24.40
length=2112, align1=1, align2=0: 26.07 23.13
length=2112, align1=0, align2=1: 27.18 28.13
length=2112, align1=1, align2=1: 26.23 25.16
length=2176, align1=0, align2=0: 23.18 22.52
length=2176, align1=2, align2=0: 25.45 22.52
length=2176, align1=0, align2=2: 27.14 27.82
length=2176, align1=2, align2=2: 22.73 25.56
length=2240, align1=0, align2=0: 24.62 24.25
length=2240, align1=3, align2=0: 29.77 27.15
length=2240, align1=0, align2=3: 35.55 29.93
length=2240, align1=3, align2=3: 34.49 25.15
length=2304, align1=0, align2=0: 34.75 26.64
length=2304, align1=4, align2=0: 32.09 22.63
length=2304, align1=0, align2=4: 28.43 31.24
Use REP MOVSB for data size > 2112 bytes in memcpy on processors with
fast short REP MOVSB (FSRM).
* sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set
rep_movsb_threshold to 2112 on processors with fast short REP
MOVSB (FSRM).
|
|
Pass "amx-bf16", "amx-int8" and "amx-tile", instead of "amx_bf16",
"amx_int8" and "amx_tile", to __builtin_cpu_supports for GCC 11.
|
|
The definitions in libc are sufficient, the forwarders are no longer
needed.
The symbols have been moved using scripts/move-symbol-to-libc.py.
s390-linux-gnu and s390x-linux-gnu need a new version placeholder
to keep the GLIBC_2.19 symbol version in libpthread.
Tested on i386-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
x86_64-linux-gnu. Built with build-many-glibcs.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
|
|
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
|
|
1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE.
2. Disable all features which depend on XSAVE:
a. If OSXSAVE is disabled by glibc tunables. Or
b. If both XSAVE and XSAVEC aren't usable.
|
|
|
|
__x86_get_cpuid_feature_leaf is called during early startup, before
the stack check guard is initialized and is hence not safe to build
with stack-protector.
Additionally, IFUNC resolvers for static tst-ifunc-isa tests get
called too early for stack protector to be useful, so fix them to
disable stack protector for the resolver functions.
This fixes all failures seen with --enable-stack-protector=all
configuration.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
commit 2d651eb9265d1366d7b9e881bfddd46db9c1ecc4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Sep 18 07:55:14 2020 -0700
x86: Move x86 processor cache info to cpu_features
missed _SC_LEVEL1_ICACHE_LINESIZE.
1. Add level1_icache_linesize to struct cpu_features.
2. Initialize level1_icache_linesize by calling handle_intel,
handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE.
3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Since the full ISA set used in an ELF binary is unknown to compiler,
an x86-64 ISA level marker indicates the minimum, not maximum, ISA set
required to run such an ELF binary. We never guarantee a library with
an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64
ISA level v3, like AVX VNNI. We check the x86-64 ISA level marker for
the minimum ISA set. Since -march=sandybridge enables only some ISAs
in x86-64 ISA level v3, we should set the needed ISA marker to v2.
Otherwise, libc is compiled with -march=sandybridge will fail to run on
Sandy Bridge:
$ ./elf/ld.so ./libc.so
./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3
Set the minimum, instead of maximum, x86-64 ISA level marker should have
no impact on the glibc-hwcaps directory assignment logic in ldconfig nor
ld.so.
|
|
|
|
Use a .def file to define the bitfield layout, so that it is possible
to iterate over field members using the preprocessor.
|
|
1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h.
2. Remove sysdeps/x86_64/nptl/pthreaddef.h.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
|
|
x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.
|
|
Remove the extra space between "# endif" left over from
commit f380868f6dcfdeae8d449d556298d9c41012ed8d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Dec 24 15:43:34 2020 -0800
Remove _ISOMAC check from <cpu-features.h>
|
|
The tunable types are SIZE_T, so set the ranges to the correct maximum
value, i.e. SIZE_MAX.
|
|
The TUNABLE_SET interface took a primitive C type argument, which
resulted in inconsistent type conversions internally due to incorrect
dereferencing of types, especialy on 32-bit architectures. This
change simplifies the TUNABLE setting logic along with the interfaces.
Now all numeric tunable values are stored as signed numbers in
tunable_num_t, which is intmax_t. All calls to set tunables cast the
input value to its primitive type and then to tunable_num_t for
storage. This relies on gcc-specific (although I suspect other
compilers woul also do the same) unsigned to signed integer conversion
semantics, i.e. the bit pattern is conserved. The reverse conversion
is guaranteed by the standard.
|
|
1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature
in EBX of CPUID leaf 0x14 with ECX == 0.
2. Add PTWRITE detection to CPU feature tests.
3. Add 2 static CPU feature tests.
|
|
In the process of optimizing memcpy for AMD machines, we have found the
vector move operations are outperforming enhanced REP MOVSB for data
transfers above the L2 cache size on Zen3 architectures.
To handle this use case, we are adding an upper bound parameter on
enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'.
As per large-bench results, we are configuring this parameter to the
L2 cache size for AMD machines and applicable from Zen3 architecture
supporting the ERMS feature.
For architectures other than AMD, it is the computed value of
non-temporal threshold parameter.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
|
|
Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from
AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack
space required in order to gurantee successful, non-nested handling
of a single signal whose handler is an empty function, and _SC_SIGSTKSZ
which is the suggested minimum number of bytes of stack space required
for a signal stack.
If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns
MINSIGSTKSZ. On Linux/x86 with XSAVE, the signal frame used by kernel
is composed of the following areas and laid out as:
------------------------------
| alignment padding |
------------------------------
| xsave buffer |
------------------------------
| fsave header (32-bit only) |
------------------------------
| siginfo + ucontext |
------------------------------
Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave
header (32-bit only) + size of siginfo and ucontext + alignment padding.
If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ
are redefined as
/* Default stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
# undef SIGSTKSZ
# define SIGSTKSZ sysconf (_SC_SIGSTKSZ)
/* Minimum stack size for a signal handler: SIGSTKSZ. */
# undef MINSIGSTKSZ
# define MINSIGSTKSZ SIGSTKSZ
Compilation will fail if the source assumes constant MINSIGSTKSZ or
SIGSTKSZ.
The reason for not simply increasing the kernel's MINSIGSTKSZ #define
(apart from the fact that it is rarely used, due to glibc's shadowing
definitions) was that userspace binaries will have baked in the old
value of the constant and may be making assumptions about it.
For example, the type (char [MINSIGSTKSZ]) changes if this #define
changes. This could be a problem if an newly built library tries to
memcpy() or dump such an object defined by and old binary.
Bounds-checking and the stack sizes passed to things like sigaltstack()
and makecontext() could similarly go wrong.
|
|
commit 94cd37ebb293321115a36a422b091fdb72d2fb08
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Sep 16 05:27:32 2020 -0700
x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]
broke
GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK
since it can no longer disable IBT nor SHSTK. Handle IBT and SHSTK with:
1. Revert commit 94cd37ebb293321115a36a422b091fdb72d2fb08.
2. Clears the usable CET feature bits if kernel doesn't support CET.
3. Add GLIBC_TUNABLES tests without dlopen.
4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches
_get_ssp.
5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled
with GLIBC_TUNABLES.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer
target types.
|
|
Search " YYY " and " YYY\n", instead of "YYY", to avoid matching
"XXXYYYZZZ" with "YYY".
Update /proc/cpuinfo CPU feature names:
/proc/cpuinfo glibc
------------------------------------------------
avx512vbmi AVX512_VBMI
dts DS
pni SSE3
tsc_deadline_timer TSC_DEADLINE
|
|
Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and
static executables to verify that CPUID features are initialized early in
static PIE.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Extern symbol access in position independent code usually involves GOT
indirection which needs RELATIVE reloc in a static linked PIE. (On
some targets this is avoided e.g. because the linker can relax a GOT
access to a pc-relative access, but this is not generally true.) Code
that runs before static PIE self relocation must avoid relying on
dynamic relocations which can be ensured by using hidden visibility.
However we cannot just make all symbols hidden:
On i386, all calls to IFUNC functions must go through PLT and calls to
hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT
may not be set up for local calls to hidden IFUNC functions.
This patch aims to make symbol references hidden in code that is used
before and by _dl_relocate_static_pie when building a static PIE libc.
Note: for an object that is used in the startup code, its references
and definition may not have consistent visibility: it is only forced
hidden in the startup code.
This is needed for fixing bug 27072.
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
In <sys/platform/x86.h>, define CPU features as enum instead of using
the C preprocessor magic to make it easier to wrap this functionality
in other languages. Move the C preprocessor magic to internal header
for better GCC codegen when more than one features are checked in a
single expression as in x86-64 dl-hwcaps-subdirs.c.
1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.
256 bits are used for each CPUID leaf. Some leaves only contain a few
features. We can add exceptions to such leaves. But it will increase
code sizes and it is harder to provide backward/forward compatibilities
when new features are added to such leaves in the future.
When new leaves are added, _rtld_global_ro offsets will change which
leads to race condition during in-place updates. We may avoid in-place
updates by
1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.
NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy
relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
|
|
1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so.
2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS.
3. Move x86 cache info initialization to dl-cacheinfo.h and initialize
x86 cache info in init_cpu_features ().
4. Put x86 cache info for libc in cacheinfo.h, which is included in
libc-start.c in libc.a and is included in cacheinfo.c in libc.so.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Checked on x86_64-linux-gnu.
|
|
Store ISA level in the portion of the unused upper 32 bits of the hwcaps
field in cache and the unused pad field in aux cache. ISA level is stored
and checked only for shared objects in glibc-hwcaps subdirectories. The
shared objects in the default directories aren't checked since there are
no fallbacks for these shared objects.
Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with
--disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
|
|
Update dl_cet_check() to set header.feature_1 in TCB when both IBT and
SHSTK are always on.
|
|
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13
Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by
commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 30 06:49:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker
and
commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 9 05:05:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker
GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the
micro-architecture ISA level required to execute the binary. The marker
must be added by programmers explicitly in one of 3 ways:
1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.
Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:
1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.
Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.
Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:
[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
|
|
Make the tests use TEST_COND_intel96 to decide on whether to build the
unnormal tests instead of the macro in nan-pseudo-number.h and then
drop the header inclusion. This unbreaks test runs on all
architectures that do not have ldbl-96.
Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
anywhere.
|
|
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
|
|
Add support to treat pseudo-numbers specially and implement x86
version to consider all of them as signaling.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Remove _ISOMAC check from <cpu-features.h> since it isn't an installer
header file.
|
|
CPU_FEATURE_CPU_P is defined in sysdeps/x86/sys/platform/x86.h. Remove
the duplicated CPU_FEATURE_CPU_P in sysdeps/x86/include/cpu-features.h.
|
|
Do not attempt to fix the significand top bit in long double input
received in printf. The code should never reach here because isnan
should now detect unnormals as NaN. This is already a NOP for glibc
since it uses the gcc __builtin_isnan, which detects unnormals as NaN.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
This syncs up isnanl behaviour with gcc. Also move the isnanl
implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Also move sysdeps/i386/fpu/s_fpclassifyl.c to
sysdeps/x86/fpu/s_fpclassifyl.c and remove
sysdeps/x86_64/fpu/s_fpclassifyl.c
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Add Intel Linear Address Masking (LAM) support to <sys/platform/x86.h>.
HAS_CPU_FEATURE (LAM) can be used to detect if LAM is enabled in CPU.
LAM modifies the checking that is applied to 64-bit linear addresses,
allowing software to use of the untranslated address bits for metadata.
|
|
Since we can't tell if the tunable value is set by user or not:
https://sourceware.org/bugzilla/show_bug.cgi?id=27069
remove the default REP MOVSB threshold tunable value so that the correct
default value will be set correctly by init_cacheinfo ().
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
To handle GNU property notes on aarch64 some segments need to
be mmaped again, so the fd of the loaded ELF module is needed.
When the fd is not available (kernel loaded modules), then -1
is passed.
The fd is passed to both _dl_process_pt_gnu_property and
_dl_process_pt_note for consistency. Target specific note
processing functions are updated accordingly.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Check HAS_CPU_FEATURE instead of CPU_FEATURE_USABLE for FSGSBASE, IBT,
LM, SHSTK and XSAVES since FSGSBASE requires kernel support, IBT/SHSTK/LM
require OS support and XSAVES is supervisor-mode only.
|
|
Set RDRAND usable if CPU supports RDRAND.
|
|
The UP macro is never defined. Also define LOCK_PREFIX
unconditionally, to the same string.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Fixes and partially reverts commit 59803e81f96b479c17f583b31eac44b5
("x86: Optimizing memcpy for AMD Zen architecture.").
|
|
Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX(Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
|
|
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called. Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late. Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.
Key points:
1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch. Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocation
have been performed,
3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.
Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
|