Age | Commit message (Collapse) | Author |
|
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET. Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
(_setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
(setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
Remove jmp_buf-ssp.sym.
* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/x86/Makefile (gen-as-const-headers): Add
jmp_buf-ssp.sym.
* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.
|
|
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.
CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate that if CET, IBT and SHSTK are enabled.
<tls-setup.h> is added to set up thread-local data.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22563]
* nptl/pthread_create.c: Include <tls-setup.h>.
(__pthread_create_2_1): Call tls_setup_tcbhead.
* sysdeps/generic/tls-setup.h: New file.
* sysdeps/x86/nptl/tls-setup.h: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
Likewise.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
to feature_1.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
(X86_FEATURE_1_SHSTK): Likewise.
(CET_ENABLED): Likewise.
(IBT_ENABLED): Likewise.
(SHSTK_ENABLED): Likewise.
|
|
From Zen onwards this will be enabled. It was disabled for the
Excavator case and will remain disabled.
Reviewd-by: Carlos O'Donell <carlos@redhat.com>
|
|
Although the REP MOVSB implementations of memmove, memcpy and mempcpy
aren't used by the current processors, this patch adds Prefer_FSRM
check in ifunc-memmove.h so that they can be used in the future.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New.
(index_arch_Prefer_FSRM): Likewise.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Also check Prefer_FSRM.
* sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR):
Also return OPTIMIZE (erms) for Prefer_FSRM.
|
|
The newer Intel processors support Fast Short REP MOVSB which has a
feature bit in CPUID. This patch adds the Fast Short REP MOVSB (FSRM)
bit to x86 cpu-features.
* sysdeps/x86/cpu-features.h (bit_cpu_FSRM): New.
(index_cpu_FSRM): Likewise.
(reg_FSRM): Likewise.
|
|
Merge sysdeps/i386/ldsodefs.h and sysdeps/x86_64/ldsodefs.h into
sysdeps/x86/ldsodefs.h.
Tested on i686 and x86-64.
* sysdeps/i386/ldsodefs.h: Removed.
* sysdeps/x86_64/ldsodefs.h: Moved to ...
* sysdeps/x86/ldsodefs.h: This.
(La_i86_regs): New.
(La_i86_retval): Likewise.
(ARCH_PLTENTER_MEMBERS): Add i86_gnu_pltenter.
(ARCH_PLTEXIT_MEMBERS): i86_gnu_pltexit.
Acked-by: Christian Brauner (Ubuntu) christian@brauner.io
|
|
This patch continues cleaning up math_private.h by moving the
math_check_force_underflow set of macros to a separate header
math-underflow.h.
This header is included by the files that need it rather than from
math_private.h. Moving these macros to a separate file removes the
math_private.h uses of macros from float.h, so the inclusion of
float.h in math_private.h is also removed; files that were depending
on that inclusion are fixed to include float.h directly. The
inclusion of math-barriers.h from math_private.h will be removed in a
separate patch.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* math/math-underflow.h: New file.
* sysdeps/generic/math_private.h: Do not include <float.h>.
(fabs_tg): Remove macro. Moved to math-underflow.h.
(min_of_type_f): Likewise.
(min_of_type_): Likewise.
(min_of_type_l): Likewise.
(min_of_type_f128): Likewise.
(min_of_type): Likewise.
(math_check_force_underflow): Likewise.
(math_check_force_underflow_nonneg): Likewise.
(math_check_force_underflow_complex): Likewise.
* math/e_exp2_template.c: Include <math-underflow.h>.
* math/k_casinh_template.c: Likewise.
* math/s_catan_template.c: Likewise.
* math/s_catanh_template.c: Likewise.
* math/s_ccosh_template.c: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_clog10_template.c: Likewise.
* math/s_clog_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_csqrt_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* sysdeps/ieee754/dbl-64/e_asin.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_hypot.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tanh.c: Likewise.
* sysdeps/ieee754/flt-32/e_asinf.c: Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
* sysdeps/ieee754/flt-32/k_sinf.c: Likewise.
* sysdeps/ieee754/flt-32/k_tanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c: Likewise.
* sysdeps/ieee754/flt-32/s_atanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
* sysdeps/ieee754/flt-32/s_tanhf.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_log1pl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Likewise.
* sysdeps/powerpc/fpu/e_hypot.c: Likewise.
* sysdeps/x86/fpu/powl_helper.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nextup.c: Include <float.h>.
* sysdeps/ieee754/flt-32/s_nextupf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nextupl.c: Likewise.
|
|
This patch continues cleaning up math_private.h by moving the
math_opt_barrier and math_force_eval macros to a separate header
math-barriers.h.
At present, those macros are inside a "#ifndef math_opt_barrier" in
math_private.h to allow architectures to override them and then use
a separate math-barriers.h header, no such #ifndef or #include_next is
needed; architectures just have their own alternative version of
math-barriers.h when providing their own optimized versions that avoid
going through memory unnecessarily. The generic math-barriers.h has a
comment added to document these two macros.
In this patch, math_private.h is made to #include <math-barriers.h>,
so files using these macros do not need updating yet. That is because
of uses of math_force_eval in math_check_force_underflow and
math_check_force_underflow_nonneg, which are still defined in
math_private.h. Once those are moved out to a separate header, that
separate header can be made to include <math-barriers.h>, as can the
other files directly using these barrier macros, and then the include
of <math-barriers.h> from math_private.h can be removed.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math-barriers.h: New file.
* sysdeps/generic/math_private.h [!math_opt_barrier]
(math_opt_barrier): Move to math-barriers.h.
[!math_opt_barrier] (math_force_eval): Likewise.
* sysdeps/aarch64/fpu/math-barriers.h: New file.
* sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/alpha/fpu/math-barriers.h: New file.
* sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/x86/fpu/math-barriers.h: New file.
* sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/m68k/m680x0/fpu/math_private.h: Move to....
* sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here. Adjust
multiple-include guard for rename.
* sysdeps/powerpc/fpu/math-barriers.h: New file.
* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
|
|
The pad array in struct pthread_unwind_buf is used by setjmp to save
shadow stack register. We assert that size of struct pthread_unwind_buf
is no less than offset of shadow stack pointer + shadow stack pointer
size.
Since functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as
these with thread cancellation, call setjmp, but never return after
__libc_unwind_longjmp, __libc_unwind_longjmp, which is defined as
__libc_longjmp on x86, doesn't need to restore shadow stack register.
__libc_longjmp, which is a private interface for thread cancellation
implementation in libpthread, is changed to call __longjmp_cancel,
instead of __longjmp. __longjmp_cancel is a new internal function
in libc, which is similar to __longjmp, but doesn't restore shadow
stack register.
The compatibility longjmp and siglongjmp in libpthread.so are changed
to call __libc_siglongjmp, instead of __libc_longjmp, so that they will
restore shadow stack register.
Tested with build-many-glibcs.py.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl/pthread_create.c (START_THREAD_DEFN): Clear previous
handlers after setjmp.
* setjmp/longjmp.c (__libc_longjmp): Don't define alias if
defined.
* sysdeps/unix/sysv/linux/x86/setjmpP.h: Include
<libc-pointer-arith.h>.
(_JUMP_BUF_SIGSET_BITS_PER_WORD): New.
(_JUMP_BUF_SIGSET_NSIG): Changed to 96.
(_JUMP_BUF_SIGSET_NWORDS): Changed to use ALIGN_UP and
_JUMP_BUF_SIGSET_BITS_PER_WORD.
* sysdeps/x86/Makefile (sysdep_routines): Add __longjmp_cancel.
* sysdeps/x86/__longjmp_cancel.S: New file.
* sysdeps/x86/longjmp.c: Likewise.
* sysdeps/x86/nptl/pt-longjmp.c: Likewise.
|
|
Continuing the removals of inline functions from the x86
bits/mathinline.h, this patch removes an inline of __finite (which was
not actually architecture-specific at all beyond its
endianness-dependence).
This inline is not normally used with GCC 4.4 or later, because
isfinite now uses __builtin_isfinite except for -fsignaling-nans.
Allowing __builtin_isfinite etc. to work properly even for
-fsignaling-nans, by implementing versions of those built-in functions
that use integer arithmetic in GCC, is
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462> (a patch was
committed but had to be reverted because it caused problems, and that
patch didn't address all formats for all architectures, only some, so
by itself would not have been sufficient to allow glibc to use
__builtin_isfinite unconditionally for new-enough GCC).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [__USE_MISC] (__finite):
Remove inline function.
|
|
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.
Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions. Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.
* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
(sqrtf): Remove.
(sqrtl): Remove.
(__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
|
|
This patch removes further parts of sysdeps/x86/fpu/bits/mathinline.h
that are only of value for optimization with older compiler versions,
in accordance with general principles of preferring the let the
compiler deal with such inlining through built-in functions.
In general, GCC supports inlining all these functions as of version
4.3 or earlier. However, some inlines in GCC may have had excessively
restrictive conditions in past GCC versions (e.g. requiring
-ffast-math when the inline is valid under broader conditions). (In
particular, GCC had, before GCC 7, unnecessarily restrictive
conditions on when it could apply floor and ceil inlines corresponding
to the ones removed here. The same was true for rint, but
bits/mathinline.h *also* was excessively restrictive there.)
The removed sincos inlines are for __sincos etc. functions (not a
public interface and not currently used in this header either; not in
a part of the header ever used for building glibc itself). Likewise,
the atan2 inlines included one for __atan2l, also not a public
interface and not used for building glibc itself (calls inside glibc
generally use __ieee754_atan2l, for which there is a separate
__LIBC_INTERNAL_MATH_INLINES case in this header).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [__FAST_MATH__]
(__sincos_code): Remove define and undefine.
[__FAST_MATH__] (__sincos): Remove inline function.
[__FAST_MATH__] (__sincosf): Remove inline function.
[__FAST_MATH__] (__sincosl): Remove inline function.
(__atan2l): Remove inline functions.
[!__GNUC_PREREQ (3, 4)] (__atan2_code): Remove macro.
[!__GNUC_PREREQ (3, 4) && __FAST_MATH__] (atan2): Remove inline
function.
(floor): Remove inline function.
(ceil): Likewise.
[__FAST_MATH__] (__ldexp_code): Remove macro.
[__FAST_MATH__] (ldexp): Remove inline function.
[__FAST_MATH__ && __USE_ISOC99] (ldexpf): Likewise.
[__FAST_MATH__ && __USE_ISOC99] (ldexpl): Likewise.
[__FAST_MATH__ && __USE_ISOC99] (rint): Likewise.
[__USE_ISOC99] (__lrint_code): Remove macro.
[__USE_ISOC99] (__llrint_code): Likewise.
[__USE_ISOC99] (lrintf): Remove inline function.
[__USE_ISOC99] (lrint): Likewise.
[__USE_ISOC99] (lrintl): Likewise.
[__USE_ISOC99] (llrint): Likewise.
[__USE_ISOC99] (llrintf): Likewise.
[__USE_ISOC99] (llrintl): Likewise.
|
|
In accordance with the general principle of preferring to let the
compiler optimize function calls based on their standard semantics
rather than putting inline definitions of such functions in installed
headers, this patch removes various such inline definitions in the x86
bits/mathinline.h that were already disabled for GCC 3.5 or later and
so were only used with very old compilers (for which good optimization
is particularly unimportant); along with those inlines, a definition
of __M_SQRT2, which was only used in such inline functions, is also
removed. This is similar to an early step in removing the string.h
inlines; I intend to follow up with further removals of
bits/mathinline.h inline definitions in appropriate logical groups
(with GCC bugs filed in cases where GCC doesn't already support
corresponding optimizations).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [!__GNUC_PREREQ (3, 4)]
(lrintf): Remove definitions used only with old GCC.
[!__GNUC_PREREQ (3, 4)] (lrint): Likewise.
[!__GNUC_PREREQ (3, 4)] (llrintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (llrint): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmaxf): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmax): Likewise.
[!__GNUC_PREREQ (3, 4)] (fminf): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmin): Likewise.
[!__GNUC_PREREQ (3, 4)] (rint): Likewise.
[!__GNUC_PREREQ (3, 4)] (rintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (nearbyint): Likewise.
[!__GNUC_PREREQ (3, 4)] (nearbyintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (ceil): Likewise.
[!__GNUC_PREREQ (3, 4)] (ceilf): Likewise.
[!__GNUC_PREREQ (3, 4)] (floor): Likewise.
[!__GNUC_PREREQ (3, 4)] (floorf): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (tan): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (fmod): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (sin): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (cos): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log10): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (asin): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (acos): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (atan): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log1p): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (logb): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log2): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (drem): Likewise.
[__FAST_MATH__] (__M_SQRT2): Remove macro.
|
|
bug 15512, bug 17082, bug 20530).
We have a general principle of preferring optimizations for library
facilities to use compiler built-in functions rather than being
located in library headers, where the compiler can reasonably optimize
code without needing to know glibc implementation details.
This patch applies this principle to bits/byteswap.h, eliminating all
the architecture-specific variants and bits/byteswap-16.h. The
__bswap_16, __bswap_32 and __bswap_64 interfaces all become inline
functions, never macros, using the GCC built-in functions where
available and otherwise a single architecture-independent definition
using shifts and masking (which compilers may well be able to detect
and optimize; GCC has detection of various byte-swapping idioms).
The __bswap_constant_32 macro needs to stay around because of uses in
static initializers within glibc and its tests, and so for consistency
all __bswap_constant_* are kept rather than just being inlined into
the old-GCC-or-non-GCC parts of the __bswap_* inline function
definitions.
Various open bugs are addressed by this cleanup, with caveats about
exactly what is covered by those bugs and when the bugs applied at
all.
Bug 14508 reports -Wformat warnings building glibc because __bswap_*
sometimes returned the wrong types. Obviously we already don't have
such warnings any more or the build would be failing, given -Werror,
and I suspect that bug was originally for wrong types for x86_64, as
fixed by commit d394eb742a3565d7fe7a4b02710a60b5f219ee64 (glibc 2.17).
The only case I saw removed by this patch where the types would still
have been wrong was the non-__GNUC__ case of __bswap_64 in the s390
header (using unsigned long long int, but uint64_t would be unsigned
long int for 64-bit). In any case, the single header consistently
uses __uintN_t types after this patch, thereby eliminating all such
bugs. The existing string/test-endian-types.c test already suffices
to verify that the types are correct with the compiler used to build
glibc and its tests.
Bug 15512 reports an error from __bswap_constant_16 with -Werror
-Wsign-conversion. I am unable to reproduce this with any GCC version
supporting -Wsign-conversion - all seem to be able to avoid warning
for ((x) >> 8) & 0xffu, where x is uint16_t, which while it formally
does involve an implicit conversion from int to unsigned int, is also
a case where it should be easy for the compiler to see that the value
converted is never negative. But in this patch __bswap_constant_16 is
changed to use signed 0xff so that no such implicit conversion occurs
at all, and a test with -Werror -Wsign-conversion is added.
Bug 17082 objects to the use of ({}) statement expressions in these
macros preventing use at file scope (in C, that's in sizeof etc.; in
C++, more generally in static initializers). The particular case of
these interfaces is fixed by this patch as it changes them to inline
functions, eliminating all uses of ({}) in bits/byteswap.h, and a
corresponding testcase is added. The bug tries to raise a more
general policy question about use of ({}) in macros in installed
headers, referring to "many other libc functions" (unspecified which
functions are being considered).
Since such policy questions belong on libc-alpha, and since there
*are* macros in installed headers which can't really avoid using ({})
(where they are type-generic, so can't use an inline function, but
need a temporary variable, and a few where the interface involves
returning memory from alloca so can't use an inline function either),
I propose to consider that bug fixed with this change. That is
without prejudice to any other new bugs anyone wishes to file *for
precisely defined sets of macros* requesting moving away from ({})
*where it is clearly possible for those interfaces*. Where ({}) can
be avoided, typically by use of an inline function, I think that's a
good idea - that inline functions are typically to be preferred to
({}) for header interfaces where such optimizations are useful but the
interface is suited to being defined using an inline function.
Bug 20530 requests use of __builtin_bswap16 when available (GCC 4.8
and later), which this patch implements.
Tested for x86_64, and with build-many-glibcs.py. Also did an x86_64
test with the __GNUC_PREREQ conditionals changed to "#if 0" to verify
the old-GCC/non-GCC case in the headers. (There are already existing
tests for correctness of results of these interfaces.)
[BZ #14508]
[BZ #15512]
[BZ #17082]
[BZ #20530]
* bits/byteswap.h: Update file comment. Do not include
<bits/byteswap-16.h>.
(__bswap_constant_16): Cast result to __uint16_t. Use signed 0xff
constant.
(__bswap_16): Define as inline function.
(__bswap_constant_32): Reformat definition.
(__bswap_32): Always define as inline function, not macro, using
__uint32_t. Use __builtin_bswap32 if [__GNUC_PREREQ (4, 3)],
otherwise __bswap_constant_32.
(__bswap_constant_64): Reformat definition. Do not use
__extension__ here.
(__bswap_64): Always define as inline function, not macro. Use
__extension__ on function definition. Use __builtin_bswap64 if
[__GNUC_PREREQ (4, 3)], otherwise __bswap_constant_64.
* string/test-endian-file-scope.c: New file.
* string/test-endian-sign-conversion.c: Likewise.
* string/Makefile (headers): Remove bits/byteswap-16.h.
(tests): Add test-endian-file-scope and
test-endian-sign-conversion.
(CFLAGS-test-endian-sign-conversion.c): New variable.
* bits/byteswap-16.h: Remove file.
* sysdeps/ia64/bits/byteswap-16.h: Likewise.
* sysdeps/ia64/bits/byteswap.h: Likewise.
* sysdeps/m68k/bits/byteswap.h: Likewise.
* sysdeps/s390/bits/byteswap-16.h: Likewise.
* sysdeps/s390/bits/byteswap.h: Likewise.
* sysdeps/tile/bits/byteswap.h: Likewise.
* sysdeps/x86/bits/byteswap-16.h: Likewise.
* sysdeps/x86/bits/byteswap.h: Likewise.
|
|
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
|
|
This patch continues filling out TS 18661-3 support by adding *f64x
function aliases on platforms with _Float64x support. (It so happens
the set of such platforms is exactly the same as the set of platforms
with _Float128 support, although on x86_64, x86 and ia32 the _Float64x
format is Intel extended rather than binary128.) The API provided
corresponds exactly to that provided for _Float128, mostly coming from
TS 18661-3. As these functions always alias those for another type
(long double, _Float128 or both), __* function names are not provided,
as in other cases of alias types.
Given the preparation done in previous patches, this one just enables
the feature via Makeconfig and bits/floatn.h, adds symbol versions,
and updates documentation and ABI baselines. The symbol versions are
present unconditionally as GLIBC_2.27 in the relevant Versions files,
as it's OK for those to specify versions for functions that may not be
present in some configurations; no additional complexity is needed
unless in future some configuration gains support for this type that
didn't have such support in 2.27. The Makeconfig additions for ia64
and x86 aren't strictly needed, as those configurations also get
float64x-alias-fcts definitions from
sysdeps/ieee754/float128/Makeconfig, but still seem appropriate given
that _Float64x is not _Float128 for those configurations.
A libm-test-ulps update for x86 is included. This is because
bits/mathinline.h does not have _Float64x support added and for two
functions the use of out-of-line functions results in increased ulps
(ifloat64x shares ulps with ildouble / ifloat128 as appropriate).
Given that we'd like generally to eliminate bits/mathinline.h
optimizations, preferring to have such optimizations in GCC instead,
it seems reasonable not to add such support there for new types. GCC
support for _FloatN / _FloatNx built-in functions is limited, but has
been improved in GCC 8, and at some point I hope the full set of libm
built-in functions in GCC, and other optimizations with
per-floating-type aspects, will be enabled for all _FloatN / _FloatNx
types.
Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.
* sysdeps/ia64/Makeconfig (float64x-alias-fcts): New variable.
* sysdeps/ieee754/float128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/ieee754/ldbl-128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/x86/Makeconfig: New file.
* bits/floatn-common.h (__HAVE_FLOAT64X): Remove macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* bits/floatn.h (__HAVE_FLOAT64X): New macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ia64/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h (__HAVE_FLOAT64X):
Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/mips/ieee754/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/powerpc/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/x86/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* manual/math.texi (Mathematics): Document support for _Float64x.
* math/Versions (GLIBC_2.27): Add _Float64x functions.
* stdlib/Versions (GLIBC_2.27): Likewise.
* wcsmbs/Versions (GLIBC_2.27): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
|
|
Further _FloatN / _FloatNx type alias support will involve making
architecture-specific .S files use the common macros for libm function
aliases. Making them use those macros will also serve to simplify
existing code for aliases / symbol versions in various cases, similar
to such simplifications for ldbl-opt code.
The libm-alias-*.h files sometimes need to include <bits/floatn.h> to
determine which aliases they should define. At present, this does not
work for inclusion from .S files because <bits/floatn.h> can define
typedefs for old compilers. This patch changes all the
<bits/floatn.h> and <bits/floatn-common.h> headers to include
__ASSEMBLER__ conditionals. Those conditionals disable everything
related to C syntax in the __ASSEMBLER__ case, not just the problem
typedefs, as that seemed cleanest. The __HAVE_* definitions remain in
the __ASSEMBLER__ case, as those provide information that is required
to define the correct set of aliases.
Tested with build-many-glibcs.py for a representative set of
configurations (x86_64-linux-gnu i686-linux-gnu ia64-linux-gnu
powerpc64le-linux-gnu mips64-linux-gnu-n64 sparc64-linux-gnu) with GCC
6. Also tested with GCC 6 for i686-linux-gnu in conjunction with
changes to use alias macros in .S files.
* bits/floatn-common.h [!__ASSEMBLER]: Disable everything related
to C syntax instead of availability and properties of types.
* bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/ia64/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/powerpc/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/x86/bits/floatn.h [!__ASSEMBLER]: Likewise.
|
|
This patch adds two new internal defines to set the internal
pthread_mutex_t layout required by the supported ABIS:
1. __PTHREAD_MUTEX_NUSERS_AFTER_KIND which control whether to define
__nusers fields before or after __kind. The preferred value for
is 0 for new ports and it sets __nusers before __kind.
2. __PTHREAD_MUTEX_USE_UNION which control whether internal __spins and
__list members will be place inside an union for linuxthreads
compatibility. The preferred value is 0 for ports and it sets
to not use an union to define both fields.
It fixes the wrong offsets value for __kind value on x86_64-linux-gnu-x32.
Checked with a make check run-built-tests=no on all afected ABIs.
[BZ #22298]
* nptl/allocatestack.c (allocate_stack): Check if
__PTHREAD_MUTEX_HAVE_PREV is non-zero, instead if
__PTHREAD_MUTEX_HAVE_PREV is defined.
* nptl/descr.h (pthread): Likewise.
* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
Likewise.
* nptl/pthread_create.c (START_THREAD_DEFN): Likewise.
* sysdeps/nptl/fork.c (__libc_fork): Likewise.
* sysdeps/nptl/pthread.h (PTHREAD_MUTEX_INITIALIZER): Likewise.
* sysdeps/nptl/bits/thread-shared-types.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
(__pthread_internal_list): Check __PTHREAD_MUTEX_USE_UNION instead
of __WORDSIZE for internal layout.
(__pthread_mutex_s): Check __PTHREAD_MUTEX_NUSERS_AFTER_KIND instead
of __WORDSIZE for internal __nusers layout and __PTHREAD_MUTEX_USE_UNION
instead of __WORDSIZE whether to use an union for __spins and __list
fields.
(__PTHREAD_MUTEX_HAVE_PREV): Define also for __PTHREAD_MUTEX_USE_UNION
case.
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64. Tested on i686 and x86-64. There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.
* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
* sysdeps/x86/sysdep.h: New file.
* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
|
|
* sysdeps/x86/libc-start.c: Add /* !SHARED */.
|
|
* sysdeps/x86/libc-start.c: Reformat.
|
|
The glibc implementation of iseqsig relies on ordered comparison
operators raising the "invalid" exception for quiet NaN operands, with
a workaround on platforms where a GCC bug means that exception is not
raised. For x86, that bug has now been fixed for GCC 8, so this patch
disables the workaround in that case. If and when the corresponding
bugs for powerpc and s390 are fixed, the headers for those platforms
should of course be updated similarly.
Tested for x86_64 and x86, including with GCC mainline. Note that
other failures appear with GCC mainline because of spurious use of
ordered comparison instructions for unordered operations
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82692>.
* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h
(FIX_COMPARE_INVALID): Define to 0 if [__GNUC_PREREQ (8, 0)].
|
|
The bits/floatn.h header currently only has defines relating to
_Float128. This patch adds defines relating to other _FloatN /
_FloatNx types.
The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers. If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.
Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header. The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations. This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there. (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)
Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases. I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.
Tested for x86_64.
* bits/floatn-common.h: New file.
* math/Makefile (headers): Add bits/floatn-common.h.
* bits/floatn.h: Include <bits/floatn-common.h>.
* sysdeps/ia64/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
|
|
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:
Before After Change
Westmere (SSE)/fxsave 345 866 151%
IvyBridge (AVX)/xsave 420 643 53%
Haswell (AVX)/xsave 713 1252 75%
Skylake (AVX+MPX)/xsavec 559 719 28%
Skylake (AVX512+MPX)/xsavec 145 272 87%
Ryzen (AVX)/xsavec 280 553 97%
This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases. With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.
On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises. On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.
[BZ #21265]
* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
New.
* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
(get_common_indeces): Set xsave_state_size, xsave_state_full_size
and bit_arch_XSAVEC_Usable if needed.
(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
and bit_arch_Use_dl_runtime_resolve_opt.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
Removed.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(bit_arch_Prefer_No_AVX512): Updated.
(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
(bit_arch_XSAVEC_Usable): New.
(STATE_SAVE_OFFSET): Likewise.
(STATE_SAVE_MASK): Likewise.
[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
(cpu_features): Add xsave_state_size and xsave_state_full_size.
(index_arch_Use_dl_runtime_resolve_opt): Removed.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_XSAVEC_Usable): New.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow.
* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
is enabled.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
_dl_runtime_resolve_xsavec.
* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
Removed.
(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
instead of VEC_SIZE.
(REGISTER_SAVE_BND0): Removed.
(REGISTER_SAVE_BND1): Likewise.
(REGISTER_SAVE_BND3): Likewise.
(REGISTER_SAVE_RAX): Always defined to 0.
(VMOV): Removed.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_slow): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_avx512): Likewise.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(USE_FXSAVE): New.
(_dl_runtime_resolve_fxsave): Likewise.
(USE_XSAVE): Likewise.
(_dl_runtime_resolve_xsave): Likewise.
(USE_XSAVEC): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
Removed.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(_dl_runtime_resolve_fxsave): New.
(_dl_runtime_resolve_xsave): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
|
|
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.
[BZ #22299]
* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
GLRO(dl_platform) to NULL.
* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
(modules-names): Add tst-platformmod-1 and
x86_64/tst-platformmod-2.
(CFLAGS-tst-platform-1.c): New.
(CFLAGS-tst-platformmod-1.c): Likewise.
(CFLAGS-tst-platformmod-2.c): Likewise.
(LDFLAGS-tst-platformmod-2.so): Likewise.
($(objpfx)tst-platform-1): Likewise.
($(objpfx)tst-platform-1.out): Likewise.
(tst-platform-1-ENV): Likewise.
($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
* sysdeps/x86_64/tst-platform-1.c: New file.
* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
|
|
This patch moves the generic definition from x86_64 init-arch
to a common header ifunc-init.h. No functional changes is expected.
Checked on a x86_64-linux-gnu build.
* sysdeps/generic/ifunc-init.h: New file.
* sysdeps/x86/init-arch.h: Use generic ifunc-init.h.
|
|
Remove __signbit inlines from mathinline.h. Math.h already uses
the builtin when supported, so additional inlines are only used
on pre 4.0 GCCs. Similarly remove ancient copysign and fabs
inlines.
* sysdeps/alpha/fpu/bits/mathinline.h: Delete file.
* sysdeps/ia64/fpu/bits/mathinline.h: Delete file.
* sysdeps/m68k/coldfire/fpu/bits/mathinline.h: Delete file.
* sysdeps/m68k/m680x0/fpu/bits/mathinline.h: (__signbitf): Remove.
(__signbit): Remove.
(__signbitl): Remove.
* sysdeps/powerpc/bits/mathinline.h (__signbitf): Remove.
(__signbit): Remove.
(__signbitl): Remove.
* sysdeps/s390/fpu/bits/mathinline.h: (__signbitf): Remove.
(__signbit): Remove.
(__signbitl): Remove
* sysdeps/sparc/fpu/bits/mathinline.h (__signbitf): Remove.
(__signbit): Remove.
(__signbitl): Remove.
* sysdeps/tile/bits/mathinline.h: Delete file.
* sysdeps/x86/fpu/bits/mathinline.h (__signbitf): Remove.
(__signbit): Remove.
(__signbitl): Remove.
|
|
Simplify the C99 isgreater macros. Although some support was added
in GCC 2.97, not all targets added support until GCC 3.1. Therefore
only use the builtins in math.h from GCC 3.1 onwards, and defer to
generic macros otherwise. Improve the generic isunordered macro
to use compares rather than call fpclassify twice - this is not only
faster but also correct for signaling NaNs.
* math/math.h: Improve handling of C99 isgreater macros.
* sysdeps/alpha/fpu/bits/mathinline.h: Remove isgreater macros.
* sysdeps/m68k/m680x0/fpu/bits/mathinline.h: Likewise.
* sysdeps/powerpc/bits/mathinline.h: Likewise.
* sysdeps/sparc/fpu/bits/mathinline.h: Likewise.
* sysdeps/x86/fpu/bits/mathinline.h: Likewise.
|
|
AVX512 functions in mathvec are used on machines with AVX512. An AVX2
wrapper is also provided and it can be used when the AVX512 version
isn't profitable. MathVec_Prefer_No_AVX512 is addded to cpu-features.
If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES
environment variable, the AVX2 wrapper will be used.
Tested on x86-64 machines with and without AVX512. Also verified
glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.
[BZ #21967]
* sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512):
New.
(index_arch_MathVec_Prefer_No_AVX512): Likewise.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Handle MathVec_Prefer_No_AVX512.
* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
(IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512
is set.
|
|
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library. ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs. This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.
This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.
NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
machines.
Tested on i686 and x86-64.
[BZ #22093]
* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
(HWCAP_IMPORTANT): Likewise.
(HWCAP_X86_64): New enum.
(HWCAP_X86_AVX512_1): Updated.
* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
(modules-names): Add x86_64/tst-x86_64mod-1.
(LDFLAGS-tst-x86_64mod-1.so): New.
($(objpfx)tst-x86_64-1): Likewise.
($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
(tst-x86_64-1-clean): Likewise.
* sysdeps/x86_64/tst-x86_64-1.c: New file.
* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
|
|
* sysdeps/x86/fpu/include/bits/fenv.h [NO_HIDDEN]: Redirect
__feraiseexcept_renamed to feraiseexcept instead of
__GI_feraiseexcept.
|
|
There are various bits/huge_val*.h headers to define HUGE_VAL and
related macros. All of them use __builtin_huge_val etc. for GCC 3.3
and later. Then there are various fallbacks, such as using a large
hex float constant for GCC 2.96 and later, or using unions (with or
without compound literals) to construct the bytes of an infinity, with
this last being the reason for having architecture-specific files.
Supporting TS 18661-3 _FloatN / _FloatNx types that have the same
format as other supported types will mean adding more such macros;
needing to add more headers for them doesn't seem very desirable.
The fallbacks based on bytes of the representation of an infinity do
not meet the standard requirements for a constant expression. At
least one of them is also wrong: sysdeps/sh/bits/huge_val.h is
producing a mixed-endian representation which does not match what GCC
does.
This patch eliminates all those headers, defining the macros directly
in math.h. For GCC 3.3 and later, the built-in functions are used as
now. For other compilers, a large constant 1e10000 (with appropriate
suffix) is used. This is like the fallback for GCC 2.96 and later,
but without using hex floats (which have no apparent advantage here).
It is unambiguously valid standard C for all floating-point formats
with infinities, which covers all formats supported by glibc or likely
to be supported by glibc in future (C90 DR#025 said that if a
floating-point format represents infinities, all real values lie
within the range of representable values, so the constraints for
constant expressions are not violated), but may generate compiler
warnings and wouldn't handle the TS 18661-1 FENV_ROUND pragma
correctly. If someone is actually using a compiler with glibc that
does not claim to be GCC 3.3 or later, but which has a better way to
define the HUGE_VAL macros, we can always add compiler conditionals in
with alternative definitions.
I intend to make similar changes for INF and NAN. The SNAN macros
already just use __builtin_nans etc. with no fallback for compilers
not claiming to be GCC 3.3 or later.
Tested for x86_64.
* math/math.h: Do not include bits/huge_val.h, bits/huge_valf.h,
bits/huge_vall.h or bits/huge_val_flt128.h.
(HUGE_VAL): Define directly here.
[__USE_ISOC99] (HUGE_VALF): Likewise.
[__USE_ISOC99] (HUGE_VALL): Likewise.
[__HAVE_FLOAT128 && __GLIBC_USE (IEC_60559_TYPES_EXT)]
(HUGE_VAL_F128): Likewise.
* math/Makefile (headers): Remove bits/huge_val.h,
bits/huge_valf.h, bits/huge_vall.h and bits/huge_val_flt128.h.
* bits/huge_val.h: Remove.
* bits/huge_val_flt128.h: Likewise.
* bits/huge_valf.h: Likewise.
* bits/huge_vall.h: Likewise.
* sysdeps/ia64/bits/huge_vall.h: Likewise.
* sysdeps/ieee754/bits/huge_val.h: Likewise.
* sysdeps/ieee754/bits/huge_valf.h: Likewise.
* sysdeps/m68k/m680x0/bits/huge_vall.h: Likewise.
* sysdeps/sh/bits/huge_val.h: Likewise.
* sysdeps/sparc/bits/huge_vall.h: Likewise.
* sysdeps/x86/bits/huge_vall.h: Likewise.
|
|
Since assembly versions of HAS_CPU_FEATURE and HAS_ARCH_FEATURE have
been removed, assembly versions of index_cpu_* and index_arch_* can
also be removed.
Tested on i686 and x86-64 with and without --disable-multi-arch.
* sysdeps/x86/cpu-features.h [__ASSEMBLER__]
(index_cpu_*, index_arch_*): Removed.
|
|
Add IBT/SHSTK bits to cpu-features for Shadow Stack in Intel Control-flow
Enforcement Technology (CET) instructions:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
* sysdeps/x86/cpu-features.h (bit_cpu_BIT): New.
(bit_cpu_SHSTK): Likewise.
(index_cpu_IBT): Likewise.
(index_cpu_SHSTK): Likewise.
(reg_IBT): Likewise.
(reg_SHSTK): Likewise.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Handle index_cpu_IBT and index_cpu_SHSTK.
|
|
Since all x86 IFUNC selectors are implemented in C, assembly versions of
HAS_CPU_FEATURE and HAS_ARCH_FEATURE can be removed.
* sysdeps/x86/cpu-features.h [__ASSEMBLER__]
(LOAD_RTLD_GLOBAL_RO_RDX, HAS_FEATURE, LOAD_FUNC_GOT_EAX,
HAS_CPU_FEATURE, HAS_ARCH_FEATURE): Removed.
|
|
On AVX machines with XGETBV (ECX == 1) like Skylake processors,
(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
0x0000000000015890 <+0>: push %rax
0x0000000000015891 <+1>: push %rcx
0x0000000000015892 <+2>: push %rdx
0x0000000000015893 <+3>: mov $0x1,%ecx
0x0000000000015898 <+8>: xgetbv
0x000000000001589b <+11>: mov %eax,%r11d
0x000000000001589e <+14>: pop %rdx
0x000000000001589f <+15>: pop %rcx
0x00000000000158a0 <+16>: pop %rax
0x00000000000158a1 <+17>: and $0x4,%r11d
0x00000000000158a5 <+21>: bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
is slower than:
(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
0x0000000000015850 <+0>: vorpd %ymm0,%ymm1,%ymm8
0x0000000000015854 <+4>: vorpd %ymm2,%ymm3,%ymm9
0x0000000000015858 <+8>: vorpd %ymm4,%ymm5,%ymm10
0x000000000001585c <+12>: vorpd %ymm6,%ymm7,%ymm11
0x0000000000015860 <+16>: vorpd %ymm8,%ymm9,%ymm9
0x0000000000015865 <+21>: vorpd %ymm10,%ymm11,%ymm10
0x000000000001586a <+26>: vpcmpeqd %xmm8,%xmm8,%xmm8
0x000000000001586f <+31>: vorpd %ymm9,%ymm10,%ymm10
0x0000000000015874 <+36>: vptest %ymm10,%ymm8
0x0000000000015879 <+41>: bnd jae 0x158b0 <_dl_runtime_resolve_avx>
0x000000000001587c <+44>: vzeroupper
0x000000000001587f <+47>: bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb)
since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest. _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.
[BZ #21871]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
|
|
This patch changes libm code to make consistent use of C99 uintN_t
types instead of sometimes using those and sometimes using the older
nonstandard u_intN_t names. This makes sense as a cleanup in its own
right, and also facilitates merges to GCC's libquadmath (which gets
the types from stdint.h and so may not have u_intN_t available at
all).
Tested for x86_64, and with build-many-glibcs.py.
* math/s_nextafter.c (__nextafter): Use uintN_t instead of
u_intN_t.
* math/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/generic/math_private.h (ieee_double_shape_type):
Likewise.
(ieee_float_shape_type): Likewise.
* sysdeps/i386/fpu/s_fpclassifyl.c (__fpclassifyl): Likewise.
* sysdeps/i386/fpu/s_isnanl.c (__isnanl): Likewise.
* sysdeps/i386/fpu/s_nextafterl.c (__nextafterl): Likewise.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/dbl-64/e_acosh.c (__ieee754_acosh): Likewise.
* sysdeps/ieee754/dbl-64/e_cosh.c (__ieee754_cosh): Likewise.
* sysdeps/ieee754/dbl-64/e_fmod.c (__ieee754_fmod): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise.
(__ieee754_yn): Likewise.
* sysdeps/ieee754/dbl-64/e_log10.c (__ieee754_log10): Likewise.
* sysdeps/ieee754/dbl-64/e_log2.c (__ieee754_log2): Likewise.
* sysdeps/ieee754/dbl-64/e_rem_pio2.c (__ieee754_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c (__ieee754_sinh): Likewise.
* sysdeps/ieee754/dbl-64/s_ceil.c (__ceil): Likewise.
* sysdeps/ieee754/dbl-64/s_copysign.c (__copysign): Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c (__erf): Likewise.
(__erfc): Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c (__expm1): Likewise.
* sysdeps/ieee754/dbl-64/s_finite.c (FINITE): Likewise.
* sysdeps/ieee754/dbl-64/s_floor.c (__floor): Likewise.
* sysdeps/ieee754/dbl-64/s_fpclassify.c (__fpclassify): Likewise.
* sysdeps/ieee754/dbl-64/s_isnan.c (__isnan): Likewise.
* sysdeps/ieee754/dbl-64/s_issignaling.c (__issignaling):
Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c (__llrint): Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c (__llround): Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c (__lround): Likewise.
* sysdeps/ieee754/dbl-64/s_modf.c (__modf): Likewise.
* sysdeps/ieee754/dbl-64/s_nextup.c (__nextup): Likewise.
* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Likewise.
* sysdeps/ieee754/dbl-64/s_round.c (__round): Likewise.
* sysdeps/ieee754/dbl-64/s_trunc.c (__trunc): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_issignaling.c
(__issignaling): Likewise.
* sysdeps/ieee754/flt-32/e_atan2f.c (__ieee754_atan2f): Likewise.
* sysdeps/ieee754/flt-32/e_fmodf.c (__ieee754_fmodf): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_ynf): Likewise.
* sysdeps/ieee754/flt-32/e_log10f.c (__ieee754_log10f): Likewise.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Likewise.
* sysdeps/ieee754/flt-32/e_rem_pio2f.c (__ieee754_rem_pio2f):
Likewise.
* sysdeps/ieee754/flt-32/e_remainderf.c (__ieee754_remainderf):
Likewise.
* sysdeps/ieee754/flt-32/e_sqrtf.c (__ieee754_sqrtf): Likewise.
* sysdeps/ieee754/flt-32/s_ceilf.c (__ceilf): Likewise.
* sysdeps/ieee754/flt-32/s_copysignf.c (__copysignf): Likewise.
* sysdeps/ieee754/flt-32/s_erff.c (__erff): Likewise.
(__erfcf): Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c (__expm1f): Likewise.
* sysdeps/ieee754/flt-32/s_finitef.c (FINITEF): Likewise.
* sysdeps/ieee754/flt-32/s_floorf.c (__floorf): Likewise.
* sysdeps/ieee754/flt-32/s_fpclassifyf.c (__fpclassifyf):
Likewise.
* sysdeps/ieee754/flt-32/s_isnanf.c (__isnanf): Likewise.
* sysdeps/ieee754/flt-32/s_issignalingf.c (__issignalingf):
Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c (__llrintf): Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c (__llroundf): Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c (__lrintf): Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c (__lroundf): Likewise.
* sysdeps/ieee754/flt-32/s_modff.c (__modff): Likewise.
* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
* sysdeps/ieee754/flt-32/s_roundf.c (__roundf): Likewise.
* sysdeps/ieee754/ldbl-128/e_acoshl.c (__ieee754_acoshl):
Likewise.
* sysdeps/ieee754/ldbl-128/e_atan2l.c (__ieee754_atan2l):
Likewise.
* sysdeps/ieee754/ldbl-128/e_atanhl.c (__ieee754_atanhl):
Likewise.
* sysdeps/ieee754/ldbl-128/e_fmodl.c (__ieee754_fmodl): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128/e_rem_pio2l.c (__ieee754_rem_pio2l):
Likewise.
* sysdeps/ieee754/ldbl-128/e_remainderl.c (__ieee754_remainderl):
Likewise.
* sysdeps/ieee754/ldbl-128/e_sinhl.c (__ieee754_sinhl): Likewise.
* sysdeps/ieee754/ldbl-128/k_cosl.c (__kernel_cosl): Likewise.
* sysdeps/ieee754/ldbl-128/k_sincosl.c (__kernel_sincosl):
Likewise.
* sysdeps/ieee754/ldbl-128/k_sinl.c (__kernel_sinl): Likewise.
* sysdeps/ieee754/ldbl-128/s_ceill.c (__ceill): Likewise.
* sysdeps/ieee754/ldbl-128/s_copysignl.c (__copysignl): Likewise.
* sysdeps/ieee754/ldbl-128/s_erfl.c (__erfcl): Likewise.
* sysdeps/ieee754/ldbl-128/s_fabsl.c (__fabsl): Likewise.
* sysdeps/ieee754/ldbl-128/s_finitel.c (__finitel): Likewise.
* sysdeps/ieee754/ldbl-128/s_floorl.c (__floorl): Likewise.
* sysdeps/ieee754/ldbl-128/s_fpclassifyl.c (__fpclassifyl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128/s_isnanl.c (__isnanl): Likewise.
* sysdeps/ieee754/ldbl-128/s_issignalingl.c (__issignalingl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c (__llroundl): Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c (__lroundl): Likewise.
* sysdeps/ieee754/ldbl-128/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c (__nearbyintl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c (__nextafterl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nextupl.c (__nextupl): Likewise.
* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128/s_rintl.c (__rintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_roundl.c (__roundl): Likewise.
* sysdeps/ieee754/ldbl-128/s_tanhl.c (__tanhl): Likewise.
* sysdeps/ieee754/ldbl-128/s_truncl.c (__truncl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_fmodl.c (__ieee754_fmodl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_rem_pio2l.c (__ieee754_rem_pio2l):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_remainderl.c
(__ieee754_remainderl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_cosl.c (__kernel_cosl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c (__kernel_sinl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fabsl.c (__fabsl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fpclassifyl.c (___fpclassifyl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/e_acoshl.c (__ieee754_acoshl): Likewise.
* sysdeps/ieee754/ldbl-96/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c (__ieee754_atanhl): Likewise.
* sysdeps/ieee754/ldbl-96/e_coshl.c (__ieee754_coshl): Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/ieee754/ldbl-96/e_j0l.c (__ieee754_j0l): Likewise.
(__ieee754_y0l): Likewise.
(pzero): Likewise.
(qzero): Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c (__ieee754_j1l): Likewise.
(__ieee754_y1l): Likewise.
(pone): Likewise.
(qone): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
(__ieee754_lgammal_r): Likewise.
* sysdeps/ieee754/ldbl-96/e_rem_pio2l.c (__ieee754_rem_pio2l):
Likewise.
* sysdeps/ieee754/ldbl-96/e_sinhl.c (__ieee754_sinhl): Likewise.
* sysdeps/ieee754/ldbl-96/s_copysignl.c (__copysignl): Likewise.
* sysdeps/ieee754/ldbl-96/s_erfl.c (__erfl): Likewise.
(__erfcl): Likewise.
* sysdeps/ieee754/ldbl-96/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-96/s_issignalingl.c (__issignalingl):
Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c (__llrintl): Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c (__llroundl): Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c (__lroundl): Likewise.
* sysdeps/ieee754/ldbl-96/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/s_nextupl.c (__nextupl): Likewise.
* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/s_roundl.c (__roundl): Likewise.
* sysdeps/ieee754/ldbl-96/s_tanhl.c (__tanhl): Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
* sysdeps/m68k/m680x0/fpu/e_pow.c (s(__ieee754_pow)): Likewise.
* sysdeps/m68k/m680x0/fpu/s_fpclassifyl.c (__fpclassifyl):
Likewise.
* sysdeps/m68k/m680x0/fpu/s_llrint.c (__llrint): Likewise.
* sysdeps/m68k/m680x0/fpu/s_llrintf.c (__llrintf): Likewise.
* sysdeps/m68k/m680x0/fpu/s_llrintl.c (__llrintl): Likewise.
* sysdeps/m68k/m680x0/fpu/s_nextafterl.c (__nextafterl): Likewise.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
|
|
In math/math.h, __MATH_TG will expand signbit to __builtin_signbit*,
e.g.: __builtin_signbitf128, before GCC 6. However, there has never
been a __builtin_signbitf128 in GCC and the type-generic builtin is
only available since GCC 6. For older GCC, this patch defines
__builtin_signbitf128 to __signbitf128, so that the internal function
is used instead of the non-existent builtin.
This patch also changes the implementation of __signbitf128, because
it was reusing the implementation of __signbitl from ldbl-128, which
calls __builtin_signbitl. Using the long double version of the
builtin is not correct on machines where _Float128 is ABI-distinct
from long double (i.e.: ia64, powerpc64le, x86, x86_84). The new
implementation does not rely on builtins when being built with GCC
versions older than 6.0.
The new code does not currently affect powerpc64le builds, because
only GCC 6.2 fulfills the requirements from configure. It might
affect powerpc64le builds if those requirements are backported to
older versions of the compiler. The new code affects x86_64 builds,
since glibc is supposed to build correctly with older versions of GCC.
Tested for powerpc64le and x86_64.
* include/math.h (__signbitf128): Define as hidden.
* sysdeps/ieee754/float128/s_signbitf128.c (__signbitf128):
Reimplement without builtins.
* sysdeps/ia64/bits/floatn.h [!__GNUC_PREREQ (6, 0)]
(__builtin_signbitf128): Define to __signbitf128.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
|
|
This patch enables float128 support for x86_64 and x86. All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests). bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.
fenv_private.h needed updating to avoid trying to put _Float128 values
in registers. I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.
Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates. I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)
* sysdeps/i386/Implies: Add ieee754/float128.
* sysdeps/x86_64/Implies: Likewise.
* sysdeps/x86/bits/floatn.h: New file.
* sysdeps/x86/float128-abi.h: Likewise.
* manual/math.texi (Mathematics): Document support for _Float128
on x86_64 and x86.
* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
(math_opt_barrier): Do not put _Float128 values in floating-point
registers.
(math_force_eval): Likewise.
[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
to Makefile variable.
* sysdeps/x86/fpu/e_sqrtf128.c: New file.
* sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc.
* sysdeps/x86/math-tests.h: New file.
* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
* math/libm-test-getpayload.inc (getpayload_test_data): Use
XFAIL_FLOAT128_PAYLOAD.
* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
* math/libm-test-totalordermag.inc (totalordermag_test_data):
Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
|
|
Building for x86_64 with float128 support, I get a localplt test
failure from lrintf128 calling feraiseexcept.
The problem is that an inline optimized version of feraiseexcept calls
__feraiseexcept_renamed in cases where it doesn't completely expand
inline, and that in turn is redirected to feraiseexcept for a library
call, so meaning the redirection of feraiseexcept to
__GI_feraiseexcept inside libm is lost for that call.
This patch fixes the problem by moving the redirect to an internal
header in the _LIBC case, with the internal header using
__GI_feraiseexcept where appropriate.
Tested for x86_64 (in conjunction with float128 patches).
* sysdeps/x86/fpu/bits/fenv.h [_LIBC] (__feraiseexcept_renamed):
Do not declare.
* sysdeps/x86/fpu/include/bits/fenv.h [_LIBC &&
__USE_EXTERN_INLINES] (__feraiseexcept_renamed): Declare here,
redirected to __GI_feraiseexcept if [SHARED && IS_IN (libm)].
|
|
Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to
sysdeps/x86/dl-tunables.list since it is x86 specicifc. Also
change type of data_cache_size, data_cache_size and
non_temporal_threshold to unsigned long int to match size_t.
Remove usage DEFAULT_STRLEN from cpu-tunables.c.
* elf/dl-tunables.list (glibc.tune.ifunc): Removed.
* sysdeps/x86/dl-tunables.list (glibc.tune.hwcaps): New.
Remove security_level on all fields.
* manual/tunables.texi: Replace ifunc with hwcaps.
* sysdeps/x86/cpu-features.c (TUNABLE_CALLBACK (set_ifunc)):
Renamed to ..
(TUNABLE_CALLBACK (set_hwcaps)): This.
(init_cpu_features): Updated.
* sysdeps/x86/cpu-features.h (cpu_features): Change type of
data_cache_size, data_cache_size and non_temporal_threshold to
unsigned long int.
* sysdeps/x86/cpu-tunables.c (DEFAULT_STRLEN): Removed.
(TUNABLE_CALLBACK (set_ifunc)): Renamed to ...
(TUNABLE_CALLBACK (set_hwcaps)): This. Update comments. Don't
use DEFAULT_STRLEN.
|
|
* elf/dl-tunables.list: Move x86 specific tunables to ...
* sysdeps/x86/dl-tunables.list: Here. New file.
|
|
The current IFUNC selection is based on microbenchmarks in glibc. It
should give the best performance for most workloads. But other choices
may have better performance for a particular workload or on the hardware
which wasn't available at the selection was made. The environment
variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
to enable CPU/ARCH feature yyy, disable CPU/ARCH feature yyy and zzz,
where the feature name is case-sensitive and has to match the ones in
cpu-features.h. It can be used by glibc developers to override the
IFUNC selection to tune for a new processor or improve performance for
a particular workload. It isn't intended for normal end users.
NOTE: the IFUNC selection may change over time. Please check all
multiarch implementations when experimenting.
Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER is
provided to set threshold to use non temporal store to NUMBER,
GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER to set data cache
size, GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER to set
shared cache size.
* elf/dl-tunables.list (tune): Add ifunc,
x86_non_temporal_threshold,
x86_data_cache_size and x86_shared_cache_size.
* manual/tunables.texi: Document glibc.tune.ifunc,
glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size
and glibc.tune.x86_non_temporal_threshold.
* sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
* sysdeps/x86/cpu-tunables.c: Likewise.
* sysdeps/x86/cacheinfo.c
(init_cacheinfo): Check and get data cache size, shared cache
size and non temporal threshold from cpu_features.
* sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
New.
[HAVE_TUNABLES] Include <unistd.h>.
[HAVE_TUNABLES] Include <elf/dl-tunables.h>.
[HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
IFUNC selection, data cache size, shared cache size and non
temporal threshold.
* sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
shared_cache_size and non_temporal_threshold.
|
|
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors. Current
thinking is that this class of optimization is best left to the
compiler.
* bits/string.h, string/bits/string.h
* sysdeps/aarch64/bits/string.h
* sysdeps/m68k/m680x0/m68020/bits/string.h
* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
* sysdeps/x86/bits/string.h: Delete file.
* string/string.h: Don't include bits/string.h.
* string/bits/string3.h: Rename to bits/string_fortified.h.
No need to undef various symbols that the removed headers
might have defined as macros.
* string/Makefile (headers): Remove bits/string.h, change
bits/string3.h to bits/string_fortified.h.
* string/string-inlines.c: Update commentary. Remove definitions
of various macros that nothing looks at anymore. Don't directly
include bits/string.h. Set _STRING_INLINE_unaligned here, based on
compiler-predefined macros.
* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
_is_ defined, provide internal hidden alias __strncat.
* include/string.h: Declare internal hidden alias __strncat.
Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
not defined.
* include/bits/string3.h: Rename to bits/string_fortified.h,
update to match above.
* sysdeps/i386/string-inlines.c: Define compat symbols for
everything formerly defined by sysdeps/x86/bits/string.h.
Make existing definitions into compat symbols as well.
Remove some no-longer-necessary messing around with macros.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
* sysdeps/s390/multiarch/mempcpy.c
No need to define _HAVE_STRING_ARCH_mempcpy.
Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.
* sysdeps/i386/i686/multiarch/strncat-c.c
* sysdeps/s390/multiarch/strncat-c.c
* sysdeps/x86_64/multiarch/strncat-c.c
Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
|
|
The LD_HWCAP_MASK environment variable was ignored in static binaries,
which is inconsistent with the behaviour of dynamically linked
binaries. This seems to have been because of the inability of
ld_hwcap_mask being read early enough to influence anything but now
that it is in tunables, the mask is usable in static binaries as well.
This feature is important for aarch64, which relies on HWCAP_CPUID
being masked out to disable multiarch. A sanity test on x86_64 shows
that there are no failures. Likewise for aarch64.
* elf/dl-hwcaps.h [HAVE_TUNABLES]: Always read hwcap_mask.
* sysdeps/sparc/sparc32/dl-machine.h [HAVE_TUNABLES]:
Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Always set
up hwcap and hwcap_mask.
|
|
Drop _dl_hwcap_mask when building with tunables. This completes the
transition of hwcap_mask reading from _dl_hwcap_mask to tunables.
* elf/dl-hwcaps.h: New file.
* elf/dl-hwcaps.c: Include it.
(_dl_important_hwcaps)[HAVE_TUNABLES]: Read and update
glibc.tune.hwcap_mask.
* elf/dl-cache.c: Include dl-hwcaps.h.
(_dl_load_cache_lookup)[HAVE_TUNABLES]: Read
glibc.tune.hwcap_mask.
* sysdeps/sparc/sparc32/dl-machine.h: Likewise.
* elf/dl-support.c (_dl_hwcap2)[HAVE_TUNABLES]: Drop
_dl_hwcap_mask.
* elf/rtld.c (rtld_global_ro)[HAVE_TUNABLES]: Drop
_dl_hwcap_mask.
(process_envvars)[HAVE_TUNABLES]: Likewise.
* sysdeps/generic/ldsodefs.h (rtld_global_ro)[HAVE_TUNABLES]:
Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Don't
initialize dl_hwcap_mask when tunables are enabled.
|
|
Since cpu_features is available, use it instead of dl_x86_cpu_features.
* sysdeps/x86/cacheinfo.c (intel_check_word): Accept cpu_features
and use it instead of dl_x86_cpu_features.
(handle_intel): Replace maxidx with cpu_features. Pass
cpu_features to intel_check_word.
(__cache_sysconf): Pass cpu_features to handle_intel.
(init_cacheinfo): Likewise. Use cpu_features instead of
dl_x86_cpu_features.
|
|
Optimize x86-64 memcmp/wmemcmp with AVX2. It uses vector compare as
much as possible. It is as fast as SSE4 memcmp for size <= 16 bytes
and up to 2X faster for size > 16 bytes on Haswell and Skylake. Select
AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and
AVX unaligned load is fast.
NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input. TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.
Key features:
1. For size from 2 to 7 bytes, load as big endian with movbe and bswap
to avoid branches.
2. Use overlapping compare to avoid branch.
3. Use vector compare when size >= 4 bytes for memcmp or size >= 8
bytes for wmemcmp.
4. If size is 8 * VEC_SIZE or less, unroll the loop.
5. Compare 4 * VEC_SIZE at a time with the aligned first memory area.
6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
8. Use 8 vector compares when size is 8 * VEC_SIZE or less.
* sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memcmp-avx2 and wmemcmp-avx2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2.
* sysdeps/x86_64/multiarch/memcmp-avx2.S: New file.
* sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX
2 machines if AVX unaligned load is fast and vzeroupper is
preferred.
* sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX
2 machines if AVX unaligned load is fast and vzeroupper is
preferred.
|
|
These macros are used to implement ifunc selection in C. To implement
an ifunc function, foo, which returns the address of __foo_sse2 or
__foo_avx2:
__foo_avx2:
#define foo __redirect_foo
#define __foo __redirect___foo
#include <foo.h>
#undef foo
#undef __foo
#define SYMBOL_NAME foo
#include <init-arch.h>
extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
static inline void *
foo_selector (void)
{
if (use AVX2)
return OPTIMIZE (avx2);
return OPTIMIZE (sse2);
}
libc_ifunc_redirected (__redirect_foo, foo, foo_selector ());
* sysdeps/x86/init-arch.h (PASTER1): New.
(EVALUATOR1): Likewise.
(PASTER2): Likewise.
(EVALUATOR2): Likewise.
(REDIRECT_NAME): Likewise.
(OPTIMIZE): Likewise.
(IFUNC_SELECTOR): Likewise.
|
|
__x86_shared_non_temporal_threshold was set to 6 times of per-core
shared cache size, based on the large memcpy micro benchmark in glibc
on a 8-core processor. For a processor with more than 8 cores, the
threshold is too low. Set __x86_shared_non_temporal_threshold to the
3/4 of the total shared cache size so that it is unchanged on 8-core
processors. On processors with less than 8 cores, the threshold is
lower.
* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
Set to the 3/4 of the total shared cache size.
|