Age | Commit message (Collapse) | Author |
|
This reverts commit b37899d34d2190ef4b454283188f22519f096048.
Apparently we load libc.so (and thus start using its functions) before
calling TLS_INIT_TP, so libc.so functions should not actually assume
that TLS is always set up.
|
|
Previously, once we set up TLS, we would implicitly switch from using
__hurd_reply_port0 to reply_port inside the TCB, leaving the former
unused. But we never deallocated it, so it got leaked.
Instead, migrate the port into the new TCB's reply_port slot. This
avoids both the port leak and an extra syscall to create a new reply
port for the TCB.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-28-bugaevc@gmail.com>
|
|
If we're doing signals, that means we've already got the signal thread
running, and that implies TLS having been set up. So we know that
__hurd_local_reply_port will resolve to THREAD_SELF->reply_port, and can
access that directly using the THREAD_GETMEM and THREAD_SETMEM macros.
This avoids potential miscompilations, and should also be a tiny bit
faster.
Also, use mach_port_mod_refs () and not mach_port_destroy () to destroy
the receive right. mach_port_destroy () should *never* be used on
mach_task_self (); this can easily lead to port use-after-free
vulnerabilities if the task has any other references to the same port.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-26-bugaevc@gmail.com>
|
|
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.
Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.
Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
|
|
Nothing in there needs tls.h
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-24-bugaevc@gmail.com>
|
|
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-3-bugaevc@gmail.com>
|
|
These are just regular local variables that are not accessed in any
funny ways, not even though a pointer. There's absolutely no reason to
declare them volatile. It only ends up hurting the quality of the
generated machine code.
If anything, it would make sense to decalre sigsp as *pointing* to
volatile memory (volatile void *sigsp), but evidently that's not needed
either.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-2-bugaevc@gmail.com>
|
|
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-18-bugaevc@gmail.com>
|
|
This is based on the Linux port's version, but laid out to match Mach's
struct i386_thread_state, much like the i386 version does.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
|
|
The hurd source tree already provides the same stubs and they are only
needed there.
Message-Id: <ZDN3rDdjMowtUWf7@jupiter.tail36e24.ts.net>
|
|
* manual/string.texi (Truncating Strings): Update obsolescent
reference and use the more-generic term “AddressSanitizer”.
Mention fortification, too. -fcheck-pointer-bounds is no longer
supported.
|
|
|
|
* manual/string.texi: Editorial fixes. Do not say “text” when
“string” or “string contents” is meant, as a C string can contain
bytes that are not valid text in the current encoding.
When warning about strcat efficiency, warn similarly about strncat
and wcscat. “coping” → “copying”.
Mention at the start of the two problematic sections that problems
are discussed at section end.
|
|
* manual/creature.texi (Feature Test Macros): Fix
“creature.texi:309: warning: `.' or `,' must follow @xref, not f”.
|
|
FreeBSD makes these functions available by default, so we should
not treat them as GNU-specific and restrict them to _GNU_SOURCE.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
FreeBSD makes them available by default, too, so there does not seem
to be a reason to restrict these functions to _GNU_SOURCE.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Add PREFETCHI support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add AMX-COMPLEX support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add AVX-NE-CONVERT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add AVX-VNNI-INT8 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add MSRLIST support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add AVX-IFMA support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add AMX-FP16 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add WRMSRNS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add Architectural Performance Monitoring Extended Leaf (EAX = 23H)
support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add CMPCCXADD support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add RAO-INT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add architectural LBR support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add RTM_FORCE_ABORT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add SGX-KEYS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Add 57-bit linear addresses and five-level paging (LA57) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Move LAM after LAHF64_SAHF64 to sort x86 features.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit
15 in ECX from CPUID with EAX == 0x7 and ECX == 0.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
Handle both 32 and 64-bit ABIs.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
Both _IO_file_jumps_alias and _IO_wfile_jumps_alias are defined as
alias.
|
|
Both __rpc_freemem and __rpc_thread_destroy are only used if the
the compat symbols are required.
|
|
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-4-gfleury@disroot.org>
|
|
sysdeps/mach/hurd/htl/pt-pthread_self.c: New file.
htl/Makefile: .. Add it to libc routine.
sysdeps/mach/hurd/htl/pt-sysdep.c(__pthread_self): Remove it.
sysdeps/mach/hurd/htl/pt-sysdep.h(__pthread_self): Add hidden propertie.
htl/Versions(__pthread_self) Version it as private symbol.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-3-gfleury@disroot.org>
|
|
htl/pt-nthreads.c: new file.
htl/Makefile: Add it to routine.
htl/Versions: version it as private libc symbol.
htl/pt-create.c: remove his definition here.
htl/pt-internal.h: add propertie to it declaration.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-2-gfleury@disroot.org>
|
|
To calculate geometric mean for string benchmark results.
Signed-off-by: Nisha Poyarekar <nisha.s.menon@gmail.com>
|
|
Also replace an unreachable assert with __builtin_unreachable.
|
|
Similar to fb95c316382679c0826cc8399760977cd95f15c9, also disable
for string-ppc64.c (pulled on rltd as the default string
implementation).
Checked on powerpc64-linux-gnu.
|
|
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause, it is not supported (ENOSYS). Always use ppoll or the
64 bit time_t variant instead.
|
|
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper
with SVID error handling around the new code. There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).
The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation. For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).
It shows an small improvement, the results for fmod:
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992
x86_64 (Ryzen 9) | normal | 296.939 | 296.738
x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119
aarch64 (N1) | subnormal | 6.81778 | 4.33313
aarch64 (N1) | normal | 155.620 | 152.915
aarch64 (N1) | close-exponents | 8.21306 | 5.76138
armhf (N1) | subnormal | 15.1083 | 14.5746
armhf (N1) | normal | 244.833 | 241.738
armhf (N1) | close-exponents | 21.8182 | 22.457
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 8 (which is sizeof of uint32_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 8;
ex -= 8;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318
x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641
x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224
aarch64 (N1) | subnormal | 10.2182 | 6.81778
aarch64 (N1) | normal | 60.0616 | 20.3667
aarch64 (N1) | close-exponents | 11.5256 | 8.39685
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 11.6662 | 10.8955
armhf (N1) | normal | 69.2759 | 34.1524
armhf (N1) | close-exponents | 13.6472 | 18.2131
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 11;
ex -= 11;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049
x86_64 (Ryzen 9) | normal | 1016.51 | 296.939
x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244
aarch64 (N1) | subnormal | 11.153 | 6.81778
aarch64 (N1) | normal | 528.649 | 155.62
aarch64 (N1) | close-exponents | 11.4517 | 8.21306
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 15.908 | 15.1083
armhf (N1) | normal | 837.525 | 244.833
armhf (N1) | close-exponents | 16.2111 | 21.8182
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
1. Subnormals: 128 inputs.
2. Normal numbers with large exponent difference (|x/y| > 2^8):
1024 inputs between FLT_MIN and FLT_MAX;
3. Close exponents (ey >= -103 and |x/y| < 2^8): 1024 inputs with
exponents between -10 and 10.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|