aboutsummaryrefslogtreecommitdiff
path: root/sysdeps/x86_64/fpu
AgeCommit message (Collapse)Author
2023-08-21x86_64: Add log1p with FMAH.J. Lu
On Skylake, it changes log1p bench performance by: Before After Improvement max 63.349 58.347 8% min 4.448 5.651 -30% mean 12.0674 10.336 14% The minimum code path is if (hx < 0x3FDA827A) /* x < 0.41422 */ { if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */ { ... } if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */ { math_force_eval (two54 + x); /* raise inexact */ if (ax < 0x3c900000) /* |x| < 2**-54 */ { ... } else return x - x * x * 0.5; FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster. Since log1p is called by asinh and atanh, it improves asinh performance by: Before After Improvement max 75.645 63.135 16% min 10.074 10.071 0% mean 15.9483 14.9089 6% and improves atanh performance by: Before After Improvement max 91.768 75.081 18% min 15.548 13.883 10% mean 18.3713 16.8011 8%
2023-08-14x86_64: Add expm1 with FMAH.J. Lu
On Skylake, it improves expm1 bench performance by: Before After Improvement max 70.204 68.054 3% min 20.709 16.2 22% mean 22.1221 16.7367 24% NB: Add extern long double __expm1l (long double); extern long double __expm1f128 (long double); for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.
2023-08-11x86_64: Add log2 with FMAH.J. Lu
On Skylake, it improves log2 bench performance by: Before After Improvement max 208.779 63.827 69% min 9.977 6.55 34% mean 10.366 6.8191 34%
2023-08-10x86_64: Sort fpu/multiarch/MakefileH.J. Lu
Sort Makefile variables using scripts/sort-makefile-lines.py. No code generation changes observed in libm. No regressions on x86_64.
2023-07-19Update x86_64 libm-test-ulps (x32 ABI)Andreas K. Hüttel
Based on feedback by Mike Gilbert <floppym@gentoo.org> Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor -march=amdfam10 failures occur for x32 ABI Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-05-23Fix misspellings in sysdeps/x86_64 -- BZ 25337.Paul Pluzhnikov
Applying this commit results in bit-identical rebuild of libc.so.6 math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1 Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-05-23Fix misspellings in sysdeps/x86_64/fpu/multiarch -- BZ 25337.Paul Pluzhnikov
Applying this commit results in a bit-identical rebuild of mathvec/libmvec.so.1 (which is the only binary that gets rebuilt). Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-05-03Enable libmvec support for AArch64Joe Ramsay
This patch enables libmvec on AArch64. The proposed change is mainly implementing build infrastructure to add the new routines to ABI, tests and benchmarks. I have demonstrated how this all fits together by adding implementations for vector cos, in both single and double precision, targeting both Advanced SIMD and SVE. The implementations of the routines themselves are just loops over the scalar routine from libm for now, as we are more concerned with getting the plumbing right at this point. We plan to contribute vector routines from the Arm Optimized Routines repo that are compliant with requirements described in the libmvec wiki. Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising the minimum GCC by such a big jump, we allow users to disable libmvec if their compiler is too old. Note that at this point users have to manually call the vector math functions. This seems to be acceptable to some downstream users. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-04-03x86_64: Fix asm constraints in feraiseexcept (bug 30305)Florian Weimer
The divss instruction clobbers its first argument, and the constraints need to reflect that. Fortunately, with GCC 12, generated code does not actually change, so there is no externally visible bug. Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-03-27benchtests: Move libmvec benchtest inputs to benchtests directoryJoe Ramsay
This allows other targets to use the same inputs for their own libmvec microbenchmarks without having to duplicate them in their own subdirectory. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-02-27x86_64: Update libm test ulpsH.J. Lu
Update libm test ulps for commit 3efbf11fdf15ed991d2c41743921c524a867e145 Author: Paul Zimmermann <Paul.Zimmermann@inria.fr> Date: Tue Feb 14 11:24:59 2023 +0100 update auto-libm-test-out-hypot Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-01-06Update copyright dates with scripts/update-copyrightsJoseph Myers
2022-12-19stdio-common: Convert vfprintf and related functions to buffersFlorian Weimer
vfprintf is entangled with vfwprintf (of course), __printf_fp, __printf_fphex, __vstrfmon_l_internal, and the strfrom family of functions. The latter use the internal snprintf functionality, so vsnprintf is converted as well. The simples conversion is __printf_fphex, followed by __vstrfmon_l_internal and __printf_fp, and finally __vfprintf_internal and __vfwprintf_internal. __vsnprintf_internal and strfrom* are mostly consuming the new interfaces, so they are comparatively simple. __printf_fp is a public symbol, so the FILE *-based interface had to preserved. The __printf_fp rewrite does not change the actual binary-to-decimal conversion algorithm, and digits are still not emitted directly to the target buffer. However, the staging buffer now uses bytes instead of wide characters, and one buffer copy is eliminated. The changes are at least performance-neutral in my testing. Floating point printing and snprintf improved measurably, so that this Lua script for i=1,5000000 do print(i, i * math.pi) end runs about 5% faster for me. To preserve fprintf performance for a simple "%d" format, this commit has some logic changes under LABEL (unsigned_number) to avoid additional function calls. There are certainly some very easy performance improvements here: binary, octal and hexadecimal formatting can easily avoid the temporary work buffer (the number of digits can be computed ahead-of-time using one of the __builtin_clz* built-ins). Decimal formatting can use a specialized version of _itoa_word for base 10. The existing (inconsistent) width handling between strfmon and printf is preserved here. __print_fp_buffer_1 would have to use __translated_number_width to achieve ISO conformance for printf. Test expectations in libio/tst-vtables-common.c are adjusted because the internal staging buffer merges all virtual function calls into one. In general, stack buffer usage is greatly reduced, particularly for unbuffered input streams. __printf_fp can still use a large buffer in binary128 mode for %g, though. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-11-27x86/fpu: Factor out shared avx2/avx512 code in svml_{s|d}_wrapper_impl.hNoah Goldstein
Code is exactly the same for the two so better to only maintain one version. All math and mathvec tests pass on x86.
2022-11-27x86/fpu: Cleanup code in svml_{s|d}_wrapper_impl.hNoah Goldstein
1. Remove unnecessary spills. 2. Fix some small nit missed optimizations. All math and mathvec tests pass on x86.
2022-11-27x86/fpu: Reformat svml_{s|d}_wrapper_impl.hNoah Goldstein
Just reformat with the style convention used in other x86 assembler files. This doesn't change libm.so or libmvec.so.
2022-11-27x86/fpu: Fix misspelled evex512 section in variety of svml filesNoah Goldstein
``` .section .text.evex512, "ax", @progbits ``` With misspelled as: ``` .section .text.exex512, "ax", @progbits ```
2022-11-27x86/fpu: Add missing ISA sections to variety of svml filesNoah Goldstein
Many sse4/avx2/avx512 files where just in .text.
2022-10-03x86: Remove .tfloat usageAdhemerval Zanella
Some compiler does not support it (such as clang integrated assembler) neither gcc emits it.
2022-06-22x86: Replace all sse instructions with vex equivilent in avx+ filesNoah Goldstein
Most of these don't really matter as there was no dirty upper state but we should generally avoid stray sse when its not needed. The one case that really matters is in svml_d_tanh4_core_avx2.S: blendvps %xmm0, %xmm8, %xmm7 When there was a dirty upper state. Tested on x86_64-linux
2022-06-09x86: Optimize svml_s_tanhf4_core_sse4.SNoah Goldstein
Optimizations are: 1. Reduce code size (-112 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Reduce rodata size (-4k+ rodata is shared with avx2). Result is roughly a 15-16% speedup: Function, New Time, Old Time, New / Old _ZGVbN4v_tanhf, 3.158, 3.749, 0.842
2022-06-09x86: Optimize svml_s_tanhf8_core_avx2.SNoah Goldstein
Optimizations are: 1. Reduce code size (-81 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Reduce rodata size (-32 bytes). Result is roughly a 17-18% speedup: Function, New Time, Old Time, New / Old _ZGVdN8v_tanhf, 1.977, 2.402, 0.823
2022-06-09x86: Add data file that can be shared by tanhf-avx2 and tanhf-sse4Noah Goldstein
tanhf-avx2 and tanhf-sse4 use the same data tables so we can save over 4kb using a shared datatable. This does increase the memory footprint of the sse4 version (as now all the targets are 32 bytes instead of 16), generally it seems worth the code size save. NB: This patch doesn't do anything itself, it is setup for future patches.
2022-06-09x86: Optimize svml_s_tanhf16_core_avx512.SNoah Goldstein
Optimizations are: 1. Reduce code size (-67 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Reduce rodata usage (-448 bytes). Result is roughly a 14% speedup: Function, New Time, Old Time, New / Old _ZGVeN16v_tanhf, 0.649, 0.752, 0.863
2022-06-09x86: Improve svml_s_atanhf4_core_sse4.SNoah Goldstein
Improvements are: 1. Reduce code size (-62 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Reduce rodata usage (-16 bytes). The throughput improvement is not significant as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVbN4v_atanhf, 8.821, 8.903, 0.991
2022-06-09x86: Improve svml_s_atanhf8_core_avx2.SNoah Goldstein
Improvements are: 1. Reduce code size (-60 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Shrink rodata usage (-32 bytes). The throughput improvement is not that significant (3-5%) as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVdN8v_atanhf, 2.799, 2.923, 0.958
2022-06-09x86: Improve svml_s_atanhf16_core_avx512.SNoah Goldstein
Improvements are: 1. Reduce code size (-64 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Reduce rodata size ([-128, -188] bytes). The throughput improvement is not significant as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVeN16v_atanhf, 1.39, 1.408, 0.987
2022-06-01x86_64: Optimize sincos where sin/cos is optimized (bug 29193)Andreas Schwab
The compiler may substitute calls to sin or cos with calls to sincos, thus we should have the same optimized implementations for sincos. The optimized implementations may produce results that differ, that also makes sure that the sincos call aggrees with the sin and cos calls.
2022-05-23math: Add math-use-builtins-fabs (BZ#29027)Adhemerval Zanella
Both float, double, and _Float128 are assumed to be supported (float and double already only uses builtins). Only long double is parametrized due GCC bug 29253 which prevents its usage on powerpc. It allows to remove i686, ia64, x86_64, powerpc, and sparc arch specific implementation. On ia64 it also fixes the sNAN handling: math/test-float64x-fabs math/test-ldouble-fabs Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
2022-04-29benchtests: Better libmvec integrationSiddhesh Poyarekar
Improve libmvec benchmark integration so that in future other architectures may be able to run their libmvec benchmarks as well. This now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-29benchtests: Add UNSUPPORTED benchmark statusSiddhesh Poyarekar
The libmvec benchmarks print a message indicating that a certain CPU feature is unsupported and exit prematurelyi, which breaks the JSON in bench.out. Handle this more elegantly in the bench makefile target by adding support for an UNSUPPORTED exit status (77) so that bench.out continues to have output for valid tests. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-07x86: Remove fcopysign{f} implementationAdhemerval Zanella
The builtin used by generic code generates similar code. Checked on x86_64-linux-gnu and i686-linux-gnu.
2022-04-05benchtests: Only build libmvec benchmarks iff $(build-mathvec) is setAdhemerval Zanella
Checked on x86_64-linux-gnu.
2022-04-04x86: Remove fabs{f} implementationAdhemerval Zanella
For x86_64 is the same as the generic implementation, while for i686 the builtin generates the same code.
2022-03-07x86_64: Fix svml_d_tanh8_core_avx512.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_tanh4_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_tanh2_core_sse4.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanhf8_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanhf4_core_sse4.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanhf16_core_avx512.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_tan8_core_avx512.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_tan4_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_tan2_core_sse4.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanf8_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanf4_core_sse4.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_tanf16_core_avx512.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_sinh8_core_avx512.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_sinh4_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_d_sinh2_core_sse4.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07x86_64: Fix svml_s_sinhf8_core_avx2.S code formattingSunil K Pandey
This commit contains following formatting changes 1. Instructions proceeded by a tab. 2. Instruction less than 8 characters in length have a tab between it and the first operand. 3. Instruction greater than 7 characters in length have a space between it and the first operand. 4. Tabs after `#define`d names and their value. 5. 8 space at the beginning of line replaced by tab. 6. Indent comments with code. 7. Remove redundent .text section. 8. 1 space between line content and line comment. 9. Space after all commas. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>