Age | Commit message | Author
2022-02-11 | Revert "Linux: Consolidate auxiliary vector parsing" | Florian Weimer
This reverts commit 8c8510ab2790039e58995ef3a22309582413d3ff. The revert is not perfect because the commit included a bug fix for _dl_sysdep_start with an empty argv, introduced in commit 2d47fa68628e831a692cba8fc9050cef435afc5e ("Linux: Remove DL_FIND_ARG_COMPONENTS"), and this bug fix is kept. The revert is necessary because the reverted commit introduced an early memset call on aarch64, which leads to a crash due to lack of TCB initialization.
2022-02-11 | String: Ensure 'MIN_PAGE_SIZE' is multiple of 'getpagesize' | Noah Goldstein
When 'TEST_LEN' was defined as (4096 * 3), the allocation size would not be a multiple of the system page size if the system page size was greater than 4096.
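The arithmetic behind this kind of fix can be sketched as follows. This is a minimal illustration and not the code from the commit: the helper name round_up_to_page is hypothetical, and it assumes the page size is a nonzero power of two (as is the case for the value returned by getpagesize on Linux).

```c
#include <assert.h>
#include <stddef.h>

/* Round LEN up to the next multiple of PAGE_SIZE.
   Hypothetical helper: PAGE_SIZE is assumed to be a nonzero power of
   two, which is what makes the mask trick below valid.  */
size_t
round_up_to_page (size_t len, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (len + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, 4096 * 3 is already a multiple of the page size and is returned unchanged; with 16384-byte pages the same length is rounded up to one full page.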
2022-02-10 | Use binutils 2.38 branch in build-many-glibcs.py | Joseph Myers
This patch makes build-many-glibcs.py use binutils 2.38 branch. Tested with build-many-glibcs.py (compilers and glibcs builds).
2022-02-10 | elf: Remove LD_USE_LOAD_BIAS | Adhemerval Zanella
It is solely for prelink with PIE executables [1]. [1] https://sourceware.org/legacy-ml/libc-hacker/2003-11/msg00127.html Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 | malloc: Remove LD_TRACE_PRELINKING usage from mtrace | Adhemerval Zanella
The fix for BZ#22716 replaced LD_TRACE_LOADED_OBJECTS with LD_TRACE_PRELINKING so that mtrace could record the executable's address position. To provide the same information, LD_TRACE_LOADED_OBJECTS is extended so that a value of '2' also prints the executable address. This avoids adding another loader environment variable to be used solely by mtrace. The vDSO will be printed as a default library (with '=>' pointing to the same name), which is fine since both mtrace and ldd already handle it. The mtrace script is changed to also parse the new format. To correctly support PIE and non-PIE executables, both the default mtrace address and the calculated one are used (this fixes mtrace for non-PIE executables, as BZ#22716 did for PIE). Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 | elf: Remove prelink support | Adhemerval Zanella
Prelinked binaries and libraries still work; the dynamic tags DT_GNU_PRELINKED, DT_GNU_LIBLIST, and DT_GNU_CONFLICT are just ignored (meaning the process is relocated as by default). The loader environment variable LD_TRACE_PRELINKING is also removed, since it is used solely by prelink. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 | Linux: Consolidate auxiliary vector parsing | Florian Weimer
And optimize it slightly. The large switch statement in _dl_sysdep_start can be replaced with a large array. This reduces source code and binary size. On i686-linux-gnu:
Before:
   text    data     bss     dec     hex filename
   7791      12       0    7803    1e7b elf/dl-sysdep.os
After:
   text    data     bss     dec     hex filename
   7135      12       0    7147    1beb elf/dl-sysdep.os
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
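The general idea of replacing a per-tag switch with an array indexed by tag can be sketched like this. It is an illustration under stated assumptions, not the glibc implementation: struct aux_entry, the TAG_* constants, and parse_aux are hypothetical stand-ins for the real auxiliary vector type and AT_* constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values and record layout, for illustration only.  */
enum { TAG_NULL = 0, TAG_PAGESZ = 6, TAG_ENTRY = 9, TAG_MAX = 32 };

struct aux_entry
{
  unsigned long type;
  unsigned long value;
};

/* Store every value whose tag is below TAG_MAX into VALUES, indexed by
   tag.  Iteration stops at the terminating TAG_NULL record.  Tags that
   do not fit in the table are skipped.  */
void
parse_aux (const struct aux_entry *av, unsigned long values[TAG_MAX])
{
  assert (av != NULL && values != NULL);
  for (; av->type != TAG_NULL; ++av)
    if (av->type < TAG_MAX)
      values[av->type] = av->value;
}
```

After the loop, the caller reads the fields it cares about from the table (for example values[TAG_PAGESZ]) instead of having one switch case per tag.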
2022-02-10 | Linux: Assume that NEED_DL_SYSINFO_DSO is always defined | Florian Weimer
The definition itself is still needed for generic code. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-02-10 | Linux: Remove DL_FIND_ARG_COMPONENTS | Florian Weimer
The generic definition is always used since the Native Client port has been removed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-02-10 | Linux: Remove HAVE_AUX_SECURE, HAVE_AUX_XID, HAVE_AUX_PAGESIZE | Florian Weimer
They are always defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-02-10 | elf: Merge dl-sysdep.c into the Linux version | Florian Weimer
The generic version is the de-facto Linux implementation. It requires an auxiliary vector, so Hurd does not use it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-02-09 | hppa: Fix bind-now audit (BZ #28857) | Adhemerval Zanella
On hppa, a function pointer returned by la_symbind is actually a function descriptor that has the plabel bit set (bit 30). This must be cleared to get the actual address of the descriptor. If the descriptor has been bound, the first word of the descriptor is the physical address of the function; otherwise, the first word of the descriptor points to a trampoline in the PLT. This patch also adds a workaround to the tests because on hppa (which seems to be the only ABI where I have seen it), some shared libraries add a dynamic PLT relocation to an empty symbol name:
$ readelf -r elf/tst-audit25mod1.so
[...]
Relocation section '.rela.plt' at offset 0x464 contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00002008  00000081 R_PARISC_IPLT                508
[...]
It breaks some assumptions in the test, where a symbol with an empty name ("") is passed to la_symbind. Checked on x86_64-linux-gnu and hppa-linux-gnu.
2022-02-08 | x86-64: Optimize bzero | H.J. Lu
memset with zero as the value to set is by far the majority value (99%+ for Python3 and GCC). bzero can be slightly more optimized for this case by using a zero-idiom xor for broadcasting the set value to a register (vector or GPR). Co-developed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-02-08 | benchtests: Add benches for bzero | H.J. Lu
Add bench-bzero-large.c, bench-bzero-walk.c and bench-bzero.c.
2022-02-07 | linux: fix accuracy of get_nprocs and get_nprocs_conf [BZ #28865] | Dmitry V. Levin
get_nprocs() and get_nprocs_conf() use various methods to obtain an accurate number of processors. Re-introduce __get_nprocs_sched() as a source of information, and fix the order in which these methods are used to return the most accurate information. The primary source of information used in both functions remains unchanged. This also changes __get_nprocs_sched() error return value from 2 to 0, but all its users are already prepared to handle that.
Old fallback order:
get_nprocs: /sys/devices/system/cpu/online -> /proc/stat -> 2
get_nprocs_conf: /sys/devices/system/cpu/ -> /proc/stat -> 2
New fallback order:
get_nprocs: /sys/devices/system/cpu/online -> /proc/stat -> sched_getaffinity -> 2
get_nprocs_conf: /sys/devices/system/cpu/ -> /proc/stat -> sched_getaffinity -> 2
Fixes: 342298278e ("linux: Revert the use of sched_getaffinity on get_nproc")
Closes: BZ #28865
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
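The fallback chain described above can be modeled as an ordered list of sources, each returning 0 on failure, with 2 as the last resort. The sketch below only illustrates that control flow; nprocs_source, nprocs_with_fallback, and the two stub sources are hypothetical names, and the real functions read sysfs, /proc/stat, and the scheduler affinity mask rather than calling stubs.

```c
#include <assert.h>
#include <stddef.h>

/* A source returns a positive processor count, or 0 if it cannot
   determine one (hypothetical convention mirroring the new error
   return value of 0 mentioned above).  */
typedef int (*nprocs_source) (void);

/* Try each source in order and return the first positive answer.
   If every source fails, fall back to 2.  */
int
nprocs_with_fallback (const nprocs_source *sources, size_t count)
{
  assert (sources != NULL || count == 0);
  for (size_t i = 0; i < count; ++i)
    {
      int n = sources[i] ();
      if (n > 0)
        return n;
    }
  return 2;
}

/* Stub sources used only to exercise the chain.  */
int
source_fails (void)
{
  return 0;
}

int
source_reports_eight (void)
{
  return 8;
}
```

Placing the stub that reports eight processors last in the list models the case where the first two sources are unavailable and the third one supplies the answer.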
2022-02-07 | x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only) | Noah Goldstein
commit b62ace2740a106222e124cc86956448fa07abf4d
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Feb 6 00:54:18 2022 -0600
    x86: Improve vec generation in memset-vec-unaligned-erms.S
Revert usage of 'pshufb' in broadcast logic as it is an SSSE3 instruction and memset.S is restricted to only SSE2 instructions.
2022-02-07 | benchtests: Sort benches in Makefile | H.J. Lu
Put one bench per line and sort them.
2022-02-06 | Benchtests: Add length zero benchmark for memset in bench-memset.c | Noah Goldstein
Zero is a relevant size for some workloads (roughly 5% of uses for GCC) so we should be testing its performance as well. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86: Improve vec generation in memset-vec-unaligned-erms.S | Noah Goldstein
No bug. Split vec generation into multiple steps. This allows the broadcast in AVX2 to use 'xmm' registers for the L(less_vec) case. This saves an expensive lane-cross instruction and removes the need for 'vzeroupper'. For SSE2 replace 2x 'punpck' instructions with zero-idiom 'pxor' for byte broadcast.
Results for memset-avx2 small (geomean of N = 20 benchset runs).
size, New Time, Old Time, New / Old
   0,    4.100,    3.831,     0.934
   1,    5.074,    4.399,     0.867
   2,    4.433,    4.411,     0.995
   4,    4.487,    4.415,     0.984
   8,    4.454,    4.396,     0.987
  16,    4.502,    4.443,     0.987
All relevant string/wcsmbs tests are passing. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector tan/tanf to libmvec microbenchmark | Sunil K Pandey
Add vector tan/tanf and input files to libmvec microbenchmark. libmvec-tan-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-tanf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
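A mixture like the one described (most samples from a normal distribution restricted to a range, the rest uniform) could be generated roughly as follows. This is a sketch under several assumptions and not the script that produced the actual input files: struct mix and sample_input are hypothetical names, the normal deviate is an Irwin-Hall approximation (sum of 12 uniforms minus 6) chosen to avoid libm, and out-of-range normal samples are handled by redrawing, which the commit message does not specify.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform double strictly inside (0, 1).  */
static double
uniform01 (void)
{
  return (rand () + 1.0) / (RAND_MAX + 2.0);
}

/* Approximate standard normal deviate: sum of 12 uniforms minus 6.
   This is an approximation, bounded to (-6, 6).  */
static double
approx_std_normal (void)
{
  double sum = 0.0;
  for (int i = 0; i < 12; ++i)
    sum += uniform01 ();
  return sum - 6.0;
}

/* Hypothetical description of one input mixture.  */
struct mix
{
  double normal_fraction; /* e.g. 0.9 for "90% normal" */
  double mean, sigma;     /* parameters of the normal part */
  double lo, hi;          /* open range allowed for the normal part */
  double uni_lo, uni_hi;  /* range of the uniform part */
};

/* Draw one sample from the mixture described by M.  */
double
sample_input (const struct mix *m)
{
  assert (m != NULL && m->lo < m->hi && m->uni_lo < m->uni_hi);
  if (uniform01 () < m->normal_fraction)
    {
      double x;
      do
        x = m->mean + m->sigma * approx_std_normal ();
      while (x <= m->lo || x >= m->hi);
      return x;
    }
  return m->uni_lo + (m->uni_hi - m->uni_lo) * uniform01 ();
}
```

For the tan inputs above, the corresponding parameters would be a normal fraction of 0.9, mean 0.0, sigma 5.0, range (-DBL_MAX, DBL_MAX), and a uniform range of (-1000.0, 1000.0).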
2022-02-06 | x86-64: Add vector erfc/erfcf to libmvec microbenchmark | Sunil K Pandey
Add vector erfc/erfcf and input files to libmvec microbenchmark. libmvec-erfc-inputs: 90% Normal random distribution range: (-6.0, 6.0) mean: 0.0 sigma: 1.0 10% uniform random distribution in range (-5.9, 5.9) libmvec-erfcf-inputs: 90% Normal random distribution range: (-4.0f, 4.0f) mean: 0.0f sigma: 1.0f 10% uniform random distribution in range (-3.9f, 3.9f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector asinh/asinhf to libmvec microbenchmark | Sunil K Pandey
Add vector asinh/asinhf and input files to libmvec microbenchmark. libmvec-asinh-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 2.0 10% uniform random distribution in range (-1.0e6, 1.0e6) libmvec-asinhf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 2.0f 10% uniform random distribution in range (-1.0e6f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector tanh/tanhf to libmvec microbenchmark | Sunil K Pandey
Add vector tanh/tanhf and input files to libmvec microbenchmark. libmvec-tanh-inputs: 90% Normal random distribution range: (-19.0, 19.0) mean: 0.0 sigma: 2.0 10% uniform random distribution in range (-16.0, 16.0) libmvec-tanhf-inputs: 90% Normal random distribution range: (-10.0f, 10.0f) mean: 0.0f sigma: 2.0f 10% uniform random distribution in range (-8.0f, 8.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector erf/erff to libmvec microbenchmark | Sunil K Pandey
Add vector erf/erff and input files to libmvec microbenchmark. libmvec-erf-inputs: 90% Normal random distribution range: (-6.0, 6.0) mean: 0.0 sigma: 1.0 10% uniform random distribution in range (-5.9, 5.9) libmvec-erff-inputs: 90% Normal random distribution range: (-4.0f, 4.0f) mean: 0.0f sigma: 1.0f 10% uniform random distribution in range (-3.9f, 3.9f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector acosh/acoshf to libmvec microbenchmark | Sunil K Pandey
Add vector acosh/acoshf and input files to libmvec microbenchmark. libmvec-acosh-inputs: 90% Normal random distribution range: (1.0, DBL_MAX) mean: 1.0 sigma: 8.0 10% uniform random distribution in range (1.0, 1.0e6) libmvec-acoshf-inputs: 90% Normal random distribution range: (1.0f, FLT_MAX) mean: 1.0f sigma: 4.0f 10% uniform random distribution in range (1.0f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector atanh/atanhf to libmvec microbenchmark | Sunil K Pandey
Add vector atanh/atanhf and input files to libmvec microbenchmark. libmvec-atanh-inputs: 90% Normal random distribution range: (-1.0, 1.0) mean: 0.0 sigma: 1.0 10% uniform random distribution in range (-1.0, 1.0) libmvec-atanhf-inputs: 90% Normal random distribution range: (-1.0f, 1.0f) mean: 0.0f sigma: 1.0f 10% uniform random distribution in range (-1.0f, 1.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector log1p/log1pf to libmvec microbenchmark | Sunil K Pandey
Add vector log1p/log1pf and input files to libmvec microbenchmark. libmvec-log1p-inputs: 70% Normal random distribution range: (-1.0, DBL_MAX) mean: 0.0 sigma: 50.0 30% uniform random distribution in range (-1.0, 1.0e6) libmvec-log1pf-inputs: 70% Normal random distribution range: (-1.0f, FLT_MAX) mean: 0.0f sigma: 50.0f 30% uniform random distribution in range (-1.0f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector log2/log2f to libmvec microbenchmark | Sunil K Pandey
Add vector log2/log2f and input files to libmvec microbenchmark. libmvec-log2-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, 1.0e6) libmvec-log2f-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector log10/log10f to libmvec microbenchmark | Sunil K Pandey
Add vector log10/log10f and input files to libmvec microbenchmark. libmvec-log10-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, 1.0e6) libmvec-log10f-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector atan2/atan2f to libmvec microbenchmark | Sunil K Pandey
Add vector atan2/atan2f and input files to libmvec microbenchmark. libmvec-atan2-inputs: arg1: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 4.0 10% uniform random distribution in range (-1.0e6, 1.0e6) arg2: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 4.0 10% uniform random distribution in range (-1.0e6, 1.0e6) libmvec-atan2f-inputs: arg1: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 4.0f 10% uniform random distribution in range (-1.0e6f, 1.0e6f) arg2: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 4.0f 10% uniform random distribution in range (-1.0e6f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector cbrt/cbrtf to libmvec microbenchmark | Sunil K Pandey
Add vector cbrt/cbrtf and input files to libmvec microbenchmark. libmvec-cbrt-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 10.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-cbrtf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 10.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector sinh/sinhf to libmvec microbenchmark | Sunil K Pandey
Add vector sinh/sinhf and input files to libmvec microbenchmark. libmvec-sinh-inputs: 90% Normal random distribution range: (-710.0, 710.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-sinhf-inputs: 90% Normal random distribution range: (-89.0f, 89.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector expm1/expm1f to libmvec microbenchmark | Sunil K Pandey
Add vector expm1/expm1f and input files to libmvec microbenchmark. libmvec-expm1-inputs: 90% Normal random distribution range: (-708.0, 709.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-expm1f-inputs: 90% Normal random distribution range: (-87.0f, 88.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector cosh/coshf to libmvec microbenchmark | Sunil K Pandey
Add vector cosh/coshf and input files to libmvec microbenchmark. libmvec-cosh-inputs: 90% Normal random distribution range: (-710.0, 710.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-coshf-inputs: 90% Normal random distribution range: (-89.0f, 89.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector exp10/exp10f to libmvec microbenchmark | Sunil K Pandey
Add vector exp10/exp10f and input files to libmvec microbenchmark. libmvec-exp10-inputs: 90% Normal random distribution range: (-307.0, 308.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-250.0, 250.0) libmvec-exp10f-inputs: 90% Normal random distribution range: (-37.0f, 38.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-25.0f, 25.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector exp2/exp2f to libmvec microbenchmark | Sunil K Pandey
Add vector exp2/exp2f and input files to libmvec microbenchmark. libmvec-exp2-inputs: 90% Normal random distribution range: (-1022.0, 1024.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-exp2f-inputs: 90% Normal random distribution range: (-126.0f, 128.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-100.0f, 100.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector hypot/hypotf to libmvec microbenchmark | Sunil K Pandey
Add vector hypot/hypotf and input files to libmvec microbenchmark. libmvec-hypot-inputs: arg1: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 10.0 10% uniform random distribution in range (-1000.0, 1000.0) arg2: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 10.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-hypotf-inputs: arg1: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 10.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) arg2: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 10.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector asin/asinf to libmvec microbenchmark | Sunil K Pandey
Add vector asin/asinf and input files to libmvec microbenchmark. libmvec-asin-inputs: 90% Normal random distribution range: (-1.0, 1.0) mean: 0.0 sigma: 1.0 10% uniform random distribution in range (-1.0, 1.0) libmvec-asinf-inputs: 90% Normal random distribution range: (-1.0f, 1.0f) mean: 0.0f sigma: 1.0f 10% uniform random distribution in range (-1.0f, 1.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | x86-64: Add vector atan/atanf to libmvec microbenchmark | Sunil K Pandey
Add vector atan/atanf and input files to libmvec microbenchmark. libmvec-atan-inputs: arg1: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 4.0 10% uniform random distribution in range (-1.0e6, 1.0e6) arg2: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 4.0 10% uniform random distribution in range (-1.0e6, 1.0e6) libmvec-atanf-inputs: arg1: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 4.0f 10% uniform random distribution in range (-1.0e6f, 1.0e6f) arg2: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 4.0f 10% uniform random distribution in range (-1.0e6f, 1.0e6f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-06 | elf: Replace tst-audit24bmod2.so with tst-audit24bmod2 | H.J. Lu
Replace tst-audit24bmod2.so with tst-audit24bmod2 to silence:
make[2]: Entering directory '/export/gnu/import/git/gitlab/x86-glibc/elf'
Makefile:2201: warning: overriding recipe for target '/export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/elf/tst-audit24bmod2.so'
../Makerules:765: warning: ignoring old recipe for target '/export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/elf/tst-audit24bmod2.so'
2022-02-05 | x86_64/multiarch: Sort sysdep_routines and put one entry per line | H.J. Lu
2022-02-05 | string: Sort headers, routines, tests and tests-translation | H.J. Lu
Sort headers, routines, tests and tests-translation. Put one entry per line.
2022-02-05 | x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) | H.J. Lu
2022-02-05 | Benchtests: move 'alloc_bufs' from loop in bench-memset.c | Noah Goldstein
One buf allocation is sufficient. Calling `alloc_bufs' in the loop just adds unnecessary syscall overhead. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-02-04 | x86-64: Fix strcmp-evex.S | H.J. Lu
Change "movl %edx, %rdx" to "movl %edx, %edx" in:
commit 8418eb3ff4b781d31c4ed5dc6c0bd7356bc45db9
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jan 10 15:35:39 2022 -0600
    x86: Optimize strcmp-evex.S
2022-02-04 | x86-64: Fix strcmp-avx2.S | H.J. Lu
Change "movl %edx, %rdx" to "movl %edx, %edx" in:
commit b77b06e0e296f1a2276c27a67e1d44f2cfa38d45
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jan 10 15:35:38 2022 -0600
    x86: Optimize strcmp-avx2.S
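For background on the corrected instruction in the two fixes above: on x86-64, writing a 32-bit register clears the upper 32 bits of the full 64-bit register, so "movl %edx, %edx" leaves %rdx holding the zero-extended low half. The C sketch below models only that effect; zero_extend_low32 is a hypothetical name and nothing here is taken from the strcmp sources.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the effect of a 32-bit register-to-itself move on x86-64:
   keep the low 32 bits of REG and clear the upper 32 bits.  */
uint64_t
zero_extend_low32 (uint64_t reg)
{
  uint64_t result = (uint64_t) (uint32_t) reg;
  /* Invariant after the move: nothing remains in the upper half.  */
  assert ((result >> 32) == 0);
  return result;
}
```

A value that already fits in 32 bits is returned unchanged, while stale upper bits are discarded.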
2022-02-03 | x86-64: Add vector acos/acosf to libmvec microbenchmark | Sunil K Pandey
Add vector acos/acosf and input files to libmvec microbenchmark. libmvec-acos-inputs: 90% Normal random distribution range: (-1.0, 1.0) mean: 0.0 sigma: 1.0 10% uniform random distribution in range (-1.0, 1.0) libmvec-acosf-inputs: 90% Normal random distribution range: (-1.0f, 1.0f) mean: 0.0f sigma: 1.0f 10% uniform random distribution in range (-1.0f, 1.0f) Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-02-03 | benchtests: Add more coverage for strcmp and strncmp benchmarks | Noah Goldstein
Add more small and medium sized tests for strcmp and strncmp. Also, for strcmp, add an option for more direct control of alignment. Previously alignment was being pushed to the end of the page. While this is the most difficult case to implement, it is far from the common case and so shouldn't be the only benchmark. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-02-03 | x86: Optimize strcmp-evex.S | Noah Goldstein
Optimizations are primarily to the loop logic and how the page cross logic interacts with the loop. The page cross logic is at times more expensive for short strings near the end of a page but not crossing the page. This is done to retest the page cross conditions with a non-faulty check and to improve the logic for entering the loop afterwards. This is only in particular cases, however, and is generally made up for by more than 10x improvements on the transition from the page cross -> loop case. The non-page cross cases as well are nearly universally improved. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-02-03 | x86: Optimize strcmp-avx2.S | Noah Goldstein
Optimizations are primarily to the loop logic and how the page cross logic interacts with the loop. The page cross logic is at times more expensive for short strings near the end of a page but not crossing the page. This is done to retest the page cross conditions with a non-faulty check and to improve the logic for entering the loop afterwards. This is only in particular cases, however, and is generally made up for by more than 10x improvements on the transition from the page cross -> loop case. The non-page cross cases are improved most for smaller sizes [0, 128] and go about even for (128, 4096]. The loop page cross logic is improved so some more significant speedup is seen there as well. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
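The "page cross" condition discussed in these strcmp commits is, in general terms, a check of whether a fixed-width vector load starting at a given address would run past the end of the current page. A minimal C sketch of such a check follows; load_crosses_page is a hypothetical name, the page size is assumed to be a power of two, and the actual assembly may structure the test differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if reading VEC_SIZE bytes starting at ADDR would touch
   the page following the one that contains ADDR.  PAGE_SIZE is assumed
   to be a nonzero power of two and VEC_SIZE at most PAGE_SIZE.  */
bool
load_crosses_page (uintptr_t addr, size_t vec_size, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  assert (vec_size != 0 && vec_size <= page_size);
  return (addr & (page_size - 1)) > page_size - vec_size;
}
```

A 32-byte load at offset 4064 of a 4096-byte page ends exactly at the page boundary and does not cross; starting one byte later, it does.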