Age | Commit message (Collapse) | Author |
|
|
|
* sysdeps/i386/fpu/libm-test-ulps: Update.
|
|
* sysdeps/sparc/fpu/libm-test-ulps: Update.
|
|
|
|
|
|
|
|
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
|
|
|
|
|
|
|
|
|
|
Many Linux arches require fixed mmaps to be aligned higher than pagesize,
so use the SHMLBA define as it represents this quantity exactly.
This fixes spurious errors seen on those arches like:
cannot map archive header: Invalid argument
URL: http://sourceware.org/bugzilla/show_bug.cgi?id=10283
Reported-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
This patch introduces two new convenience functions to set the default
thread attributes used for creating threads. This allows a programmer
to set the default thread attributes just once in a process and then
run pthread_create without additional attributes.
|
|
I have small patch for new Intel Silvermont machines.
http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture
I checked this on my machine and see that strcpy, ... unaligned
versions are faster than ssse3 versions.
|
|
Resolves: BZ #15627
Add an assembler version of rtld-memset to avoid using SSE registers.
|
|
Resolves #12515.
Use CLOCK_PROCESS_CPUTIME_ID instead of times to get better precision
in the value returned by clock.
|
|
GCC 4.8 enables -ftree-loop-distribute-patterns at -O3 by default and
this optimization may transform loops into memset/memmove calls. Without
proper handling this may generate unexpected PLT calls on GLIBC.
This patch fixes by create memset/memmove alias to internal GLIBC
__GI_memset/__GI_memmove symbols.
|
|
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
|
|
|
|
|
|
__clock_gettime and other __clock_* functions could result in an extra
PLT reference within libc.so if it actually gets used. None of the
code currently uses them, which is why this probably went unnoticed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We only need to set/restore rounding mode to ensure correct
computation for non-default rounding modes.
|
|
|
|
* manual/errno.texi (ESTALE): Update to account for more than
just NFS file systems.
* sysdeps/gnu/errlist.c: Regenerated.
|
|
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
|
|
|
|
* sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start:go): Don't
declare _dl_skip_args.
Continuation of commit 8347c74cc5c972aa9e57747177b1f5f4b1cbcac8.
|
|
* sysdeps/mach/hurd/i386/init-first.c (_dl_non_dynamic_init):
Don't declare.
Continuation of commit bc16e260d0e74b36e48d30edc6ea4f1152700c09.
|
|
|
|
|
|
|
|
This patch add inline functions to change the Program Priority Register
from ISA 2.05.
|
|
|
|
Adds new SIGBUS error codes for hardware poison signals, syncing with
the current kernel headers (v3.9). It also adds si_trapno field for
alpha.
|
|
|
|
Fixes BZ #15339.
NSS_STATUS_UNAVAIL may mean that a necessary input resource is not
available. This could occur in a number of cases including when the
network is down, system runs out of file descriptors, etc. The
correct differentiator in such a case is the h_errno, which gives the
nature of failure. In case of failures other than a simple 'not
found', we set h_errno as NETDB_INTERNAL and let errno be the
identifier for the exact error.
|
|
This implementation speed up memset in several ways. First is avoiding
expensive computed jump. Second is using fact that arguments of memset
are most of time aligned to 8 bytes.
Benchmark results on:
kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile_result27_04_13.tar.bz2
|
|
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
|
|
|