Age | Commit message (Collapse) | Author |
|
The optimization is achieved by following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset
|
|
This patch changes de default symbol redirection for internal call of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that uses these
symbols extensible, like the regex functions.
|
|
This patch fixes some powerpc32 and powerpc64 builds with
--disable-multi-arch option along with different --with-cpu=powerN.
It cleanups the Implies directories by removing the multiarch
folder for non multiarch config and also fixing two assembly
implementations: powerpc64/power7/strncat.S that is calling the
wrong strlen; and power8/fpu/s_isnan.S that misses the hidden_def and
weak_alias directives.
|
|
Typo fix.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Correct stream hint.
|
|
https://sourceware.org/ml/binutils/2014-03/msg00033.html removes the
"magic" treatment of symbols defined in a .toc section.
* sysdeps/powerpc/powerpc64/start.S: Add @toc to toc symbol reference.
|
|
[BZ #16786]
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Don't trash stack.
|
|
This patch fixes the MFVSRD_R3_V1 macro that encodes 'mfvsrd r3,vs1'
(to support old binutils) for little endian.
|
|
This patch add an optimized strpbrk for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance on
the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and memory
clear using VSX instructions.
|
|
This patch add a optimized strcspn for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance
on the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and align
stack memory to table to 16 bytes (so VSX clean can ran without
alignment issues).
|
|
The roundl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_roundl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_roundl.c instead fixes the failing math.
This fixes 16707.
|
|
The nearbyintl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_nearbyintl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c instead fixes the failing
math.
Fixes BZ#16706.
|
|
The ceill assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_ceill.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_ceill.c instead fixes the failing math.
Fixes BZ#16701.
|
|
This patch makes the strspn ifunc selector build for static builds.
|
|
This patch fixes an issue for powerpc64[le] static build where __bzero
is definied in multiple places (memset-ppc64.o and bzero.o). It is now
defined only in bzero.o and memset-ppc64.o only defined __bzero_ppc for
both dynamic and static library.
Fixes BZ#16683.
|
|
The optimization is achieved by following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
|
|
The optimization is achieved by following techniques:
1. Doubleword aligned memory access and compares using
cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
|
|
This patch optimizes strrchr() for ppc64. It uses aligned memory
access along with cmpb instruction and CPU prefetch to avoid
cache misses for speed improvement.
|
|
This patch add a optimized llround/llroundf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
|
|
This patch add a optimized llrint/llrintf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
|
|
This patch add a optimized finite/finitef implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
|
|
This patch add a optimized isinf/isinff implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
|
|
This patch add a optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
|
|
|
|
The truncl assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_truncl.S)
returns wrong results for some inputs where first double is a exact integer
and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_truncl.c instead it fixes tgammal
issues regarding wrong result sign.
|
|
This patch fixes some compile warnings related to extra tokens at
end of #undef directive from multilib patchset.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This patch adds Implies files on multiarch folder for POWER chips so
multirach is enabled when building with --with-cpu and powerN
option.
|
|
For PPC64, all the wrappers at sysdeps are superfluous: they are
basically the same implementation from math/w_sqrt.c with the
'#ifdef _IEEE_LIBM'. And the power4 version just force the 'fsqrt'
instruction utilization with an inline assembly, which is already
handled by math_private.h __ieee754_sqrt implementation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|