path: root/sysdeps/powerpc/powerpc64/power7
Age | Commit message | Author
2014-05-06 | PowerPC: strncpy/stpncpy optimization for PPC64/POWER7 | Vidya Ranganathan
The optimization is achieved by the following techniques:
> data alignment [gain from aligned memory access on read/write]
> loop unrolling/unwinding [POWER7 gains performance from the reduced branch penalty]
> zero padding done by calling the optimized memset
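As a rough illustration of the last technique only, the following C sketch copies up to n bytes and then hands the zero padding to memset. The name sketch_strncpy is hypothetical, and the byte-at-a-time copy loop stands in for the aligned, unrolled assembly, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the glibc symbol: strncpy semantics with the
   zero padding delegated to memset.  */
char *sketch_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Copy bytes until the terminator is seen or the limit is reached.  */
    while (i < n && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }

    /* Pad the remainder of the n-byte window with zeros in one call.  */
    memset(dst + i, 0, n - i);
    return dst;
}
```

When the source is at least n bytes long the memset length is zero, so the destination is left unterminated, as strncpy requires.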
2014-04-09 | PowerPC: Fix --disable-multi-arch builds | Adhemerval Zanella
This patch fixes some powerpc32 and powerpc64 builds with the --disable-multi-arch option along with different --with-cpu=powerN values. It cleans up the Implies directories by removing the multiarch folder for non-multiarch configs, and also fixes two assembly implementations: powerpc64/power7/strncat.S, which was calling the wrong strlen, and power8/fpu/s_isnan.S, which was missing the hidden_def and weak_alias directives.
2014-04-02 | Correct prefetch hint in power7 memrchr. | Alan Modra
Typo fix.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Correct stream hint.
2014-03-20 | PowerPC: optimized strpbrk for POWER7 | Adhemerval Zanella
This patch adds an optimized strpbrk for POWER7 that uses a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to the one x86_64 uses. For PowerPC some tunings were added, such as loop unrolling and memory clearing using VSX instructions.
2014-03-20 | PowerPC: optimized strcspn for PPC64/POWER7 | Adhemerval Zanella
This patch adds an optimized strcspn for POWER7 that uses a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to the one x86_64 uses. For PowerPC some tunings were added, such as loop unrolling and aligning the table's stack memory to 16 bytes (so the VSX clear can run without alignment issues).
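The table idea shared by the strpbrk and strcspn commits can be sketched in portable C as below. This is a minimal illustration, assuming a 256-entry byte table; the function names are hypothetical, and the unrolling, 16-byte stack alignment and VSX clearing mentioned in the commits are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the initial segment of s that contains
   no byte from set.  */
size_t sketch_strcspn(const char *s, const char *set)
{
    unsigned char table[256] = {0};   /* the assembly clears this with VSX */
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)set;

    assert(s != NULL && set != NULL);

    table[0] = 1;                     /* the terminator always stops the scan */
    for (; *q != '\0'; q++)
        table[*q] = 1;                /* one pass over the set builds the table */

    while (!table[*p])                /* one table lookup per input byte */
        p++;
    return (size_t)(p - (const unsigned char *)s);
}

/* Hypothetical helper: strpbrk expressed through the same table scan.  */
char *sketch_strpbrk(const char *s, const char *accept)
{
    s += sketch_strcspn(s, accept);
    return *s != '\0' ? (char *)s : NULL;
}
```

Building the table costs one pass over the set, after which each input byte needs a single lookup instead of a scan of the whole set.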
2014-03-11 | PowerPC: strspn optimization for PPC64/POWER7 | Vidya Ranganathan
The optimization is achieved by the following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
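A plain C sketch of the hashing step, assuming the "hash table" is a direct-indexed 256-entry table keyed by byte value (an assumption; the commit does not spell out the layout). The name sketch_strspn is hypothetical, and the VSX initialization and unrolling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the initial segment of s made up only of
   bytes that occur in accept.  */
size_t sketch_strspn(const char *s, const char *accept)
{
    unsigned char table[256] = {0};   /* the assembly clears this by quadword */
    const unsigned char *a = (const unsigned char *)accept;
    size_t n = 0;

    assert(s != NULL && accept != NULL);

    /* Duplicate bytes in accept land in the same slot, so they add no
       work to the scan of s.  */
    for (; *a != '\0'; a++)
        table[*a] = 1;

    /* table[0] stays 0, so the terminator ends the scan.  */
    while (table[(unsigned char)s[n]])
        n++;
    return n;
}
```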
2014-03-10 | PowerPC: strncat optimization for PPC64 | Adhemerval Zanella
The optimization is achieved by the following techniques:
1. Doubleword aligned memory access and compares using the cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache misses.
2014-03-03 | PowerPC: strrchr optimization for POWER7/PPC64 | Rajalakshmi Srinivasaraghavan
This patch optimizes strrchr() for ppc64. It uses aligned memory access along with cmpb instruction and CPU prefetch to avoid cache misses for speed improvement.
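For readers unfamiliar with cmpb, the C sketch below emulates its byte-wise compare on a 64-bit value: each result byte is 0xff where the two operands have equal bytes and 0x00 elsewhere. The function names are hypothetical, and the loop is only a model of the semantics; the hardware does this in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of cmpb: 0xff in every byte position where a and b
   hold the same byte, 0x00 in the others.  */
uint64_t sketch_cmpb(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        uint64_t mask = (uint64_t)0xff << (8 * i);
        if ((a & mask) == (b & mask))
            result |= mask;
    }
    return result;
}

/* Hypothetical helper: mark the zero bytes of an aligned doubleword, the
   test a string routine needs to find the terminator.  */
uint64_t sketch_zero_bytes(uint64_t word)
{
    return sketch_cmpb(word, 0);
}
```

Comparing a loaded doubleword against zero, or against the search byte replicated into all eight positions, lets a routine such as strrchr examine eight bytes per iteration.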
2014-01-01 | Update copyright notices with scripts/update-copyrights | Allan McRae
2013-12-19 | Fix uses of CALL_MCOUNT in ppc64 assembler sources | Andreas Schwab
2013-12-13 | PowerPC: Adjust multiarch Implies for PowerPC64 | Adhemerval Zanella
This patch adds Implies files to the multiarch folder for POWER chips so multiarch is enabled when building with --with-cpu and a powerN option.
2013-12-13 | PowerPC: multiarch memset/bzero for PowerPC64 | Adhemerval Zanella
2013-12-13 | PowerPC: Adjust multiarch Implies for PowerPC64 | Adhemerval Zanella
This patch adds Implies files to the multiarch folder for POWER chips so multiarch is enabled when building with --with-cpu and a powerN option.
2013-12-06 | PowerPC: Optimized mpn functions for PowerPC64/POWER7 | Adhemerval Zanella
This patch adds optimized __mpn_add_n/__mpn_sub_n for PowerPC64/POWER7. They are originally from GMP with adjustments for GLIBC.
2013-12-06 | PowerPC: multiarch logb/logbf/logbl for PowerPC32 | Adhemerval Zanella
2013-10-25 | PowerPC: strcpy/stpcpy optimization for PPC64/POWER7 | Adhemerval Zanella
This patch intends to unify both the strcpy and stpcpy implementations for PPC64 and PPC64/POWER7. The idea of the default powerpc64 implementation is to provide both doubleword and word aligned memory access. The PPC64/POWER7 implementation also provides doubleword and word memory access, removes the branch hints, uses the cmpb instruction to compare doublewords/words, and adds an optimization for inputs of the same alignment.
2013-10-04 | PowerPC LE memchr and memrchr | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
Like strnlen, memchr and memrchr had a number of defects fixed by this patch as well as adding little-endian support. The first one I noticed was that the entry to the main loop needlessly checked for "are we done yet?" when we know the size is large enough that we can't be done. The second defect I noticed was that the main loop count was wrong, which in turn meant that the small loop needed to handle an extra word. Thirdly, there is nothing to say that the string can't wrap around zero, except of course that we'd normally hit a segfault on trying to read from address zero. Fixing that simplified a number of places:
    - /* Are we done already? */
    - addi r9,r8,8
    - cmpld r9,r7
    - bge L(null)
becomes
    + cmpld r8,r7
    + beqlr
However, the exit gets an extra test because I test for being on the last word then if so whether the byte offset is less than the end. Overall, the change is a win. Lastly, memrchr used the wrong cache hint.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with insrdi. Make better use of reg selection to speed exit slightly. Schedule entry path a little better. Remove useless "are we done" checks on entry to main loop. Handle wrapping around zero address. Correct main loop count. Handle single left-over word from main loop inline rather than by using loop_small. Remove extra word case in loop_small caused by wrong loop count. Add little-endian support.
* sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper cache hint.
* sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian support. Avoid rlwimi.
* sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
2013-10-04 | PowerPC LE memset | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
One of the things I noticed when looking at power7 timing is that rlwimi is cracked and the two resulting insns have a register dependency. That makes it a little slower than the equivalent rldimi.
* sysdeps/powerpc/powerpc64/memset.S: Replace rlwimi with insrdi. Formatting.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memset.S: Likewise.
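The commit does not say what the inserts are used for. Assuming they replicate the fill byte across the register before the store loop (a common memset prologue, stated here as an assumption), the operation can be modelled in C as below; sketch_replicate_byte is a hypothetical name, and each shift-and-or corresponds to one rotate-and-insert instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the fill-byte replication: every step copies the
   populated low part of the register into the adjacent higher field.  */
uint64_t sketch_replicate_byte(unsigned char c)
{
    uint64_t v = c;

    v |= v << 8;    /* byte       -> halfword   */
    v |= v << 16;   /* halfword   -> word       */
    v |= v << 32;   /* word       -> doubleword */
    return v;
}
```

Each step depends on the previous one, so the latency of a single insert instruction shows up three times on this path; that is where a cracked rlwimi would cost a few cycles.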
2013-10-04 | PowerPC LE memcpy | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
Little-endian support for memcpy. I spent some time cleaning up the 64-bit power7 memcpy, in order to avoid the extra alignment traps power7 takes for little-endian. It probably would have been better to copy the linux kernel version of memcpy.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
* sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better use of regs. Use power7 mtocrf. Tidy function tails.
2013-10-04 | PowerPC LE memcmp | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
This is a rather large patch due to formatting and renaming. The formatting changes were to make it possible to compare power7 and power4 versions of memcmp. Using different register defines came about while I was wrestling with the code, trying to find spare registers at one stage. I found it much simpler if we refer to a reg by the same name throughout a function, so it's better if short-term multiple use regs like rTMP are referred to using their register number. I made the cr field usage changes when attempting to reload rWORDn regs in the exit path to byte swap before comparing when little-endian. That proved a bad idea due to the pipelining involved in the main loop; offsets to reload the regs were different the first time around the loop. Anyway, I left the cr field usage changes in place for consistency. Aside from these more-or-less cosmetic changes, I fixed a number of places where an early exit path restores regs unnecessarily, removed some dead code, and optimised one or two exits.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support. Formatting. Consistently use rXXX register defines or rN defines. Use early exit labels that avoid restoring unused non-volatile regs. Make cr field use more consistent with rWORDn compares. Rename regs used as shift registers for unaligned loop, using rN defines for short lifetime/multiple use regs.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with addi 1,1,64 to pop stack frame. Simplify return value code.
* sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
2013-10-04 | PowerPC LE strchr | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
Adds little-endian support to optimised strchr assembly. I've also tweaked the big-endian code a little. In power7/strchr.S there's a check in the tail of the function that we didn't match 0 before finding a c match, done by comparing leading zero counts. It's just as valid, and quicker, to compare the raw output from cmpb. Another little tweak is to use rldimi/insrdi in place of rlwimi for the power7 strchr functions. Since rlwimi is cracked, it is a few cycles slower. rldimi can be used on the 32-bit power7 functions too.
* sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian support. Correct typos, formatting. Optimize tail. Use insrdi rather than rlwimi.
* sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add little-endian support. Correct typos.
* sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi rather than rlwimi.
* sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use in loop and entry code to keep "and." results. (strchr): Add little-endian support. Comment. Move cntlzd earlier in tail.
* sysdeps/powerpc/powerpc32/strchr.S: Likewise.
2013-10-04 | PowerPC LE strcmp and strncmp | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
More little-endian support. I leave the main strcmp loops unchanged (well, except for renumbering rTMP to something other than r0 since it's needed in an addi insn) and modify the tail for little-endian. I noticed some of the big-endian tail code was a little untidy so have cleaned that up too.
* sysdeps/powerpc/powerpc64/strcmp.S (rTMP2): Define as r0. (rTMP): Define as r11. (strcmp): Add little-endian support. Optimise tail.
* sysdeps/powerpc/powerpc32/strcmp.S: Similarly.
* sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/strncmp.S: Likewise.
2013-10-04 | PowerPC LE strnlen | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
The existing strnlen code has a number of defects, so this patch is more than just adding little-endian support. The changes here are similar to those for memchr.
* sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add little-endian support. Remove unnecessary "are we done" tests. Handle "s" wrapping around zero and extremely large "size". Correct main loop count. Handle single left-over word from main loop inline rather than by using small_loop. Correct comments. Delete "zero" tail, use "end_max" instead.
* sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
2013-10-04 | PowerPC LE strlen | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
This is the first of nine patches adding little-endian support to the existing optimised string and memory functions. I did spend some time with a power7 simulator looking at cycle by cycle behaviour for memchr, but most of these patches have not been run on cpu simulators to check that we are going as fast as possible. I'm sure PowerPC can do better. However, the little-endian support mostly leaves main loops unchanged, so I'm banking on previous authors having done a good job on big-endian. As with most code you stare at long enough, I found some improvements for big-endian too.
Little-endian support for strlen. Like most of the string functions, I leave the main word or multiple-word loops substantially unchanged, just needing to modify the tail. Removing the branch in the power7 functions is just a tidy. .align produces a branch anyway. Modifying regs in the non-power7 functions is to suit the new little-endian tail.
* sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian support. Don't branch over align.
* sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support. Rearrange tmp reg use to suit. Comment.
* sysdeps/powerpc/powerpc32/strlen.S: Likewise.
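Why only the tail changes can be seen from a small C model, given here as a sketch and not as the patch's actual code. The main loop only asks whether a loaded doubleword contains a match; the tail must turn the match mask into a byte offset, and the byte at the lowest address sits at the most significant end of the register on big-endian but at the least significant end on little-endian. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* mask holds 0xff in each byte position that matched (cmpb-style output)
   and must be nonzero.  Both helpers return the memory-order index of the
   first matching byte.  */

/* Big-endian load: lowest address is the most significant byte, so scan
   from the top (the assembly would count leading zeros).  */
int sketch_first_match_be(uint64_t mask)
{
    int index = 0;

    assert(mask != 0);
    while ((mask >> 56) == 0) {
        mask <<= 8;
        index++;
    }
    return index;
}

/* Little-endian load: lowest address is the least significant byte, so
   scan from the bottom (equivalent to counting trailing zeros).  */
int sketch_first_match_le(uint64_t mask)
{
    int index = 0;

    assert(mask != 0);
    while ((mask & 0xff) == 0) {
        mask >>= 8;
        index++;
    }
    return index;
}
```

The same mask therefore yields different offsets depending on byte order, which is why the loop body can stay as it is while the exit path needs a little-endian variant.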
2013-10-04 | PowerPC floating point little-endian [12 of 15] | Alan Modra
http://sourceware.org/ml/libc-alpha/2013-08/msg00087.html
Fixes for little-endian in 32-bit assembly.
* sysdeps/powerpc/sysdep.h (LOWORD, HIWORD, HISHORT): Define.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Load little-endian words of double from correct stack offsets.
* sysdeps/powerpc/powerpc32/fpu/s_copysignl.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Use HISHORT.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
2013-09-05 | PowerPC: fix POWER7 memrchr for some large inputs | Adhemerval Zanella
2013-03-06 | Remove powerpc64 bounded-pointers code. | Joseph Myers
2013-01-07 | Fix spelling errors in sysdeps/powerpc files. | Anton Blanchard
2013-01-02 | Update copyright notices with scripts/update-copyrights. | Joseph Myers
2012-08-21 | [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S. | Will Schmidt
Assorted tweaking, twisting and tuning to squeeze a few additional cycles out of the memchr code. Changes include bypassing the shift pairs (sld,srd) when they are not required, and unrolling the small_loop that handles short and trailing strings. Per scrollpipe data measuring aligned strings for 64-bit, these changes save between five and eight cycles (9-13% overall) for short strings (<32). Longer aligned strings see a slight improvement of 1-3% due to bypassing the shifts and the instruction rearranging.
2012-05-22 | PowerPC: libm ABI update | Adhemerval Zanella
Update for libm abilist for POWER6 and POWER7.
2012-05-15 | PowerPC - logb[f|l] optimization for POWER7 | Adhemerval Zanella
This patch provides optimized logb (1.2x on PPC32 and 2.5x on PPC64), logbf (1.1x on PPC32 and 2.2x on PPC64), and logbl (1.3x on PPC32 and 50% on PPC64) for the POWER7 processor.
2012-02-09 | Replace FSF snail mail address with URLs. | Paul Eggert
2011-12-17 | Optimized strcasecmp for Power7 | Adhemerval Zanella
2011-09-07 | power7 strncmp optimization | Will Schmidt
2011-09-07 | power7 memcpy VSX optimizations | Will Schmidt
2011-04-22 | Remove doubled words. | Jim Meyering
2011-04-17 | Fix POWER4/POWER7 optimized strncmp to not read past differing bytes | Andreas Schwab
2011-02-17 | Disable VSX usage in rtld.c to prevent TOC ref before relocs are resolved. | Ryan S. Arnold
2010-11-05 | power7-optimized mempcpy | Luis Machado
2010-08-19 | powerpc: Various P7-optimized string functions | Luis Machado
2010-06-30 | powerpc: Re-work the Implies structure | Luis Machado
This patch tries to organize the Implies files for ppc, since there are a number of processors and most of them are compatible with each other (backwards compatible). Bearing in mind that we start the search for processor-specific files in the sysdeps/unix/sysv/linux tree (sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]/fpu to be exact), we would like to grab any linux-specific code from that tree prior to going through the other tree (sysdeps/powerpc/...). For that, I removed the Implies files that were originally inside the fpu directories and placed them in the non-fpu directories (still inside the unix/sysv/linux tree). If no processor-specific/linux-specific files can be found, we "imply" the other tree's (sysdeps/powerpc/...) fpu directory for that specific processor AND also the non-fpu directory for that same tree. If, again, no processor-specific code is found, we read another Implies file that points to the most compatible processor that we should grab code from, and so on, until we reach the power4 processor.
So, in summary, the Implies files will live inside these directories now:
* sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]
* sysdeps/powerpc/powerpc[32|64]/[processor]
Practical example of the order we will use to pick power6-specific code with the new structure:
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6/fpu
-> sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6
-> sysdeps/powerpc/powerpc[32|64]/power6/fpu
-> sysdeps/powerpc/powerpc[32|64]/power6
-> sysdeps/powerpc/powerpc[32|64]/power5+/fpu
-> sysdeps/powerpc/powerpc[32|64]/power5+
-> sysdeps/powerpc/powerpc[32|64]/power5/fpu
-> sysdeps/powerpc/powerpc[32|64]/power5
-> sysdeps/powerpc/powerpc[32|64]/power4/fpu
-> sysdeps/powerpc/powerpc[32|64]/power4
(from here, it'll go to the generic path as usual)
2010-06-14 | More whitespace fixes. | Ulrich Drepper
2010-06-14 | Fix whitespaces. | Ulrich Drepper
2010-06-14 | power7 string compare optimizations | Luis Machado
2010-05-20 | Add missing files. | Luis Machado
2010-03-10 | Fix whitespace issues. | Ulrich Drepper
2010-03-10 | power7-optimized 64-bit and 32-bit memcpy | Luis Machado
2010-02-10 | Fix POWER7 Implies | Luis Machado
2010-02-09 | Fix whitespace issues. | Ulrich Drepper