path: root/sysdeps/powerpc/powerpc64/multiarch/Makefile
Age  Commit message  Author
2019-04-04  powerpc: Use generic wcsrchr optimization  Adhemerval Zanella
This patch removes the power6 wcsrchr optimization and uses the generic implementation instead. Currently, the binaries resulting from the power6 and power7 IFUNC variants are essentially the same, and the generic implementation with loop unrolling set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcsrchr.c): New rule.
* sysdeps/powerpc/power6/wcsrchr.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power6.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power7.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcsrchr-power6 and wcsrchr-power7. (CFLAGS-wcsrchr-power7.c, CFLAGS-wcsrchr-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcsrchr optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2019-04-04  powerpc: Use generic wcschr optimization  Adhemerval Zanella
This patch removes the power6 wcschr optimization and uses the generic implementation instead. Currently, the binaries resulting from the power6 and power7 IFUNC variants are essentially the same, and the generic implementation with loop unrolling set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcschr.c): New rule.
* sysdeps/powerpc/power6/wcschr.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power6.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power7.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcschr-power6 and wcschr-power7. (CFLAGS-wcschr-power7.c, CFLAGS-wcschr-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcschr optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2019-04-04  powerpc: Use generic wcscpy optimization  Adhemerval Zanella
This patch removes the power6 wcscpy optimization and uses the generic implementation instead. Currently, the binaries resulting from the power6 and power7 IFUNC variants are essentially the same, and the generic implementation with loop unrolling set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcscpy.c): New rule.
* sysdeps/powerpc/power6/wcscpy.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcscpy-power6 and wcscpy-power7. (CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcscpy optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2018-08-16  powerpc: Rearrange little endian specific files  Rajalakshmi Srinivasaraghavan
This patch moves little endian specific POWER9 optimization files to sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions only for little endian.
2017-12-11  powerpc: POWER8 memcpy optimization for cached memory  Adhemerval Zanella
On POWER8, unaligned memory accesses to cached memory have little impact on performance, as opposed to its ancestors. The optimization is disabled by default and will only be available when the tunable glibc.tune.cached_memopt is set to 1.

                  __memcpy_power8_cached   __memcpy_power7
============================================================
max-size=4096:    33325.70 ( 12.65%)       38153.00
max-size=8192:    32878.20 ( 11.17%)       37012.30
max-size=16384:   33782.20 ( 11.61%)       38219.20
max-size=32768:   33296.20 ( 11.30%)       37538.30
max-size=65536:   33765.60 ( 10.53%)       37738.40

* manual/tunables.texi (Hardware Capability Tunables): Document glibc.tune.cached_memopt.
* sysdeps/powerpc/cpu-features.c: New file.
* sysdeps/powerpc/cpu-features.h: New file.
* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add _dl_powerpc_cpu_features.
* sysdeps/powerpc/dl-tunables.list: New file.
* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h (INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED && IS_IN(rtld))]: Restrict dl_platform_init availability and initialize CPU features used by tunables.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add memcpy-power8-cached.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add __memcpy_power8_cached.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S: New file.

Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
2017-10-02  powerpc: Optimize memrchr for power8  Rajalakshmi Srinivasaraghavan
Vectorized loops are used for sizes greater than 32B to improve performance over the power7 optimization. This yields an average improvement of 25%, depending on the position of the search character. Performance is the same for shorter strings.
2017-06-21  powerpc: Optimize memchr for power8  Rajalakshmi Srinivasaraghavan
Vectorized loops are used for sizes greater than 32B to improve performance over the power7 optimization.
2017-05-18  powerpc: Improve memcmp performance for POWER8  Rajalakshmi Srinivasaraghavan
Vectorization improves performance over the current implementation. Tested on powerpc64 and powerpc64le.
2017-04-18  powerpc64: strrchr optimization for power8  Rajalakshmi Srinivasaraghavan
The P7 code is used for <=32B strings, and vectorized loops are used for strings > 32B. This yields an average improvement of 25%, depending on the position of the search character. Performance is the same for shorter strings. Tested on ppc64 and ppc64le.
2017-04-13  powerpc: Optimized strncat for POWER8  Rajalakshmi Srinivasaraghavan
With the new optimized strnlen for POWER8 [1], this patch adds strncat for power8 to make use of the optimized strlen and strnlen. This is faster than the current POWER7 implementation for larger strings. Tested on powerpc64 and powerpc64le.

[1] https://sourceware.org/ml/libc-alpha/2017-03/msg00491.html

* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add strncat-power8.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c (strncat): Add __strncat_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c (strncat): Add __strncat_power8 to list of strncat functions.
* sysdeps/powerpc/powerpc64/multiarch/strncat-power8.c: New file.
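The "ifunc list" entries above refer to choosing one implementation per CPU at load time. A minimal, portable sketch of that selection idea follows. It uses plain function pointers instead of the real IFUNC mechanism, and every name in it (the `DEMO_ARCH_*` bits, the `demo_*` functions) is hypothetical and not taken from glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits, for illustration only. */
#define DEMO_ARCH_POWER7 (1u << 0)
#define DEMO_ARCH_POWER8 (1u << 1)

typedef char *(*demo_strncat_fn)(char *, const char *, size_t);

/* Stand-ins for the per-CPU variants; each simply forwards to strncat. */
static char *demo_strncat_ppc(char *d, const char *s, size_t n)
{
  return strncat(d, s, n);
}

static char *demo_strncat_power7(char *d, const char *s, size_t n)
{
  return strncat(d, s, n);
}

static char *demo_strncat_power8(char *d, const char *s, size_t n)
{
  return strncat(d, s, n);
}

/* Pick the most specific variant the (hypothetical) capability mask allows,
   falling back to the baseline implementation. */
static demo_strncat_fn demo_select_strncat(unsigned int hwcap)
{
  if (hwcap & DEMO_ARCH_POWER8)
    return demo_strncat_power8;
  if (hwcap & DEMO_ARCH_POWER7)
    return demo_strncat_power7;
  return demo_strncat_ppc;
}
```

A caller would resolve the pointer once, for example `demo_strncat_fn f = demo_select_strncat(caps);`, and then call `f` wherever strncat is needed.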
2017-04-05  powerpc64: Add POWER8 strnlen  Wainer dos Santos Moschetta
Added a POWER8 strnlen optimized for long strings. It delivers the same performance as the POWER7 implementation for short strings. This takes advantage of reasonably performing unaligned loads and bit permutes to check the first 1-16 bytes until quadword aligned, then checks in 64-byte strides until unsafe, then 16 bytes, truncating the count if need be. Likewise, the POWER7 code is recycled for strings shorter than 32 bytes. Tested on ppc64 and ppc64le.

* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add strnlen-power8.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c (strnlen): Add __strnlen_power8 to list of strnlen functions.
* sysdeps/powerpc/powerpc64/multiarch/strnlen-power8.S: New file.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c (__strnlen): Add __strnlen_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/power8/strnlen.S: New file.
2016-12-28  powerpc64: strchr/strchrnul optimization for power8  Rajalakshmi Srinivasaraghavan
The P7 code is used for <=32B strings, and vectorized loops are used for strings > 32B. This yields an average improvement of 25%, depending on the position of the search character. Performance is the same for shorter strings. Tested on ppc64 and ppc64le.
2016-12-13  powerpc: strncmp optimization for power9  Rajalakshmi Srinivasaraghavan
Vectorized loops are used for strings > 32B when compared to power8 optimization. Tested on power9 ppc64le simulator.
2016-12-01  powerpc: strcmp optimization for power9  Rajalakshmi Srinivasaraghavan
Vectorized loops are used for strings > 32B when compared to power8 optimization. Tested on power9 ppc64le simulator.
2016-06-14  powerpc: strcasecmp/strncasecmp optmization for power8  raji
This implementation uses vectors to improve performance compared to the current byte-by-byte implementation for POWER7. The performance improvement is up to 4x. This patch is tested on powerpc64 and powerpc64le.
2016-04-25  powerpc: Add optimized strcspn for P8  Paul E. Murphy
A few minor adjustments to the P8 strspn gives us an almost equally optimized P8 strcspn.
2016-04-22  powerpc: strcasestr optmization for power8  Rajalakshmi Srinivasaraghavan
This patch optimizes the strcasestr function for power >= 8 systems. It compares 16 bytes at a time using vector instructions, and the average improvement of this optimization is ~40%. This patch is tested on powerpc64 and powerpc64le.
2016-04-15  powerpc: Optimization for strlen for POWER8.  Carlos Eduardo Seo
This implementation takes advantage of vectorization to improve performance of the loop over the current strlen implementation for POWER7.
2016-04-07  powerpc: Add optimized P8 strspn  Paul E. Murphy
This utilizes vectors and bitmasks. For a small needle and a large haystack, the performance improvement is up to 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and it is a tad slower.
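The bitmask idea can be illustrated in portable scalar C. This is only a sketch of the general technique under the assumption of a 256-bit membership mask held in four 64-bit words. It is not the P8 assembly, it uses no vector instructions, and `demo_strspn` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch: build a 256-bit set of accepted bytes, then scan. */
size_t demo_strspn(const char *s, const char *accept)
{
  uint64_t mask[4] = { 0, 0, 0, 0 };
  const unsigned char *a = (const unsigned char *) accept;
  const unsigned char *p = (const unsigned char *) s;

  /* One bit per possible byte value: word = c / 64, bit = c % 64. */
  for (; *a != 0; a++)
    mask[*a >> 6] |= (uint64_t) 1 << (*a & 63);

  /* The bit for NUL is never set, so the scan stops at the terminator. */
  while ((mask[*p >> 6] >> (*p & 63)) & 1)
    p++;

  return (size_t) (p - (const unsigned char *) s);
}
```

Building the mask costs one pass over the needle regardless of haystack length, which is consistent with the note above that the setup cost dominates for very short strings.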
2015-07-16  powerpc: strstr optimization  Rajalakshmi Srinivasaraghavan
This patch optimizes the strstr function for power >= 7 systems. The performance gain is obtained using aligned memory access and the cmpb instruction for quicker comparison. The average improvement of this optimization is ~40%. Tested on ppc64 and ppc64le.

2015-07-16 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>

* sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strstr().
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/strstr.S: New File.
* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: New File.
* sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New File.
* sysdeps/powerpc/powerpc64/multiarch/strstr.c: New File.
2015-02-09  powerpc: wordcopy/memmove cleanup for ppc64  Adhemerval Zanella
This patch cleans up some multiarch code related to the memmove optimization. The initial IFUNC support added specialized wordcopy symbols, which turned into local IFUNC calls used by the default memmove implementation. This change removes them and uses the optimized memmove instead for supported chips.
2015-02-09  powerpc: Remove POWER7 wordcopy ifunc  Adhemerval Zanella
This patch removes the POWER7 ifunc wordcopy functions (_wordcopy_*_power7), since GLIBC now provides an optimized memmove/bcopy for POWER7.
2015-02-09  powerpc: multiarch Makefile cleanup for powerpc64  Adhemerval Zanella
This patch cleans up the multiarch Makefile by moving the wide-character implementations to the correct wcsmbs rule.
2015-01-13  powerpc: Optimized strncmp for POWER8/PPC64  Adhemerval Zanella
This patch adds an optimized POWER8 strncmp. The implementation focuses on speeding up unaligned cases, following the ideas of the power8 strcmp. The algorithm first checks the initial 16 bytes, then aligns the first source and uses unaligned loads on the second argument only. Additional checks for page boundaries are done for unaligned cases (where the source alignments differ).
2015-01-13  powerpc: Optimized strcmp for POWER8/PPC64  Adhemerval Zanella
This patch adds an optimized POWER8 strcmp using unaligned accesses. The algorithm first checks the initial 16 bytes, then aligns the first source and uses unaligned loads on the second argument only. Additional checks for page boundaries are done for unaligned cases.
2015-01-13  powerpc: Optimized st{r,p}ncpy for POWER8/PPC64  Adhemerval Zanella
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses. It shows a 10%-80% improvement over the optimized POWER7 one, which uses only aligned accesses, especially on unaligned inputs. The algorithm first reads and checks 16 bytes (if the inputs do not cross a 4K page boundary). Then it realigns the source to 16 bytes and issues a 16-byte read-and-compare loop to speed up null byte checks for large strings. Also, unlike the POWER7 optimization, the null padding is done inline in the implementation using possibly unaligned accesses, instead of relying on a memset call. A special case is added for page-crossing reads.
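The page-cross condition mentioned above can be expressed as a small address check. The sketch below assumes a 4K page and a 16-byte read, as stated in the message. The helper name and the `DEMO_*` constants are hypothetical, and the actual assembly may test the condition differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* 4K page, as in the commit message */
#define DEMO_CHUNK     16u    /* bytes read by one unaligned load */

/* Returns true if reading DEMO_CHUNK bytes starting at P would touch the
   next page, in which case a speculative 16-byte load could fault. */
static bool demo_read_crosses_page(const void *p)
{
  uintptr_t offset = (uintptr_t) p & (DEMO_PAGE_SIZE - 1);
  return offset > DEMO_PAGE_SIZE - DEMO_CHUNK;
}
```

When the check is true, a string routine would typically fall back to a slower path that reads only up to the page boundary before continuing.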
2015-01-13  powerpc: Optimized strcat for POWER8/PPC64  Adhemerval Zanella
With new optimized strcpy for POWER8, this patch adds an optimized strcat which uses it along with default implementation at strings/.
2015-01-13  powerpc: Optimized st{r,p}cpy for POWER8/PPC64  Adhemerval Zanella
This patch adds an optimized POWER8 strcpy using unaligned accesses. For strings up to 16 bytes, the implementation first calculates the string size, like strlen, and issues a memcpy. For larger strings, the source is first aligned to 16 bytes and then tested in a loop that reads 16 bytes and combines the cmpb results for speedup. A special case is added for page-crossing reads. It shows a 30%-60% improvement over the optimized POWER7 one, which uses only aligned accesses.
2014-12-02  powerpc: Add powerpc64 strpbrk optimization  Adhemerval Zanella
This patch makes the POWER7 optimized strpbrk generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 does not change.
2014-12-02  powerpc: Add powerpc64 strcspn optimization  Adhemerval Zanella
This patch makes the POWER7 optimized strcspn generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 does not change.
2014-12-02  powerpc: Add powerpc64 strspn optimization  Adhemerval Zanella
This patch makes the POWER7 optimized strspn generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 machines does not change.
2014-09-10  PowerPC: memset optimization for POWER8/PPC64  Adhemerval Zanella
This patch adds an optimized memset implementation for POWER8. For sizes from 0 to 255 bytes, a word/doubleword algorithm similar to the POWER7 optimized one is used. For sizes larger than 255, two strategies are used:
1. If the constant is different from 0, the memory is written with altivec vector instructions;
2. If the constant is 0, dcbz instructions are used. The loop is unrolled to clear 512 bytes at a time.
Using vector instructions increases throughput considerably, doubling performance for sizes larger than 1024. The dcbz loop unrolling also shows a performance improvement, doubling throughput for sizes larger than 8192 bytes.
2014-09-10  PowerPC: multiarch bzero cleanup for PPC64  Adhemerval Zanella
This patch cleans up the multiarch bzero for powerpc64 by removing the multiarch objects and using instead the embedded memset implementation present in each multiarch optimization. The generated code is essentially the same, except for the TB_TOCLESS (which is not essential).
2014-07-07  PowerPC: optimized memmove for POWER7/PPC64  Adhemerval Zanella
This patch adds an optimized memmove for POWER7/powerpc64. The idea is to use the POWER7 memcpy on non-overlapping memory regions and an optimized backward memcpy for memory regions that overlap (similar to the idea of string/memmove.c). The backward memcpy algorithm is similar to the one used for the POWER7 memcpy, with adjustments for alignment. The difference is that memory is always aligned to 16 bytes before using VSX/altivec instructions.
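The forward/backward decision can be sketched in plain C. This is a byte-at-a-time illustration of the direction choice only, with a hypothetical name `demo_memmove`. It does not reproduce the POWER7 alignment handling or the VSX/altivec copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise sketch: copy forward unless dst lies inside [src, src + n),
   in which case copy backward so source bytes are read before they are
   overwritten. */
void *demo_memmove(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Unsigned subtraction wraps when dst < src, so a single comparison
     covers both "dst below src" and "dst at or past the end of src". */
  if ((uintptr_t) d - (uintptr_t) s >= n)
    {
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    }
  else
    {
      for (size_t i = n; i > 0; i--)
        d[i - 1] = s[i - 1];
    }
  return dst;
}
```

In an optimized implementation the forward branch would be the place to reuse an existing memcpy, and only the backward branch needs dedicated code.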
2014-07-02  PowerPC: strcat optimization for PPC64/POWER7  Vidya Ranganathan
This patch adds an ifunc power7 strcat symbol that uses the logic in sysdeps/powerpc/strcat.c but calls the power7 strlen/strcpy symbols instead of the default ones.
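A strcat built from strlen and strcpy can be as short as the sketch below. It is an illustration of the composition idea, not a copy of sysdeps/powerpc/strcat.c, and `demo_strcat` is a hypothetical name. In the patch the two calls would resolve to the power7 variants rather than the default ones.

```c
#include <assert.h>
#include <string.h>

/* Find the end of dst with strlen, then append src with strcpy.
   The caller must ensure dst has room for the result. */
char *demo_strcat(char *dst, const char *src)
{
  strcpy(dst + strlen(dst), src);
  return dst;
}
```

With this shape, any speedup in strlen or strcpy carries over to strcat without a separate hand-written routine.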
2014-06-11  PowerPC: Optimized strcmp for PPC64/POWER7  Vidya Ranganathan
Optimization is achieved on 8 byte aligned strings with double word comparison using cmpb instruction. On unaligned strings loop unrolling is applied for Power7 gain.
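The cmpb instruction compares two registers byte by byte and sets each result byte to 0xFF where the inputs match and 0x00 where they differ. The sketch below emulates that behaviour in portable C to show how one doubleword comparison can detect a NUL byte. The function names are hypothetical, and the real code uses the instruction directly.

```c
#include <assert.h>
#include <stdint.h>

/* Portable emulation of cmpb: result byte i is 0xFF when byte i of A
   equals byte i of B, and 0x00 otherwise. */
static uint64_t demo_cmpb(uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;
  uint64_t result = 0;

  for (int i = 0; i < 8; i++)
    {
      uint64_t byte_mask = (uint64_t) 0xff << (8 * i);
      if ((diff & byte_mask) == 0)
        result |= byte_mask;
    }
  return result;
}

/* A doubleword contains a NUL byte exactly when comparing it against
   zero marks at least one byte as equal. */
static int demo_has_zero_byte(uint64_t w)
{
  return demo_cmpb(w, 0) != 0;
}
```

A doubleword strcmp loop can use the same primitive twice per iteration: once against zero to find the terminator and once between the two inputs to find the first mismatch.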
2014-05-06  PowerPC: strncpy/stpncpy optimization for PPC64/POWER7  Vidya Ranganathan
The optimization is achieved by the following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding [gain by reduction of branch penalty].
> zero padding done by calling optimized memset
2014-03-20  PowerPC: optimized strpbrk for POWER7  Adhemerval Zanella
This patch adds an optimized strpbrk for POWER7 by using a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to what x86_64 uses. For PowerPC some tunings were added, such as unrolled loops and memory clearing using VSX instructions.
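The table-based algorithm can be sketched in portable C as follows. This shows only the lookup-table idea under the assumption of a 256-entry byte table. It omits the loop unrolling and VSX table clearing mentioned above, and `demo_strpbrk` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Mark every byte of accept in a 256-entry table, then return the first
   position in s whose byte is marked, or NULL if there is none. */
char *demo_strpbrk(const char *s, const char *accept)
{
  unsigned char table[256] = { 0 };
  const unsigned char *a = (const unsigned char *) accept;
  const unsigned char *p = (const unsigned char *) s;

  for (; *a != 0; a++)
    table[*a] = 1;

  for (; *p != 0; p++)
    if (table[*p])
      return (char *) p;

  return NULL;
}
```

Each input byte then costs a single table lookup, independent of how many characters the accept set contains.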
2014-03-20  PowerPC: optimized strcspn for PPC64/POWER7  Adhemerval Zanella
This patch adds an optimized strcspn for POWER7 by using a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to what x86_64 uses. For PowerPC some tunings were added, such as unrolled loops and aligning the stack memory for the table to 16 bytes (so the VSX clear can run without alignment issues).
2014-03-11  PowerPC: strspn optimization for PPC64/POWER7  Vidya Ranganathan
The optimization is achieved by the following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
2014-03-10  PowerPC: strncat optimization for PPC64  Adhemerval Zanella
The optimization is achieved by the following techniques:
1. Doubleword aligned memory access and compares using cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
2014-03-03  PowerPC: strrchr optimization for POWER7/PPC64  Rajalakshmi Srinivasaraghavan
This patch optimizes strrchr() for ppc64. It uses aligned memory access along with cmpb instruction and CPU prefetch to avoid cache misses for speed improvement.
2013-12-13  PowerPC: multiarch stpcpy for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch strcpy for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch wordcopy for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch wcscpy for PowerPC64.  Adhemerval Zanella
2013-12-13  PowerPC: multiarch wcsrchr for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch wcschr for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch strchrnul for PowerPC64  Adhemerval Zanella
2013-12-13  PowerPC: multiarch strchr for PowerPC64  Adhemerval Zanella