aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-25Merge release/2.20/master into ibm/2.20/masteribm/2.20/masterGabriel F. T. Gomes
Conflicts: NEWS
2016-05-24CVE-2016-3075: Stack overflow in _nss_dns_getnetbyname_r [BZ #19879]release/2.20/masterFlorian Weimer
The defensive copy is not needed because the name may not alias the output buffer. (cherry picked from commit 317b199b4aff8cfa27f2302ab404d2bb5032b9a4) (cherry picked from commit f5b3338d70a7a2c626331ac4589b6deb2f610432)
2016-05-24Fix BZ #17905Paul Pluzhnikov
(cherry picked from commit 0f58539030e436449f79189b6edab17d7479796e)
2016-05-24hsearch_r: Apply VM size limit in test caseFlorian Weimer
(cherry picked from commit f34f146e682d8d529dcf64b3c2781bf3f2f05f6c)
2016-05-24Improve check against integer wraparound in hcreate_r [BZ #18240]Florian Weimer
(cherry picked from commit bae7c7c764413b23e61cb099ce33be4c4ee259bb)
2016-05-24Handle overflow in __hcreate_rOndřej Bílka
Hi, As in bugzilla entry there is overflow in hsearch when looking for prime number as SIZE_MAX - 1 is divisible by 5. We fix that by rejecting large inputs before looking for prime. * misc/hsearch_r.c (__hcreate_r): Handle overflow. (cherry picked from commit 2f5c1750558fe64bac361f52d6827ab1bcfe52bc)
2016-05-24CVE-2016-1234: glob: Do not copy d_name field of struct dirent [BZ #19779]Florian Weimer
Instead, we store the data we need from the return value of readdir in an object of the new type struct readdir_result. This type is independent of the layout of struct dirent. (cherry picked from commit 5171f3079f2cc53e0548fc4967361f4d1ce9d7ea)
2016-05-24glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdirFlorian Weimer
Previously, application code had to set up the d_namlen member if the target supported it, involving conditional compilation. After this change, glob will use the length of the string in d_name instead of d_namlen to determine the file name length. All glibc targets provide the d_type and d_ino members, and setting them as needed for gl_readdir is straightforward. Changing the behavior with regards to d_ino is left to a future cleanup. (cherry picked from commit 137fe72eca6923a00381a3ca9f0e7672c1f85e3f)
2016-05-24CVE-2016-3706: getaddrinfo: stack overflow in hostent conversion [BZ #20010]Florian Weimer
When converting a struct hostent response to struct gaih_addrtuple, the gethosts macro (which is called from gaih_inet) used alloca, without malloc fallback for large responses. This commit changes this code to use calloc unconditionally. This commit also consolidated a second hostent-to-gaih_addrtuple conversion loop (in gaih_inet) to use the new conversion function. (cherry picked from commit 4ab2ab03d4351914ee53248dc5aef4a8c88ff8b9)
2016-04-28S390: Fix "backtrace() returns infinitely deep stack frames with ↵Stefan Liebler
makecontext()" [BZ #18508]. On s390/s390x backtrace(buffer, size) returns the series of called functions until "makecontext_ret" and additional entries (up to "size") with "makecontext_ret". GDB-backtrace is also warning: "Backtrace stopped: previous frame identical to this frame (corrupt stack?)" To reproduce this scenario you have to setup a new context with makecontext() and activate it with setcontext(). See e.g. cf() function in testcase stdlib/tst-makecontext.c. Or see bug in libgo "Bug 66303 - runtime.Caller() returns infinitely deep stack frames on s390x " (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66303). This patch omits the cfi_startproc/cfi_endproc directives in ENTRY/END macro of __makecontext_ret. Thus no frame information is generated in .eh_frame and backtrace stops after __makecontext_ret. There is also no .eh_frame info for _start or thread_start functions. ChangeLog: [BZ #18508] * stdlib/Makefile ($(objpfx)tst-makecontext3): Depend on $(libdl). * stdlib/tst-makecontext.c (cf): Test if _Unwind_Backtrace is not called infinitely times. (backtrace_helper): New function. (trace_arg): New struct. (st1): Enlarge stack size. * sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S: (__makecontext_ret): Omit cfi_startproc and cfi_endproc. * sysdeps/unix/sysv/linux/s390/s390-64/__makecontext_ret.S: Likewise. (cherry picked from commit 890b7a4b33d482b5c768ab47d70758b80227e9bc)
2016-04-28S/390: Fix setcontext/swapcontext which are not restoring sigmask.Stefan Liebler
This patch uses sigprocmask(SIG_SETMASK) instead of SIG_BLOCK in setcontext, swapcontext. (cherry picked from commit 2e807f29595eb5b1e5d0decc6e356a3562ecc58e)
2016-04-09configure: fix `test ==` usageMike Frysinger
POSIX defines the = operator, but not ==. Fix the few places where we incorrectly used ==. (cherry picked from commit b2d4456b333970ab4cb01ed8045b9a8d2c4832f3)
2016-04-04S390: Extend structs La_s390_regs / La_s390_retval with vector-registers.Stefan Liebler
Starting with z13, vector registers can also occur as argument registers. Thus the passed input/output register structs for la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new registers. This patch extends these structs La_s390_regs and La_s390_retval and adjusts _dl_runtime_profile() to handle those fields in case of running on a z13 machine. ChangeLog: * sysdeps/s390/bits/link.h: (La_s390_vr) New typedef. (La_s390_32_regs): Append vector register lr_v24-lr_v31. (La_s390_64_regs): Likewise. (La_s390_32_retval): Append vector register lrv_v24. (La_s390_64_retval): Likeweise. * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_32_regs and La_s390_32_retval. * sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_64_regs and La_s390_64_retval. (cherry picked from commit 5cdd1989d1d2f135d02e66250f37ba8e767f9772)
2016-04-04S390: Save and restore fprs/vrs while resolving symbols.Stefan Liebler
On s390, no fpr/vrs were saved while resolving a symbol via _dl_runtime_resolve/_dl_runtime_profile. According to the abi, the fpr-arguments are defined as call clobbered. In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs instead of saving them to the stack. If gcc do this in one of the resolver-functions, then the floating point arguments of a library-function are invalid for the first library-function-call. Thus, this patch saves/restores the fprs around the resolving code. The same could occur for vector registers. Furthermore an ifunc-resolver could also clobber the vector/floating point argument registers. Thus this patch provides the further variants _dl_runtime_resolve_vx/ _dl_runtime_profile_vx, which are used if the kernel claims, that we run on a machine with vector registers. Furthermore, if _dl_runtime_profile calls _dl_call_pltexit, the pointers to inregs-/outregs-structs were setup invalid. Now they point to the correct location in the stack-frame. Before branching back to the caller, the return values are now restored instead of containing the return values of the _dl_call_pltexit() call. On s390-32, an endless loop occurs if _dl_call_pltexit() should be called. Now, this code-path branches to this function instead of just after the preceding basr-instruction. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-32/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues if _dl_call_pltexit is called. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx. * sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-64/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues * sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx. (cherry picked from commit 4603c51ef7989d7eb800cdd6f42aab206f891077 and commit d8a012c5c9e4bfc1b8db2bc6deacb85b44a2e1eb)
2016-04-04S390: configure check for vector instruction support in assembler.Stefan Liebler
The S390 specific test checks if the assembler has support for the new z13 vector instructions by compiling a vector instruction. The .machine and .machinemode directives are needed to compile the vector instruction without -march=z13 option on 31/64 bit. On success the macro HAVE_S390_VX_ASM_SUPPORT is defined. This macro is used to determine if the optimized functions can be build without compile errors. If the used assembler lacks vector support, then a warning is dumped while configuring and only the common code functions are build. The z13 instruction support was introduced in "[Committed] S/390: Add support for IBM z13." (https://sourceware.org/ml/binutils/2015-01/msg00197.html) ChangeLog: * config.h.in (HAVE_S390_VX_ASM_SUPPORT): New macro undefine. * sysdeps/s390/configure.ac: Add test for S390 vector instruction assembler support. * sysdeps/s390/configure: Regenerated. (cherry picked from commit 4f0a1cea34c05fb2acc16f1a2d291f53230eb4fb)
2016-04-04S390: Add new s390 platform.Stefan Liebler
The new IBM z13 is added to platform string array. The macro _DL_PLATFORMS_COUNT is incremented to 8, because it was not incremented by commit "S/390: Sync AUXV capabilities and archs with kernel". ChangeLog: * sysdeps/s390/dl-procinfo.c (_dl_s390_cap_flags): Add z13. * sysdeps/s390/dl-procinfo.h (_DL_PLATFORMS_COUNT): Increased. (cherry picked from commit a1b0488fc9df3d895a2e5eefbcd348d3f7fe0e52)
2016-04-04S390: Add hwcaps value for vector facility.Stefan Liebler
The HWCAP_S390_VX flag in hwcap field of auxiliary vector indicates if the vector facility is available and the kernel is aware of it. This can be tested with LD_SHOW_AUXV=1 <prog>. Currently it does not show te, because it was not incremented by commit "S/390: Add hwcap value for transactional execution.". Thus _DL_HWCAP_COUNT is incremented by two. ChangeLog: * sysdeps/s390/dl-procinfo.c (_dl_s390_platforms): Add vector flag. * sysdeps/s390/dl-procinfo.h: Add vector capability. * sysdeps/unix/sysv/linux/s390/bits/hwcap.h (HWCAP_S390_VX): Define. (cherry picked from commit 4e28fa80886c71e6aaf85016b82ce981c0f12e6d)
2016-03-03S390: Do not use direct socket syscalls if build on kernels >= 4.3. [BZ #19682]Stefan Liebler
Beginning with Linux 4.3, the kernel headers contain direct system call numbers __NR_socket etc. on s390x. On older kernels, the socket-multiplexer syscall __NR_socketcall was used. To enable these new syscalls, the patch "S390: Call direct system calls for socket operations." (https://sourceware.org/git/?p=glibc.git;a=commit;h=016495b818cb61df7d0d10e6db54074271b3e3a5) was applied upstream. If glibc 2.23 is configured with --enable-kernel=4.3 and newer, the direct socket syscalls are used. For older kernels, the socket-multiplexer syscall is used instead. In glibc 2.22 and earlier, this patch is not applied. If you build glibc on a kernel < 4.3, the socket-multiplexer syscall is used. But if you build glibc on kernel >= 4.3, the direct socket-syscalls are used. If you install this glibc on a kernel < 4.3, all socket operations will fail. See "Bug 19682 - s390x: Incorrect syscall definitions cause breakage with Linux 4.3 headers" (https://sourceware.org/bugzilla/show_bug.cgi?id=19682) The configure switch --enable-kernel does not influence this behaviour on older glibc-releases. The solution is to remove the direct socket-syscalls in sysdeps/unix/sysv/linux/s390/s390-64/syscalls.list (this patch) on older glibc-releases as it was done by the upstream patch, too. These entries were never used on s390x, but the c-files in sysdeps/unix/sysv/linux/. After this removal, the behaviour of the socket functions are not changed compared to the original glibc release version and the socket-multiplexer-syscall is always used.
2016-02-26Merge branch 'release/2.20/master' into ibm/2.20/masterTulio Magno Quites Machado Filho
2016-02-25CVE-2015-7547: getaddrinfo() stack-based buffer overflow (Bug 18665).Carlos O'Donell
* A stack-based buffer overflow was found in libresolv when invoked from libnss_dns, allowing specially crafted DNS responses to seize control of execution flow in the DNS client. The buffer overflow occurs in the functions send_dg (send datagram) and send_vc (send TCP) for the NSS module libnss_dns.so.2 when calling getaddrinfo with AF_UNSPEC family. The use of AF_UNSPEC triggers the low-level resolver code to send out two parallel queries for A and AAAA. A mismanagement of the buffers used for those queries could result in the response of a query writing beyond the alloca allocated buffer created by _nss_dns_gethostbyname4_r. Buffer management is simplified to remove the overflow. Thanks to the Google Security Team and Red Hat for reporting the security impact of this issue, and Robert Holiday of Ciena for reporting the related bug 18665. (CVE-2015-7547) See also: https://sourceware.org/ml/libc-alpha/2016-02/msg00416.html https://sourceware.org/ml/libc-alpha/2016-02/msg00418.html (cherry picked from commit 16d0a0ce7613552301786bf05d7eba8784b5732c) Conflicts: NEWS resolv/res_send.c
2016-02-25Fix read past end of pattern in fnmatch (bug 18032)Andreas Schwab
(cherry picked from commit 4a28f4d55a6cc33474c0792fe93b5942d81bf185) Conflicts: NEWS
2016-02-25Fix BZ #17269 -- _IO_wstr_overflow integer overflowPaul Pluzhnikov
(cherry picked from commit bdf1ff052a8e23d637f2c838fa5642d78fcedc33) Conflicts: NEWS
2016-02-25Harden tls_dtor_list with pointer mangling [BZ #19018]Florian Weimer
(cherry picked from commit f586e1328681b400078c995a0bb6ad301ef73549) Conflicts: NEWS stdlib/cxa_thread_atexit_impl.c
2016-02-25Always enable pointer guard [BZ #18928]Florian Weimer
Honoring the LD_POINTER_GUARD environment variable in AT_SECURE mode has security implications. This commit enables pointer guard unconditionally, and the environment variable is now ignored. [BZ #18928] * sysdeps/generic/ldsodefs.h (struct rtld_global_ro): Remove _dl_pointer_guard member. * elf/rtld.c (_rtld_global_ro): Remove _dl_pointer_guard initializer. (security_init): Always set up pointer guard. (process_envvars): Do not process LD_POINTER_GUARD. (cherry picked from commit a014cecd82b71b70a6a843e250e06b541ad524f7) Conflicts: NEWS
2015-09-01powerpc: Fix tabort usage in syscallsPaul E. Murphy
Fix usage of tabort in generated syscalls. r0 has special meaning when used with this instruction, thus it will not generate persistent errors, nor return an error code. This mitigates poor CPU usage when performing elided critical sections. Additionally, transactions should be aborted when entering a user invoked syscall. Otherwise the results of the transaction may be undefined. 2015-08-25 Paul E. Murphy <murphyp@linux.vnet.ibm.com> * sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION): Use register other than r0 for tabort, it has special meaning. * sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION): Likewise * sysdeps/unix.sysv/linux/powerpc/syscall.S (syscall): Abort transaction before starting syscall. (cherry picked from commit 18173559a23e28055640b152e623d9f0d40ecca8)
2015-09-01powerpc: Revert to default atomic ops in elision codePaul E. Murphy
Power ISA 2.07B section B.5.5 relaxed the barrier requirement around a TLE enabled lock. It is now identical to a traditional lock. 2015-08-26 Paul E. Murphy <murphyp@linux.vnet.ibm.com> * sysdeps/unix/sysv/linux/powerpc/elision-lock.c (__arch_compare_and_exchange_val_32_acq): Remove and use common definition. ISA 2.07B no longer requires full sync. (cherry picked from commit 6eb901de9b8c3a582ec2a5fc9a2223f326800812)
2015-07-20sparc: fix sigaction for 32bit builds [BZ #18694]Mike Frysinger
Commit a059d359d86130b5fa74e04a978c8523a0293f77 changed the sigaction struct to pass conform tests, but it ended up also changing the ABI for 32 bit builds. For 64 bit builds, changing the long to two ints works, but for 32 bit builds, it inserts 4 extra bytes. This leads to many packages randomly failing like bash that spews things like: configure: line 471: wait_for: No record of process 0 Bracket the new member by a wordsize check to fix the ABI for 32bit. (cherry picked from commit 7fde904c73c57faea48c9679bbdc0932d81b3a2f)
2015-05-25Separate internal state between getXXent and getXXbyYY NSS calls (bug 18007)Andreas Schwab
Conflicts: NEWS
2015-05-25CVE-2014-8121: Do not close NSS files database during iteration [BZ #18007]Florian Weimer
Robin Hack discovered Samba would enter an infinite loop processing certain quota-related requests. We eventually tracked this down to a glibc issue. Running a (simplified) test case under strace shows that /etc/passwd is continuously opened and closed: … open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 lseek(3, 0, SEEK_CUR) = 0 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2717 lseek(3, 2717, SEEK_SET) = 2717 close(3) = 0 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_SET) = 0 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2717 lseek(3, 2717, SEEK_SET) = 2717 close(3) = 0 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 lseek(3, 0, SEEK_CUR) = 0 … The lookup function implementation in nss/nss_files/files-XXX.c:DB_LOOKUP has code to prevent that. It is supposed skip closing the input file if it was already open. /* Reset file pointer to beginning or open file. */ \ status = internal_setent (keep_stream); \ \ if (status == NSS_STATUS_SUCCESS) \ { \ /* Tell getent function that we have repositioned the file pointer. */ \ last_use = getby; \ \ while ((status = internal_getent (result, buffer, buflen, errnop \ H_ERRNO_ARG EXTRA_ARGS_VALUE)) \ == NSS_STATUS_SUCCESS) \ { break_if_match } \ \ if (! keep_stream) \ internal_endent (); \ } \ keep_stream is initialized from the stayopen flag in internal_setent. internal_setent is called from the set*ent implementation as: status = internal_setent (stayopen); However, for non-host database, this flag is always 0, per the STAYOPEN magic in nss/getXXent_r.c. Thus, the fix is this: - status = internal_setent (stayopen); + status = internal_setent (1); This is not a behavioral change even for the hosts database (where the application can specify the stayopen flag) because with a call to sethostent(0), the file handle is still not closed in the implementation of gethostent. Conflicts: NEWS
2015-04-22CVE-2015-1781: resolv/nss_dns/dns-host.c buffer overflow [BZ#18287]Arjun Shankar
Conflicts: NEWS
2015-04-16Merge release/2.20/master into ibm/2.20/masterTulio Magno Quites Machado Filho
Conflicts: NEWS stdio-common/Makefile
2015-03-13powerpc: Fix incorrect results for pow when using FMAAdhemerval Zanella
This patch adds no FMA generation for e_pow to avoid precision issues for powerpc. This fixes BZ#18104.
2015-02-23CVE-2015-1472: wscanf allocates too little memoryPaul Pluzhnikov
BZ #16618 Under certain conditions wscanf can allocate too little memory for the to-be-scanned arguments and overflow the allocated buffer. The implementation now correctly computes the required buffer size when using malloc. A regression test was added to tst-sscanf. Conflicts: ChangeLog NEWS
2015-02-16CVE-2015-1472: wscanf allocates too little memoryPaul Pluzhnikov
BZ #16618 Under certain conditions wscanf can allocate too little memory for the to-be-scanned arguments and overflow the allocated buffer. The implementation now correctly computes the required buffer size when using malloc. A regression test was added to tst-sscanf. (cherry picked from commit 5bd80bfe9ca0d955bfbbc002781bc7b01b6bcb06) Conflicts: ChangeLog NEWS
2015-02-16Use AVX unaligned memcpy only if AVX2 is availableH.J. Lu
memcpy with unaligned 256-bit AVX register loads/stores are slow on older processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available. [BZ #17801] * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Set the bit_AVX_Fast_Unaligned_Load bit for AVX2. * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load): New. (index_AVX_Fast_Unaligned_Load): Likewise. (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD. * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise. (cherry picked from commit 5f3d0b78e011d2a72f9e88b0e9ef5bc081d18f97) Conflicts: ChangeLog NEWS
2015-02-16Fix memory handling in strxfrm_l [BZ #16009]Leonhard Holz
[Modified from the original email by Siddhesh Poyarekar] This patch solves bug #16009 by implementing an additional path in strxfrm that does not depend on caching the weight and rule indices. In detail the following changed: * The old main loop was factored out of strxfrm_l into the function do_xfrm_cached to be able to alternativly use the non-caching version do_xfrm. * strxfrm_l allocates a a fixed size array on the stack. If this is not sufficiant to store the weight and rule indices, the non-caching path is taken. As the cache size is not dependent on the input there can be no problems with integer overflows or stack allocations greater than __MAX_ALLOCA_CUTOFF. Note that malloc-ing is not possible because the definition of strxfrm does not allow an oom errorhandling. * The uncached path determines the weight and rule index for every char and for every pass again. * Passing all the locale data array by array resulted in very long parameter lists, so I introduced a structure that holds them. * Checking for zero src string has been moved a bit upwards, it is before the locale data initialization now. * To verify that the non-caching path works correct I added a test run to localedata/sort-test.sh & localedata/xfrm-test.c where all strings are patched up with spaces so that they are too large for the caching path. (cherry picked from commit 0f9e585480edcdf1e30dc3d79e24b84aeee516fa) Conflicts: ChangeLog NEWS
2015-02-16Move findidx nested functions to top-level.Roland McGrath
Needed in order to backport strxfrm_l security fix cleanly. (cherry picked from commit 8c0ab919f63dc03a420751172602a52d2bea59a8) Conflicts: ChangeLog
2015-02-12powerpc: Fix TABORT encoding for little endianAdhemerval Zanella
This patch fix the TABORT encoding for toolchains with no support for HTM builtins.
2015-01-20powerpc: abort transaction in syscallsAdhemerval Zanella
Linux kernel powerpc documentation states issuing a syscall inside a transaction is not recommended and may lead to undefined behavior. It also states syscalls does not abort transactoin neither they run in transactional state. To avoid side-effects being visible outside transactions, GLIBC with lock elision enabled will issue a transaction abort instruction just before all syscalls if hardware supports hardware transactions.
2015-01-20powerpc: Add adaptive elision to rwlocksAdhemerval Zanella
This patch adds support for lock elision using ISA 2.07 hardware transactional memory for rwlocks. The logic is similar to the one presented in pthread_mutex lock elision.
2015-01-20powerpc: Add the lock elision using HTMAdhemerval Zanella
This patch adds support for lock elision using ISA 2.07 hardware transactional memory instructions for pthread_mutex primitives. Similar to s390 version, the for elision logic defined in 'force-elision.h' is only enabled if ENABLE_LOCK_ELISION is defined. Also, the lock elision code should be able to be built even with a compiler that does not provide HTM support with builtins. However I have noted the performance is sub-optimal due scheduling pressures. Conflicts: ChangeLog NEWS sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
2015-01-14powerpc: Fix POWER7/PPC64 performance regression on LEAdhemerval Zanella
This patch fixes a performance regression on the POWER7/PPC64 memcmp porting for Little Endian. The LE code uses 'ldbrx' instruction to read the memory on byte reversed form, however ISA 2.06 just provide the indexed form which uses a register value as additional index, instead of a fixed value enconded in the instruction. And the port strategy for LE uses r0 index value and update the address value on each compare loop interation. For large compare size values, it adds 8 more instructions plus some more depending of trailing size. This patch fixes it by adding pre-calculate indexes to remove the address update on loops and tailing sizes. For large sizes it shows a considerable gain, with double performance pairing with BE.
2015-01-14powerpc: Optimized strncmp for POWER8/PPC64Adhemerval Zanella
This patch adds an optimized POWER8 strncmp. The implementation focus on speeding up unaligned cases follwing the ideas of power8 strcmp. The algorithm first check the initial 16 bytes, then align the first function source and uses unaligned loads on second argument only. Aditional checks for page boundaries are done for unaligned cases (where sources alignment are different).
2015-01-14powerpc: Optimize POWER7 strcmp trailing checksRajalakshmi Srinivasaraghavan
This patch optimized the POWER7 trailing check by avoiding using byte read operations and instead use the doubleword already readed with bitwise operations.
2015-01-14powerpc: Optimized strcmp for POWER8/PPC64Adhemerval Zanella
This patch adds an optimized POWER8 strcmp using unaligned accesses. The algorithm first check the initial 16 bytes, then align the first function source and uses unaligned loads on second argument only. Aditional checks for page boundaries are done for unaligned cases
2015-01-14powerpc: Optimized st{r,p}ncpy for POWER8/PPC64Adhemerval Zanella
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses. It shows 10%-80% improvement over the optimized POWER7 one that uses only aligned accesses, specially on unaligned inputs. The algorithm first read and check 16 bytes (if inputs do not cross a 4K page size). The it realign source to 16-bytes and issue a 16 bytes read and compare loop to speedup null byte checks for large strings. Also, different from POWER7 optimization, the null pad is done inline in the implementation using possible unaligned accesses, instead of realying on a memset call. Special case is added for page cross reads.
2015-01-14powerpc: Optimized strncat for POWER7/PPC64Adhemerval Zanella
With 3eb38795dbbbd816 (Simplify strncat) the generic algorithms uses strlen, strnlen, and memcpy. This is faster than POWER7 current implementation, especially for unaligned strings (where POWER7 code uses byte-byte operations). This patch removes the assembly implementation and uses a multiarch specialization based on default algorithm calling optimized POWER7 symbols.
2015-01-14powerpc: Optimized strcat for POWER8/PPC64Adhemerval Zanella
With new optimized strcpy for POWER8, this patch adds an optimized strcat which uses it along with default implementation at strings/.
2015-01-14powerpc: Optimized st{r,p}cpy for POWER8/PPC64Adhemerval Zanella
This patch adds an optimized POWER8 strcpy using unaligned accesses. For strings up to 16 bytes the implementation first calculate the string size, like strlen, and issues a memcpy. For larger strings, source is first aligned to 16 bytes and then tested over a loop that reads 16 bytes am combine the cmpb results for speedup. Special case is added for page cross reads. It shows 30%-60% improvement over the optimized POWER7 one that uses only aligned accesses.
2015-01-14powerpc: POWER7 strcpy optimization for unaligned stringsRajalakshmi Srinivasaraghavan
This patch optimizes strcpy for ppc64/power7 for unaligned source or destination address. The source or destination address is aligned to doubleword and data is shifted based on the alignment and added with the previous loaded data to be written as a doubleword. For each load, cmpb instruction is used for faster null check. The word aligned optimization is also removed, since the new unaligned code path shows better results handling word-aligned strings. More combination of unaligned inputs is also added in benchtest to measure the improvement.The new optimization shows 2 to 80% of performance improvement for longer string though it does not show big difference on string size less than 16 due to additional checks.