aboutsummaryrefslogtreecommitdiff
path: root/sysdeps
AgeCommit message (Collapse)Author
2016-05-24S390: Do not call memcpy, memcmp, memset within libc.so via ifunc-plt.Stefan Liebler
On s390, the memcpy, memcmp, memset functions are IFUNC symbols, which are created with s390_libc_ifunc-macro. This macro creates a __GI_ symbol which is set to the ifunced symbol. Thus calls within libc.so to e.g. memcpy result in a call to *ABS*+0x954c0@plt stub and afterwards to the resolved memcpy-ifunc-variant. This patch sets the __GI_ symbol to the default-ifunc-variant to avoid the plt call. The __GI_ symbols are now created at the default variant of ifunced function. ChangeLog: * sysdeps/s390/multiarch/ifunc-resolve.h (s390_libc_ifunc): Remove __GI_ symbol. * sysdeps/s390/s390-32/multiarch/memcmp-s390.S: Add __GI_memcmp symbol. * sysdeps/s390/s390-64/multiarch/memcmp-s390x.S: Likewise. * sysdeps/s390/s390-32/multiarch/memcpy-s390.S: Add __GI_memcpy symbol. * sysdeps/s390/s390-64/multiarch/memcpy-s390x.S: Likewise. * sysdeps/s390/s390-32/multiarch/memset-s390.S: Add __GI_memset symbol. * sysdeps/s390/s390-64/multiarch/memset-s390x.S: Likewise.
2016-05-24S390: Use 64bit instruction to check for copies of > 1MB with mvcle.Stefan Liebler
The __memcpy_default variant on s390 64bit calculates the number of 256byte blocks in a 64bit register and checks, if they exceed 1MB to jump to mvcle. Otherwise a mvc-loop is used. The compare-instruction only checks a 32bit value. This patch uses a 64bit compare. ChangeLog: * sysdeps/s390/s390-64/memcpy.S (memcpy): Use cghi instead of chi to compare 64bit value.
2016-05-24S390: Use mvcle for copies > 1MB on 32bit with default memcpy variant.Stefan Liebler
If more than 255 bytes should be copied, the algorithm jumps away. Before this patch, it jumps to the mvc-loop (.L_G5_12). Now it jumps first to the "> 1MB" check, which jumps away to __memcpy_mvcle. Otherwise, the mvc-loop (.L_G5_12) copies the bytes. ChangeLog: * sysdeps/s390/s390-32/memcpy.S (memcpy): Jump to 1MB check before executing mvc-loop.
2016-05-23Make padding in struct sockaddr_storage explicit [BZ #20111]Florian Weimer
This avoids aliasing issues with GCC 6 in -fno-strict-aliasing mode. (With implicit padding, not all data is copied.) This change makes it explicit that struct sockaddr_storage is only 126 bytes large on m68k (unlike elsewhere, where we end up with the requested 128 bytes). The new test case makes sure that this does not happen on other architectures.
2016-05-23Update sysdeps/unix/sysv/linux/bits/socket.h for Linux 4.6.Joseph Myers
This patch updates sysdeps/unix/sysv/linux/bits/socket.h for new constants added in Linux 4.6. AF_KCM / PF_KCM are added. SOL_KCM is new, and I added a lot of SOL_* values postdating the last one present in the header, since I saw no apparent reason for the set in glibc to stop at SOL_IRDA. MSG_BATCH is added; Linux also has MSG_SENDPAGE_NOTLAST which is not in glibc, but given the comment starts "sendpage() internal" I presume it's correct for it not to be in glibc. (Note that this is a case where the Linux kernel header with userspace relevant values is *not* a uapi header but include/linux/socket.h - I don't know why, but at least this header, as well as uapi headers, needs reviewing for glibc-relevant changes each release.) Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by the patch). * sysdeps/unix/sysv/linux/bits/socket.h (PF_KCM): New macro. (PF_MAX): Update value. (AF_KCM): New macro. (SOL_NETBEUI): Likewise. (SOL_LLC): Likewise. (SOL_DCCP): Likewise. (SOL_NETLINK): Likewise. (SOL_TIPC): Likewise. (SOL_RXRPC): Likewise. (SOL_PPPOL2TP): Likewise. (SOL_BLUETOOTH): Likewise. (SOL_PNPIPE): Likewise. (SOL_RDS): Likewise. (SOL_IUCV): Likewise. (SOL_CAIF): Likewise. (SOL_ALG): Likewise. (SOL_NFC): Likewise. (SOL_KCM): Likewise. (MSG_BATCH): New enum value and macro.
2016-05-20Remove special L2 cache case for Knights LandingH.J. Lu
L2 cache is shared by 2 cores on Knights Landing, which has 4 threads per core: https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Landing So L2 cache is shared by 8 threads on Knights Landing as reported by CPUID. We should remove special L2 cache case for Knights Landing. [BZ #18185] * sysdeps/x86/cacheinfo.c (init_cacheinfo): Don't limit threads sharing L2 cache to 2 for Knights Landing.
2016-05-19Implement proper fmal for ldbl-128ibm (bug 13304).Joseph Myers
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in most cases, with no attempt at actually being a fused operation. This patch replaces it with a genuine fused operation. It is not necessarily correctly rounding, but should produce a result at least as accurate as the long double arithmetic operations in libgcc, which I think is all that can reasonably be expected for such a non-IEEE format where arithmetic is approximate rather than rounded according to any particular rule for determining the exact result. Like the libgcc arithmetic, it may produce spurious overflow and underflow results, and it falls back to the libgcc multiplication in the case of (finite, finite, zero). This concludes the fixes for bug 13304; any subsequently found fma issues should go in separate Bugzilla bugs. Various other pieces of bug 13304 were fixed in past releases over the past several years. Tested for powerpc. [BZ #13304] * sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>, <float.h>, <math_private.h> and <stdlib.h>. (add_split): New function. (mul_split): Likewise. (ext_val): New typedef. (store_ext_val): New function. (mul_ext_val): New function. (compare): New function. (add_split_ext): New function. (__fmal): After checking for Inf, NaN and zero, compute result as an exact sum of scaled double values in round-to-nearest before adding those up and adjusting for other rounding modes. * math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from tests of fma. * math/auto-libm-test-out: Regenerated.
2016-05-19Correct Intel processor level type mask from CPUIDH.J. Lu
Intel CPUID with EAX == 11 returns: ECX Bits 07 - 00: Level number. Same value in ECX input. Bits 15 - 08: Level type. ^^^^^^^^^^^^^^^^^^^^^^^^ This is level type. Bits 31 - 16: Reserved. Intel processor level type mask should be 0xff00, not 0xff0. [BZ #20119] * sysdeps/x86/cacheinfo.c (init_cacheinfo): Correct Intel processor level type mask for CPUID with EAX == 11.
2016-05-19Check the HTT bit before counting logical threadsH.J. Lu
Skip counting logical threads for Intel processors if the HTT bit is 0 which indicates there is only a single logical processor. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Skip counting logical threads if the HTT bit is 0. * sysdeps/x86/cpu-features.h (bit_cpu_HTT): New. (index_cpu_HTT): Likewise. (reg_HTT): Likewise.
2016-05-19Remove alignments on jump targets in memsetH.J. Lu
X86-64 memset-vec-unaligned-erms.S aligns many jump targets, which increases code sizes, but not necessarily improve performance. As memset benchtest data of align vs no align on various Intel and AMD processors https://sourceware.org/bugzilla/attachment.cgi?id=9277 shows that aligning jump targets isn't necessary. [BZ #20115] * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset): Remove alignments on jump targets.
2016-05-18Don't call internal _Unwind_Resume via PLTH.J. Lu
There is no need to call the internal funtion, _Unwind_Resume, which is defined in unwind-forcedunwind.c, via PLT. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S (__condvar_cleanup2): Remove JUMPTARGET from _Unwind_Resume call. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S (__condvar_cleanup1): Likewise.
2016-05-18Don't call internal __pthread_unwind via PLTH.J. Lu
Add PTHREAD_UNWIND to replace JUMPTARGET(__pthread_unwind) and define it to __GI___pthread_unwind within libpthread. * sysdeps/unix/sysv/linux/x86_64/cancellation.S (PTHREAD_UNWIND): New (__pthread_unwind): Renamed to ... (PTHREAD_UNWIND): This. (__pthread_enable_asynccancel): Replace JUMPTARGET(__pthread_unwind) with PTHREAD_UNWIND.
2016-05-18Add CLONE_NEWCGROUP from Linux 4.6 to bits/sched.h.Joseph Myers
This patch adds CLONE_NEWCGROUP, new in Linux 4.6, to sysdeps/unix/sysv/linux/bits/sched.h. Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by the patch). * sysdeps/unix/sysv/linux/bits/sched.h [__USE_GNU] (CLONE_NEWCGROUP): New macro.
2016-05-18Add Q_GETNEXTQUOTA from Linux 4.6 to sys/quota.h.Joseph Myers
This patch adds Q_GETNEXTQUOTA, new in Linux 4.6, to sysdeps/unix/sysv/linux/sys/quota.h. Tested for x86_64 and x86 (testsuite, and that installed shared libraries are unchanged by the patch). * sysdeps/unix/sysv/linux/sys/quota.h [_LINUX_QUOTA_VERSION >= 2] (Q_GETNEXTQUOTA): New macro.
2016-05-13Call init_cpu_features only if SHARED is definedH.J. Lu
In static executable, since init_cpu_features is called early from __libc_start_main, there is no need to call it again in dl_platform_init. [BZ #20072] * sysdeps/i386/dl-machine.h (dl_platform_init): Call init_cpu_features only if SHARED is defined. * sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
2016-05-13Support non-inclusive caches on Intel processorsH.J. Lu
* sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and support non-inclusive caches on Intel processors.
2016-05-12This is an optimized memset for AArch64. Memset is split into 4 main cases:Wilco Dijkstra
small sets of up to 16 bytes, medium of 16..96 bytes which are fully unrolled. Large memsets of more than 96 bytes align the destination and use an unrolled loop processing 64 bytes per iteration. Memsets of zero of more than 256 use the dc zva instruction, and there are faster versions for the common ZVA sizes 64 or 128. STP of Q registers is used to reduce codesize without loss of performance. The speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53. * sysdeps/aarch64/memset.S (__memset): Rewrite of optimized memset.
2016-05-12Increase fork signal safety for single-threaded processes [BZ #19703]Florian Weimer
This provides a band-aid and addresses the scenario where fork is called from a signal handler while the process is in the malloc subsystem (or has acquired the libio list lock). It does not address the general issue of async-signal-safety of fork; multi-threaded processes are not covered, and some glibc subsystems have fork handlers which are not async-signal-safe.
2016-05-12getaddrinfo: Convert from extend_alloca to struct scratch_bufferFlorian Weimer
2016-05-11S390: Use fPIC to avoid R_390_GOT12 relocation in gcrt1.o.Stefan Liebler
if glibc is build with -march=z900 | -march=z990, the startup file gcrt1.o (used if you link with gcc -pg) contains R_390_GOT12 | R_390_GOT20 relocations. Thus, an entry in the GOT can be addressed relative to the GOT pointer with a 12 | 20 bit displacement value. The startup files should not contain R_390_GOT12, R_390_GOT20 relocations, but R_390_GOTENT ones. This patch removes the overrides of pic-ccflag and the default pic-ccflag = -fPIC in Makeconfig is used instead to get the R_390_GOTENT relocations in gcrt1.o. ChangeLog: * sysdeps/s390/s390-32/Makefile (pic-ccflag): Remove. * sysdeps/s390/s390-64/Makefile: Likewise.
2016-05-11Remove x86 ifunc-defines.sym and rtld-global-offsets.symH.J. Lu
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym. Remove x86 ifunc-defines.sym and rtld-global-offsets.sym. No code changes on i686 and x86-64. * sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers): Remove ifunc-defines.sym. * sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers): Likewise. * sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise. * sysdeps/x86/Makefile (gen-as-const-headers): Remove rtld-global-offsets.sym. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ... * sysdeps/x86/cpu-features-offsets.sym: This. * sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h> instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
2016-05-10getaddrinfo: Restore RES_USE_INET6 flag on error path [BZ #19994]Florian Weimer
2016-05-09S390: Add support for vdso getcpu symbol.Stefan Liebler
This patch adds support for symbol __kernel_getcpu in vDSO, which is available with kernel 4.5. Now sched_getcpu is using this symbol if available in mapped vDSO by defining macro HAVE_GETCPU_VSYSCALL. If not available at runtime, the former syscall is used.
2016-05-08Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86H.J. Lu
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86. No code changes on x86 and x86_64. * sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c> instead of <sysdeps/x86_64/cacheinfo.c>. * sysdeps/x86_64/cacheinfo.c: Moved to ... * sysdeps/x86/cacheinfo.c: Here.
2016-05-04Revert "aio: fix newp->running data race"Samuel Thibault
This reverts commit fd67a9cf7b733da082e4b6a5f25c19ea7921b4cd.
2016-05-04aio: fix newp->running data raceSamuel Thibault
* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write `running` field of `newp` when a thread was started to process it, since that thread will not take `__aio_requests_mutex`, and the field already has the proper value actually.
2016-05-04powerpc: Fix operand prefixesGabriel F. T. Gomes
The file sysdeps/powerpc/sysdeps.h defines aliases for condition register operands. E.g.: 'cr7' means condition register 7. On the one hand, this increases readability, as it makes it easier for readers to know whether the operand is a condition register, a general purpose register or an immediate. On the other hand, this permits that condition registers be written as if they were general purpose, and vice-versa, thus reducing the readability of the code. This commit removes some of these unintentional misuses. The changes have no effect on the final code. Checked with objdump.
2016-05-04CVE-2016-1234: glob: Do not copy d_name field of struct dirent [BZ #19779]Florian Weimer
Instead, we store the data we need from the return value of readdir in an object of the new type struct readdir_result. This type is independent of the layout of struct dirent.
2016-05-03powerpc: Add missing insn in swapcontext [BZ #20004]Paul E. Murphy
A missing instruction was discovered in the compat version of swapcontext while running the GCC test suite.
2016-05-02powerpc: Fix clone CLONE_VM compareAdhemerval Zanella
This patch fixes the clone CLONE_VM change from 0cb313f (BZ#19957) where the commit changed the register that contains the save flags argument to compare with (from r28 to r29). This patch changes back to correct register. Tested on powerpc32 (thanks to Tulio Magno Quites Machado Filho). * sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone): Fix flags CLONE_VM compare.
2016-04-30m68k: use large PIC model for gcrt1.oAndreas Schwab
2016-04-30m68k: avoid local labels in symbol tableAndreas Schwab
2016-04-29Fix clone (CLONE_VM) pid/tid reset (BZ#19957)Adhemerval Zanella
As discussed in libc-alpha [1] current clone with CLONE_VM (without CLONE_THREAD set) will reset the pthread pid/tid fields to -1. The issue is since memory is shared between the parent and child it will clobber parent's cached pid/tid leading to internal inconsistencies if the value is not restored. And even it is restored it may lead to racy conditions when between set/restore a thread might invoke pthread function that validate the pthread with INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P and thus get wrong results. As stated in BZ19957, previously reports of this behaviour was close with EWONTFIX due the fact usage of clone outside glibc is tricky since glibc requires consistent internal pthread, while using clone directly may not provide it. However since now posix_spawn uses clone (CLONE_VM) to fixes various issues related to previous vfork usage this issue requires fixing. The vfork implementation also does something similar, but instead it negates and restores only the *pid* field and functions that might access its value know to handle such case (getpid, raise and pthread ones that uses INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P macros that check only *tid* field). Also vfork does not call __clone directly, instead calling either __NR_vfork or __NR_clone directly. So this patch removes this clone behavior by avoiding setting the pthread pid/tid field for CLONE_VM. There is no need to check for CLONE_THREAD, since the minimum supported kernel in all architecture implies that CLONE_VM must be used with CLONE_THREAD, otherwise clone returns EINVAL. Instead of current approach of: int clone(int (*fn)(void *), void *child_stack, int flags, ...) [...] if (flags & CLONE_THREAD) goto do_syscall; pid_t new_value; if (flags & CLONE_VM) new_value = -1; else new_value = getpid (); THREAD_SETMEM (THREAD_SELF, pid, new_value); THREAD_SETMEM (THREAD_SELF, tid, new_value); do_syscall: [...] The new approach uses: int clone(int (*fn)(void *), void *child_stack, int flags, ...) [...] if (flags & CLONE_VM) goto do_syscall; pid_t new_value = getpid (); THREAD_SETMEM (THREAD_SELF, pid, new_value); THREAD_SETMEM (THREAD_SELF, tid, new_value); do_syscall: [...] It also removes the linux tst-getpid2.c test which expects the previous behavior and instead add another clone test. Tested on x86_64, i686, x32, powerpc64le, aarch64, armhf, s390, and s390x. I also did limited check on mips32 and sparc64 (using the new added test). I also got reviews from both m68k, hppa, and tile. So I presume for these architecture the patch works. The fixes for alpha, microblaze, sh, ia64, and nio2 have not been tested. [1] https://sourceware.org/ml/libc-alpha/2016-04/msg00307.html * sysdeps/unix/sysv/linux/Makefile [$(subdir) == nptl] (test): Remove tst-getpid2. (test): Add tst-clone2. * sysdeps/unix/sysv/linux/tst-clone2.c: New file. * sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Do not change pid/tid fields for CLONE_VM. * sysdeps/unix/sysv/linux/arm/clone.S: Likewise. * sysdeps/unix/sysv/linux/i386/clone.S: Likewise. * sysdeps/unix/sysv/linux/mips/clone.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise. * sysdeps/unix/sysv/linux/tst-getpid2.c: Remove file.
2016-04-29powerpc: Zero pad using memset in strncpy/stpncpyGabriel F. T. Gomes
Call __memset_power8 to pad, with zeros, the remaining bytes in the dest string on __strncpy_power8 and __stpncpy_power8. This improves performance when n is larger than the input string, giving ~30% gain for larger strings without impacting much shorter strings.
2016-04-29CVE-2016-3706: getaddrinfo: stack overflow in hostent conversion [BZ #20010]Florian Weimer
When converting a struct hostent response to struct gaih_addrtuple, the gethosts macro (which is called from gaih_inet) used alloca, without malloc fallback for large responses. This commit changes this code to use calloc unconditionally. This commit also consolidated a second hostent-to-gaih_addrtuple conversion loop (in gaih_inet) to use the new conversion function.
2016-04-27Add missing iucv related defines.Stefan Liebler
this patch adds the missing SOL_IUCV socket level definition and socket options SO_IPRMDATA_MSG, SO_MSGLIMIT, SO_MSGSIZE which can be used with get/setsockopt(). SCM_IUCV_TRGCLS is needed to send/receive ancillary data with send/recvmsg(). The defines are copied from kernel-source: include/net/iucv/af_iucv.h include/linux/socket.h
2016-04-25powerpc: Add optimized strcspn for P8Paul E. Murphy
A few minor adjustments to the P8 strspn gives us an almost equally optimized P8 strcspn.
2016-04-25Fix stdlib/tst-makecontext regression for Nios IIChung-Lin Tang
2016-04-22powerpc: strcasestr optmization for power8Rajalakshmi Srinivasaraghavan
This patch optimizes strcasestr function for power >= 8 systems. The average improvement of this optimization is ~40% and compares 16 bytes at a time using vector instructions. This patch is tested on powerpc64 and powerpc64le.
2016-04-19Fix gprof timingSamuel Thibault
* sysdeps/mach/hurd/profil.c (__profile_frequency): Return tick frequency instead of tick length in us.
2016-04-19hurd: fix profiling short-living processesSamuel Thibault
* sysdeps/mach/hurd/profil.c (update_waiter): Initialize profil_reply_port. (profile_waiter): Do not initialize profil_reply_port.
2016-04-15powerpc: Optimization for strlen for POWER8.Carlos Eduardo Seo
This implementation takes advantage of vectorization to improve performance of the loop over the current strlen implementation for POWER7.
2016-04-15Detect Intel Goldmont and Airmont processorsH.J. Lu
Updated from the model numbers of Goldmont and Airmont processors in Intel64 And IA-32 Processor Architectures Software Developer's Manual Volume 3 Revision 058. * sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel Goldmont and Airmont processors.
2016-04-14Fix pread consolidation on ports that require argument alignmentAdhemerval Zanella
This patch fixes the __ALIGNMENT_{ARG,COUNT} definition for ports that define __ASSUME_ALIGNED_REGISTER_PAIRS by including the kernel-features.h (where it is defined if the case). This was shown on arm with failing cases: FAIL: debug/tst-chk1 FAIL: debug/tst-chk2 FAIL: debug/tst-chk3 FAIL: debug/tst-chk4 FAIL: debug/tst-chk5 FAIL: debug/tst-chk6 FAIL: debug/tst-lfschk1 FAIL: debug/tst-lfschk2 FAIL: debug/tst-lfschk3 FAIL: debug/tst-lfschk4 FAIL: debug/tst-lfschk5 FAIL: debug/tst-lfschk6 FAIL: posix/tst-preadwrite FAIL: posix/tst-preadwrite64 The patches fixes it. Tested on armhf. * sysdeps/unix/sysv/linux/sysdep.h: Include kernel-features.h.
2016-04-14malloc: Remove unused definitions of thread_atfork, thread_atfork_staticFlorian Weimer
2016-04-14malloc: Run fork handler as late as possible [BZ #19431]Florian Weimer
Previously, a thread M invoking fork would acquire locks in this order: (M1) malloc arena locks (in the registered fork handler) (M2) libio list lock A thread F invoking flush (NULL) would acquire locks in this order: (F1) libio list lock (F2) individual _IO_FILE locks A thread G running getdelim would use this order: (G1) _IO_FILE lock (G2) malloc arena lock After executing (M1), (F1), (G1), none of the threads can make progress. This commit changes the fork lock order to: (M'1) libio list lock (M'2) malloc arena locks It explicitly encodes the lock order in the implementations of fork, and does not rely on the registration order, thus avoiding the deadlock.
2016-04-14Remove union wait [BZ #19613]Florian Weimer
The overloading approach in the W* macros was incompatible with integer expressions of a type different from int. Applications using union wait and these macros will have to migrate to the POSIX-specified int status type.
2016-04-13Register extra test objectsAndreas Schwab
This makes sure that the extra test objects are compiled with the correct MODULE_NAME and dependencies are tracked.
2016-04-12X86-64: Use non-temporal store in memcpy on large dataH.J. Lu
The large memcpy micro benchmark in glibc shows that there is a regression with large data on Haswell machine. non-temporal store in memcpy on large data can improve performance significantly. This patch adds a threshold to use non temporal store which is 6 times of shared cache size. When size is above the threshold, non temporal store will be used, but avoid non-temporal store if there is overlap between destination and source since destination may be in cache when source is loaded. For size below 8 vector register width, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For forward loop, we load the last 4 vector register width of data and the first vector register width of data into vector registers before the loop and store them after the loop. For backward loop, we load the first 4 vector register width of data and the last vector register width of data into vector registers before the loop and store them after the loop. [BZ #19928] * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6 times of shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (VMOVNT): New. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. (PREFETCH): New. (PREFETCH_SIZE): Likewise. (PREFETCHED_LOAD_SIZE): Likewise. (PREFETCH_ONE_SET): Likewise. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non temporal store if size is above the threshold and there is no overlap between destination and source.
2016-04-12VDSO support for MIPSMatthew Fortune
This patch adds support for using the implementations of gettimeofday() and clock_gettime() provided by the kernel in the VDSO. The VDSO will always provide clock_gettime() as CLOCK_{REALTIME,MONOTONIC}_COARSE can be implemented regardless of platform. CLOCK_{REALTIME,MONOTONIC}, along with gettimeofday(), are only implemented on platforms which make use of either the CP0 count or GIC as their clocksource. On other platforms, the VDSO does not provide the __vdso_gettimeofday symbol, as it is never useful. The VDSO functions return ENOSYS when they encounter an unsupported request, in which case glibc should fall back to the standard syscall. Tested with upstream kernel 4.5 and QEMU emulating Malta. ./vdsotest gettimeofday bench gettimeofday: syscall: 1021 nsec/call gettimeofday: libc: 262 nsec/call gettimeofday: vdso: 174 nsec/call * sysdeps/unix/sysv/linux/mips/Makefile (sysdep_routines): Include dl-vdso. * sysdeps/unix/sysv/linux/mips/Versions: Add __vdso_clock_gettime. * sysdeps/unix/sysv/linux/mips/init-first.c: New file. * sysdeps/unix/sysv/linux/mips/libc-vdso.h: New file. * sysdeps/unix/sysv/linux/mips/mips32/sysdep.h: (INTERNAL_VSYSCALL_CALL): Define to be compatible with MIPS definitions of INTERNAL_SYSCALL_{ERROR_P,ERRNO}. (HAVE_CLOCK_GETTIME_VSYSCALL): Define. (HAVE_GETTIMEOFDAY_VSYSCALL): Define. * sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h: Likewise.