aboutsummaryrefslogtreecommitdiff
path: root/sysdeps/x86_64/multiarch/memcpy.S
AgeCommit message (Collapse)Author
2017-06-14x86-64: Implement memmove family IFUNC selectors in CH.J. Lu
Implement memmove family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memmove family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memcpy_chk-nonshared, mempcpy_chk-nonshared and memmove_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memmove_chk_erms, __memcpy_chk_erms and __mempcpy_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memmove.h: New file. * sysdeps/x86_64/multiarch/memcpy.c: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Removed. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_chk_erms): New function. (__memmove_chk_erms): Likewise. (__memcpy_chk_erms): New alias.
2017-04-18x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]H.J. Lu
On Skylake server, AVX512 load/store instructions in memcpy/memset may lead to lower CPU turbo frequency in certain situations. Use of AVX2 in memcpy/memset has been observed to have improved overall performance in many workloads due to the higher frequency. Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512 if AVX512ER isn't available so that AVX2 versions of memcpy/memset are used on Skylake server. [BZ #21396] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Prefer_No_AVX512 if AVX512ER isn't available. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New. (index_arch_Prefer_No_AVX512): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use AVX512 version if Prefer_No_AVX512 is set. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk): Likewise. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memset.S (memset): Likewise. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Likewise.
2017-01-01Update copyright dates with scripts/update-copyrights.Joseph Myers
2016-07-01Require binutils 2.24 to build x86-64 glibc [BZ #20139]H.J. Lu
If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used to save the first 8 vector registers, which only saves the lower 256 bits of vector register, for lazy binding. When it is called on AVX512 platform, the upper 256 bits of ZMM registers are clobbered. Parameters passed in ZMM registers will be wrong when the function is called the first time. This patch requires binutils 2.24, whose assembler can store and load ZMM registers, to build x86-64 glibc. Since mathvec library needs assembler support for AVX512DQ, we disable mathvec if assembler doesn't support AVX512DQ. [BZ #20139] * config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ... (HAVE_AVX512DQ_ASM_SUPPORT): This. * sysdeps/x86_64/configure.ac: Require assembler from binutils 2.24 or above. (HAVE_AVX512_ASM_SUPPORT): Removed. (HAVE_AVX512DQ_ASM_SUPPORT): New. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT check unconditional. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise.
2016-06-30Check Prefer_ERMS in memmove/memcpy/mempcpy/memsetH.J. Lu
Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove, memcpy, mempcpy and memset aren't used by the current processors, this patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so that they can be used in the future. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New. (index_arch_Prefer_ERMS): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return __memcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__memmove_erms): Enabled for libc.a. * ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return __memmove_erms or Prefer_ERMS. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return __mempcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memset.S (memset): Return __memset_erms for Prefer_ERMS.
2016-06-08X86-64: Remove previous default/SSE2/AVX2 memcpy/memmoveH.J. Lu
Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones. No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper. Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove, except that non-temporal store isn't used in ld.so. Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB. [BZ #19776] * sysdeps/x86_64/memcpy.S: Make it dummy. * sysdeps/x86_64/mempcpy.S: Likewise. * sysdeps/x86_64/memmove.S: New file. * sysdeps/x86_64/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/memmove.c: Removed. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2. Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../memcpy.S. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if if processor has ERMS. Default to __memcpy_chk_sse2_unaligned. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S Change function suffix from unaligned_2 to unaligned. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../mempcpy.S. (mempcpy): New. Add a weak alias. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned.
2016-03-28[x86] Add a feature bit: Fast_Unaligned_CopyH.J. Lu
On AMD processors, memcpy optimized with unaligned SSE load is slower than emcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
2016-03-04x86-64: Fix memcpy IFUNC selectionH.J. Lu
Chek Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. Existing selection order is updated with following selection order: 1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set. 2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set. 3. __memcpy_sse2 if SSSE3 isn't available. 4. __memcpy_ssse3_back if Fast_Copy_Backward bit it set. 5. __memcpy_ssse3 [BZ #18880] * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back.
2016-01-16Added memcpy/memmove family optimized with AVX512 for KNL hardware.Andrew Senkevich
Added AVX512 implementations of memcpy, mempcpy, memmove, memcpy_chk, mempcpy_chk, memmove_chk. It shows average improvement more than 30% over AVX versions on KNL hardware (performance results in the thread <https://sourceware.org/ml/libc-alpha/2016-01/msg00258.html>). * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new files. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
2016-01-04Update copyright dates with scripts/update-copyrights.Joseph Myers
2015-08-13Update x86_64 multiarch functions for <cpu-features.h>H.J. Lu
This patch updates x86_64 multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from <cpu-features.h>. * sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1). * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise. * sysdeps/x86_64/multiarch/strstr.c: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/test-multiarch.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy.S: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
2015-01-30Use AVX unaligned memcpy only if AVX2 is availableH.J. Lu
memcpy with unaligned 256-bit AVX register loads/stores are slow on older processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available. [BZ #17801] * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Set the bit_AVX_Fast_Unaligned_Load bit for AVX2. * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load): New. (index_AVX_Fast_Unaligned_Load): Likewise. (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD. * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
2015-01-02Update copyright dates with scripts/update-copyrights.Joseph Myers
2014-11-24Remove NOT_IN_libcSiddhesh Poyarekar
Replace with !IS_IN (libc). This completes the transition from the IS_IN/NOT_IN macros to the IN_MODULE macro set. The generated code is unchanged on x86_64. * stdlib/isomac.c (fmt): Replace NOT_IN_libc with IN_MODULE. (get_null_defines): Adjust. * sunrpc/Makefile: Adjust comment. * Makerules (CPPFLAGS-nonlib): Remove NOT_IN_libc. * elf/Makefile (CPPFLAGS-sotruss-lib): Likewise. (CFLAGS-interp.c): Likewise. (CFLAGS-ldconfig.c): Likewise. (CPPFLAGS-.os): Likewise. * elf/rtld-Rules (rtld-CPPFLAGS): Likewise. * extra-lib.mk (CPPFLAGS-$(lib)): Likewise. * extra-modules.mk (extra-modules.mk): Likewise. * iconv/Makefile (CPPFLAGS-iconvprogs): Likewise. * locale/Makefile (CPPFLAGS-locale_programs): Likewise. * malloc/Makefile (CPPFLAGS-memusagestat): Likewise. * nscd/Makefile (CPPFLAGS-nscd): Likewise. * nss/Makefile (CPPFLAGS-nss_test1): Likewise. * stdlib/Makefile (CFLAGS-tst-putenvmod.c): Likewise. * sysdeps/gnu/Makefile ($(objpfx)errlist-compat.c): Likewise. * sysdeps/unix/sysv/linux/Makefile (CPPFLAGS-lddlibc4): Likewise. * iconvdata/Makefile (CPPFLAGS): Likewise. (cpp-srcs-left): Add libof for all iconvdata routines. * bits/stdio-lock.h: Replace NOT_IN_libc with IS_IN. * include/assert.h: Likewise. * include/ctype.h: Likewise. * include/errno.h: Likewise. * include/libc-symbols.h: Likewise. * include/math.h: Likewise. * include/netdb.h: Likewise. * include/resolv.h: Likewise. * include/stdio.h: Likewise. * include/stdlib.h: Likewise. * include/string.h: Likewise. * include/sys/stat.h: Likewise. * include/wctype.h: Likewise. * intl/l10nflist.c: Likewise. * libidn/idn-stub.c: Likewise. * libio/libioP.h: Likewise. * nptl/libc_multiple_threads.c: Likewise. * nptl/pthreadP.h: Likewise. * posix/regex_internal.h: Likewise. * resolv/res_hconf.c: Likewise. * sysdeps/arm/armv7/multiarch/memcpy.S: Likewise. * sysdeps/arm/memmove.S: Likewise. * sysdeps/arm/sysdep.h: Likewise. * sysdeps/generic/_itoa.h: Likewise. * sysdeps/generic/symbol-hacks.h: Likewise. * sysdeps/gnu/errlist.awk: Likewise. * sysdeps/gnu/errlist.c: Likewise. * sysdeps/i386/i586/memcpy.S: Likewise. * sysdeps/i386/i586/memset.S: Likewise. * sysdeps/i386/i686/memcpy.S: Likewise. * sysdeps/i386/i686/memmove.S: Likewise. * sysdeps/i386/i686/mempcpy.S: Likewise. * sysdeps/i386/i686/memset.S: Likewise. * sysdeps/i386/i686/multiarch/bcopy.S: Likewise. * sysdeps/i386/i686/multiarch/bzero.S: Likewise. * sysdeps/i386/i686/multiarch/memchr-sse2-bsf.S: Likewise. * sysdeps/i386/i686/multiarch/memchr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/memchr.S: Likewise. * sysdeps/i386/i686/multiarch/memcmp-sse4.S: Likewise. * sysdeps/i386/i686/multiarch/memcmp-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/memcmp.S: Likewise. * sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S: Likewise. * sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/memcpy.S: Likewise. * sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise. * sysdeps/i386/i686/multiarch/memmove.S: Likewise. * sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise. * sysdeps/i386/i686/multiarch/mempcpy.S: Likewise. * sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise. * sysdeps/i386/i686/multiarch/memrchr-c.c: Likewise. * sysdeps/i386/i686/multiarch/memrchr-sse2-bsf.S: Likewise. * sysdeps/i386/i686/multiarch/memrchr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/memrchr.S: Likewise. * sysdeps/i386/i686/multiarch/memset-sse2-rep.S: Likewise. * sysdeps/i386/i686/multiarch/memset-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/memset.S: Likewise. * sysdeps/i386/i686/multiarch/memset_chk.S: Likewise. * sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise. * sysdeps/i386/i686/multiarch/strcat-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/strcat-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/strcat.S: Likewise. * sysdeps/i386/i686/multiarch/strchr-sse2-bsf.S: Likewise. * sysdeps/i386/i686/multiarch/strchr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/strchr.S: Likewise. * sysdeps/i386/i686/multiarch/strcmp-sse4.S: Likewise. * sysdeps/i386/i686/multiarch/strcmp-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/strcmp.S: Likewise. * sysdeps/i386/i686/multiarch/strcpy-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/strcpy-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/strcpy.S: Likewise. * sysdeps/i386/i686/multiarch/strcspn.S: Likewise. * sysdeps/i386/i686/multiarch/strlen-sse2-bsf.S: Likewise. * sysdeps/i386/i686/multiarch/strlen-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/strlen.S: Likewise. * sysdeps/i386/i686/multiarch/strnlen.S: Likewise. * sysdeps/i386/i686/multiarch/strrchr-sse2-bsf.S: Likewise. * sysdeps/i386/i686/multiarch/strrchr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/strrchr.S: Likewise. * sysdeps/i386/i686/multiarch/strspn.S: Likewise. * sysdeps/i386/i686/multiarch/wcschr-c.c: Likewise. * sysdeps/i386/i686/multiarch/wcschr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/wcschr.S: Likewise. * sysdeps/i386/i686/multiarch/wcscmp-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/wcscmp.S: Likewise. * sysdeps/i386/i686/multiarch/wcscpy-c.c: Likewise. * sysdeps/i386/i686/multiarch/wcscpy-ssse3.S: Likewise. * sysdeps/i386/i686/multiarch/wcscpy.S: Likewise. * sysdeps/i386/i686/multiarch/wcslen-c.c: Likewise. * sysdeps/i386/i686/multiarch/wcslen-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/wcslen.S: Likewise. * sysdeps/i386/i686/multiarch/wcsrchr-c.c: Likewise. * sysdeps/i386/i686/multiarch/wcsrchr-sse2.S: Likewise. * sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise. * sysdeps/i386/i686/multiarch/wmemcmp-c.c: Likewise. * sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise. * sysdeps/ia64/fpu/libm-symbols.h: Likewise. * sysdeps/nptl/bits/libc-lock.h: Likewise. * sysdeps/nptl/bits/libc-lockP.h: Likewise. * sysdeps/nptl/bits/stdio-lock.h: Likewise. * sysdeps/posix/closedir.c: Likewise. * sysdeps/posix/opendir.c: Likewise. * sysdeps/posix/readdir.c: Likewise. * sysdeps/posix/rewinddir.c: Likewise. * sysdeps/powerpc/novmx-sigjmp.c: Likewise. * sysdeps/powerpc/powerpc32/__longjmp.S: Likewise. * sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise. * sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise. * sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/bzero.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memcmp-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memcpy-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memrchr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memrchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp_l.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strchrnul.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strlen-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strncase.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strncase_l.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wordcopy.c: Likewise. * sysdeps/powerpc/powerpc32/power6/memset.S: Likewise. * sysdeps/powerpc/powerpc32/setjmp.S: Likewise. * sysdeps/powerpc/powerpc64/__longjmp.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/bzero.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memmove-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memmove.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memrchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchrnul.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcpy-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcspn.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strlen.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncase.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncase_l.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncat.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncpy-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strpbrk.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strspn.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wordcopy.c: Likewise. * sysdeps/powerpc/powerpc64/setjmp.S: Likewise. * sysdeps/s390/s390-32/multiarch/ifunc-resolve.c: Likewise. * sysdeps/s390/s390-32/multiarch/memcmp.S: Likewise. * sysdeps/s390/s390-32/multiarch/memcpy.S: Likewise. * sysdeps/s390/s390-32/multiarch/memset.S: Likewise. * sysdeps/s390/s390-64/multiarch/ifunc-resolve.c: Likewise. * sysdeps/s390/s390-64/multiarch/memcmp.S: Likewise. * sysdeps/s390/s390-64/multiarch/memcpy.S: Likewise. * sysdeps/s390/s390-64/multiarch/memset.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memcpy-niagara1.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memcpy.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memset-niagara1.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: Likewise. * sysdeps/sparc/sparc64/multiarch/memset.S: Likewise. * sysdeps/unix/alpha/sysdep.S: Likewise. * sysdeps/unix/alpha/sysdep.h: Likewise. * sysdeps/unix/make-syscalls.sh: Likewise. * sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/aarch64/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise. * sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/getpid.c: Likewise. * sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/hppa/nptl/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S: Likewise. * sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/i386/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/i386/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/ia64/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/ia64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/ia64/sysdep.S: Likewise. * sysdeps/unix/sysv/linux/ia64/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise. * sysdeps/unix/sysv/linux/m68k/bits/m68k-vdso.h: Likewise. * sysdeps/unix/sysv/linux/m68k/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/m68k/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/microblaze/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/microblaze/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/not-cancel.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/s390/longjmp_chk.c: Likewise. * sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/sysdep.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/sysdep.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sh/lowlevellock.S: Likewise. * sysdeps/unix/sysv/linux/sh/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/sh/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/sh/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/brk.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/tile/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/tile/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/tile/waitpid.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/sysdep.h: Likewise. * sysdeps/wordsize-32/symbol-hacks.h: Likewise. * sysdeps/x86_64/memcpy.S: Likewise. * sysdeps/x86_64/memmove.c: Likewise. * sysdeps/x86_64/memset.S: Likewise. * sysdeps/x86_64/multiarch/init-arch.h: Likewise. * sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise. * sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Likewise. * sysdeps/x86_64/multiarch/strchr-sse2-no-bsf.S: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Likewise. * sysdeps/x86_64/multiarch/strcmp-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy-c.c: Likewise. * sysdeps/x86_64/multiarch/wcscpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy.S: Likewise. * sysdeps/x86_64/multiarch/wmemcmp-c.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise. * sysdeps/x86_64/strcmp.S: Likewise.
2014-07-30Improve 64bit memcpy performance for Haswell CPU with AVX instructionLing Ma
In this patch we take advantage of HSW memory bandwidth, manage to reduce miss branch prediction by avoiding using branch instructions and force destination to be aligned with avx instruction. The CPU2006 403.gcc benchmark indicates this patch improves performance from 2% to 10%.
2014-01-01Update copyright notices with scripts/update-copyrightsAllan McRae
2013-05-20Faster memcpy on x64.Ondrej Bilka
We add new memcpy version that uses unaligned loads which are fast on modern processors. This allows second improvement which is avoiding computed jump which is relatively expensive operation. Tests available here: http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
2013-01-02Update copyright notices with scripts/update-copyrights.Joseph Myers
2012-10-11Add x86-64 __libc_ifunc_impl_listH.J. Lu
2012-02-09Replace FSF snail mail address with URLs.Paul Eggert
2011-04-01Work around old buggy program which cannot cope with memcpy semantics.H.J. Lu
2010-06-30Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7H.J. Lu
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to 4X on Core 2 and up to 2X on Core i7.