author | H.J. Lu <hjl.tools@gmail.com> | 2017-03-21 10:59:31 -0700 |
---|---|---|
committer | H.J. Lu <hjl.tools@gmail.com> | 2017-03-21 11:00:12 -0700 |
commit | c15f8eb50cea7ad1a4ccece6e0982bf426d52c00 (patch) | |
tree | da3251690c3d1f035acebce5350b0a5a0b33cc00 /posix/bug-regex26.c | |
parent | a640393a18329ef4044bf9213f6466cd2d1e69f3 (diff) | |
x86-64: Improve branch prediction in _dl_runtime_resolve_avx512_opt [BZ #21258]
On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers. The code layout is:

  if only %xmm0 - %xmm7 registers are used
    preserve %xmm0 - %xmm7 registers
  if only %ymm0 - %ymm7 registers are used
    preserve %ymm0 - %ymm7 registers
  preserve %zmm0 - %zmm7 registers

Branch prediction always executes the fallthrough code path speculatively,
which preserves the %zmm0 - %zmm7 registers even when only the
%xmm0 - %xmm7 registers are used. This leads to lower CPU frequency on
Skylake server. This patch changes the fallthrough code path to preserve
the %xmm0 - %xmm7 registers instead:

  if the whole %zmm0 - %zmm7 registers are used
    preserve %zmm0 - %zmm7 registers
  if only %ymm0 - %ymm7 registers are used
    preserve %ymm0 - %ymm7 registers
  preserve %xmm0 - %xmm7 registers

Tested on Skylake server.
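
The reordered layout can be sketched in C as below. This is only an
illustration of the control flow, not the assembly in
sysdeps/x86_64/dl-trampoline.h. The names pick_save_width, save_width,
STATE_YMM_HI128 and STATE_ZMM_HI256 are hypothetical, and the sketch assumes
the caller passes an "in use" bitmap whose bit positions follow the XSAVE
state-component numbering (bit 2 for the upper halves of the %ymm registers,
bit 6 for the upper halves of %zmm0 - %zmm15).

```c
#include <stdint.h>

/* Hypothetical names.  Bit positions assume the XSAVE state-component
   numbering; they are an assumption of this sketch, not taken from
   the patch.  */
#define STATE_YMM_HI128 (1u << 2)   /* upper 128 bits of %ymm registers */
#define STATE_ZMM_HI256 (1u << 6)   /* upper 256 bits of %zmm0 - %zmm15 */

/* Number of bytes to save per vector register.  */
enum save_width { SAVE_XMM = 16, SAVE_YMM = 32, SAVE_ZMM = 64 };

/* Test the widest state first so that the cheapest save (%xmm) is the
   fallthrough path, mirroring the layout described in the commit
   message.  */
static enum save_width
pick_save_width (uint32_t in_use)
{
  if (in_use & STATE_ZMM_HI256)
    return SAVE_ZMM;            /* whole %zmm0 - %zmm7 are used */
  if (in_use & STATE_YMM_HI128)
    return SAVE_YMM;            /* only %ymm0 - %ymm7 are used */
  return SAVE_XMM;              /* fallthrough: only %xmm0 - %xmm7 */
}
```

With this ordering, a mispredicted or speculated fallthrough runs only the
128-bit save, so no 512-bit instruction is executed unless the %zmm state is
really in use.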
[BZ #21258]
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
Define only if _dl_runtime_resolve is defined to
_dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
Fallthrough to _dl_runtime_resolve_sse_vex.