author | Joseph Myers <joseph@codesourcery.com> | 2017-09-20 16:54:05 +0000 |
---|---|---|
committer | Joseph Myers <joseph@codesourcery.com> | 2017-09-20 16:54:05 +0000 |
commit | ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5 (patch) | |
tree | 83340587a4086402e9f1686c278aa1a264ef77e7 /sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S | |
parent | a856d4d4a8a56eaefdddb58884bfa2bfe922ee4c (diff) | |
Add SSE4.1 trunc, truncf (bug 20142).
This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
/ roundss instructions, similar to the versions of ceil, floor, rint
and nearbyint functions we already have. In my testing with the glibc
benchtests these are about 30% faster than the C versions for double,
20% faster for float.
Tested for x86_64.
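
As an illustration of the technique described above (not part of the patch), the same operation can be sketched in C with the SSE4.1 intrinsic `_mm_round_sd`. The function name `trunc_sse41_sketch` is hypothetical, and the `target` attribute assumes GCC or Clang on an x86 target. The immediate 11 used with roundsd corresponds to "round toward zero" combined with "suppress the precision exception".

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_round_sd, _MM_FROUND_* */

/* The roundsd immediate 11 (binary 1011) decodes as:
     bits 1:0 = 3  -> round toward zero (truncate)
     bit  2   = 0  -> take the rounding mode from the immediate, not MXCSR
     bit  3   = 1  -> do not raise the precision (inexact) exception  */
_Static_assert((_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) == 11,
               "immediate should match the value used with roundsd");

/* Hypothetical helper, for illustration only.  The target attribute
   (GCC/Clang extension) enables SSE4.1 for this one function, so the
   file does not need to be built with -msse4.1.  The caller is assumed
   to have checked that the CPU supports SSE4.1.  */
__attribute__((target("sse4.1")))
double trunc_sse41_sketch(double x)
{
    __m128d v = _mm_set_sd(x);   /* place x in the low lane */
    v = _mm_round_sd(v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
    return _mm_cvtsd_f64(v);     /* extract the low lane */
}
```

Calling `trunc_sse41_sketch(2.7)` should give 2.0 and `trunc_sse41_sketch(-2.7)` should give -2.0, matching trunc. The assembly version in the patch avoids the extra moves because the argument and the return value already live in %xmm0 under the x86_64 calling convention.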
[BZ #20142]
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
Diffstat (limited to 'sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S')
-rw-r--r-- | sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S
new file mode 100644
index 0000000000..ff3ed9c947
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S
@@ -0,0 +1,25 @@
+/* trunc for SSE4.1.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.section .text.sse4.1,"ax",@progbits
+ENTRY(__trunc_sse41)
+	roundsd	$11, %xmm0, %xmm0
+	ret
+END(__trunc_sse41)