diff options
author | Wilco Dijkstra <Wilco.Dijkstra@arm.com> | 2021-11-30 16:29:25 -0300 |
---|---|---|
committer | Adhemerval Zanella <adhemerval.zanella@linaro.org> | 2021-12-13 09:02:34 -0300 |
commit | ccfa865a82c648fde56864ea094f70ee1a8a944b (patch) | |
tree | 5d05ad55e786295bb8c49afa244ac9646a8ee98a /sysdeps/ieee754 | |
parent | 6c848d70383e1dbe932ef41723ac0abfdeec7ca8 (diff) | |
download | glibc-ccfa865a82c648fde56864ea094f70ee1a8a944b.tar glibc-ccfa865a82c648fde56864ea094f70ee1a8a944b.tar.gz glibc-ccfa865a82c648fde56864ea094f70ee1a8a944b.tar.bz2 glibc-ccfa865a82c648fde56864ea094f70ee1a8a944b.zip |
math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
Diffstat (limited to 'sysdeps/ieee754')
-rw-r--r-- | sysdeps/ieee754/dbl-64/e_hypot.c | 17 |
1 files changed, 16 insertions, 1 deletions
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c index 75bce2df4e..6fedf0d61f 100644 --- a/sysdeps/ieee754/dbl-64/e_hypot.c +++ b/sysdeps/ieee754/dbl-64/e_hypot.c @@ -26,7 +26,11 @@ rounding mode. - Handle required underflow exception for subnormal results. - The expected ULP is ~0.792. + The expected ULP is ~0.792 or ~0.948 if FMA is used. For FMA, the + correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP + if x^2 + y^2 is computed with less than 0.707 ULP error. If |x| >= |2y|, + fma (x, x, y^2) has ~0.625 ULP. If |x| < |2y|, fma (|2x|, |y|, (x - y)^2) + has ~0.625 ULP. [1] https://arxiv.org/pdf/1904.09481.pdf */ @@ -48,6 +52,16 @@ static inline double kernel (double ax, double ay) { double t1, t2; +#ifdef __FP_FAST_FMA + t1 = ay + ay; + t2 = ax - ay; + + if (t1 >= ax) + return sqrt (fma (t1, ax, t2 * t2)); + else + return sqrt (fma (ax, ax, ay * ay)); + +#else double h = sqrt (ax * ax + ay * ay); if (h <= 2.0 * ay) { @@ -64,6 +78,7 @@ kernel (double ax, double ay) h -= (t1 + t2) / (2.0 * h); return h; +#endif } double |