Age | Commit message (Collapse) | Author |
|
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.
This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.
This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.
This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.
getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.
The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-s390x.S. The final state register clearing is
omitted.
On a z15 it shows the following improvements (using formatted
bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 198.92
arc4random_buf(16) [single-thread] 244.49
arc4random_buf(32) [single-thread] 282.73
arc4random_buf(48) [single-thread] 286.64
arc4random_buf(64) [single-thread] 320.06
arc4random_buf(80) [single-thread] 297.43
arc4random_buf(96) [single-thread] 310.96
arc4random_buf(112) [single-thread] 308.10
arc4random_buf(128) [single-thread] 309.90
-----------------------------------------------
VX. MB/s
-----------------------------------------------
arc4random [single-thread] 430.26
arc4random_buf(16) [single-thread] 735.14
arc4random_buf(32) [single-thread] 1029.99
arc4random_buf(48) [single-thread] 1206.76
arc4random_buf(64) [single-thread] 1311.92
arc4random_buf(80) [single-thread] 1378.74
arc4random_buf(96) [single-thread] 1445.06
arc4random_buf(112) [single-thread] 1484.32
arc4random_buf(128) [single-thread] 1517.30
-----------------------------------------------
Checked on s390x-linux-gnu.
|
|
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c. It targets POWER8 and it is used on default
for LE.
On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):
POWER8
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 138.77
arc4random_buf(16) [single-thread] 174.36
arc4random_buf(32) [single-thread] 228.11
arc4random_buf(48) [single-thread] 252.31
arc4random_buf(64) [single-thread] 270.11
arc4random_buf(80) [single-thread] 278.97
arc4random_buf(96) [single-thread] 287.78
arc4random_buf(112) [single-thread] 291.92
arc4random_buf(128) [single-thread] 295.25
POWER8 MB/s
-----------------------------------------------
arc4random [single-thread] 198.06
arc4random_buf(16) [single-thread] 278.79
arc4random_buf(32) [single-thread] 448.89
arc4random_buf(48) [single-thread] 551.09
arc4random_buf(64) [single-thread] 646.12
arc4random_buf(80) [single-thread] 698.04
arc4random_buf(96) [single-thread] 756.06
arc4random_buf(112) [single-thread] 784.12
arc4random_buf(128) [single-thread] 808.04
-----------------------------------------------
Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
|
|
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S. It is used only if AVX2 is supported
and enabled by the architecture.
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):
SSE MB/s
-----------------------------------------------
arc4random [single-thread] 704.25
arc4random_buf(16) [single-thread] 1018.17
arc4random_buf(32) [single-thread] 1315.27
arc4random_buf(48) [single-thread] 1449.36
arc4random_buf(64) [single-thread] 1511.16
arc4random_buf(80) [single-thread] 1539.48
arc4random_buf(96) [single-thread] 1571.06
arc4random_buf(112) [single-thread] 1596.16
arc4random_buf(128) [single-thread] 1613.48
-----------------------------------------------
AVX2 MB/s
-----------------------------------------------
arc4random [single-thread] 922.61
arc4random_buf(16) [single-thread] 1478.70
arc4random_buf(32) [single-thread] 2241.80
arc4random_buf(48) [single-thread] 2681.28
arc4random_buf(64) [single-thread] 2913.43
arc4random_buf(80) [single-thread] 3009.73
arc4random_buf(96) [single-thread] 3141.16
arc4random_buf(112) [single-thread] 3254.46
arc4random_buf(128) [single-thread] 3305.02
-----------------------------------------------
Checked on x86_64-linux-gnu.
|
|
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S. It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 443.11
arc4random_buf(16) [single-thread] 552.27
arc4random_buf(32) [single-thread] 626.86
arc4random_buf(48) [single-thread] 649.81
arc4random_buf(64) [single-thread] 663.95
arc4random_buf(80) [single-thread] 674.78
arc4random_buf(96) [single-thread] 675.17
arc4random_buf(112) [single-thread] 680.69
arc4random_buf(128) [single-thread] 683.20
-----------------------------------------------
SSE MB/s
-----------------------------------------------
arc4random [single-thread] 704.25
arc4random_buf(16) [single-thread] 1018.17
arc4random_buf(32) [single-thread] 1315.27
arc4random_buf(48) [single-thread] 1449.36
arc4random_buf(64) [single-thread] 1511.16
arc4random_buf(80) [single-thread] 1539.48
arc4random_buf(96) [single-thread] 1571.06
arc4random_buf(112) [single-thread] 1596.16
arc4random_buf(128) [single-thread] 1613.48
-----------------------------------------------
Checked on x86_64-linux-gnu.
|
|
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S. It is used as default and only
little-endian is supported (BE uses generic code).
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 380.89
arc4random_buf(16) [single-thread] 500.73
arc4random_buf(32) [single-thread] 552.61
arc4random_buf(48) [single-thread] 566.82
arc4random_buf(64) [single-thread] 574.01
arc4random_buf(80) [single-thread] 581.02
arc4random_buf(96) [single-thread] 591.19
arc4random_buf(112) [single-thread] 592.29
arc4random_buf(128) [single-thread] 596.43
-----------------------------------------------
OPTIMIZED MB/s
-----------------------------------------------
arc4random [single-thread] 569.60
arc4random_buf(16) [single-thread] 825.78
arc4random_buf(32) [single-thread] 987.03
arc4random_buf(48) [single-thread] 1042.39
arc4random_buf(64) [single-thread] 1075.50
arc4random_buf(80) [single-thread] 1094.68
arc4random_buf(96) [single-thread] 1130.16
arc4random_buf(112) [single-thread] 1129.58
arc4random_buf(128) [single-thread] 1137.91
-----------------------------------------------
Checked on aarch64-linux-gnu.
|
|
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
|
|
The license does not allow modification.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
This provides an implementation of the IDNA2008 standard and fixes
CVE-2016-6261, CVE-2016-6263, CVE-2017-14062.
|
|
|
|
|
|
* manual/contrib.texi: Removed licenses, added acknowledgements
for contributions by Intel, IBM, Craig Metz.
* LICENSES: New file, contains the text of all non-FSF licenses in the
distribution that require putting the notice in the accompanying
documentation.
* README.template, README: Mention LICENSES.
* sysdeps/mach/hurd/net/if_ppp.h: Replaced CMU license with a
new one modelled on the modern BSD license, per recent letter
of permission from CMU.
* sysdeps/unix/sysv/linux/net/if_ppp.h: Likewise.
* sysdeps/ieee754/dbl-64/MathLib.h: Changed the copyright holder
from IBM to FSF, per the recent Software Letter. Changed the
distribution terms from GPL to LGPL.
* sysdeps/ieee754/dbl-64/asincos.tbl: Added FSF copyright and
copying permission notice (Lesser GPL), per recent IBM Software Letter.
* sysdeps/ieee754/dbl-64/powtwo.tbl: Likewise.
* sysdeps/ieee754/dbl-64/root.tbl: Likewise.
* sysdeps/ieee754/dbl-64/sincos.tbl: Likewise.
* sysdeps/ieee754/dbl-64/uatan.tbl: Likewise.
* sysdeps/ieee754/dbl-64/uexp.tbl: Likewise.
* sysdeps/ieee754/dbl-64/ulog.tbl: Likewise.
* sysdeps/ieee754/dbl-64/upow.tbl: Likewise.
* sysdeps/ieee754/dbl-64/utan.tbl: Likewise.
* sysdeps/ieee754/dbl-64/atnat.h: Changed the copyright holder
from IBM to FSF, per the recent Software Letter. Corrected the
text of the copying permission notice to say Lesser GPL instead
of GPL in warranty disclaimer paragraph.
* sysdeps/ieee754/dbl-64/atnat2.h: Likewise.
* sysdeps/ieee754/dbl-64/branred.h: Likewise.
* sysdeps/ieee754/dbl-64/dla.h: Likewise.
* sysdeps/ieee754/dbl-64/doasin.h: Likewise.
* sysdeps/ieee754/dbl-64/dosincos.h: Likewise.
* sysdeps/ieee754/dbl-64/mpa.h: Likewise.
* sysdeps/ieee754/dbl-64/mpa2.h: Likewise.
* sysdeps/ieee754/dbl-64/mpatan.h: Likewise.
* sysdeps/ieee754/dbl-64/mpexp.h: Likewise.
* sysdeps/ieee754/dbl-64/mplog.h: Likewise.
* sysdeps/ieee754/dbl-64/mpsqrt.h: Likewise.
* sysdeps/ieee754/dbl-64/mydefs.h: Likewise.
* sysdeps/ieee754/dbl-64/sincos32.h: Likewise.
* sysdeps/ieee754/dbl-64/uasncs.h: Likewise.
* sysdeps/ieee754/dbl-64/uexp.h: Likewise.
* sysdeps/ieee754/dbl-64/ulog.h: Likewise.
* sysdeps/ieee754/dbl-64/upow.h: Likewise.
* sysdeps/ieee754/dbl-64/urem.h: Likewise.
* sysdeps/ieee754/dbl-64/uroot.h: Likewise.
* sysdeps/ieee754/dbl-64/usncs.h: Likewise.
* sysdeps/ieee754/dbl-64/utan.h: Likewise.
* sysdeps/ieee754/dbl-64/branred.c: Corrected the text of the copying
permission notice to say Lesser GPL instead of GPL in warranty
disclaimer paragraph.
* sysdeps/ieee754/dbl-64/doasin.c: Likewise.
* sysdeps/ieee754/dbl-64/dosincos.c: Likewise.
* sysdeps/ieee754/dbl-64/e_asin.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_log.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/halfulp.c: Likewise.
* sysdeps/ieee754/dbl-64/mpa.c: Likewise.
* sysdeps/ieee754/dbl-64/mpatan.c: Likewise.
* sysdeps/ieee754/dbl-64/mpatan2.c: Likewise.
* sysdeps/ieee754/dbl-64/mpexp.c: Likewise.
* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
* sysdeps/ieee754/dbl-64/mpsqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/mptan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/sincos32.c: Likewise.
* sysdeps/ieee754/dbl-64/slowexp.c: Likewise.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
|