Discussion:
[blfs-support] Unbound slow to start with recent kernels on some machines
Ken Moffat
2018-06-02 21:02:39 UTC
I've been seeing problems on some of my machines with recent kernels
(first noticed in 4.17-rc, but it also now happens in 4.16.4 or
later). The problem is that instead of unbound taking a handful of
seconds to start (often, it is all-but immediate), on the affected
machines it now takes up to two and a half minutes.

Twice I tried bisecting kernel changes, but I think there is more
than one change involved. The identified change in 4.16.stable
appeared unrelated, and reverting it from what was then 4.16.current
did not help. In 4.17 a merge commit got blamed, but I found the
details on google and reverting again made no difference. Also at
one point in the bisection, after a 'bad' kernel I went back to a
kernel I had previously labelled as 'good' and that time it too was
very slow to boot.

I have, of course, attempted to compare the kernel configs for the
affected and unaffected machines, but I dropped a lot of items,
added a few more, and did not get any joy.

So, for the moment I'm offering a workaround (patch the bootscript,
if anybody uses systemd and unbound this will obviously not help).

This separates unbound-anchor from unbound, and adds some verbosity.
On the first boot after unbound is installed, the key file in /etc
will not be installed - give it a few seconds (maybe you will not be
affected!) and then key <Ctrl-C>. You might need to key that twice,
and perhaps follow it by <enter>.

On the first boot, the modified bootscript will report FAILED for
the anchor, but unbound should then succeed (all-but immediately).
On subsequent boots there will be a report that the key file exists,
at that point key <Ctrl-C> etc again and both parts of the script
should report OK.

NB this applies to both previous and current versions of unbound.

ĸen
--
Keyboard not found, Press F1 to continue
Ken Moffat
2018-06-24 03:11:27 UTC
Post by Ken Moffat
I've been seeing problems on some of my machines with recent kernels
(first noticed in 4.17-rc, but it also now happens in 4.16.4 or
later). The problem is that instead of unbound taking a handful of
seconds to start (often, it is all-but immediate), on the affected
machines it now takes up to two and a half minutes.
[...]
Post by Ken Moffat
So, for the moment I'm offering a workaround (patch the bootscript,
if anybody uses systemd and unbound this will obviously not help).
Just a minor update on this - I've updated one of the machines where
I didn't have a problem - a new system on 4.17.1 seemed a little
slower to boot unbound, but succeeded. But I've now updated it to
4.17.2 and it *appears* to hang (maybe not exactly the same, keying
a _single_ Ctrl-C after the message about the keyfile seems to let
it continue).

Which, I guess, prompts the question - does anybody else actually
use unbound ?

ĸen
--
Keyboard not found, Press F1 to continue
--
http://lists.linuxfromscratch.org/listinfo/blfs-support
FAQ: http://www.linuxfromscratch.org/blf
Bruce Dubbs
2018-06-24 15:28:55 UTC
Post by Ken Moffat
Which, I guess, prompts the question - does anybody else actually
use unbound ?
I do not use it, but I might suggest watching udp traffic with wireshark
while it is starting.

Also, you might want to see if you get the same delays with the bind
server. What you describe seems to me to be timeouts and that could be
upstream or ISP issues.

-- Bruce
Ken Moffat
2018-06-24 20:36:52 UTC
Post by Bruce Dubbs
Post by Ken Moffat
Which, I guess, prompts the question - does anybody else actually
use unbound ?
I do not use it, but I might suggest watching udp traffic with wireshark
while it is starting.
Also, you might want to see if you get the same delays with the bind server.
What you describe seems to me to be timeouts and that could be upstream or
ISP issues.
If I stop it after booting, and then start it again, it starts
immediately.

There have been recent kernel changes regarding entropy, and when I
managed to find an online page about unbound and random it mentioned
that if running in a chroot, /dev/random needs to be accessible.

https://www.unbound.net/documentation/unbound.conf.html

My impression is that the kernel now needs more entropy before
/dev/random can be accessed without hanging. And google has a
report re running in a KVM -

https://www.unbound.net/pipermail/unbound-users/2018-May/005273.html

From my last cold boot:

Jun 22 20:01:13 origin kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x7f/0x57f with crng_init=0
...
Jun 22 20:01:13 origin kernel: [ 0.428419] random: fast init done
...
Jun 22 20:01:13 origin smartd[823]: smartd has fork()ed into background mode. New PID=823.
(an S21 script)
Jun 22 20:01:16 origin kernel: [ 5.283445] r8169 0000:25:00.0 eth0: link up
Jun 22 20:01:16 origin kernel: [ 5.283453] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 22 20:01:25 origin kernel: [ 14.684668] random: crng init done
Jun 22 20:01:25 origin kernel: [ 14.684671] random: 5 urandom warning(s) missed due to ratelimiting
Jun 22 20:01:25 origin unbound: [870:0] notice: init module 0: validator
Jun 22 20:01:25 origin unbound: [870:0] notice: init module 1: iterator
Jun 22 20:01:25 origin unbound: [870:0] info: start of service (unbound 1.7.2).

So I'm guessing that keying Ctrl-C one or more times, and possibly
<enter>, generates enough entropy.
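
One way to check this guess rather than infer it from keypresses is to read the kernel's own entropy estimate from /proc/sys/kernel/random/entropy_avail. What follows is only a minimal sketch in C and is not taken from any BLFS bootscript: the function name read_entropy_estimate is hypothetical, and the path is a parameter purely so that the helper can also be tried against an ordinary file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read the kernel's entropy estimate (in bits) from
 * the given file, normally /proc/sys/kernel/random/entropy_avail.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 */
long read_entropy_estimate(const char *path)
{
    FILE *f = fopen(path, "r");
    long bits = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &bits) != 1)
        bits = -1;              /* file did not start with a number */
    fclose(f);
    return bits;
}
```

Calling read_entropy_estimate("/proc/sys/kernel/random/entropy_avail") just before unbound starts, and again after a few keypresses, should show whether the estimate is what actually changes.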

I have the random script at S25, but unbound is at S21. The script
initialises /dev/urandom, of course, but unbound is supposed to fall
back to that (although the post on unbound-users suggests it didn't
in that setup).

According to the documentation for the unbound bootscript, it
relies on network (S20) so moving random to S21 would give me

S20network
S21random
S21smartd
S21unbound

I'm not intending to reboot at the moment (on this machine I
suspend, on the other main desktop I hibernate), will try to
remember to change that on whichever machine I next intend to
reboot.

ĸen
--
Keyboard not found, Press F1 to continue
Ken Moffat
2018-06-25 17:46:12 UTC
Post by Ken Moffat
So I'm guessing that keying Ctrl-C one or more times, and possibly
<enter>, generates enough entropy.
I have the random script at S25, but unbound is at S21. The script
initialises /dev/urandom, of course, but unbound is supposed to fall
back to that (although the post on unbound-users suggests it didn't
in that setup).
According to the documentation for the unbound bootscript, it
relies on network (S20) so moving random to S21 would give me
S20network
S21random
S21smartd
S21unbound
I'm not intending to reboot at the moment (on this machine I
suspend, on the other main desktop I hibernate), will try to
remember to change that on whichever machine I next intend to
reboot.
Tried that on my haswell, where I had needed to use Ctrl-C, Ctrl-C,
<enter> to (I originally thought) stop whatever was hanging. No
change, but just keying Ctrl-C three times was good enough - and I
now guess that hitting any key(s) three times will do.

ĸen
--
Keyboard not found, Press F1 to continue
Ken Moffat
2018-06-26 02:55:47 UTC
Post by Ken Moffat
Post by Ken Moffat
So I'm guessing that keying Ctrl-C one or more times, and possibly
<enter>, generates enough entropy.
[...]
Post by Ken Moffat
Tried that on my haswell, where I had needed to use Ctrl-C, Ctrl-C,
<enter> to (I originally thought) stop whatever was hanging. No
change, but just keying Ctrl-C three times was good enough - and I
now guess that hitting any key(s) three times will do.
Nope, Ctrl-C (possibly by sending a signal) generates more entropy
than a plain keypress, I had to thump a few keys to continue.

Like redhat bug 1572916 - in that case, gcrypt was making a blocking
call. That also says an early 4.18 kernel (during the merge window)
seems to have fixed it. Maybe 4.18-rc2 will be good enough and not
bring any new regressions (slim chance of the latter).

A reddit post re sddm suggests using rng-tools to feed output from
the hardware RNG into the kernel. But posts by the kernel random
maintainer show he is worried that hardware RNGs have been
backdoored to be predictable.

Looks as if the root cause is fixing CVE-2018-1108. I wonder how
*late* we could successfully start unbound ?

Looking at other links, one of the sources of entropy is rotational
hard disks. My modern desktops don't have any of those.

<sigh/>
--
Keyboard not found, Press F1 to continue
Richard Melville
2018-06-26 09:22:15 UTC
Post by Ken Moffat
Looking at other links, one of the sources of entropy is rotational
hard disks. My modern desktops don't have any of those.
Ken, if it's an entropy problem are you using haveged? It may also be
worth testing your entropy with rngtest.

Whilst I'm here, thanks for all your hard work over the years; you
certainly deserve a break.

Richard
Ken Moffat
2018-06-26 20:36:20 UTC
Post by Richard Melville
Post by Ken Moffat
Looking at other links, one of the sources of entropy is rotational
hard disks. My modern desktops don't have any of those.
Ken, if it's an entropy problem are you using haveged? It may also be
worth testing your entropy with rngtest.
I got as far as reporting the entropy when I tried to start unbound on
one machine, with 4.17.3 it was absolutely tiny. Until now I
haven't added anything outside BLFS for randomness, things used to
apparently "just work" and I dislike creating bootscripts.

The links from Haveged in the Arch wiki are "interesting",
particularly 'LCE: Do not play dice with random numbers' in the
Warning.

Looks as if rng-tools might be a better bet (assuming my machines
all have one) - but Ted Ts'o suspects the hardware RNGs have been
back-doored by government agencies.

I might just keep thumping the keyboard when I have to boot.
Post by Richard Melville
Whilst I'm here, thanks for all your hard work over the years; you
certainly deserve a break.
Richard
Cheers.

ĸen
--
Keyboard not found, Press F1 to continue
Ken Moffat
2018-07-19 01:04:00 UTC
Post by Ken Moffat
I've been seeing problems on some of my machines with recent kernels
(first noticed in 4.17-rc, but it also now happens in 4.16.4 or
later). The problem is that instead of unbound taking a handful of
seconds to start (often, it is all-but immediate), on the affected
machines it now takes up to two and a half minutes.
Finally, making slow progress on this. The problem is caused by the
fix for CVE-2018-1108. A little while ago Ted Ts'o offered a patch,
possibly as an RFC, to use entropy from the hwrng (unsafe for
critical things like key generation, but it allows less-important
things, e.g. in systemd units, to run and therefore it lets the box
boot in the absence of real entropy).

Apparently he did this because fedora are starting to derive
"entropy" from jitter so that e.g. VMs can boot in a meaningful
time.

For my haswell that was great, but for my kaveri it made no
difference - turns out that the kaveri does NOT have a hwrng (I
enabled the option, and /dev/hwrng exists, but reading it with dd
reports 'No such file').
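
The same check that dd performs can be sketched in C. This is a hypothetical probe, not something from the kernel tree or from BLFS: the name probe_hwrng is made up, and the device path is a parameter only so that the function can be tried on other files. It attempts to read a single byte and reports the errno value if that fails, which separates "the node is missing" from "the node exists but no driver is providing data".

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe: try to read one byte from a hardware-RNG device
 * node (normally /dev/hwrng).
 * Returns 0 if a byte was read, otherwise the errno value reported by
 * the failing open() or read() (EIO if read() hit end-of-file).
 */
int probe_hwrng(const char *path)
{
    unsigned char byte;
    int err = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return errno;           /* e.g. ENOENT: no such device node */

    errno = 0;
    if (read(fd, &byte, 1) != 1)
        err = errno ? errno : EIO;
    close(fd);
    return err;
}
```

Passing the result to strerror() gives the same kind of message that dd prints, so probe_hwrng("/dev/hwrng") on the haswell and on the kaveri should make the difference between the two machines visible.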

And the patch which introduced this fix can no longer be reverted;
parts of the file, at least in 4.18-rc5, have been rewritten.

What I will now be looking at is twofold:

1. start the random bootscript earlier (currently it is S25, but
unbound is S21; S15 - just after sysklogd - looks likely).
For systemd, I've no idea how to change the dependencies.

AND

2. persuade unbound to use /dev/urandom.

Googling, mostly unsuccessfully, I found that NixOS creates
/var/lib/unbound/dev/random (sic) with /var/lib/unbound as the home
directory for the unbound user, and binds /dev/urandom to it. They
also seem to move the root key, and perhaps unbound.conf, to that
directory. So, as well as moving the random script, the unbound
bootscript needs to be modified (and unmount afterwards).

To recap, only some of my machines with an SSD (and no 'spinning
rust') are affected.

The alternative for the second part is to hack unbound. In 1.7.1,
the compat/getentropy_linux.c file has:

#if defined(SYS_getrandom) && defined(__NR_getrandom)
    /*
     * Try descriptor-less getrandom()
     */
    ret = getentropy_getrandom(buf, len);
    if (ret != -1)
        return (ret);
    if (errno != ENOSYS)
        return (-1);
#endif

    /*
     * Try to get entropy with /dev/urandom
     *
     * This can fail if the process is inside a chroot or if file
     * descriptors are exhausted.
     */
    ret = getentropy_urandom(buf, len);
    if (ret != -1)
        return (ret);

#ifdef SYS__sysctl
    /*
     * Try to use sysctl CTL_KERN, KERN_RANDOM, RANDOM_UUID.
     * sysctl is a failsafe API, so it guarantees a result. This
     * should work inside a chroot, or when file descriptors are
     * exhausted.
     *
     * However this can fail if the Linux kernel removes support
     * for sysctl. Starting in 2007, there have been efforts to
     * deprecate the sysctl API/ABI, and push callers towards use
     * of the chroot-unavailable fd-using /proc mechanism --
     * essentially the same problems as /dev/urandom.
     *
     * Numerous setbacks have been encountered in their deprecation
     * schedule, so as of June 2014 the kernel ABI still exists on
     * most Linux architectures. The sysctl() stub in libc is missing
     * on some systems. There are also reports that some kernels
     * spew messages to the console.
     */
    ret = getentropy_sysctl(buf, len);
    if (ret != -1)
        return (ret);
#endif /* SYS__sysctl */

If it gets to this point, on linux it then uses
getentropy_fallback().

What is happening is that it hangs until hammering on the keyboard
has generated enough entropy, so I'm currently assuming that the
initial ret = getentropy_getrandom(buf, len); now blocks until
sufficient entropy is available - and that is the expected behaviour
on linux.
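
That assumption can be tested without hanging: getrandom() accepts a GRND_NONBLOCK flag, and with it the call fails with EAGAIN instead of waiting when the kernel's pool has not yet been initialised. The sketch below is a hypothetical standalone probe, not part of unbound. It assumes glibc 2.25 or later (for <sys/random.h>), and the function name is invented for illustration.

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK; glibc 2.25 or later */

/*
 * Hypothetical probe, not part of unbound.
 * Returns  1 if getrandom() can deliver bytes right now,
 *          0 if it would block because the kernel pool is not yet
 *            initialised (EAGAIN),
 *         -1 on any other error (for example ENOSYS on an old kernel).
 */
int getrandom_would_succeed(void)
{
    unsigned char byte;
    ssize_t n = getrandom(&byte, sizeof byte, GRND_NONBLOCK);

    if (n == (ssize_t)sizeof byte)
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Run early in the boot sequence on an affected machine, a result of 0 would support the theory that the plain getrandom() call is what unbound is waiting in; a result of 1 would point elsewhere.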

To be honest, deleting that chunk of code looks easiest, but it
brings an ongoing maintenance commitment (1.7.1 is no longer
current, and whatever else happens there will probably be newer
versions in the future). This is the sort of case where I like
patches, they either apply to a new version, or they don't (whereas
deleting lines in sed might remove the wrong content).

For the unbound systemd unit, again I have no idea what to change.

Opinions on whether it is better to change the bootscript (assuming
that works) or hack the code ? In either case, urandom needs to be
seeded earlier.

Either way, this is not my number one priority. But it would be
nice to fix it before 8.3.

ĸen
--
Entropy not found, thump keyboard to continue
Bruce Dubbs
2018-07-19 03:47:43 UTC
Post by Ken Moffat
[...]
Opinions on whether it is better to change the bootscript (assuming
that works) or hack the code ? In either case, urandom needs to be
seeded earlier.
Either way, this is not my number one priority. But it would be
nice to fix it before 8.3.
Have you tried using haveged? Its boot order is S21 and will start
slightly before unbound. That still leaves the problem of unbound using
/dev/urandom, but it may help.

-- Bruce
Douglas R. Reno
2018-07-19 12:45:16 UTC
Post by Ken Moffat
[...]
1. start the random bootscript earlier (currently it is S25, but
unbound is S21; S15 - just after sysklogd - looks likely).
For systemd, I've no idea how to change the dependencies.
While option 2 is nice, for systemd, it'll be a one-liner configuration
change.

We could probably even do it as a sed.

We'd have to change it to Requires=haveged
Richard Melville
2018-07-19 09:27:56 UTC
Post by Bruce Dubbs
Have you tried using haveged? Its boot order is S21 and will start
slightly before unbound. That still leaves the problem of unbound using
/dev/urandom, but it may help.
I already suggested that -- Ken doesn't like it. I use SSDs and it works
for me.

Richard
Ken Moffat
2018-07-19 23:37:46 UTC
Post by Richard Melville
Post by Bruce Dubbs
Post by Ken Moffat
Finally, making slow progress on this. The problem is caused by the
fix for CVE-2018-1108. A little while ago Ted Ts'o offered a patch,
possibly as an RFC, to use entropy from the hwrng (unsafe for
critical things like key generation, but it allows less-important
things, e.g. in systemd units, to run and therefore it lets the box
boot in the absence of real entropy).
Apparently he did this because fedora are starting to derive
"entropy" from jitter so that e.g. VMs can boot in a meaningful
time.
And that generation of jitter sounds very similar to what haveged
claims to provide.
Post by Richard Melville
Post by Bruce Dubbs
Have you tried using haveged? Its boot order is S21 and will start
slightly before unbound. That still leaves the problem of unbound using
/dev/urandom, but it may help.
I already suggested that -- Ken doesn't like it. I use SSDs and it works
for me.
Richard
Yeah, if you google about haveged you will quickly find links
mentioning that almost all its tests pass if fed by a constant
stream of '1' bits, e.g. mentioned in
https://lwn.net/Articles/525459/

And also note the reference to debian's past openssh problem - at
one time they generated only 32,767 possible SSH keys.

As long as the result is NOT used for important things (crypto key
generation, perhaps generating UUIDs), the quality of the randomness
does not usually matter too much.

I now contend that generating a random number to use when validating
DNS responses does not require high-quality randomness, and as
evidence I refer to the code I posted (taken originally from
OpenBSD, according to its documentation, so I will describe it as
"paranoid by preference"). It tries to read /dev/random, and only
falls back to /dev/urandom if the read failed. But the correct
behaviour of /dev/random *on linux* is to hang forever until the
kernel determines it can provide the requested entropy.

By adding something from haveged, unbound will probably be able to
start quickly, like it used to before the kernel correctly checked
that initialisation was complete. But any subsequent need for a
high-quality random number will get lower-quality randomness.

Using /dev/urandom seems a better way of preserving quality
randomness for when it is needed, which is why I am reluctant to use
haveged (and since my kaveri doesn't have an rng, I can't use
rng-tools).

And (probably like most other people here), thinking through the
details makes my brain hurt and I might have missed or misunderstood
something. Fortunately I don't have to worry about the more severe
issues such as generating cryptographic keys in a VM, and people who
do have to deal with that have my respect.

ĸen
--
Entropy not found, thump keyboard to continue
Ken Moffat
2018-07-20 04:20:32 UTC
Post by Ken Moffat
I now contend that generating a random number to use when validating
DNS responses does not require high-quality randomness, and as
evidence I refer to the code I posted (taken originally from
OpenBSD, according to its documentation, so I will describe it as
"paranoid by preference"). It tries to read /dev/random, and only
falls back to /dev/urandom if the read failed. But the correct
behaviour of /dev/random *on linux* is to hang forever until the
kernel determines it can provide the requested entropy.
I'm going to investigate this. Starting from a faint hope that I
might get somewhere, I've raised #10964.

But -
I'm supposed to be stepping back, so "You ain't seen me: right?"
[ © The Fast Show, apparently known as Brilliant in the USA ]

ĸen
--
Entropy not found, thump keyboard to continue
Ken Moffat
2018-07-23 17:05:59 UTC
Post by Ken Moffat
Post by Ken Moffat
I now contend that generating a random number to use when validating
DNS responses does not require high-quality randomness, and as
evidence I refer to the code I posted (taken originally from
OpenBSD, according to its documentation, so I will describe it as
"paranoid by preference"). It tries to read /dev/random, and only
falls back to /dev/urandom if the read failed. But the correct
behaviour of /dev/random *on linux* is to hang forever until the
kernel determines it can provide the requested entropy.
I'm going to investigate this. Starting from a faint hope that I
might get somewhere, I've raised #10964.
After raising this on lkml, I've been assured that /dev/urandom is
still non-blocking and the applications (chronyd is also affected if
I start that before unbound) must be calling getrandom.
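
Given that, a program that would rather start than wait could ask getrandom() not to block, and read /dev/urandom instead whenever the pool is not ready. The following is only a sketch of that idea under stated assumptions. It is not how unbound behaves and it is not a proposed patch; both function names are hypothetical, and glibc 2.25 or later is assumed for <sys/random.h>. Bytes obtained through the fallback before 'crng init done' are the lower-quality randomness discussed earlier in this thread, so this would be unsuitable for anything like key generation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK; glibc 2.25 or later */
#include <unistd.h>

/* Read exactly len bytes from /dev/urandom. Returns 0 on success, -1 on error. */
static int fill_from_urandom(unsigned char *buf, size_t len)
{
    size_t done = 0;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd == -1)
        return -1;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else {
            close(fd);
            return -1;              /* read error or unexpected EOF */
        }
    }
    close(fd);
    return 0;
}

/*
 * Hypothetical: fill buf with len random bytes without ever blocking.
 * getrandom() is tried first; if it cannot deliver (EAGAIN because the
 * pool is not initialised, ENOSYS on an old kernel, or any other
 * failure) the remainder is read from /dev/urandom.
 * Returns 0 on success, -1 if both sources fail.
 */
int fill_random_without_blocking(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, GRND_NONBLOCK);
        if (n > 0) {
            done += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else {
            return fill_from_urandom(buf + done, len - done);
        }
    }
    return 0;
}
```

Once the pool is initialised the fallback path is never taken, so on a machine with enough early entropy the result is the same as a plain getrandom() call.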

Also, after a sufficient length of time with haveged running, the
system should be adequate for generating long-lived keys. So I'll
have to live with haveged.

ĸen
--
Entropy not found, thump keyboard to continue
Richard Melville
2018-07-20 09:43:47 UTC
Post by Ken Moffat
Post by Ken Moffat
I now contend that generating a random number to use when validating
DNS responses does not require high-quality randomness, and as
evidence I refer to the code I posted (taken originally from
OpenBSD, according to its documentation, so I will describe it as
"paranoid by preference"). It tries to read /dev/random, and only
falls back to /dev/urandom if the read failed. But the correct
behaviour of /dev/random *on linux* is to hang forever until the
kernel determines it can provide the requested entropy.
I'm going to investigate this. Starting from a faint hope that I
might get somewhere, I've raised #10964.
But -
I'm supposed to be stepping back, so "You ain't seen me: right?"
[ © The Fast Show, apparently known as Brilliant in the USA ]
To the invisible man :-) I seem to remember reading this some time ago:-
https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged

Maybe you've already read it; I found it useful.

Richard