Not to be snarky, but OpenBSD's random subsystem has always seemed very elegant to me, and it seems like such a design could avoid this failure mode in systemd; see the video[1] and slides[2].
That said, it's good that the faulty RDRAND was discovered. As has been pointed out, this isn't the first time processors (AMD in particular) have had such issues. Do we just need to be skeptical and run RDRAND tests every time a new processor comes out? Or perhaps even after each microcode update?
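As a rough idea of what such a per-processor or per-microcode RDRAND test could look like, here is a minimal sketch. It is only a smoke test, not a statistical test suite: it catches gross failures such as a source that reports success while returning the same value every time (one widely reported failure mode was RDRAND returning all-ones). The `hwrng_fn` interface, `hwrng_smoke_test` and `simulated_stuck_rng` are hypothetical names for illustration; `__get_cpuid`, `bit_RDRND` and `_rdrand64_step` are the GCC/Clang x86 intrinsics, and the retry count of 10 is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface for a hardware RNG source: returns true on success
 * (RDRAND reports this through the carry flag) and stores one 64-bit value. */
typedef bool (*hwrng_fn)(uint64_t *out);

/* Crude smoke test, NOT a statistical test suite: draws n samples and rejects
 * the source if any call fails, any sample is all-ones (a healthy 64-bit
 * generator produces that with probability 2^-64 per sample), or all samples
 * are identical. Returns true only if nothing obviously broken was seen. */
bool hwrng_smoke_test(hwrng_fn next, size_t n) {
    uint64_t first = 0;
    bool all_same = true;
    for (size_t i = 0; i < n; i++) {
        uint64_t v;
        if (!next(&v))
            return false;          /* source reported failure */
        if (v == UINT64_MAX)
            return false;          /* suspicious all-ones output */
        if (i == 0)
            first = v;
        else if (v != first)
            all_same = false;
    }
    return n > 1 && !all_same;     /* need at least 2 samples to compare */
}

/* Simulates the reported failure mode: "success" flag set, value all-ones. */
bool simulated_stuck_rng(uint64_t *out) {
    *out = UINT64_MAX;
    return true;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>

/* CPUID leaf 1 advertises RDRAND in ECX (bit_RDRND from <cpuid.h>). */
bool cpu_has_rdrand(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_RDRND) != 0;
}

/* target("rdrnd") lets this compile without -mrdrnd; call it only after
 * cpu_has_rdrand() returned true. Retries a few times on failure. */
__attribute__((target("rdrnd")))
bool rdrand64(uint64_t *out) {
    unsigned long long v;
    for (int tries = 0; tries < 10; tries++) {
        if (_rdrand64_step(&v)) {
            *out = (uint64_t)v;
            return true;
        }
    }
    return false;
}
#endif
```

On x86-64, something like `cpu_has_rdrand() && hwrng_smoke_test(rdrand64, 1000)` could be run after a new CPU, BIOS or microcode update. Passing says little about quality; failing is a strong hint that something is broken.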
> seems like such a system could avoid this failure mode in systemd, video[1], slides[2]
Yes and no.
Yes, OpenBSD is not vulnerable to this failure mode. But no, that's only because OpenBSD made assumptions that systemd couldn't make.
The problem is not how well the random system performs once it has started, but simply what to do when it isn't initialized yet during early boot. According to page 20 of the slides ("Kernel initialization from boot"), OpenBSD's initialization sequence is:
1. If available, use rdrand to generate random seeds.
2. Read random seed file from the disk, saved from the previous boots or installation, very early by the bootloader.
3. Read the Stack Protector cookie from the kernel binary, by the bootloader.
4. Mix them together. The pool is initialized at this point.
5. Keep collecting entropy from various sources of randomness, such as interrupts.
Nice: it ensures that you always get an initialized entropy pool, and it hides all the complexity from userspace. But do note that (2) and (3) provide no security during the first boot if the same OS image is replicated across many machines.
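To make the image-replication caveat concrete, here is a toy model of steps 1 to 4 above. It is a sketch under loud assumptions: a 64-bit FNV-1a hash stands in for a real cryptographic mixing function, and `struct boot_inputs` and `seed_pool_at_boot` are hypothetical names; this is not OpenBSD's actual code. The point it shows is that without a per-machine input (RDRAND here), two machines booted from the same image end up with the same pool.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy "pool": 64-bit FNV-1a, standing in for a real cryptographic hash.
 * Illustrative only; this is NOT OpenBSD's actual mixing code. */
#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

static uint64_t mix(uint64_t h, const void *buf, size_t len) {
    const unsigned char *b = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical per-boot inputs, one field per step in the list above. */
struct boot_inputs {
    int has_rdrand;              /* step 1: is RDRAND available?        */
    uint64_t rdrand_value;       /* step 1: per-machine hardware output */
    unsigned char seed_file[32]; /* step 2: travels with the disk image */
    uint64_t stack_cookie;       /* step 3: baked into the kernel binary */
};

/* Step 4: mix everything; the pool counts as initialized afterwards. */
uint64_t seed_pool_at_boot(const struct boot_inputs *in) {
    uint64_t h = FNV_OFFSET;
    if (in->has_rdrand)
        h = mix(h, &in->rdrand_value, sizeof in->rdrand_value);
    h = mix(h, in->seed_file, sizeof in->seed_file);
    h = mix(h, &in->stack_cookie, sizeof in->stack_cookie);
    return h;
}
```

Two cloned machines without RDRAND share `seed_file` and `stack_cookie`, so `seed_pool_at_boot` returns the same value on both; only the per-machine RDRAND input separates them.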
---
Now what about Linux?
1. If available...
- use rdrand to generate random seeds. It's optional and can be disabled by the user.
- use hardware random number generators and TPMs, but only if they're compiled into the kernel (or loaded early enough) and trusted by the user.
- use the in-kernel jitter entropy collector, based on the observation that on "a modern out-of-order CPU, even quite simple loops show a fair amount of hard-to-predict timing variability." But it's only available since Linux 5.3, released in 2019.
2. Read the random seed file from disk, saved during previous boots. But it's userspace's responsibility to do that, which is problematic for systemd.
- systemd needs a source of randomness earlier than that, before the entropy pool is seeded from the file. How did OpenBSD solve this? Instead of loading the file after boot in userspace, OpenBSD loads it early, really early, via the bootloader.
- systemd doesn't trust the file. The file is read, but by default its entropy is not credited, because systemd doesn't want to take responsibility if someone accidentally replicates the seed file across millions of machines via a system image. (OpenBSD seems to accept the lack of protection against image replication, but systemd is more cautious, probably because it has zero control over the rest of the system.) As a result, the system may block at step 4 (entropy collection) during boot for a long time.
3. If the latent entropy GCC plugin is used, the Linux kernel can use entropy embedded in the kernel binary. It's a creative innovation by PaX/grsec: it uses a random seed inserted at build time, but it also inserts local variables into every marked function, so that different runtime code paths and control flows produce different entropy seeds. But it's only an optional feature and almost nobody uses it; PaX/grsec ideas are too radical for most people ;-)
- Note that its security properties can still be seriously weakened by a replicated kernel binary. Also, its entropy is mixed into the pool but is not trusted or credited either, since it's considered a workaround, not a solution.
4. Keep collecting entropy from various sources of randomness, such as interrupts.
As you can see, it's actually really similar to OpenBSD (I mean the concepts, not the implementation); the only difference is that almost everything is optional and nothing is guaranteed to work.
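For intuition about the jitter entropy idea in step 1, here is a toy userspace sketch that times a short busy loop repeatedly and folds the durations together. It is loosely in the spirit of the kernel's jitter collector but is NOT its algorithm, does not estimate how much entropy it gathered, and must not be used for key material; `toy_jitter_sample` and the loop size of 64 are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Toy timing-jitter collector, NOT the kernel's algorithm and NOT fit for
 * key material: times a short busy loop `rounds` times and folds each
 * duration into an accumulator. No entropy estimation is attempted. */
uint64_t toy_jitter_sample(int rounds) {
    uint64_t acc = 0;
    volatile uint64_t sink = 0;   /* volatile so the loop is not optimized out */
    for (int r = 0; r < rounds; r++) {
        uint64_t t0 = now_ns();
        for (uint64_t i = 0; i < 64; i++)
            sink += i * 2654435761u;
        uint64_t delta = now_ns() - t0;
        acc = ((acc << 7) | (acc >> 57)) ^ delta;  /* rotate, then xor */
    }
    return acc;
}
```

The hard part the kernel has to deal with, and this sketch ignores, is deciding how many of those bits are genuinely unpredictable before crediting them.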
---
So as a tradeoff, systemd uses the following logic instead.
1. If available, bypass the kernel, use rdrand directly for non-crypto randomness.
2. For everything else (machines without rdrand, and crypto), use the system's entropy pool. This may block for a long time before the pool is initialized, even with a random seed file or latent entropy, because their entropy is uncredited.
I think this is a reasonable tradeoff, but if rdrand is broken, everything breaks down.
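The "may block for a long time" part can be observed from userspace: getrandom(2) with `GRND_NONBLOCK` fails with `EAGAIN` instead of waiting when the pool is not yet initialized. A minimal sketch, assuming Linux 3.17+ and glibc 2.25+ (`pool_ready_probe` is a hypothetical helper name):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Ask the kernel for `len` bytes (keep len <= 256 so a successful call is
 * never short) WITHOUT waiting for the pool to be initialized.
 * Returns  1: pool initialized, buf filled
 *          0: pool not initialized yet (EAGAIN); a blocking call would wait
 *         -1: other error, e.g. ENOSYS on kernels without getrandom() */
int pool_ready_probe(void *buf, size_t len) {
    for (;;) {
        ssize_t n = getrandom(buf, len, GRND_NONBLOCK);
        if (n == (ssize_t)len)
            return 1;
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno == EAGAIN)
            return 0;
        return -1;
    }
}
```

A caller that gets 0 has to choose between waiting and using some other source; the tradeoff described above is, roughly, systemd choosing rdrand for that case when the randomness isn't used for crypto.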
I think the differences between OpenBSD and Linux are basically due to how much control you have over the operating system.
OpenBSD has the advantage you have when you're building an entire operating system, not just a kernel.
---
Update: As pointed out in a reply, systemd supports bootloader entropy too, but it only uses it via systemd-boot on EFI systems, where it isn't vulnerable to the image replication problem. With UEFI, the boot loader can combine the seed file with a machine-specific EFI variable (the "system token") stored in UEFI NVRAM and generated during system installation.
systemd also criticizes NetBSD's bootloader entropy for being vulnerable to the image replication problem; see https://systemd.io/RANDOM_SEEDS/
> This is boring: NetBSD had boot loader entropy seed support since ages!
> Yes, NetBSD has that, and the above is inspired by that (note though: this article is about a lot more than that). NetBSD’s support is not really safe, since it neither updates the random seed before using it, nor has any safeguards against replicating the same disk image with its random seed on multiple machines (which the ‘system token’ mentioned above is supposed to address). This means reuse of the same random seed by the boot loader is much more likely.
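The two safeguards named in the quote (updating the seed before use, and mixing in a per-machine system token) can be illustrated with a toy derivation. This is a sketch under stated assumptions, not systemd-boot's actual scheme: FNV-1a stands in for a real cryptographic hash such as SHA-256, and `bootloader_seed_step`, the `"disk"`/`"kern"` tags and the 64-bit sizes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* Toy 64-bit FNV-1a; a real design would use a cryptographic hash such as
 * SHA-256. Illustrative only. */
static uint64_t h64(uint64_t h, const void *buf, size_t len) {
    const unsigned char *b = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical boot-loader step (names and layout are illustrative only):
 *   seed  - the on-disk random seed, possibly identical on cloned images
 *   token - per-machine "system token" kept in firmware NVRAM
 * Stores the replacement on-disk seed in *next_seed and returns the value to
 * hand to the kernel. The caller should persist *next_seed BEFORE booting,
 * so that the next boot does not reuse the same seed. The "disk" and "kern"
 * tags give the two outputs distinct derivations. */
uint64_t bootloader_seed_step(uint64_t seed, uint64_t token, uint64_t *next_seed) {
    uint64_t d = h64(FNV_OFFSET, "disk", 4);
    *next_seed = h64(d, &seed, sizeof seed);

    uint64_t k = h64(FNV_OFFSET, "kern", 4);
    k = h64(k, &seed, sizeof seed);
    return h64(k, &token, sizeof token);
}
```

Two clones sharing the same on-disk seed but holding different tokens hand different values to their kernels, which is the protection the quote says NetBSD's approach lacks.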
> How did OpenBSD solve it? Instead of loading it after boot in userspace, OpenBSD loads the file early, really early, via the bootloader! Seeding the entropy pool via a file is part of the boot protocol.
I thought systemd-boot implemented this as well these days?
>During early OS boot the system manager reads this variable and passes it to the OS kernel's random pool, crediting the full entropy it contains. This is an efficient way to ensure the system starts up with a fully initialized kernel random pool — as early as the initial RAM disk phase.
Interesting. So if I use systemd-boot, I always get an initialized pool during early boot, which sounds good. Perhaps systemd should take that into account and skip rdrand entirely when pool initialization is guaranteed.
> This is the advantage you have when you're building an entire operating system, not just a kernel.
I'm now convinced that systemd really is an operating system.
As far as I can see, systemd still attempts to use rdrand for UUIDs regardless of how /dev/urandom is initialized. Perhaps a solution less vulnerable to the rdrand problem is to use getrandom() directly when the pool is guaranteed to be initialized (though very few users could benefit from it, due to the limited deployment of systemd-boot).
> When generating Type 4 UUIDs, systemd tries to use Intel’s and AMD’s RDRAND CPU opcode directly, [...] If RDRAND is not available or doesn’t work, it will use synchronous getrandom() as fallback, and /dev/urandom on old kernels where that system call doesn’t exist yet. This means on non-Intel/AMD systems UUID generation will block on kernel entropy initialization.
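For reference, generating a Type 4 UUID from getrandom() alone is short. This minimal sketch (hypothetical helper names `uuid4_getrandom` and `uuid_format`) uses flags = 0, i.e. it blocks only until the kernel pool is initialized, then sets the version and variant bits as the UUID specification requires. It is not systemd's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill `out` with a version-4 (random) UUID using getrandom() with flags = 0:
 * blocks until the kernel pool is initialized, then never blocks again.
 * Returns 0 on success, -1 on failure. */
int uuid4_getrandom(uint8_t out[16]) {
    size_t got = 0;
    while (got < 16) {
        ssize_t n = getrandom(out + got, 16 - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    out[6] = (uint8_t)((out[6] & 0x0f) | 0x40); /* version 4 */
    out[8] = (uint8_t)((out[8] & 0x3f) | 0x80); /* RFC 4122 variant */
    return 0;
}

/* Format as 8-4-4-4-12 lowercase hex; buf must hold 37 bytes. */
void uuid_format(const uint8_t u[16], char buf[37]) {
    snprintf(buf, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

On a system where the pool is already initialized, as with systemd-boot's seeding, this never blocks and never touches RDRAND.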
[1] https://www.youtube.com/watch?v=aWmLWx8ut20
[2] https://www.openbsd.org/papers/hackfest2014-arc4random/index...