Linux random number generator updates #158

Open · wants to merge 1 commit into base: main

Conversation

@nmav commented Jul 30, 2024

This includes the POSIX interface getentropy, which is simpler to use than getrandom and, in practice, has been available in glibc for as long as getrandom has, in addition to being part of OpenBSD before that. https://pubs.opengroup.org/onlinepubs/9799919799/

This patch set also removes the long discussion about /dev/random and /dev/urandom, which I loved, but today these interfaces behave similarly. torvalds/linux@30c08ef

@nmav force-pushed the tmp-random branch 2 times, most recently from 36d0350 to 892d727 on July 30, 2024 at 19:54
@nmav changed the title from "Random number generator updates" to "Linux random number generator updates" on Sep 19, 2024
@nmav (Author) commented Oct 21, 2024

Any update on the merging status of this? The intent of the proposal is to make this section apply to modern kernels and to reduce the semantic complexity discussed. I find significant value in giving a simple story of the current semantics of the devices, as a newcomer to Linux will not have to burden their limited mental load with details that are only of historical interest. The only "change" is the mention of getentropy because (and correct me if I'm wrong) it serves the same simplification.

@david-a-wheeler (Contributor)

I verified the POSIX inclusion: https://pubs.opengroup.org/onlinepubs/9799919799/functions/getentropy.html

The Linux kernel text looks correct to me. I've asked Greg KH to verify the Linux kernel situation. Yes, I should have done that a while ago, but I'm on the case now :-).

@david-a-wheeler (Contributor)

Greg KH asked me to look at the docs & talk directly with the kernel implementers. First, looking at available docs...

POSIX does have getentropy, and it looks great. It's not just paperware; it's clearly documented in the Linux getentropy man page as available since glibc 2.25 (as well as in OpenBSD). We generally refer to standardized interfaces (as long as they're actually available), so that looks good.
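To make the interface concrete, here's a minimal C sketch of my own (an illustration, not proposed text for the guide) that fills a small seed buffer via getentropy() and prints it in hex; it assumes glibc 2.25 or later:

```c
/*
 * Illustrative only: fill a 32-byte seed with POSIX getentropy().
 * On Linux this is a thin wrapper over getrandom(), and the requested
 * length must be <= 256 bytes.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getentropy() in glibc >= 2.25 */

int main(void) {
    unsigned char seed[32];

    if (getentropy(seed, sizeof(seed)) != 0) {
        perror("getentropy");   /* e.g. ENOSYS on kernels without getrandom() */
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < sizeof(seed); i++)
        printf("%02x", seed[i]);
    putchar('\n');
    return EXIT_SUCCESS;
}
```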

The /dev stuff needs more research. It's true that commit torvalds/linux@30c08ef from 2020 was merged in a while ago. However, there were problems noted with the unification in 2022. So I want to check and make sure that these statements about /dev/*random are correct.

I believe the people I eventually need to contact about Linux kernel random number generation are Jason A. Donenfeld and Theodore Ts'o. Jason seems to be more active recently on it, and tytso has a long background on it.

@david-a-wheeler (Contributor)

The most authoritative source is the code itself. I've identified the key source file to be reviewed:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/random.c?h=v6.12.4

@david-a-wheeler (Contributor)

The problem is that /dev/random blocks. In many cases blocking is a much worse problem. See: “Myths about /dev/urandom”

The urandom(4) man page says:

> When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.
> A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature...
> If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys. ...
> Users should be very economical in the amount of seed material that they read from /dev/urandom (and /dev/random)...

@david-a-wheeler (Contributor)

The same problem hits getentropy, at least on Linux, because it also blocks. Per the getentropy Linux man page:

> A call to getentropy() may block if the system has just booted and the kernel has not yet collected enough randomness to initialize the entropy pool. In this case, getentropy() will keep blocking even if a signal is handled, and will return only once the entropy pool has been initialized.

<tr>
<td>C (POSIX)</td>
<td><tt>rand(), *rand48()</tt></td>
<td><tt>getentropy()</tt></td>
A contributor commented on the diff hunk above:
This is fine!

For example, the Linux kernel provides cryptographically secure random number values via its `getrandom` system call, as well as the special files `/dev/urandom` and `/dev/random`. In most cases you would want to use the `getrandom` system call where practical, or the `/dev/urandom` special file if `getrandom` is hard to access (e.g., from a shell script). These generate cryptographically secure random values using a CSPRNG and entropy gathered by the kernel. In special circumstances, such as creating a long-lived cryptographic key, you might instead want to use `/dev/random` or the equivalent option in `getrandom`; this forces the kernel to wait (block) until it has a high estimated amount of internal entropy. The purpose of `/dev/random` is to ensure there is a large amount of internal entropy, but the blocking may be indefinite in some circumstances and it’s usually not necessary. What's important is that an attacker can't practically guess the random value, not the value of this internal entropy estimate. (see [“Myths about /dev/urandom”](https://www.2uo.de/myths-about-urandom/) by Thomas). In the future there may be no difference between `/dev/random` and `/dev/urandom`.

For example, the Linux kernel provides cryptographically secure random number values via its **/dev/urandom** special file, its **/dev/random** special file, and its **getrandom** system call. In most cases you would want to use the **/dev/urandom** special file or the **getrandom** system call. These generate cryptographically secure random values using a CSPRNG and entropy gathered by the kernel. In special circumstances, such as creating a long-lived cryptographic key, you might instead want to use **/dev/random** or the equivalent option in **getrandom**; this forces the kernel to wait (block) until it has a high estimated amount of internal entropy. The purpose of **/dev/random** is to ensure there is internal entropy, but the blocking may be indefinite in some circumstances and it’s usually not necessary
For example, POSIX defines the `getentropy` interface to access a cryptographically secure pseudo-random number generator. On Linux systems `getentropy` is a wrapper over the `getrandom` system call. The Linux kernel additionally provides the special files `/dev/urandom` and `/dev/random`, which have behaved similarly since Linux kernel version 5.6. In most cases you would want to use the `getentropy` interface where practical, or the `/dev/random` special file if `getentropy` is hard to access (e.g., from a shell script). Both generate cryptographically secure random values using a CSPRNG and entropy gathered by the kernel. The key difference is that `/dev/urandom` may provide output during early boot, even before the random generator is fully seeded, while `getentropy` and `/dev/random` block until the generator is fully initialized.
@david-a-wheeler (Contributor) commented on the diff above on Dec 11, 2024:

It's good to note getentropy is in POSIX (we should probably note it's a standard). Noting that on Linux it calls getrandom is also good. However, this text misses a key issue: getentropy, /dev/random, and some uses of getrandom can block. If you're not creating keys for long-term use, the usual recommendation is to avoid the blocking versions, because having a system catastrophically fail usually means that people will do work-arounds that are even worse. The "early boot" text doesn't really explain things.

How about this instead? (I've made some edits since my first try):

For example, the POSIX specification defines the getentropy interface to access a cryptographically secure pseudo-random number generator. On Linux systems getentropy is a wrapper around the Linux-specific and more flexible getrandom system call. The Linux kernel additionally provides the special readable files /dev/urandom and /dev/random. The Linux kernel, and many other systems, gather entropy from hardware and mix it into random values using a CSPRNG to provide these cryptographically secure random values. Typical Linux systems record old seeds on shutdown so they will continue to produce highly random values when they reboot. Unfortunately, some of these requests will block - possibly forever - until the underlying system's estimate of internal entropy is high enough (getentropy, /dev/random, and some uses of getrandom will all block). In most cases on Linux kernel based systems you should avoid blocking indefinitely, and instead use a non-blocking form of getrandom or the non-blocking /dev/urandom. However, for creating long-lived secrets (like GPG/SSL/SSH keys) where the quality of randomness is especially critical, consider using the blocking versions. This is a potential trade-off between availability and other aspects of security. See “Myths about /dev/urandom” by Thomas and the Linux man page urandom(4).
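To illustrate the trade-off in code (a rough sketch of my own, not part of the proposed wording), the helper below asks getrandom() for some bytes, lets the caller choose whether blocking is acceptable via GRND_NONBLOCK, and falls back to reading /dev/urandom only if the syscall itself is unavailable:

```c
/*
 * Hedged sketch only: obtain `len` cryptographically random bytes.
 * `allow_blocking` = 1 means "wait until the kernel RNG is initialized";
 * 0 means "fail fast with EAGAIN instead of blocking" (GRND_NONBLOCK).
 * Assumes glibc >= 2.25 for the getrandom() wrapper.
 */
#include <errno.h>
#include <fcntl.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK */
#include <unistd.h>

int get_random_bytes(unsigned char *buf, size_t len, int allow_blocking) {
    ssize_t n = getrandom(buf, len, allow_blocking ? 0 : GRND_NONBLOCK);
    if (n == (ssize_t)len)
        return 0;                    /* requests <= 256 bytes are not split */
    if (n < 0 && errno == EAGAIN)
        return -1;                   /* RNG not yet initialized; caller refused to block */

    /* Fallback for kernels/libcs without getrandom(): read /dev/urandom. */
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    size_t got = 0;
    while (got < len) {
        ssize_t r = read(fd, buf + got, len - got);
        if (r <= 0) { close(fd); return -1; }
        got += (size_t)r;
    }
    close(fd);
    return 0;
}
```

A caller creating a long-lived key could pass allow_blocking = 1; a session-key path that must not hang at boot could pass 0 and report the failure instead.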

@david-a-wheeler (Contributor)

Randomness is quite technical and challenging. I've proposed some alternative text, building on what you proposed earlier. Once we agree on something, I intend to bring it to the Linux kernel developers who focus on the random number generator, to get their take.

Security is confidentiality, integrity, and availability. If a system can't start, nobody cares about it. So we need to make it clear that there's a trade-off. What's more, in a lot of cases the secrets are short-lived (e.g., session keys) and the protections provided by entropy estimates are usually not worth making the system completely fail.

@nmav (Author) commented Dec 12, 2024

> Randomness is quite technical and challenging. I've proposed some alternative text, building on what you proposed earlier. Once we agree on something, I intend to bring it to the Linux kernel developers who focus on the random number generator, to get their take.
>
> Security is confidentiality, integrity, and availability. If a system can't start, nobody cares about it. So we need to make it clear that there's a trade-off. What's more, in a lot of cases the secrets are short-lived (e.g., session keys) and the protections provided by entropy estimates are usually not worth making the system completely fail.

I see that the availability aspect is important to you, and rightfully so, but it introduces additional complexity that the reader of the guidance would have to manage. These are the key aspects that initially led me to simplify the guidance to an unconditional endorsement of getentropy, and thus towards simplicity:

  • During my early days at Red Hat we identified certain VMs that sometimes had identical SSH private keys. That was at a time when /dev/urandom was the only practical choice. That led me to believe that an uninitialized random generator is something that is very, very hard to use in a reasonable way (not just securely).
  • Whether the generator is initialized or not is a very detailed nuance that very few people can really understand, and even fewer can claim to have securely used an uninitialized random generator (should they wait until there are 4 bits, 64 bits, or were some bits fixed instead of random, and how would they do all that?). Given that the intention of a kernel-provided random generator is to seed other random generators, which in turn can be used for any number of purposes including generating long-term keys, the use-uninitialized option stands on a very shaky foundation.

As such, I was led to believe that the only practical recommendation for the average developer is to avoid all the nuances and details and use getentropy, which will always be the right choice for the type of random generator that is intended to seed others.

At the same time, there is the issue of availability, which is important, but I do not feel that getrandom() with the flag to not block is the unconditional answer: in the case of OpenSSH keys, for example, it will produce output that is predictable and has a long-lasting effect. These keys will remain predictable even years after they have been generated.

The availability issue is something that could happen, most likely when a system has not gathered enough entropy after a boot (this could be on custom-designed boards or in VMs, as in the example I used above). In these cases, using getrandom() with the option to not block will hide the issue during the design phase and prevent a fix (e.g., using something like virtio_rng on a VM or a hardware random generator on a board). During the production phase it can prevent the unavailability issue.

As such, my recommendation would be to cover the potential availability issue, maybe with a pointer, as after writing the above I'm not sure that getrandom() and /dev/urandom are even solutions (as opposed to quick hacks) to the availability issue for that type of random generator.

PS. This is more an expression of my point of view than a text suggestion. If you also see value in the aspects above and we get on the same page, I can propose something.

@david-a-wheeler (Contributor)

We're talking about "no/low estimated entropy", which isn't quite the same as "uninitialized" in its usual sense. In many cases the kernel doesn't normally credit entropy, and believes it's 0 or low, yet there's no known way for an attacker to predict the random value. In particular, per urandom(4): "Writing to /dev/random or /dev/urandom will update the entropy pool with the data written, but this will not result in a higher entropy count." This is a common case; distros generally write out the seed on shutdown and read it back, so in cases where the system can "start where it left off" this matters. It's really common for a system to reload the seed it previously saved on shutdown: it then has a seed no one can know, yet its estimated entropy is 0.
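As a side note, the seed-reload behavior looks roughly like the sketch below (purely illustrative; the path is hypothetical and real distributions do this via systemd-random-seed or a seedrng-style tool). Writing the saved seed into /dev/urandom mixes it into the pool but, exactly as the man page says, does not raise the entropy count.

```c
/*
 * Illustrative sketch of reloading a saved seed at boot. SEED_FILE is a
 * hypothetical path; distributions use their own locations and tooling.
 * The write mixes the bytes into the kernel pool without crediting entropy
 * (crediting would require the RNDADDENTROPY ioctl and root privileges).
 */
#include <fcntl.h>
#include <unistd.h>

#define SEED_FILE "/var/lib/example/random-seed"   /* hypothetical */

int reload_seed(void) {
    unsigned char buf[512];
    int in = open(SEED_FILE, O_RDONLY);
    if (in < 0)
        return -1;                          /* no saved seed; nothing to do */
    ssize_t n = read(in, buf, sizeof(buf));
    close(in);
    if (n <= 0)
        return -1;

    int out = open("/dev/urandom", O_WRONLY);
    if (out < 0)
        return -1;
    ssize_t w = write(out, buf, (size_t)n); /* mixed in; entropy count unchanged */
    close(out);
    return (w == n) ? 0 : -1;
}
```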

> During my early days at Red Hat we identified certain VMs that sometimes had identical SSH private keys. That was at a time when /dev/urandom was the only practical choice.

Obviously that's bad :-). We already cover that, though. SSH keys are cases where we already recommend the use of blocking calls, like /dev/random.

I agree that this is tricky :-). We have a fundamental trade-off, and I think we should clearly present it as a trade-off. If a system is unavailable, that's often a non-starter. Some of the most authoritative guidance about this trade-off when using Linux is the text from urandom(4):

> If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.
> If a seed file is saved across reboots as recommended below (all major Linux distributions have done this since 2000 at least), the output is cryptographically secure against attackers without local root access as soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys. Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.

I guess one option is to directly quote this as part of the text.
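If we do quote it, we could also illustrate the pattern it recommends. Here's a rough sketch of mine (one way to do it, not a requirement): open /dev/random without blocking, warn the user if entropy isn't immediately available, and only then wait.

```c
/*
 * Sketch of the urandom(4) suggestion: non-blocking open of /dev/random,
 * a bounded wait, and a user-visible notification before waiting further.
 * The timeout is arbitrary; error handling is minimal on purpose.
 */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int read_dev_random(unsigned char *buf, size_t len, int timeout_ms) {
    int fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) != 1) {
        fprintf(stderr, "warning: kernel RNG not ready yet, still waiting...\n");
        poll(&pfd, 1, -1);                  /* wait indefinitely, but the user knows why */
    }

    size_t got = 0;
    while (got < len) {
        ssize_t r = read(fd, buf + got, len - got);
        if (r > 0) { got += (size_t)r; continue; }
        if (r < 0 && errno == EAGAIN) { poll(&pfd, 1, -1); continue; }
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```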

@nmav (Author) commented Dec 13, 2024

> We're talking about "no/low estimated entropy", which isn't quite the same as "uninitialized" in its usual sense. In many cases the kernel doesn't normally credit entropy, and believes it's 0 or low, yet there's no known way for an attacker to predict the random value.

That's certainly correct, and since you have checked the internals recently you have a better overview than I had. However, my concern is whether there is a distinction between an uninitialized RNG and an RNG whose entropy is estimated as zero in the Linux kernel. Without such a distinction it seems to me very hard to define what to expect from the output at this early boot stage.

(And just to be clear, it is not my position that the /dev/random behavior was good when it had the notion of entropy loss and could block at any time, effectively causing a DoS; my focus is on the current behavior, where the RNG will only unblock when it considers itself initialized, irrespective of whether I agree with the criteria chosen.)

> In particular, per urandom(4): "Writing to /dev/random or /dev/urandom will update the entropy pool with the data written, but this will not result in a higher entropy count." This is a common case; distros generally write out the seed on shutdown and read it back, so in cases where the system can "start where it left off" this matters. It's really common for a system to reload the seed it previously saved on shutdown: it then has a seed no one can know, yet its estimated entropy is 0.

Let's not forget that it is an assumption that this happens (being in the embedded world right now has changed my perspective a little). Maybe that's a good point to include in the text.

> Obviously that's bad :-). We already cover that, though. SSH keys are cases where we already recommend the use of blocking calls, like /dev/random.
>
> I agree that this is tricky :-). We have a fundamental trade-off, and I think we should clearly present it as a trade-off. If a system is unavailable, that's often a non-starter. Some of the most authoritative guidance about this trade-off when using Linux is the text from urandom(4):
>
> > If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.
> > If a seed file is saved across reboots as recommended below (all major Linux distributions have done this since 2000 at least), the output is cryptographically secure against attackers without local root access as soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys. Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
>
> I guess one option is to directly quote this as part of the text.

To my understanding, you suggest splitting the recommendation based on use cases such as:

  • long-term effects, such as keys (or any purpose unknown at the time)
  • short-term effects, such as short-term keys

Thinking about it, this separation is not enough. In the second case we only have an assumption that a short-term effect will come from using that short-term key. If this short-term key is used to transport the "crown jewels", the assumption of suitability goes away, again considering that there is no strict definition of the strength of the unblocked behavior. It feels like any distinction between blocking and non-blocking usage must come with very concrete assumptions, which will in turn limit the usefulness of the advice.

@nmav (Author) commented Dec 13, 2024

And let me propose some text to see how a recommendation based on use case can look:

## Recommendations based on use case

### 1. Seeding a random generator for any use including long-term key generation
- Use `getentropy`, or if it is unavailable (e.g., in shell scripts), prefer `/dev/random`.

### 2. Seeding a random generator for purposes with no long-term effects
- For tasks where cryptographic security is required, but availability during the early boot stage trumps security and there is no long-term effect from the use of the seed, also consider `/dev/urandom` or `getrandom` with the non-blocking parameter.

NOTE: When designing a new system based on the Linux kernel, before considering the non-blocking variants, ensure that the boot process provides enough entropy by identifying the hardware entropy sources and making sure they contribute to the kernel entropy pool.

The second case would need an example, but I'm not sure I can find a useful one. I hope this demonstrates a little better why I think the usefulness of advice no. 2 is limited.
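One way the NOTE could be made concrete during board/VM bring-up (again, just an illustration from my side, not something the text has to mandate): probe the kernel RNG once with a non-blocking getrandom() call. If this still reports EAGAIN long after boot, that points at missing entropy sources (e.g., no virtio-rng wired up), which is the thing to fix rather than a reason to switch to the non-blocking interfaces.

```c
/*
 * Bring-up probe (illustrative): returns 1 if the kernel RNG is initialized
 * (so getentropy() and /dev/random will not block), 0 if it is not yet
 * seeded, and -1 if getrandom() is unavailable or fails for another reason.
 */
#include <errno.h>
#include <sys/random.h>

int kernel_rng_ready(void) {
    unsigned char b;
    ssize_t n = getrandom(&b, sizeof(b), GRND_NONBLOCK);
    if (n == 1)
        return 1;                        /* pool initialized */
    if (n < 0 && errno == EAGAIN)
        return 0;                        /* not yet seeded; blocking calls would wait here */
    return -1;                           /* getrandom() unavailable or other error */
}
```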

@david-a-wheeler (Contributor)

Part 1: I agree that using blocking calls for long-term keys is the general consensus. However, getentropy is a relatively new call. On Linux systems /dev/random and getrandom are available more often. There's no strong reason to avoid /dev/random on Linux systems where it is available, though; it gets you the same thing. The only reason would be a container setup that didn't create the device for some reason, which would be a pretty odd setup.

Part 2: The "early boot" isn't quite right. The Linux kernel developers are pretty conservative about what counts as entropy - and understandably so. As a result, the kernel can report "0 entropy" when in fact there's no way an attacker can determine the seed value (so it's not REALLY 0 entropy). It's not clear how to fix this; the kernel devs only have so much information and it is reasonable for them to be conservative. However, this also means that many systems hang if you force them to use blocking calls. It's frustrating. I'd love to say "always use blocking calls for cryptographic randomness" - it is much easier to understand - but the real world makes that impractical to simply require.

@david-a-wheeler (Contributor)

Quick note: It may not look it, but I really appreciate this discussion & deep dive into random number generation. It seems simple at first, but this is an area that is absolutely fraught with complications. Many successful attacks have involved the cryptographic random number generator, so it matters. On the other hand, making entire systems fail because they think they don't have enough entropy is a great way to ensure that the system, and any thought of security, is removed immediately. Threading these issues is complex.

@nmav (Author) commented Dec 17, 2024

> Quick note: It may not look it, but I really appreciate this discussion & deep dive into random number generation. It seems simple at first, but this is an area that is absolutely fraught with complications. Many successful attacks have involved the cryptographic random number generator, so it matters. On the other hand, making entire systems fail because they think they don't have enough entropy is a great way to ensure that the system, and any thought of security, is removed immediately. Threading these issues is complex.

Same here, and this is a difficult case. There is a need for balance between being conservative by suggesting blocking and being pragmatic by suggesting non-blocking, and at least I rely more on "gut" feeling than on clear data. What I am missing is whether the blocking issue, as communicated in myths-about-urandom, is still as relevant today as when it was written. Your suggestion is a good balance, and I'll update the MR with it. Would it be OK to remove the reference to myths-about-urandom, since the urandom/random environment was significantly different back then? The myths document provides very nice historical context for the devices, but to someone learning Linux and security today it will be of limited practical value.

nmav pushed a commit to nmav/secure-sw-dev-fundamentals that referenced this pull request Dec 17, 2024
This includes the POSIX interface getentropy, that is simpler to use
than getrandom, and in practice it is available for as long as getrandom
is available in glibc, in addition to being part of OpenBSD before that.
https://pubs.opengroup.org/onlinepubs/9799919799/

This patch set also removes the long discussion about /dev/random and
/dev/urandom which I loved, but today these interfaces function similarly.
torvalds/linux@30c08ef

For now this still recommends the non-blocking interfaces, based on the discussion at:
ossf#158

Signed-off-by: Nikos Mavrogiannopoulos <[email protected]>