crypto/rand: crash process on error reading randomness

On almost all our platforms, we now have crypto/rand backends that ~never fail.

On Linux, we primarily use the getrandom(2) system call, which never fails.
- It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.
- getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.
- getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and the default Docker profile doesn't do that. If it is blocked, we fall back to opening /dev/urandom, which might fail if the file is not available or file descriptors run out.
On macOS and iOS we use arc4random() since https://go.dev/cl/569655. From the man page: "These functions are always successful, and no return value is reserved to indicate an error."
On Windows we use the ProcessPrng function. From the docs: "Always returns TRUE."
The BSDs use similar syscalls with similar properties (whether getrandom or getentropy) although we should switch the ones we can to arc4random.
On js/wasm we use getRandomValues which doesn't have documented failure modes.
On WASIP1 there's random_get which regrettably has an error return value, making it the one platform (ignoring misconfigured Linux) where there might be errors getting platform random bytes. Since WASI rests on an underlying platform, and every underlying platform has failure-less CSPRNGs, it's hard to imagine why random_get should actually return an error.

I'm proposing we make crypto/rand throw (irrecoverably crash the program) if an error occurs, and document that the error return values of crypto/rand.Read and crypto/rand.Reader.Read are always nil.

This will free applications from having to do error handling for a condition that essentially can't happen and that, if it did happen, the application essentially could not handle securely.
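As a rough illustration of what this enables for callers, here is a minimal sketch. It assumes a toolchain in which the proposal is implemented (the thread later mentions the Go 1.24 release notes); `newToken` is a hypothetical helper name, not part of any API.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken returns n random bytes, hex-encoded.
// Assumption: the toolchain implements this proposal, so rand.Read
// always fills b and its error is always nil (a failure crashes the
// process instead of returning).
func newToken(n int) string {
	b := make([]byte, n)
	rand.Read(b) // error deliberately ignored under the assumption above
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println(newToken(16))
}
```

On older toolchains the error should still be checked, since the guarantee depends on the toolchain version.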

This will also allow introducing new APIs like a hypothetical String(charset string) string (not part of this proposal) without an error return, making them more usable and appealing.

Based on a suggestion by @rsc.

/cc @golang/security @golang/proposal-review

Do you mean crypto/rand should panic on errors? I'm not very familiar with Go but I didn't think it used the terminology of "throwing" errors. I agree, on general language-independent robustness principles, that it should panic.

Panics are recoverable, throw is an internal name for fatal errors. Think of it as a call to exit(1).

If we made crypto/rand panic that risks encouraging applications to wrap the calls in defer/recover "for robustness", when really we think it's so unlikely and so unrecoverable that applications shouldn't try.

go/src/runtime/panic.go

Lines 1008 to 1022 in 519f6a0

    // throw triggers a fatal error that dumps a stack trace and exits.
    //
    // throw should be used for runtime-internal fatal errors where Go itself,
    // rather than user code, may be at fault for the failure.
    //
    //go:nosplit
    func throw(s string) {
    	// Everything throw does should be recursively nosplit so it
    	// can be called even when it's unsafe to grow the stack.
    	systemstack(func() {
    		print("fatal error: ", s, "\n")
    	})

    	fatalthrow(throwTypeRuntime)
    }
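To make the panic half of that distinction concrete, the sketch below shows ordinary code intercepting a panic with defer/recover. `tryCall` is a hypothetical helper. A runtime throw cannot be demonstrated the same way, because it never reaches deferred functions.

```go
package main

import "fmt"

// tryCall runs f and returns whatever value f panicked with,
// or nil if f returned normally.
func tryCall(f func()) (recovered any) {
	defer func() { recovered = recover() }()
	f()
	return nil
}

func main() {
	r := tryCall(func() { panic("out of randomness") })
	fmt.Println("recovered:", r)
}
```

This is exactly the wrapping the comment above wants to discourage: if crypto/rand panicked, a caller could swallow the failure and carry on without randomness.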

I think all our code already panics on crypto/rand errors (via wrappers that don't return errors) so SGTM 😀

WASI 0.2 random interface thankfully does not return an error: https://github.com/WebAssembly/wasi-random/blob/main/wit/random.wit

Sounds like we need a rand v3

Sounds like we need a rand v3

This is the best argument for always using crand and mrand aliases.
||crypto/rand != math/rand 😉||

On Linux, we primarily use the getrandom(2) system call, which never fails.

It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.

getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.

getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and [the default Docker profile doesn't do that]

Should we keep the /dev/urandom fallback for 3.17+?
I would rather be forced to tweak my seccomp config than have to debug rare flaky throws because I incorrectly configured some sandboxing options.
We can't remove /dev/urandom completely on Linux without raising the 2.6.32 baseline.

Should we keep the /dev/urandom fallback for 3.17+?

This is tempting, but I think making decisions based on kernel version is opening a can of worms. I think even the urandom fallback is reasonably reliable: the file is opened only once, so either crypto/rand never works in a given process or it always works (although it might flake across process executions, if you run out of fds before the first Read call, or if the file is removed).
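The "opened only once" behavior can be sketched roughly as follows. This is an illustration under assumptions, not the actual crypto/rand code: `onceFileReader` and its fields are hypothetical names, and the `opens` counter exists only to make the behavior observable.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// onceFileReader (hypothetical) opens path at most once and remembers
// the outcome, so every later Read sees the same success or failure.
type onceFileReader struct {
	path  string
	once  sync.Once
	f     *os.File
	err   error
	opens int // number of open attempts, for illustration only
}

func (r *onceFileReader) Read(b []byte) (int, error) {
	r.once.Do(func() {
		r.opens++
		r.f, r.err = os.Open(r.path)
	})
	if r.err != nil {
		// The open failure is sticky: it is never retried.
		return 0, r.err
	}
	return io.ReadFull(r.f, b)
}

func main() {
	r := &onceFileReader{path: "/dev/urandom"}
	buf := make([]byte, 16)
	if _, err := r.Read(buf); err != nil {
		fmt.Println("random device unavailable:", err)
		return
	}
	fmt.Printf("%x\n", buf)
}
```

With this shape, a given process either always gets random bytes or never does, which matches the "never works or always works" observation above.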

Ah I thought it opened a new file each time.

Then can we use import-time side effects to solve this (open the file in init)?
It's very unlikely that you run out of fds before main has even started running.

I get that std tries not to do that, but in the overwhelming majority of cases init will run, try getrandom, succeed, and do nothing else, which seems fine to me. (There is also a clear path for anyone who finds this to be an issue: fixing their seccomp config.)

I was thinking about that but I have no intuition as to whether the cost of calling getrandom (to check if it's available) on init() for every Linux program is acceptable.

If crypto/rand is imported (even indirectly), it should be fine to make a single system call during init. I'm kinda surprised it doesn't do so already.
The runtime reads from /dev/urandom on every startup.

@randall77

The runtime reads from /dev/urandom on every startup.

Does it? On Linux, when there is randomness in auxv, it does not even open it.

go/src/runtime/rand.go

Lines 44 to 58 in f17b28d

    if startupRand != nil {
    	for i, c := range startupRand {
    		seed[i%len(seed)] ^= c
    	}
    	clear(startupRand)
    	startupRand = nil
    } else {
    	if readRandom(seed[:]) != len(seed) {
    		// readRandom should never fail, but if it does we'd rather
    		// not make Go binaries completely unusable, so make up
    		// some random data based on the current time.
    		readRandomFailed = true
    		readTimeRandom(seed[:])
    	}
    }

@mateusz834 True, on Linux if we get auxv randomness we don't read /dev/urandom.
We read it unconditionally on lots of OSes, like darwin and the BSDs.

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

Have all remaining concerns about this proposal been addressed?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors. If the underlying OS returns an error, the Go process will runtime.throw, meaning the process crashes with no chance to recover. But no underlying OSes actually return errors from random reads anymore.

This lets callers simplify and delete their dead error handling paths.

FYI, the Reader is a var and people might change it to a custom Reader implementation. I wonder whether that might be an issue for us here?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors.

@rsc I think that we cannot document rand.Read as such.

I think that we can only document the default rand.Reader this way.

Also, it seems that plan9 always opens /dev/urandom, so I guess it might not always exist. Maybe we can replace it with some kind of syscall?

go/src/crypto/rand/rand_plan9.go

Lines 36 to 49 in 74a4918

    func (r *reader) Read(b []byte) (n int, err error) {
    	r.seeded.Do(func() {
    		t := time.AfterFunc(time.Minute, func() {
    			println("crypto/rand: blocked for 60 seconds waiting to read random data from the kernel")
    		})
    		defer t.Stop()
    		entropy, err := os.Open(randomDevice)
    		if err != nil {
    			r.seedErr = err
    			return
    		}
    		defer entropy.Close()
    		_, r.seedErr = io.ReadFull(entropy, r.key[:])
    	})

CC @0intro

Plan 9 does not have /dev/urandom. It has /dev/random. That may not be present in the name space. But '#c/random' is always present, and the code should be opening that anyway.

@FiloSottile and I discussed this.

We believe that func Read should be documented to never return an error. It is also documented to use Reader, but if it observes an error from Reader, it will crash the program. That helps with the security of code that assumes Read never returns an error because the default implementations don't. If that code runs when Reader has been replaced with an erroring implementation, the call sites calling Read may not be correct. The security guarantee simply doesn't happen if Read has to be as lax as any possible overwritten Reader. The value-add for Read is simply that it does this check and provides this guarantee.

We also believe that we should document that if Reader is replaced, it should be replaced with an implementation that never returns an error.
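The check described above can be sketched in user-level code as follows. This is only an approximation under stated assumptions: `readOrCrash`, `fatal`, and `failingReader` are hypothetical names, and `fatal` stands in for the runtime's unrecoverable throw, which ordinary code cannot call (here it prints and exits).

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"io"
	"os"
)

// fatal is a hypothetical stand-in for the runtime throw: it ends the
// process without giving callers a chance to recover.
var fatal = func(msg string) {
	fmt.Fprintln(os.Stderr, "fatal error: "+msg)
	os.Exit(2)
}

// readOrCrash fills b from r, and crashes instead of returning an error.
func readOrCrash(r io.Reader, b []byte) {
	if _, err := io.ReadFull(r, b); err != nil {
		fatal("failed to read random data: " + err.Error())
	}
}

// failingReader is a hypothetical replacement Reader that always errors,
// which is what the guidance above says a replacement should never do.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("entropy source unavailable")
}

func main() {
	key := make([]byte, 32)
	readOrCrash(rand.Reader, key)
	fmt.Printf("%x\n", key)
	// readOrCrash(failingReader{}, key) would terminate the process here.
}
```

The design point is that a call site never sees a partially filled buffer together with an error it might ignore: either the buffer is full or the process is gone.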

Have all remaining concerns about this proposal been addressed?

The proposal is to document that rand.Read never returns an error, nor does the default rand.Reader. If rand.Reader is set to an erroring io.Reader, then rand.Read throws (fatal crashes) on error. Programs that leave rand.Reader alone will never observe the “out of randomness” throw because all operating systems guarantee that getrandom works.

(This issue depends on #67001.)

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to document that rand.Read never returns an error, nor does the default rand.Reader. If rand.Reader is set to an erroring io.Reader, then rand.Read throws (fatal crashes) on error. Programs that leave rand.Reader alone will never observe the “out of randomness” throw because all operating systems guarantee that getrandom works.

(This issue depends on #67001.)

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to document that rand.Read never returns an error, nor does the default rand.Reader. If rand.Reader is set to an erroring io.Reader, then rand.Read throws (fatal crashes) on error. Programs that leave rand.Reader alone will never observe the “out of randomness” throw because all operating systems guarantee that getrandom works.

(This issue depends on #67001.)

Change https://go.dev/cl/602497 mentions this issue: crypto/rand: crash program if Read would return an error

Change https://go.dev/cl/602496 mentions this issue: crypto/rand: improve getrandom batching and retry logic

Change https://go.dev/cl/602495 mentions this issue: crypto/rand: remove /dev/urandom fallback and simplify package structure

Change https://go.dev/cl/608175 mentions this issue: crypto/rand: reintroduce urandom fallback for legacy Linux kernels

A follow-up question from the CL: should we have a randcrash=0 GODEBUG to turn the throw off? It makes it a little less obvious/universal that it's ok to ignore the error from Read, but that depends on the toolchain version anyway.

Change https://go.dev/cl/608435 mentions this issue: crypto/rand: add randcrash=0 GODEBUG

Change https://go.dev/cl/621979 mentions this issue: crypto/internal/mlkem768: remove crypto/rand.Read error checking

We realized a GODEBUG is actually a security risk here: most programs will start to ignore errors from Read because they can't happen (which is the intended behavior), but then if a program is run with GODEBUG=randcrash=0 it will use a partial buffer in case an error occurs, which may be catastrophic. (Errors should be impossible, but if they are then the GODEBUG is useless anyway.) Mailed https://go.dev/cl/622115 to revert it.

Note that the proposal was accepted without the GODEBUG, which was only added later.
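The partial-buffer risk can be illustrated with a small sketch. Everything here is hypothetical and for illustration only: `shortReader` simulates a source that fails partway through, and `newKeyIgnoringError` mimics a caller that has stopped checking the error because it assumes reads cannot fail.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// shortReader (hypothetical) writes up to n bytes of 0xAB, then fails.
type shortReader struct{ n int }

func (s shortReader) Read(b []byte) (int, error) {
	n := s.n
	if n > len(b) {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		b[i] = 0xAB
	}
	return n, errors.New("entropy source failed")
}

// newKeyIgnoringError mimics a caller that trusts reads to never fail.
func newKeyIgnoringError(r io.Reader) []byte {
	key := make([]byte, 16)
	r.Read(key) // error ignored: the key may be only partly filled
	return key
}

func main() {
	key := newKeyIgnoringError(shortReader{n: 4})
	// Only the first 4 bytes were written; the rest are still zero.
	fmt.Printf("%x\n", key)
}
```

If the crash were switched off, a caller like this would silently use a mostly zero key, which is the catastrophic outcome described above.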

Change https://go.dev/cl/622115 mentions this issue: Revert "crypto/rand: add randcrash=0 GODEBUG"

@FiloSottile Does this change need to be mentioned in Go 1.24 release notes? Thanks.

I think this does need to be in the release notes. Sent https://go.dev/cl/632036. Thanks.

Change https://go.dev/cl/632036 mentions this issue: doc/next: document that crypto/rand.Read never fails

Change https://go.dev/cl/640996 mentions this issue: crypto/mlkem: drop GenerateKey error return
