A Python-based random number generator based on atmospheric noise

WAV-RNG

Note: This project is a work-in-progress. The random number generator in this project should not be used for security or cryptographic purposes.

Introduction

Generating truly random numbers is hard. Because computers are deterministic, they cannot generate truly random numbers on their own. Instead, they generate pseudorandom numbers. Pseudorandom number generators (PRNGs) use algorithms that take a seed, or starting value, and generate sequences of numbers that approximate the statistical properties of sequences of random numbers. Some of these are quite good, and are used for cryptographic purposes, but they cannot be said to be true random number generators (TRNGs, or just RNGs).
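To make the role of the seed concrete, here is a minimal sketch using Python's standard random module (not code from this project). The seed value 12345 is arbitrary and chosen only for illustration. Two generators given the same seed produce exactly the same "random" sequence:

```python
import random

# Two independent PRNG instances seeded with the same (arbitrary) value.
gen_a = random.Random(12345)
gen_b = random.Random(12345)

# Draw eight byte-sized values from each generator.
seq_a = [gen_a.randrange(256) for _ in range(8)]
seq_b = [gen_b.randrange(256) for _ in range(8)]

# The sequences are identical: the output is fully determined by the seed.
print(seq_a == seq_b)  # True
```

Anyone who knows the algorithm and the seed can reproduce the whole sequence, which is why a physical entropy source is attractive.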

While PRNGs generate sequences of numbers that look random but are in fact predetermined by the algorithm and seed, RNGs generate random numbers based on physical phenomena that are expected to be random. Examples of such physical phenomena include atmospheric noise, thermal noise, and radioactive decay. In this project, I use the first of these, atmospheric noise, as the basis of a random number generator. Specifically, this project includes Python code that generates random numbers, in a variety of formats, from .wav files containing recorded atmospheric noise. A driver script, rng.py, allows users to easily interact with the RNG.

Here I show how to use the driver program rng.py to generate random numbers from .wav files of recorded atmospheric noise. For a more in-depth discussion of the project, see the technical details document, where I discuss the intuition and methodology, the RNG construction, and the testing of the RNG.

Using rng.py

(This information is also available in a more condensed form, which you can access by running python3 rng.py -h.)

Running the rng.py script to generate random numbers is very simple. For the most basic usage, you can run python3 rng.py --in <input.wav> for any input .wav file of at least 1124 bytes. This will print out, as a Python bytearray, all the random bytes generated from the .wav file. For a more readable format, you can choose from the --ascii, --binary, --hex, or --digits options to get the data in the specified format. See the examples below:

$ python3 rng.py --in noise.wav --ascii

output: CfqGbj[LMa^{lK[{AAk>Agzc?tpC-[lFh&0zj1KS,4=7_Lb_RlBt1k+6voR(}_5E*K#=

$ python3 rng.py --in noise.wav --digits

output: 60072664111942976342295533921735145761721749824549653603 ....
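As a rough illustration of what the --hex and --binary formats look like, the sketch below renders a few bytes both ways using only built-in Python. The helper names to_hex and to_binary are hypothetical and are not taken from rng.py. How the tool maps bytes to --ascii or --digits output is not described here, so those formats are left out.

```python
def to_hex(data: bytes) -> str:
    """Render each byte as two lowercase hexadecimal characters."""
    return data.hex()


def to_binary(data: bytes) -> str:
    """Render each byte as eight '0'/'1' characters, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


sample = bytes([0xA0, 0x6B, 0x1A])  # three arbitrary example bytes
print(to_hex(sample))     # a06b1a
print(to_binary(sample))  # 101000000110101100011010
```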

To write output to a file, you can run with the --out option, followed by a filename:

$ python3 rng.py --in noise.wav --digits --out random_digits.txt

The same can be done with any format, including raw bytes, which is the default mode of output. In the default mode, --out produces a raw binary file containing the generated data.

Often, a user may have a large .wav file but want only a portion of the random data it can generate. First, to check how much random data a .wav file can generate, run:

$ python3 rng.py --in noise.wav -q

This runs the "query" function, which returns the number of available 64-byte blocks that can be generated, with output: available 64-byte blocks for noise.wav: 1242.

This tells us that our file noise.wav can generate 1242 * 64 = 79,488 bytes, or approximately 80 kB, of random data.
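The arithmetic behind the query can be sketched as follows. This is my guess at the calculation, not the code in rng.py: it assumes the block count is simply the file size minus the header length, divided (rounding down) by the block size, using the default header length of 100 and block size of 1024. That assumption is at least consistent with the 1124-byte minimum file size mentioned earlier. The function name available_blocks is hypothetical.

```python
OUTPUT_BLOCK_BYTES = 64  # each processed block yields 64 bytes of output


def available_blocks(file_size: int, header_len: int = 100, block_size: int = 1024) -> int:
    """Assumed formula: whole blocks that fit after skipping the header."""
    usable = max(file_size - header_len, 0)
    return usable // block_size


# The smallest accepted file (1124 bytes) yields exactly one block.
print(available_blocks(1124))  # 1

# A file that reports 1242 blocks yields 1242 * 64 bytes of output.
print(1242 * OUTPUT_BLOCK_BYTES)  # 79488
```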

Now, suppose we want only a fraction of those raw bytes. We can specify the starting and ending blocks to print with the -s and -e flags. Start is inclusive, while end is exclusive; in mathematical notation, the range is [start, end). If neither is specified, the start position defaults to zero and the end position defaults to the maximum allowed by the file size:

$ python3 rng.py --in noise.wav -s 100 -e 165 --hex

output: a06b1aa4e775c1....
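The half-open range [start, end) behaves like a Python slice. The sketch below uses dummy all-zero blocks as stand-in data to show how many blocks, bytes, and hex characters the -s 100 -e 165 example selects. The function select_blocks is hypothetical, and whether rng.py rejects out-of-range values in this way is an assumption.

```python
from typing import List, Optional


def select_blocks(blocks: List[bytes], start: int = 0, end: Optional[int] = None) -> List[bytes]:
    """Return blocks in the half-open range [start, end)."""
    if end is None:
        end = len(blocks)  # default: run to the last available block
    if not 0 <= start <= end <= len(blocks):
        raise ValueError("need 0 <= start <= end <= number of available blocks")
    return blocks[start:end]


# Stand-in data: 1242 dummy 64-byte blocks, matching the query example.
blocks = [bytes(64) for _ in range(1242)]
chosen = select_blocks(blocks, start=100, end=165)
joined = b"".join(chosen)

print(len(chosen))        # 65 blocks (block 100 up to and including block 164)
print(len(joined))        # 4160 bytes
print(len(joined.hex()))  # 8320 hex characters
```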

Combining with pseudorandom data

rng.py provides the option of combining the random data generated from the .wav file with pseudorandom data from other sources. Currently, the only supported source is the Python secrets module, with associated flag --secrets. The data is combined using the XOR function, which is discussed in more detail in the technical details document. The following will print ten 64-byte blocks (i.e. 640 bytes) of random data from the .wav RNG combined with pseudorandom data from Python's cryptographically secure PRNG:

$ python3 rng.py --in noise.wav -s 0 -e 10 --secrets --hex
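The XOR combination can be sketched in a few lines. This is an illustration rather than the project's actual code: the helper names xor_bytes and combine_with_secrets are hypothetical, and the 640 zero bytes merely stand in for real output from the .wav RNG. The call secrets.token_bytes is the standard-library function for drawing bytes from the operating system's secure source.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combine_with_secrets(wav_random: bytes) -> bytes:
    """XOR the given bytes with an equal-length pad from the secrets module."""
    pad = secrets.token_bytes(len(wav_random))
    return xor_bytes(wav_random, pad)


wav_random = bytes(640)  # placeholder for ten 64-byte blocks from the .wav RNG
combined = combine_with_secrets(wav_random)
print(len(combined))  # 640
```

The appeal of XOR here is that, if the two inputs are independent, the result is at least as unpredictable as the better of the two.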

Seeding a PRNG with entropy from .wav

In the normal operating mode, the output of the RNG is based on entropy from the waveform of atmospheric noise. In particular, 64 random bytes are generated from each 1024-byte block of .wav data, an efficiency of 1/16, or 6.25%. To increase efficiency, the user can instead seed a PRNG by passing the --extend <int> option with a positive integer greater than one. In this mode, instead of generating one 64-byte block for every 1024-byte block of .wav data, the generator produces x 64-byte blocks for every 1024-byte block, where x is the argument given to --extend. Example usage, and comparison with the normal mode:

$ python3 rng.py --in noise.wav -s 0 -e 10 --out output.bin

The above outputs 10 random 64-byte blocks, i.e. 640 bytes, to output.bin.

$ python3 rng.py --in noise.wav -s 0 -e 10 --out output.bin --extend 5

The above outputs 10 x 5 = 50 random 64-byte blocks, i.e. 3,200 bytes, to output.bin.

The PRNG is based on SHA-512, with the initial 64-byte block being the hash of the entropy from the .wav file (initial seed), and subsequent blocks being the hashes of the incremented .wav data (seed plus an increasing counter).
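A minimal sketch of this counter-based construction follows. The exact way rng.py "increments" the .wav data is not spelled out here, so this sketch assumes the block is read as a big-endian integer, the counter is added (wrapping around on overflow), and the result is hashed. The function name extend_block is hypothetical, and the all-zero block is only a stand-in for real .wav data.

```python
import hashlib
from typing import List


def extend_block(wav_block: bytes, extend: int) -> List[bytes]:
    """Derive `extend` 64-byte outputs from one block of .wav data.

    Assumption: output i is SHA-512(seed + i), with the seed taken as a
    big-endian integer of the same byte length as the block.
    """
    if extend < 1:
        raise ValueError("extend must be a positive integer")
    size = len(wav_block)
    modulus = 1 << (8 * size)  # wrap around so the length stays fixed
    seed = int.from_bytes(wav_block, "big")
    outputs = []
    for counter in range(extend):
        value = (seed + counter) % modulus
        outputs.append(hashlib.sha512(value.to_bytes(size, "big")).digest())
    return outputs


block = bytes(1024)  # placeholder for 1024 bytes of .wav data
out = extend_block(block, 5)
print(len(out), len(out[0]))  # 5 64
```

With a counter of zero, the first output is simply the SHA-512 hash of the original block, matching the description of the initial seed block above.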

Running without SHA-512

In the normal mode of operation, the RNG uses SHA-512 as a randomness extractor. Using the --no-sha flag causes only raw bytes of the .wav file, without the SHA-512 post-processing step, to be output. The output of this mode of operation should still be sufficient for most purposes (it passes the NIST and dieharder randomness test suites!), but SHA-512 provides an extra assurance that the output is random.

Header Length

The header length of the .wav file can optionally be changed from the default value of 100 with the --header-len <int> flag. The value must be greater than 100 and must be even. The lower bound exists because the beginning of the file contains a low-entropy header and should be skipped; the value must be even because the generator treats even and odd bytes differently.

Example: $ python3 rng.py --in noise.wav --hex --header-len 200

Block size

The RNG uses a default block size of 1024 bytes. This means that .wav data is processed (and fed to SHA-512, unless the --no-sha flag is used) 1024 bytes at a time. This value was chosen with estimates of the entropy of the .wav data in mind, as discussed in the technical details document. A different value can be specified with the --block-size flag; it must be a positive multiple of 64. Larger block sizes (e.g. 2048, 4096) may give more security, since each SHA-512 input then contains more .wav data. Example:

$ python3 rng.py --in noise.wav --out random.bin --block-size 2048
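To tie the header length, block size, and SHA-512 step together, here is a simplified sketch of the normal mode of operation. It is not the project's implementation: the function name extract_blocks is hypothetical, the sketch ignores the generator's separate treatment of even and odd bytes, and it assumes the default header length of 100 is itself acceptable alongside larger even values. The input is synthetic stand-in data, not a real .wav file.

```python
import hashlib
from typing import List


def extract_blocks(wav_bytes: bytes, header_len: int = 100, block_size: int = 1024) -> List[bytes]:
    """Skip the header, split the rest into blocks, and hash each with SHA-512."""
    if header_len < 100 or header_len % 2 != 0:
        raise ValueError("header_len must be even and at least 100 (assumed rule)")
    if block_size <= 0 or block_size % 64 != 0:
        raise ValueError("block_size must be a positive multiple of 64")
    payload = wav_bytes[header_len:]         # drop the low-entropy header
    n_blocks = len(payload) // block_size    # ignore any trailing partial block
    return [
        hashlib.sha512(payload[i * block_size:(i + 1) * block_size]).digest()
        for i in range(n_blocks)
    ]


data = bytes(range(256)) * 20  # 5120 bytes of synthetic stand-in data
print(len(extract_blocks(data)))                   # 4
print(len(extract_blocks(data, block_size=2048)))  # 2
```

Doubling the block size halves the number of 64-byte outputs, which is the efficiency cost of feeding more .wav data into each hash.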