Theseus

Theseus is an implementation of the SP 800-90B tests.

This project is named after Claude Shannon's mechanical maze-solving mouse.

For general SP 800-90B testing topics, please see Joshua E. Hill's SP 800-90B web page.

Requirements

This was written with a fairly modern Linux system running on a somewhat modern Intel CPU (it uses BMI 2, SSE 4.2 and RdRand instructions). This is intended to be compiled using either a recent version of gcc (tested using gcc version 10.3.0, or clang (tested using clang version 12.0.1). This uses divsufsort, libbz2 and OpenMP, so these libraries (and their associated include files) must be installed and accessible to the compiler.

Folder Structure

The folder structure is as follows:

docs/: Contains documentation/examples for each function.
ex/: Contains data files used in examples.
src/: Contains the codebase.

How to Run

The tools in this package operate on symbols of type statData_t (uint8_t by default) or on uint32_t unless otherwise specified.

One can make all the binaries using:

make

Several Makefiles are provided; these are useful in various contexts which are hopefully reasonably clear from the name.

Overview

Below is a summary of available Theseus functions. Detailed documentation for each function can be found in the docs/ folder and links are provided.

Executables that Implement Portions of SP 800-90B Testing

Function Name	Description
non-iid-main	An implementation of the non-IID SP 800-90B estimators.
restart-transpose	Calculate the transpose of the provided restart data matrix.
restart-sanity	Perform the restart test for the provided restart data.
permtests	Perform the permutation IID tests on the provided data.
chisquare	Perform the chi square IID tests on the supplied data.
lrs-test	Perform the LRS IID test on the supplied data.
Markov	Run some variant of the SP 800-90B 2016 Markov test.

H_submitter Production Utilities

Function Name	Description
ro-model	Produce a min entropy estimate using the selected stochastic model.
linear-interpolate	Takes a set of points (`x_1`, `y_1`), ... (`x_n`, `y_n`) and treats the points as a relation. This tool can be used to infer parameters from the statistical results.

Test Data Production Utilities

Function Name	Description
randomfile	Create a random data file for use in testing.
simulate-osc	Create a random data file for use in testing that simulates a ring oscillator.

Result Interpretation Utilities

Function Name	Description
percentile	Takes a set of human-readable doubles and provides the pth percentile. Percentile is estimated using the recommended NIST method Hyndman and Fan's R6.
mean	Takes a set of human-readable doubles and provides the mean.
failrate	Takes a set of human-readable doubles and provides the proportion that are less than or equal to `<p>` or greater than or equal to `<q>`. This is useful to characterize false positive rates for statistical tests with inclusive failure bounds.

Data Conversion Utilities

Functions for word size and base input conversion:

Function Name	Description
u8-to-u32	Converts provided binary data from type uint8_t to type uint32_t.
u16-to-u32	Converts provided binary data from type uint16_t to type uint32_t.
u64-to-u32	Converts provided binary data from type uint64_t to type uint32_t.
dec-to-u32	Converts provided human-readable decimal values to binary data. (Note this is the opposite of u32-to-ascii.)
dec-to-u64	Converts provided human-readable decimal values to binary data. (Note this is the opposite of u64-to-ascii.)
hex-to-u32	Converts provided human-readable hexidecimal values to binary data.

Functions related to deltas, counters, endianness, and bit width:

Function Name	Description
u32-delta	Extracts deltas and then translates the result to positive values.
u64-jent-to-delta	Converts provided binary data from uint64_t deltas in jent format (JEnt version 3.0.1 and earlier) to uint64_t deltas in nanoseconds format. Also guesses native byte order and swaps if necessary. Jent format expects the upper 32 bits to contain seconds and the lower 32 bits to contain nanoseconds.
u32-counter-raw	Extracts deltas treated as 32-bit unsigned counters (they may roll over).
u64-counter-raw	Extracts deltas treated as 64-bit unsigned counters (they may roll over).
u32-counter-endian	Trys to guess counter endianness and translates to the local platform.
u64-counter-endian	Trys to guess counter endianness and translates to the local platform.
u64-change-endianness	Changes between big and little endian byte-ordering conventions.
u32-counter-bitwidth	Extracts deltas under the assumption that the data is an increasing counter of some inferred bitwidth.
u32-expand-bitwidth	Extracts inferred values under the assumption that the data is a truncation of some sampled value, whose bitwidth is inferred.

Functions for converting to type statData_t for statistical analysis:

Function Name	Description
u8-to-sd	Converts provided binary data from type uint8_t to type statData_t.
u16-to-sdbin	Converts provided binary data from type uint16_t to type statData_t by expanding packed bits.
u32-to-sd	Converts provided binary data from type uint32_t to type statData_t.
blocks-to-sdbin	Extracts bits from a blocksize-sized block one byte at a time, in the specified byte ordering.

Functions for output conversion (for histograms, human readability, categorization, etc.):

Function Name	Description
sd-to-hex	Converts provided binary data to human-readable hexidecimal values.
u32-to-ascii	Converts provided binary data to human-readable decimal values. (Note this is the opposite of dec-to-u32.)
u64-to-ascii	Converts provided binary data to human-readable decimal values. (Note this is the opposite of dec-to-u64.)
u32-to-categorical	Produces categorical summary of provided binary data.

Other Data Utilities

Functions to bin and group data:

Function Name	Description
downsample	Groups data by index into modular classes mod `<rate>` evenly into the `<block size>`.
highbin	Attempts to bin input symbols into `2^<outputBits>` bins with equal numbers of adjacent samples.
u32-downsample	Groups data by index into modular classes mod `<rate>` evenly into the `<block size>`.
u32-manbin	Assign given binary data to one of the n bin numbers (0, ..., n-1).

Functions to select and extract data:

Function Name	Description
extractbits	Takes the given binary data and extracts bits with `<bitmask>`.
selectbits	Identify the bit selections that are likely to contain the most entropy, up to `<outputBits>` bits wide.
u32-bit-select	Selects and returns the value in the given bit position (0 is the LSB, 31 is the MSB).
u128-bit-select	Selects and returns the value in the given bit position (0 is the LSB, 127 is the MSB, little endian is assumed).
u32-selectdata	Attempt to keep the percentages noted in the provided binary data.
u32-selectrange	Extracts all values from the given binary data between a specified `low` and `high` (inclusive).

Functions to translate data:

Function Name	Description
translate-data	Perform an order-preserving map to arrange the input symbols to (0, ..., k-1).
u32-translate-data	Perform an order-preserving map to arrange the input symbols to (0, ..., k-1).

Functions to find and process fixed data:

Function Name	Description
bits-in-use	Determines the number of bits required to represent the given data after removing stuck and superfluous bits.
discard-fixed-bits	Takes provided binary data and returns it with fixed bits discarded. Non-fixed bits are moved to the LSB of the output.
u32-discard-fixed-bits	Takes provided binary data and returns it with fixed bits discarded. Non-fixed bits are moved to the LSB of the output.
u128-discard-fixed-bits	Takes provided binary data and returns it with fixed bits discarded. Non-fixed bits are moved to the LSB of the output.

Functions to merge, sort, and permute data:

Function Name	Description
double-merge	Merges two sorted lists of doubles into a single merged sorted list of doubles.
double-sort	Takes doubles from the file and sorts them.
u32-bit-permute	Permute bits within the given binary data as specified in `<bit specification>`. Bit ordering is specified in the LSB 0 format (i.e., bit 0 is the LSB and bit 31 is the MSB).

Functions with mathematical operations on data:

Function Name	Description
double-minmaxdelta	Takes a set of human-readable doubles and provides the mean.
hweight	Calculates the Hamming weight of `<bitmask>`. As an example, the bit string 11101000 has a Hamming weight of 4.
u16-mcv	Finds the most common value in the given binary data.
u32-anddata	Takes the given binary data and bitwise ANDs each symbol with `<bitmask>`.
u32-gcd	Finds common divisors and removes these factors from the given binary data.
u32-mcv	Finds the most common value in the given binary data.
u32-regress-to-mean	Calculate the mean, force each `k`-block to have this mean, and then round the resulting values.
u32-xor-diff	Produces the running XOR of adjacent values in provided binary data.

More Information

For more information on the estimation methods, see SP 800-90B.

pskrgag/Theseus