Sample programs for article Parsing series of integers with SIMD
Parsers extract integer numbers from strings. A number can be prepended by a sign character. The numbers are separated by arbitrary sequences of separator chars. All other characters are invalid and the parsers detects them and raise exception.
This repository contains:
- scalar reference implementation;
- two variants of SSE parsers; there are also separate variants designed solely of parsing unsigned numbers;
- scalar hybrid that combines ideas from SIMD parsing with scalar conversion procedures.
Requires: C++11 compiler (tested with GCC 7.3) and Python 2.7.
Type make
to build all programs.
Type make run-unittests
to build all unit tests and then run them.
Some tests are time consuming, be patient.
Type make microbenchmarks.rst
to run microbenchmarks.
Type make report-overall.rst
to run performance benchmarks.
Type make spanmaskhistogram.rst
to produce runtime analysis report
for SSE implementation.
There are several programs available in bin
subdirectory.
benchmark
--- test performance of given procedurebenchmark-cpuclocks
--- measure performance of all procedures; display CPU clocksbenchmark-all
--- compare performance of different procedurescompare-singed
andcomapre-unsigned
--- are used to validate if parsers produces the same results as the reference parsercompare-avx512
--- the same as above, but tests only AVX512BW implementationstatistics
--- gather execution statistics from SSE parsers
Apart from these programs, there are several verify_*
executables
that runs various unit tests; they are invoked by make run-tests
.
All the programs generates random data which is then parsed, following command line arguments can be used to control data characteristics:
--size=NUMBER input size (in bytes) --loops=NUMBER how many times a test must be repeated [default: 1] --seed=NUMBER seed for random number generator [default: 0] --num=DISTRIBUTION distribution of lengths of numbers --sep=DISTRIBUTION distribution of lengths of gaps between numbers [default: '1'] --separators=string list of separator characters [default: ",; "] --sign=DISTRIBUTION distribution of sign in front of number [default: '1']
DISTRIBUTION
is a list of weights separated with commas, which defines
distribution of items.
In case of --num
it's the count of decimal digits in a random number.
For instance --num=1,1,1,1
will produce one-, two-, three- or four-digit
numbers with the same probability; --num=0,0,0,1,5,1
will produce four-,
five- or six-digit numbers, but five-digits numbers with probability 5/7.
In case of --sep
it is the distribution of numbers of separator characters
between the generated numbers. The default --sep=1
means there's always
exactly one character; --sep=0,0,1,1,1
would put form 2 to 4 separator chars.
The --sign
defines distribution of set: no-character-sign, '+' and '-'.
Thus the default --sign=1
forces just unsigned numbers; --sign=0,0,1
will force all numbers negative.
- Complete AVX512 implementation to handle scalar fallback.
BSD