arcetri/sts

Enhancement: write a TL;DR file on how to run tests

smclinden opened this issue · 7 comments

So I have a file of 108196287 lines of 8-byte strings converted to ASCII 0/1.

How do I run the tests on this file?

I was never able to get sts-2.1.2 to run without alloc errors.

lcn2 commented

Hello smclinden,

To start, we would recommend that you convert those ASCII "0" and "1" bytes into single binary bits, and strip out all whitespace (spaces, tabs, newlines, etc.) so that you are processing ONLY a raw binary file. That will require less memory to process as well.
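For concreteness, here is a minimal Python sketch (not part of sts) of one way to do that packing. The file names ascii_codes.txt and binary.file are placeholders, and the MSB-first bit order within each byte is an assumption, so check what your build of sts expects before relying on it.

```python
# Sketch: pack ASCII '0'/'1' characters into raw binary bytes, 8 bits per
# byte, most-significant bit first, skipping whitespace and anything else.
# "ascii_codes.txt" and "binary.file" are placeholder names.

def ascii_bits_to_binary(src_path, dst_path, chunk_chars=8 * 1024 * 1024):
    carry = ""                                    # bits left over from the previous chunk
    with open(src_path, "r") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_chars)         # read in chunks to keep memory bounded
            if not chunk:
                break
            bits = carry + "".join(c for c in chunk if c in "01")
            usable = len(bits) - (len(bits) % 8)  # only whole bytes
            if usable:
                dst.write(int(bits[:usable], 2).to_bytes(usable // 8, "big"))
            carry = bits[usable:]
    if carry:
        print("note: %d trailing bits did not fill a byte and were dropped" % len(carry))

if __name__ == "__main__":
    ascii_bits_to_binary("ascii_codes.txt", "binary.file")
```

With the whitespace stripped, every 8 ASCII characters collapse into 1 binary byte, which is where the memory savings mentioned above come from.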

smclinden commented

That's easy enough. But what would be the arguments to sts?

I could shorten it. The issue is that the data is a set of 8-byte codes that are supposed to be random but I have reason to believe that the PRNG is flawed (it appears that someone has figured out how to generate valid additional codes based upon what is already known).

I can break the data up but I would need to know the appropriate strategy for doing so.

I have a similar file with 1000 datapoints of 32 bits each. Should I create a new line with the 32-bit value for each data point? What should be the value of datastream in './assess '?

lcn2 commented

> I have a similar file with 1000 datapoints of 32 bits each. Should I create a new line with the 32-bit value for each data point? What should be the value of datastream in './assess '?

The problem is that you only have 32000 bits of data, which is a very small sample on which to make a meaningful measurement. If I were trying to look at the quality of the data, I would try for at least 1 000 000 such 32-bit data points.

However if you insist on testing such a small amount of data:

```sh
# generate 32000 bits from a lower quality source /dev/random
# bs=4 is 4 bytes or 32 bits
dd if=/dev/random of=binary.file bs=4 count=1000

/usr/local/bin/sts -S 1000 -i 32 binary.file
```

assuming that binary.file contains 32000 raw binary bits as the example shows.
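As a quick sanity check before running sts (a sketch, not an sts feature), you can confirm that binary.file really holds 1000 × 32 = 32000 bits, i.e. 4000 bytes:

```python
import os

# 1000 data points of 32 bits each = 32000 bits = 4000 bytes of raw binary
expected_bytes = 1000 * 32 // 8
actual_bytes = os.path.getsize("binary.file")
print("binary.file is %d bytes (expected %d)" % (actual_bytes, expected_bytes))
```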

You will notice that a number of sub-tests are disabled. For example, in the above run, the following warnings were produced:

```
Warning: Rank_init: disabling test Rank[6]: requires number of matrices(matrix_count): 0 >= 38
Warning: OverlappingTemplateMatchings_init: disabling test OverlappingTemplate[9]: requires bitcount(n): 1000 >= 1000000
Warning: Universal_init: disabling test Universal[10]: requires bitcount(n): 1000 >= 387840 for L >= 6
Warning: ApproximateEntropy_init: disabling test ApproximateEntropy[11]: requires block length(m): 10 >= 4
Warning: RandomExcursions_init: disabling test RandomExcursions[12]: requires bitcount(n): 1000 >= 1000000
Warning: RandomExcursionsVariant_init: disabling test RandomExcursionsVariant[13]: requires bitcount(n): 1000 >= 1000000
Warning: Serial_init: disabling test Serial[14]: requires block length(m): 16 >= 7
Warning: LinearComplexity_init: disabling test LinearComplexity[15]: requires bitcount(n): 1000 >= 1000000
```

So about half of the tests cannot even start evaluating the data due to the small sample size.
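To put numbers on that (a worked example based only on the thresholds quoted in the warnings above), the 1 000 000-bit and 387 840-bit requirements translate into minimum counts of 32-bit data points as follows:

```python
# Minimum number of 32-bit data points needed to clear the bit-count
# thresholds quoted in the warnings above (other tests have their own limits).
for name, min_bits in [
    ("OverlappingTemplate / RandomExcursions / LinearComplexity", 1000000),
    ("Universal (L >= 6)", 387840),
]:
    print("%s: %d bits -> at least %d 32-bit points" % (name, min_bits, min_bits // 32))
```

That works out to 31250 and 12120 points respectively, so the earlier suggestion of at least 1 000 000 data points clears every threshold with a wide margin.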

If you had 1 000 000 32-bit data points as in:

```sh
# generate 32000000 bits from a lower quality source /dev/random
# bs=4 is 4 bytes or 32 bits
dd if=/dev/random of=binary.file bs=4 count=1000000

/usr/local/bin/sts -S 1000000 -i 32 binary.file
```

then the result.txt file would be more useful and statistically meaningful.
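For comparison, assuming each of the 108196287 lines in the original file encodes one 8-byte (64-bit) code (my reading of the description above, not something the thread confirms), packing it as recommended earlier yields far more raw data than the small example above:

```python
# Assumption: one 8-byte (64-bit) code per line in the original file.
lines = 108196287
bits_per_code = 64
total_bits = lines * bits_per_code
print("total bits : %d" % total_bits)               # 6924562368
print("packed size: %d bytes" % (total_bits // 8))  # 865570296, roughly 826 MiB
```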

We hope this helps @nivi1501

lcn2 commented

We do plan to write a TL;DR file on how to run tests. Sorry, we have been busy on a number of other projects.