Enhancement: write a TL;DR file on how to run tests
smclinden opened this issue · 7 comments
So I have a file of 108196287 lines of 8-byte strings converted to ASCII 0/1.
How do I run the tests on this file?
I was never able to get sts-2.1.2 to run without alloc errors.
Hello smclinden,
To start, we would recommend that you convert those ASCII "0" and "1" bytes into single binary bits, and strip out all whitespace (spaces, tabs, newlines, etc.) so that you are processing ONLY a raw binary file. That will require less memory to process as well.
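As a minimal sketch of that conversion (assuming the ASCII data is in a hypothetical file named ascii_bits.txt, one code per line written as '0'/'1' characters), a perl one-liner can strip the whitespace and pack every 8 ASCII digits into one raw byte:

```sh
# hypothetical input: ascii_bits.txt, each line one code written as ASCII '0'/'1' characters
# remove any whitespace in the line, then pack every 8 ASCII bits into one raw byte
perl -ne 's/\s+//g; print pack("B*", $_)' ascii_bits.txt > binary.file
```

Each 64-character line (one 8-byte code) then becomes 8 raw bytes, so binary.file should come out roughly one eighth the size of the ASCII input.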
That's easy enough. But what would be the arguments to sts?
I could shorten it. The issue is that the data is a set of 8-byte codes that are supposed to be random but I have reason to believe that the PRNG is flawed (it appears that someone has figured out how to generate valid additional codes based upon what is already known).
I can break the data up but I would need to know the appropriate strategy for doing so.
I have a similar file with 1000 data points of 32 bits each. Should I create a new line with the 32-bit value for each data point? What should be the value of datastream in './assess <datastream>'?
The problem is that you only have 32000 bits of data (1000 data points × 32 bits), which is a very small sample on which to make a meaningful measurement. If I were trying to look at the quality of the data, I would try for at least 1 000 000 such 32-bit data points.
However if you insist on testing such a small amount of data:
```sh
# generate 32000 bits from a lower quality source /dev/random
# bs=4 is 4 bytes or 32 bits
dd if=/dev/random of=binary.file bs=4 count=1000
/usr/local/bin/sts -S 1000 -i 32 binary.file
```

assuming that binary.file contains 32000 raw binary bits as the example shows.
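For a quick sanity check, one can count the bytes in the file; 1000 32-bit values should come to exactly 4000 bytes:

```sh
# 1000 data points x 32 bits = 32000 bits = 4000 bytes
wc -c < binary.file    # should print 4000
```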
You will notice that a number of sub-tests are disabled. For example, in the above run, the following warnings were produced:
```
Warning: Rank_init: disabling test Rank[6]: requires number of matrices(matrix_count): 0 >= 38
Warning: OverlappingTemplateMatchings_init: disabling test OverlappingTemplate[9]: requires bitcount(n): 1000 >= 1000000
Warning: Universal_init: disabling test Universal[10]: requires bitcount(n): 1000 >= 387840 for L >= 6
Warning: ApproximateEntropy_init: disabling test ApproximateEntropy[11]: requires block length(m): 10 >= 4
Warning: RandomExcursions_init: disabling test RandomExcursions[12]: requires bitcount(n): 1000 >= 1000000
Warning: RandomExcursionsVariant_init: disabling test RandomExcursionsVariant[13]: requires bitcount(n): 1000 >= 1000000
Warning: Serial_init: disabling test Serial[14]: requires block length(m): 16 >= 7
Warning: LinearComplexity_init: disabling test LinearComplexity[15]: requires bitcount(n): 1000 >= 1000000
```
So about half of the test models cannot even get started evaluating the data due to the small sample size.
If you had 1 000 000 32-bit data points as in:
```sh
# generate 32000000 bits from a lower quality source /dev/random
# bs=4 is 4 bytes or 32 bits
dd if=/dev/random of=binary.file bs=4 count=1000000
/usr/local/bin/sts -S 1000000 -i 32 binary.file
```

then the result.txt file would be more useful and statistically meaningful.
We hope this helps, @nivi1501.
We do plan to write a TL;DR file on how to run tests. Sorry, we have been busy on a number of other projects.