terrillmoore/NIST-Statistical-Test-Suite

I found the "Out of Bound" Error in discreteFourierTransform.c

TyeolRik opened this issue · 13 comments

And also I found the "Out of Bound" Error in for loop which is in line 41~42 ./sts/src/discreteFourierTransform.c

for ( i=0; i < n/2; i++ )
    m[i+1] = sqrt(pow(X[2*i+1],2)+pow(X[2*i+2],2));

This line will cause unexpected behavior.
Let's think n = 10, then maximum i will be 4. because i < 10/2
2 * i + 2 = 2 * 4 + 2 = 10 then, but, the range of X is [0, 9] So, this is out of bound.


As far as I know, this is out of bound. And also in C, X[n] would be un-initialized value. (It could be dump variable)
Is there any reason that I don't know C grammer?

Originally posted by @TyeolRik in #9 (comment)

This looks like an error in translating from Fortran to C. The array X is treated as origin 1, but is really origin zero.

This whole sequence looks odd:

m[0] = sqrt(X[0]*X[0]); /* COMPUTE MAGNITUDE */
for ( i=0; i<n/2; i++ )
m[i+1] = sqrt(pow(X[2*i+1],2)+pow(X[2*i+2],2));

Generally, DFFTs return an array of complex numbers. In this case, the even indices of X are probably the real, and the odd indices probably the imaginary, components of the spectrum. The OGG DFFT in this case is quite old (1996) and translated from Fortran.

But although line 39 says "compute magnitude", it's really just an expensive way to set m[0] to abs(X[0]). The comment suggests that this is translated from code that used Fortran complex as the type of X; that's the only way this makes sense at all. But then the walking through X[] at lines 41/42 makes little sense, either, because it implies that X[0] is already a real, and then X[1], X[2] are the real/imaginary values of the first AC coefficient. (X[0] is always the DC component. Possibly the DC component is known always to be real, and so they compressed things by omitting it. But... there's no indication of that in __ogg_fdrfftf() and related code. Ominously, this repo appears to be the only repo containing that function (or that text) that google finds.

Don't know what to say. It might be better to replace this with a modern, documented dfft function, and then fix the code to do the appropriate.

This looks like an error in translating from Fortran to C. The array X is treated as origin 1, but is really origin zero.

This whole sequence looks odd:

m[0] = sqrt(X[0]*X[0]); /* COMPUTE MAGNITUDE */
for ( i=0; i<n/2; i++ )
m[i+1] = sqrt(pow(X[2*i+1],2)+pow(X[2*i+2],2));

Generally, DFFTs return an array of complex numbers. In this case, the even indices of X are probably the real, and the odd indices probably the imaginary, components of the spectrum. The OGG DFFT in this case is quite old (1996) and translated from Fortran.

But although line 39 says "compute magnitude", it's really just an expensive way to set m[0] to X[0]. The comment suggests that this is translated from code that used Fortran complex as the type of X; that's the only way this makes sense at all. But then the walking through X[] at lines 41/42 makes little sense, either, because it implies that X[0] is already a real, and then X[1], X[2] are the real/imaginary values of the first AC coefficient. (X[0] is always the DC component. Possibly the DC component is known always to be real, and so they compressed things by omitting it. But... there's no indication of that in __ogg_fdrfftf() and related code. Ominously, this repo appears to be the only repo containing that function (or that text) that google finds.

Don't know what to say. It might be better to replace this with a modern, documented dfft function, and then fix the code to do the appropriate.

Thank you for your kind reply!! :)

I thought you wrote the whole code and the only one who I can ask about document error. but It was wrong thought.
Then, Would you tell me where I can ask about this document? (I seem to find some errors in Examples I think)

Would you tell me where I can ask about this document? (I seem to find some errors in Examples I think)

Actually, I'm not sure. I grabbed this during the 2019 govt shutdown because I thought someone should. But I've not been able to track down the original authors at NIST.

I did a calculation with excel and the data from https://www.easycalculation.com/engineering/mechanical/discrete-fourier-transform.php.

The code at line 42 does indeed set m[0..50] to values, and m[50] is constructed from data off the end of X[].

But subsequent code only references m[0..49] to calculate P_1.

So although they do seem to index off the array, the result of that computation is discarded in the following lines:

for ( i=0; i<n/2; i++ )
if ( m[i] < upperBound )
count++;

X[n/2 + 1] is not used.

Thank you for helping me. :)

I think all is up to NIST. I will comment later after contact with NIST

Amazing - just downloaded the Nist code and got it working myself. Visual Studio highlights quite a few errors, uninitialized variables and use of obsolete/unsafe methods. I get igamc: UNDERFLOW errors when testing the output of System.Random. The output of UnityEngine.Random is fine though.

I get igamc: UNDERFLOW errors

There is 15 tests in NIST SP800-22. I am not sure but there is some mistakes in official documents. (I don't know official source code is right or wrong.)

If you don't mind, please tell me what were input and test name :)
I can double check if you are ok. (There could be (statistical) input error like block size or something else.)

This is what I used to test with.
Ran all tests with 5,000,000

                using (var f = File.Open("systemRandom.dat", FileMode.Create))
                {
                    for (int i = 0; i < 10000000; i++)
                    {
                        // create a single int in range 0..255
                        var r = new System.Random();
                        var b = (byte)r.Next(0, 256);  // min-inclusive, max-exclusive

                        f.Write(new byte[] { b }, 0, 1);
                    }
                }

If you don't mind please upload bit stream.. I can read C# (a little bit) but, it is hard to compile in my environment now :(

And also, System.Random's result would be changed everytime when you call. So, we need "fixed numbers" to check what is right or wrong

systemRandom.zip

My code was buggy. My data wasn't random at all, but at least it causes the error.

systemRandom.zip

Sorry for late response :( I was busy.. :(

I read your systemRandom.dat which has 80,000,000 bits (10,000,000 bytes).
I am not sure but, the reason why you get igamc: UNDERFLOW is BlockFrequency test.

According to 2.2 Frequency Test within a Block (Page 26)
There is process that compute X^2.
And using that number, X^2, P-value is computed by igamc(N / 2, X^2 / 2)
So, you will get UNDERFLOW result because X^2 is too big, 9.9726900625e+06

If you use default configuration, BlockFrequency 's M=128, the block size, would be used.
but, then N = 625000.

According to 2.2.7 Input Size Recommendation

It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥100). Note that n ≥MN. The block size M should be selected such that M ≥ 20, M > .01n and N < 100.

So, N is too big to test. You should calculate more precisely.

@TyeolRik thanks for your tracking down the UNDERFLOW result.

For reference, the calculations for this in the error case:

  • n = 80000000
  • M =128
  • N = n/M

For M ≥ 20, M > .01n and N < 100, we want to set M = n/100. So:

  • n = 80,000,000
  • M = n / 100 = 800,000
  • N = n/M = 100

However, we need 10 sequences (by default), so we have to divide n by 10.

  • n = 8,000,000
  • M = n/100 = 80,000
  • N =n/M = 100

Then run assess 8000000, and set the block length to 80000.

You still get lots of IGAMC underflows.

@PhilPJL, I looked at SystemRandom.zip. It contains binary data (ok), but it's extremely repetitive:

$ xxd systemRandom.dat | head
00000000: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000010: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000020: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000030: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000040: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000050: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000060: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000070: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000080: 0808 0808 0808 0808 0808 0808 0808 0808  ................
00000090: 0808 0808 0808 0808 0808 0808 0808 0808  ................
$

Please see my comments on #13. You need to generate a string of random coin-flips from your random number generator. Either encode as text, or pack them (8 coin-flips per byte) in binary data.

I think this discussion is answered. Closing. Reopen if needed.