JuliaMath/Sobol.jl

Skipping on powers of 2

Timeroot opened this issue · 5 comments

I believe your "skip" function should be altered to actually skip one less than it currently does. As a simple benchmark, if I take a vector of floats and convert each to boolean by testing with >=0.5, I would like it to be the case that a set of 2^n samples covers each binary vector exactly once. This works if I skip 2^n - 1 times:

s = SobolSeq(4); for u=1 : 15;next!(s);end

sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:16)
        ))

returns all numbers from 0 to 15 exactly once. If I instead run

s = SobolSeq(4);for u=1:16;next!(s);end

sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:16)
        ))

then I get a repeat of 12, and no instance of 8. But skipping the 2^n - 1 seems to produce correct results for all cases I tried.

The advice of "skip the largest power of 2 below your desired one" seems quite bad, actually. Suppose I only want to produce 15 values of 4-bits -- I would hope to get no repeats, and one missing; maybe something less perfect than that. But it's actually quite terrible!

julia> s = SobolSeq(4);skip(s,15)

julia> sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:15)
        ))
15-element Array{Int64,1}:
  4
  5
  5
  6
  6
  7
  7
  8
  8
  9
  9
 10
 10
 11
 11

That is to say, every vector produced has (bit 2 XOR bit 3) true! That should only be true in half of them! This is the result of only skipping 8. If you skip 15, as I suggest, then you get better results. (And, again: this seems true at various higher numbers as well.)

Sounds good to me.

I double-checked the original papers, just in case we had mis-copied the rule. Ackworth (1998) writes:
image
Joe & Kuo (2003) write:
image

Joe & Kuo have some later (2008) notes which say:
image

I'm suspect that there was better advice at some point along the line that got garbled/oversimplified?

Here's another example, generating n=8 points in 2d:
image
Skipping 7 is clearly superior in terms of symmetry to skipping none, skipping 8 (current behavior), or skipping 4 (the largest power of two less than n, which is what the references literally suggest.