mraggi/discreture

Combinations avoiding patterns

Opened this issue · 3 comments

Thank you for the great library.

Given your experiment, is there (or is it possible to implement) any algorithm or container interface that enumerates all combinations in a pseudo-random order (or at least avoiding patterns) without resampling?

Of course, apart from the impracticable alternative of keeping all possible combinations in memory and shuffling them.

This can be very useful for hypothesis testing and generate-and-test algorithms.

For instance, if we had these combinations:

$c_1$ $c_2$ $c_3$
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2

I'm wondering if there would be a container interface that would return these combinations in random order (or at least an order that avoids patterns) but still being a container interface (without keeping all combinations in memory and shuffling a list of indexes).

In statistics, we could use that for randomized experiment control. In generate-and-test heuristics, this would be a systematic way of enumerating the search space without getting stuck in a region of the objective function.

Thank you.

Well, I don't know of a fast method, but combinations right now supports indexing. That is, you can ask it for the i-th combination, so if you simply shuffle the integers {0,1,...,binomial(n,k)-1} and then do

auto X = combinations(n,k);
for (size_t i = 0; i < X.size(); ++i)
{
    auto combination = X[S[i]];
}

where S = shuffled indices.

It won't be fast (only about a million combinations per second) but maybe it doesn't matter.

Thanks. I just realized what I meant was "non-decreasing sequences"/"combinations with repetition" but I don't think it makes much difference in this context.

If there were a way to access these sequences/combinations in random order (or pseudo-random, or at least avoiding patterns) without the shuffled indexes, this could be useful in many environments.

Whenever the number of sequences grows exponentially on the number of items (factorial hypothesis tests and generate-and-test algorithms), it's not viable to keep a list of random indexes and, because of that, people usually recur to algorithms that resample, which is very inefficient and it's impossible to know if the algorithm been through all sequences/combinations.

If you know of any algorithm we can put in a container interface to do that, please let me know. Maybe I can help implement this container.

Hello @alandefreitas,

As mentioned in the README, combinations with repetitions isn't currently available. Until then, your problem can easily be attacked by doing the following:

  1. Get the total number of combinations w/ repetition (assuming from a set of size n chosen k at a time). This is given by the binomial coefficient (n + k - 1, k).
  2. Next, you can generate a random sample S of integers using the number from point 1 as the upper bound.
  3. Finally, you can generate the jth combination with repetition for all j in S.

You can alter the method below for the case with repetition:

static inline void construct_combination(combination& data, size_type m)
{
IntType k = data.size();
// this is the biggest for which binomial is still well defined.
// Hopefully it's enough for most use cases.
size_type upper = 68;
for (IntType r = k; r > 1; --r)
{
IntType t;
big_integer_interval NR(r, upper);
t = NR.partition_point(
[m, r](auto x) { return binomial<size_type>(x, r) <= m; }) -
1;
data[r - 1] = t;
upper = t;
m -= binomial<size_type>(t, r);
}
if (k > 0)
data[0] = m;
}

@mraggi, please correct me if I am wrong.

Hope that helps!