an issue using a Quasi-Monte Carlo sampler
Opened this issue · 5 comments
Hi
An issue using a Quasi-Monte Carlo sampler with discrete uniform (values from ChaosPy distributions)
For example, consider the following scenario:
params = {
"X": {"type": "integer", "min": 0, "max": 20, "default": 10},
"Y": {"type": "integer", "min": 0, "max": 30, "default": 20},
"out_file": {"type": "string", "default": "output.csv"}
}
vary = {
"X": cp.DiscreteUniform(1, 15),
"Y": cp.DiscreteUniform(5, 25)
}
will give negative samples (needless to say it works very well with only Uniform distribution), and the reason could be found in the file easyvvuq/sampling/qmc.py where
dist_U = []
for i in range(self.n_params):
dist_U.append(cp.Uniform())
dist_U = cp.J(*dist_U)
only a Uniform distribution case is implemented, so either you need to add a DiscreteUniform case as well or add some assertion there!
thanks.
Hi @kbronik2017
Where do you get negative values? in sampler._samples
?
Yes it is, if you add for example "print" statement after the following line:
self._samples = self.distribution.inv(dist_U.fwd(nodes.transpose()))
print("self._samples")
it will print the negative samples
print("samples:", self._samples)
If I execute this:
import chaospy as cp
import easyvvuq as uq
params = {
"X": {"type": "integer", "min": 0, "max": 20, "default": 10},
"Y": {"type": "integer", "min": 0, "max": 30, "default": 20},
"out_file": {"type": "string", "default": "output.csv"}
}
vary = {
"X": cp.DiscreteUniform(1, 15),
"Y": cp.DiscreteUniform(5, 25)
}
sampler = uq.sampling.QMCSampler(vary, 10)
print(sampler._samples)
I obtain:
[[ 3.79589844 8.27832031 3.79589844 8.27832031 11.29589844 0.77832031
11.29589844 0.77832031 15.04589844 12.02832031 15.04589844 12.02832031
7.54589844 4.52832031 7.54589844 4.52832031 5.67089844 2.65332031
5.67089844 2.65332031 13.17089844 10.15332031 13.17089844 10.15332031
9.42089844 6.40332031 9.42089844 6.40332031 1.92089844 13.90332031
1.92089844 13.90332031 1.45214844 1.24707031 1.45214844 1.24707031
8.95214844 8.74707031 8.95214844 8.74707031]
[ 6.53027344 6.53027344 18.71191406 18.71191406 17.03027344 17.03027344
8.21191406 8.21191406 11.78027344 11.78027344 23.96191406 23.96191406
22.28027344 22.28027344 13.46191406 13.46191406 14.40527344 14.40527344
10.83691406 10.83691406 24.90527344 24.90527344 21.33691406 21.33691406
9.15527344 9.15527344 5.58691406 5.58691406 19.65527344 19.65527344
16.08691406 16.08691406 8.49902344 8.49902344 14.11816406 14.11816406
18.99902344 18.99902344 24.61816406 24.61816406]]
So no negative values. What is your version of ChaosPy?
However, the other obvious problem is that the resulting samples are not discrete. If I check the chaospy documentation, it seems that the DiscreteUniform variables are still treated as continuous under the hood:
distribution = chaospy.DiscreteUniform(2, 4)
qloc = numpy.linspace(0, 1, 9)
distribution.inv(qloc).round(2)
array([1.5 , 1.88, 2.25, 2.62, 3. , 3.38, 3.75, 4.12, 4.5 ])
So the inverse transform (which is also used in the QMCSampler) returns non-integer values. The other problem with the QMCSampler
is that it uses an external subroutine from SALib
to generate the (Saltelli) sampling plan. This external subroutine only asks for the bounds, not the type of input, e.g. discrete or continuous.
We also have a Monte Carlo sampler (MCSampler
), which has a built-in Saltelli sampler. If I run:
import chaospy as cp
import easyvvuq as uq
params = {
"X": {"type": "integer", "min": 0, "max": 20, "default": 10},
"Y": {"type": "integer", "min": 0, "max": 30, "default": 20},
"out_file": {"type": "string", "default": "output.csv"}
}
vary = {
"X": cp.DiscreteUniform(1, 15),
"Y": cp.DiscreteUniform(5, 25)
}
sampler = uq.sampling.MCSampler(vary, 10)
print(sampler.xi_mc)
I get
[[ 7. 17.]
[11. 17.]
[ 7. 13.]
[11. 13.]
[15. 7.]
[ 7. 7.]
[15. 21.]
[ 7. 21.]
[ 3. 9.]
[14. 9.]
[ 3. 24.]
[14. 24.]
[ 6. 21.]
[ 4. 21.]
[ 6. 11.]
[ 4. 11.]
[ 2. 20.]
[13. 20.]
[ 2. 25.]
[13. 25.]
[13. 22.]
[ 1. 22.]
[13. 9.]
[ 1. 9.]
[ 5. 21.]
[12. 21.]
[ 5. 13.]
[12. 13.]
[ 4. 13.]
[ 9. 13.]
[ 4. 8.]
[ 9. 8.]
[11. 14.]
[ 5. 14.]
[11. 8.]
[ 5. 8.]
[ 5. 18.]
[ 3. 18.]
[ 5. 19.]
[ 3. 19.]]
So I would stick with the MCsampler for now. I even think we could delete the QMCSampler
alltogether. The only difference between MCSampler
and QMCSampler
is that the latter uses some space-filling design from SALib (maybe a Sobol sequence or a Latin Hypercube). However, this option is also available in ChaosPy, it's just a flag that we need to toggle. I'll raise this in a separate issue.
Hi,
Thank you (and non-integer values, yes, I forgot to mention it as well, thank you for pointing that out), actually, the settings were exactly as the follows (which did results in negative samples, version of ChaosPy==4.3.2)
params = {
"Rayleigh": {"type": "integer", "min": 0, "max": 2000, "default": 1},
"Prandtl": {"type": "integer", "min": 0, "max": 8, "default": 7},
"DiffusionCoefficient": {"type": "integer", "min": 0, "max": 2, "default": 1},
"Temperature": {"type": "integer", "min": 0, "max": 81, "default": 1},
"out_file": {"type": "string", "default": "output.csv"}
}
vary = {
"Rayleigh": cp.DiscreteUniform(0, 1500),
"Prandtl": cp.DiscreteUniform(1, 8),
"DiffusionCoefficient": cp.DiscreteUniform(0, 2),
"Temperature": cp.DiscreteUniform(1, 80)
}
But it does not matter now, as you suggested, using the Monte Carlo sampler instead of QMC would be the best choice.
Best
Kevin