Checking CPD with a tolerance
AlexandreDubray opened this issue · 4 comments
Hello,
For some data sets coming from the bnlearn repository, building the models yields warnings that some CPDs do not sum to 1.
It was said in #13 that some data sets contain inconsistencies, but that is not always the case. For example, the hailfinder data set contains this CPD:
probability ( TempDis | Scenario ) {
(A) 0.13, 0.15, 0.10, 0.62;
(B) 0.15, 0.15, 0.25, 0.45;
(C) 0.12, 0.10, 0.35, 0.43;
(D) 0.10, 0.15, 0.40, 0.35;
(E) 0.04, 0.04, 0.82, 0.10;
(F) 0.05, 0.12, 0.75, 0.08;
(G) 0.03, 0.03, 0.84, 0.10;
(H) 0.05, 0.40, 0.50, 0.05;
(I) 0.80, 0.19, 0.00, 0.01;
(J) 0.10, 0.05, 0.40, 0.45;
(K) 0.2, 0.3, 0.3, 0.2;
}
which is perfectly fine but fails to be built correctly. In particular, the fifth row is flagged as not summing to one because, in my Python shell, I get
>>> 0.04 + 0.04 + 0.82 + 0.1
0.9999999999999999
>>>
Although the file is perfectly fine, warnings are emitted. I think the comparison should allow a small deviation from 1 in order to accommodate such float representation problems.
True, floating-point errors are not handled at the moment.
Do you have a suggestion for a fix? I can use Decimal, but I am not a fan of it.
import numpy as np
from decimal import Decimal

nums = [0.04, 0.04, 0.82, 0.1]
# Decimal(str(x)) sees the short literal (e.g. '0.04') instead of the binary
# expansion of the float, so the sum comes out exact.
float(np.sum(list(map(lambda x: Decimal(str(x)), nums))))
# 1.0
I guess the easiest way would be to check something along the lines of abs(1 - sum(nums)) < 0.00001 (the threshold is just an example); see the sketch below. If this is only meant to check that the CPDs are correct, then it should be enough.
If there are precision problems during queries on the networks (if they are very large), then maybe higher-precision floats should be used (although for most use cases plain floats should be fine).
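For illustration, a minimal sketch of such a tolerance-based check (the function name and the abs_tol value are just examples, not actual bnlearn code):

import math

def cpd_row_ok(row, tol=1e-8):
    # Accept the row if it sums to 1 within a small absolute tolerance.
    return math.isclose(sum(row), 1.0, abs_tol=tol)

print(cpd_row_ok([0.04, 0.04, 0.82, 0.1]))  # True, despite the float rounding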
I changed the type to Decimal (I think this is the cleanest fix) before checking whether it sums up to exactly one.
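Roughly, the check now works like this sketch (the function and variable names are only illustrative, not the exact bnlearn internals):

from decimal import Decimal

def sums_to_one(row):
    # Converting each probability through str() gives Decimal the short
    # literal rather than the float's binary expansion, so the sum is exact.
    return sum(Decimal(str(p)) for p in row) == 1

print(sums_to_one([0.04, 0.04, 0.82, 0.1]))  # True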
Can you check whether this solves your issue? Update to the latest version (>= 0.7.7) with:
pip install -U bnlearn
I am closing this issue. Please re-open if required.