Learning network structure and probability distribution for prediction output.
NazimR opened this issue
Hi,
I have two questions:
- How can I learn the network structure without discretization? The tutorials use discretized data for structure learning. (Sorry if I missed a tutorial on structure learning for mixed data without discretization.)
- How can I get the prediction output in terms of probabilities?
For example, the target classes of the IRIS data are IrisSetosa, IrisVersicolor and IrisVirginica.
The output of "pred = bn.predict(iris)" is the target class itself: IrisSetosa, IrisVersicolor or IrisVirginica. How can I get probabilities as the output instead, like:
IrisSetosa = 0.2
IrisVersicolor = 0.2
IrisVirginica = 0.6
I really appreciate your help. Thank you :)
Hello NazimR,
Q1:
Note that you cannot learn structure on data with continuous columns without discretization, except when using the BIC, MI or AIC scoring functions. So K2 requires discretization, while for the other scoring functions it is optional.
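For intuition on why BIC can score continuous columns directly while K2 cannot: K2 is a Bayesian-Dirichlet score defined over category counts, whereas BIC only needs a log-likelihood plus a complexity penalty, and a linear Gaussian model supplies that likelihood for continuous data. The sketch below is a generic illustration, not BAMT's implementation; the function names are hypothetical, and it uses the "higher is better" convention BIC = log-likelihood - (k/2)·log(n), where k is the number of free parameters.

```python
import math


def gaussian_loglik(residuals):
    """Maximized log-likelihood of residuals under a zero-mean Gaussian (MLE variance)."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def bic_no_parent(y):
    """BIC of y modelled as N(mean, var): 2 free parameters."""
    n = len(y)
    mean = sum(y) / n
    ll = gaussian_loglik([v - mean for v in y])
    return ll - 0.5 * 2 * math.log(n)


def bic_one_parent(y, x):
    """BIC of y modelled as N(a + b*x, var), fitted by least squares: 3 free parameters."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ll = gaussian_loglik([yi - (a + b * xi) for xi, yi in zip(x, y)])
    return ll - 0.5 * 3 * math.log(n)


# Toy continuous data: y depends linearly on x, plus a small alternating perturbation.
x = [float(i) for i in range(1, 21)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(20)]
y = [2.0 * xi + e for xi, e in zip(x, noise)]

# A structure search would keep the edge x -> y only if it raises the score.
print(f"BIC(y)     = {bic_no_parent(y):.2f}")
print(f"BIC(y | x) = {bic_one_parent(y, x):.2f}")
```

Because the penalty grows with the number of parameters, an edge to an unrelated parent (for example, scoring `noise` given `x`) lowers the score, so the search does not add it.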
If you only want to get the info dict, here is how.
Preprocessor executes the objects passed into it (as is done in scikit-learn pipelines), so if you want to learn a network on data you prepared yourself, you can do something like this:
empty_p = pp.Preprocessor([])  # empty pipeline: no encoders or discretizers are applied
empty_p_data, est_e = empty_p.apply(hack)  # hack: any DataFrame you have prepared yourself
We highly recommend this approach, but you can also do the entire preprocessing pipeline yourself:
- Preprocess your data: drop NaNs and make all columns discrete;
- Map the columns to node types and signs:
Map scheme: {"types": Dict[node_name, str], "signs": Dict[node_name, str]}
Possible types: "disc", "cont", "disc_num" (the third category is for discrete columns whose categories are numerical).
Possible signs: "neg", "pos".
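If you build the map by hand, a small helper can derive it from your columns. This is a minimal sketch with a hypothetical `build_info` function, not part of BAMT. It assumes plain Python lists per column (so it runs without pandas), and its inference rules are assumptions: strings/booleans become "disc", integers "disc_num", other numbers "cont", and a sign is recorded only for numeric columns ("neg" if any value is negative, otherwise "pos").

```python
def is_number(v):
    """True for int/float values, excluding bool (a subclass of int in Python)."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def build_info(columns):
    """Hypothetical helper: map {column name: list of values} to {"types": ..., "signs": ...}."""
    types, signs = {}, {}
    for name, values in columns.items():
        if any(v is None for v in values):
            raise ValueError(f"column {name!r} has missing values; drop them first")
        if all(isinstance(v, (str, bool)) for v in values):
            types[name] = "disc"  # categorical labels
        elif all(is_number(v) and isinstance(v, int) for v in values):
            types[name] = "disc_num"  # discrete column with numerical categories
        elif all(is_number(v) for v in values):
            types[name] = "cont"
        else:
            raise TypeError(f"column {name!r} mixes text and numbers")
        # Assumption: signs are only meaningful for numeric columns.
        if types[name] != "disc":
            signs[name] = "neg" if any(v < 0 for v in values) else "pos"
    return {"types": types, "signs": signs}


info = build_info({
    "species": ["IrisSetosa", "IrisVersicolor", "IrisVirginica"],
    "petal_bin": [0, 1, 2],
    "petal_length": [1.4, 4.7, 6.0],
})
print(info)
```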
Q2:
Sampling and predicting in terms of probabilities are the next features we plan to add to BAMT.
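Until then, the idea behind a probabilistic prediction can be illustrated with Bayes' rule: the score for each class is P(class) multiplied by the conditional probability of each observed feature, and normalizing the scores gives a distribution like the IrisSetosa/IrisVersicolor/IrisVirginica example in the question. This sketch assumes a network where the class is the only parent of each discretized feature (a naive Bayes structure); the feature names, bins and probabilities are made-up numbers, not learned from the Iris data, and `class_probabilities` is a hypothetical function, not a BAMT API.

```python
# Made-up parameters for illustration only; not learned from the Iris data.
PRIOR = {"IrisSetosa": 1 / 3, "IrisVersicolor": 1 / 3, "IrisVirginica": 1 / 3}

# CPT[feature][class][bin] = P(feature = bin | class); each row sums to 1.
CPT = {
    "petal_length": {
        "IrisSetosa": {"short": 0.90, "medium": 0.08, "long": 0.02},
        "IrisVersicolor": {"short": 0.05, "medium": 0.80, "long": 0.15},
        "IrisVirginica": {"short": 0.02, "medium": 0.28, "long": 0.70},
    },
    "petal_width": {
        "IrisSetosa": {"narrow": 0.90, "mid": 0.08, "wide": 0.02},
        "IrisVersicolor": {"narrow": 0.05, "mid": 0.75, "wide": 0.20},
        "IrisVirginica": {"narrow": 0.02, "mid": 0.18, "wide": 0.80},
    },
}


def class_probabilities(evidence):
    """Posterior P(class | evidence), assuming class is the sole parent of every feature."""
    scores = {}
    for cls, prior in PRIOR.items():
        score = prior
        for feature, value in evidence.items():
            score *= CPT[feature][cls][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


probs = class_probabilities({"petal_length": "long", "petal_width": "wide"})
for cls, p in probs.items():
    print(f"{cls} = {p:.3f}")
```

The hard label returned by a plain prediction corresponds to the class with the highest of these probabilities.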
Ref:
Bubnova A. V., Deeva I., Kalyuzhnaya A. V. MIxBN: Library for learning Bayesian networks from mixed data. Procedia Computer Science, 2021, vol. 193, pp. 494-503.
This article gives more detail about the scoring functions and the kinds of data they accept.
Okay, thank you for the helpful reference :)