Format of comparison plot data in likelihood function
amageh opened this issue · 0 comments
- respy version used, if any: any
- Python version, if any: any
- Operating System: any
Describe the bug
We allow the msm and likelihood functions to return `comparison_plot_data`, which is a DataFrame containing moments/likelihood contributions that are useful for visualizations, etc. The format of the data for the likelihood function seems unintended to me:
If the model includes types, the comparison plot data contains two additional columns: `type` and `log_type_probability`. I am not sure if this is implemented correctly, as the resulting DataFrame looks like this (example: `kw_97_basic`):
| | identifier | period | choice | value | kind | type | log_type_probability |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | school | -0.020207 | choice | NaN | NaN |
1 | 0 | 0 | school | -1.577148 | choice | NaN | NaN |
2 | 0 | 0 | school | -1.143622 | choice | NaN | NaN |
3 | 0 | 0 | school | -0.018780 | choice | NaN | NaN |
4 | 0 | 1 | school | -0.088496 | choice | NaN | NaN |
... | 0 | 1 | school | -0.088496 | choice | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
21963 | 1371 | 0 | NaN | NaN | NaN | 3 | -3.413402 |
21964 | 1372 | 0 | NaN | NaN | NaN | 3 | -3.413402 |
21965 | 1372 | 0 | NaN | NaN | NaN | 3 | -3.413402 |
21966 | 1372 | 0 | NaN | NaN | NaN | 3 | -3.413402 |
21967 | 1372 | 0 | NaN | NaN | NaN | 3 | -3.413402 |
I think this stems from what the `log_type_probabilities` look like before they are added to the data; it seems to me that they contain duplicates? I am not entirely sure what the final DataFrame is supposed to look like though, so I am not sure if this is even an issue.
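
To illustrate the pattern in the table above without running respy, here is a minimal pandas sketch. It is not respy's code: the frames `contribs` and `type_probs` and all their values are made up, and row-wise concatenation is only my guess at how such a block layout could arise. The sketch also shows one possible alternative (dropping duplicates and merging on `identifier`), which may or may not be the intended format.

```python
import pandas as pd

# Hypothetical stand-in for the choice likelihood contributions (values are made up).
contribs = pd.DataFrame(
    {
        "identifier": [0, 0, 1],
        "period": [0, 1, 0],
        "choice": ["school", "school", "school"],
        "value": [-0.02, -0.09, -1.58],
        "kind": ["choice", "choice", "choice"],
    }
)

# Hypothetical stand-in for the type probabilities, including a duplicated row.
type_probs = pd.DataFrame(
    {
        "identifier": [0, 1, 1],
        "period": [0, 0, 0],
        "type": [3, 3, 3],
        "log_type_probability": [-3.41, -3.41, -3.41],
    }
)

# Row-wise concatenation of frames with different columns fills the
# non-shared columns with NaN, which gives two disjoint blocks of rows.
stacked = pd.concat([contribs, type_probs], ignore_index=True, sort=False)

# Check: every row has exactly one of `value` / `log_type_probability` set.
no_overlap = bool(
    (stacked["value"].isna() ^ stacked["log_type_probability"].isna()).all()
)

# One possible alternative: keep one row per individual and attach the
# type information as extra columns on the contribution rows.
unique_types = type_probs.drop(columns="period").drop_duplicates("identifier")
merged = contribs.merge(unique_types, on="identifier", how="left")

print(stacked)
print(merged)
```

In the stacked frame the type rows carry no `choice`, `value`, or `kind`, as in the output above; in the merged frame each contribution row carries its individual's `type` and `log_type_probability`.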
To reproduce
Construct the likelihood function with comparison plot data for a model with types (e.g. `kw_97_basic`).