OpenSourceEconomics/respy

Format of comparison plot data in likelihood function

amageh opened this issue · 0 comments

  • respy version used, if any: any
  • Python version, if any: any
  • Operating System: any

Describe the bug

We allow the msm and likelihood functions to return comparison_plot_data, a DataFrame containing moments or likelihood contributions that are useful for visualizations. The format of the data returned by the likelihood function seems unintended to me:

If the model includes types, the comparison plot data contains two additional columns: type and log_type_probability. I am not sure this is implemented correctly. In the rows shown, the likelihood contributions have NaN in both type columns, while the type rows have NaN in choice, value, and kind (example: kw_97_basic):

       identifier  period  choice      value    kind  type  log_type_probability
    0           0       0  school  -0.020207  choice   NaN                   NaN
    1           0       0  school  -1.577148  choice   NaN                   NaN
    2           0       0  school  -1.143622  choice   NaN                   NaN
    3           0       0  school  -0.018780  choice   NaN                   NaN
    4           0       1  school  -0.088496  choice   NaN                   NaN
  ...           0       1  school  -0.088496  choice   NaN                   NaN
  ...         ...     ...     ...        ...     ...   ...                   ...
21963        1371       0     NaN        NaN     NaN     3             -3.413402
21964        1372       0     NaN        NaN     NaN     3             -3.413402
21965        1372       0     NaN        NaN     NaN     3             -3.413402
21966        1372       0     NaN        NaN     NaN     3             -3.413402
21967        1372       0     NaN        NaN     NaN     3             -3.413402

I think this stems from what the log_type_probabilities look like before they are added to the data. They seem to contain duplicates: in the excerpt above, identifier 1372 appears four times with the same period, type, and value. I am not entirely sure what the final DataFrame is supposed to look like, though, so I am not sure whether this is even an issue.
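To make the suspected problem concrete, here is a minimal pandas sketch, not respy's actual implementation. It assumes the type probabilities are stacked below the likelihood contributions with something like pd.concat. All values and the toy frames contribs and type_probs are made up for illustration. It reproduces the block-wise NaN pattern and counts repeated type rows.

```python
import pandas as pd

# Toy stand-in for the likelihood contributions (values are made up).
contribs = pd.DataFrame(
    {
        "identifier": [0, 0, 1, 1],
        "period": [0, 1, 0, 1],
        "choice": ["school", "school", "home", "school"],
        "value": [-0.02, -0.09, -1.58, -0.31],
        "kind": "choice",
    }
)

# Toy stand-in for the type probabilities. Assumption: there is one row per
# observation rather than one per individual and type, so rows repeat.
type_probs = pd.DataFrame(
    {
        "identifier": [0, 0, 1, 1],
        "period": 0,
        "type": 3,
        "log_type_probability": [-3.41, -3.41, -3.41, -3.41],
    }
)

# Stacking frames with different columns fills the missing cells with NaN,
# which yields the same block structure as in the table above.
stacked = pd.concat([contribs, type_probs], ignore_index=True)

# Diagnostic 1: how many type rows repeat an earlier (identifier, period, type)?
type_rows = stacked[stacked["type"].notna()]
n_duplicates = int(type_rows.duplicated(["identifier", "period", "type"]).sum())

# Diagnostic 2: does any row carry both a contribution and a type probability?
no_overlap = not (stacked["value"].notna() & stacked["type"].notna()).any()

print(stacked)
print("duplicated type rows:", n_duplicates)
print("contribution and type rows never overlap:", no_overlap)
```

Running the same two diagnostics on the real comparison_plot_data would show whether the duplicates are present there as well.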

To reproduce

Construct the likelihood function with comparison plot data enabled for a model with types (e.g. kw_97_basic), evaluate it, and inspect the returned DataFrame.
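A possible reproduction script is sketched below. It assumes that the option is exposed as the keyword return_comparison_plot_data of rp.get_log_like_func and that the resulting function returns a tuple of the likelihood value and the DataFrame. Both may differ between respy versions. The helper name build_comparison_plot_data is hypothetical.

```python
def build_comparison_plot_data(model="kw_97_basic"):
    """Return the comparison plot data of the likelihood function for `model`."""
    # Imported inside the function so the sketch can be defined without respy.
    import respy as rp

    params, options, data = rp.get_example_model(model)

    # Assumption: this keyword switches on the additional DataFrame output.
    log_like = rp.get_log_like_func(
        params, options, data, return_comparison_plot_data=True
    )

    # Assumption: the function then returns (value, comparison_plot_data).
    _, comparison_plot_data = log_like(params)
    return comparison_plot_data
```

Calling build_comparison_plot_data() and looking at the last rows of the result should show the rows where only type and log_type_probability are filled.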