Format of comparison plot data in likelihood function

Question


amageh opened this issue 4 years ago · 0 comments

respy version used, if any: any
Python version, if any: any
Operating System: any

Describe the bug

We allow the msm and likelihood functions to return comparison_plot_data, a DataFrame containing moments or likelihood contributions that is useful for visualizations, etc. The format of this data for the likelihood function looks unintended to me:

If the model includes types, the comparison plot data contains two additional columns: type and log_type_probability. I am not sure this is implemented correctly, because the resulting DataFrame looks like this (example: kw_97_basic):

	identifier	period	choice	value	kind	type	log_type_probability
0	0	0	school	-0.020207	choice	NaN	NaN
1	0	0	school	-1.577148	choice	NaN	NaN
2	0	0	school	-1.143622	choice	NaN	NaN
3	0	0	school	-0.018780	choice	NaN	NaN
4	0	1	school	-0.088496	choice	NaN	NaN
...	0	1	school	-0.088496	choice	NaN	NaN
...	...	...	...	...	...	...	...
21963	1371	0	NaN	NaN	NaN	3	-3.413402
21964	1372	0	NaN	NaN	NaN	3	-3.413402
21965	1372	0	NaN	NaN	NaN	3	-3.413402
21966	1372	0	NaN	NaN	NaN	3	-3.413402
21967	1372	0	NaN	NaN	NaN	3	-3.413402

I think this stems from what the log_type_probabilities look like before they are added to the data: it seems to me that they contain duplicates. The type rows also end up appended below the choice rows, so each block is NaN in the other block's columns. I am not entirely sure what the final DataFrame is supposed to look like, though, so I am not sure whether this is even an issue.
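To make the suspected mechanism concrete, here is a minimal pandas sketch. It does not use respy's actual code; the two small frames (`choices`, `type_probs`) and their values are hypothetical stand-ins, and the assumption is that the type information is stacked row-wise under the choice contributions. Under that assumption, pandas produces the same NaN block pattern as in the output above, and `duplicated()` shows how repeated type rows could be detected.

```python
import pandas as pd

# Hypothetical stand-in for the per-observation choice contributions.
choices = pd.DataFrame(
    {
        "identifier": [0, 0, 1],
        "period": [0, 1, 0],
        "choice": ["school", "school", "home"],
        "value": [-0.02, -0.09, -1.58],
        "kind": "choice",
    }
)

# Hypothetical stand-in for the type probabilities. The first row appears
# twice to mimic the suspected duplicates.
type_probs = pd.DataFrame(
    {
        "identifier": [0, 0, 1],
        "period": [0, 0, 0],
        "type": [3, 3, 3],
        "log_type_probability": [-3.41, -3.41, -3.41],
    }
)

# Row-wise stacking of frames with different columns fills the missing
# columns with NaN, which gives two disjoint blocks.
stacked = pd.concat([choices, type_probs], ignore_index=True, sort=False)

# Count fully identical rows in the type data and drop them.
n_duplicates = int(type_probs.duplicated().sum())
deduplicated = type_probs.drop_duplicates()

print(stacked)
print(f"duplicated type rows: {n_duplicates}")
```

If the intended format is one row per individual for the type information, dropping duplicates before stacking would be one option. Whether that matches the intended design is exactly the open question in this issue.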

To reproduce

Construct the likelihood function with comparison plot data enabled for a model with types (e.g. kw_97_basic), then evaluate it and inspect the returned DataFrame.