Google-Health/records-research

Larger than 1 conditional probability

zhanghaijason opened this issue · 1 comments

Hello,
From the functions "build_seqex" and "count_conditional_prob_dp", I can see that an encounter may contain duplicate diagnosis codes or medication codes, and that the default setting keeps these duplicates when doing the counting. This can make a probability larger than 1. Did you keep it as it is, or set it to 1 when it is larger than 1? For example, suppose we have 3 encounters in the dataset:
Encounter  Diagnoses  Medications
1          [A, B, A]  [1, 2]
2          [A, E, F]  [1, 5, 8]
3          [A]        [3]

P(A) = (2 + 1 + 1) / 3 = 4/3
P(1) = (1 + 1) / 3 = 2/3
P(A1) = (2 + 1) / 3 = 1
P(A|1) = P(A1) / P(1) = 1 / (2/3) = 1.5
P(1|A) = P(A1) / P(A) = 1 / (4/3) = 3/4
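
To make the arithmetic above easy to check, here is a minimal Python sketch of the counting on the three example encounters. It is not the GCT code: the function name `conditional_probs` and the `dedupe` flag are hypothetical, introduced only for illustration. It assumes that with duplicates kept, every (diagnosis, medication) pairing within an encounter is counted once per occurrence, which reproduces the numbers above. Setting `dedupe=True` shows one possible alternative (counting each code at most once per encounter), not necessarily what the authors intended.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The three example encounters: (diagnosis codes, medication codes).
encounters = [
    (["A", "B", "A"], [1, 2]),
    (["A", "E", "F"], [1, 5, 8]),
    (["A"], [3]),
]


def conditional_probs(encounters, dedupe=False):
    """Hypothetical helper: count-based probabilities for code A and medication 1.

    With dedupe=False, repeated codes inside an encounter are counted
    every time they occur. With dedupe=True, each code counts at most
    once per encounter.
    """
    n = len(encounters)
    dx_count, rx_count, pair_count = Counter(), Counter(), Counter()
    for dxs, rxs in encounters:
        if dedupe:
            dxs, rxs = set(dxs), set(rxs)
        dx_count.update(dxs)
        rx_count.update(rxs)
        # Every diagnosis/medication pairing within the same encounter.
        pair_count.update(product(dxs, rxs))

    p_a = Fraction(dx_count["A"], n)
    p_1 = Fraction(rx_count[1], n)
    p_a1 = Fraction(pair_count[("A", 1)], n)
    return {
        "P(A)": p_a,
        "P(1)": p_1,
        "P(A1)": p_a1,
        "P(A|1)": p_a1 / p_1,
        "P(1|A)": p_a1 / p_a,
    }


with_dups = conditional_probs(encounters, dedupe=False)
without_dups = conditional_probs(encounters, dedupe=True)
```

With duplicates kept, `with_dups["P(A|1)"]` is 3/2, matching the value above. With per-encounter deduplication, `without_dups["P(A|1)"]` is 1 and `without_dups["P(1|A)"]` is 2/3, so every value stays within [0, 1].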

Hi, unfortunately the GCT code is no longer maintained. From #6, @mp2893 (the original author) now recommends using DescEmb for new experiments where possible.