mp2893/gram

Low-frequency labels hard to predict

Opened this issue · 3 comments

Hello Dr. Choi,

Thanks for the nice work. I generated the CCS single-level labels as the target and used your code to predict them. All hyperparameters are set according to the appendix. I grouped the labels into five groups according to their frequencies (first ranking all labels by frequency, then dividing them equally into five groups). But my results differ somewhat from those in the paper: I got accuracies of [0, 0.01835, 0.0811, 0.3042, 0.8263] for the five groups, respectively. I noticed that I got higher accuracies for high-frequency labels, but cannot match the paper's accuracies for labels in the frequency percentile range [0-60]. Is there anything I have done wrong? Furthermore, I found that the total frequency of labels in the first (rarest) group is only 0.16% of all label occurrences (163/96677). I am wondering whether this is the correct way to divide the labels into five groups.
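For concreteness, here is a minimal sketch of the grouping scheme as I understand it (this is my own illustration with made-up counts, not code from the GRAM repository):

```python
import numpy as np

def group_labels_by_frequency(label_counts, n_groups=5):
    """Rank labels from least to most frequent, then split the ranked
    label ids into n_groups equally sized groups (rarest group first)."""
    # Indices of labels sorted by ascending occurrence count.
    order = np.argsort(label_counts)
    # Split the ranked ids into n_groups contiguous chunks.
    return np.array_split(order, n_groups)

# Toy example: 10 labels with made-up occurrence counts.
counts = np.array([1, 50, 3, 200, 7, 90, 2, 400, 15, 30])
groups = group_labels_by_frequency(counts)
for i, g in enumerate(groups):
    print(f"group {i} (rarest first): labels {g.tolist()}")
```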

Thanks,
Muhan

Hi Muhan,

Thanks for taking interest in the work.
Based on the information you've provided, it's nearly impossible to determine what caused the difference between my results and yours.
Some of the immediate questions that come to mind are:
- Did you use MIMIC-III?
- Did you pre-process MIMIC-III with preprocess_mimic.py?
- Did you use my code or implement your own?
- If you used my code, did you use the same Theano version as I did?
- Did you use Accuracy@1 or Accuracy@20?
There are probably many more questions to follow, but there must be something you are doing differently than I did, because even if the answers to all the questions above are yes, it's still strange that you couldn't outperform even the worst baseline, RandomDAG (if you were indeed using MIMIC-III, that is).
It's very likely this discussion is going to take some time, so it's better to have a phone call or a video chat than go back and forth here. If you are interested, please send me an email (mp2893@gmail.com) so we can set up a time.

Best,
Ed

Hi Edward,

Thank you for your reply. Sorry for the confusion in my question. Some quick answers to your questions are as follows:

- Yes, I did use MIMIC-III.
- I did modify preprocess_mimic.py to additionally generate sequences of CCS single-level labels as the target.
- I used your code gram.py, but I did not use the same Theano version as you did (mine was v1.0.3). I downgraded to v0.8.2 and now get slightly better results: [0.0, 0.07339, 0.1867, 0.3294, 0.8194].
- I calculated Accuracy@20. My understanding is: for every nonzero entry z in every true label vector y, increment counter[z] if the top-20 indices of y_hat contain z; then accuracy@20 for label z = counter[z] / #occurrences[z]. Similarly, accuracy@20 for a group of labels is (sum_z counter[z]) / (sum_z #occurrences[z]).
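The calculation described in the last bullet can be sketched as follows (a rough illustration of my understanding, not code from the repository; all names are my own):

```python
import numpy as np

def accuracy_at_k(y_true, y_score, k=20):
    """Per-label accuracy@k: for each nonzero entry z of each true label
    vector, count a hit if z is among the top-k predicted indices."""
    n_labels = y_true.shape[1]
    hits = np.zeros(n_labels)
    occurrences = np.zeros(n_labels)
    for true_vec, score_vec in zip(y_true, y_score):
        topk = np.argsort(score_vec)[-k:]  # indices of the k highest scores
        for z in np.flatnonzero(true_vec):
            occurrences[z] += 1
            if z in topk:
                hits[z] += 1
    return hits, occurrences

def group_accuracy(hits, occurrences, group):
    """Accuracy@k pooled over a group of label ids."""
    return hits[group].sum() / occurrences[group].sum()
```

Note that with this pooling, a group's accuracy is dominated by its most frequent members, so the grouping boundaries can noticeably change the reported numbers.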

To clarify my question: I saw that my optimization now focuses more on high-frequency labels, yet hardly learns to predict extremely rare ones. In fact, I got an overall accuracy@20 of 0.71 (pooling all labels into one group), which is even higher than your GRAM+ results (0.6267 for [80-100], so the overall accuracy@20 must be lower than 0.6267). Since I used your code but got higher results than yours, I was wondering whether my understanding of the grouping scheme or the accuracy calculation might be wrong, leading to different numbers. Or should I use a different validation criterion, such as selecting the epoch with the highest [0-20] validation accuracy instead of the lowest validation cost?

I hope my questions are not too much trouble. Of course, I'd be happy to set up a phone call at your convenience if things are too complicated to explain here. You can send your available times to muhan@wustl.edu.

Many thanks,
Muhan

Hi Muhan,

Have you figured out how to calculate accuracy by group? I have some trouble calculating the accuracy@k score for each group, and I don't know which of the following two is right:
1. For each frequency group, select the top-20 scores within that group, compare them with the true labels, and calculate accuracy@20 for the group individually.
2. Select the top-20 indices over all labels, determine which group each of the 20 indices belongs to, and then compare with the labels.
I hope you can help if you know.

Many thanks,
Oldpants