Trying to split the dataset myself
aguang1201 opened this issue · 3 comments
Hi Bruce,
Version 0.3.0 is very good. Congratulations!
I want to share one of the problems I encountered.
I set the config as follows:
use_default_split=false,
train_patient_count=29405,
dev_patient_count=1400,
split_dataset_random_state=2
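The settings above describe a patient-level split: a fixed number of patients for training and dev, chosen with a fixed random seed. Here is a minimal sketch of such a split in plain Python. It is not the project's own implementation. It assumes each image is a dict row with a "Patient ID" column (as in the ChestX-ray14 Data_Entry_2017.csv) and that patients left over after train and dev go to the test split. The helper name `split_by_patient` is hypothetical. Whole patients are assigned to one split, so no patient's images appear in more than one split.

```python
import random
from collections import defaultdict


def split_by_patient(rows, train_patient_count, dev_patient_count,
                     random_state, patient_key="Patient ID"):
    """Assign whole patients to train/dev/test (hypothetical helper).

    rows: list of dicts, one per image, each containing patient_key.
    Assumption: patients left over after train and dev form the test split.
    """
    # Group image rows by patient so a patient never spans two splits.
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row[patient_key]].append(row)

    # Sort first so the shuffle depends only on random_state, not input order.
    patient_ids = sorted(by_patient)
    if train_patient_count + dev_patient_count > len(patient_ids):
        raise ValueError("train + dev patient counts exceed the number of patients")
    random.Random(random_state).shuffle(patient_ids)

    train_end = train_patient_count
    dev_end = train_end + dev_patient_count
    groups = (patient_ids[:train_end],
              patient_ids[train_end:dev_end],
              patient_ids[dev_end:])
    # Flatten each group of patients back into a list of image rows.
    return tuple([row for pid in ids for row in by_patient[pid]] for ids in groups)


if __name__ == "__main__":
    # Toy data: 10 patients with 2 images each.
    rows = [{"Image Index": f"{p:03d}_{i}.png", "Patient ID": str(p)}
            for p in range(10) for i in range(2)]
    train, dev, test = split_by_patient(rows, 6, 2, random_state=2)
    print(len(train), len(dev), len(test))  # 12 4 4
```

In this sketch, if train_patient_count plus dev_patient_count covers every patient, the test split comes back empty. Any check on held-out performance therefore depends on leaving some patients over.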
I increased the training set size, expecting it to improve the mean AUC, but it did not.
The result was mean AUROC: 0.7680469487585274, which is lower than the result with the default split.
I don't understand why. Could you tell me why the default split worked better?
Also, how should I set train_patient_count to improve the AUC?
Thank you for your nice work.
@aguang1201 The use_default_split option is deprecated in 0.3.0. Please specify your own dataset split using the new option dataset_csv_dir; see sample.config.ini for details. I made this decision because many people found it confusing. I will update the source code to warn people who use these deprecated options. Thank you.
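As the name suggests, dataset_csv_dir points to a directory of CSV files that define the split. The sketch below writes a precomputed split out with the standard csv module. It assumes one file per split named train.csv, dev.csv and test.csv. That naming is an assumption, so check sample.config.ini for the file names the project actually expects. The helper name `write_split_csvs` is hypothetical.

```python
import csv
import os


def write_split_csvs(out_dir, splits, fieldnames):
    """Write each split to <out_dir>/<name>.csv (hypothetical helper).

    splits: dict mapping a split name such as "train" to a list of row dicts.
    Rows must not contain keys outside fieldnames (DictWriter raises
    ValueError by default). Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, rows in splits.items():
        path = os.path.join(out_dir, f"{name}.csv")
        # newline="" is what the csv module docs recommend when writing files.
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    import tempfile
    rows = [{"Image Index": "00000001_000.png", "Patient ID": "1"}]
    with tempfile.TemporaryDirectory() as d:
        print(write_split_csvs(d, {"train": rows, "dev": [], "test": []},
                               ["Image Index", "Patient ID"]))
```

Writing the split once to files, instead of recomputing it inside the training code, keeps the exact same train/dev/test membership across runs and makes it easy to inspect.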
@brucechou1983
Thank you for your reply.
Unfortunately, it was removed. I think splitting the dataset is very useful.
But it doesn't matter; I can add it to my own code.
I just want to understand why the AUC got lower when I increased the training data.
Is the amount of training data enough, or is the dev data not enough?
This is really hard to figure out.
In addition, have you tried NASNet?
Thanks.
@aguang1201 I couldn't understand your question. Since this issue is resolved, I will close it. You can email me if you have ad hoc questions.