define dataset schema required by femr
duncanmcelfresh opened this issue · 0 comments
duncanmcelfresh commented
Is your feature request related to a problem? Please describe.
It is not clear how to construct a valid dataset for femr. This makes it difficult to run basic tasks (like featurization and pretraining), to add features, and to debug.
Describe the solution you'd like
One or both of the following would be helpful:
- Clear documentation about what constitutes a valid femr dataset. Something like:
femr datasets are datasets.Dataset objects (from Hugging Face's datasets package) with a specific schema: each entry in the dataset has two fields, "patient_id" (int) and "events" (list). Each event has ....
- A custom class for femr datasets that handles construction, validation, and other dataset-specific tasks. (I can help with this once the documentation in the first item is done.)
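
To make the validation idea concrete, here is a rough sketch of the kind of check I have in mind. It only covers the two top-level fields mentioned above ("patient_id" as int, "events" as list). It does not check anything inside an event, since that part of the schema is exactly what is undocumented. The names `validate_record` and `validate_dataset` are hypothetical and are not part of femr. The sketch works on plain dicts, which I assume is close enough because iterating over a datasets.Dataset yields one dict per entry.

```python
from typing import Any, Iterable, List, Mapping


def validate_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one dataset entry (empty if none).

    Hypothetical helper: only the two top-level fields named in this issue
    are checked; the per-event schema is deliberately left unchecked.
    """
    problems = []
    patient_id = record.get("patient_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(patient_id, int) or isinstance(patient_id, bool):
        problems.append("'patient_id' must be an int")
    if not isinstance(record.get("events"), list):
        problems.append("'events' must be a list")
    return problems


def validate_dataset(records: Iterable[Mapping[str, Any]]) -> None:
    """Raise ValueError at the first entry that fails validate_record."""
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"entry {index}: " + "; ".join(problems))


# Illustrative entries only; real events would carry more structure.
good = {"patient_id": 1, "events": []}
bad = {"patient_id": "1", "events": None}
```

Calling `validate_dataset([good])` passes silently, while `validate_dataset([good, bad])` raises a ValueError naming entry 1. A custom dataset class could run a check like this at construction time once the event fields are documented.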