Dataset API and Configuration

Question

mmcdermott opened this issue a month ago · 7 comments

Answer 1 · 2024-10-14T18:41:19.000Z

This function is not generally needed, and I expect it can be deleted: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L699

Answer 2 · 2024-10-14T18:44:12.000Z

This can be removed: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L660

Instead, this can be handled within PyTorch Lightning or similar tooling, by stopping the dataloader after a given number of batches (or even just by setting the length manually). With the set stats gone, it won't add any bias.
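As a rough illustration of "stop the dataloader after a given number of batches", here is a minimal sketch that caps any iterable loader from the outside instead of subsampling inside the dataset. The helper name `take_batches` and the `fake_loader` stand-in are hypothetical and are not part of meds-torch; in Lightning itself, the `limit_train_batches` argument to `Trainer` serves the same purpose.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take_batches(loader: Iterable[T], max_batches: int) -> Iterator[T]:
    """Yield at most `max_batches` batches from any iterable loader.

    Hypothetical helper for illustration; it works with anything iterable,
    including a torch DataLoader, because it only relies on iteration.
    """
    if max_batches < 0:
        raise ValueError("max_batches must be non-negative")
    return islice(loader, max_batches)


# Stand-in for a real dataloader: four "batches" of two items each.
fake_loader = [[0, 1], [2, 3], [4, 5], [6, 7]]

# Only the first two batches are consumed; the dataset itself is untouched.
limited = list(take_batches(fake_loader, 2))
```

Because the cap lives in the training loop rather than in the dataset, the dataset class does not need its own subsampling logic.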

Answer 3 · 2024-10-14T18:45:33.000Z

This can be deleted as well; it should happen in a pre-step for MEDS-transforms: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L625

Answer 4 · 2024-10-14T18:47:25.000Z

Delete this: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L588
The label schema should cover this, and if it does not, we should make it so.

Answer 5 · 2024-10-14T18:48:54.000Z

Simplify this, but it needs to be kept: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L469

Answer 6 · 2024-10-14T18:54:23.000Z

Delete this: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L33

Answer 7 · 2024-10-14T18:55:23.000Z

This can go: https://github.com/Oufattole/meds-torch/blob/main/src/meds_torch/data/components/pytorch_dataset.py#L392. It is all binary classification and is in the label schema.