Synthetic datasplit
omarfoq opened this issue · 1 comments
omarfoq commented
The purpose of this issue is to discuss the possibility of including synthetic splits in the repository, as suggested during the review process. Below is a suggestion for simple synthetic splits, one per dataset:
- Camelyon16: Symmetric Beta distribution over the labels
- LIDC-IDRI: Symmetric Dirichlet distribution over the manufacturer identifier
- IXI: Symmetric Dirichlet distribution over the manufacturer identifier
- TCGA-BRCA: Symmetric Dirichlet distribution over the region
- KITS2019: Symmetric Dirichlet distribution over the sites
- ISIC2019: Symmetric Dirichlet distribution over the labels
- Heart-Disease: Symmetric Beta distribution over the labels
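
To make the proposal concrete, here is a minimal sketch of one common way to build such a split, assuming NumPy. For each value of the chosen attribute (label, manufacturer, region, site), it draws client proportions from a symmetric Dirichlet and distributes the matching samples accordingly. The function name `dirichlet_split` and its parameters are hypothetical and not part of the repository. With two clients, the symmetric Dirichlet reduces to a symmetric Beta, which is how the Beta cases above could be covered by the same code.

```python
import numpy as np


def dirichlet_split(attribute, n_clients, alpha, seed=0):
    """Partition sample indices across clients (hypothetical helper).

    For every distinct value of `attribute`, client proportions are drawn
    from a symmetric Dirichlet(alpha, ..., alpha) and the samples with that
    value are divided among the clients in those proportions.
    """
    rng = np.random.default_rng(seed)
    attribute = np.asarray(attribute)
    client_indices = [[] for _ in range(n_clients)]

    for value in np.unique(attribute):
        # Indices of all samples sharing this attribute value, in random order.
        idx = np.flatnonzero(attribute == value)
        rng.shuffle(idx)

        # Symmetric Dirichlet: the same concentration for every client.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))

        # Turn cumulative proportions into cut points within idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())

    return client_indices


if __name__ == "__main__":
    labels = np.array([0] * 50 + [1] * 50)
    parts = dirichlet_split(labels, n_clients=4, alpha=0.5)
    print([len(p) for p in parts])
```

A small `alpha` gives strongly heterogeneous clients, while a large `alpha` approaches a uniform split. Some clients may end up with no samples for a given attribute value, or none at all.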
Deleted user commented
Thanks for the suggestion, @omarfoq. I missed your post, and @jeandut has already implemented a simple method that splits each center into sub-centers (without worrying too much about border effects). I also added the option to split using a Dirichlet sampling mechanism over the original client ids. Please do not hesitate to open a PR with the classes as well.
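
For readers following along, the sub-center idea can be sketched as below, assuming NumPy. This is an illustration only and not the implementation in the repository: `split_into_subcenters` and its arguments are hypothetical names. Each original center is shuffled and cut into `n_sub` nearly equal parts, and every part receives a new client id.

```python
import numpy as np


def split_into_subcenters(center_ids, n_sub, seed=0):
    """Assign each sample a new client id by splitting every center
    into `n_sub` sub-centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    center_ids = np.asarray(center_ids)
    new_ids = np.empty(len(center_ids), dtype=int)

    next_id = 0
    for center in np.unique(center_ids):
        # Shuffle the samples of this center before cutting them up.
        idx = np.flatnonzero(center_ids == center)
        rng.shuffle(idx)

        # array_split tolerates sizes not divisible by n_sub; a center with
        # fewer than n_sub samples yields empty sub-centers.
        for chunk in np.array_split(idx, n_sub):
            new_ids[chunk] = next_id
            next_id += 1

    return new_ids


if __name__ == "__main__":
    centers = [0] * 10 + [1] * 7
    print(split_into_subcenters(centers, n_sub=2))
```

Uneven or very small centers produce unbalanced or empty sub-centers, which is one way to read the border effects mentioned above.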