owkin/FLamby

Synthetic datasplit

omarfoq opened this issue · 1 comment

The purpose of this issue is to discuss the possibility of including synthetic splits in the repository, as suggested during the review process. Please find below a suggestion for simple synthetic splits, one per dataset.

  • Camelyon16: Symmetric Beta distribution over the labels
  • LIDC-IDRI: Symmetric Dirichlet distribution over the manufacturer identifier
  • IXI: Symmetric Dirichlet distribution over the manufacturer identifier
  • TCGA-BRCA: Symmetric Dirichlet distribution over the region
  • KITS2019: Symmetric Dirichlet distribution over the sites
  • ISIC2019: Symmetric Dirichlet distribution over the labels
  • Heart-Disease: Symmetric Beta distribution over the labels
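To make the list above concrete, here is a minimal sketch of one possible reading of a "symmetric Dirichlet split" over a categorical attribute (labels, manufacturer, region, or site). It is not code from the FLamby repository: the function name `dirichlet_split` and the parameters `groups`, `n_clients`, `alpha`, and `seed` are hypothetical. For each attribute value, it draws client proportions from a symmetric Dirichlet and hands out that value's samples accordingly. With `n_clients=2`, the draw is equivalent to a symmetric Beta.

```python
import numpy as np


def dirichlet_split(groups, n_clients, alpha, seed=0):
    """Partition sample indices across clients (hypothetical helper).

    groups: per-sample categorical attribute (e.g. label or site id).
    alpha: concentration of the symmetric Dirichlet; smaller values
           give more heterogeneous clients.
    Returns a list of n_clients lists of sample indices.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    client_indices = [[] for _ in range(n_clients)]
    for value in np.unique(groups):
        # Indices of all samples sharing this attribute value, shuffled.
        idx = np.flatnonzero(groups == value)
        rng.shuffle(idx)
        # One proportion per client, drawn from Dirichlet(alpha, ..., alpha).
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Turn cumulative proportions into integer cut points.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())
    return client_indices


# Toy usage: 100 samples with binary labels, split across 3 clients.
toy_labels = [0] * 50 + [1] * 50
toy_splits = dirichlet_split(toy_labels, n_clients=3, alpha=0.5, seed=0)
```

Every sample lands on exactly one client, but some clients may end up with few or no samples of a given value when `alpha` is small.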

Thanks for the suggestion, @omarfoq. I missed your post, and @jeandut has already implemented a simple method that splits each center into sub-centers (without worrying too much about border effects). I also added the option to split using a Dirichlet sampling mechanism (on the original client ids). Please do not hesitate to open a PR with the classes as well.
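For readers who want a picture of the sub-center idea, here is a rough sketch under stated assumptions; it is not the implementation that was merged into FLamby. The helper name `split_centers` and its parameters `center_ids`, `n_sub`, and `seed` are hypothetical. It shuffles the samples of each original center and cuts them into `n_sub` roughly equal sub-centers, letting sizes differ by one when the count does not divide evenly.

```python
import numpy as np


def split_centers(center_ids, n_sub, seed=0):
    """Relabel samples so each original center becomes n_sub sub-centers.

    center_ids: per-sample id of the original center.
    Returns an integer array of new sub-center ids, numbered from 0.
    """
    rng = np.random.default_rng(seed)
    center_ids = np.asarray(center_ids)
    new_ids = np.empty(len(center_ids), dtype=int)
    next_id = 0
    for center in np.unique(center_ids):
        # Shuffle this center's samples, then cut into n_sub chunks.
        idx = rng.permutation(np.flatnonzero(center_ids == center))
        for chunk in np.array_split(idx, n_sub):
            new_ids[chunk] = next_id
            next_id += 1
    return new_ids


# Toy usage: two centers with 10 and 7 samples, each split in two.
toy_centers = np.array([0] * 10 + [1] * 7)
toy_sub_ids = split_centers(toy_centers, n_sub=2)
```

Sub-centers inherit the distribution of their parent center, so this kind of split increases the number of clients without adding heterogeneity beyond what the original centers already have.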