biohackathon-project-35

Repository for developing project 35, FAIRX: Quantitative bias assessment in ELIXIR biomedical data resources, for the 2021 ELIXIR BioHackathon.

Primary language: HTML. License: MIT.

Project 35: FAIRX: Quantitative bias assessment in ELIXIR biomedical data resources

Abstract

The design of AI systems for health is one of the major scientific and technological achievements of our time. Such systems learn to perform specific tasks by processing large amounts of data that are produced and stored in large biomedical repositories. The quality and content of these data have an immense impact on what and how AI learns. If the data contain biases, such as skewed representation of certain categories or missing information, the application of AI can lead to discriminatory outcomes and propagate them into society, as we recently pointed out (Cirillo et al. NPJ Digit Med. 2020 doi:10.1038/s41746-020-0288-5). The aim of our project is to determine the extent of biases in the available demographic categories (sex, age, race) in ELIXIR biomedical data repositories, which are widely used in the community to train AI systems. We aim to quantify bias and to provide recommendations, including solutions and best practices, on how to use the data properly to develop fair and trustworthy AI. We have recently collected endorsement and support for this project from representatives of several ELIXIR platforms, communities and focus groups, namely Data platform, Human Data Communities, Diversity, Equity, & Inclusion group, Impact group, Industry group and Communication.

Topics

  • Cancer
  • Data Platform
  • Federated Human Data
  • Human Copy Number Variation
  • Machine learning
  • Rare Disease

Project Number: 35

EasyChair Number: 61

Team

Lead(s)

  • Davide Cirillo, davide.cirillo@bsc.es
  • Nataly Buslón, nataly.buslon@bsc.es

Expected outcomes

  • Task 1. Quantification of bias in selected resources
  • Task 2. Evaluation of social and ethical impact
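As an illustration of what Task 1 could involve, the following is a minimal Python sketch, not the project's actual method. It summarises one demographic field (for example sex) by its share of missing entries, its per-category proportions, and a normalised Shannon entropy used here as an evenness score. The function name `category_summary`, the set of values treated as missing, and the choice of entropy as the measure are all assumptions made for this example.

```python
from collections import Counter
import math


def category_summary(values, missing=(None, "", "NA", "unknown")):
    """Summarise one demographic field (hypothetical helper, for illustration).

    Returns the share of missing entries, the proportion of each observed
    category, and an evenness score in [0, 1] (normalised Shannon entropy:
    1.0 means perfectly balanced categories, 0.0 means at most one category).
    """
    missing_set = set(missing)  # assumption: these markers denote missing data
    observed = [v for v in values if v not in missing_set]
    total = len(observed)

    # Share of records with no usable value for this field.
    missing_rate = (len(values) - total) / len(values) if values else 0.0

    # Proportion of each category among the non-missing records.
    counts = Counter(observed)
    proportions = {category: n / total for category, n in counts.items()}

    # Shannon entropy divided by its maximum, log(number of categories).
    if len(counts) > 1:
        entropy = -sum(p * math.log(p) for p in proportions.values())
        evenness = entropy / math.log(len(counts))
    else:
        evenness = 0.0

    return {
        "missing_rate": missing_rate,
        "proportions": proportions,
        "evenness": evenness,
    }


if __name__ == "__main__":
    # Toy values, not taken from any real resource.
    sex_field = ["female", "male", "male", "male", None, "unknown"]
    print(category_summary(sex_field))
```

A low evenness score or a high missing rate would flag a field for closer inspection. Neither number alone establishes bias, since the appropriate reference distribution depends on the population a dataset is meant to represent.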

Expected audience

  • ELIXIR data resources representatives, especially designers, developers and data miners
  • Computer scientists with database skills, including development and data management
  • Researchers in computational biology with a strong programming background
  • Researchers in social sciences with interests in biomedicine and technology
  • Data scientists with strong analytical and statistical knowledge
  • Bioinformaticians with knowledge of biological data resources
  • Biostatisticians with interests in bias and data mining
  • Researchers and practitioners in academic or industrial fields devoted to social equity

Qualitative analysis (policies & recommendations)

  • Nataly Buslón, subgroup spokesperson
  • Gemma Holliday
  • Atia Cortés

Quantitative analysis (dbGaP)

Useful links

People

  • Davide Cirillo, subgroup spokesperson
  • María Morales
  • Alejandro Muñoz
  • Camila Pontes
  • Olivier Philippe

Quantitative analysis (EGA)

Useful links

People

  • Aina Jené, subgroup spokesperson
  • Babita Singh
  • Mauricio Moldes
  • Victoria Ruiz
  • Diego Saby