Challenge Lab: ML in public health and genomics.
In this assignment, students predict the gestational age (GA) of women from the 7 high-dimensional multi-omics datasets illustrated below. The training data consist of 14 women. Students will use machine learning or deep learning regression models to predict the gestational age of 3 women from these datasets. Student performance will be assessed on:
- Novelty of the algorithm (50)
- MAE [Mean Absolute Error] (50).
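As a reminder of how the second criterion is computed, here is a minimal sketch of MAE in plain Python. The GA values and the use of weeks as the unit are illustrative assumptions, not values from the challenge data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gestational ages (assumed to be in weeks) for three samples.
ga_true = [12.0, 24.0, 36.0]
ga_pred = [13.0, 22.0, 36.5]

mae = mean_absolute_error(ga_true, ga_pred)
print(f"MAE: {mae:.3f}")
```

A lower MAE means the predicted gestational ages are closer to the recorded ones on average.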
Email id : rintu.kutum@igib.in
Contact number: 7838369344
- Expose students to challenging problems in Public Health.
- Allow students to team-up and solve these problems via Machine Learning and Deep Learning models.
To build ML/DL models to predict gestational age (GA) from temporal high-dimensional datasets (immunome, transcriptome, microbiome, proteome and metabolome).
We have formulated the challenge as 3 sub-challenges, listed below:
- Predict GA using Immunome, SerumLuminex, plasmaLuminex and plasmaSomalogic data.
- Predict GA using cell-free RNA, metabolome and microbiome data.
- Predict GA using all the datasets.
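The three sub-challenges differ only in which datasets feed the model, so the split can be sketched as a simple lookup. The dataset keys, the SC2/SC3 labels, and the `merge_features` helper are hypothetical names chosen for illustration; the actual file names in the challenge data may differ.

```python
# Dataset keys are assumptions; they mirror the names in the list above.
SC1_DATASETS = ["immunome", "serum_luminex", "plasma_luminex", "plasma_somalogic"]
SC2_DATASETS = ["cellfree_rna", "metabolome", "microbiome"]

SUB_CHALLENGES = {
    "SC1": SC1_DATASETS,
    "SC2": SC2_DATASETS,
    "SC3": SC1_DATASETS + SC2_DATASETS,  # all datasets
}


def merge_features(sample_features, sub_challenge):
    """Combine the per-dataset features of one sample into a single dict.

    Feature names are prefixed with the dataset key so that identically
    named features from different datasets do not collide.
    """
    merged = {}
    for dataset in SUB_CHALLENGES[sub_challenge]:
        for feature, value in sample_features[dataset].items():
            merged[f"{dataset}:{feature}"] = value
    return merged


# Toy sample with one made-up feature per dataset.
sample = {name: {"f1": float(i)} for i, name in enumerate(SUB_CHALLENGES["SC3"])}
sc2_row = merge_features(sample, "SC2")
print(sorted(sc2_row))
```

Keeping the mapping in one place lets the same training code run on all three sub-challenges by changing a single key.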
The details about gestational age (GA), along with the train and test sets, are available in challenge-meta-information.
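With only 14 women in the training set, a useful way to estimate MAE before submitting is to hold out one woman at a time. The sketch below assumes each woman contributes several time-point samples and that a subject identifier is available per sample; the identifiers shown are made up.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, validation_indices).

    All samples from one subject are held out together, so no woman
    appears in both the training and the validation part of a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        val_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, val_idx


# Hypothetical subject IDs, one entry per sample (row) in the training data.
subject_ids = ["W01", "W01", "W02", "W02", "W02", "W03"]

folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, val_idx in folds:
    print(held_out, "train:", train_idx, "validate:", val_idx)
```

Splitting by woman rather than by sample mirrors the test setting, where predictions are made for women the model has never seen.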
File naming (format)
SC1 (Sub Challenge 01)
- Preterm birth
- Preterm birth (premature birth)
- Extremely Preterm Birth
- Setting research priorities to improve global newborn health and prevent stillbirths by 2025
- Preterm birth: Case definition & guidelines for data collection, analysis, and presentation of immunisation safety data
- Gestational Age
- Cell-free RNA transcriptome
Cell-free RNA (cfRNA) was extracted from 1 mL of plasma using the Plasma/Serum Circulating RNA and Exosomal Purification kit (Norgen, cat 42800) following the manufacturer's instructions. Residual DNA was digested using Baseline-ZERO DNase (Epicentre), and the RNA was then cleaned with the RNA Clean and Concentrator-5 kit (Zymo). RNA was eluted to 12 µl in elution buffer. One half of the eluted RNA was used for sequencing library preparation using the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (Clontech) according to the manufacturer's manual. Short-read sequencing was performed on the Illumina NextSeq (2×75 bp) platform to a depth of more than 10 million reads per sample. The sequencing reads were mapped to the human reference genome (hg38) using the STAR aligner. Duplicates were removed with Picard, and unique reads were then quantified using htseq-count.
- Proteome
Blood was collected into EDTA tubes, put on ice, centrifuged for 60 minutes, and plasma was stored at −80 °C for further processing. Analysis was first performed at the Human Immune Monitoring Center (HIMC) at Stanford University using a standard human 62-plex kit from eBiosciences/Affymetrix (San Diego, CA) according to the manufacturer's recommendations.
- Microbiome
Whole genomic DNA was extracted from each vaginal swab by means of the PowerSoil DNA isolation kit (MO BIO Laboratories) according to the manufacturer's protocol.
- Immunome
Whole blood samples were stimulated for 15 min with either LPS, IFNα, a cocktail containing IL-2 and IL-6, or left unstimulated.
- Untargeted Metabolome
Metabolites were extracted from plasma and analyzed using a broad coverage untargeted metabolomics platform as described previously.