Data processing and visualisation for WGS-PGT (whole genome sequencing - preimplantation genetic testing).
The WGS-PGT approach, for the first time, enables all forms of PGT on nuclear and mitochondrial DNA in a single assay. Specifically, our innovative approach outperforms traditional and state-of-the-art PGT in the following respects:
- PGT for genetic indications in complex genomic regions;
- direct detection of single- and few-base-pair genetic variations;
- a novel form of PGT-A that uncovers the segregational origin (meiotic vs. mitotic) of aneuploidies and their level of mosaicism, an approach we coin PGT-AO;
- (in)direct detection of translocation breakpoints and inheritance of normal and derivative chromosomes;
- PGT for mitochondrial DNA (mtDNA) disorders.
Note: Sequencing pre-processing steps are included here.
- Installed packages in Singularity: R version 3.3, R version 4.2.
- Installed packages in Docker: R version 4.2.
singularity exec easyR4.def Rscript {script_path} {config_file} {err_file}
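As a rough illustration of how a wrapper script might fill in the command template above, the following sketch builds the argument list in Python. The function name `build_rscript_command` and the example file names (`MetaInfo.R`, `MetaInfo.err`) are hypothetical and are not taken from haplarithmisis_pipeline.py; only the template string comes from this README.

```python
import shlex

# Template copied from the command shown above; the placeholder names are the README's own.
COMMAND_TEMPLATE = "singularity exec easyR4.def Rscript {script_path} {config_file} {err_file}"


def build_rscript_command(script_path, config_file, err_file):
    """Return the argument list for one containerised Rscript call (illustrative helper)."""
    # Quote each value so that paths containing spaces survive shlex.split.
    filled = COMMAND_TEMPLATE.format(
        script_path=shlex.quote(script_path),
        config_file=shlex.quote(config_file),
        err_file=shlex.quote(err_file),
    )
    return shlex.split(filled)


# Hypothetical file names, for illustration only.
argv = build_rscript_command("MetaInfo.R", "config.txt", "MetaInfo.err")
print(argv)
```

The resulting list could then be handed to `subprocess.run(argv, check=True)`, which avoids passing the command through a shell.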
These are R scripts to process whole genome sequencing data using Haplarithmisis for WGS-PGT. A step-by-step implementation is provided as a notebook; we also provide a Python pipeline that wraps these steps, see haplarithmisis_pipeline.py.
- MetaInfo: more details on configuration are in config.txt and samplesheet.csv.
- NucBedPrep. Note: NucBedPrep generates a file containing Chr, Position and Names (format: "chrX:Position") from the family VCF file; this file is used in the subsequent PGT Wave Correction step.
- PGT Wave Correction. Note: the output from WaveCorrection is a .txt file containing the GC content for each position, averaged over 10000 bp bins. This GC content file could be used for GC correction in downstream steps. In our analysis, we did not use this file for GC correction because the QDNAseq package has integrated functions for GC correction.
- sampledir: must be set to the ConvertGenotype folder.
- refdir: the reference directory from the config file.
- output_file_name: must be set to the family number.
- windowsize: must be set to gtypemodulator_window from the PGT config file.
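The NucBedPrep output described above (Chr, Position and a Name of the form "chrX:Position") can be sketched in a few lines of Python. This is a minimal illustration, not the repository's R implementation: the function name `nuc_bed_rows`, the handling of bare contig names, and the example records are all assumptions made for the sketch.

```python
def nuc_bed_rows(vcf_lines):
    """Yield (Chr, Position, Name) tuples from the text lines of a VCF file."""
    for line in vcf_lines:
        # Skip blank lines, meta-information (##...) and the column header (#CHROM...).
        if not line.strip() or line.startswith("#"):
            continue
        # CHROM and POS are the first two tab-separated VCF columns.
        chrom, pos = line.split("\t", 2)[:2]
        # Assumption: add a "chr" prefix when the VCF uses bare contig names.
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        yield chrom, int(pos), f"{chrom}:{pos}"


# Made-up records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10583\t.\tG\tA\t50\tPASS\t.",
    "X\t2781514\t.\tC\tT\t60\tPASS\t.",
]
rows = list(nuc_bed_rows(example_vcf))
for chrom, pos, name in rows:
    print(chrom, pos, name, sep="\t")
```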
- Subsampling to desired target coverage
- Visualisation of lab step timings
- Coverage metrics from Qualimap
- Extract qualimap output
- Visualisation coverage metrics
- Visualisation of Mendelian inconsistency: for validation and pilot per subsampled target coverage, and for validation per chromosome
- Visualisation of Haplotype concordance for pilot at subsampled target coverages and validation
- Liftover of coordinates from onePGT output
- Informative SNP binning and chromosome heatmap visualisation - see PGT-SR folder for chromosome coordinate scripts.
- input: CSV file with family information (example attached for parents-only haplarithmisis)
- Haplarithm plotting with chromosome ideogram - see PGT-SR folder for ideogram coordinate scripts.
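To make the Mendelian inconsistency metric in the list above concrete, here is a minimal sketch of how such a rate can be computed for a father-mother-child trio. It assumes diploid autosomal sites with no missing genotypes, encoded as allele-index pairs; the function names and example genotypes are hypothetical, and the repository's own scripts may define or filter the metric differently.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def inconsistency_rate(trios):
    """Fraction of sites whose trio genotypes violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    bad = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return bad / len(trios)


# Made-up genotypes as (father, mother, child); 0 = reference allele, 1 = alternative allele.
sites = [
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from father, 1 from mother
    ((0, 0), (0, 0), (0, 1)),  # inconsistent: neither parent carries allele 1
    ((0, 1), (0, 1), (1, 1)),  # consistent: 1 from each parent
    ((1, 1), (1, 1), (0, 1)),  # inconsistent: neither parent carries allele 0
]
rate = inconsistency_rate(sites)
print(f"Mendelian inconsistency rate: {rate:.2f}")
```

Computing this rate separately per chromosome or per subsampled coverage only requires grouping the sites before calling `inconsistency_rate`.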
Embryo trophectoderm biopsy (and parental/reference) data was processed with the following steps:
- The data were processed as per the PGT-M processing up to and including haplarithmisis.
- Deep-sequenced (30-40X) data were subsampled as per PGT-M and the (segmented) logRs were plotted.
- Breakpoint analysis using Manta
- Relevant breakpoint extraction
- Custom visualisation of haplarithms including breakpoint information & chromosome schematics + generation of chromosome fill / outline coordinates for normal and affected
- Visualisation of copy number variation from VeriSeq output.
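Subsampling deep-sequenced data to a target coverage, as mentioned in the steps above, comes down to keeping a fraction of reads equal to target coverage divided by observed coverage. The sketch below shows only that arithmetic; the function name and the example coverages are assumptions, and the tool the pipeline actually uses for the read-level subsampling is not specified here.

```python
def subsample_fraction(observed_coverage, target_coverage):
    """Fraction of reads to keep so that mean coverage drops to the target."""
    if observed_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    # Never ask for more reads than are available.
    return min(1.0, target_coverage / observed_coverage)


# Hypothetical example: a sample sequenced at 35X, subsampled to several targets.
for target in (5, 10, 20):
    print(f"{target}X target -> keep {subsample_fraction(35.0, target):.3f} of reads")
```

Capping the fraction at 1.0 means a sample already below the target is passed through unchanged rather than raising an error.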
Embryo trophectoderm biopsy WGS data was processed with the following steps:
- The data were processed as per the PGT-M processing up to and including the alignment step. Alignment was done to the Hg38 reference genome, including the mitochondrial "chromosome".
- Samples that were deep sequenced (30-40X) were subsampled with the aforementioned procedure.
- Mitochondrial DNA coverage calculation & visualisation
- Heteroplasmy level calculation
MITOMAP: https://www.mitomap.org/foswiki/bin/view/MITOMAP/ConfirmedMutations
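For the heteroplasmy level calculation listed above, a common definition is the fraction of reads at a given mtDNA position that carry the variant allele. The sketch below assumes that definition and assumes per-allele read counts are already available; the function name and the counts are hypothetical and do not come from the repository's scripts.

```python
def heteroplasmy_level(ref_reads, alt_reads):
    """Variant allele fraction at one mtDNA position, or None without coverage."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = ref_reads + alt_reads
    if total == 0:
        # No reads cover the position, so the level is undefined.
        return None
    return alt_reads / total


# Made-up read counts for a single position.
level = heteroplasmy_level(ref_reads=750, alt_reads=250)
print(f"Heteroplasmy: {level:.1%}")
```

Returning `None` for uncovered positions keeps them distinguishable from positions with a true level of zero.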