WGSPGT

Data processing and visualisation for WGS-PGT (whole genome sequencing - preimplantation genetic testing).

The WGS-PGT approach enables, for the first time, all forms of PGT on nuclear and mitochondrial DNA in a single assay. Specifically, our approach outperforms traditional and state-of-the-art PGT in the following areas:

PGT for genetic indications in complex genomic regions;
direct detection of single- and few-base-pair genetic variations;
a novel form of PGT-A that uncovers the segregational origin (meiotic vs. mitotic) of aneuploidies and their level of mosaicism, an approach we term PGT-AO;
(in)direct detection of translocation breakpoints and of the inheritance of normal and derivative chromosomes;
PGT for mitochondrial DNA (mtDNA) disorders.

Note: Sequencing pre-processing steps are included here.

Haplarithmisis

Installation

Installed packages in Singularity: R version 3.3, R version 4.2.
Installed packages in Docker: R version 4.2.

Usage

singularity exec easyR4.def Rscript {script_path} {config_file} {err_file}
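
As a minimal sketch, the command above could be assembled and launched from Python as shown below. Only the command layout (`singularity exec easyR4.def Rscript ...`) comes from the usage line; the function names `build_command` and `run_step` and the example file names are hypothetical and are not part of this repository.

```python
import shlex
import subprocess


def build_command(script_path, config_file, err_file, image="easyR4.def"):
    """Return the argument list for one Rscript call inside the container."""
    return ["singularity", "exec", image, "Rscript",
            script_path, config_file, err_file]


def run_step(script_path, config_file, err_file, image="easyR4.def"):
    """Run one step; raises CalledProcessError if the script exits non-zero."""
    cmd = build_command(script_path, config_file, err_file, image)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Hypothetical file names, used only to illustrate the argument order.
example = build_command("ConvertGenotype.R", "config.txt", "ConvertGenotype.err")
print(shlex.join(example))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.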

Scripts

These R scripts process whole genome sequencing data using Haplarithmisis for WGS-PGT. A step-by-step implementation is provided as a notebook. A Python pipeline that wraps these steps is also provided, see haplarithmisis_pipeline.py.

MetaInfo: for more details on configuration, see config.txt and samplesheet.csv
ConvertGenotype
QDNASeq

EmbryoTest: when embryo sequencing information is present (continue with step 4, NucBedPrep)

NucBedPrep. Note: NucBedPrep generates a file containing Chr, Position and Names (format: "chrX:Position") from the family VCF file; this file is used in the subsequent PGT Wave Correction step.
PGT Wave Correction. Note: the output of WaveCorrection is a .txt file containing the GC content for each position, averaged over 10000 bp bins. This GC content file can be used for GC correction in downstream steps. In our analysis, we did not use it for GC correction because the QDNAseq package has integrated functions for GC correction.
- sampledir: set this to the ConvertGenotype folder.
- refdir: the reference directory from the config file.
- output_file_name: set this to the Family number.
- windowsize: set this to gtypemodulator_window from the PGT config file.
Haplarithmisis
EmbryoTestReportData
EmbryoTestReportPlot
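
The file layout described for NucBedPrep above (Chr, Position and Names in the form "chrX:Position") can be illustrated with a short Python sketch. This is not the repository's R implementation: the function names, the tab-separated output and the made-up VCF lines are assumptions chosen for illustration.

```python
import csv
import io


def vcf_to_positions(vcf_lines):
    """Yield (chrom, pos, name) per VCF data line, with name 'chrom:pos'."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos), f"{chrom}:{pos}"


def write_positions(vcf_lines, out_handle):
    """Write a tab-separated table with columns Chr, Position, Names."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chr", "Position", "Names"])
    for row in vcf_to_positions(vcf_lines):
        writer.writerow(row)


# Tiny made-up VCF excerpt, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10583\t.\tG\tA\t50\tPASS\t.\n",
    "chrX\t2781514\t.\tC\tT\t60\tPASS\t.\n",
]
buffer = io.StringIO()
write_positions(example_vcf, buffer)
print(buffer.getvalue())
```

Only the first two VCF columns (CHROM and POS) are read, so the sketch does not depend on the sample columns of the family VCF.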

PreTest: when no embryo sequencing information is present (continue with step 9, PreTestReportData)

Data processing & QC

Subsampling to desired target coverage
Visualisation of lab step timings
Coverage metrics from Qualimap
Extract qualimap output
Visualisation coverage metrics
Visualisation of Mendelian inconsistency for validation and pilot per subsampled target coverage and validation per chromosome
Visualisation of Haplotype concordance for pilot at subsampled target coverages and validation
Liftover of coordinates from onePGT output
Informative SNP binning and chromosome heatmap visualisation (see the PGT-SR folder for chromosome coordinate scripts)
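
For the subsampling step listed above, the fraction of reads to keep can be estimated from the measured and target coverage. The sketch below assumes that mean coverage scales linearly with the number of reads; the function name `subsample_fraction` and the example coverages are hypothetical and not taken from the repository's scripts.

```python
def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep so that mean coverage drops to the target."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    # Never upsample: a sample already at or below the target is kept whole.
    return min(1.0, target_coverage / current_coverage)


# Example: a sample sequenced at 35X, subsampled to several target coverages.
for target in (5, 10, 20):
    print(target, round(subsample_fraction(35.0, target), 3))
```

The resulting fraction would then be passed to whichever read-subsampling tool is used; which tool that is remains outside this sketch.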

PGT-M (PGT for monogenic disorders)

PGT-AO (PGT for aneuploidy origins)

Input: CSV file with family information (example attached for parents-only haplarithmisis)
Haplarithm plotting with chromosome ideogram - see PGT-SR folder for ideogram coordinate scripts.

PGT-SR (PGT for structural rearrangements)

Embryo trophectoderm biopsy (and parental/reference) data was processed with the following steps:

The data were processed as per the PGT-M processing up to and including haplarithmisis.
Deep-sequenced (30-40X) data were subsampled as per PGT-M, and the (segmented) logRs were plotted.
Breakpoint analysis using Manta
Relevant breakpoint extraction
Custom visualisation of haplarithms including breakpoint information & chromosome schematics + generation of chromosome fill / outline coordinates for normal and affected
Visualisation of copy number variation from VeriSeq output.

PGT-MT (PGT for mitochondrial disorders)

Embryo trophectoderm biopsy WGS data was processed with the following steps:

The data were processed as per the PGT-M processing up to and including the alignment step (alignment was done to the hg38 reference genome, including the mitochondrial "chromosome").
Samples that were deep sequenced (30-40X) were subsampled with the aforementioned procedure.
Mitochondrial DNA coverage calculation & visualisation
Heteroplasmy level calculation
- Mitochondrial variant calling with GATK
- Heteroplasmy level calculation
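
As an illustration of the heteroplasmy level calculation, the sketch below assumes that the heteroplasmy level at a site is the fraction of reads supporting the alternative allele, computed from reference and alternative read counts (for example, allelic depths reported by a variant caller). This definition and the function name `heteroplasmy_level` are assumptions; the repository's own scripts may compute the level differently.

```python
def heteroplasmy_level(ref_reads, alt_reads):
    """Fraction of reads at an mtDNA site that carry the alternative allele."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


# Example: 250 of 1000 reads carry the variant.
print(f"{heteroplasmy_level(750, 250):.1%}")  # prints 25.0%
```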

References

MITOMAP: https://www.mitomap.org/foswiki/bin/view/MITOMAP/ConfirmedMutations

CellularGenomicMedicine/WGSPGT