Giulio Caravagna, 3/5/2021
MSc program in Data Science and Scientific Computing. University of Trieste, Italy
- 3 CFU, 24 h: 12 lectures of 2 hours each, 50% theoretical lecture and 50% practical session (40’ each).
- GitHub:
- Dr Giulio Caravagna, Cancer Data Science Laboratory.
Invited guest lecturers
- Dr Riccardo Bergamin, University of Trieste
- Dr Alex Graudenzi, CNR.
- Dr Salvatore Milite, University of Trieste
- Dr Daniele Ramazzotti, University of Milan-Bicocca.
Lecturer | Title | When |
Caravagna | Variant calling from bulk sequencing | 9/4 |
Caravagna | Measuring aneuploidy from bulk sequencing | 12/4 |
Caravagna | Integrated quality control of somatic calls | 16/4 |
Bergamin | Population genetics for cancer | 19/4 |
Caravagna | Tumour subclonal deconvolution | 21/4 |
Ramazzotti | Somatic mutational signatures | 23/4 |
Bergamin, Milite | Basics of Single-cell RNA analysis | 30/4 |
Graudenzi | Longitudinal evolution from single cell | 3/5 |
Milite | Count-based models for single-cell data | 5/5 |
Caravagna | Evolutionary based stratifications | 12/5 |
Caravagna | Population-level models | 14/5 |
Course presentation
- Cancer Evolution,
- Modern Genomics,
- Single-cell.
Research at the CDSLab
Lecture: Variant calling from bulk sequencing
(Theory) Mutation calling:
- Tumour matched-normal design,
- High-level design of GATK
- Joint calling model
(Practice) Example VCF and PCAWG:
- VCF manipulation
- 27 PCAWG cases (mutation types, burden, etc.)
Lecture: Measuring aneuploidy from bulk sequencing
(Theory) Aneuploidy and Copy Number calling:
- Motivation
- ASCAT model
- Segmentation
(Practice) Example runs with different tools:
- Sequenza (inspection of alternative solutions)
- Circular Binary Segmentation
- Cohort-level distribution of CNAs per chromosome (length, percentage, copy state).
Lecture: Integrated quality control of somatic calls
(Theory) Validating mutations, copy number and tumour purity
- Cancer Cell Fractions
- CNAqc
- Tumour in Normal contamination (ideas)
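Illustrative sketch (not course material): the cancer cell fraction (CCF) of a mutation is commonly estimated from its variant allele frequency (VAF), the tumour purity and the local copy number. The function below uses the standard relation CCF = VAF * (purity * CN_tumour + (1 - purity) * CN_normal) / (purity * multiplicity). The function name and its arguments are hypothetical and are not the CNAqc API.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a mutation.

    vaf          -- observed variant allele frequency (0..1)
    purity       -- fraction of tumour cells in the sample (0..1]
    tumour_cn    -- total copy number of the segment in tumour cells
    multiplicity -- number of mutated copies per tumour cell (assumed known)
    normal_cn    -- copy number in normal cells (2 for autosomes)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of alleles per cell in the tumour/normal admixture.
    mean_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * mean_copies / (purity * multiplicity)


# A clonal heterozygous mutation in a diploid segment at 80% purity
# is expected at VAF 0.4, which maps back to a CCF of about 1.
print(cancer_cell_fraction(vaf=0.4, purity=0.8, tumour_cn=2))
```

Values clearly above 1 usually point to a wrong purity, copy number or multiplicity, which is the kind of inconsistency a quality-control step looks for.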
(Practice) Quality-control of Whole Genome Sequencing data:
- Skim through PCAWG data
- Metadata
- Project codes:
- Processing samples with CNAqc
- Househam, Jacob, William CH Cross, and Giulio Caravagna. “A fully automated approach for quality control of cancer mutations in the era of high-resolution whole genome sequencing.” bioRxiv (2021).
- Cmero, Marek, et al. “Inferring structural variant cancer cell fraction.” Nature Communications 11.1 (2020): 1-15.
- Yuan, Ke, et al. “Ccube: a fast and robust method for estimating cancer cell fractions.” bioRxiv (2018): 484402.
Lecture (R Bergamin): Population genetics models of growth
(Theory) Branching processes and other models
- Cancer Evolution as Stochastic Branching Process
- Markov System and Master equation
- Some Examples: Moran Model, Wright-Fisher Model, Coalescence
- Birth-Death Process
- Luria-Delbrück Model
- Theory of 1/f tail
- Quantify Cancer Evolution from VAF Spectrum
- Spatial Tumour Growth
(Practice) Tumour growth simulation:
- Simulations of a Branching process and VAF spectrum
- Example tumours from CHESS
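Illustrative sketch (not course material): a well-mixed birth-death branching process with neutral mutations, and the VAF spectrum it produces. All parameter values are arbitrary assumptions, cells are taken to be diploid with heterozygous mutations, and the sample is assumed 100% pure with no sequencing noise. Spatial structure, as in CHESS, is not modelled.

```python
import math
import random
from collections import Counter


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def grow(rng, n_final, birth, death, mu):
    """Grow one clone from a single cell; returns [] if it goes extinct."""
    cells = [()]  # each cell is the tuple of mutation ids it carries
    next_id = 0
    while 0 < len(cells) < n_final:
        # Pick a random cell and remove it (swap with the last, then pop).
        i = rng.randrange(len(cells))
        cells[i], cells[-1] = cells[-1], cells[i]
        parent = cells.pop()
        if rng.random() < birth / (birth + death):
            # Division: each daughter inherits the parent's mutations and
            # acquires a Poisson(mu) number of new, unique ones.
            for _ in range(2):
                k = poisson(rng, mu)
                cells.append(parent + tuple(range(next_id, next_id + k)))
                next_id += k
        # Otherwise the cell dies and is simply not put back.
    return cells


def simulate_vafs(n_final=500, birth=1.0, death=0.2, mu=2.0, seed=1):
    """Return the VAF of every mutation present in the final population."""
    rng = random.Random(seed)
    cells = []
    while not cells:  # retry clones that go extinct
        cells = grow(rng, n_final, birth, death, mu)
    counts = Counter(m for cell in cells for m in cell)
    # Diploid, heterozygous, pure sample: VAF = carriers / (2 * N).
    return [c / (2 * len(cells)) for c in counts.values()]


vafs = simulate_vafs()
# Cumulative spectrum: number of mutations with VAF at or above f.
for f in (0.01, 0.02, 0.05, 0.1, 0.25):
    print(f, sum(v >= f for v in vafs))
```

Under neutral growth the cumulative count is expected to scale roughly like 1/f at low frequencies, which is the tail referred to in the theory part.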
Lecture: Tumour subclonal deconvolution
(Theory) Subclonal deconvolution
- Tail modelling versus subclones
- Read counts analysis
- Multi-sample deconvolution
(Practice) Deconvolution in practice
- MOBSTER runs with WGS data
Lecture (D Ramazzotti): Mutational signatures in human cancers
Theory:
- Concepts behind mutational signatures
- De novo inference of mutational signatures
- Solving with non-negative matrix factorization (NMF)
- Mutational signature extraction from pan-cancer data
Practice (install required packages before the lecture):
- Examples and best practice on real data
- Analysis of breast cancer data
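Illustrative sketch (not course material): de novo signature extraction can be phrased as factorising a non-negative matrix V (mutation categories x samples) into signatures W and exposures H. The code below uses the classic multiplicative updates for the Frobenius loss on toy data; the toy signatures, the exposures and all function names are hypothetical. It is dependency-free for readability, whereas real analyses would rely on a numerical library and a dedicated package.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorise V ~ W H with multiplicative updates (W, H stay non-negative)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy data: 4 mutation categories, 6 samples, 2 made-up signatures.
true_signatures = [[8, 0], [2, 1], [0, 6], [0, 3]]
true_exposures = [[1, 2, 0, 3, 1, 0], [0, 1, 2, 1, 3, 2]]
v = matmul(true_signatures, true_exposures)

w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

The factorisation is only defined up to scaling and permutation of the signatures, so recovered columns of W are usually normalised to sum to one before comparison with known signatures.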
Lecture (R Bergamin, S Milite): Basics of Single-cell RNA analysis
Theory
- Introduction to 10x single cell RNA sequencing
- Problems and opportunities
- Data format explanation
- Data QC
- Batch Effects removal
- Dimensionality Reduction
- Clustering and cell type assignments
- Signature enrichment
- Differential expression (DE)
Lecture (A Graudenzi, F Angaroni and D Maspero): Longitudinal evolution from single cell
Theory: Inference of phylogenies from single cell data
- Perfect phylogenies from categorical data: the Gusfield algorithm,
- Recasting the perfect phylogeny problem as non-negative matrix factorization (NMF)
- Technical noise (sequencing errors) and biological variability: the need for probabilistic models of clonal evolution.
- The likelihood function and the probabilistic graphical model of SCITE
- Estimation of the error rate
- Structure learning via MCMC
- Extension: longitudinal models (LACE)
- Extension: modeling mutation losses (SiFit)
- Extension: including population dynamics (SiCloneFit)
Practice (data to download):
- Application of LACE to real data
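Illustrative sketch (not course material): for a binary cells-by-mutations matrix, a perfect phylogeny exists exactly when the sets of cells carrying any two mutations are either disjoint or nested. The code below checks this pairwise condition directly. It is a simple quadratic test, not Gusfield's linear-time algorithm, and the function name and toy matrices are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """matrix[i][j] == 1 if cell i carries mutation j (rooted at all zeros)."""
    n_mut = len(matrix[0]) if matrix else 0
    # For every mutation, the set of cells that carry it.
    carriers = [frozenset(i for i, row in enumerate(matrix) if row[j])
                for j in range(n_mut)]
    for a, b in combinations(carriers, 2):
        # Overlapping but not nested carrier sets cannot fit on one tree.
        if a & b and not (a <= b or b <= a):
            return False
    return True


clean = [[1, 1, 0],
         [1, 0, 0],
         [0, 0, 1]]
noisy = [[1, 1],
         [1, 0],
         [0, 1]]  # e.g. one false positive or false negative call

print(admits_perfect_phylogeny(clean))
print(admits_perfect_phylogeny(noisy))
```

A single erroneous entry is enough to break the condition, which is one way to see why noisy single-cell data call for probabilistic models such as those listed in the theory part.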
Lecture (S Milite): Count-based models for single-cell data
Theory
- Generative modelling as an alternative to pipelines
- Poisson and Negative binomial distributions
- Count based modelling, RNA-seq vs scRNA-seq
- Count models for normalisation (scTransform)
- Scaling NB models with variational autoencoders (scVI)
- CONGAS (genotype CNV from scRNA-seq)
- Elements of Gradient based variational inference
- Discrete Latent Variable modelling
- Example run of CONGAS on breast cancer 10x dataset
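Illustrative sketch (not course material): the Poisson and Negative Binomial (NB) distributions can share the same mean while differing in variance. With mean mu and dispersion theta, the NB variance is mu + mu^2 / theta, so the NB can absorb overdispersion that a Poisson model cannot. The parameterisation and function names below are assumptions made for illustration and do not reproduce the API of sctransform, scVI or CONGAS.

```python
import math


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def nb_logpmf(y, mu, theta):
    """log P(Y = y) for an NB with mean mu and dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def moments(logpmf, upper=2000):
    """Mean and variance by direct summation over 0..upper-1."""
    probs = [math.exp(logpmf(y)) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, theta = 5.0, 2.0
print("Poisson mean, variance:", moments(lambda y: poisson_logpmf(y, mu)))
print("NB mean, variance:     ", moments(lambda y: nb_logpmf(y, mu, theta)))
```

As theta grows the NB variance approaches mu and the model tends to the Poisson.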
Lecture: Evolutionary based stratifications
(Theory) Detecting repeated evolution from multi-region bulk sequencing
- Clone-trees and tree expansion
- Expectation Maximisation for latent model discovery
- Evolutionary distance and clustering
(Practice) Inference in practice
- Colorectal adenomas with REVOLVER
- TRACERx Lung Adenocarcinomas with REVOLVER
Lecture: Population-level models
(Theory) Bayesian Networks models
- Conjunctive Bayesian Networks
- Suppes’ probabilistic causation
(Practice) Inference in practice
- Analysis of COADREAD with PiCnIc
- Analysis of other cBioPortal data
