A Copy Number Variant Calling Pipeline

VNOWCHI

Variable Non-Overlapping Window CBS and HMM Intersect (VNOWCHI): A Copy Number Variant Calling Pipeline


R: v3.5.0 | DOI: 10.1101/241851

VNOWCHI is a copy number variant calling pipeline that uses the Circular Binary Segmentation (CBS) and Hidden Markov Model (HMM) algorithms to determine the ploidy and sex status of embryos.

Detailed information regarding VNOWCHI and the CBS and HMM algorithms can be found on the Wiki.

Getting Started

The scripts provided are tailored for use on OHSU's exacloud server using SLURM. For help with SLURM, please refer to ACC's tutorials.

To use the scripts without SLURM, comment out the srun and sbatch commands and uncomment the commands beneath them.

Prerequisites

  • Linux environment
  • Java 8, required by Trimmomatic (see How to install Java)
  • The following software tools installed:
    • FastQC 0.10.1
    • Trimmomatic 0.35
    • FASTX-Toolkit 0.0.13
    • BWA-MEM 0.7.9a-r786
    • SAMtools 0.1.19-44428cd
    • BEDtools 2.25.0
    • FastUniq 1.1
  • R version 3.5.0 with the following R packages installed:
    • GenomicAlignments
    • DNAcopy
    • HMMcopy
    • IRanges
    • GenomicRanges
    • dplyr
    • ggplot2
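
Before running the pipeline, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch and not part of VNOWCHI. The function name `missing_tools` is hypothetical, and the executable names (fastqc, fastx_trimmer, bwa, samtools, bedtools, fastuniq, java, Rscript) are assumptions about how the tools are installed. Trimmomatic is left out because it is commonly run as a Java jar rather than as a command on PATH, and the sketch does not check versions or R packages.

```shell
# Hypothetical helper (not part of VNOWCHI): print each tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Executable names are assumptions; adjust them to match your installation.
missing=$(missing_tools fastqc fastx_trimmer bwa samtools bedtools fastuniq java Rscript)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing >&2
else
    echo "All listed tools were found on PATH."
fi
```

If anything is reported missing, load the corresponding module or install the tool before continuing.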

Installation

Installation instructions can be found on the Wiki Installation page.

How to Use

Detailed "How to Use" instructions are located on the Wiki How to Use page.

1. Modify scripts with your directory information.

2. Copy fibroblast FASTQs and sample FASTQs to the following locations:

Fibroblast samples (5 scDNA-seq samples preferred):
/your/working/dir/CopyNumberPipeline/results/FIBROBLASTS/FASTQ

Single-end samples:
/your/working/dir/CopyNumberPipeline/results/fastq/SE

Paired-end samples:
/your/working/dir/CopyNumberPipeline/results/fastq/PE
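
If these directories do not exist yet, they can be created in one step. This is a minimal sketch that assumes the layout shown above; the function name `make_vnowchi_dirs` is hypothetical, and the installation steps on the Wiki may already create some of these directories for you.

```shell
# Hypothetical helper (not part of VNOWCHI): create the three FASTQ input
# directories listed above under a chosen working directory.
make_vnowchi_dirs() {
    base=$1
    mkdir -p \
        "$base/CopyNumberPipeline/results/FIBROBLASTS/FASTQ" \
        "$base/CopyNumberPipeline/results/fastq/SE" \
        "$base/CopyNumberPipeline/results/fastq/PE"
}

# Example call (replace the placeholder with your own working directory):
#   make_vnowchi_dirs /your/working/dir
```

Because `mkdir -p` is used, running the function again on an existing layout leaves it unchanged.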

3. Generate bins for the pipeline using FIBROBLAST data.

sbatch PIPELINE_bins.sh

4. Run pipeline on actual data.

sbatch PIPELINE_VNOWCHI.sh
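
Steps 3 and 4 can also be queued together so that step 4 starts only after step 3 succeeds. The sketch below uses SLURM's `--parsable` and `--dependency=afterok` options. It is not part of VNOWCHI: the function name `submit_vnowchi` and the `SBATCH` override variable are hypothetical. It also assumes that the job submitted for PIPELINE_bins.sh finishes only once binning is complete; if that script launches further jobs of its own, the dependency will not wait for them, and the two steps should be run manually as shown above.

```shell
# Hypothetical wrapper (not part of VNOWCHI): submit step 4 so that it starts
# only after the step 3 job has finished successfully.
# SBATCH can be overridden (for example, for a dry run); it defaults to sbatch.
submit_vnowchi() {
    sbatch_cmd=${SBATCH:-sbatch}
    # --parsable prints just the job ID, optionally followed by ";cluster".
    bins_job=$("$sbatch_cmd" --parsable PIPELINE_bins.sh) || return 1
    bins_job=${bins_job%%;*}
    # afterok: start only if the bins job exits with status 0.
    "$sbatch_cmd" --dependency=afterok:"$bins_job" PIPELINE_VNOWCHI.sh
}

# Example call, from the directory containing the pipeline scripts:
#   submit_vnowchi
```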

5. Look at results:

  • CNV plots for all samples, both across all chromosomes and per chromosome
  • Mapping summary statistics in VNOWCHI_summary.txt
  • Tabular summary for all samples classified by ploidy and sex status
  • Tabular summary for all embryos classified by ploidy and sex status, based on their samples

Results:

  • Mapping summary statistics
    • VNOWCHI_Summary.txt
  • Tabular summary of CNV calls for all individual samples, with embryo, blastomere, ploidy, and sex classifications
    • CNV_<SE|PE>_<bin>.sampleSummary.txt
  • Tabular summary of all embryos classified by ploidy and sex status
    • CNV_<SE|PE>_<bin>.embryoSummary.txt
  • CNV plots for all samples, per chromosome or across all chromosomes
    • <sampleName>_<chromosome>.png
    • <sampleName>_<all>.png
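
To locate the summary tables after a run, the file name patterns above can be matched with `find`. This is a minimal sketch; the function name `list_cnv_summaries` is hypothetical, and because the output location is not stated here, the directory to search is passed as an argument.

```shell
# Hypothetical helper (not part of VNOWCHI): list the sample and embryo
# summary tables found anywhere under the given directory.
list_cnv_summaries() {
    find "$1" -type f \
        \( -name 'CNV_*.sampleSummary.txt' -o -name 'CNV_*.embryoSummary.txt' \) |
        LC_ALL=C sort
}

# Example call (replace the placeholder with your own results directory):
#   list_cnv_summaries /your/working/dir/CopyNumberPipeline/results
```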

Additional Notes

  • Please note that the R package dplyr will behave differently than intended if the R package plyr is also loaded. More info regarding the issue can be found here and here. Here's a possible solution from Stack Overflow if you get any errors.
  • You might need to modify step 3 in PIPELINE_bins.sh and PIPELINE_VNOWCHI.sh so that the scripts accept the file name pattern of your FASTQ files.
  • Some R scripts have issues when run in RStudio. For example, get_copy_number.R runs without problems on the server but does not work in RStudio, which could be related to the R version.

Authors

  • Melissa Yan - extended the VNOWC pipeline to include CHI, classify samples/embryos, accommodate different genomes, and run on SLURM
  • Nathan Lazar - original author of Variable Non-Overlapping Window CBS (VNOWC)
  • Kristof Torkenczy - original author of the CBS/HMM Intersect (CHI) pipeline

Acknowledgments

This project would not have been possible without support from the Chavez Lab, Carbone Lab, Adey Lab, and the Biostatistics & Bioinformatics Core.

Citation

Daughtry, B. L., Rosenkrantz, J. L., Lazar, N. H., Fei, S. S., Redmayne, N., Torkenczy, K. A., Adey, A., Gao, L., Park, B., Nevonen, K. A., Carbone, L., Chavez, S. L. (2019). Single-cell sequencing of primate preimplantation embryos reveals chromosome elimination via cellular fragmentation and blastomere exclusion. Genome Research, 29(3), 367-382. doi:10.1101/gr.239830.118