Scripts associated with our study, "Genomic and transcriptomic characterization of relapsed small cell lung cancer through rapid research autopsy."
Software | URL | Version Used |
---|---|---|
FASTQC | http://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc | 0.11.9 |
HISAT2 | https://daehwankimlab.github.io/hisat2/download/ | 2.2.1 |
TrimGalore | https://github.com/FelixKrueger/TrimGalore/releases | 0.6.6 |
Cutadapt | https://cutadapt.readthedocs.io/en/stable/ | 3.1 |
StringTie | https://ccb.jhu.edu/software/stringtie/#install | 2.1.4 |
GffCompare | https://github.com/gpertea/gffcompare/releases/tag/v0.12.1 | 0.12.1 |
SAMtools | http://www.htslib.org/ | 1.9 |
bedtools | https://bedtools.readthedocs.io/en/latest/ | 2.29.2 |
Snakemake | https://snakemake.readthedocs.io/ | 5.31.1 |
All executables for the above (`fastqc`, `hisat2`, `trim_galore`, `cutadapt`, `stringtie`, `gffcompare`, `samtools`, `bedtools`) must be available in `$PATH`, and the Python libraries associated with Cutadapt and Snakemake must be available in `$PYTHONPATH`.
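As a quick sanity check before launching the workflow, something like the following could confirm that the executables listed above resolve on `$PATH`. This is a minimal sketch and not part of the repository; the helper name `missing_tools` is hypothetical, and it only checks that each tool can be found, not that its version matches the table.

```python
import shutil

# Executable names taken from the requirement above.
REQUIRED_TOOLS = [
    "fastqc", "hisat2", "trim_galore", "cutadapt",
    "stringtie", "gffcompare", "samtools", "bedtools",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be located on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on $PATH.")
```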
This workflow requires paired-end FASTQ files for each sample, named as follows:
StudyID-PatientNum-NT-SampleID-HybID_Instrument_LaneID_R12.fastq
where:
Field | Description |
---|---|
StudyID | String identifying a sequencing study |
PatientNum | Number or string unique to a patient |
NT | Either "N" for normal or "T" for tumor |
SampleID | Number that identifies a tumor sample |
HybID | Number that identifies a particular hybridization or sequencing run |
Instrument | Name of sequencing instrument |
LaneID | Lane number that sample was run on |
R12 | Either "R1" for read 1 or "R2" for read 2 from paired-end sequencing |
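The naming convention above can be checked programmatically. The sketch below is illustrative only and is not taken from the workflow itself: the function name `parse_fastq_name` and the example filename are hypothetical, and it assumes that the hyphen-separated fields contain no hyphens or underscores, that `Instrument` and `LaneID` contain no underscores, and that files end in exactly `.fastq`.

```python
import re

# Pattern mirroring StudyID-PatientNum-NT-SampleID-HybID_Instrument_LaneID_R12.fastq.
# Assumption: hyphen-separated fields hold neither "-" nor "_".
FASTQ_NAME = re.compile(
    r"(?P<StudyID>[^-_]+)-(?P<PatientNum>[^-_]+)-(?P<NT>[NT])"
    r"-(?P<SampleID>[^-_]+)-(?P<HybID>[^-_]+)"
    r"_(?P<Instrument>[^_]+)_(?P<LaneID>[^_]+)_(?P<R12>R[12])\.fastq"
)


def parse_fastq_name(filename):
    """Return a dict of the naming fields, or None if the name does not match."""
    match = FASTQ_NAME.fullmatch(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical filename, used only to illustrate the convention.
    print(parse_fastq_name("STUDY-01-T-2-1_HiSeq_L001_R1.fastq"))
```

Running such a check over the input directory before starting the workflow would surface misnamed files early.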
After cloning the repository, edit the `workdir` in `Snakefile` to point to your working directory. `download_resources.sh` will automatically download reference files for HISAT2 and StringTie. Note that this workflow aligns to hg19 and uses corresponding hg19 reference files; it can be easily adapted to other genome assemblies.
The included BED file (`resources/reference_data/filter_bam/IDT_xGen_Exome_Research_CNV-Core_merged_sorted_grch37.bed`) was derived from the IDT xGen Exome v1 capture region (https://www.idtdna.com/pages/products/next-generation-sequencing/targeted-sequencing/hybridization-capture/predesigned-panels/xgen-exome-research-panel-v2) by removing the "chr" prefix from chromosome names, and by merging overlapping regions and sorting with bedtools. Aligned reads from all samples used in our study were filtered with this BED file because we used an exome capture in our RNA-seq prep and other studies did not.
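To make the BED transformation concrete, here is a pure-Python sketch of the same three operations: stripping the "chr" prefix, sorting, and merging overlapping regions. It is a stand-in for illustration, not the bedtools commands we ran. The function name `normalize_bed` is hypothetical, and the sketch assumes only the first three BED columns matter and that touching (book-ended) intervals should be merged along with overlapping ones.

```python
def normalize_bed(lines):
    """Strip 'chr' prefixes, sort, and merge overlapping BED intervals.

    `lines` is an iterable of BED-formatted strings; returns a list of
    (chrom, start, end) tuples. Only the first three columns are used.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        intervals.append((chrom, int(start), int(end)))

    # Lexicographic sort by chromosome name, then numeric start and end.
    intervals.sort()

    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    example = ["chr2\t5\t10", "chr1\t100\t200", "chr1\t150\t250"]
    for chrom, start, end in normalize_bed(example):
        print(f"{chrom}\t{start}\t{end}")
```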
This workflow was run at the Ohio Supercomputer Center (https://www.osc.edu/), which utilizes the Slurm batch scheduler (https://slurm.schedmd.com/). We installed the Snakemake slurm profile (https://github.com/Snakemake-Profiles/slurm) using cookiecutter (https://github.com/cookiecutter/cookiecutter). The workflow was then started with Snakemake:
snakemake --profile slurm --jobs 20