Group 4
Team leader: Yu-yuan Yang (楊淯元)
Members: (方柏翰), (鍾國洲), (李祖福), (陳延安), (曾宇璐)
Advisor: Chun-chi Lai (賴俊吉)
The automated WGS reporting system (for cancer) is based on the GATK workflow, with minor modifications designed by our team. To create the report automatically, our team adapted code from the MuSiCa GitHub repository into a new Rscript file.
(ngscourse node) QUICK tutorial (mode2) - testing with files in "NHRI_group4/test_files" folder
Citation: Díaz-Gay et al., BMC Bioinformatics (2018)
Copied from the marcos-diazg/musica GitHub repository:
MuSiCa (Mutational Signatures in Cancer) is a shiny-based web application aimed to visualize the somatic mutational profile of a series of provided samples (different formats are allowed) and to extract the contribution of the reported mutational signatures (Alexandrov L.B. et al., Nature (2013), Catalogue Of Somatic Mutations In Cancer, COSMIC (2020)) on their variation profile. It is mainly based on the MutationalPatterns R package (Blokzijl et al., Genome Medicine (2018)).
Scripts were created by 方柏翰, 鍾國洲, 楊淯元
The README was created by 方柏翰, 楊淯元
DEMO - mode 1
Our testing platform is NCHC-Taiwania 1. The reference genome version is hg38.
Most required tools are pre-installed by NCHC. Please feel free to run our reporting system.
If you want to run it on your own Linux/Unix-based laptop, the following tools are required. Please install all of them.
- bwa mem
- fastp
- sambamba
- manta
- strelka
- vep
- job-query-system
- musica environment
Please follow the steps in the "Local version installation" section of the MuSiCa GitHub repository.
On the NCHC Taiwania-1 system, most of the required packages are already installed.
(1) FILE preparation: The files are collected by an NGS machine such as NovaSeq and then converted to fastq files by your company. Place the fastq files in your working directory or another known path.
For example:
- Files:
  - Normal fastq (read1/read2): normal.read1.fastq, normal.read2.fastq
  - Tumor fastq (read1/read2): tumor.read1.fastq, tumor.read2.fastq
- File path:
/work1/XXX123456/TXCRB/case001
(2) Install program: Please download our scripts, change directory to the NHRI_group4/autoscript2 folder, and make all scripts executable.
git clone https://github.com/yuyuan871111/NHRI_group4.git
cd NHRI_group4/autoscript2
chmod -R 700 *
Our tool automatically converts your data from fastq to vcf with GATK tools. Along the way it performs alignment, sorting, duplicate marking, indexing, variant calling and annotation.
Check that your files are stored in the path:
/work1/XXX123456/TXCRB/case001
Four fastq files are required in that path (forward/reverse reads):
normal.read1.fastq
normal.read2.fastq
tumor.read1.fastq
tumor.read2.fastq
Note that the file extensions must be ".read1.fastq.gz" and ".read2.fastq.gz", or ".read1.fastq" and ".read2.fastq".
The read1 and read2 files of a sample must share the same name.
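The naming rules above can be checked before a long run is started. The following is a minimal sketch, not part of the NHRI_group4 scripts: `find_fastq` and `check_inputs` are hypothetical helper names, and the check only looks at file names, not at file contents.

```shell
# Hypothetical helpers (not part of NHRI_group4): verify that the read1/read2
# fastq files exist for every sample name before launching the pipeline.

# find_fastq DIR SAMPLE READ -> prints the matching file path, or returns 1.
find_fastq() {
    local dir="$1" sample="$2" read="$3" ext
    for ext in fastq.gz fastq; do
        if [ -f "$dir/$sample.$read.$ext" ]; then
            printf '%s\n' "$dir/$sample.$read.$ext"
            return 0
        fi
    done
    return 1
}

# check_inputs DIR SAMPLE... -> returns non-zero if any expected file is missing.
check_inputs() {
    local dir="$1" sample read missing=0
    shift
    for sample in "$@"; do
        for read in read1 read2; do
            if ! find_fastq "$dir" "$sample" "$read" >/dev/null; then
                echo "missing: $dir/$sample.$read.fastq(.gz)" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}
```

With the example layout, `check_inputs /work1/XXX123456/TXCRB/case001 tumor normal` would report any missing file on stderr and return a non-zero status.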
Execute program:
Type the following command in the terminal.
Usage:
./script.sh (data_folder_path) (tumor_data_name) (normal_data_name)
Example:
./script.sh /work1/XXX123456/TXCRB/case001 tumor normal
The program will ask for some input parameters.
- WGS or WES: (WGS/WES)
WGS: whole genome sequencing
WES: whole exome sequencing
The automated WGS (WES) reporting system for cancer takes several hours to run. Please be patient and wait for the results.
Check whether your jobs are completed with the following command.
qstat ngscourse
or
qstat -u (username)
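Instead of re-running the command by hand, the check can be repeated in a loop. The sketch below is an illustration only: `wait_for_jobs` is a hypothetical helper, and it assumes that `qstat -u` prints nothing once the user has no queued or running jobs. Note that it will also stop if `qstat` itself fails, because the error output is discarded.

```shell
# Hypothetical helper: wait_for_jobs USER [INTERVAL_SECONDS]
# Polls `qstat -u USER` until it prints nothing.
# Assumption: the scheduler prints no output when the user has no jobs left.
wait_for_jobs() {
    local user="$1" interval="${2:-60}"
    while [ -n "$(qstat -u "$user" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "no jobs left for $user"
}
```

For example, `wait_for_jobs "$USER" 300` would check every five minutes.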
- The script creates a folder named "dealed" and stores all processed data in it.
Data included: tumor/normal bam files, vcf files (with and without annotation), and MuSiCa results (plots and tables).
- The MuSiCa report is compressed into the "NHRI_report_html.zip" file. Plots and tables are stored in the "musica_result" folder in your working directory.
- To view the report properly, please download the file NHRI_report_html.zip from NCHC-Taiwania 1. After decompressing the zip file, open "index.html" in a full-screen browser window to view the report.
- If there is any problem viewing the report, please refresh your browser first.
- For any other questions, please contact us.
If you want to run the analysis again, we recommend removing the old files first. Please use the command below. (Make sure you are in the autoscript2 directory.)
rm *.std *.err NHRI_report_html.zip
On the NCHC Taiwania-1 system, most of the required packages are already installed.
(1) FILE preparation: Place your files in your working directory or another known path.
The files are collected by an NGS machine such as NovaSeq and have then been converted to vcf files with the DRAGEN pipeline or the GATK pipeline.
For example:
Cancer-case1.vcf
Cancer-case2.vcf
Cancer-case3.vcf
Cancer-case4.vcf
... etc.
In the path:
/work1/XXX123456/TXCRB/case001
(2) Install program: Please download our scripts, change directory to the NHRI_group4/autoscript2 folder, and make all scripts executable.
git clone https://github.com/yuyuan871111/NHRI_group4.git
cd NHRI_group4/autoscript2
chmod -R 700 *
Our tool automatically creates a MuSiCa report as a single html file.
Please check again that your files are stored in your working directory. Following the example above:
File path: /work1/XXX123456/TXCRB/case001
The vcf files:
Cancer-case1.vcf
Cancer-case2.vcf
Cancer-case3.vcf
Cancer-case4.vcf
... etc.
Then, type the following command in the terminal to run the program.
./script.sh musica
or
./script.sh MuSiCa
The program will ask for some input parameters.
- WGS or WES: (WGS/WES)
WGS: whole genome sequencing
WES: whole exome sequencing
- Reference human genome version: (19/37/hg38)
19: UCSC GRCh37/hg19
37: 1000genomes hs37d5
hg38: UCSC GRCh38/hg38
- Data path:
input: /work1/XXX123456/TXCRB/case001
- Data file: tell the program which data files you want to compare.
  - single file:
  input: Cancer-case1.vcf
  - multiple files: join the names with ':'
  input: Cancer-case1.vcf:Cancer-case2.vcf:Cancer-case3.vcf:Cancer-case4.vcf
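When a folder holds many vcf files, the ':'-joined list expected at the data file prompt can be built rather than typed. This is a minimal sketch under the assumption that every `*.vcf` file in the folder should be included; `join_vcf_names` is a hypothetical helper, not part of the NHRI_group4 scripts.

```shell
# Hypothetical helper: join_vcf_names DIR
# Prints the names of all *.vcf files in DIR, joined with ':'.
join_vcf_names() {
    local dir="$1" joined="" f
    for f in "$dir"/*.vcf; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        joined="${joined:+$joined:}$(basename "$f")"
    done
    printf '%s\n' "$joined"
}
```

For example, `join_vcf_names /work1/XXX123456/TXCRB/case001` prints one line that can be pasted at the prompt. An empty folder yields an empty line.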
When you see "performing musica", the program is running correctly. The results take about 30 minutes, depending on your files.
Check whether your jobs are completed with the following command.
qstat ngscourse
or
qstat -u (username)
- The MuSiCa report is compressed into the "NHRI_report_html.zip" file. Plots and tables are stored in the "musica_result" folder in your working directory.
- To view the report properly, please download the file NHRI_report_html.zip from NCHC-Taiwania 1. After decompressing the zip file, open "index.html" in a full-screen browser window to view the report.
- If there is any problem viewing the report, please refresh your browser first.
- For any other questions, please contact us.
If you want to run the analysis again, we recommend removing the old files first. Please use the command below. (Make sure you are in the autoscript2 directory.)
rm *.std *.err NHRI_report_html.zip
git clone https://github.com/yuyuan871111/NHRI_group4.git
cd NHRI_group4/autoscript2
chmod -R 700 *
./script.sh musica
Then type the following answers at the prompts:
Is your file whole genome sequence or whole exnome sequence? WGS/WES: WGS
which reftype? 19/37/hg38:hg38
Please input the path of the folder storing your data: (Absolute path)/NHRI_group4/test_files
Please input the names of your cases: TLCRC_020.hard-filtered.vcf:TLCRC_043.hard-filtered.vcf:TLCRC_047.hard-filtered.vcf
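The four answers above can also be prepared and validated in advance. The sketch below is an illustration only: `musica_answers` is a hypothetical helper, and feeding its output to the script through a pipe assumes that `script.sh` reads its prompts from standard input, which is not confirmed here.

```shell
# Hypothetical helper: musica_answers SEQ_TYPE REF DATA_DIR CASES
# Validates the answers against the allowed values listed in this README and
# prints them one per line, in the order of the prompts.
musica_answers() {
    local seq="$1" ref="$2" dir="$3" cases="$4"
    case "$seq" in
        WGS|WES) ;;
        *) echo "sequencing type must be WGS or WES" >&2; return 1 ;;
    esac
    case "$ref" in
        19|37|hg38) ;;
        *) echo "reference must be 19, 37 or hg38" >&2; return 1 ;;
    esac
    case "$dir" in
        /*) ;;
        *) echo "data folder must be an absolute path" >&2; return 1 ;;
    esac
    printf '%s\n' "$seq" "$ref" "$dir" "$cases"
}
```

If the stdin assumption holds, a run could look like `answers=$(musica_answers WGS hg38 "$DATA_DIR" "$CASES") && printf '%s\n' "$answers" | ./script.sh musica`, where `DATA_DIR` and `CASES` are variables you set yourself.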
Some R packages are required. Please run the following command to set them up.
- Note that only a certain version of R on Taiwania-1 works properly.
PATH on Taiwania-1:
/pkg/biology/R/R_default/bin/R
(version: 3.5.2)
Rscript Env_setting.R
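To make sure the R installation at the path above is the one that gets used, its directory can be placed first on `PATH` for the current shell session. This is a minimal sketch: `use_r_from` is a hypothetical helper, and it assumes that `Rscript` sits in the same directory as `R`.

```shell
# Hypothetical helper: use_r_from [BIN_DIR]
# Puts BIN_DIR first on PATH if it contains an executable R.
# The default is the Taiwania-1 path given in this README.
use_r_from() {
    local bin_dir="${1:-/pkg/biology/R/R_default/bin}"
    if [ -x "$bin_dir/R" ]; then
        export PATH="$bin_dir:$PATH"
    else
        echo "no executable R found in $bin_dir" >&2
        return 1
    fi
}
```

For example, `use_r_from && Rscript Env_setting.R` would run the setup script only if that R installation is present.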
For other questions, please contact us.