aCNViewer: comprehensive genome-wide visualization of absolute copy number and copy neutral variations
Contact: Victor Renault / Alexandre How-Kit (aCNViewer@cephb.fr)
aCNViewer (Absolute CNV Viewer) is a tool for visualizing absolute CNVs and cn-LOH across a group of cancer samples. aCNViewer offers three graphical representations: dendrograms, bidimensional heatmaps showing chromosomal regions that share similar abnormality patterns, and quantitative stacked histograms that help identify recurrent absolute CNVs and cn-LOH. aCNViewer includes a complete pipeline for processing raw data from SNP arrays (in tumor-only or paired tumor / normal mode) and from whole exome/genome sequencing experiments (in paired tumor / normal mode only), using the ASCAT and Sequenza algorithms respectively to generate the absolute CNV and cn-LOH data used for the graphical outputs.
- Installation
- Overview
- Tutorial:
  - Glossary
  - Processing SNP array data:
    - Affymetrix
      - Test using Ascat results (start here if you want an overview of all the plotting options and different usage scenarios)
      - Test using Affymetrix Cel files
    - Illumina
  - Processing NGS data
  - Processing CNV file
- Output files
- Limitations
- Citation
The easiest way to install aCNViewer is to pull its Docker image (the Docker application supports multi-threading but not computer clusters, which are better suited for processing NGS bams):
docker pull fjdceph/acnviewer
The aCNViewer Docker image requires about 20GB of disk space. If you run into an error while pulling the image, you probably need to move the Docker image storage from /var/lib/docker/ to a location with more space and try again.
aCNViewer can also be installed from its source by:
- downloading aCNViewer's data (includes test data sets and most of the third-party software listed in the dependencies section)
- installing the dependencies listed below.
- downloading the GitHub source code from this page:
git clone https://github.com/FJD-CEPH/aCNViewer
Once aCNViewer is installed, you can run unit tests in order to check that everything is fine.
Most of the dependencies (except R and Python), along with test data sets, are packaged in the archive aCNViewer_DATA.tar.gz, in aCNViewer_DATA/bin. You can find more details below:
- APT (Affymetrix Power Tools) if you plan to process raw Affymetrix SNP arrays (to uncompress into BIN_DIR)
- a recent version of R (version ≥ 3.2) with ggplot2 installed for generating the different graphical outputs:
  - ASCAT (will be automatically installed if not already installed) if you are analyzing raw SNP array data
  - Sequenza (will be automatically installed if not already installed) if you are analyzing paired (tumor / normal) bams
  - plotrix for plotting dendrograms (will be automatically installed if not already installed)
  - gplots
  - RColorBrewer
- samtools if you are analyzing paired (tumor / normal) bams. As Sequenza does not support newer mpileup file formats produced by more recent versions of samtools, use a version prior to Sequenza release date (2015-10-10): samtools version 0.1.19 for example is compatible.
- tQN if you plan to process raw Illumina SNP arrays (to uncompress into BIN_DIR) and run tQN normalisation. If the cluster file for the Illumina SNP array you plan to analyze is not in the tQN lib folder, you can download additional cluster files from here
- GISTIC if you want to have an advanced statistical way to prioritize regions of interest. Create a folder named GISTIC_VERSION in BIN_DIR and uncompress the GISTIC archive into it. Follow the instructions listed in INSTALL.txt at the root of the GISTIC folder in order to install MATLAB Component Runtime required by GISTIC and set the associated environment variables (LD_LIBRARY_PATH and XAPPLRESDIR).
- Python with version ≥ 2.7
The results of all the examples below can be found in aCNViewer_DATA/allTests in their respective target folder. All examples of this tutorial are implemented as unit tests and can be run at once using:
DOCKER_OR_PYTHON -P testAll -t TARGET_DIR [--fastTest 0 --smallMem 0 --runGISTIC 0]
If --fastTest is set to 1, only tests which run in a reasonable amount of time will be run (all tests except Illumina SNP array, paired bams with Sequenza, GISTIC and Affymetrix SNP arrays from CEL files). If --runGISTIC is set to 1, GISTIC will be tested, and if --smallMem is set to 1, GISTIC will run in small memory mode and will only require about 10GB of RAM instead of 50GB, at the expense of a longer running time.
Let's call:
- aCNViewer_DATA: the location where the test data set aCNViewer_DATA.tar.gz has been uncompressed
- BIN_DIR: the folder containing all third-party software, located in aCNViewer_DATA/bin. Here is the list of files and folders that should be in BIN_DIR:
  - apt-*: Affymetrix Power Tools binaries
  - ascat: contains ASCAT files for GC correction (this folder is automatically created by aCNViewer and GC files are automatically downloaded)
    - GC_Affy250k.txt
    - GC_AffySNP6_102015.txt
    - gc_done
    - GC_Illumina660k.txt
    - GC_IlluminaOmniexpress.txt
  - GISTIC*: installation folder of GISTIC
  - PennCNV: contains PennCNV-Affy protocols and helper scripts (will be automatically created by aCNViewer from gw6.tar.gz)
  - samtools*: installation folder of samtools
  - tQN*: installation folder of tQN
  - ../genomes: folder located in the parent folder of BIN_DIR with one folder per genomic build. Each genomic build folder BUILD should contain at least:
    - one file named BUILD.centro.txt with centromere positions for each chromosome of the genomic build (can be generated using curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/BUILD/database/cytoBand.txt.gz" | gunzip -c | grep acen > BUILD.centro.txt)
    - a tab-delimited file named BUILD.chrom.sizes with 2 columns, respectively chromosome name and chromosome length (can be downloaded from UCSC Genome browser)
    - optionally, a reference fasta file (with one of the extensions .fa, .fa.gz, .fasta or .fasta.gz) if you plan to use Sequenza
- DOCKER_OR_PYTHON: refers to the fact that docker run fjdceph/acnviewer or python aCNViewer/code/aCNViewer.py can be used as a prefix to run aCNViewer, depending on the chosen installation method.
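To make the content of BUILD.centro.txt more concrete, here is a minimal Python 3 sketch that reads the "acen" lines kept by the grep command above and derives one centromere interval per chromosome. It assumes the standard UCSC cytoBand layout (chromosome, start, end, band name, stain); the function name `centromere_bounds` and the sample coordinates are illustrative only, and this is not necessarily how aCNViewer itself reads the file.

```python
import csv
import io


def centromere_bounds(acen_text):
    """Return {chromosome: (start, end)} from cytoBand lines tagged 'acen'."""
    bounds = {}
    reader = csv.reader(io.StringIO(acen_text), delimiter="\t")
    for row in reader:
        # Skip anything that is not a 5-column 'acen' record.
        if len(row) < 5 or row[4] != "acen":
            continue
        chrom, start, end = row[0], int(row[1]), int(row[2])
        low, high = bounds.get(chrom, (start, end))
        # The p-arm and q-arm 'acen' bands are merged into one interval.
        bounds[chrom] = (min(low, start), max(high, end))
    return bounds


# Illustrative input: two 'acen' bands for one chromosome.
example = (
    "chr1\t121100000\t124300000\tp11.1\tacen\n"
    "chr1\t124300000\t128000000\tq11.1\tacen\n"
)
print(centromere_bounds(example))
```

A quick look at the returned dictionary is an easy way to check that every chromosome of a newly added build has a centromere entry before running any plot.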
Download the test data set aCNViewer_DATA.tar.gz (~5GB compressed, ~20GB uncompressed). In terms of computing resources, if you plan to:
- run Sequenza on paired bam files, access to a computer cluster is highly recommended: although aCNViewer can process your data in multi-threading mode, this may take quite a long time depending on the number of sample pairs to analyze
- run GISTIC in order to have a robust statistical way to prioritize recurrent regions of interest, a machine with at least 50GB of RAM is necessary with --smallMem 0, or 10GB with --smallMem 1 (this option will make GISTIC run substantially longer)
Generate all available plots from ASCAT segment files using base resolution for the quantitative histograms and a window size of 2Mbp for the other plots:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt
Here are other typical plots you may be interested in:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_RCOLOR --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --rColorFile aCNViewer_DATA/rColor.txt
Quantitative histogram with GISTIC results:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_GISTIC --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --runGISTIC 1
If you have trouble running this example (in particular if your machine freezes or you get the message "Killed" in the "_gistic.txt.err" file), the machine you are using may lack resources. In that case, please add the option --smallMem 1 to the command above so that GISTIC runs in compressed memory mode. You can view the GISTIC results with significant broad events and significant focal events.
Heatmap of relative copy number values only for the clinical feature BCLC stage, with the chromosome legend position set at 0,.55 (i.e. at the left-most of the graph and at 55% on the y axis) and the group legend position set at .9,1.05 (basically at the top right corner):
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_HEATMAP1 --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --plotAll 0 --heatmap 1 --dendrogram 0 -G "BCLC stage" --chrLegendPos 0,.55 --groupLegendPos .9,1.05 --useRelativeCopyNbForClustering 1
Heatmap with regions ordered by genomic positions (only clustering on samples):
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_HEATMAP_GENPOS --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --plotAll 0 --heatmap 1 --dendrogram 0 -G "BCLC stage" --chrLegendPos 0,.55 --groupLegendPos .9,1.05 --useRelativeCopyNbForClustering 1 --keepGenomicPosForHistogram 1
Heatmap with copy number values:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_HEATMAP2 --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --plotAll 0 --heatmap 1 --dendrogram 0 -G "BCLC stage" --chrLegendPos 0,.55 --groupLegendPos .9,1.05
Dendrogram with copy number values:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_DENDRO --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --plotAll 0 --heatmap 0 --dendrogram 1 -G "BCLC stage" -u 1
- all outputs set to pdf:
  DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_PDF --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --outputFormat pdf
- all outputs set to jpg:
  DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_PDF --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --outputFormat jpg
- heatmaps set to bmp, histograms to tiff and dendrograms to pdf with the R plot parameters width=10,height=8:
  DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/GSE9845_lrr_baf.segments.txt -t TEST_AFFY_OTHER_OUT --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --sampleFile aCNViewer_DATA/snpArrays250k_sty/GSE9845_clinical_info2.txt --outputFormat "heat:bmp;hist:tiff;dend:pdf(width=10,height=8)"
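The structure of the --outputFormat value used in the examples above can be illustrated with a small Python 3 sketch that splits such a string into one device and one set of R plot parameters per plot type. The function name `parse_output_format` is hypothetical, the list of plot keys is taken from the documented default value, and the rule that a bare device such as pdf applies to every plot type is an assumption based on the examples; aCNViewer's own parsing may differ.

```python
import re

# Plot keys as they appear in the documented default value of --outputFormat.
PLOT_KEYS = ("hist", "hetHom", "dend", "heat")


def parse_device(spec):
    """Parse 'pdf(width=10,height=8)' into ('pdf', {'width': '10', 'height': '8'})."""
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", spec.strip())
    if match is None:
        raise ValueError("unrecognized format: %r" % spec)
    device, raw_params = match.groups()
    params = {}
    if raw_params:
        for item in raw_params.split(","):
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip()
    return device, params


def parse_output_format(value):
    """Return {plot_key: (device, params)} for an --outputFormat string."""
    if ":" not in value:
        # Assumption: a bare device name applies to all plot types.
        return {key: parse_device(value) for key in PLOT_KEYS}
    result = {}
    for part in value.split(";"):
        key, _, spec = part.partition(":")
        result[key.strip()] = parse_device(spec)
    return result


print(parse_output_format("heat:bmp;hist:tiff;dend:pdf(width=10,height=8)"))
```

Parameter values are kept as strings here because they are passed through to R plotting devices unchanged in this sketch.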
==Here is the full command:==
DOCKER_OR_PYTHON -f ASCAT_SEGMENT_FILE --refBuild REF_BUILD -b BIN_DIR [--histogram HISTOGRAM --lohToPlot LOH_TO_PLOT --useFullResolutionForHist USE_FULL_RESOLUTION_FOR_HIST] [-c CHR_SIZE_FILE -t OUTPUT_DIR -C CENTROMERE_FILE -w WINDOW_SIZE --sampleFile SAMPLE_FILE -G PHENOTYPIC_COLUMN_NAME --rColorFile RCOLOR_FILE --plotAll PLOT_ALL --outputFormat OUTPUT_FORMAT --ploidyFile PLOIDY_FILE --sampleToProcessList SAMPLE_TO_PROCESS_LIST --sampleToExcludeList SAMPLE_TO_EXCLUDE_LIST --sampleAliasFile SAMPLE_ALIAS_FILE] [--heatmap HEATMAP --labRow LAB_ROW --labCol LAB_COL --cexCol CEX_COL --cexRow CEX_ROW --height HEIGHT --width WIDTH --margins MARGINS --hclust HCLUST --groupLegendPos GROUP_LEGEND_POS --chrLegendPos CHR_LEGEND_POS --useRelativeCopyNbForClustering USE_RELATIVE_COPY_NB_FOR_CLUSTERING --keepGenomicPosForHistogram KEEP_GENOMIC_POS] [--dendrogram DENDROGRAM --useShape USE_SHAPE] [--runGISTIC RUN_GISTIC --geneGistic GENE_GISTIC --smallMem SMALL_MEM --broad BROAD --brLen BR_LEN --conf CONF --armPeel ARM_PEEL --saveGene SAVE_GENE --gcm GCM]
where:
- ASCAT_SEGMENT_FILE: ASCAT segment file (ascat.output$segments obtained by running ascat.runAscat) with the following columns:
  - sample
  - chr
  - startpos
  - endpos
  - nMajor
  - nMinor
- REF_BUILD: the genome build used to generate the CNV segments (hg18 and hg19 are currently supported). If you want to add another build BUILD, please add a folder named BUILD in aCNViewer_DATA/genomes containing at least a tab-delimited file named BUILD.chrom.sizes with each chromosome name and length, and a tab-delimited file named BUILD.centro.txt with the centromere positions by chromosome (this file can be generated using curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/BUILD/database/cytoBand.txt.gz" | gunzip -c | grep acen > BUILD.centro.txt)
The following options are general plotting options:
- CHR_SIZE_FILE: a tab-delimited file with 2 columns, respectively chromosome name and chromosome length. When REF_BUILD is set, CHR_SIZE_FILE is automatically set to aCNViewer_DATA/genomes/REF_BUILD.chrom.sizes.
- CENTROMERE_FILE: file giving the centromere bounds. Can be generated using curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/BUILD/database/cytoBand.txt.gz" | gunzip -c | grep acen > centro_build.txt. When REF_BUILD is set, CENTROMERE_FILE is automatically set to aCNViewer_DATA/genomes/REF_BUILD.centro.txt.
- WINDOW_SIZE: segment size in bp. Alternatively, -p PERCENTAGE can be used instead of -w WINDOW_SIZE in order to set the segment size as a percentage of chromosome length, where PERCENTAGE is a floating number between 0 and 100. If WINDOW_SIZE and PERCENTAGE are both null, WINDOW_SIZE is set to 2Mb by default.
- SAMPLE_FILE: a tab-delimited file that should contain a column named Sample with the name of each sample and at least one other column with a phenotypic / clinical feature. This file can contain a sampleAlias column which, if provided, will be used as the official sample id.
- PHENOTYPIC_COLUMN_NAME is optional and refers to the name of the column of the phenotypic / clinical feature of interest in SAMPLE_FILE. If you omit this parameter, one plot per feature defined in SAMPLE_FILE will be generated.
- RCOLOR_FILE: file used to customize graph colors. Colors for histograms can be overridden using a section named "[histogram]" which should contain exactly 10 colors (one per line) corresponding to CNV values in the following order: "≤ -4", "-3", "-2", "-1", "1", "2", "3", "4", "5", "≥ 6". Histogram colors for heterozygous / homozygous CNVs can be changed using the section "[heteroHomo]" which should contain 6 colors corresponding to the values "-Hom", "-Het", "=Hom", "=Het", "+Hom", "+Het". Colors for dendrograms can be redefined using the section "[group]" which should contain at least as many colors as there are distinct values for the phenotypic / clinical feature of interest. Colors for heatmaps are customizable using the section "[chr]" (which should contain 22 colors corresponding to chromosomes 1 to 22), the section "[group]" (the same as previously seen for dendrograms) and the section "[heatmap]" which should contain 10 colors (one per line) corresponding to CNV values in the following order: "0", "1", "2", "3", "4", "5", "6", "7", "8", "≥ 9". An example can be found here.
- PLOT_ALL: specify whether all available plots should be generated. The default value is 1.
- OUTPUT_FORMAT: customize the output formats for the different types of available plots (histograms, heatmaps and dendrograms). Examples of use can be found above. The default value is hist:png(width=4000,height=1800,res=300);hetHom:png(width=4000,height=1800,res=300);dend:png(width=4000,height=2200,res=300);heat:pdf(width=10,height=12).
- PLOIDY_FILE: custom ploidy values for each sample. Can either be a tab-delimited file with at least 2 columns, "sample" and "ploidy", or an integer which will set the same ploidy for all samples. By default, the ploidy is calculated using the CNV file segmented in fragments of 10% of chromosomal length, and its value is the most represented CNV value for each sample.
- SAMPLE_TO_PROCESS_LIST: comma-separated string or file with one sample per line used to restrict the list of samples to process by aCNViewer.
- SAMPLE_TO_EXCLUDE_LIST: comma-separated string or file with one sample per line used to exclude a list of samples from analyses.
- SAMPLE_ALIAS_FILE: optional parameter used to change the sample name to a preferred sample name. It is a tab-delimited file with 2 columns: one for the sample name and a second one with the preferred sample name.
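The SAMPLE_FILE layout described above (a Sample column, an optional sampleAlias column and at least one phenotypic / clinical column) can be checked before launching a run. The following Python 3 sketch is one possible way to do that; the helper name `phenotype_columns` is hypothetical and the check is not part of aCNViewer itself.

```python
import csv
import io


def phenotype_columns(sample_file_text):
    """Return the phenotypic / clinical column names of a tab-delimited sample file."""
    reader = csv.DictReader(io.StringIO(sample_file_text), delimiter="\t")
    fields = reader.fieldnames or []
    if "Sample" not in fields:
        raise ValueError("missing required column 'Sample'")
    # Every column other than the sample identifiers is treated as a feature.
    features = [name for name in fields if name not in ("Sample", "sampleAlias")]
    if not features:
        raise ValueError("need at least one phenotypic / clinical column")
    return features


example = "Sample\tsampleAlias\tBCLC stage\ns1\tpatientA\tA\ns2\tpatientB\tC\n"
print(phenotype_columns(example))
```

Any of the returned names could then be passed with -G to restrict plotting to a single feature.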
The following options are histogram specific:
- HISTOGRAM: specify whether a histogram should be generated. The default value is 0 but its value is overridden to 1 when option --plotAll 1 is set.
- LOH_TO_PLOT: histogram option for LOH plotting. Values should be one of "cn-LOH" for plotting cn-LOH only, "LOH" for LOH only, "both" for cn-LOH and LOH or "none" to disable this feature. The default value is "cn-LOH".
- USE_FULL_RESOLUTION_FOR_HIST: tell whether to plot the histogram using full resolution, i.e. CNVs are not segmented according to a user-defined length through a windowing approach. The default value is 1. If 0, the resolution of the plot will be given by either WINDOW_SIZE or PERCENTAGE.
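As an illustration of how WINDOW_SIZE and PERCENTAGE relate to each other, the Python 3 sketch below cuts one chromosome into consecutive windows. The function name `window_bounds`, the 0-based half-open coordinates and the choice to let an explicit window size take precedence over a percentage are assumptions made for this example, not a description of aCNViewer's internal code.

```python
def window_bounds(chrom_length, window_size=None, percentage=None):
    """Return (start, end) windows covering a chromosome of the given length."""
    if window_size is None and percentage is None:
        window_size = 2000000  # documented default of 2Mb
    elif window_size is None:
        # PERCENTAGE is expressed as a percentage of the chromosome length.
        window_size = max(1, int(chrom_length * percentage / 100.0))
    return [
        (start, min(start + window_size, chrom_length))
        for start in range(0, chrom_length, window_size)
    ]


# A 5Mb chromosome with the default 2Mb window gives three windows,
# the last one being shorter than the others.
print(window_bounds(5000000))
# With -p 10, each chromosome is cut into windows of 10% of its length.
print(len(window_bounds(247249719, percentage=10)))
```

With a percentage, long chromosomes get proportionally larger windows, whereas a fixed size in bp gives the same resolution everywhere.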
The following options are GISTIC options (more details can be found here):
- RUN_GISTIC: specify whether to run GISTIC in order to have a statistical way to prioritize regions of interest. The default value is 0.
- GENE_GISTIC: tell whether the gene GISTIC algorithm should be used to calculate the significance of deletions at a gene level instead of a marker level. The default value is 1.
- SMALL_MEM: tell GISTIC whether to use memory compression at the cost of a longer runtime. The default value is 0.
- BROAD: tell GISTIC to run the broad-level analysis as well. The default value is 1.
- BR_LEN: set GISTIC's broad_len_cutoff. The default value is 0.5.
- CONF: set the confidence level used to calculate the region containing a driver. The default value is 0.9.
- ARM_PEEL: set GISTIC's arm_peeloff. The default value is 1.
- SAVE_GENE: tell GISTIC whether to save gene tables. The default value is 1.
- GCM: set GISTIC's gene_collapse_method. The default value is extreme.
The following options are mainly specific to heatmaps while a few are related to dendrograms:
- HEATMAP is an optional parameter used only if PLOT_ALL is set to 0 to tell whether to plot heatmaps or not. The default value is 1
- LAB_ROW is an optional parameter telling whether the heatmap's row names (chromosomal regions) should be shown. The default value is 0
- LAB_COL is an optional parameter telling whether the heatmap's column names (sample names) should be shown. The default value is 1
- CEX_COL is an optional parameter setting cexCol for heatmaps. The default value is 0.7. See R heatmap.2 documentation for more details
- CEX_ROW is an optional parameter setting cexRow for heatmaps. The default value is 0.45. See R heatmap.2 documentation for more details
- HEIGHT is an optional parameter setting height for heatmaps. The default value is 12.
- WIDTH is an optional parameter setting width for heatmaps. The default value is 10. See R heatmap.2 documentation for more details
- MARGINS is an optional parameter setting margins as a comma-separated string for heatmaps. The default value is 5,5. See R heatmap.2 documentation for more details
- HCLUST is an optional parameter setting the hclust method for heatmaps / dendrograms. See R heatmap.2 documentation for more details
- GROUP_LEGEND_POS is an optional parameter setting the phenotypic / clinical feature legend's position within the heatmap. The default value is topright and can be changed to coordinates (for example 0.1,0.5, which will put the legend at 10% of the total width of the graph on the x axis and 50% of the total height of the graph on the y axis, i.e. in the middle of the y axis) or to one of R's named locations (top, bottom, left, right, etc)
- CHR_LEGEND_POS is an optional parameter setting the chromosome legend's position within the heatmap. The default value is bottomleft and can be changed to coordinates (for example 0.1,0.5, which will put the legend at 10% of the total width of the graph on the x axis and 50% of the total height of the graph on the y axis, i.e. in the middle of the y axis) or to one of R's named locations (top, bottom, left, right, etc)
- RCOLOR_FILE: see the general plotting options above
- USE_RELATIVE_COPY_NB_FOR_CLUSTERING is an optional parameter specifying whether the CNV matrix used for the heatmap should contain relative copy number values or not. The default value is 0. If PLOT_ALL is 1 then plots for both values of USE_RELATIVE_COPY_NB_FOR_CLUSTERING will be generated.
- KEEP_GENOMIC_POS is optional and, if set to 1, will keep the segmented genome in its original order instead of clustering segments according to sample CNV patterns (the default value is 0).
- DENDROGRAM is an optional dendrogram parameter used only if PLOT_ALL is set to 0 to tell whether to plot dendrograms or not. The default value is 1
- USE_SHAPE is an optional dendrogram parameter which, if set to 1 (default value), will replace sample labels with colored shapes in the leaves of the dendrogram(s).
Generate a quantitative stacked histogram from CEL files (subset of data of hepatocellular carcinomas with hepatitis C virus etiology used in Chiang et al. Cancer Res, 2008) with a window size of 2Mbp:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrays250k_sty/ -t TEST_AFFY_CEL --refBuild hg18 -w 2000000 -b aCNViewer_DATA/bin --platform Affy250k_sty -l aCNViewer_DATA/snpArrays250k_sty/LibFiles/ [--useCustomPloidies USE_CUSTOM_PLOIDIES]
If ASCAT is not installed (i.e. you are not using the Docker application) and you want to install it into a custom R library folder, please add the following option to the previous command line: --rLibDir RLIB.
==Here is the full command:==
DOCKER_OR_PYTHON -f CEL_DIR --refBuild REF_BUILD -t OUTPUT_DIR -b BIN_DIR --platform AFFY_PLATFORM -l AFFY_LIB_DIR [--gw6Dir GW6_DIR] [--gcFile ASCAT_GC_FILE] [GENERAL_PLOT_OPTIONS] [HISTOGRAM_OPTIONS] [GISTIC_OPTIONS] [HEATMAP_DENDRO_OPTIONS]
where:
- CEL_DIR is the folder containing ".cel" or ".cel.gz" files
- AFFY_PLATFORM: name of an ASCAT-supported Affymetrix platform with a GC content file available ("Affy250k_sty", "Affy250k_nsp", "Affy500k" or "AffySNP6"). Please refer to the ASCAT website for more details
- AFFY_LIB_DIR: Affymetrix library files, downloadable from the Affymetrix website
- GW6_DIR is optional and refers to the folder where gw6.tar.gz has been uncompressed. This archive contains different programs and files necessary to process Affymetrix SNP arrays and has been uncompressed into aCNViewer_DATA/bin/PennCNV/gw6/ (default value).
- ASCAT_GC_FILE: GC content file necessary for ASCAT GC correction when analyzing SNP array data. This parameter is optional as its value will be automatically deduced from the value of AFFY_PLATFORM. Please check the ASCAT website for available GC content files. It is also possible to create a custom GC file.
- USE_CUSTOM_PLOIDIES: specify whether ploidies should be calculated using our custom algorithm (use a window of 10% of chromosomal length and set the ploidy to the most frequent CNV value for each sample) or use ploidies calculated by ASCAT/Sequenza. The default value is 1.
Generate a quantitative stacked histogram from raw Illumina data from non-Hodgkin lymphoma patients used in Yang F et al. PLoS One 2014 with a window size of 2Mbp:
DOCKER_OR_PYTHON -f aCNViewer_DATA/snpArrayIllu660k/GSE47357_Matrix_signal_660w.txt.gz -t TEST_ILLU --refBuild hg19 -w 2000000 -b aCNViewer_DATA/bin --probeFile aCNViewer_DATA/snpArrayIllu660k/Human660W-Quad_v1_H_SNPlist.txt --platform Illumina660k --beadchip "human660w-quad"
==Here is the full command:==
DOCKER_OR_PYTHON -f ILLU_FILES --refBuild REF_BUILD -b BIN_DIR [--sampleList SAMPLE_TO_PROCESS_FILE] --probeFile PROBE_POS_FILE --platform ILLUMINA_PLATFORM [--beadchip BEADCHIP] [-g ASCAT_GC_FILE] [-N NORMALIZE] [GENERAL_PLOT_OPTIONS] [HISTOGRAM_OPTIONS] [GISTIC_OPTIONS] [HEATMAP_DENDRO_OPTIONS]
where:
- ILLU_FILES can either be the list of Illumina final report files to process, specified either as a comma-separated string with all the report files to process or as a directory containing these files. Each Illumina final report file should contain at least the following columns:
  - SNP Name
  - Sample ID
  - Log R Ratio
  - B Allele Freq

  Alternatively, it can be the raw Illumina files with at least the following columns:
  - ID
  - SAMPLE1.X
  - SAMPLE1.Y
  - ...
  - SAMPLEn.X
  - SAMPLEn.Y
- PROBE_POS_FILE: file listing the probes used on the SNP array with their genomic position. The file is tab-delimited with the following columns:
  - Name
  - Chr
  - MapInfo or Position
- ILLUMINA_PLATFORM: name of an ASCAT-supported Illumina platform with a GC content file available ("Illumina660k" or "HumanOmniExpress12"). Please refer to the ASCAT website for more details
- BEADCHIP:
- NORMALIZE: turn on / off tQN normalization. The default value is 1.
- SAMPLE_TO_PROCESS_FILE: optional, used to specify the list of samples to process in one of the following formats:
  - a comma-separated string listing all the samples to process
  - the name of a text file with one line per sample to process
  - the name of a Python dump file with the extension ".pyDump"
Sequenza is used to process NGS paired (tumor / normal) bams and produce CNV segments. These segments are then used by aCNViewer to produce the different available outputs. This step is best executed on a computer cluster (supported clusters are SGE, SLURM, MOAB and LSF; tests have been run successfully on SGE and SLURM clusters) but will work on a single machine as well (although it will be much slower).
Generate a quantitative histogram from paired (tumor / normal) bams:
DOCKER_OR_PYTHON -f aCNViewer_DATA/wes/bams/ -t TEST_WES_RAW --refBuild hg19 -w 2000000 -b aCNViewer_DATA/bin --fileType Sequenza --samplePairFile aCNViewer_DATA/wes/bams/sampleFile.txt [--useCustomPloidies USE_CUSTOM_PLOIDIES]
==Here is the full command:==
DOCKER_OR_PYTHON -f BAM_DIR -t OUTPUT_DIR --refBuild REF_BUILD -b BIN_DIR --fileType Sequenza --samplePairFile SAMPLE_PAIR_FILE [-r REF_FILE] [--byChr 1] [-n NB_THREADS] [--createMpileUp CREATE_MPILEUP] [--pattern BAM_FILE_PATTERN] [-M MEMORY] [GENERAL_PLOT_OPTIONS] [HISTOGRAM_OPTIONS] [GISTIC_OPTIONS] [HEATMAP_DENDRO_OPTIONS]
where:
- BAM_DIR is the folder containing the paired bam files
- BAM_FILE_PATTERN is an optional parameter whose default value is .bam
- CHR_SIZE_FILE, CENTROMERE_FILE and WINDOW_SIZE: see the general plotting options above
- SAMPLE_PAIR_FILE is a tab-delimited file with the following three column names:
  - idvdName
  - sampleName
  - type, which should either be T for tumoral samples or N for normal samples
- REF_FILE is the reference file in fasta format used to generate the bam files. When REF_BUILD is set, REF_FILE is automatically set to the fasta file present in aCNViewer_DATA/genomes/REF_BUILD.
- BY_CHR is an optional parameter indicating whether Sequenza should create seqz files (Sequenza intermediate files) by chromosome or not (the default value is 1)
- NB_THREADS is an optional parameter specifying the number of cores which will be used for each sample pair to create chromosomal seqz files if BY_CHR has been set to 1. If aCNViewer is run on the master node of a supported computer cluster, jobs will be submitted to the cluster. Otherwise, multi-threading will be used to run Sequenza.
- CREATE_MPILEUP is an optional parameter telling Sequenza whether to create intermediate mpileup files when generating results. The default value is 1 and it is recommended not to change it, as Sequenza may freeze in some cases when it is set to 0.
- MEMORY: optional argument specifying the memory in GB allocated to run Sequenza when using a computer cluster. The default value is 8 (GB) and should work for most WES analyses
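The SAMPLE_PAIR_FILE layout described above can be produced with a few lines of Python 3. This is only a sketch: the helper name `write_sample_pair_file` and the sample names are made up, and it assumes that idvdName is the identifier shared by the tumor and normal samples of one individual, which is an interpretation of the column name rather than something stated here.

```python
import csv
import io


def write_sample_pair_file(handle, pairs):
    """Write one T row and one N row per (individual, tumor, normal) triple."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["idvdName", "sampleName", "type"])
    for individual, tumor, normal in pairs:
        writer.writerow([individual, tumor, "T"])   # tumoral sample
        writer.writerow([individual, normal, "N"])  # matched normal sample


# Hypothetical sample names, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_sample_pair_file(buffer, [("patient1", "patient1_tumor", "patient1_normal")])
print(buffer.getvalue())
```

Replacing the buffer with a file opened in text mode would give a file that can be passed to --samplePairFile.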
Generate quantitative stacked histogram from Sequenza results with a window size of 2Mbp:
aCNViewer_DATA.tar.gz is required to run this example.
DOCKER_OR_PYTHON -f aCNViewer_DATA/wes/ -t TEST_WES_SEQUENZA --refBuild hg19 -w 2000000 -b aCNViewer_DATA/bin --fileType Sequenza
==Here is the full command:==
DOCKER_OR_PYTHON -f SEQUENZA_RES_DIR --fileType Sequenza -t TARGET_DIR --refBuild REF_BUILD -b BIN_DIR [GENERAL_PLOT_OPTIONS] [HISTOGRAM_OPTIONS] [GISTIC_OPTIONS] [HEATMAP_DENDRO_OPTIONS]
where:
- SEQUENZA_RES_DIR is the folder containing Sequenza results (*_segments.txt)
At the moment, ASCAT segment files, PennCNV results and Sequenza results can be used as input to aCNViewer. It is however possible to feed aCNViewer with CNV results from any other software, as explained in the section below.
Both examples below require downloading aCNViewer_DATA.tar.gz.
Generate quantitative stacked histogram from PennCNV results (79 samples from Hapmap3):
DOCKER_OR_PYTHON -f aCNViewer_DATA/pennCNV/hapmap3.rawcnv -t TEST_PENN_CNV --refBuild hg18 -b aCNViewer_DATA/bin --lohToPlot none
CNV results from any software can be processed by aCNViewer if formatted in the ASCAT segment file format, i.e. a tab-delimited file with the following columns:
- sample
- chr
- startpos
- endpos
- nMajor
- nMinor
The result file should be sorted according to the following ordered column names: sample, chr, startpos, endpos. Chromosome names in the chr column should not contain the prefix chr, so chr1 should appear as 1. All CNVs for one individual should be non-overlapping. If there is only a global CNV value v (and thus no allele-specific CNV value), nMajor and nMinor can take any value as long as nMajor + nMinor = v. When plotting the quantitative histogram, add the option --lohToPlot none to disable LOH plotting.
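The formatting rules above can be sketched in Python 3 as follows. The function names are hypothetical, the input is assumed to be a list of (sample, chr, startpos, endpos, total copy number) tuples, the split of the total copy number between nMajor and nMinor is arbitrary (only their sum matters when LOH plotting is disabled), and ordering chromosomes numerically is an assumption since the required ordering of the chr column is not detailed here.

```python
import csv
import io


def chrom_rank(chrom):
    """Sort key: numeric chromosomes first, then other names alphabetically."""
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def to_ascat_segments(records):
    """Convert (sample, chr, startpos, endpos, total_copy_number) tuples to rows."""
    rows = []
    for sample, chrom, start, end, total in records:
        chrom = str(chrom)
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # "chr1" must appear as "1"
        n_minor = total // 2  # arbitrary split: only nMajor + nMinor = total matters
        rows.append((sample, chrom, int(start), int(end), total - n_minor, n_minor))
    rows.sort(key=lambda r: (r[0], chrom_rank(r[1]), r[2], r[3]))
    # Segments of one sample must not overlap on the same chromosome.
    for previous, current in zip(rows, rows[1:]):
        if previous[:2] == current[:2] and current[2] < previous[3]:
            raise ValueError("overlapping segments for %s chr%s" % current[:2])
    return rows


def write_segments(handle, rows):
    """Write rows as a tab-delimited file with the ASCAT segment header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "chr", "startpos", "endpos", "nMajor", "nMinor"])
    writer.writerows(rows)


rows = to_ascat_segments([("s1", "chr2", 100, 200, 3), ("s1", "chr1", 50, 80, 2)])
out = io.StringIO()
write_segments(out, rows)
print(out.getvalue())
```

Since the allele split in this sketch carries no real information, the resulting file should be plotted with --lohToPlot none.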
When processing raw SNP array data with aCNViewer, ASCAT is used to calculate CNV profiles. These results are saved into a folder named ASCAT in the user selected target directory with the following files:
- *.segments.txt: file containing ASCAT predicted CNV segments
- *.ascatInfo.txt: file containing the following ASCAT values for all the samples: aberrantcellfraction, goodnessOfFit, psi and ploidy
- *.png: the various ASCAT graphical outputs:
File | Description |
---|---|
.ASCATprofile.png | genome-wide representation of ASCAT CNVs |
.ASPCF.png | results of segmentation using Allele-Specific Piecewise Constant Fitting |
.rawprofile.png | genome-wide representation of raw ASCAT CNVs |
.sunrise.png | sunrise plot showing the optimal solution of tumor ploidy and percentage of aberrant tumor |
.tumour.png | representation of LogR and BAF values |
tumorSep*.png | plot of BAF values |
.ascatInfo.txt | ASCAT values of aberrantcellfraction, goodnessOfFit, psi and ploidy for all samples |
.segments.txt | list of all CNVs with the copy number for each allele |
For the full list of GISTIC output files, please refer to the section Output Files of the following website. Here are the main output files of interest:
File | Description |
---|---|
broad_significance_results.txt | The list of broad events with related q-values and frequencies |
all_lesions.conf_*.txt | the list of all focal events along with their level of significance |
amp_* | list of all focal amplification events |
del_* | list of all focal deletion events |
The Sequenza results of each sample pair are stored in a folder named TUMOR_NORMAL_sequenza in the sequenza folder, which contains the following files:
File | Description |
---|---|
*_segments.txt | predicted CNVs |
*_CP_contours.pdf, *_confints_CP.txt & *_model_fit.pdf | inferred cellularity and ploidy |
*_alternative_fit.pdf & *_alternative_solutions.txt | alternative inferred cellularities and ploidies |
*_chromosome_view.pdf | chromosome view with mutations, BAF, depth ratio and segments |
*_genome_view.pdf | genome view of all the CNVs |
*_mutations.txt | list of detected mutations |
*_CN_bars.pdf | frequency of all the copy number values |
For more information about Sequenza output files, please refer to its user guide.
When generating histograms, 3 text files with the suffix _samples.txt will be created alongside them:
- one with all the genomic segments
- one with only the LOH events (file with suffix _loh_samples.txt)
- one with only the cn-LOH events (file with suffix _cnLoh_samples.txt)
Each file is in the same format with the following columns:
- CNV key: the relative copy number value compared to the tumor ploidy
- chrName
- start: the middle of the segment, so the real start is start - segmentLength / 2
- segmentLength: length of the current segment
- percentage: the percentage of samples with the relative ploidy value in CNV key for the segment (chrName, [start - segmentLength / 2, start + segmentLength / 2])
- samples: the list of the samples falling in the above category
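Because the start column holds the middle of each segment, the real interval has to be recomputed from start and segmentLength. A minimal Python 3 sketch of that arithmetic is given below; the function name `segment_bounds` is hypothetical and the example values are illustrative.

```python
def segment_bounds(middle, segment_length):
    """Return the (real_start, real_end) of a segment given its middle and length."""
    half = segment_length / 2.0
    return middle - half, middle + half


# A 2Mb window whose 'start' column reads 1000000 covers positions 0 to 2000000.
print(segment_bounds(1000000, 2000000))
```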
The following files are created as well:
- *_10pc_ploidy.txt is a matrix of segments of 10% chromosomal length for all samples. The last column indicates the calculated ploidy, which corresponds to the most frequent copy number value across these segments
- *.R are R scripts used to create the various graphical representations. You can modify and re-run these scripts if you want to further customize your graphical outputs and aCNViewer does not offer the customizations you are looking for.
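The ploidy reported in *_10pc_ploidy.txt is described as the most frequent copy number value among the 10% windows of a sample. The Python 3 sketch below shows that last step only, starting from a list of per-window copy numbers. The function name `estimate_ploidy` is hypothetical, and returning the smallest value in case of a tie is an assumption, since tie handling is not described here.

```python
from collections import Counter


def estimate_ploidy(window_copy_numbers):
    """Return the most frequent copy number among a sample's windows."""
    counts = Counter(window_copy_numbers)
    best = max(counts.values())
    # Assumption: on a tie, keep the smallest of the most frequent values.
    return min(value for value, count in counts.items() if count == best)


# Illustrative per-window total copy numbers for one sample.
print(estimate_ploidy([2, 2, 3, 2, 4, 3]))
```

How each window gets a single copy number when several segments overlap it is left out of this sketch.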
2 folders (relCopyNb and rawCopyNb) will be created and will respectively contain graphs generated from relative copy number values and raw copy number values.
aCNViewer has a few limitations including the fact that it does not currently account for intra-tumor heterogeneity. Indeed, having a simultaneous view on the copy number landscape along with the clonality status of these events could help better understand the mechanisms of a disease. Another current limitation of aCNViewer is the absence of a function to compare two groups of samples. One simple way to do that, though, would be to generate the quantitative histograms for both groups separately and compare these plots (as we did in Fig 2 of the article below).
aCNViewer: comprehensive genome-wide visualization of absolute copy number and copy neutral variations. Victor Renault, Jörg Tost, Fabien Pichon, Shu-Fang Wang-Renault, Eric Letouzé, Sandrine Imbeaud, Jessica Zucman-Rossi, Jean-François Deleuze & Alexandre How-Kit. PLoS One. 2017 Dec 19;12(12):e0189334. doi: 10.1371/journal.pone.0189334. eCollection 2017.