Somatic cnv issue
Opened this issue · 0 comments
Hi ,
I want to identify the somatic cnv from tumor-normal paired WES data. The config file i have used is :
#First way
[general]
BedGraphOutput=TRUE
chrLenFile = path/data/hg38.fa.fai
window = 0
ploidy = 2
outputDir =path to outputfile/freec-results
bedtools = /usr/bin/bedtools
samtools = /usr/bin/samtools
forceGCcontentNormalization = 0
contaminationAdjustment=TRUE
#sex=XY
breakPointType=4
chrFiles = path/genome-file-chr
maxThreads=10
gemMappabilityFile = path/FREEC-11.6/out100m2_hg38.gem
noisyData=TRUE
printNA=FALSE
sambamba = path/bin/./sambamba
readCountThreshold=50
[sample]
mateFile =path/sample_1T_dedup_reads.bam
miniPileup = path/pileup-files/sample_1T.pileup.gz
inputFormat = BAM
mateOrientation = FR
[control]
mateFile = path/sample_1N_dedup_reads.bam
miniPileup =path/sample_1N.pileup.gz
inputFormat = BAM
mateOrientation = FR
[BAF]
makePileup = path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.bed
fastaFile = path/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz
minimalCoveragePerPosition = 5
shiftInQuality = 33
[target]
#captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed
captureRegions = path/final-overlapping-removed-sorted.bed
After running this script i got the output file for somatic cnv like this:
1 18921041 19198941 3 gain AAB 81.846
1 71007971 75160955 3 gain AAB 6.00045
1 78641024 84202811 3 gain AAB 10.0454
1 93278631 99740911 3 gain AAB 32.4437
1 100292566 108136556 3 gain AAB 56.9157
1 185296350 187641778 3 gain AAB 3.25507
1 190097632 198859138 3 gain - -1
1 220025096 220134605 1 loss A -1
1 227016028 227081001 3 gain AAB 68.6832
2 38783 45827 3 gain - -1
2 31963042 32433806 1 loss A 100
2 49017438 51029206 3 gain AAB 28.0357
#Second way
#using config file-only including pileup files:
[general]
BedGraphOutput=TRUE
chrLenFile = /path/hg38.fa.fai
window = 0
ploidy = 2
outputDir = /path/pileup-basedresults
bedtools = /usr/bin/bedtools
samtools = /usr/bin/samtools
forceGCcontentNormalization = 1
contaminationAdjustment=TRUE
#sex=XY
breakPointType=4
chrFiles =path/genome-file-chr
maxThreads=10
gemMappabilityFile = path/out100m2_hg38.gem
noisyData=TRUE
printNA=FALSE
sambamba = path/./sambamba
readCountThreshold=50
[sample]
mateFile = /path/CaGB_10_T.pileup.gz
inputFormat = pileup
mateOrientation = FR
[control]
mateFile = path/CaGB_10_N.pileup.gz
inputFormat = pileup
mateOrientation = FR
[BAF]
SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz
minimalCoveragePerPosition = 5
shiftInQuality = 33
[target]
#captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed
captureRegions = path/final-overlapping-removed-sorted.bed
output_file: again only 7 columns
1 27621445 29279653 1 loss A -1
1 77631847 77926905 1 loss A 47.3996
1 91642882 93079408 1 loss A 100
1 96777544 99709210 3 gain AAB 27.2263
1 173500937 174241659 1 loss A 11.0234
1 219915358 220147732 1 loss A 17.95
2 31531343 32388914 1 loss A -1
2 36378249 36555753 3 gain AAB 53.6888
2 38689775 38882527 1 loss A -1
2 46816758 48733146 1 loss A -1
2 54143223 54850294 3 gain AAB 24.374
2 60756240 62502550 1 loss A -1
2 62502552 69962188 2 neutral AA -1
2 69996591 70297189 1 loss A -1
only 7 columns are displayed i am not getting status column whether its germline or somatic. I want to acquire somatic alterations excluding germline .Additionally , i have overlapped these files with dgv database germline variants in order to remove the germline ones but i am getting good match for almost every region. So i am not sure whether these cnv is somatic or germline .how can i be sure that the cnvs in this file are somatic please suggest how can i acquire that column.