Issue running FREEC on fungal genome
tania-k opened this issue · 7 comments
Hello Boeva Lab!
Appreciate the availability of this tool and am excited to see the results. I have been having an issue with running the code and am looking for any guidance you can provide!
My log file looks like:
Loading samtools/1.14
Loading requirement: libdeflate/1.10
Control-FREEC v11.6 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
Non Multi-threading mode
..Breakpoint threshold for segmentation of copy number profiles is 0.05
..telocenromeric set to 50000
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Window = 50000 was set
..Output directory: Attempt1
..Directory with files containing chromosome sequences: GENOMES/
..Sample file: aln/Ex15.sort.bam
..Sample input format: BAM
..will use this instance of samtools: 'samtools' to read BAM files
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.45
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..Polynomial degree for "ReadCount ~ GC-content" normalization is 3 or 4: will try both
..Minimal CNA length (in windows) is 1
..File with chromosome lengths: test/res_Exophiala_dermatitidis_Ex4.fa
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 1
..break-point type set to 2
..noisyData set to 0
..Control-FREEC will not look for subclones
..File test/res_Exophiala_dermatitidis_Ex4.fa was read
..[genomecopynumber] Starting reading aln/Ex15.sort.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view -@ 1 aln/Ex15.sort.bam
..finished reading aln/Ex15.sort.bam
PROFILING [tid=140054545385280]: aln/Ex15.sort.bam read in 22 seconds [fillMyHash]
11351239 lines read..
0 reads used to compute copy number profile
Error: FREEC was not able to extract reads from aln/Ex15.sort.bam
Check your parameters: inputFormat and mateOrientation
Use "matesOrientation=0" if you have single end reads
Check the list of possible input formats at http://bioinfo-out.curie.fr/projects/freec/tutorial.html#CONFIG
While the config file looks like:
#For more options see: http://boevalab.com/FREEC/tutorial.html#CONFIG ###
[general]
#parameters chrLenFile and ploidy are required.
chrLenFile = test/res_Exophiala_dermatitidis_Ex4.fa
ploidy = 1
#Parameter "breakPointThreshold" specifies the maximal slope of the slope of residual sum of squares.
#This should be a positive value. The closer it is to Zero, the more breakpoints will be called. Its recommended value is between 0.01 and 0.08.
breakPointThreshold = .05
#Either coefficientOfVariation or window must be specified for whole genome sequencing data. Set window=0 for exome sequencing data.
#coefficientOfVariation = 0.01
window = 50000
#step=10000
#Either chrFiles or GCcontentProfile must be specified too if no control dataset is available.
#If you provide a path to chromosome files, Control-FREEC will look for the following fasta files in your directory (in this order):
#1, 1.fa, 1.fasta, chr1.fa, chr1.fasta; 2, 2.fa, etc.
#csplit -s -z /path/to/INPUT.FA '/>/' '{*}'
#Please ensure that you don't have other files but sequences having the listed names in this directory.
chrFiles = GENOMES/
#GCcontentProfile = test/GC_profile_50kb.cnp
#if you are working with something non-human, we may need to modify these parameters:
minExpectedGC = 0.45
maxExpectedGC = 0.55
#readCountThreshold=10
#numberOfProcesses = 4
outputDir = Attempt1
#contaminationAdjustment = TRUE
#contamination = 0.4
#minMappabilityPerWindow = 0.95
#If the parameter gemMappabilityFile is not specified, then the fraction of non-N nucleotides per window is used as Mappability.
#gemMappabilityFile = /GEM_mappability/out76.gem
#breakPointType = 4
#forceGCcontentNormalization = 0
#sex=XY
#set BedGraphOutput=TRUE if you want to create a BedGraph track for visualization in the UCSC genome browser:
BedGraphOutput=TRUE
[sample]
mateFile = aln/Ex15.sort.bam
#mateCopyNumberFile = test/sample.cpn
inputFormat = BAM
mateOrientation=0
#use "mateOrientation=0" for sorted .SAM and .BAM
[control]
#mateFile = /path/control.pileup.gz
#mateCopyNumberFile = path/control.cpn
#inputFormat = pileup
#mateOrientation = RF
#[BAF]
#use the following options to calculate B allele frequency profiles and genotype status. This option can only be used if "inputFormat=pileup"
#SNPfile = /bioinfo/users/vboeva/Desktop/annotations/hg19_snp131.SingleDiNucl.1based.txt
#minimalCoveragePerPosition = 5
#use "minimalQualityPerPosition" and "shiftInQuality" to consider only high quality position in calculation of allelic frequencies (this option significantly slows down reading of .pileup)
#minimalQualityPerPosition = 5
#shiftInQuality = 33
[target]
#use a tab-delimited .BED file to specify capture regions (control dataset is needed to use this option):
#captureRegions = /bioinfo/users/vboeva/Desktop/testChr19/capture.bed
test/res_Exophiala_dermatitidis_Ex4.fa should be test/res_Exophiala_dermatitidis_Ex4.fa.fai, no?
The test/res_Exophilala_dermatitidis_Ex4 is the file containing scaffold/chromosome length information (after running your perl script).
i.e.
chr_1 4314646
chr_2 4303030
chr_3 3726496
chr_4 3672342
chr_5 3382390
chr_6 2864055
chr_7 2891896
chr_8 1200926
chr_9 251008
---
I am confused with your suggestion. I renamed my chrom length file so it doesn't end with fa. But the program still doesn't run to the end.
Was a naming issue, my BAM, chromosome length and separate chromosome files had different names (i.e. scaffold vs chromosome) once fixed got it to run!
Thank you, and sorry about the questions.
Great! Hope you get nice results!