Questions about the output of CNVs and the use of assess_significance
stbio-hbh opened this issue · 8 comments
Hello,
I'm using the tool Control-FREEC to evaluate the CNVs in my data, but I have some doubts about the _CNVs file given in output.
According to the manual information, the CNVs file should have 9 columns of results, but my result only output seven columns.
This is my configuration file and command line:
freec -conf freec_LPS4.txt -sample LPS4.bam
[general]
samtools = /data_center_01/home/hubihao/anaconda3/envs/NB/bin/samtools
bedtools = /data_center_01/home/hubihao/anaconda3/envs/NB/bin/bedtools
sambamba = /data_center_01/home/hubihao/anaconda3/envs/NB/bin/sambamba
chrLenFile = /data_center_01/home/hubihao/script/hg19_index/hg19_EBV_target.fasta.fai
ploidy = 2
maxThreads = 4
breakPointThreshold = .8
window = 50000
chrFiles = /data_center_01/home/hubihao/script/hg19_chr/
outputDir = /data_center_01/home/hubihao/project/Neuroblastoma/FREEC_analysis/
sex = XX
[sample]
#matefile = [samfile]
inputFormat = BAM
mateOrientation = 0
[BAF]
makePileup = /data_center_01/home/hubihao/project/Neuroblastoma/FREEC_test/hg19_snp142.SingleDiNucl.1based.bed
fastaFile = /data_center_01/home/hubihao/script/hg19_index/hg19_EBV_target.fasta
SNPfile = /data_center_01/home/hubihao/project/Neuroblastoma/FREEC_test/hg19_snp142.SingleDiNucl.1based.txt.gz
I looked up the previous issue, but I didn't find anything that needed to be adjusted in my config file, because I didn't have a control?
Also, I had some problems with the CNVs using assess_significance.R.
cat assess_significance.R | R --slave --args LPS4.bam_CNVs LPS4.bam_ratio.txt
And then this is the script's error message:
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Error in if (class(resultks) == "try-error") return(list(statistic = NA, :
the condition has length > 1
Calls: KS
In addition: Warning message:
In ks.test.default(values, score(normals)) :
p-value will be approximate in the presence of ties
Execution halted
Pvalue couldn't be added in CNVs. How can I eliminate this error.
Thanks for your reply. : )
I also have this problem. How can I solve it? Thank you
I tried to modify assess_significance.R, and the output of the modified script seems to be correct. I posted the modified part in the script below:
Start with line 36 of the original script
numberOfCol=length(cnvs)
wscore=c()
kscore=c()
for (i in c(1:length(cnvs[,1]))) {
values <- score(subsetByOverlaps(ratio.bed,cnvs.bed[i]))
resultw <- class(try(wilcox.test(values,score(normals)), silent = TRUE))
ifelse(resultw == "try-error", wscore <- c(wscore, "NA"), wscore <- c(wscore, wilcox.test(values,score(normals))$p.value))
resultks <- class(try(ks.test(values,score(normals)), silent = TRUE))
ifelse(resultks == "try-error",kscore <- c(kscore, "NA"),kscore <- c(kscore, ks.test(values,score(normals))$p.value))
}
cnvs = cbind(cnvs, wscore, kscore)
####End with line 50 of the original script #####
I hope you can try out the modified script, and I also hope to get some guidance from the author.
According to what you said, I have made modifications and have successfully run. Thank you
@stbio-hbh @hushaoqiang @valeu Hello, I am also facing the same issue. I deleted lines 36 to 50 in assess_significance.R as per the instructions above and pasted the modified commands. However, after running the program again for 3-4 hours, there was still no change. After terminating the program, I checked the ratio.txt file and it remained unchanged. What could be the reason for this?
I don't understand, I haven't paid attention to it since it was successfully run
@hushaoqiang 可以麻烦你分享一下你修改后的 assess_significance.R给我么?
后缀我改成.txt了,.R的文件github不让传
收到,感谢慷慨援助,谢谢!