_ratio.txt output
ashenflower opened this issue · 4 comments
Hello,
I'm using Control-FREEC to evaluate the CNVs in my data, but I have some doubts about the _ratio.txt output file.
As far as I understand from both the paper and the documentation, this file should report, for each window, the ratio (which, if I'm correct, is the read count in the sample divided by the read count in the control, and not the log2 ratio) and the corresponding estimated absolute CN.
Currently, the ratio values seem to make sense for my data given ploidy 2 (ratio ~1 corresponds to a CN of 2, ~0.5 to a CN of 1, and so on), but for some windows the ratio is equal to -1. What does that mean?
EDIT: I am also not sure I am interpreting the ratio correctly. I am using a control sample and, given ploidy = 2, I would expect a median ratio of ~2 for a predicted CN of 4, but in some cases I get a CN of 4 for median ratios of 7 and 8. How come?
Thank you for any answer.
Dear user,
-1 means "information not available". It is a number rather than 'na' to avoid plotting errors.
You are right about your understanding of the ratio: it is ~1 for copy-neutral regions. So if ploidy is 2, a ratio of 1 should mean CN=2. However, the ratio is calculated per window (or exon) and then smoothed over the whole segment to get the Median Ratio, and then CN = round(Median Ratio × Ploidy) is calculated. If you set "NoisyData=TRUE", CN may vary slightly because the BAF values in the segment will be taken into account.
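As a minimal sketch of the rule described above (not Control-FREEC's actual code), the relationship between Median Ratio, ploidy, and CN can be written as below. The function name `expected_copy_number` is hypothetical, and round-half-up is an assumption, since the thread only says "round".

```python
import math

NOT_AVAILABLE = -1  # value reported when no information is available


def expected_copy_number(median_ratio, ploidy):
    """Return round(median_ratio * ploidy), or None if the ratio is -1.

    Assumption: halves are rounded up; the tool's exact rounding
    behaviour is not stated in this thread.
    """
    if median_ratio == NOT_AVAILABLE:
        return None
    return math.floor(median_ratio * ploidy + 0.5)


if __name__ == "__main__":
    for ratio in (1.0, 0.5, 1.5, -1):
        print(ratio, expected_copy_number(ratio, ploidy=2))
```

With ploidy 2, this gives CN 2 for a ratio of 1.0, CN 1 for 0.5, and CN 3 for 1.5, and it skips windows marked -1.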
Dear @valeu,
Thank you very much for your quick reply! I still have some doubts about the ratios. In my results I get, for example, CN=2 for a Median Ratio of 7 (that is, the ratio for the whole segment), and I am not providing any information for the BAF calculation, so I don't think BAF is being taken into account. Other examples are CN=3 for a Median Ratio of 4, or CN=4 for a Median Ratio of 8. Why is that?
Also, how are the 'N' regions in the genome handled by Control-FREEC?
N's will be accounted for if you provide GEM mappability files.
Can you share the whole file or a large part of it so that I could better visualize the problem?
This is an example from my results:
Chromosome Start End Ratio MedianRatio CopyNumber
NC_000016.10 46380000 46381000 7.02739 8.00749 4
NC_000016.10 46381000 46382000 7.45326 8.00749 4
NC_000016.10 46382000 46383000 7.20728 8.00749 4
NC_000016.10 46383000 46384000 8.81572 8.00749 4
NC_000016.10 46384000 46385000 7.60995 8.00749 4
NC_000016.10 46385000 46386000 7.08909 8.00749 4
NC_000016.10 46386000 46387000 8.28973 8.00749 4
NC_000016.10 46387000 46388000 7.97253 8.00749 4
NC_000016.10 46388000 46389000 7.86441 8.00749 4
NC_000016.10 46389000 46390000 7.87856 8.00749 4
NC_000016.10 46390000 46391000 8.08635 8.00749 4
NC_000016.10 46391000 46392000 8.40486 8.00749 4
NC_000016.10 46392000 46393000 8.25247 8.00749 4
NC_000016.10 46393000 46394000 9.14821 8.00749 4
NC_000016.10 46394000 46395000 8.20969 8.00749 4
NC_000016.10 46395000 46396000 9.00321 8.00749 4
NC_000016.10 46396000 46397000 8.25807 8.00749 4
NC_000016.10 46397000 46398000 6.95637 8.00749 4
NC_000016.10 46398000 46399000 8.23604 8.00749 4
NC_000016.10 46399000 46400000 8.04246 8.00749 4
NC_000016.10 46400000 46401000 7.82838 8.00749 4
NC_000016.10 46401000 46402000 7.75044 8.00749 4
Given ploidy = 2 and Median Ratio = 8.00749, I would expect CN to be 8*2 = 16. Is that correct?
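To check this across a whole file rather than by eye, here is a rough sketch that flags every window where the reported CopyNumber differs from round(MedianRatio × ploidy). It assumes whitespace-separated columns in the order shown above and round-half-up rounding. The helper names (`parse_ratio_lines`, `find_mismatches`) are made up for illustration, and the three sample rows are copied from the table above.

```python
import math


def parse_ratio_lines(lines):
    """Parse whitespace-separated rows: Chromosome Start End Ratio MedianRatio CopyNumber."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chromosome":
            continue  # skip blank lines and the header row
        rows.append({
            "chromosome": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "ratio": float(fields[3]),
            "median_ratio": float(fields[4]),
            "copy_number": int(fields[5]),
        })
    return rows


def find_mismatches(rows, ploidy):
    """Return (row, expected_cn) pairs where the reported CN differs from the formula."""
    mismatches = []
    for row in rows:
        if row["median_ratio"] == -1:
            continue  # -1 means "information not available"
        # Assumption: round half up; the tool's exact rounding is not stated here.
        expected = math.floor(row["median_ratio"] * ploidy + 0.5)
        if expected != row["copy_number"]:
            mismatches.append((row, expected))
    return mismatches


sample = """Chromosome Start End Ratio MedianRatio CopyNumber
NC_000016.10 46380000 46381000 7.02739 8.00749 4
NC_000016.10 46381000 46382000 7.45326 8.00749 4
NC_000016.10 46382000 46383000 7.20728 8.00749 4
"""

if __name__ == "__main__":
    rows = parse_ratio_lines(sample.splitlines())
    for row, expected in find_mismatches(rows, ploidy=2):
        print(row["chromosome"], row["start"], row["end"],
              "reported CN", row["copy_number"], "expected CN", expected)
```

For a real run, the lines could be read from the _ratio.txt file instead of the inline sample.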
This is my config file:
[general]
chrLenFile = ../GRCh38.p14_genomic.fna.fai
maxThreads = 12
sex=XX
ploidy = 2
window = 1000
[sample]
mateFile = ../freec/sorted-tumor-aln.bam
inputFormat = bam
mateOrientation = FR
[control]
mateFile = ../freec/sorted-normal-aln.bam
inputFormat = bam
mateOrientation = FR
Also, regarding your comment that "N's will be accounted for if you provide GEM mappability files":
So, for regions in the reference genome filled in with long sequences of “N” characters, what happens if I don't provide the GEM mappability files?
Sorry for such a long reply, and thank you for your time.