BoevaLab/FREEC

_ratio.txt output

ashenflower opened this issue · 4 comments

Hello,

I'm using Control-FREEC to evaluate the CNVs in my data, but I have some doubts about the _ratio.txt output file.

As far as I understand from both the paper and the documentation, this file should report, for each window, the ratio (which, if I'm correct, is the number of reads in the sample divided by the number of reads in the control, not the log2 ratio) and the corresponding estimated absolute CN.

Currently, the ratio values seem to make sense for my data given ploidy 2 (a ratio of ~1 corresponds to a CN of 2, ~0.5 to a CN of 1, and so on), but for some windows the ratio is equal to -1. What does that mean?

EDIT: I am also not sure whether I am interpreting the ratio correctly. I am using a control sample and, given ploidy = 2, I would expect a median ratio of ~2 for a predicted CN of 4, but in some cases I get a CN of 4 for median ratios of 7 and 8. How come?

Thank you for any answer.

valeu commented

Dear user,
-1 means "information not available". It is a number and not 'na' to avoid plotting errors.

You are right about your understanding of the ratio: it is ~1 for copy-neutral regions, so if ploidy is 2, a ratio of 1 should mean CN=2. However, the ratio is calculated per window (or exon) and then smoothed over the whole segment to get the Median Ratio, and then CN = round(Median Ratio × Ploidy) is calculated. If you set "NoisyData=TRUE", the CN may vary slightly because BAF values in the segment will be taken into account.
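To make the formula above concrete, here is a minimal Python sketch of CN = round(Median Ratio × Ploidy) that also treats -1 as "information not available". The function name `expected_copy_number` and the constant `MISSING` are hypothetical names chosen for illustration, and rounding halves upward is an assumption; this is not Control-FREEC's own code.

```python
import math
from typing import Optional

# Sentinel described in the reply above: -1 means "information not available".
MISSING = -1.0


def expected_copy_number(median_ratio: float, ploidy: int) -> Optional[int]:
    """Return round(median_ratio * ploidy), or None for the -1 sentinel."""
    if median_ratio == MISSING:
        return None
    # floor(x + 0.5) rounds halves upward; how the tool itself treats
    # exact halves is an assumption made for this sketch.
    return math.floor(median_ratio * ploidy + 0.5)


if __name__ == "__main__":
    for ratio in (1.0, 0.5, 1.5, -1.0):
        print(ratio, expected_copy_number(ratio, ploidy=2))
```

With ploidy 2, a median ratio of 1.0 maps to CN 2 and 0.5 maps to CN 1, while a window with ratio -1 yields no copy number at all rather than a negative one.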

Dear @valeu,

Thank you very much for your quick reply! I still have some doubts about the ratios. In my results I get, for example, CN=2 for a Median Ratio of 7 (that is, the ratio for the whole segment), and I am not providing any information for the BAF calculation, so I don't think it is taken into account. Other examples are CN=3 for a Median Ratio of 4, or CN=4 for a Median Ratio of 8. Why is that so?

Also, how are the 'N' regions in the genome handled by Control-FREEC?

valeu commented

N's will be accounted for if you provide GEM mappability files.

Can you share the whole file or a large part of it so that I could better visualize the problem?

This is an example from my results:

Chromosome       Start            End              Ratio            MedianRatio      CopyNumber     
NC_000016.10     46380000         46381000         7.02739          8.00749          4              
NC_000016.10     46381000         46382000         7.45326          8.00749          4              
NC_000016.10     46382000         46383000         7.20728          8.00749          4              
NC_000016.10     46383000         46384000         8.81572          8.00749          4              
NC_000016.10     46384000         46385000         7.60995          8.00749          4              
NC_000016.10     46385000         46386000         7.08909          8.00749          4              
NC_000016.10     46386000         46387000         8.28973          8.00749          4              
NC_000016.10     46387000         46388000         7.97253          8.00749          4              
NC_000016.10     46388000         46389000         7.86441          8.00749          4              
NC_000016.10     46389000         46390000         7.87856          8.00749          4              
NC_000016.10     46390000         46391000         8.08635          8.00749          4              
NC_000016.10     46391000         46392000         8.40486          8.00749          4              
NC_000016.10     46392000         46393000         8.25247          8.00749          4              
NC_000016.10     46393000         46394000         9.14821          8.00749          4              
NC_000016.10     46394000         46395000         8.20969          8.00749          4              
NC_000016.10     46395000         46396000         9.00321          8.00749          4              
NC_000016.10     46396000         46397000         8.25807          8.00749          4              
NC_000016.10     46397000         46398000         6.95637          8.00749          4              
NC_000016.10     46398000         46399000         8.23604          8.00749          4              
NC_000016.10     46399000         46400000         8.04246          8.00749          4              
NC_000016.10     46400000         46401000         7.82838          8.00749          4              
NC_000016.10     46401000         46402000         7.75044          8.00749          4 

Given ploidy=2 and Median Ratio = 8.00749, I would expect CN to be 8*2=16. Is that correct?
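For reference, the comparison I am doing by hand can be scripted. This is only a rough sketch: it assumes the file is whitespace-separated with the header shown above, applies the round(Median Ratio × Ploidy) formula from the earlier reply, and uses a hypothetical helper name `find_mismatches`. The three rows are copied from the excerpt above.

```python
from typing import List, Tuple

# First three rows of the excerpt above, whitespace-separated.
SAMPLE = """\
Chromosome Start End Ratio MedianRatio CopyNumber
NC_000016.10 46380000 46381000 7.02739 8.00749 4
NC_000016.10 46381000 46382000 7.45326 8.00749 4
NC_000016.10 46382000 46383000 7.20728 8.00749 4
"""


def find_mismatches(text: str, ploidy: int) -> List[Tuple[str, int, int, int]]:
    """List (chromosome, start, expected CN, reported CN) where the two differ."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    mismatches = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        median_ratio = float(row["MedianRatio"])
        if median_ratio == -1:
            # -1 means "information not available", so there is nothing to compare.
            continue
        # Python's round() rounds exact halves to even; the values here are
        # far from a half, so that detail does not affect this excerpt.
        expected = round(median_ratio * ploidy)
        reported = int(row["CopyNumber"])
        if expected != reported:
            mismatches.append((row["Chromosome"], int(row["Start"]), expected, reported))
    return mismatches


if __name__ == "__main__":
    for chrom, start, expected, reported in find_mismatches(SAMPLE, ploidy=2):
        print(f"{chrom}:{start} expected CN {expected}, reported CN {reported}")
```

On these rows the formula gives 16 for every window, while the file reports 4.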

This is my config file:


[general]

chrLenFile = ../GRCh38.p14_genomic.fna.fai
maxThreads = 12
sex=XX
ploidy = 2
window = 1000

[sample]

mateFile = ../freec/sorted-tumor-aln.bam
inputFormat = bam
mateOrientation = FR

[control]

mateFile = ../freec/sorted-normal-aln.bam
inputFormat = bam
mateOrientation = FR


Also, regarding your statement that "N's will be accounted for if you provide GEM mappability files":

For regions in the reference genome that are filled with long runs of “N” characters, what happens if I don't provide the GEM mappability files?

Sorry for such a long reply, and thank you for your time.