CompEpigen/figeno

Bug? with copynumber using the different layouts


Hi,

First of all, thanks a lot, really cool tool!
I discovered it today and have been playing around with it, trying to produce some plots visualizing some BNDs and the copy number. I have several problems, sorry for the long post.

1- When using the horizontal and stacked layouts, the copy number segments seem to follow the table I provide. However, with the circular layout there seems to be a bug: the copy number segments don't make sense and overlap for some reason. See both figures below; they use the exact same JSON, and the only difference is the layout. I am not sure whether it's supposed to be like this or whether this is a bug.
[Figure: Screenshot from 2024-06-12 16-31-44]
[Figure: horizontal layout]

2- If the table with the CNV values has missing segments, the plot does not show gaps but seems to default to a ploidy of 2. Is this the intended behavior? I am wondering whether there's a way to draw nothing in those regions, to at least show the gap in the data.
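
For reference, the positions that a segment table leaves uncovered can be listed explicitly before plotting. This is only a minimal sketch, assuming the segments of one chromosome are available as (start, end) pairs with half-open coordinates; `find_gaps` and its arguments are hypothetical names and not part of figeno:

```python
def find_gaps(segments, chrom_start, chrom_end):
    """Return the (start, end) intervals within [chrom_start, chrom_end)
    that are not covered by any segment. Hypothetical helper, not figeno API."""
    gaps = []
    cursor = chrom_start  # first position not yet known to be covered
    for start, end in sorted(segments):
        if start > cursor:
            # Uncovered stretch between the previous segment and this one.
            gaps.append((cursor, min(start, chrom_end)))
        cursor = max(cursor, end)
        if cursor >= chrom_end:
            break
    if cursor < chrom_end:
        # Uncovered tail after the last segment.
        gaps.append((cursor, chrom_end))
    return gaps


gaps = find_gaps([(0, 100), (150, 200)], 0, 300)
print(gaps)
```

Overlapping or unsorted segments are handled because the input is sorted first and the cursor only ever moves forward.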

3- I have another CNV table produced by Delly2. There, the segments are smaller (100kb windows), which causes problems with the visualization: the final plot seems to show only ploidy 2, maybe because the segments are so small that they are not drawn. I can try some "smoothing", for example using the DNAcopy library in R to produce longer segments, but then I'll have non-integer copy number values, and figeno does not seem to accept non-integer values. Looking quickly at the code, I changed line 285 in track_copynumber.py from

CNAs[chr].append((int(linesplit[1]),int(linesplit[2]),int(linesplit[3])))

to

CNAs[chr].append((int(linesplit[1]),int(linesplit[2]),float(linesplit[3])))

This seems to do the trick for me, but of course it might cause other problems; I haven't delved much into the code.
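
The effect of that change can be illustrated outside figeno. This is a minimal sketch, assuming a tab-separated line with chromosome, start, end and copy number in the first four columns (the column order is inferred from the indices in the line quoted above; the delimiter is an assumption). `parse_cna_line` is a hypothetical name, not a figeno function:

```python
def parse_cna_line(line):
    """Parse one line: chromosome, start, end, copy number (tab-separated).
    The copy number is read as a float so smoothed values such as 2.7 are kept."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])
    copy_number = float(fields[3])  # int("2.7") would raise ValueError here
    return chrom, (start, end, copy_number)


# Collect segments per chromosome, as the quoted line does with CNAs[chr].
cnas = {}
for line in ["chr1\t0\t100000\t2", "chr1\t100000\t200000\t2.7"]:
    chrom, segment = parse_cna_line(line)
    cnas.setdefault(chrom, []).append(segment)
```

Integer inputs such as `2` still parse, but they are stored as `2.0`, so any downstream code that compares copy numbers for exact integer equality or uses them as indices would need checking.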

Hi,

Thanks for the nice feedback, and sorry for these issues.
There was indeed a bug with the circular layout, which is now fixed in version 1.3.2.

I have also changed the way the GUI looks for the copynumber track, because I realized that it was not very clear, and added a new input type. There are now 3 possible input types for the copynumber track:

  • freec (for CNAs called with Control-FREEC). This ideally requires both a ratios file and a CNAs file. The ratios file indicates the copy number in each bin (of size 10kb, for example) and is used to show dots. The CNAs file indicates the called copy numbers and is used to color the dots based on the called copy number. If only the ratios file is provided, dots will be colored based on the copy number in each bin, but this is not ideal. If only the CNAs file is provided (which is what you were doing), only the segments corresponding to each CNA are shown, and all other positions are assumed to have a copy number equal to the ploidy. This is not ideal because, as you saw, we then cannot distinguish between missing data and a real copy number of 2. This mode also excludes small CNAs. It was intended as a last resort, in case only the called CNAs are available. I have now added a warning in case the user tries to show copy numbers based only on the CNAs file.
  • purple (for CNAs called with purple). This requires a copy number file containing copy numbers in each segment.
  • delly (for CNAs called with delly; new in version 1.3.2). This is similar to freec, although it requires a copy number file instead of a ratios file (copy number = ratio * ploidy) and the formats are a bit different. This might work best for you, since you said you called CNVs with delly.
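
The relation between the two kinds of input (copy number = ratio * ploidy) can be written out as follows. This is just an illustrative sketch of that formula; the function names and the default ploidy of 2 are assumptions, not figeno's API:

```python
def ratio_to_copy_number(ratio, ploidy=2):
    """Copy number implied by a coverage ratio: copy number = ratio * ploidy."""
    return ratio * ploidy


def copy_number_to_ratio(copy_number, ploidy=2):
    """Inverse conversion: ratio = copy number / ploidy."""
    return copy_number / ploidy


# A ratio of 1.5 in a diploid sample corresponds to 3 copies, and back again.
cn = ratio_to_copy_number(1.5)
ratio = copy_number_to_ratio(cn)
```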

You can have a look at the documentation for more detail: https://figeno.readthedocs.io/en/latest/content/describe_figure.html#copynumber .
If you called CNAs with a different tool, you can try to convert your files to one of the supported formats, or you can share the file format with me and I can try to add direct support for it.

You can try to install the new version with pip install figeno==1.3.2 and let me know if this works!

Thanks for the fixes!
Yeah, after looking around in the code, I was able to use delly's cov output by simply dividing that column by 2, since it is then multiplied by 2 internally.
Glad the issue with indels is solved as well.

Best,
Fawaz