Gene ID dropdown not showing options with gff option
preetida opened this issue · 7 comments
Hi Joe,
I am using covviz for ~1500 samples. After generating the BED file with goleft indexcov, I am running covviz with the following command:
covviz --ped testcoviz/testcoviz-indexcov.ped --gff ~/scratch/Homo_sapiens.GRCh38.99.gtf.gz testcoviz/testcoviz-indexcov.bed.gz -o CHIP_Coverage
In my HTML output I don't see any options in the GeneID dropdown. Do I have to specify the gene feature with an option?
I am referring to this line in the documentation: "Currently we support GFF, VCF, and BED. GFF tracks are added using --gff where features are 'gene' and attributes have 'Name='. Feature type and attribute regex can be configured using --gff-feature and --gff-attr."
Yes, you probably just need to use a new pattern to grab the gene IDs from the attribute field of the GFF.
--gff-feature refers to column 3. By default, it looks for lines annotated as 'gene'.
--gff-attr refers to column 9. By default, it splits out the gene name using 'Name='.
Could you send a few lines of your gtf or point me to where your annotation was downloaded from?
I downloaded it from here: ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/
Here are a few lines of the file:
#!genebuild-last-updated 2019-08
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
Try covviz with --gff-attr "gene_name ".
I'm pretty sure that'll leave the quote marks. I'll push an update to handle removing any remaining quote marks after the regex has been applied.
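To illustrate why the default pattern finds nothing in an Ensembl GTF, here is a minimal Python sketch of the idea. It is not covviz's actual implementation: the function name `extract_gene_name` and its parameters are hypothetical, it assumes tab-separated columns, and it assumes attribute values contain no semicolons.

```python
import re

def extract_gene_name(line, feature="gene", attr_pattern="Name="):
    """Return the gene name from one GFF/GTF line, or None if it does not qualify.

    Hypothetical helper for illustration only; not part of covviz.
    """
    if line.startswith("#"):
        return None  # header or comment line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != feature:
        return None  # column 3 must match the requested feature type
    # Column 9 holds the attributes, separated by semicolons (simplification).
    for attribute in fields[8].split(";"):
        attribute = attribute.strip()
        match = re.match(attr_pattern, attribute)
        if match:
            # Keep what follows the pattern, then drop any leftover quote marks.
            return attribute[match.end():].strip().strip("\"'")
    return None

# The Ensembl line quoted earlier in this thread, rebuilt with tab separators.
gtf_line = "\t".join([
    "1", "havana", "gene", "11869", "14409", ".", "+", ".",
    'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
    'gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";',
])

print(extract_gene_name(gtf_line))                           # default 'Name=' finds nothing
print(extract_gene_name(gtf_line, attr_pattern="gene_name "))  # matches the GTF-style attribute
```

With the default `Name=` pattern no attribute matches, which is consistent with an empty dropdown. With `gene_name ` the attribute matches, and the final `strip` removes the surrounding quote marks.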
Since the input BED file is coming from indexcov, can you include --skip-norm in your covviz call and see how that looks? Or maybe coverage is really poor in a lot of the samples.
Coverage is normalized to 1x per sample from indexcov, so you won't be able to say precise depths per sample. You can analyze each alignment file to get coverages pretty rapidly with https://github.com/brentp/mosdepth.
What you're seeing with the highlight (green line) is a sample that deviates significantly from the rest of the cohort through those coordinates. We had to do it this way because if you attempt to draw all ~1500 lines for all points along the x-axis, the browser will misbehave and likely run out of RAM in the process. The upper and lower bounds, and everything shaded gray between them, represent the other ~1499 samples.
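As a rough sketch of that drawing strategy, the Python below flags samples that stray from the cohort and collapses everyone else into a lower and an upper bound. This is only an illustration under stated assumptions, not covviz's code: the function names, the median-based criterion, and the 0.3 threshold are all hypothetical.

```python
from statistics import median

def flag_deviating_samples(coverage_by_sample, threshold=0.3):
    """Return samples whose coverage strays from the per-bin cohort median.

    The median criterion and the threshold are assumptions for illustration.
    """
    tracks = list(coverage_by_sample.values())
    medians = [median(column) for column in zip(*tracks)]
    return [
        sample
        for sample, track in coverage_by_sample.items()
        if any(abs(value - mid) > threshold for value, mid in zip(track, medians))
    ]

def cohort_envelope(coverage_by_sample, exclude=()):
    """Collapse all non-excluded samples into per-bin lower and upper bounds."""
    rest = [track for sample, track in coverage_by_sample.items() if sample not in exclude]
    lower = [min(column) for column in zip(*rest)]
    upper = [max(column) for column in zip(*rest)]
    return lower, upper

# Toy normalized coverage for four samples over four bins.
coverage = {
    "s1": [1.0, 1.0, 1.0, 1.0],
    "s2": [0.9, 1.1, 1.0, 0.95],
    "s3": [1.05, 0.95, 1.0, 1.0],
    "s4": [1.0, 0.5, 0.5, 1.0],  # dips in the middle bins
}

flagged = flag_deviating_samples(coverage)
lower, upper = cohort_envelope(coverage, exclude=flagged)
print(flagged, lower, upper)
```

Only the flagged samples would be drawn as individual lines. The rest are reduced to two traces per region, which keeps the number of plotted points small regardless of cohort size.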
That should give you some sense of how to interpret these outputs, but please follow up if anything is still unclear.
I pushed a new release to address the quotes. You can install it via pip (pip install -U covviz).