brwnj/covviz

Gene ID dropdown not showing options with gff option

preetida opened this issue · 7 comments

Hi Joe,
I am using covviz for ~1500 samples. After generating the bed file with goleft indexcov, I am running covviz with the following command:

covviz --ped testcoviz/testcoviz-indexcov.ped --gff ~/scratch/Homo_sapiens.GRCh38.99.gtf.gz testcoviz/testcoviz-indexcov.bed.gz -o CHIP_Coverage

In my HTML output I don't see any options in the GeneID dropdown. Do I have to specify the gene feature in an option?

I'm referring to this line in the documentation: "Currently we support GFF, VCF, and BED. GFF tracks are added using --gff where features are 'gene' and attributes have 'Name='. Feature type and attribute regex can be configured using --gff-feature and --gff-attr."


brwnj commented

Yes, you probably just need a different pattern to grab the gene IDs from the attributes field of the GFF.

--gff-feature refers to column 3. By default it looks for lines annotated as 'gene'.

--gff-attr refers to column 9, and by default it splits out the gene name using 'Name='.

Could you send a few lines of your gtf or point me to where your annotation was downloaded from?

Downloaded from here: ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/
Here are a few lines of the file:
#!genebuild-last-updated 2019-08
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";

brwnj commented

Try covviz with --gff-attr "gene_name ".

I'm pretty sure that'll leave the quote marks. I'll push an update to handle removing any remaining quote marks after the regex has been applied.
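
To illustrate why the default 'Name=' pattern finds nothing in an Ensembl GTF and why "gene_name " leaves quote marks behind, here is a minimal sketch of this kind of lookup. It is not covviz's actual code: the function name `extract_name` and the `strip_quotes` parameter are hypothetical, and the sample line is the one posted above with its columns assumed to be tab-separated.

```python
import re


def extract_name(line, feature="gene", attr="Name=", strip_quotes=True):
    """Return the label for one GFF/GTF line, or None if the line is skipped.

    Hypothetical helper for illustration; not part of covviz.
    """
    if line.startswith("#"):
        return None  # header or comment line
    cols = line.rstrip("\n").split("\t")
    # Column 3 holds the feature type, column 9 the attributes.
    if len(cols) < 9 or cols[2] != feature:
        return None
    # Split the attributes on the pattern and keep what follows, up to the next ';'.
    parts = re.split(attr, cols[8], maxsplit=1)
    if len(parts) < 2:
        return None  # pattern not present in this line
    value = parts[1].split(";", 1)[0].strip()
    if strip_quotes:
        value = value.strip("\"'")
    return value


# The GTF line from this thread, rebuilt with tab-separated columns.
gtf_line = "\t".join([
    "1", "havana", "gene", "11869", "14409", ".", "+", ".",
    'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
    'gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";',
])

print(extract_name(gtf_line))                                         # no 'Name=' in a GTF
print(extract_name(gtf_line, attr="gene_name ", strip_quotes=False))  # quotes remain
print(extract_name(gtf_line, attr="gene_name "))                      # quotes removed
```

GTF attributes are written as key "value" pairs rather than key=value, so a pattern written for GFF3-style 'Name=' never matches, and any pattern that does match will capture the surrounding quotes unless they are stripped afterwards.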

Thanks, that worked!
Though it has quote marks (" ") in the GeneID dropdown.

On a separate issue, it only highlights one sample and not the others. Any clue why?

brwnj commented

Since the input bed file is coming from indexcov, can you include --skip-norm in your covviz call and see how that looks? Or maybe coverage is really poor in a lot of the samples.

Coverage is normalized to 1x per sample from indexcov, so you won't be able to say precise depths per sample. You can analyze each alignment file to get coverages pretty rapidly with https://github.com/brentp/mosdepth.

What you're seeing with the highlight (green line) is a sample that deviates significantly from the rest of the cohort through those coordinates. We had to do it this way because if you attempt to draw all ~1500 lines for all points along the x-axis, the browser will not behave and likely run out of RAM in the process. The upper and lower bounds and everything in the middle shaded gray represents the other ~1499 samples.
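
As a rough illustration of that idea, the sketch below reduces a cohort to a lower bound, an upper bound, and a short list of samples to draw individually. The outlier rule used here (any bin further than `max_dev` from the per-bin median) is an assumption made for the example and is not necessarily how covviz decides significance; `summarize_cohort`, `max_dev`, and the sample data are all hypothetical.

```python
from statistics import median


def summarize_cohort(depths, max_dev=0.5):
    """Collapse per-sample depth traces into a shaded band plus outlier traces.

    depths: dict mapping sample name -> list of normalized depths, one per bin.
    Returns (lower, upper, outliers). Hypothetical helper, not covviz code.
    """
    samples = sorted(depths)
    n_bins = len(depths[samples[0]])
    # Per-bin median across the whole cohort.
    medians = [median(depths[s][i] for s in samples) for i in range(n_bins)]
    # Assumed rule: a sample is an outlier if any bin strays too far from the median.
    outliers = [
        s for s in samples
        if any(abs(depths[s][i] - medians[i]) > max_dev for i in range(n_bins))
    ]
    # The band is drawn from the remaining samples (fall back to all if none remain).
    rest = [s for s in samples if s not in outliers] or samples
    lower = [min(depths[s][i] for s in rest) for i in range(n_bins)]
    upper = [max(depths[s][i] for s in rest) for i in range(n_bins)]
    return lower, upper, outliers


# Made-up normalized depths for four samples over three bins.
depths = {
    "s1": [1.0, 1.1, 0.9],
    "s2": [0.9, 1.0, 1.0],
    "s3": [1.1, 0.9, 1.1],
    "s4": [1.0, 0.4, 1.0],  # dips in the second bin
}
lower, upper, outliers = summarize_cohort(depths)
print(lower, upper, outliers)
```

With a layout like this the browser only has to draw two bound lines, the shaded area between them, and one line per outlier, regardless of how many samples are in the cohort.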

That should give you some sense of how to interpret these outputs, but please follow up if I'm still unclear.

brwnj commented

I pushed a new release to address the quotes. You can install via pip (pip install -U covviz).