Genotype filter using wildcard with gt_alt_freqs > 0.3
OskarSchnappauf opened this issue · 5 comments
Dear gemini team,
I use gemini very frequently and it is an awesome tool for variant prioritization within large databases. However, one thing I have not yet figured out is how to filter on gt_alt_freqs in combination with a wildcard.
For instance, I want all variants with impact severity MED or HIGH and with at least two affected individuals in our database:
gemini query --header -q "SELECT gene, chrom, start, end FROM variants where impact_severity != 'LOW'" --gt-filter "(gt_types).(Phenotype==2).(==HET).(count >1) and (gt_types).(Phenotype==1).(==HOM_REF).(all)" gemini.db
However, some of the identified variants have a very low gt_alt_freqs. How can I include a gt_alt_freqs threshold for the identified variants? I tried (gt_alt_freqs).(*).(>=0.3).(any), but it did not work.
Thank you very much for your help.
Oskar
Anyone?
When you say it did not work, do you mean you know for certain there are such variants and none were returned?
Hi Aaron,
Thank you so much for your reply. I don't know whether such variants exist, because the query does not even run; I get an error message.
Here is what I did and what the error message was:
I browsed the database with this command:
gemini query --header -q "SELECT gene, chrom, start, end FROM variants where impact_severity != 'LOW'" --gt-filter "(gt_types).(Phenotype==2).(==HET).(count >1) and (gt_types).(Phenotype==1).(==HOM_REF).(all) and (gt_alt_freqs).(*).(>=0.3).(any)" gemini.db
And the error message was:
Traceback (most recent call last):
  File "/usr/local/apps/gemini/0.20.1/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 1248, in main
    args.func(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 439, in query_fn
    gemini_query.query(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_query.py", line 169, in query
    run_query(args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_query.py", line 135, in run_query
    gene_needed, args.show_families, subjects=subjects)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 622, in run
    self.gt_filter = self._correct_genotype_filter()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 1047, in _correct_genotype_filter
    raise ValueError("Wildcard filter should consist of 4 elements. Exiting.")
ValueError: Wildcard filter should consist of 4 elements. Exiting.
I think it is related to the "." in (>=0.3), since it complains about the number of elements.
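A quick way to check this suspicion, as a minimal sketch: it assumes the validation counts "." characters in the wildcard token (the check quoted further down in this thread suggests it does). The variable names are only for illustration and this is not gemini's actual code.

```python
# Illustrative only: count the "." characters in two wildcard tokens.
# Assumption: the 4-element check expects exactly three "." separators.
failing_token = "(gt_alt_freqs).(*).(>=0.3).(any)"
working_token = "(gt_types).(Phenotype==2).(==HET).(count >1)"

# Three separators plus the decimal point in 0.3
print(failing_token.count('.'))   # 4

# Separators only
print(working_token.count('.'))   # 3
```

If the check is indeed "number of dots must equal 3", any threshold written with a decimal point would trip it.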
Any suggestion?
Thank you so much,
Oskar
I encountered the same issue: it raises "ValueError: Wildcard filter should consist of 4 elements". Also, #868 is the same issue.
Uma
This can be fixed by changing the file ....../python2.7/site-packages/gemini/GeminiQuery.py
Line 1043:
if token.count('.') != 3 or \
becomes
if token.count(').(') != 3 or \
Line 1048:
(column, wildcard, wildcard_rule, wildcard_op) = token.split('.')
becomes
(column, wildcard, wildcard_rule, wildcard_op) = token.split(').(')
I have no idea whether this breaks other functionality, so make a backup of the original file first.
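To see what the change does in isolation, here is a small standalone sketch. The helper split_wildcard is hypothetical and is not part of gemini; it only mimics splitting on ").(" instead of "." and assumes the outer parentheses are stripped afterwards.

```python
def split_wildcard(token):
    """Hypothetical helper: split '(COLUMN).(WILDCARD).(RULE).(OP)' into 4 parts."""
    sep = ').('
    # Count the separator between elements rather than every "." character,
    # so a decimal point inside an element (e.g. 0.3) is not miscounted.
    if token.count(sep) != 3:
        raise ValueError("Wildcard filter should consist of 4 elements.")
    parts = token.strip().split(sep)
    # After splitting on ').(' only the outermost parentheses remain.
    parts[0] = parts[0].lstrip('(')
    parts[-1] = parts[-1].rstrip(')')
    return [p.strip() for p in parts]


print(split_wildcard("(gt_alt_freqs).(*).(>=0.3).(any)"))
# ['gt_alt_freqs', '*', '>=0.3', 'any']
print(split_wildcard("(gt_types).(Phenotype==2).(==HET).(count >1)"))
# ['gt_types', 'Phenotype==2', '==HET', 'count >1']
```

Both the decimal threshold and the original filters split into four elements here, which is consistent with the edit above, but it says nothing about how the rest of GeminiQuery.py handles the pieces.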