bcolz index for segregation
8nb24 opened this issue · 5 comments
Hi,
Thank you all for your progress with this tool. I have a gemini database containing WGS variants for ~2000 individuals. To reanalyze old cases, we would like to run segregation queries for all individuals in the cohort, i.e. run 'comp_hets' on a per-gene basis for every family. For a single gene this takes 20-40 minutes. I was hoping to bcolz index the database to speed this up; however, my understanding is that the bcolz index works only with "gemini query -q ... --gt-filter ..." queries. What does the back end of the comp_hets query look like (does it run multiple queries with different --gt-filter values?), and is it possible to use the bcolz index for comp_hets, de_novo, etc.?
Thanks
It will use the bcolz index if available, including for tools like comp_hets.
Thanks for the response. I ran each of the following queries on a bcolz indexed database:
gemini comp_hets --filter "(gene == 'LOXL3')" --use-bcolz Rare.db
gemini query -q "select *, gts.U11802 from variants where (gene == 'LOXL3')" --gt-filter "gt_types.U11802 == HET" --use-bcolz Rare.db
In this case, the second query runs, but the first fails with this error:
usage: gemini [-h] [-v] [--annotation-dir ANNOTATION_DIR]
              {actionable_mutations,amend,annotate,autosomal_dominant,autosomal_recessive,bcolz_index,browser,burden,comp_hets,db_info,de_novo,dump,examples,fusions,gene_wise,interactions,load,load_chunk,lof_interactions,lof_sieve,mendel_errors,merge_chunks,pathways,qc,query,region,roh,set_somatic,stats,update,windower,x_linked_de_novo,x_linked_dominant,x_linked_recessive}
              ...
gemini: error: unrecognized arguments: --use-bcolz
For the first command, it will detect the presence of the bcolz index and use it.
Are there additional arguments that should be passed? I got the aforementioned error with the first query.
Even with a bare-bones comp_hets or de_novo query, e.g.:
gemini comp_hets --use-bcolz /path/to/db/database.db
I get the same error.
This is running on gemini 0.20.1.
Just run gemini comp_hets without --use-bcolz.
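As a rough sketch of the resolution above: going by the replies in this thread, --use-bcolz is accepted by gemini query, while tools such as comp_hets pick up the index on their own and reject the flag. The wrapper below is hypothetical and not part of gemini; the names run_gemini and GEMINI_BIN are made up for illustration. It drops --use-bcolz for any subcommand other than query, so the same argument list can be reused across tools.

```shell
# Hypothetical wrapper (not part of gemini): forwards its arguments to gemini,
# removing --use-bcolz unless the subcommand is `query`.
# GEMINI_BIN is an assumed override for the executable name (handy for dry runs).
run_gemini() {
    subcmd="$1"; shift
    n=$#
    # Rotate through the original arguments exactly once, re-appending
    # each one we want to keep to the end of the positional parameters.
    while [ "$n" -gt 0 ]; do
        arg="$1"; shift
        if [ "$arg" != "--use-bcolz" ] || [ "$subcmd" = "query" ]; then
            set -- "$@" "$arg"
        fi
        n=$((n - 1))
    done
    "${GEMINI_BIN:-gemini}" "$subcmd" "$@"
}
```

With this defined, run_gemini comp_hets --filter "(gene == 'LOXL3')" --use-bcolz Rare.db would invoke gemini comp_hets --filter "(gene == 'LOXL3')" Rare.db, while a run_gemini query ... --use-bcolz call is passed through unchanged. Setting GEMINI_BIN=echo prints the arguments instead of running gemini.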