ohlab/GRiD

How do I pass bowtie2 arguments through update_database?

jmartin77777 opened this issue · 6 comments

I'm trying to build a custom database for GRiD (based on the UHGG database from https://www.ebi.ac.uk/metagenomics/genomes), and the update_database command died after running for several hours with the message:

Error: Reference sequence has more than 2^32-1 characters! Please build a large
index by passing the --large-index option to bowtie2-build

Is there some way to pass options through to bowtie2-build when running update_database? Or is there a way to pre-build the bowtie2 indexes and give them to update_database so it can complete?
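Because the error is about total reference length, one quick check is to sum the sequence lengths of the input FASTA files and compare the total with the 2^32-1 limit quoted in the message. Below is a minimal Python sketch. The function names are hypothetical and not part of GRiD. It counts every sequence character, including Ns, but bowtie2-build may handle ambiguous stretches differently, so treat the result as an approximation.

```python
# Limit quoted in the bowtie2-build error message above
SMALL_INDEX_LIMIT = 2**32 - 1


def total_reference_chars(fasta_paths):
    """Sum sequence characters across FASTA files (headers and newlines excluded)."""
    total = 0
    for path in fasta_paths:
        with open(path) as handle:
            for line in handle:
                if not line.startswith(">"):
                    total += len(line.strip())
    return total


def needs_large_index(fasta_paths):
    """True if the combined input likely exceeds the small-index limit."""
    return total_reference_chars(fasta_paths) > SMALL_INDEX_LIMIT
```

With the layout described later in this thread, you could call it as `needs_large_index(sorted(Path("genomes").glob("*.fna")))` after `from pathlib import Path`.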

Hi,

One option is to edit line 155 of the "update_database" script. Please refer to a similar question posted earlier: #5 (comment)

Thanks.
Tunde

Following the advice from #5 (comment), I first tried removing the -q argument and got the same result. Then I kept -q removed, added --large-index, and again got the same result. I've also screened my input genome FASTA files for contigs consisting entirely of N and didn't find any. The command I'm running is:
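For reference, the bowtie2-build call being edited here can be assembled as an argument list. The sketch below is a hypothetical helper, not the actual code at line 155 of update_database. It uses only the bowtie2-build options mentioned in this thread (-q, --large-index) plus --threads, and it relies on bowtie2-build accepting a comma-separated list of FASTA files. This thread does not establish whether update_database can reuse an index built this way.

```python
def bowtie2_build_cmd(references, index_base, large_index=False, quiet=False, threads=1):
    """Return an argv list for bowtie2-build (hypothetical helper, not GRiD code)."""
    cmd = ["bowtie2-build"]
    if quiet:
        cmd.append("-q")  # quiet mode: suppress verbose build output
    if large_index:
        cmd.append("--large-index")  # force a large (64-bit) index
    if threads > 1:
        cmd += ["--threads", str(threads)]
    # bowtie2-build takes a comma-separated list of FASTA inputs
    if isinstance(references, (list, tuple)):
        references = ",".join(references)
    cmd += [references, index_base]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with long file lists.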

update_database -d database -g genomes -p UHGG_grid_db

Each genome is in a separate FASTA file under 'genomes', ending in .fna. I've attached a text file of the log captured by our job manager. Is there anything else I should check, or any other modifications worth trying?
GRiD_update_database_errorlog.txt

Note that when I try a subset of my genomes, update_database does seem to run. So this points to something wrong in my input. I had checked for zero-length contigs and contigs made entirely of ambiguous bases and found none, but maybe some other input problem can trip up bowtie2-build?
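A screen like the one described can be scripted across all genome files at once. Below is a minimal Python sketch, with hypothetical function names. It flags zero-length contigs and contigs with no A/C/G/T bases, which covers all-N and other IUPAC-only sequences. It also flags contig IDs that repeat across files. That last check is an extra assumption: nothing in this thread shows that duplicate IDs cause this error.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGTacgt")


def read_fasta(path):
    """Yield (contig_id, sequence) pairs; contig_id is the first word of the header."""
    contig_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if contig_id is not None:
                    yield contig_id, "".join(chunks)
                words = line[1:].split()
                contig_id = words[0] if words else ""
                chunks = []
            elif line:
                chunks.append(line)
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def screen_fastas(paths):
    """Return (file, contig_id, problem) tuples for suspicious contigs."""
    problems = []
    seen = Counter()
    for path in paths:
        for contig_id, seq in read_fasta(path):
            seen[contig_id] += 1
            if not seq:
                problems.append((str(path), contig_id, "zero-length"))
            elif not any(base in UNAMBIGUOUS for base in seq):
                problems.append((str(path), contig_id, "no A/C/G/T bases"))
    # Extra, assumed check: the same contig ID appearing more than once
    for contig_id, count in seen.items():
        if count > 1:
            problems.append(("*", contig_id, f"ID used {count} times"))
    return problems
```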

I suspect this may be a memory issue. Please request more memory and see whether the problem persists.
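To check how much memory the run actually reaches before requesting more, you can record the peak resident memory of the child process. Below is a minimal, Unix-only Python sketch with a hypothetical helper name. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS. With `RUSAGE_CHILDREN` it reflects the largest of all waited-for children, not only the last one.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion; return (returncode, peak child RSS in MiB).

    Unix-only. The peak is the maximum over all terminated, waited-for
    children of this Python process, not just this command.
    """
    result = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return result.returncode, usage.ru_maxrss / divisor
```

For example, `run_and_report_peak_memory(["update_database", "-d", "database", "-g", "genomes", "-p", "UHGG_grid_db"])`. A negative return code such as -9 means the process was killed by that signal. SIGKILL is what the Linux OOM killer sends, though a job manager may terminate jobs differently.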

Thanks
Tunde