gamcil/cblaster

Error using makedb with gff and fa files

Opened this issue · 5 comments

Hi, I'm having trouble using makedb to create my own database. I have a directory with the .fa and .gff files for each genome and run the following:

cblaster makedb /home/rlhoover/cblaster/03_Chosen_GFF-Fasta/*.gff -n myGallDb -f
Importing genomicsqlite failed, falling back to SQLite3
[12:57:34] INFO - Starting makedb module
[12:57:34] INFO - Initialising cblaster SQLite3 database to myGallDb.sqlite3
[12:57:34] INFO - Parsing 104 genome files, in 1 batches of 104
[12:57:34] INFO - Processing batch 1
[12:57:34] INFO -   Ca_Houarnoksidobacter_IN7.gff
[12:57:34] INFO -   Ferrigenium_9BH_112.gff
[12:57:34] INFO -   Ferrigenium_An22.gff

It goes through all the .gff files in my directory then ends with:

[12:57:40] ERROR - File parsing failed, exiting...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 234, in parse_file
    for record in function()
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 165, in parse_gff
    regions = find_regions(gff.directives)
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 103, in find_regions
    _, accession, start, end = directive.split(" ")
ValueError: not enough values to unpack (expected 4, got 2)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/database.py", line 216, in makedb
    for organism in pool.imap(func, group):
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
ValueError: not enough values to unpack (expected 4, got 2)

I've included one of my gff and fa files for reference.
Sample-gff-fa.zip

Hi @Rene-Hoover, it seems cblaster was tripping up because it expects ##sequence-region lines that resemble:

##sequence-region ctg123 1 1497228

but the ones in your file lack the coordinates. I added a check to get around this in v1.13.15 (available from pip now) which skips these lines. I can now create a database using your files with the command:

cblaster makedb -n myDb ~/Downloads/Sample-gff-fa/Sideroxydans_ES1.gff
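For readers hitting the same ValueError, here is a minimal sketch of the kind of check described above. It is not cblaster's actual code: the function name `parse_sequence_regions` and its return shape are hypothetical, and it assumes each directive arrives as a whitespace-separated string, with or without the leading `##`.

```python
def parse_sequence_regions(directives):
    """Collect (start, end) per accession, skipping malformed directives.

    Hypothetical helper for illustration; not part of cblaster or gffutils.
    """
    regions = {}
    for directive in directives:
        fields = directive.split()
        # A usable line looks like: sequence-region <accession> <start> <end>
        if len(fields) != 4 or fields[0].lstrip("#") != "sequence-region":
            continue  # e.g. "sequence-region ctg123" with no coordinates
        _, accession, start, end = fields
        try:
            regions[accession] = (int(start), int(end))
        except ValueError:
            continue  # coordinates present but not integers
    return regions
```

Unpacking only after checking the field count avoids the "expected 4, got 2" failure seen in the traceback, at the cost of silently ignoring regions that lack coordinates.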

Thanks @gamcil I appreciate the update. It fixed the issue for most of my files, but when I run the makedb command I get the following error for a small number of my files:

[09:30:54] ERROR - File parsing failed, exiting...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 238, in parse_file
    for record in function()
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 167, in parse_gff
    sort_attribute_values=True
  File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 1405, in create_db
    c.create()
  File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 543, in create
    self._populate_from_lines(self.iterator)
  File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 622, in _populate_from_lines
    self._insert(f, c)
  File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 566, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/database.py", line 216, in makedb
    for organism in pool.imap(func, group):
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.

I suspect it may be another issue with the gff format, but I'm not sure. Attached are 3 of the files that trigger the error.
Sample-gff-fa-2.zip

Update: I tested a few of the files that appeared problematic individually and cblaster created a local database for them. So, I'm wondering if it's the number of genomes I'm using for my local database. I have 104 genomes total. If I use makedb either by folder (genomes/*.gff) or with a list of all 104 file names I get the error I posted yesterday. However, if I use makedb on a subset of the genomes (<20) it appears to work fine regardless of which .gff files I tell it to use.
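To narrow down where the full run breaks, one option is to test the files in small groups. The sketch below only builds command strings and runs nothing. It uses the `cblaster makedb <files> -n <name>` form already shown in this thread; the helper names, the `subset` prefix, and the group size of 20 are assumptions for illustration. Each command would create its own separate database, so this is a diagnostic aid, not a way to build the final 104-genome database.

```python
import shlex


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def makedb_commands(gff_paths, size=20, prefix="subset"):
    """Build one makedb command string per group of GFF files.

    Hypothetical helper; the database names (subset1, subset2, ...) are made up.
    """
    commands = []
    for index, batch in enumerate(chunked(sorted(gff_paths), size), start=1):
        files = " ".join(shlex.quote(path) for path in batch)
        commands.append(f"cblaster makedb {files} -n {prefix}{index}")
    return commands
```

Calling `makedb_commands(glob.glob("genomes/*.gff"))` and running the printed commands one at a time would show whether any particular group fails on its own.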

Hi @gamcil and @Rene-Hoover, I need your help with the cblaster makedb command. Despite my best efforts, I have been unable to figure out how to use it. When I give it gbk input, cblaster does not create a .dmnd file; it only creates the fasta and sqlite3 files. When I give it gff input, it creates all 3 files, but searching the database then gives an error like this:

cblaster search -m local -db ps_db.dmnd -qf ~/Neelam/output12type.fasta
[11:44:18] INFO - Starting cblaster in local mode
[11:44:18] ERROR - Error: Incomplete database file. Database building did not complete successfully.

Thank you

Hi @neelam19051 I had trouble using the makedb command too, but it seemed to be because of the way my gff files were formatted. I don't think I tried making a database with gbk files. Hopefully, someone else will chime in and be able to assist you. It seems like the search isn't working because your database file didn't build properly, but I'm really not sure what the solution would be.