gamcil/cblaster

AttributeError: 'NoneType' object has no attribute 'group'


yye88 commented

Hello, dear author. I was recently running a regular cblaster search when I encountered the following error:
[16:18:05] INFO - Search has completed successfully!
[16:18:05] INFO - Retrieving results for search 6BPHW1VJ016
Traceback (most recent call last):
  File "/home/yye/miniconda3/bin/cblaster", line 8, in <module>
    sys.exit(main())
  File "/home/yye/miniconda3/lib/python3.9/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/yye/miniconda3/lib/python3.9/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
  File "/home/yye/miniconda3/lib/python3.9/site-packages/cblaster/remote.py", line 371, in search
    results = retrieve(rid, hitlist_size=hitlist_size)
  File "/home/yye/miniconda3/lib/python3.9/site-packages/cblaster/remote.py", line 208, in retrieve
    for line in re.search("<PRE>(.+?)</PRE>", response.text, re.DOTALL)
AttributeError: 'NoneType' object has no attribute 'group'

Could you guide me on how to solve it? Thank you very much.

I'm also encountering this same problem.

[11:10:46] INFO - Retrieving results for search 0SB0MXJK013
Traceback (most recent call last):
  File "/home/brian/anaconda3/envs/clinker_env/bin/cblaster", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/brian/anaconda3/envs/clinker_env/lib/python3.11/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/brian/anaconda3/envs/clinker_env/lib/python3.11/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
                   ^^^^^^^^^^^^^^
  File "/home/brian/anaconda3/envs/clinker_env/lib/python3.11/site-packages/cblaster/remote.py", line 371, in search
    results = retrieve(rid, hitlist_size=hitlist_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brian/anaconda3/envs/clinker_env/lib/python3.11/site-packages/cblaster/remote.py", line 209, in retrieve
    .group(1)
     ^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

Hej!
I'm still running into this issue in version 1.3.18 when running cblaster search.

from the log:

[10:56:00] INFO - Checking search status...
[10:56:00] INFO - Search has completed successfully!
[10:56:00] INFO - Retrieving results for search 44BBGM0G013
Traceback (most recent call last):
  File "/home/steffk1/.conda/envs/cblaster/bin/cblaster", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/steffk1/.conda/envs/cblaster/lib/python3.12/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/steffk1/.conda/envs/cblaster/lib/python3.12/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
                   ^^^^^^^^^^^^^^
  File "/home/steffk1/.conda/envs/cblaster/lib/python3.12/site-packages/cblaster/remote.py", line 371, in search
    results = retrieve(rid, hitlist_size=hitlist_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/steffk1/.conda/envs/cblaster/lib/python3.12/site-packages/cblaster/remote.py", line 209, in retrieve
    .group(1)
     ^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
$ cblaster --version
Importing genomicsqlite failed, falling back to SQLite3
cblaster 1.3.18

And I would like to add one more quirk to the issue: I ran it erroneously first with one more gene in the BGC and didn't encounter this issue. Once I removed the gene and re-ran, this issue appeared.
Cheers,
Karin

I'm also encountering this same problem.

[22:49:11] INFO - Starting cblaster in remote mode
[22:49:11] INFO - Launching new search
[22:49:13] INFO - Request Identifier (RID): 5KNASY6N013
[22:49:13] INFO - Request Time Of Execution (RTOE): 28s
[22:49:41] INFO - Polling NCBI for completion status
[22:49:41] INFO - Checking search status...
[22:50:41] INFO - Checking search status...
[22:51:41] INFO - Checking search status...
[22:52:41] INFO - Checking search status...
[22:53:41] INFO - Checking search status...
[22:54:41] INFO - Checking search status...
[22:55:41] INFO - Checking search status...
[22:56:41] INFO - Checking search status...
[22:57:41] INFO - Checking search status...
[22:58:41] INFO - Checking search status...
[22:59:41] INFO - Checking search status...
[23:00:41] INFO - Checking search status...
[23:01:41] INFO - Checking search status...
[23:02:41] INFO - Checking search status...
[23:03:41] INFO - Checking search status...
[23:04:41] INFO - Checking search status...
[23:05:41] INFO - Checking search status...
[23:06:41] INFO - Checking search status...
[23:07:41] INFO - Checking search status...
[23:08:41] INFO - Checking search status...
[23:09:41] INFO - Checking search status...
[23:10:41] INFO - Checking search status...
[23:10:42] INFO - Search has completed successfully!
[23:10:42] INFO - Retrieving results for search 5KNASY6N013
Traceback (most recent call last):
  File "/home/kys/anaconda3/envs/cblaster/bin/cblaster", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/kys/anaconda3/envs/cblaster/lib/python3.12/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/kys/anaconda3/envs/cblaster/lib/python3.12/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
                   ^^^^^^^^^^^^^^
  File "/home/kys/anaconda3/envs/cblaster/lib/python3.12/site-packages/cblaster/remote.py", line 371, in search
    results = retrieve(rid, hitlist_size=hitlist_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kys/anaconda3/envs/cblaster/lib/python3.12/site-packages/cblaster/remote.py", line 209, in retrieve
    .group(1)
     ^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

Hello, me too :( The trace is at the bottom. I think the issue lies with the HTML parsing after cblaster queries NCBI for the RID:

def retrieve(rid, hitlist_size=5000):
    """Retrieve BLAST results corresponding to a given Request Identifier (RID).

    Arguments:
        rid (str): NCBI BLAST search request identifiers (RID)
        hitlist_size (int): Total number of hits to retrieve
    Returns:
        list: BLAST search results split by newline, with HTML parts removed
    """

    parameters = {
        "CMD": "Get",
        "RID": rid,
        "FORMAT_TYPE": "Tabular",
        "FORMAT_OBJECT": "Alignment",
        "HITLIST_SIZE": hitlist_size,
        "ALIGNMENTS": hitlist_size,
        "DESCRIPTIONS": hitlist_size,
        "NCBI_GI": "F",
    }

    LOG.debug(parameters)

    response = requests.get(BLAST_API_URL, params=parameters)

    LOG.debug(response.url)

    # Remove HTML junk and info lines
    # BLAST results are stored inside <PRE></PRE> tags
    return [
        line
        for line in re.search("<PRE>(.+?)</PRE>", response.text, re.DOTALL)
        .group(1) #####This line############
        .split("\n")
        if line and not line.startswith("#")
    ]

Traceback:

(pip_env) C:\Users\my_username>cblaster search -qf "fasta\path.fasta" -mh 11 -b "binary\path.txt" -bde ","
Importing genomicsqlite failed, falling back to SQLite3
[14:15:44] INFO - Starting cblaster in remote mode
[14:15:44] INFO - Launching new search
[14:15:46] INFO - Request Identifier (RID): 78AW18TZ016
[14:15:46] INFO - Request Time Of Execution (RTOE): 10s
[14:15:56] INFO - Polling NCBI for completion status
[14:15:56] INFO - Checking search status...
[14:16:56] INFO - Checking search status...
[14:17:56] INFO - Checking search status...
[14:18:56] INFO - Checking search status...
[14:19:56] INFO - Checking search status...
[14:20:56] INFO - Checking search status...
[14:21:56] INFO - Checking search status...
[14:22:56] INFO - Checking search status...
[14:23:56] INFO - Checking search status...
[14:24:56] INFO - Checking search status...
[14:25:56] INFO - Checking search status...
[14:26:56] INFO - Checking search status...
[14:27:56] INFO - Checking search status...
[14:28:56] INFO - Checking search status...
[14:29:56] INFO - Checking search status...
[14:30:56] INFO - Checking search status...
[14:31:56] INFO - Checking search status...
[14:32:56] INFO - Checking search status...
[14:33:56] INFO - Checking search status...
[14:34:56] INFO - Checking search status...
[14:35:56] INFO - Checking search status...
[14:36:56] INFO - Checking search status...
[14:37:56] INFO - Checking search status...
[14:38:56] INFO - Checking search status...
[14:39:56] INFO - Checking search status...
[14:40:56] INFO - Checking search status...
[14:40:56] INFO - Search has completed successfully!
[14:40:56] INFO - Retrieving results for search 78AW18TZ016
Traceback (most recent call last):
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\Scripts\cblaster.exe\__main__.py", line 7, in <module>
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\site-packages\cblaster\main.py", line 432, in main
    cblaster(
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\site-packages\cblaster\main.py", line 318, in cblaster
    rid, results = remote.search(
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\site-packages\cblaster\remote.py", line 371, in search
    results = retrieve(rid, hitlist_size=hitlist_size)
  File "C:\Users\u03132tk\AppData\Local\anaconda3\envs\pip_env\lib\site-packages\cblaster\remote.py", line 209, in retrieve
    .group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I have the same error. The HTML returned by NCBI still says "searching" and therefore contains no <PRE> block, so the regex finds no match. However, looking up the RID in the BLAST web interface displays the error:

There was a problem with the search. Please, contact Help Desk and include RID 9XT95HJG013.

CPU usage limit was exceeded. You may need to change your search strategy. Helpful changes include reducing the number of queries, choosing a smaller database and setting an organism limit. You may also need to adjust the Algorithm parameters in the bottom section of the form. Choose a smaller number of target sequences, set a smaller Expect cut-off, use a larger word-size and turn on species specific repeats.

Please consider running your BLAST searches with ElasticBLAST or BLAST+.

I assume this means that our input files are too large, and that cblaster cannot process the response when BLAST itself errors out. Are there any workarounds for large .fasta/.gbk files that people know of?
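To make the failure mode described above easier to diagnose, here is a minimal sketch of a more defensive version of the parsing step. It is not cblaster's actual code: `extract_hits` is a hypothetical helper, and the `Status=` lookup assumes the response embeds a status field of that form, which may not hold for every page NCBI returns.

```python
import re

# Same pattern as the one used in retrieve(), precompiled.
PRE_BLOCK = re.compile(r"<PRE>(.+?)</PRE>", re.DOTALL)
# Assumption: the page may carry a "Status=..." field; if not, we report "unknown".
STATUS = re.compile(r"Status=(\w+)")


def extract_hits(html):
    """Return non-empty, non-comment lines from the <PRE> block.

    Raises RuntimeError with a readable message instead of the bare
    AttributeError when no complete <PRE>...</PRE> block is present.
    """
    match = PRE_BLOCK.search(html)
    if match is None:
        status = STATUS.search(html)
        detail = status.group(1) if status else "unknown"
        raise RuntimeError(
            f"No <PRE> block in BLAST response (status: {detail}); "
            "the search may have failed or may still be running on NCBI's side"
        )
    return [
        line
        for line in match.group(1).split("\n")
        if line and not line.startswith("#")
    ]
```

With something like this, a failed or unfinished search would at least produce a message pointing at the NCBI side rather than at `.group(1)`.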

I have the same error, and after doing some research I think the issue might be multifaceted. Regardless, the core problem is that the requests.get() call in the retrieve() function of remote.py does not receive the expected output, which can happen for various reasons.

In my case the BLAST search did complete, and I could view the results online using the RID. But when cblaster requests the BLAST output through the API (using requests.get()), the output is incomplete. As a result, there is no </PRE> tag in the response, and the re.search("<PRE>(.+?)</PRE>", response.text, re.DOTALL) call returns None. The direct reason the output is incomplete is that cblaster requests 5000 hits when retrieving the search result (as set by the hitlist_size parameter). When the gene cluster contains several genes, the total hit count can get really big (e.g. 35000 hits for 7 genes).
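The two situations reported in this thread (no <PRE> block at all versus an opening <PRE> without its closing tag) could be told apart with a small check like the one below. This is only an illustrative sketch; `classify_response` and its return labels are hypothetical names, not part of cblaster.

```python
def classify_response(text):
    """Roughly classify a BLAST result page by its <PRE> tags.

    "complete"   : both <PRE> and </PRE> are present
    "truncated"  : <PRE> opened but never closed (output cut short)
    "no_results" : no <PRE> block at all (e.g. search failed or still running)
    """
    has_open = "<PRE>" in text
    has_close = "</PRE>" in text
    if has_open and has_close:
        return "complete"
    if has_open:
        return "truncated"
    return "no_results"
```

A "truncated" result would point towards requesting fewer hits, while "no_results" suggests checking the RID in the BLAST web interface.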

I have not fully worked out whether this is a BLAST API issue or an issue with the requests package. Probably the former, since the output is still incomplete when I open the URL built by requests.get() in a browser. Maybe NCBI limits the amount of output returned for an API request? I am also not sure whether the 5000-hit default was chosen on purpose.

One workaround is to set the -hs parameter to a lower value, such as 3000. This works for me because the BLAST search itself had completed.
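The same idea could be automated by retrying the retrieval with progressively smaller hit list sizes. The sketch below assumes a caller-supplied `fetch(rid, hitlist_size)` callable that returns the response text; both that callable and `retrieve_with_fallback` are hypothetical, and the size sequence is arbitrary apart from the 5000 default and the 3000 value mentioned above.

```python
import re


def retrieve_with_fallback(fetch, rid, sizes=(5000, 3000, 1000)):
    """Try each hit list size in turn until a complete <PRE> block is returned.

    fetch: hypothetical callable (rid, hitlist_size) -> response text.
    Returns (size_used, result_lines); raises RuntimeError if every size fails.
    """
    for size in sizes:
        text = fetch(rid, size)
        match = re.search("<PRE>(.+?)</PRE>", text, re.DOTALL)
        if match:
            lines = [
                line
                for line in match.group(1).split("\n")
                if line and not line.startswith("#")
            ]
            return size, lines
    raise RuntimeError(
        f"No complete result block for RID {rid} with sizes {list(sizes)}"
    )
```

Note that this only helps when the search itself finished; if BLAST reported an error for the RID, a smaller hit list will not recover any results.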