RNAcentral/rnacentral-sequence-search

Network issue

carlosribas opened this issue · 0 comments

A network failure in the middle of a search can not only interrupt the search, but also leave a VM stuck even after the network comes back.

The message below shows a VM that could not connect to the database and therefore could not update the search status. In the database, this VM remained in the "busy" status until it was changed manually.

This is another example where transactions can help.

DEBUG:root:Nhmmer job chunk timeout out: job_id = f033ed19-42e4-46af-8985-691ef8ba8730, database = all-except-rrna-12.fasta
ERROR:asyncio:Job processing failed
job: <Job coro=<<coroutine object nhmmer at 0x7f9ddcee25f0>>>
Traceback (most recent call last):
  File "/srv/sequence_search/consumer/views/submit_job.py", line 68, in nhmmer
    await asyncio.wait_for(task, MAX_RUN_TIME)
  File "/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/sequence_search/db/job_chunks.py", line 124, in set_job_chunk_status
    async with engine.acquire() as connection:
  File "/usr/local/lib/python3.7/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiopg/sa/engine.py", line 165, in _acquire
    raw = await self._pool.acquire()
  File "/usr/local/lib/python3.7/site-packages/aiopg/pool.py", line 164, in _acquire
    await self._fill_free_pool(True)
  File "/usr/local/lib/python3.7/site-packages/aiopg/pool.py", line 199, in _fill_free_pool
    **self._conn_kwargs)
  File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 43, in connect
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 78, in __init__
    self._conn = psycopg2.connect(dsn, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "192.168.0.6", port 5432 failed: Network is unreachable
	Is the server running on that host and accepting TCP/IP connections?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/sequence_search/consumer/views/submit_job.py", line 81, in nhmmer
    await set_job_chunk_status(engine, job_id, database, status=JOB_CHUNK_STATUS_CHOICES.timeout)
  File "/srv/sequence_search/db/job_chunks.py", line 169, in set_job_chunk_status
    "set_job_chunk_status, job_id = %s, database = %s" % (job_id, database)) from e
sequence_search.db.DatabaseConnectionError: Failed to open connection to the database in set_job_chunk_status, job_id = f033ed19-42e4-46af-8985-691ef8ba8730, database = all-except-rrna-12.fasta
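
As a complement to transactions, one possible mitigation for the failure in the traceback above is to retry the status update with exponential backoff, so that a short network outage does not leave the record stale. The following is only a minimal sketch, not the project's code: `retry_status_update`, its parameters, and the local `DatabaseConnectionError` class are hypothetical stand-ins for illustration.

```python
import asyncio


class DatabaseConnectionError(Exception):
    """Hypothetical stand-in for sequence_search.db.DatabaseConnectionError."""


async def retry_status_update(update, attempts=5, base_delay=1.0, sleep=asyncio.sleep):
    """Await `update()` until it succeeds or `attempts` tries have failed.

    `update` is a zero-argument coroutine function that writes the status.
    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    Returns True on success and False if every try raised
    DatabaseConnectionError. `sleep` is injectable to make testing easy.
    """
    for attempt in range(attempts):
        try:
            await update()
            return True
        except DatabaseConnectionError:
            if attempt == attempts - 1:
                # Out of tries: report failure so the caller can log or alert.
                return False
            await sleep(base_delay * 2 ** attempt)
    return False
```

In the `nhmmer` handler, the existing call could be wrapped as `await retry_status_update(lambda: set_job_chunk_status(engine, job_id, database, status=JOB_CHUNK_STATUS_CHOICES.timeout))`. If this returns `False`, the status in the database is still stale, so a separate cleanup of long-running "busy" entries would still be needed.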