Network issue
carlosribas opened this issue · 0 comments
A network problem in the middle of a search can not only interrupt the search, but also leave a VM stuck even after the network comes back.
The message below shows a VM that could not connect to the database and therefore could not update the search status. In the database, this VM remained in the "busy" status until it was changed manually.
This is another example where transactions can help.
DEBUG:root:Nhmmer job chunk timeout out: job_id = f033ed19-42e4-46af-8985-691ef8ba8730, database = all-except-rrna-12.fasta
ERROR:asyncio:Job processing failed
job: <Job coro=<<coroutine object nhmmer at 0x7f9ddcee25f0>>>
Traceback (most recent call last):
File "/srv/sequence_search/consumer/views/submit_job.py", line 68, in nhmmer
await asyncio.wait_for(task, MAX_RUN_TIME)
File "/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/srv/sequence_search/db/job_chunks.py", line 124, in set_job_chunk_status
async with engine.acquire() as connection:
File "/usr/local/lib/python3.7/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.7/site-packages/aiopg/sa/engine.py", line 165, in _acquire
raw = await self._pool.acquire()
File "/usr/local/lib/python3.7/site-packages/aiopg/pool.py", line 164, in _acquire
await self._fill_free_pool(True)
File "/usr/local/lib/python3.7/site-packages/aiopg/pool.py", line 199, in _fill_free_pool
**self._conn_kwargs)
File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 43, in connect
**kwargs
File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 78, in __init__
self._conn = psycopg2.connect(dsn, **kwargs)
File "/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "192.168.0.6", port 5432 failed: Network is unreachable
Is the server running on that host and accepting TCP/IP connections?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/srv/sequence_search/consumer/views/submit_job.py", line 81, in nhmmer
await set_job_chunk_status(engine, job_id, database, status=JOB_CHUNK_STATUS_CHOICES.timeout)
File "/srv/sequence_search/db/job_chunks.py", line 169, in set_job_chunk_status
"set_job_chunk_status, job_id = %s, database = %s" % (job_id, database)) from e
sequence_search.db.DatabaseConnectionError: Failed to open connection to the database in set_job_chunk_status, job_id = f033ed19-42e4-46af-8985-691ef8ba8730, database = all-except-rrna-12.fasta