insitro/redun

Database connection error after completing very long task

(I'm on redun 0.8.6, so apologies if this is fixed in a more recent version.)

After a task that took 27 hours to complete, redun fails with psycopg2.OperationalError: SSL connection has been closed unexpectedly when it tries to record the task's result in the database.

Presumably the database connection was closed while it sat idle for that long. It would help if redun detected a closed connection and reconnected, or refreshed the connection periodically. There may also be SQLAlchemy engine arguments that accomplish this.
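A minimal sketch of the kind of engine arguments meant here, assuming you construct the SQLAlchemy engine yourself. make_engine is a hypothetical helper, not part of redun, and this does not assume redun 0.8.6 lets you pass these options through its configuration. pool_pre_ping and pool_recycle are standard create_engine arguments.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.engine import Engine


def make_engine(db_uri: str, recycle_seconds: int = 3600) -> Engine:
    """Hypothetical helper (not part of redun): build an engine that avoids
    reusing connections that went stale while idle."""
    return create_engine(
        db_uri,
        # Test each connection with a lightweight ping when it is checked out
        # of the pool, and replace it transparently if the ping fails.
        pool_pre_ping=True,
        # Replace pooled connections older than this many seconds at checkout
        # (3600 is an arbitrary example value).
        pool_recycle=recycle_seconds,
    )


# Usage with an in-memory SQLite database so the sketch runs anywhere; for the
# case in this issue db_uri would be the postgresql:// URI redun is using.
engine = make_engine("sqlite://")
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1
```

Caveat: both options only take effect when a connection is checked out of the pool. If redun's session keeps a transaction (and therefore a connection) open for the whole run, the stale connection is already checked out and neither option would prevent this error.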

Full traceback:

Traceback (most recent call last):
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1803, in _execute_context
    cursor, statement, parameters, context
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL connection has been closed unexpectedly

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "***/bin/redun", line 11, in <module>
    client.execute()
  File "***/lib/python3.7/site-packages/redun/cli.py", line 1021, in execute
    return args.func(args, extra_args, argv)
  File "***/lib/python3.7/site-packages/redun/cli.py", line 1558, in run_command
    tags=tags,
  File "***/lib/python3.7/site-packages/redun/scheduler.py", line 811, in run
    self.process_events(result)
  File "***/lib/python3.7/site-packages/redun/scheduler.py", line 855, in process_events
    event_func()
  File "***/lib/python3.7/site-packages/redun/scheduler.py", line 1215, in <lambda>
    self.events_queue.put(lambda: self._done_job(job, result, job_tags=job_tags))
  File "***/lib/python3.7/site-packages/redun/scheduler.py", line 1244, in _done_job
    self.set_cache(job.eval_hash, job.task.hash, job.args_hash, result)
  File "***/lib/python3.7/site-packages/redun/scheduler.py", line 1630, in set_cache
    self.backend.set_eval_cache(eval_hash, task_hash, args_hash, value, value_hash=None)
  File "***/lib/python3.7/site-packages/redun/backends/db/__init__.py", line 1696, in set_eval_cache
    value_hash = self.record_value(value)
  File "***/lib/python3.7/site-packages/redun/backends/db/__init__.py", line 1475, in record_value
    value_row = self.session.query(Value).filter_by(value_hash=value_hash).first()
  File "***/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 2810, in first
    return self.limit(1)._iter().first()
  File "***/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 2897, in _iter
    execution_options={"_sa_orm_load_options": self.load_options},
  File "***/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1692, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "***/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 326, in _execute_on_connection
    self, multiparams, params, execution_options
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1491, in _execute_clauseelement
    cache_hit=cache_hit,
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    e, statement, parameters, cursor, context
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2027, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "***/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1803, in _execute_context
    cursor, statement, parameters, context
  File "***/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly

[SQL: SELECT value.value_hash AS value_value_hash, value.type AS value_type, value.format AS value_format, value.value AS value_value
FROM value
WHERE value.value_hash = %(value_hash_1)s
 LIMIT %(param_1)s]
[parameters: {'value_hash_1': '8ca852f0d79ba0485e9ac2c9c175853adf296425', 'param_1': 1}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
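The traceback shows the failure happens on the first query redun issues after the job finishes (set_cache → set_eval_cache → record_value). One possible approach is to retry such a call once after a disconnect. A minimal sketch, assuming a hypothetical helper call_with_reconnect that is not part of redun; the exception types and the reset action are parameters so the sketch runs without a database.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    fn: Callable[[], T],
    reset: Callable[[], None],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 2,
    delay: float = 0.0,
) -> T:
    """Hypothetical helper (not part of redun): call fn(); if it raises one of
    retry_on, call reset() and try again, up to `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            reset()  # e.g. discard the broken connection/transaction
            if delay:
                time.sleep(delay)
    raise AssertionError("unreachable")
```

With SQLAlchemy, retry_on would be (sqlalchemy.exc.OperationalError,) and reset would be session.rollback; rolling back discards the invalidated connection, so the next attempt checks out a fresh one. Retrying is only safe if fn redoes everything in the rolled-back transaction, because any uncommitted writes are lost.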