Parallel Sherpa MongoDB access issues
djgagne opened this issue · 2 comments
djgagne commented
I have tried running the parallel simple.py and mnistmlp examples, but each time I get the following database connection error in the jobs/trial_*.out files.
2020-09-03 10:40:27.957058: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
warning in stationary: failed to import cython module: falling back to numpy
warning in coregionalize: failed to import cython module: falling back to numpy
warning in choleskies: failed to import cython module: falling back to numpy
Traceback (most recent call last):
  File "trial.py", line 79, in <module>
    trial = client.get_trial()
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/sherpa/database.py", line 222, in get_trial
    t = next(g)
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/sherpa/database.py", line 221, in <genexpr>
    g = (entry for entry in self.db.trials.find({'trial_id': trial_id}))
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/cursor.py", line 1207, in next
    if len(self.__data) or self._refresh():
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/cursor.py", line 1100, in _refresh
    self.__session = self.__collection.database.client._ensure_session()
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
    server_session = self._get_server_session()
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
    return self._topology.get_server_session()
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/topology.py", line 488, in get_server_session
    None)
  File "/glade/u/home/dgagne/miniconda3/envs/goes/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
    (self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: casper26:27001: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 5f511c8580fc9c3448b850b1, topology_type: Single, servers: [<ServerDescription ('casper26', 27001) server_type: Unknown, rtt: None, error=AutoReconnect('casper26:27001: [Errno 111] Connection refused')>]>
Any ideas on what may be going wrong? I installed MongoDB through conda. The main program also completes with no errors, but it prints no summary results at the end.
ggantos commented
Hello, I would like to second that I am having the same issue. Any help would be appreciated. Thanks!
2020-09-04 09:09:25.204755: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "train_conv2d_zdist_sherpa_parallel.py", line 33, in <module>
    trial = client.get_trial()
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/sherpa/database.py", line 222, in get_trial
    t = next(g)
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/sherpa/database.py", line 221, in <genexpr>
    g = (entry for entry in self.db.trials.find({'trial_id': trial_id}))
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/cursor.py", line 1207, in next
    if len(self.__data) or self._refresh():
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/cursor.py", line 1100, in _refresh
    self.__session = self.__collection.database.client._ensure_session()
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
    server_session = self._get_server_session()
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
    return self._topology.get_server_session()
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/topology.py", line 488, in get_server_session
    None)
  File "/glade/u/home/ggantos/miniconda3/envs/sherpa/lib/python3.6/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
    (self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: casper24:27001: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 5f5258b31496961c60064812, topology_type: Single, servers: [<ServerDescription ('casper24', 27001) server_type: Unknown, rtt: None, error=AutoReconnect('casper24:27001: [Errno 111] Connection refused',)>]>
bluevex commented
I got this error when something else was already using the port. Usually it was the sherpa MongoDB database instance left over from a previous run. I had to write a script to manually delete the 'sherpa' database from the previous run and kill any remaining mongo instances.
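A quick way to see whether something is already listening on the port before starting a new run is a plain TCP connection test. This is only a minimal sketch using the Python standard library; `port_is_open` is a hypothetical helper name, not part of sherpa or pymongo, and the host and port would have to be taken from your own error message.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; not part of sherpa or pymongo.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, calling `port_is_open("casper26", 27001)` before launching would return True if a leftover process still holds the port. Calling it from a trial node while the study is running would return False if nothing is reachable there, which would match the "Connection refused" in the tracebacks above.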