morris-lab/CellOracle

can't load base_GRN mouse scATAC-seq demo data

Closed this issue · 7 comments

I am trying to load the base GRN using base_GRN = co.data.load_mouse_scATAC_atlas_base_GRN() as described in this tutorial, but I get NameError: name 'logg' is not defined. When I looked through the data_download_from_web.py file, I found that the link git_url = "https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data" gives a 404 error when I open it in a browser.

@erencshin

I just tried the co.data.load_mouse_scATAC_atlas_base_GRN() function, and it worked without any problem.
Also, all CellOracle functions used in our tutorial are tested at every update and every week using GitHub Actions, and they have passed so far. I re-ran the tests just now to be sure, and they also finished without problems. https://github.com/morris-lab/CellOracle/actions/runs/3922667680

data_download_from_web.py is not called directly in the tutorial. The URL is meant to be joined with a file name, as in os.path.join(git_url, file), so it is expected that you get a 404 error if you open the git_url address on its own.
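
To illustrate why the bare git_url returns 404: it is only a directory prefix, and a downloadable URL exists only once a file name is appended. A minimal sketch, assuming a hypothetical file name (the real name used by load_mouse_scATAC_atlas_base_GRN() may differ):

```python
import posixpath

# Base URL quoted in this thread; on its own it points at a directory, not a file.
git_url = "https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data"

# Hypothetical file name, for illustration only.
file = "example_base_GRN.parquet"

# posixpath.join always uses "/", which is what a URL needs
# (os.path.join would use "\\" on Windows).
full_url = posixpath.join(git_url, file)
print(full_url)
```

Only the joined address refers to an actual file; opening git_url by itself is expected to fail.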

My guess is that your internet connection had a problem when you ran the CellOracle function.
Could you please check your internet connection and try again?

@KenjiKamimoto-wustl122
My internet is working fine, but maybe the issue is that I'm running it on the cluster provided by my institution. Here is my full error log:

Data not found in the local folder. Loading data from github. Data will be saved at /home/erenshin/celloracle_data/TFinfo_data

gaierror Traceback (most recent call last)
File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:1354, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1353 try:
-> 1354 h.request(req.get_method(), req.selector, req.data, headers,
1355 encode_chunked=req.has_header('Transfer-encoding'))
1356 except OSError as err: # timeout error

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:1256, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1255 """Send a complete request to the server."""
-> 1256 self._send_request(method, url, body, headers, encode_chunked)

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:1302, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1301 body = _encode(body, 'body')
-> 1302 self.endheaders(body, encode_chunked=encode_chunked)

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:1251, in HTTPConnection.endheaders(self, message_body, encode_chunked)
1250 raise CannotSendHeader()
-> 1251 self._send_output(message_body, encode_chunked=encode_chunked)

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:1011, in HTTPConnection._send_output(self, message_body, encode_chunked)
1010 del self._buffer[:]
-> 1011 self.send(msg)
1013 if message_body is not None:
1014
1015 # create a consistent interface to message_body

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:951, in HTTPConnection.send(self, data)
950 if self.auto_open:
--> 951 self.connect()
952 else:

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:1418, in HTTPSConnection.connect(self)
1416 "Connect to a host on a given (SSL) port."
-> 1418 super().connect()
1420 if self._tunnel_host:

File ~/data/.conda/envs/cello/lib/python3.8/http/client.py:922, in HTTPConnection.connect(self)
921 """Connect to the host and port specified in init."""
--> 922 self.sock = self._create_connection(
923 (self.host,self.port), self.timeout, self.source_address)
924 self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

File ~/data/.conda/envs/cello/lib/python3.8/socket.py:787, in create_connection(address, timeout, source_address)
786 err = None
--> 787 for res in getaddrinfo(host, port, 0, SOCK_STREAM):
788 af, socktype, proto, canonname, sa = res

File ~/data/.conda/envs/cello/lib/python3.8/socket.py:918, in getaddrinfo(host, port, family, type, proto, flags)
917 addrlist = []
--> 918 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
919 af, socktype, proto, canonname, sa = res

gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)
File ~/data/.conda/envs/cello/lib/python3.8/site-packages/celloracle/utility/data_download_from_web.py:39, in _download(path, url)
38 try:
---> 39 open_url = urlopen(req)
40 except URLError:

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:222, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 opener = _opener
--> 222 return opener.open(url, data, timeout)

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
524 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 525 response = self._open(req, data)
527 # post-process response

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:542, in OpenerDirector._open(self, req, data)
541 protocol = req.type
--> 542 result = self._call_chain(self.handle_open, protocol, protocol +
543 '_open', req)
544 if result:

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:502, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:1397, in HTTPSHandler.https_open(self, req)
1396 def https_open(self, req):
-> 1397 return self.do_open(http.client.HTTPSConnection, req,
1398 context=self._context, check_hostname=self._check_hostname)

File ~/data/.conda/envs/cello/lib/python3.8/urllib/request.py:1357, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1356 except OSError as err: # timeout error
-> 1357 raise URLError(err)
1358 r = h.getresponse()

URLError: <urlopen error [Errno -2] Name or service not known>

During handling of the above exception, another exception occurred:

NameError Traceback (most recent call last)
Cell In[15], line 1
----> 1 co.data.load_mouse_scATAC_atlas_base_GRN()

File ~/data/.conda/envs/cello/lib/python3.8/site-packages/celloracle/data/load_data.py:54, in load_mouse_scATAC_atlas_base_GRN(version, force_download)
52 path = os.path.join(CELLORACLE_DATA_DIR, filename)
53 backup_url = os.path.join(WEB_PAR_DIR, filename)
---> 54 download_data_if_data_not_exist(path=path, backup_url=backup_url)
56 return pd.read_parquet(path)

File ~/data/.conda/envs/cello/lib/python3.8/site-packages/celloracle/utility/data_download_from_web.py:15, in download_data_if_data_not_exist(path, backup_url)
13 path = Path(path)
14 if not path.is_file():
---> 15 _download(url=backup_url, path=path)

File ~/data/.conda/envs/cello/lib/python3.8/site-packages/celloracle/utility/data_download_from_web.py:41, in _download(path, url)
39 open_url = urlopen(req)
40 except URLError:
---> 41 logg.warning(
42 'Failed to open the url with default certificates, trying with certifi.'
43 )
45 from certifi import where
46 from ssl import create_default_context

NameError: name 'logg' is not defined
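
Reading the log from the bottom up: the DNS lookup fails (gaierror), urllib wraps that in a URLError, and the fallback handler in _download then raises NameError because logg is not defined there, so the NameError hides the real network error. A minimal sketch reproducing that chain; the functions fetch and download are hypothetical stand-ins, not CellOracle code:

```python
import socket
from urllib.error import URLError


def fetch(url):
    # Hypothetical stand-in for urlopen(): simulate the DNS failure from the log.
    raise URLError(socket.gaierror(-2, "Name or service not known"))


def download(url):
    try:
        fetch(url)
    except URLError:
        # Mirrors the failing line: 'logg' was never defined, so this raises
        # NameError while the URLError is still being handled.
        logg.warning("Failed to open the url ...")  # noqa: F821


try:
    download("https://example.invalid/file")
except NameError as err:
    # Python keeps the exception that was being handled on __context__,
    # which is what prints as "During handling of the above exception...".
    original = err.__context__
    print(type(err).__name__, "<-", type(original).__name__, "<-",
          type(original.reason).__name__)
```

The root cause is therefore the gaierror at the bottom of the chain, not the NameError at the top.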

@erencshin

The error message shows that the problem is due to the network (the host name could not be resolved).
Can you please try the following code to check your network status on the cluster?
If it works (it prints 200), please try the CellOracle data loading function again.

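```python
from urllib.request import urlopen, Request

# Try to reach GitHub's API endpoint from this machine.
url = 'https://api.github.com'
req = Request(url)
open_url = urlopen(req)
# An HTTP status code of 200 means the connection works.
print(open_url.code)
```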
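
Because "Name or service not known" is a DNS failure, it can help to check name resolution separately from the HTTP request, and to see whether any proxy is configured for the process. A minimal sketch; the host list is an assumption based on the URLs in this thread, and report() is a hypothetical helper:

```python
import socket
from urllib.request import getproxies

# Assumption: these are the hosts the checks and downloads in this thread contact.
HOSTS = ["api.github.com", "raw.githubusercontent.com"]


def can_resolve(host):
    """Return True if DNS resolution succeeds for host on port 443."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        # Same error class as "[Errno -2] Name or service not known" in the log.
        return False


def report(hosts=HOSTS):
    """Print DNS status for each host and any proxy settings urllib would use."""
    for host in hosts:
        print(host, "resolves" if can_resolve(host) else "does NOT resolve")
    # urllib reads proxy settings from environment variables such as https_proxy;
    # an empty dict means none are configured for this process.
    print("proxies:", getproxies())
```

Calling report() on the cluster shows whether the failure is at the DNS step; if the hosts do not resolve, the node most likely has no outside network access.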

@KenjiKamimoto-wustl122

That doesn't work either (URLError: <urlopen error [Errno -2] Name or service not known>); it must be because I'm running it on the cluster.

@erencshin

I see.
In that case, please download the whole repository (including all data) from GitHub and reinstall CellOracle from source as follows.

  1. Download the CellOracle GitHub repository.
    In a terminal, please run:
    git clone https://github.com/morris-lab/CellOracle.git
    If you cannot use the git command on the cluster, please run it on another PC and copy the folder to the cluster.

  2. Go to the downloaded repository folder and install CellOracle from source as follows.

```bash
cd CellOracle
pip install -e .
```

If you install CellOracle directly from the GitHub repository source, all the data should already be there.
Please don't delete the repository folder after installation, because an editable install (pip install -e) runs CellOracle from that folder.
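
A small check of where Python will import celloracle from; after an editable install it should point inside the cloned CellOracle folder. A minimal sketch using only the standard library; package_location is a hypothetical helper:

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


# Prints None if celloracle is not installed in the current environment.
print("celloracle imported from:", package_location("celloracle"))
```

If the printed path is inside the CellOracle repository folder, the bundled data will be found locally instead of being downloaded.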

@KenjiKamimoto-wustl122

That worked perfectly, thank you so much!

I'm glad it works. I'm closing this issue.