Downloading datasets behind network proxies fails due to timeout errors
ashahba commented
For users behind network proxies, the following example from the main README.md fails due to timeout errors:
```
$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key:
2024-01-12 06:45:08,854 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1a5408e490>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/datasets/download/tunguz/us-elections-dataset?datasetVersionNumber=None
```
However, if the `KAGGLE_PROXY` environment variable is set properly, the example also works for users behind a network proxy. Here's the code snippet that makes this work:
```python
import os

# Reuse the system HTTPS proxy for Kaggle API requests
# (the lowercase variable takes precedence over the uppercase one).
if 'https_proxy' in os.environ:
    os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
elif 'HTTPS_PROXY' in os.environ:
    os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
else:
    # No proxy configured: leave KAGGLE_PROXY empty.
    os.environ['KAGGLE_PROXY'] = ''

import opendatasets as od

dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)
```
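The same lookup can also be wrapped in a small function so it is easier to reuse and test. This is a minimal sketch, not part of opendatasets or the Kaggle API: the name `resolve_kaggle_proxy` is hypothetical, and unlike the snippet above it keeps an already-set `KAGGLE_PROXY` and leaves the variable unset (rather than empty) when no proxy is found.

```python
import os
from typing import Mapping, Optional


def resolve_kaggle_proxy(environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: pick the proxy URL to use as KAGGLE_PROXY.

    An explicitly set KAGGLE_PROXY wins; otherwise lowercase https_proxy
    takes precedence over HTTPS_PROXY. Returns None if nothing is set.
    """
    if "KAGGLE_PROXY" in environ:
        return environ["KAGGLE_PROXY"]
    for name in ("https_proxy", "HTTPS_PROXY"):
        if name in environ:
            return environ[name]
    return None


# Only export KAGGLE_PROXY when a non-empty proxy was actually found.
proxy = resolve_kaggle_proxy(os.environ)
if proxy:
    os.environ["KAGGLE_PROXY"] = proxy
```

Taking a plain mapping instead of reading `os.environ` directly keeps the precedence rules testable without touching the real environment.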
And here's a sample run behind a network proxy:
```
python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> if 'https_proxy' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
... elif 'HTTPS_PROXY' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
... else:
...     os.environ['KAGGLE_PROXY'] = ''
...
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key:
Downloading us-elections-dataset.zip to ./us-elections-dataset
  0%|          | 0.00/133k [00:00<?, ?B/s]
100%|████████████████████████████████████████████████████████████████| 133k/133k [00:00<00:00, 6.49MB/s]
```
I was planning to submit a PR to fix this issue, but I see that this repo was last updated over 2 years ago.
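For reference, one shape such a fix could take is to scope the proxy setting to the download call. The sketch below is hypothetical: the context manager `kaggle_proxy_from_https_proxy` is not part of opendatasets, and it assumes the Kaggle client picks up `KAGGLE_PROXY` from the environment while the download runs. If the client caches its configuration on first import, restoring the variable on exit will not undo the proxy for the rest of the process.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def kaggle_proxy_from_https_proxy() -> Iterator[Optional[str]]:
    """Hypothetical helper: temporarily set KAGGLE_PROXY from https_proxy/HTTPS_PROXY.

    An existing KAGGLE_PROXY is kept. The previous value (or absence) of
    KAGGLE_PROXY is restored when the block exits.
    """
    previous = os.environ.get("KAGGLE_PROXY")
    proxy = previous or os.environ.get("https_proxy") or os.environ.get("HTTPS_PROXY")
    if proxy:
        os.environ["KAGGLE_PROXY"] = proxy
    try:
        yield proxy
    finally:
        # Put the environment back the way we found it.
        if previous is None:
            os.environ.pop("KAGGLE_PROXY", None)
        else:
            os.environ["KAGGLE_PROXY"] = previous
```

Usage would then be `with kaggle_proxy_from_https_proxy(): od.download(dataset_url)`, so callers behind a proxy would not need to set `KAGGLE_PROXY` themselves.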