RUCAIBox/TextBox

Cannot connect to the huggingface dataset after downloading and running

Closed this issue · 8 comments

Describe the bug
A MaxRetryError connection error is raised at runtime.

How to reproduce
I pulled the entire repository to my local machine, then ran run_textbox.py with PyCharm (Python 3.10) and got the error.

Log
The output is as follows:
F:\文本生成模型TextBox\venv\Scripts\python.exe F:\文本生成模型TextBox\venv\TextBox\run_textbox.py
'wandb' is not recognized as an internal or external command,
operable program or batch file.
05 Oct 20:08 INFO 65 parameters found.

General Hyper Parameters:

gpu_id: 0
use_gpu: True
device: cpu
seed: 2020
reproducibility: True
cmd: F:\文本生成模型TextBox\venv\TextBox\run_textbox.py
filename: BART-samsum-2023-Oct-05_20-08-06
saved_dir: saved/
state: INFO
wandb: online

Training Hyper Parameters:

do_train: True
do_valid: True
optimizer: adamw
adafactor_kwargs: {'lr': 0.001, 'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
optimizer_kwargs: {}
valid_steps: 1
valid_strategy: epoch
stopping_steps: 2
epochs: 50
learning_rate: 3e-05
train_batch_size: 4
grad_clip: 0.1
accumulation_steps: 48
disable_tqdm: False
resume_training: True

Evaluation Hyper Parameters:

do_test: True
lower_evaluation: True
multiref_strategy: max
bleu_max_ngrams: 4
bleu_type: nltk
smoothing_function: 0
corpus_bleu: False
rouge_max_ngrams: 2
rouge_type: files2rouge
meteor_type: pycocoevalcap
chrf_type: m-popovic
distinct_max_ngrams: 4
inter_distinct: True
unique_max_ngrams: 4
self_bleu_max_ngrams: 4
tgt_lang: en
metrics: ['rouge']
eval_batch_size: 16
corpus_meteor: True

Model Hyper Parameters:

model: BART
model_name: bart
config_kwargs: {}
tokenizer_kwargs: {'use_fast': True}
generation_kwargs: {'num_beams': 5, 'no_repeat_ngram_size': 3, 'early_stopping': True}
efficient_kwargs: {}
efficient_methods: []
efficient_unfreeze_model: False
label_smoothing: 0.1

Dataset Hyper Parameters:

dataset: samsum
data_path: dataset/samsum
tgt_lang: en
src_len: 1024
tgt_len: 128
truncate: tail
metrics_for_best_model: ['rouge-1', 'rouge-2', 'rouge-l']
prefix_prompt: Summarize:

Unrecognized Hyper Parameters:

tokenizer_add_tokens: []
find_unused_parameters: False
load_type: from_scratch

================================================================================
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')' thrown while requesting HEAD https://huggingface.co/None/resolve/main/tokenizer_config.json
05 Oct 20:08 WARNING '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')' thrown while requesting HEAD https://huggingface.co/None/resolve/main/tokenizer_config.json
Traceback (most recent call last):
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 203, in _new_conn
sock = connection.create_connection(
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
raise err
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [WinError 10061] 由于目标计算机积极拒绝,无法连接。

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 776, in urlopen
self._prepare_proxy(conn)
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 1041, in _prepare_proxy
conn.connect()
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 611, in connect
self.sock = sock = self._new_conn()
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 218, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "F:\文本生成模型TextBox\venv\TextBox\run_textbox.py", line 15, in
run_textbox(model=args.model, dataset=args.dataset, config_file_list=args.config_files, config_dict={})
File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\quick_start.py", line 20, in run_textbox
experiment = Experiment(model, dataset, config_file_list, config_dict)
File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\experiment.py", line 56, in init
self._init_data(self.get_config(), self.accelerator)
File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\experiment.py", line 81, in _init_data
tokenizer = get_tokenizer(config)
File "F:\文本生成模型TextBox\venv\TextBox\textbox\utils\utils.py", line 212, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, **tokenizer_kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 686, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 519, in get_tokenizer_config
resolved_config_file = cached_file(
File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\utils\hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 1232, in hf_hub_download
metadata = get_hf_file_metadata(
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 1599, in get_hf_file_metadata
r = _request_wrapper(
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 417, in _request_wrapper
response = _request_wrapper(
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 452, in _request_wrapper
return http_backoff(
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils_http.py", line 274, in http_backoff
raise err
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils_http.py", line 258, in http_backoff
response = session.request(method=method, url=url, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils_http.py", line 63, in send
return super().send(request, *args, **kwargs)
File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\adapters.py", line 513, in send
raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')

Process finished with exit code 1

This is a proxy-related issue. You can try configuring a proxy in your code, or download the model in advance and then use it locally.
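For reference, one common way to route the Hugging Face downloads through a proxy is to set the standard proxy environment variables before anything in transformers/huggingface_hub makes a request. This is a minimal sketch; the address 127.0.0.1:7890 is only a placeholder that you must replace with your actual proxy:

```python
import os

# Placeholder proxy address: replace with your own proxy host and port.
PROXY_URL = "http://127.0.0.1:7890"

# requests/urllib3 (which huggingface_hub uses under the hood) honor these
# environment variables, so set them before any model/tokenizer download,
# e.g. at the top of run_textbox.py.
os.environ["HTTP_PROXY"] = PROXY_URL
os.environ["HTTPS_PROXY"] = PROXY_URL
```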

How do I use a proxy in the code? T T

I'd suggest searching Baidu for it, or you can download the model first and then use it locally.

Thanks

https://github.com/RUCAIBox/TextBox#quick-start

Your run command is incorrect; please read the quick start carefully. (Note the /None/ in the requested URL https://huggingface.co/None/resolve/main/tokenizer_config.json: model_path was never set, which indicates the script was not launched with the expected command-line arguments.)

If I opened run_textbox.py in the PyCharm IDE, what do I need to change to make it equivalent to running it from the command line? (A sincere question from a beginner: I've studied this for a long time and still can't figure out how to imitate the cmd workflow inside the .py file. The model and dataset are easy enough, since I can just change the default values, but I really can't find where to change model_path. Also, does the config_files argument in run_textbox.py need to be changed? Are the config files already downloaded? (sob))

You can add it via config_dict={'model_path': 'xxx'}, but I still recommend learning how to run it from the command line.
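A minimal sketch of this suggestion, assuming you stay inside PyCharm (the checkpoint name 'facebook/bart-base' below is only an assumed example; substitute the model you actually want):

```python
# Sketch: pass model_path through config_dict instead of editing defaults.
# 'facebook/bart-base' is an assumed example checkpoint name.
config_dict = {"model_path": "facebook/bart-base"}

# Inside run_textbox.py, the call shown in the traceback above would become:
# run_textbox(model=args.model, dataset=args.dataset,
#             config_file_list=args.config_files, config_dict=config_dict)
```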