meta-llama/llama

Torch Error

Jufyer opened this issue · 0 comments

Hi,
I get an error when I try to run the model. My input in Anaconda is the following:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir LLamaa/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

This is the command from the GitHub page here. I think the important part of the error is the following:

 [W socket.cpp:697] [c10d] The client socket has failed to connect to [NB-KUHNLA]:29500 (system error: 10049 - Die angeforderte Adresse ist in diesem Kontext ungültig. [German for "The requested address is not valid in this context."]).
C:\Users\User\miniconda3\Lib\site-packages\torch\distributed\distributed_c10d.py:613: UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
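From what I understand, the 10049 error means torchrun is trying to rendezvous on the machine's hostname (NB-KUHNLA) and that address is not usable. Would pinning the rendezvous to loopback be the right fix on a single machine? This is just a sketch of what I mean; the address and port values are my assumptions:

    torchrun --nproc_per_node 1 \
        --master_addr 127.0.0.1 --master_port 29500 \
        example_chat_completion.py \
        --ckpt_dir LLamaa/ \
        --tokenizer_path tokenizer.model \
        --max_seq_len 512 --max_batch_size 6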

But the whole error can be found here: https://pastebin.com/emJyPEC2
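The second warning says NCCL support is not compiled in, which as far as I know is true for all Windows builds of PyTorch. Would switching the process-group backend from nccl to gloo be the right approach? Below is a minimal sketch of what I assume the change would look like; I have not checked where exactly the Llama code calls init_process_group:

    import torch.distributed as dist

    # torchrun already sets MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE in
    # the environment, so the default env:// rendezvous can be used here.
    # Assumption: the example currently passes backend="nccl", which the
    # Windows wheels of PyTorch do not ship with; gloo works on Windows/CPU.
    dist.init_process_group(backend="gloo")

    print(f"initialized rank {dist.get_rank()} of {dist.get_world_size()}")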

Runtime Environment

  • Model: llama-2-7b
  • Using via huggingface?: [yes/no]
  • OS: Windows
  • GPU VRAM:
  • Number of GPUs: 1
  • GPU Make: Intel

I would be thankful if you could help me.