juncongmoo/pyllama

Download 7B model seems stuck

guanlinz opened this issue · 9 comments

It has been stuck downloading the file for 3 hours and still hasn't finished:

(llamachat) [ec2-user@ip-172-31-6-66 llamachat]$ python -m llama.download --model_size 7B --folder ./pyllama_data/
❤️ Resume download is supported. You can ctrl-c and rerun the program to resume the downloading
Downloading tokenizer...
✅ ./pyllama_data//tokenizer.model
✅ ./pyllama_data//tokenizer_checklist.chk
tokenizer.model: OK
Downloading 7B
downloading file to ./pyllama_data//7B/consolidated.00.pth ...please wait for a few minutes ...

Hello, are you still stuck downloading the file? I'm also stuck downloading the 7B model.

Same here. I checked the bandwidth usage and confirmed that the script gets stuck while downloading the 7B model.
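
Besides watching bandwidth, another way to confirm a stall is to check whether the partially downloaded file on disk is still growing. This is a minimal sketch, assuming bash and that the downloader writes the partial file to the path printed in the log. `size_of` and `is_growing` are hypothetical helper names, not part of pyllama:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether a partially downloaded file is growing.
# Assumes the downloader writes the partial file to the path shown in the log.

# Print a file's size in bytes, or 0 if it does not exist yet.
size_of() {
  if [[ -f "$1" ]]; then
    wc -c < "$1" | tr -d ' '
  else
    echo 0
  fi
}

# Usage: is_growing FILE [INTERVAL_SECONDS]
# Returns 0 if FILE grew during the interval, 1 if it did not.
is_growing() {
  local file="$1" interval="${2:-30}"
  local before after
  before=$(size_of "$file")
  sleep "$interval"
  after=$(size_of "$file")
  if (( after > before )); then
    echo "growing: $before -> $after bytes"
    return 0
  fi
  echo "stalled at $after bytes"
  return 1
}
```

For example, `is_growing ./pyllama_data/7B/consolidated.00.pth 60`. If the size does not change for a minute or more, the run is probably stuck rather than just slow.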

skpig commented

Same here.

I'm stuck on 13B. Is it expected to finish in a few minutes?

I also ran into the same issue. Apparently, if I rerun the command, the download continues for about 4-5 minutes and then gets stuck again.
I haven't looked into the code for debugging yet, but for my purposes I just wrote a bash script that restarts the download process periodically, and it works for me.
Here's my sketchy solution :D (borrowed heavily from ChatGPT lol):

#!/bin/bash
# Function to handle stopping the script
function stop_script() {
  echo "Stopping the script."
  exit 0
}

# Register the signal handler for Ctrl-C
trap stop_script SIGINT

while true; do
  # Run the download with a timeout of 200 seconds; $1 is the model size (e.g. 7B)
  timeout 200 python -m llama.download --model_size "$1" --folder model

  echo "restart download"
  sleep 1  # Wait for 1 second before starting the next iteration
  # Press any key within this 1-second window to stop the loop
  read -t 1 -n 1 -s key
  if [[ $key ]]; then
    stop_script
  fi
done

Then run the script like so:

bash llama_download.sh 7B

I highly recommend downloading each model separately rather than all at once, because the script re-checks the checksums of previously downloaded models, which can take up the full 200 seconds of each iteration.

Hi @CuongTranXuan,

Thank you for sharing the shell script. I ran into the same issue and used your script to download the 7B weights. However, it seems this is also a never-ending loop. I keep running into the following:

❤️ Resume download is supported. You can ctrl-c and rerun the program to resume the downloading
Downloading tokenizer...
✅ model/tokenizer.model
✅ model/tokenizer_checklist.chk
tokenizer.model: OK
Downloading 7B
downloading file to model/7B/consolidated.00.pth ...please wait for a few minutes ...
✅ model/7B/consolidated.00.pth
✅ model/7B/params.json
✅ model/7B/checklist.chk
Checking checksums
consolidated.00.pth: OK
params.json: OK
restart download

I was wondering whether I should stop the script manually, but I'm not sure whether the download is complete. Do you by any chance know the file sizes of the weights you downloaded? Mine are the following:

checklist.chk -> 100 bytes
consolidated.00.pth -> 12852.61 MB
params.json -> 101 bytes
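
Rather than comparing file sizes by eye, the folder can be checked against its own `checklist.chk`. This is a minimal sketch, assuming GNU coreutils `md5sum` and that `checklist.chk` is in `md5sum` format (the `consolidated.00.pth: OK` lines in the log above look like `md5sum -c` output). `verify_model_dir` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a downloaded model folder against checklist.chk.
# Assumes checklist.chk is in md5sum format ("<hash>  <filename>" per line).
verify_model_dir() {
  local dir="$1"
  if [[ ! -f "$dir/checklist.chk" ]]; then
    echo "missing $dir/checklist.chk" >&2
    return 2
  fi
  # Show file sizes for a quick comparison with other people's downloads.
  ls -lh "$dir"
  # md5sum -c exits non-zero if any listed file is missing or does not match.
  (cd "$dir" && md5sum -c checklist.chk)
}
```

For example, `verify_model_dir model/7B`. If every listed file reports `OK`, the weights should be complete and the restart loop can be stopped.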

Hi @z-mahmud22,
After the model finishes downloading, the download script itself verifies the hash checksums to check the model's integrity, so you can simply stop the download script once the checking is done, either from the bash script or manually. This sketchy script is just a workaround; hopefully the actual download script will be patched soon. Cheers!
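
Following that advice, the restart loop can be made to stop by itself once the checksums pass. This is a minimal sketch, assuming the `<folder>/<size>/checklist.chk` layout from the logs above, an `md5sum`-format checklist, and GNU `timeout`. `checksums_ok` and `download_until_verified` are hypothetical names, not part of pyllama:

```shell
#!/usr/bin/env bash
# Hypothetical variant of the restart loop: stop automatically once the
# requested model's checksums pass, instead of looping forever.
# Assumes <folder>/<size>/checklist.chk (as in the logs) in md5sum format.

# Return 0 if every file listed in DIR/checklist.chk matches its hash.
checksums_ok() {
  local dir="$1"
  [[ -f "$dir/checklist.chk" ]] &&
    (cd "$dir" && md5sum -c checklist.chk > /dev/null 2>&1)
}

# Usage: download_until_verified DIR TIMEOUT_SECONDS COMMAND [ARGS...]
# Reruns COMMAND under a timeout until the checksums in DIR pass.
download_until_verified() {
  local dir="$1" limit="$2"
  shift 2
  until checksums_ok "$dir"; do
    # timeout exits with 124 when it kills a stalled run; any failure is
    # ignored here because the checksum test decides whether to retry.
    timeout "$limit" "$@" || true
    echo "restart download"
    sleep 1
  done
  echo "checksums OK in $dir"
}
```

For example, `download_until_verified model/7B 200 python -m llama.download --model_size 7B --folder model`. Checking the checksums directly, rather than relying on the downloader's exit status, avoids depending on how `llama.download` exits. A corrupted (rather than merely incomplete) file would still keep this loop running, so it is worth watching the output.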

mxdlzg commented

It seems that the wget process itself is working correctly. Kill the python -m xxx process and keep wget running.
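
Here is a minimal sketch of that workaround. It assumes Linux with procps (`pkill`, `pgrep -a`) and that the Python wrapper spawned wget as a child process, as the comment above implies. `stop_wrapper_keep_wget` is a hypothetical name, and the default pattern `llama\.download` is taken from the command in the original report. Sending SIGTERM to the Python process alone matters here. Pressing Ctrl-C would signal the whole foreground process group, which includes wget. The final `pgrep` line confirms whether wget actually survived.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the Python download wrapper but leave wget running.
# Assumes Linux procps (pkill/pgrep) and that wget was spawned as a child of
# the Python process, so it is re-parented rather than killed.
stop_wrapper_keep_wget() {
  # Regex matched against full command lines (pkill -f).
  local pattern="$1"
  [[ -n $pattern ]] || pattern='llama\.download'
  # pkill sends SIGTERM only to the matching processes, not to their children.
  if pkill -f "$pattern"; then
    echo "stopped processes matching: $pattern"
  else
    echo "no process matching: $pattern"
  fi
  # List surviving wget processes to confirm the download is still running.
  pgrep -a wget || echo "no wget process running"
}
```

Afterwards, the partial file on disk should keep growing if wget is still downloading.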

yyyhz commented


How can I use wget to download it? It seems I only have the magnet link 'magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA'.