hsiehjackson/RULER

niah.py hangs with HF models

hijkzzz opened this issue · 4 comments

Environment:

ubuntu 22.04
docker image ruler-0.1
model jamba-v0.1
A100 x4

# added model_config ....

Templates = {
    'base': "{task_template}",

    'jamba': """<|im_start|>system 
You are a helpful AI assistant.
<|im_end|> 
<|im_start|>user
{task_template}
<|im_end|> 
<|im_start|>assistant
""",
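
For reference, a minimal sketch of how a template entry like this is presumably expanded: the `{task_template}` placeholder is filled with `str.format`, which is consistent with the expanded `--template` argument visible in the logs further down. The helper `build_prompt_template` and the shortened task template are hypothetical stand-ins for illustration, not RULER's actual code.

```python
# Illustrative only: mirrors the 'jamba' entry above, not RULER's real code path.
TEMPLATES = {
    "base": "{task_template}",
    "jamba": (
        "<|im_start|>system\n"
        "You are a helpful AI assistant.\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        "{task_template}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
}


def build_prompt_template(model_template_type: str, task_template: str) -> str:
    """Wrap a task template in the chat template for the given model type."""
    return TEMPLATES[model_template_type].format(task_template=task_template)


# Hypothetical, shortened task template. Its own placeholders survive untouched
# because str.format does not re-scan the values it substitutes.
task = "{context}\nWhat is the special magic number for {query}?"
prompt = build_prompt_template("jamba", task)
print(prompt)
```

The inner `{context}` and `{query}` fields stay in place, so a later step can still fill them per sample.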

  case $MODEL_NAME in
      jamba)
          MODEL_PATH="/home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730"
          MODEL_TEMPLATE_TYPE="jamba"
          MODEL_FRAMEWORK="hf"
          ;;

bash run.sh jamba synthetic

The script has been hanging at this point for 12 hours.

logs

he/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730:hf:::::
+ IFS=:
+ read MODEL_PATH MODEL_TEMPLATE_TYPE MODEL_FRAMEWORK TOKENIZER_PATH TOKENIZER_TYPE OPENAI_API_KEY GEMINI_API_KEY AZURE_ID AZURE_SECRET AZURE_ENDPOINT
+ '[' -z /home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730 ']'
+ export OPENAI_API_KEY=
+ OPENAI_API_KEY=
+ export GEMINI_API_KEY=
+ GEMINI_API_KEY=
+ export AZURE_API_ID=
+ AZURE_API_ID=
+ export AZURE_API_SECRET=
+ AZURE_API_SECRET=
+ export AZURE_API_ENDPOINT=
+ AZURE_API_ENDPOINT=
+ source config_tasks.sh
++ NUM_SAMPLES=500
++ REMOVE_NEWLINE_TAB=false
++ STOP_WORDS=
++ '[' -z '' ']'
++ STOP_WORDS=
++ '[' false = false ']'
++ REMOVE_NEWLINE_TAB=
++ synthetic=("niah_single_1" "niah_single_2" "niah_single_3" "niah_multikey_1" "niah_multikey_2" "niah_multikey_3" "niah_multivalue" "niah_multiquery" "vt" "cwe" "fwe" "qa_1" "qa_2")
+ BENCHMARK=synthetic
+ declare -n TASKS=synthetic
+ '[' -z niah_single_1 ']'
+ '[' hf == vllm ']'
+ '[' hf == trtllm ']'
+ for MAX_SEQ_LENGTH in "${SEQ_LENGTHS[@]}"
+ RESULTS_DIR=/home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072
+ DATA_DIR=/home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/data
+ PRED_DIR=/home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/pred
+ mkdir -p /home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/data
+ mkdir -p /home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/pred
+ for TASK in "${TASKS[@]}"
+ python data/prepare.py --save_dir /home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/data --benchmark synthetic --task niah_single_1 --tokenizer_path /home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730 --tokenizer_type hf --max_seq_length 131072 --model_template_type jamba --num_samples 500
python /home/scratch.jianh_gpu/projects/RULER/scripts/data/synthetic/niah.py         --save_dir  /home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/data         --save_name niah_single_1         --subset validation         --tokenizer_path /home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730         --tokenizer_type hf         --max_seq_length 131072         --tokens_to_generate 128         --num_samples 500         --random_seed 42         --type_haystack repeat --type_needle_k words --type_needle_v numbers --num_needle_k 1 --num_needle_v 1 --num_needle_q 1                           --template "<|im_start|>system
You are a helpful AI assistant.
<|im_end|>
<|im_start|>user
Some special magic {type_needle_v} are hidden within the following text. Make sure to memorize it. I will quiz you about the {type_needle_v} afterwards.
{context}
What are all the special magic {type_needle_v} for {query} mentioned in the provided text?
<|im_end|>
<|im_start|>assistant
 The special magic {type_needle_v} for {query} mentioned in the provided text are"

Hi @hijkzzz, does your folder /home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730 contain the tokenizer files? If so, you can run the following command directly to see where it hangs.

python /home/scratch.jianh_gpu/projects/RULER/scripts/data/synthetic/niah.py \
--save_dir  /home/scratch.jianh_gpu/projects/RULER/jamba/synthetic/131072/data \
--save_name niah_single_1 \
--subset validation \
--tokenizer_path /home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730 \
--tokenizer_type hf \
--max_seq_length 131072 \
--tokens_to_generate 128 \
--num_samples 500 \
--random_seed 42 \
--type_haystack repeat \
--type_needle_k words \
--type_needle_v numbers \
--num_needle_k 1 \
--num_needle_v 1 \
--num_needle_q 1 \
--template "<|im_start|>system
You are a helpful AI assistant.
<|im_end|>
<|im_start|>user
Some special magic {type_needle_v} are hidden within the following text. Make sure to memorize it. I will quiz you about the {type_needle_v} afterwards.
{context}
What are all the special magic {type_needle_v} for {query} mentioned in the provided text?
<|im_end|>
<|im_start|>assistant
 The special magic {type_needle_v} for {query} mentioned in the provided text are"
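
One way to see where a Python script is stuck, offered as a hedged sketch rather than anything RULER ships: the standard-library `faulthandler` module can dump every thread's traceback once a timeout expires. The helper names `enable_hang_dump` and `disable_hang_dump` and the 60-second default are assumptions for illustration; the calls would go near the top of the script under suspicion.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s: float = 60.0, repeat: bool = True, file=sys.stderr) -> None:
    """Dump all thread tracebacks to `file` if the process is still running
    after timeout_s seconds (and again every timeout_s seconds if repeat)."""
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=file)


def disable_hang_dump() -> None:
    """Cancel the pending dump once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    enable_hang_dump(timeout_s=60.0)
    # ... the code suspected of hanging goes here, e.g. loading the tokenizer ...
    disable_hang_dump()
```

If the dump shows the main thread inside a file read or directory listing, the problem is more likely the filesystem than the tokenizer or the model.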

@hsiehjackson
I can load the tokenizer with AutoTokenizer.from_pretrained from "/home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730".

My script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model from the local snapshot, sharded across the available GPUs.
model = AutoModelForCausalLM.from_pretrained("/home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730",
                                             trust_remote_code=True,
                                             attn_implementation="flash_attention_2",
                                             torch_dtype=torch.bfloat16,
                                             device_map="auto")
# Load the tokenizer from the same snapshot directory.
tokenizer = AutoTokenizer.from_pretrained("/home/scratch.jianh_inf/.cache/huggingface/hub/models--lightblue--Jamba-v0.1-chat-multilingual/snapshots/38a2d5d2301ba642d1a48be1251a825022f78730")

By the way, niah.py hangs as soon as it starts, and I don't see any useful output.

After running this script, the Docker container hangs as well, and Ctrl-C does not interrupt it.

This was caused by the Docker container mount.
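
A stalled mount like this can be checked before launching a long job. The following is a minimal sketch under stated assumptions: `path_responds` is a hypothetical helper, not part of RULER or Docker, and the 10-second timeout is arbitrary. It lists the directory in a child process so that a blocked mount stalls the child rather than the caller.

```python
import subprocess
import sys


def path_responds(path: str, timeout_s: float = 10.0) -> bool:
    """Return True if listing `path` in a child process finishes in time.

    Returns False if the listing fails (for example, the path is missing)
    or does not complete within timeout_s seconds.
    """
    probe = "import os, sys; os.listdir(sys.argv[1])"
    proc = subprocess.Popen(
        [sys.executable, "-c", probe, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort: a process stuck in uninterruptible I/O may ignore the kill.
        proc.kill()
        return False


if __name__ == "__main__":
    # Pass the directories to check on the command line,
    # e.g. the tokenizer snapshot folder and the save directory.
    for p in sys.argv[1:]:
        print(p, "ok" if path_responds(p) else "unresponsive or unreadable")
```

Running it inside the container against the mounted tokenizer and output directories shows quickly whether the mount itself is the part that blocks.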