indexSelectLargeIndex: block: [654,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
ahmadmustafaanis opened this issue · 1 comments
ahmadmustafaanis commented
Hi, I am training the self-rag generator model and getting this error.
My code:
source venv_3.10/bin/activate
export HF_DATASETS_CACHE="data_cache"
cd retrieval_lm
bash script_finetune_7b.sh
I have edited script_finetune_7b.sh slightly:
export CUDA_VISIBLE_DEVICES=0,1,2,3
MODEL_SIZE=3B # using llama 3B
NUM_GPUS=4
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
echo "Training llama model ${MODEL_SIZE} using $NUM_GPUS GPUs, $BATCH_SIZE_PER_GPU batch size per GPU, $GRADIENT_ACC_STEPS gradient accumulation steps"
# Changed from the original script: --model_name_or_path and --tokenizer_name (now meta-llama/Llama-3.2-3B),
# and --train_file (the provided dataset: https://drive.google.com/file/d/10G_FozUV4u27EX0NjwVe-3YMUMeTwuLk/view).
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUM_GPUS \
    --use_deepspeed \
    --deepspeed_config_file stage3_no_offloading_accelerate.conf \
    finetune.py \
    --model_name_or_path meta-llama/Llama-3.2-3B \
    --use_flash_attn \
    --tokenizer_name meta-llama/Llama-3.2-3B \
    --use_slow_tokenizer \
    --train_file ../full_output_1005.jsonl \
    --max_seq_length 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
    --gradient_accumulation_steps $GRADIENT_ACC_STEPS \
    --learning_rate 2e-5 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --num_train_epochs 3 \
    --output_dir output/self_rag_${MODEL_SIZE}/ \
    --with_tracking \
    --report_to tensorboard \
    --logging_steps 1 \
    --use_special_tokens
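For reference, the gradient accumulation arithmetic in the script above works out as follows. This is only a worked example mirroring the shell's integer division; the lowercase variable names are illustrative stand-ins for the script's variables.

```python
# Values taken from the edited script above.
total_batch_size = 128
num_gpus = 4
batch_size_per_gpu = 1

# Same integer division as $(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU)).
gradient_acc_steps = total_batch_size // num_gpus // batch_size_per_gpu

# Number of samples contributing to each optimizer step across all GPUs.
effective_batch_size = gradient_acc_steps * num_gpus * batch_size_per_gpu

print(gradient_acc_steps, effective_batch_size)
```

So each optimizer step accumulates 32 micro-batches per GPU.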
Now I get this error:
0%| | 0/3414 [00:00<?, ?it/s]/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
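The assertion `srcIndex < srcSelectDimSize` above says that an index passed to an index-select kernel was not smaller than the size of the dimension being indexed. One common place this can happen is an embedding lookup where a token id is greater than or equal to the number of embedding rows. Below is a minimal sketch of such a range check in plain Python; the function name `find_out_of_range_ids` and the toy numbers are hypothetical and not part of the repository.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range_ids(
    batches: Iterable[Sequence[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (batch_index, position, token_id) for ids outside [0, num_embeddings)."""
    bad = []
    for batch_index, ids in enumerate(batches):
        for position, token_id in enumerate(ids):
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((batch_index, position, token_id))
    return bad


# Toy data for illustration only: an embedding table with 10 rows (valid ids 0..9).
num_embeddings = 10
batches = [[1, 2, 9], [3, 10, 4], [0, 11]]

bad = find_out_of_range_ids(batches, num_embeddings)
print(bad)  # [(1, 1, 10), (2, 1, 11)]
```

In a real run, the ids would be the `input_ids` of the tokenized examples, and `num_embeddings` could be read from `model.get_input_embeddings().num_embeddings` and compared with `len(tokenizer)`. Since the command passes `--use_special_tokens`, a mismatch between the tokenizer size and the embedding size may be worth checking, though the log alone does not confirm that this is the cause.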
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
Traceback (most recent call last):
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
main()
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
main()
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
main() outputs = model(**batch, use_cache=False)
outputs = model(**batch, use_cache=False)
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
main()
File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
outputs = model(**batch, use_cache=False)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
outputs = model(**batch, use_cache=False)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)return forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
return forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
ret_val = func(*args, **kwargs)ret_val = func(*args, **kwargs) File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
ret_val = func(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
loss = self.module(*inputs, **kwargs)loss = self.module(*inputs, **kwargs)loss = self.module(*inputs, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
loss = self.module(*inputs, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
return self._call_impl(*args, **kwargs)result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
outputs = self.model(outputs = self.model(outputs = self.model(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
outputs = self.model(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
return self._call_impl(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
result = forward_call(*args, **kwargs) result = forward_call(*args, **kwargs)
result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
result = forward_call(*args, **kwargs)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
causal_mask = self._update_causal_mask(causal_mask = self._update_causal_mask(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
causal_mask = self._update_causal_mask(
causal_mask = self._update_causal_mask( File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
if AttentionMaskConverter._ignore_causal_mask_sdpa(
if AttentionMaskConverter._ignore_causal_mask_sdpa(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
if AttentionMaskConverter._ignore_causal_mask_sdpa(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
if AttentionMaskConverter._ignore_causal_mask_sdpa(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
elif (is_training or not is_tracing) and torch.all(attention_mask == 1):
elif (is_training or not is_tracing) and torch.all(attention_mask == 1):
elif (is_training or not is_tracing) and torch.all(attention_mask == 1):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
elif (is_training or not is_tracing) and torch.all(attention_mask == 1):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
0%| | 0/3414 [00:03<?, ?it/s]
[2024-10-21 02:39:02,892] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1464610) of binary: /scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/bin/python
Traceback (most recent call last):
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1153, in launch_command
deepspeed_launcher(args)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 846, in deepspeed_launcher
distrib_run.run(args)
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
I dug into this error. Based on pytorch/pytorch#121493, it seems we need to add a model.resize_token_embeddings(len(tokenizer)) line.
But this is already in the code, i.e.:
if len(tokenizer) > embedding_size:
    model.resize_token_embeddings(len(tokenizer))
So I am not sure what the issue is here.
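Since the assertion `srcIndex < srcSelectDimSize` fires when an index is not smaller than the size of the dimension being indexed, one thing that could be checked on the CPU is whether any token id in a batch is out of range for the embedding table. Below is a minimal sketch of such a check. The helper name `find_out_of_range_ids` is made up for illustration and is not part of finetune.py; in a real run I would assume the ids come from something like `batch["input_ids"].tolist()` and the row count from `model.get_input_embeddings().weight.shape[0]`.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range_ids(
    batches: Iterable[Sequence[Sequence[int]]],
    num_embedding_rows: int,
) -> List[Tuple[int, int, int, int]]:
    """Return (batch_idx, row_idx, position, token_id) for each id the table cannot index.

    Hypothetical helper for illustration only. It checks input ids, not labels
    (labels may legitimately contain ignore values such as -100).
    """
    bad = []
    for b, batch in enumerate(batches):
        for r, row in enumerate(batch):
            for p, tok in enumerate(row):
                # Valid embedding indices are 0 .. num_embedding_rows - 1.
                if tok < 0 or tok >= num_embedding_rows:
                    bad.append((b, r, p, tok))
    return bad


# Toy data standing in for tokenized batches; 10 rows means ids 0..9 are valid.
toy_batches = [[[1, 5, 9], [2, 10, 3]]]
print(find_out_of_range_ids(toy_batches, num_embedding_rows=10))
# → [(0, 1, 1, 10)]
```

If this returns anything on the real data, the embedding table is smaller than the largest token id the tokenizer produces, which would be consistent with the assertion above. An empty result would point somewhere else.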
I also noticed that Llama 3.2 3B uses the tokenizer class PreTrainedTokenizerFast instead of LlamaTokenizer, so in the code I changed the check to:
if isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, LlamaTokenizerFast) or True:
This forces the branch that adds the extra tokens to run for this tokenizer as well.
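The `or True` makes that condition always true, so the branch also runs for any other tokenizer. A narrower variant, as a rough sketch only: match on the class names that should take the branch. The helper `takes_llama_branch` and the name set are hypothetical, and treating PreTrainedTokenizerFast like the Llama tokenizers is my assumption, not something the repo confirms. The stand-in classes below exist only so the snippet runs without transformers installed.

```python
# Class names assumed to need the Llama special-token handling (an assumption).
LLAMA_STYLE_NAMES = {"LlamaTokenizer", "LlamaTokenizerFast", "PreTrainedTokenizerFast"}


def takes_llama_branch(tokenizer: object) -> bool:
    """True if the tokenizer's class, or any of its base classes, is in the name set."""
    return any(cls.__name__ in LLAMA_STYLE_NAMES for cls in type(tokenizer).__mro__)


# Stand-in classes for the demo; the real ones live in transformers.
class PreTrainedTokenizerFast:
    pass


class SomeOtherTokenizer:
    pass


print(takes_llama_branch(PreTrainedTokenizerFast()))  # → True
print(takes_llama_branch(SomeOtherTokenizer()))       # → False
```

Walking the MRO means subclasses of a listed class also match, which is close to what `isinstance` does but without importing the classes.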