AkariAsai/self-rag

indexSelectLargeIndex: block: [654,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

ahmadmustafaanis opened this issue · 1 comments

Hi, I am training the self-rag generator model and getting this error.

My code:

source venv_3.10/bin/activate
export HF_DATASETS_CACHE="data_cache"
cd retrieval_lm
bash script_finetune_7b.sh

Where I have edited script_finetune_7b.sh a little bit:

export CUDA_VISIBLE_DEVICES=0,1,2,3

MODEL_SIZE=3B  # using llama 3B
NUM_GPUS=4
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
echo "Training llama model ${MODEL_SIZE} using $NUM_GPUS GPUs, $BATCH_SIZE_PER_GPU batch size per GPU, $GRADIENT_ACC_STEPS gradient accumulation steps"

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUM_GPUS \
    --use_deepspeed \
    --deepspeed_config_file stage3_no_offloading_accelerate.conf \
    finetune.py \
    --model_name_or_path meta-llama/Llama-3.2-3B \ # this line is changed
    --use_flash_attn \
    --tokenizer_name meta-llama/Llama-3.2-3B \  # changed
    --use_slow_tokenizer \
    --train_file ../full_output_1005.jsonl \  # this line is changed, using https://drive.google.com/file/d/10G_FozUV4u27EX0NjwVe-3YMUMeTwuLk/view dataset as provided
    --max_seq_length 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
    --gradient_accumulation_steps $GRADIENT_ACC_STEPS \
    --learning_rate 2e-5 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --num_train_epochs 3 \
    --output_dir output/self_rag_${MODEL_SIZE}/ \
    --with_tracking \
    --report_to tensorboard \
    --logging_steps 1 \
    --use_special_tokens

Now I get this error:

  0%|          | 0/3414 [00:00<?, ?it/s]/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [654,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
Traceback (most recent call last):
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 726, in <module>
    main()
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
    main()
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
        main()    outputs = model(**batch, use_cache=False)
outputs = model(**batch, use_cache=False)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    main()
  File "/scratch/project_2001654/Wpeng/AhmadAnis/self-rag/retrieval_lm/finetune.py", line 660, in main
    outputs = model(**batch, use_cache=False)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    outputs = model(**batch, use_cache=False)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)


  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)return forward_call(*args, **kwargs)

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    return forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    return forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)        
ret_val = func(*args, **kwargs)ret_val = func(*args, **kwargs)  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward


      File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
ret_val = func(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
            loss = self.module(*inputs, **kwargs)loss = self.module(*inputs, **kwargs)loss = self.module(*inputs, **kwargs)


  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
loss = self.module(*inputs, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
        return self._call_impl(*args, **kwargs)result = forward_call(*args, **kwargs)

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
      File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
result = forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
    result = forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
    result = forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
            outputs = self.model(outputs = self.model(outputs = self.model(


  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
outputs = self.model(
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)return self._call_impl(*args, **kwargs)

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    return self._call_impl(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
        result = forward_call(*args, **kwargs)    result = forward_call(*args, **kwargs)
result = forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
    result = forward_call(*args, **kwargs)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
        causal_mask = self._update_causal_mask(causal_mask = self._update_causal_mask(

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
    causal_mask = self._update_causal_mask(    
causal_mask = self._update_causal_mask(  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1006, in _update_causal_mask
        if AttentionMaskConverter._ignore_causal_mask_sdpa(if AttentionMaskConverter._ignore_causal_mask_sdpa(

  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
    if AttentionMaskConverter._ignore_causal_mask_sdpa(
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
    if AttentionMaskConverter._ignore_causal_mask_sdpa(
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
            elif (is_training or not is_tracing) and torch.all(attention_mask == 1):elif (is_training or not is_tracing) and torch.all(attention_mask == 1):elif (is_training or not is_tracing) and torch.all(attention_mask == 1):


RuntimeErrorRuntimeErrorRuntimeError: : :     CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
elif (is_training or not is_tracing) and torch.all(attention_mask == 1):



RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


  0%|          | 0/3414 [00:03<?, ?it/s]
[2024-10-21 02:39:02,892] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1464610) of binary: /scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/bin/python
Traceback (most recent call last):
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1153, in launch_command
    deepspeed_launcher(args)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 846, in deepspeed_launcher
    distrib_run.run(args)
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/scratch/project_2001654/Wpeng/AhmadAnis/venv_3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 

I digged into this error:
pytorch/pytorch#121493 it seems like we need to add model.resize_token_embeddings(len(tokenizer)) line

But this is already in the code i.e

    if len(tokenizer) > embedding_size:
        model.resize_token_embeddings(len(tokenizer))

SO I am not exactly sure what's the issue here.

Also noted that Llama 3.2 3B has the tokenizer class PreTrainedTokenizerFast instead of Llama Tokenizer so in the code I did:

    if isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, LlamaTokenizerFast) or True:

To add extra tokens in it.