Trainer.train() raises KeyError: [random number]

Question

fishroll23 opened this issue 19 days ago · 3 comments

System Info

peft == 0.10.0
transformers==4.40.2
python 3.10.11

Code

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from transformers.data.data_collator import default_data_collator
import pandas as pd

df = pd.read_csv('data.csv')
train_df, eval_df = train_test_split(df, test_size=0.2, random_state=1)
train_df.reset_index(inplace=True, drop=True)
eval_df.reset_index(inplace=True, drop=True)

peft_config = LoraConfig(task_type=TaskType.SEQ_CLS,
                         inference_mode=False,
                         r=8,
                         lora_alpha=32,
                         lora_dropout=0.1)

tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/pai-bert-base-zh-llm-risk-detection")
model = AutoModelForSequenceClassification.from_pretrained("alibaba-pai/pai-bert-base-zh-llm-risk-detection")

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="toxic_detect/pai-bert-base-zh-llm-risk-detection-lora",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_df,
    eval_dataset=eval_df,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

model.save_pretrained("output_dir")

And here’s the console output leading up to the error:

/Users/mac299/anaconda3/envs/pythonProject1/venv/bin/python /Users/mac299/anaconda3/envs/pythonProject1/train.py 
/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
trainable params: 298,757 || all params: 102,570,250 || trainable%: 0.29127061696739553
  0%|          | 0/912 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 870

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/mac299/anaconda3/envs/pythonProject1/train.py", line 58, in <module>
    trainer.train()
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/accelerate/data_loader.py", line 454, in __iter__
    current_batch = next(dataloader_iter)
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/mac299/anaconda3/envs/pythonProject1/venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 870
  0%|          | 0/912 [00:00<?, ?it/s]

I’ve tried a few different tutorials on HuggingFace, but so far no progress on this one.

I’m working in PyCharm, in case that’s relevant. Please let me know if there’s any other information I could provide that would help with the diagnosis.

Thanks for reading!

Answer 1 · 2024-05-14T08:48:47.000Z

I'm pretty sure that the issue you see is not PEFT-related. If you remove the PEFT part, I expect the same type of error to occur. Most likely, the issue is that you are passing a pandas DataFrame to Trainer as train_dataset and eval_dataset. I'm not super knowledgeable on Trainer, but I would be surprised if that worked. Check the Trainer docs for the train_dataset argument to see which types it accepts.

If you have tabular data, I don't think AutoModelForSequenceClassification is a good fit anyway. If it's pure text data, I wouldn't load it as a DataFrame.
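To illustrate the DataFrame diagnosis, here is a minimal sketch of why an integer index ends in a KeyError. The traceback shows the PyTorch DataLoader calling `self.dataset[idx]`, and pandas treats `df[idx]` as a column lookup rather than a row lookup. The toy column names `text` and `label` are assumptions, since the layout of data.csv isn't shown in the issue.

```python
import pandas as pd

# Toy stand-in for data.csv; the column names are assumed for illustration.
df = pd.DataFrame({"text": ["a", "b", "c"], "label": [0, 1, 0]})

# The DataLoader fetches samples with dataset[idx]. On a DataFrame,
# df[1] looks for a COLUMN named 1, which does not exist here.
try:
    df[1]
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError: {missing_key}")

# Row access by position needs .iloc, which the DataLoader never uses.
row = df.iloc[1].to_dict()
print(row)
```

The number in the error message changes from run to run because the sampler shuffles the indices, so a different row index is requested first each time.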

Answer 2 · 2024-05-14T10:57:46.000Z

Thanks for the answer. I'll try using datasets.Dataset instead of a DataFrame in Trainer and figure it out.

Answer 3 · 2024-05-14T15:11:31.000Z

Good luck. I'll close the issue for now, as it seems to be unrelated to PEFT. Feel free to re-open if you encounter a PEFT error.