Training without the Trainer class (VRAM usage issue)
venzino-han opened this issue · 4 comments
System Info
Python 3.10.12
transformers 4.40.0.dev0
peft 0.10.1.dev0
torch 2.1.2
I tried to apply LoRA with PEFT, but training still uses the same amount of GPU VRAM.
How can I train the model without using the transformers.Trainer class?
Reproduction
import torch
from peft import LoraConfig, TaskType, get_peft_model
from tqdm import tqdm
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(args.model_name)
peft_config = LoraConfig(
    peft_type="LORA",
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=4,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# Inside the training class; args holds the hyperparameters
self.optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=args.learning_rate,
    weight_decay=args.weight_decay,
    eps=args.adam_epsilon,
)
def train_epoch(self, dataloader, epoch):
    epoch_loss = 0
    # NOTE: from_pretrained returns the model in eval mode, so dropout
    # (including lora_dropout) stays inactive unless train() is called
    # self.model.train()
    # self.model = self.model.to(self.device)
    accumulated_embeddings = None
    accumulated_scores = None
    count = 0
    for batch in tqdm(dataloader):
        count += 1
        self.optimizer.zero_grad()
        self.model.zero_grad()
        # Extract and send batch data to the specified device
        source_ids = batch["source_ids"].to(self.device)
        attention_mask = batch["source_mask"].to(self.device)
        decoder_attention_mask = batch["target_mask"].to(self.device)
        target_ids = batch["target_ids"].to(self.device)
        scores = batch["y"].to(self.device)
        pids = batch["prompt_id"].to(self.device)
        scores = scores.to(torch.bfloat16)
        # Forward pass and calculate loss
        outputs = self.model(
            input_ids=source_ids,
            attention_mask=attention_mask,
            decoder_attention_mask=decoder_attention_mask,
            labels=target_ids,
            return_dict=True,
        )
        loss = outputs.loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(
            self.model.parameters(), self.args.max_grad_norm
        )
        self.optimizer.step()  # Update model parameters
        epoch_loss += loss.item()
    # Get the current learning rate from scheduler or optimizer
    lr = (
        self.scheduler.get_last_lr()[0]
        if self.scheduler
        else self.optimizer.param_groups[0]["lr"]
    )
    log = f"epoch: {epoch} | "
    log += f"train loss: {epoch_loss/len(dataloader):.6f} | "
    log += f"lr: {lr:.6f} |"
    if self.scheduler:
        self.scheduler.step()  # Update learning rate
    return epoch_loss / len(dataloader)
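As a rough sanity check on what print_trainable_parameters() should report for this config: LoRA wraps each targeted linear layer (d_in to d_out) with two low-rank matrices A (r x d_in) and B (d_out x r), adding r * (d_in + d_out) trainable parameters per layer. The sketch below is hypothetical and assumes t5-base dimensions (d_model = 768, 12 encoder and 12 decoder layers), since args.model_name is not given in the issue; adjust the numbers for your checkpoint.

```python
def lora_trainable_params(d_model: int, inner_dim: int, n_encoder_layers: int,
                          n_decoder_layers: int, r: int, n_targets: int = 2) -> int:
    """Count LoRA parameters for T5 with target_modules such as ["q", "v"].

    Assumes the standard T5 layout: each encoder layer has one self-attention
    block, each decoder layer has a self-attention and a cross-attention block,
    and the q/v projections map d_model -> inner_dim (n_heads * d_kv).
    """
    attention_blocks = n_encoder_layers + 2 * n_decoder_layers
    wrapped_linears = attention_blocks * n_targets
    # LoRA A is (r x d_in) and LoRA B is (d_out x r) for every wrapped linear
    per_linear = r * (d_model + inner_dim)
    return wrapped_linears * per_linear


# Hypothetical example: t5-base dimensions with the config from the issue (r=4, q and v)
print(lora_trainable_params(d_model=768, inner_dim=768,
                            n_encoder_layers=12, n_decoder_layers=12, r=4))  # 442368
```

A few hundred thousand trainable parameters shrink the gradient and AdamW state memory accordingly, but the frozen base weights and the activations still occupy VRAM.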
Expected behavior
I want to run PEFT without the Trainer.
How did you measure the VRAM usage? What values do you get when you run with vs without PEFT? As your code is not complete, I cannot try to replicate your issue.
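To get comparable numbers for a run with and without PEFT, one option is to read PyTorch's own CUDA memory counters around a single training step rather than relying on a process-level monitor. The helper below is a hypothetical sketch (peak_cuda_memory_mib is not part of any library); it only uses torch.cuda.reset_peak_memory_stats, max_memory_allocated and max_memory_reserved.

```python
import torch


def peak_cuda_memory_mib(step_fn, device=None):
    """Run step_fn once and return (peak allocated MiB, peak reserved MiB).

    Returns None when CUDA is not available, because the torch.cuda memory
    counters only track GPU allocations.
    """
    if not torch.cuda.is_available():
        return None
    device = device if device is not None else torch.device("cuda")
    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)
    step_fn()  # e.g. one forward + backward + optimizer.step()
    torch.cuda.synchronize(device)
    mib = 1024 ** 2
    return (torch.cuda.max_memory_allocated(device) / mib,
            torch.cuda.max_memory_reserved(device) / mib)
```

Passing a closure that runs one iteration of the train_epoch loop body, once with the plain T5 model and once after get_peft_model, gives two directly comparable peaks. Tools such as nvidia-smi show all memory held by the process, including PyTorch's cache and the CUDA context, which can hide differences in what tensors actually use.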
I measured the VRAM usage with wandb. However, the usage before and after adopting LoRA was similar.
I just want to know whether PEFT works the same way when I don't use the Trainer from transformers.
@BenjaminBossan Thanks for your kind reply.
Yes, PEFT can definitely work without Trainer, and at first glance the code you posted looks correct.
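The reason a custom loop behaves the same as Trainer is that LoRA training only relies on which parameters require gradients. The self-contained sketch below uses a hand-written, simplified LoRA-style layer (a hypothetical stand-in, not PEFT's actual implementation): the base weight is frozen, only lora_A and lora_B receive gradients, and an ordinary optimizer loop updates just those.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical minimal LoRA wrapper: frozen base Linear plus trainable B @ A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 32):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights, as get_peft_model does for the base model
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # B starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


torch.manual_seed(0)
model = LoRALinear(nn.Linear(16, 16))
frozen_before = model.base.weight.detach().clone()

# Only trainable parameters go to the optimizer (frozen ones get no gradients anyway)
optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-2)
model.train()
for _ in range(5):
    x = torch.randn(8, 16)
    loss = (model(x) - x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.equal(model.base.weight, frozen_before))  # True: base weight untouched
print(model.lora_B.abs().sum().item() > 0)  # True: adapter was updated
```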
When it comes to memory savings of LoRA, it depends on many factors: model size, sequence length, choice of optimizer, LoRA config settings, etc. It has also happened in the past that PyTorch would reserve more memory than it actually needs when using LoRA, so also keep an eye out for reserved vs allocated memory.
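A back-of-the-envelope estimate shows where the savings come from and why they can look small. The sketch below is hypothetical: it assumes all parameters are stored in fp32, gradients exist only for trainable parameters, and AdamW keeps two state tensors (exp_avg, exp_avg_sq) per trainable parameter. It uses an illustrative 220M-parameter model (roughly t5-base size) and deliberately ignores activations.

```python
def training_memory_gib(total_params: int, trainable_params: int,
                        bytes_per_param: int = 4, adam_states: int = 2) -> float:
    """Rough static memory for weights, gradients and AdamW state (no activations)."""
    weights = total_params * bytes_per_param
    grads = trainable_params * bytes_per_param
    optimizer_state = trainable_params * bytes_per_param * adam_states
    return (weights + grads + optimizer_state) / 1024 ** 3


full = training_memory_gib(220_000_000, 220_000_000)  # full fine-tuning
lora = training_memory_gib(220_000_000, 442_368)      # LoRA r=4 on q and v
print(f"full fine-tuning: {full:.2f} GiB")  # full fine-tuning: 3.28 GiB
print(f"LoRA: {lora:.2f} GiB")              # LoRA: 0.82 GiB
```

Activations are not part of this estimate: they grow with batch size and sequence length, and much of them must still be stored in the frozen layers so the backward pass can reach the LoRA matrices. When activations dominate, the overall VRAM drop with LoRA can therefore look modest.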
@BenjaminBossan Thanks for your kind comment!