huggingface/transformers

ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []

gtanya89 opened this issue · 4 comments

System Info

Using Trainer with PyTorch DDP on a single node with multiple GPUs. torch.distributed.init_process_group() is set up OK. It seems that Trainer's _get_train_sampler() uses RandomSampler rather than DistributedSampler? Or is there another issue I am missing? Any input is appreciated! Thanks!
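
For context, a minimal single-node multi-GPU Trainer script typically looks like the sketch below (the checkpoint and dataset are illustrative placeholders, not taken from this issue). Under torchrun, Trainer reads LOCAL_RANK from the environment and initializes the process group itself:

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize and drop the raw text columns so only model inputs remain.
dataset = load_dataset("glue", "sst2", split="train[:1%]")
dataset = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True),
                      remove_columns=["sentence", "idx"])

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```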

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The error originates in the data collator. The same code works on a single GPU.
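
For reference, this ValueError is raised by tokenizer.pad() inside the collator when the features it receives contain no input_ids key. A minimal sketch that triggers the exact same message (the checkpoint is just an example):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer)

# Features without input_ids (here, empty dicts) make tokenizer.pad() raise:
# ValueError: You should supply an encoding or a list of encodings to this
# method that includes input_ids, but you provided []
collator([{}])
```

One common cause is the tokenized columns being removed from the dataset before batching, so printing the keys of the first batch on each rank is a quick sanity check.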

Expected behavior

Expect to train in a distributed fashion on multiple GPUs using Trainer API.

I'm also curious why DistributedSampler was removed from _get_train_sampler(); I remember older versions implemented it for the multi-GPU training case.

The Trainer now uses Accelerate's sampler for the data, @yuyemin, since the Trainer is fully integrated with Accelerate. @gtanya89, can you post the error with the full trace and a reproducer?
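
For illustration (a sketch of the general Accelerate mechanism, not the Trainer internals): accelerator.prepare() wraps a plain DataLoader so that each process sees only its own shard of the data, which is why Trainer no longer needs to build a DistributedSampler itself.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
dataset = TensorDataset(torch.arange(16))
dataloader = DataLoader(dataset, batch_size=4)

# prepare() handles device placement and distributed sharding of batches.
dataloader = accelerator.prepare(dataloader)

for (batch,) in dataloader:
    print(f"rank {accelerator.process_index}: {batch.tolist()}")
```

Run under accelerate launch or torchrun with several processes; each rank prints a disjoint subset of the 16 elements.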