huggingface/transformers

ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []

gtanya89 opened this issue · 4 comments

System Info

Using Trainer with PyTorch DDP on a single node with multiple GPUs. torch.distributed.init_process_group() is set up OK. It seems that Trainer's _get_train_sampler() uses RandomSampler rather than DistributedSampler? Or is there another issue I am missing? Any input is appreciated! Thanks!
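
For context, a minimal single-node multi-GPU Trainer script typically looks like the sketch below (the checkpoint and dataset are illustrative placeholders, not taken from this issue). Under torchrun, Trainer reads LOCAL_RANK from the environment and initializes the process group itself:

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize and drop the raw text columns so only model inputs remain.
dataset = load_dataset("glue", "sst2", split="train[:1%]")
dataset = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True),
                      remove_columns=["sentence", "idx"])

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```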

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The error originates in the data collator. The same code works on a single GPU.
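
For reference, this ValueError is raised by tokenizer.pad() inside the collator when the features it receives contain no input_ids key. A minimal sketch that triggers the exact same message (the checkpoint is just an example):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer)

# Features without input_ids (here, empty dicts) make tokenizer.pad() raise:
# ValueError: You should supply an encoding or a list of encodings to this
# method that includes input_ids, but you provided []
collator([{}])
```

One common cause is the tokenized columns being removed from the dataset before batching, so printing the keys of the first batch on each rank is a quick sanity check.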

Expected behavior

Expect to train in a distributed fashion on multiple GPUs using Trainer API.

I'm also curious why DistributedSampler was removed from _get_train_sampler(); I remember older versions implemented it for the multi-GPU training case.

The Trainer now uses Accelerate's sampler for the data, @yuyemin, since the Trainer is fully integrated with Accelerate. @gtanya89, can you post the error with the full trace and a reproducer?
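
For illustration (a sketch of the general Accelerate mechanism, not the Trainer internals): accelerator.prepare() wraps a plain DataLoader so that each process sees only its own shard of the data, which is why Trainer no longer needs to build a DistributedSampler itself.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
dataset = TensorDataset(torch.arange(16))
dataloader = DataLoader(dataset, batch_size=4)

# prepare() handles device placement and distributed sharding of batches.
dataloader = accelerator.prepare(dataloader)

for (batch,) in dataloader:
    print(f"rank {accelerator.process_index}: {batch.tolist()}")
```

Run under accelerate launch or torchrun with several processes; each rank prints a disjoint subset of the 16 elements.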