GPT2 parallelism does not work on the Tesla K80
0x7o opened this issue · 1 comment
0x7o commented
How to reproduce
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

# Split the model across the two GPUs with fp16 weights
parallelize(model, num_gpus=2, fp16=True, verbose='detail')

inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=15,
)
print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
Problem
The model is distributed across both GPUs, but during generation the second GPU jumps to 100% utilization and never leaves that state. Generation hangs and fails to complete.
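As a way to narrow this down, below is a minimal sketch of an NCCL all-reduce between the two GPUs; it assumes that parallelformers' worker processes communicate via torch.distributed with the NCCL backend. If this also hangs on the K80 pair, the problem would be at the NCCL/P2P level rather than in parallelformers itself.

# Hypothetical sanity check, not from the original report: a plain
# two-process NCCL all-reduce across cuda:0 and cuda:1.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # Each rank contributes a tensor of ones; after the all-reduce every
    # rank should hold a tensor filled with `world_size`.
    x = torch.ones(4, device=f"cuda:{rank}")
    dist.all_reduce(x)
    print(f"rank {rank}: {x.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)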
Environment
PyTorch version: 1.10.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17
Python version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-187-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA Tesla K80
GPU 1: NVIDIA Tesla K80
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.10.1+cu113
[conda] numpy 1.21.6 pypi_0 pypi
[conda] torch 1.10.1+cu113 pypi_0 pypi
hyunwoongko commented
We don't support the K80; it's a very old GPU.
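For reference, a quick way to see what "old" means here is to print each GPU's compute capability: the Tesla K80 is a Kepler-era card that reports 3.7, while native fp16 arithmetic generally requires compute capability 5.3 or higher. A minimal sketch:

# Print the compute capability of every visible GPU.
# On this machine both devices should report "Tesla K80 (compute capability 3.7)".
import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name} (compute capability {major}.{minor})")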