use Model Parallelism for GPT2?
yijinlee opened this issue · 4 comments
The hf-transformers GPT2 (and T5) models have a parallelize
method that uses model parallelism to spread the model across multiple GPUs, so that the combined GPU memory is enough for the larger GPT2 sizes. I've tried to hack around to see if I can make it work within blurr
for fastai, but unfortunately have not been successful. Pointers on how to get it working, and on how best to add this to the blurr
codebase as a PR, would be much appreciated.
Snippet of what I did:
model_cls = AutoModelForSequenceClassification
pretrained_model_name = "gpt2-medium"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=model_cls)

hf_model.transformer.parallelize()  # the hf method
hf_model.transformer.model_parallel
# True
hf_model.transformer.device_map  # I have two GPUs
# {0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
#  1: [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]}

# GPT2 has no pad token by default, so set one for batching
if hf_tokenizer.pad_token is None: hf_tokenizer.pad_token = '[PAD]'
hf_tokenizer.pad_token, hf_tokenizer.pad_token_id
hf_model.config.pad_token_id = hf_tokenizer.pad_token_id

blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), CategoryBlock)
dblock = DataBlock(blocks=blocks, get_x=ColReader('x'), get_y=ColReader('y'), splitter=RandomSplitter())
bs = 1
dls = dblock.dataloaders(df, bs=bs)

model = HF_BaseModelWrapper(hf_model)
learn = Learner(dls,
                model,
                opt_func=partial(Adam, decouple_wd=True),
                loss_func=CrossEntropyLossFlat(),
                metrics=[accuracy],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter).to_fp16()
learn.freeze()
learn.fit_one_cycle(1, 1e-3)
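For context, the device_map printed above is just a balanced split of GPT2-medium's 24 transformer blocks across the available GPUs. A minimal sketch of that split logic (build_device_map is a hypothetical helper written for illustration, approximating what transformers computes when parallelize() is called with no explicit device_map):

```python
from math import ceil

def build_device_map(n_layers, n_gpus):
    """Assign each transformer block index to a GPU, as evenly as possible."""
    per_gpu = ceil(n_layers / n_gpus)
    layers = list(range(n_layers))
    return {gpu: layers[gpu * per_gpu:(gpu + 1) * per_gpu] for gpu in range(n_gpus)}

# GPT2-medium has 24 blocks; with two GPUs each hosts 12 of them.
print(build_device_map(24, 2))
# {0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 1: [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]}
```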
The error I got was:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking arugment for argument weight in method wrapper_native_layer_norm)
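That error suggests the batch tensors stayed on one GPU while the layer-norm weights of the blocks mapped to the other GPU live on cuda:1. With parallelize(), the inputs need to be on the model's first device (hf_model.transformer.first_device), so one thing to try is a fastai callback that moves each batch there before the forward pass. A framework-free sketch of the recursive "move everything to one device" step (move_to_device is a hypothetical helper, not part of blurr or fastai):

```python
def move_to_device(batch, device):
    """Recursively move tensors (anything with a .to method) in a batch
    to the given device; other values are returned untouched."""
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(v, device) for v in batch)
    return batch

# In a fastai Callback this could run in before_batch, e.g. (untested):
#   self.learn.xb = move_to_device(self.xb, self.model.hf_model.transformer.first_device)
```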
A few links that I looked at for the parallelize
method:
huggingface/transformers#8696
https://huggingface.co/transformers/_modules/transformers/models/gpt2/modeling_gpt2.html
Thanks.
I'll take a look. I like it.
Is there anything I can help with on this? : )
Closing this out for now ... feel free to open if there are still issues.