tunib-ai/parallelformers

Support for GPT-J

andreamad8 opened this issue · 11 comments

Thanks for the great repo! I have tried it out, and it's really amazing to load such a large model across multiple GPUs.

Describe a requested feature

Currently, GPT-J is supported only on HF 4.7.0, by installing

pip install git+https://github.com/finetuneanon/transformers@gpt-j

Your requirements specify HF 4.8.0, which is needed to load several new models. Soon GPT-J will be fully integrated into HF: huggingface/transformers#12243

I am wondering if there is an easy way to have backward compatibility, or to include GPT-J soon.

Thanks again for your great repo 👍🏻

-- Andrea

(1) Thanks for the good issue. We will release a backward compatibility patch soon :)

(2) However, there are some problems with the implementation of GPT-J, so we will add it when the official PR is merged, rather than the draft version.

Thank you !

@andreamad8 I patched it to work in Transformers version 4.2.0 or higher.
Would you like to update and test using pip install parallelformers --upgrade?

Great, this feature works.

One thing I noticed: the number of GPUs has to be an even number (2, 4, 8, ...) for it to work. If I try to run on 10 GPUs, the code fails. Is this normal? If you want, I can send you a more detailed error.

-- Andrea

This is a limitation of the Megatron-LM algorithm that Parallelformers uses: parallelization is performed by dividing the parameters into N partitions.

In most models, tensor dimensions are multiples of 2. For example, if an nn.Linear layer in the model has weight shape [512, 512], splitting it in half parallelizes it as two [256, 512] shards.

So the problem occurs when using 10 GPUs: 512 divided by 10 gives 51.2, which is not an integer, so the parameters cannot be split evenly and parallelization is not possible.
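To make the constraint concrete, here is a minimal sketch of the divisibility check behind Megatron-LM-style splitting (illustrative only, not Parallelformers' internal code; split_linear_weight is a hypothetical helper):

import torch

# Megatron-LM-style tensor parallelism shards a weight matrix along one
# dimension, so that dimension must be divisible by the number of GPUs.
def split_linear_weight(weight: torch.Tensor, num_gpus: int):
    out_features, in_features = weight.shape
    if out_features % num_gpus != 0:
        raise ValueError(
            f"Cannot split dimension {out_features} across {num_gpus} GPUs: "
            f"{out_features} / {num_gpus} = {out_features / num_gpus} is not an integer."
        )
    # Each GPU receives one [out_features // num_gpus, in_features] shard.
    return torch.chunk(weight, num_gpus, dim=0)

weight = torch.randn(512, 512)
shards = split_linear_weight(weight, 2)   # OK: two [256, 512] shards
# split_linear_weight(weight, 10)         # ValueError: 512 / 10 = 51.2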

Yeah, that's what I thought.

I suggest adding this info to the README, for people (like me :)) who are not familiar with Megatron-LM.

I think I forgot to inform users about this part.
We will add this to the documentation soon.

Thank you very much for your good comments. :)

I have noticed I cannot run multiple experiments because of

RuntimeError: Address already in use

Usually with torch.distributed.launch I can use --master_port; is there an equivalent in this framework?

-- Andrea

Thanks, I should have read the docs :)
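For readers who hit the same RuntimeError: a minimal sketch, assuming Parallelformers honors the standard torch.distributed MASTER_PORT environment variable (check the project docs for the exact supported option):

import os

# Assumption: torch.distributed reads MASTER_PORT from the environment, so
# giving each experiment its own free port avoids "Address already in use".
# Whether Parallelformers exposes this exact mechanism is not confirmed here;
# consult its documentation for the supported option.
os.environ["MASTER_PORT"] = "29501"  # set before calling parallelize()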

Hi,

they just added GPT-J to HF.

If I try running it, I get this error:

AssertionError: GPTJForCausalLM is not supported yet.

Are you planning to support this model as well?

-- Andrea

We added GPT-J support.
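For anyone landing here later, a minimal usage sketch following the pattern from the Parallelformers README (the checkpoint name and num_gpus value are illustrative; num_gpus must evenly divide the model dimensions, as discussed above):

from transformers import AutoTokenizer, GPTJForCausalLM
from parallelformers import parallelize

# Load GPT-J on CPU first; parallelize() then shards it across the GPUs.
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

parallelize(model, num_gpus=2, fp16=True, verbose="detail")

inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))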