huggingface/transformers

Integrate IndicTrans2 models and tokenizer into HF Transformers

VarunGumma opened this issue · 4 comments

Model description

IndicTrans2 is a multilingual transformer model developed by AI4Bharat, available in 3 flavors: indic-en, en-indic and indic-indic. Each flavor has 2 versions: a large 1B model and a distilled 200M model. The architecture is a standard transformer, very similar to NLLB and M2M models. The major difference is that the vocabularies of the encoder and decoder are not shared, since the source and target sides cover different languages.
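To make the vocabulary point concrete, here is a toy sketch (not the actual IndicTrans2 code) contrasting one shared embedding table with separate encoder and decoder tables. The class name, function name and vocabulary sizes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySeq2SeqConfig:
    # Hypothetical config: sizes are illustrative, not IndicTrans2's real ones.
    encoder_vocab_size: int
    decoder_vocab_size: int
    d_model: int = 8


def embedding_table_shapes(cfg: ToySeq2SeqConfig, share_vocab: bool) -> dict:
    """Return the (rows, cols) shape of each embedding table the model would need."""
    if share_vocab:
        # A single shared table only works if both sides use the same vocabulary.
        if cfg.encoder_vocab_size != cfg.decoder_vocab_size:
            raise ValueError("a shared vocabulary needs equal sizes on both sides")
        return {"shared": (cfg.encoder_vocab_size, cfg.d_model)}
    # Separate vocabularies: each side gets its own, independently sized table.
    return {
        "encoder": (cfg.encoder_vocab_size, cfg.d_model),
        "decoder": (cfg.decoder_vocab_size, cfg.d_model),
    }


cfg = ToySeq2SeqConfig(encoder_vocab_size=32, decoder_vocab_size=120)
print(embedding_table_shapes(cfg, share_vocab=False))
```

With separate tables, the decoder's input embeddings and output projection are sized by the target vocabulary only, which is why a single tokenizer vocabulary in the NLLB/M2M style does not fit directly.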

Unlike NLLB and M2M models, IndicTrans2 requires specific preprocessing of its inputs. Hence, a custom processor class has been developed and is required for training/inference. More examples can be found in the official repository linked below.
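As a rough illustration of what such a preprocessing step can look like, the sketch below normalizes each sentence and prepends source and target language tags. It is a simplified stand-in, not the project's processor class: the function name is hypothetical, the tag layout and FLORES-style codes are assumptions, and the real processor performs more work (for example script-specific normalization).

```python
import unicodedata
from typing import List


def toy_preprocess_batch(sentences: List[str], src_lang: str, tgt_lang: str) -> List[str]:
    """Hypothetical helper: normalize text and prepend language tags."""
    processed = []
    for sentence in sentences:
        # Unicode NFC normalization as a stand-in for script-aware normalization.
        text = unicodedata.normalize("NFC", sentence)
        # Collapse runs of whitespace and strip the ends.
        text = " ".join(text.split())
        # Assumed input layout: "<src_lang> <tgt_lang> <sentence>".
        processed.append(f"{src_lang} {tgt_lang} {text}")
    return processed


batch = toy_preprocess_batch(["  Hello   world "], src_lang="eng_Latn", tgt_lang="hin_Deva")
print(batch)  # ['eng_Latn hin_Deva Hello world']
```

The output of a step like this is what would then be passed to the tokenizer, which is why the processor is needed for both training and inference.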

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Authors: @AI4Bharat @jaygala24 @PranjalChitale @oneraghavan @VarunGumma @sumanthd17 @prajdabre @anoopkunchukuttan

Official GitHub Repository: AI4Bharat/IndicTrans2

The HF compatible models and tokenizer are available here as of now:

Hi @VarunGumma, thanks for opening this model request!

This looks like a great candidate for adding the model to the Hub. This is the easiest and recommended way to make a model available in transformers: once it's working, the model can be found and used immediately without having to go through the PR process. We find this is a lot quicker, as the bar for adding code to the library is high due to the maintenance cost of every new model, so reviews take quite a while.

We're happy to support this as much as we can. Let us know if there are any issues with the implementation. Here is a tutorial if that sounds good to you!
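For context, a model shipped as custom code on the Hub is typically loaded through the Auto classes with `trust_remote_code=True`. The sketch below assumes the repository exposes its classes to `AutoTokenizer` and `AutoModelForSeq2SeqLM`; the helper name `load_from_hub` is hypothetical, and `repo_id` is left as a parameter because the thread does not give the repository name.

```python
def load_from_hub(repo_id: str):
    """Load a tokenizer and seq2seq model whose code lives in a Hub repository.

    `repo_id` is a placeholder for the Hub repository name.
    """
    # Imported lazily so this sketch can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # trust_remote_code=True lets transformers run the modeling and tokenizer
    # code stored in the repository; only enable it for repositories you trust.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model
```

Calling the helper downloads and executes the repository's code, so it needs network access and the `transformers` package.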

Hi @amyeroberts,

Thank you for your reply. We also need some help adding flash_attention_2 to our model. We were able to modify the modeling script for it, but it throws an error saying that our model class IndicTransForConditionalGeneration itself is not supported. How can we proceed in this case?

@VarunGumma Could you share the error message and full traceback?

@amyeroberts, thank you. We were able to resolve it on our end.
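The thread does not record how the error was resolved. For readers who hit the same message, one common cause in transformers releases from around the time of this thread is that the model class has not opted in through the `_supports_flash_attn_2` class attribute, which defaults to False on `PreTrainedModel`. The toy below imitates that gate in plain Python; `ToyPreTrainedModel` and `check_attn_implementation` are hypothetical stand-ins, not the library's actual code, and newer releases may name the flag differently.

```python
class ToyPreTrainedModel:
    # Stand-in for the opt-in flag on transformers.PreTrainedModel (off by default).
    _supports_flash_attn_2 = False

    @classmethod
    def check_attn_implementation(cls, attn_implementation: str) -> str:
        """Simplified imitation of the check that runs when a model is loaded."""
        if attn_implementation == "flash_attention_2" and not cls._supports_flash_attn_2:
            raise ValueError(f"{cls.__name__} does not support Flash Attention 2.0 yet.")
        return attn_implementation


class IndicTransForConditionalGeneration(ToyPreTrainedModel):
    # Opting in: without this line the check above rejects the class,
    # even if the attention layers already implement a flash-attention path.
    _supports_flash_attn_2 = True


print(IndicTransForConditionalGeneration.check_attn_implementation("flash_attention_2"))
```

In the real library the implementation is requested at load time, for example with `attn_implementation="flash_attention_2"` passed to `from_pretrained`, and it also requires the `flash-attn` package and a supported GPU.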