
AfroLID is a neural toolkit for African language identification that covers 517 African languages.


AfroLID



AfroLID is a neural language identification (LID) toolkit for 517 African languages and varieties. It draws on a multi-domain web dataset, manually curated from 14 language families and written in five orthographic systems. AfroLID is described in the paper "AfroLID: A Neural Language Identification Tool for African Languages".


Requirements

  • Download AfroLID model:
    wget https://demos.dlnlp.ai/afrolid/afrolid_model.tar.gz
    tar -xf afrolid_model.tar.gz
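The same download-and-extract step can also be scripted from Python, for example in a notebook where `wget` is not available. The following is a minimal sketch using only the standard library. The helper names `download_model` and `extract_model` are hypothetical and not part of AfroLID; only the URL is taken from the step above.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

# URL taken from the Requirements step above.
MODEL_URL = "https://demos.dlnlp.ai/afrolid/afrolid_model.tar.gz"


def download_model(url: str = MODEL_URL,
                   dest: Path = Path("afrolid_model.tar.gz")) -> Path:
    """Download the model archive unless `dest` already exists (hypothetical helper)."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_model(archive: Path, target_dir: Path = Path(".")) -> List[str]:
    """Unpack the archive into `target_dir` and return the archive member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(target_dir)
    return names
```

Calling `extract_model(download_model())` mirrors the `wget` and `tar -xf` commands. The returned member names show where the model files were unpacked, which is the path to pass to AfroLID later.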

Installation

  • To install AfroLID from PyPI using pip:
    pip install -U afrolid
  • To install AfroLID directly from the GitHub repository using pip:
    pip install -U git+https://github.com/UBC-NLP/afrolid.git
  • To install AfroLID and develop locally:
    git clone https://github.com/UBC-NLP/afrolid.git
    cd afrolid
    pip install .
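After any of the three installation routes, it can be useful to confirm that the package is visible to the current Python environment. The sketch below uses the standard library's `importlib.metadata` and assumes the distribution is registered under the name `afrolid`, as in the `pip install -U afrolid` command. The helper `installed_version` is hypothetical and not part of AfroLID.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "afrolid") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("afrolid is not installed in this environment")
    else:
        print(f"afrolid {version} is installed")
```

Querying package metadata avoids importing the package itself, so the check stays fast and does not load any model files.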

Getting Started

The full documentation contains instructions for getting started, running language identification using different methods, and integrating AfroLID with your code, along with more examples.

Colab Examples

(1) Integrate AfroLID with your python code

Content:
  • Install AfroLID
  • Download AfroLID's model
  • Initialize the AfroLID object
  • Get language prediction(s)
  • Integrate with Pandas
Colab link: colab

(2) Command Line Interface

Command: afrolid_cli
Content:
  • Usage and Arguments
  • Examples
Colab link: colab

Supported languages

Please refer to the supported-languages list.

License

afrolid(-py) is Apache-2.0 licensed. The license applies to the pre-trained models as well.

Citation

If you use the AfroLID toolkit or the pre-trained models in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):

@inproceedings{adebara2022afrolid,
  title = {AfroLID: A Neural Language Identification Tool for African Languages},
  author = {Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month = dec,
  year = {2022},
}

Acknowledgments

We gratefully acknowledge support from Canada Research Chairs (CRC), the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2018-04267), the Social Sciences and Humanities Research Council of Canada (SSHRC; 435-2018-0576; 895-2020-1004; 895-2021-1008), the Canada Foundation for Innovation (CFI; 37771), the Digital Research Alliance of Canada, UBC ARC-Sockeye, Advanced Micro Devices, Inc. (AMD), and Google. Any opinions, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of CRC, NSERC, SSHRC, CFI, CC, AMD, Google, or UBC ARC-Sockeye.