CAMeL Tools

Introduction

CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Please use GitHub Issues to report bugs or to ask for help using CAMeL Tools.

Installation

You will need Python 3.8 - 3.12 (64-bit) as well as the Rust compiler installed.
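As a quick pre-install check, the requirements above can be tested from Python itself. The following is a minimal sketch using only the standard library; the helper names `is_supported_python` and `find_missing_tools` are illustrative and are not part of CAMeL Tools.

```python
import shutil
import struct
import sys

# Supported range taken from the requirement above (inclusive on both ends).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def is_supported_python(version=None, pointer_bits=None):
    """Return True for a 64-bit interpreter between Python 3.8 and 3.12."""
    if version is None:
        version = sys.version_info[:2]
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 on a 32-bit one.
        pointer_bits = struct.calcsize("P") * 8
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION and pointer_bits == 64


def find_missing_tools(names=("rustc",)):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Python OK:", is_supported_python())
    print("Missing tools:", find_missing_tools())
```

Running the script before `pip install camel-tools` reports whether the interpreter is in range and whether `rustc` is on the PATH.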

Linux/macOS

You will need to install some additional dependencies on Linux and macOS, primarily CMake and Boost.

On Ubuntu/Debian you can install these dependencies by running:

sudo apt-get install cmake libboost-all-dev

On macOS you can install them using Homebrew by running:

brew install cmake boost

Install using pip

pip install camel-tools

# or run the following if you already have camel_tools installed
pip install camel-tools --upgrade

On Apple silicon Macs you may have to run the following instead:

CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools

# or run the following if you already have camel_tools installed
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools --upgrade

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install .

# or run the following if you already have camel_tools installed
pip install --upgrade .

Installing data

To install the datasets required by CAMeL Tools components, run one of the following commands:

# To install all datasets
camel_data -i all

# or only the datasets for morphology and MLE disambiguation
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.

By default, data is stored in ~/.camel_tools. To install the data in a different location, set the CAMELTOOLS_DATA environment variable to the desired path.

Add the following to your .bashrc, .zshrc, .profile, etc.:

export CAMELTOOLS_DATA=/path/to/camel_tools_data
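The lookup rule described above (use CAMELTOOLS_DATA if it is set, otherwise fall back to ~/.camel_tools) can be sketched in a few lines of Python. This is an illustration of the rule as stated here, not the library's actual implementation; `resolve_data_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def resolve_data_dir(environ=None, home=None):
    """Return the data directory implied by the rule described above.

    Hypothetical helper: `environ` and `home` can be passed in explicitly
    so the rule can be checked without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    override = environ.get("CAMELTOOLS_DATA")
    if override:
        # An explicit setting wins over the default location.
        return Path(override).expanduser()
    return home / ".camel_tools"


if __name__ == "__main__":
    print(resolve_data_dir())
```

Running this in a new shell after editing your profile file is a simple way to confirm that the variable is visible to child processes.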

Windows

Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.

Install using pip

pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html

# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html camel-tools

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .

# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html .

Installing data

To install the data packages required by CAMeL Tools components, run one of the following commands:

# To install all datasets
camel_data -i all

# or only the datasets for morphology and MLE disambiguation
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.

By default, data is stored in C:\Users\your_user_name\AppData\Roaming\camel_tools. To install the data in a different location, set the CAMELTOOLS_DATA environment variable to the desired path. To do so on Windows 10:

  • Press the Windows button and type env.
  • Click on Edit the system environment variables (Control panel).
  • Click on the Environment Variables... button.
  • Click on the New... button under the User variables panel.
  • Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in Variable value. Alternatively, you can browse for the data directory by clicking on the Browse Directory... button.
  • Click OK on all the opened windows.
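After completing the steps above, the resulting location can be checked from a newly opened terminal. The sketch below mirrors the rule described in this section, assuming the default lives under the standard APPDATA folder (which normally points to C:\Users\your_user_name\AppData\Roaming). `resolve_windows_data_dir` is a hypothetical helper, not a CAMeL Tools function, and it uses `PureWindowsPath` so the logic can be exercised on any operating system.

```python
import os
from pathlib import PureWindowsPath


def resolve_windows_data_dir(environ=None):
    """Return the Windows data directory implied by the rule above.

    Hypothetical helper: CAMELTOOLS_DATA takes precedence; otherwise the
    default is assumed to be the camel_tools folder under APPDATA.
    """
    environ = os.environ if environ is None else environ

    override = environ.get("CAMELTOOLS_DATA")
    if override:
        return PureWindowsPath(override)
    return PureWindowsPath(environ["APPDATA"]) / "camel_tools"


if __name__ == "__main__":
    # Only meaningful on Windows, where APPDATA is defined.
    if "APPDATA" in os.environ or "CAMELTOOLS_DATA" in os.environ:
        print(resolve_windows_data_dir())
```

Environment variables set through the dialog are only picked up by programs started afterwards, so run the check in a new terminal window.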

Documentation

To get started, follow the Guided Tour for a quick overview of the components provided by CAMeL Tools.

You can find the full online documentation here for both the command-line tools and the Python API.

Alternatively, you can build your own local copy of the documentation as follows:

# Install dependencies
pip install sphinx myst-parser sphinx-rtd-theme

# Go to docs subdirectory
cd docs

# Build HTML docs
make html

This should compile all the HTML documentation into docs/build/html.

Citation

If you find CAMeL Tools useful in your research, please cite our paper:

@inproceedings{obeid-etal-2020-camel,
   title = "{CAM}e{L} Tools: An Open Source Python Toolkit for {A}rabic Natural Language Processing",
   author = "Obeid, Ossama  and
      Zalmout, Nasser  and
      Khalifa, Salam  and
      Taji, Dima  and
      Oudah, Mai  and
      Alhafni, Bashar  and
      Inoue, Go  and
      Eryani, Fadhl  and
      Erdmann, Alexander  and
      Habash, Nizar",
   booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
   month = may,
   year = "2020",
   address = "Marseille, France",
   publisher = "European Language Resources Association",
   url = "https://www.aclweb.org/anthology/2020.lrec-1.868",
   pages = "7022--7032",
   abstract = "We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.",
   language = "English",
   ISBN = "979-10-95546-34-4",
}

License

CAMeL Tools is available under the MIT license. See the LICENSE file for more info.

Contribute

If you would like to contribute to CAMeL Tools, please read the CONTRIBUTE.rst file.