Installation and setup of Polyglot package for Python 3 on Windows.

polyglot-windows-installation

Manual steps, explanation and automatic installation of the package "Polyglot" in any version of Python 3 on Windows.

  • To easily install the package in Windows, run the setup.bat file.

The installation is explained in more detail below. It also works inside a virtual environment of any Python 3 version. Information on creating virtual environments can be found in the Virtualenv Documentation, the Venv Documentation, or the Conda Environments Documentation.
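As a minimal sketch of the virtual environment route, the standard-library `venv` module can create an environment from Python itself. The helper name `create_env` and the folder name used below are illustrative assumptions, not part of this repo.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path.

    Hypothetical helper for illustration; it is not part of setup.bat.
    """
    env_path = Path(env_dir)
    # EnvBuilder is the standard-library API behind `python -m venv`.
    venv.EnvBuilder(with_pip=with_pip).create(str(env_path))
    return env_path
```

For example, `create_env("polyglot-env")` creates the environment, which on Windows is then activated with `polyglot-env\Scripts\activate` before running the installation.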


Polyglot is a natural language pipeline that supports many languages. It was built for Linux and is very useful for natural language processing of texts in non-English languages. The Polyglot Documentation explains how to install it on Linux, by installing its dependencies and running pip install polyglot in the terminal. However, as of September 2022, the package is still not stable on Windows, requiring either the Windows Subsystem for Linux (WSL) or the manual installation of several dependencies and setup through the Command Prompt.

  • Using WSL

WSL provides a virtual Linux environment inside Windows, in which the regular Linux installation of polyglot can be run. WSL can be installed by following the Microsoft Documentation.

This is a good solution, but may result in slower performance or conflict with running virtual environments in some cases.

  • On base Windows

If trying to install and use polyglot in Windows without the use of a virtual Linux machine, the process is more complex. This repo was created to offer an easy way to do this, by running the setup.bat file.

If you prefer to run the installation step by step, the process is as follows:

  1. Download of the dependencies PyICU, pycld2, futures and Morfessor. The Morfessor and futures wheels may be unnecessary in some cases. These two are already present in the wheels/ folder and are built for all versions of Python 3. The PyICU and pycld2 wheels are specific to the minor version of Python 3 you are using and can be downloaded accordingly from this Archive of Python Extension Packages.

    Within the get_dependencies.py script, which is called by setup.bat, these wheels are downloaded automatically according to the Python version of the detected interpreter. By editing this Python script, you can request a specific version of each wheel instead of the default versions.

  2. Installation of dependencies using

    python -m pip install [path/to/downloaded/wheel.whl]
    

    It may also be possible to simply run pip install [wheel.whl] or pip3 install [wheel.whl], though these do not work in every setup.

    This is also automated within the get_dependencies.py script for each of the four wheels required.

  3. Setup of the polyglot package itself directly from the GitHub Repo.

    git clone https://github.com/aboSamoor/polyglot.git
    cd polyglot
    python setup.py install
    

    This is done automatically within the setup.bat file. After the setup is finished, the cloned polyglot folder is also removed, because it is no longer necessary.
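Steps 1 and 2 depend on matching the wheel to the running interpreter. The sketch below shows one way this could be done. It is not the actual code of get_dependencies.py, and the function names `wheel_tag` and `pip_install_command` are hypothetical. It assumes CPython on Windows and the standard wheel naming scheme, in which the file name ends in a tag such as `cp310-cp310-win_amd64`.

```python
import struct
import sys


def wheel_tag(version=None, bits=None):
    """Return the Windows wheel tag for a CPython version.

    Hypothetical helper: `version` is a (major, minor) tuple and `bits`
    is 32 or 64; both default to the running interpreter.
    """
    major, minor = (version or sys.version_info)[:2]
    if bits is None:
        # Pointer size in bytes * 8 gives the interpreter's bitness.
        bits = struct.calcsize("P") * 8
    python_tag = f"cp{major}{minor}"
    # CPython 3.7 and earlier append "m" to the ABI tag.
    abi_tag = python_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return f"{python_tag}-{abi_tag}-{platform_tag}"


def pip_install_command(wheel_path):
    """Build the `python -m pip install <wheel>` command from step 2."""
    # sys.executable ensures pip runs for this interpreter, not another one.
    return [sys.executable, "-m", "pip", "install", str(wheel_path)]
```

A PyICU or pycld2 wheel whose file name ends in `wheel_tag() + ".whl"` should match the current interpreter. The resulting command list can be passed to `subprocess.run(..., check=True)` to perform the installation.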

With this, the package is installed on Windows. It can be tested by running the following in Python:

from polyglot.text import Text

Errors may be the result of failed installation of dependencies or unsupported versions of packages.
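When the import fails, it can help to see which dependency is missing. The following is a small diagnostic sketch, not part of this repo. It assumes the usual import names of the packages (PyICU is imported as `icu`), and the helper `missing_modules` is hypothetical.

```python
import importlib.util

# Import names, which can differ from package names: PyICU is imported as "icu".
REQUIRED_MODULES = ["icu", "pycld2", "morfessor", "polyglot"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that the import system cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when all four modules are found. Any name it returns points to the wheel that should be reinstalled.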


  • Installing specific language models

When using polyglot for part-of-speech tagging in languages other than English, it is necessary to install the respective embeddings2 and pos2 models. For example, to use Portuguese (pt) part-of-speech tagging, run in the Command Prompt or Terminal:

polyglot download embeddings2.pt pos2.pt

More information on languages and part-of-speech tagging can be found in the POS Documentation, the Model Download Documentation and the Universal POS Tables.
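Model names follow the pattern `<task>.<language code>`, so the download command for another language can be built programmatically. This is a minimal sketch; the helpers `model_packages` and `download_command` are hypothetical, and it assumes the `polyglot` command is available on the PATH after installation.

```python
def model_packages(lang_code, tasks=("embeddings2", "pos2")):
    """Return the model package names needed for one language."""
    return [f"{task}.{lang_code}" for task in tasks]


def download_command(lang_code):
    """Build the `polyglot download ...` command for one language."""
    return ["polyglot", "download", *model_packages(lang_code)]
```

For example, `download_command("pt")` reproduces the Portuguese command above and can be run with `subprocess.run(download_command("pt"), check=True)`. Once the models are downloaded, the tags are available through the `pos_tags` attribute of a `Text` object, as described in the POS Documentation.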