/oteann4

Primary LanguageJupyter NotebookMIT LicenseMIT

OTEANN

This repository is the official implementation of OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network.

Requirements

  • Require Ubuntu 18.04, Python 3.6+.

  • Download files on your machine

  • Go to the oteann4 main directory

    • cd oteann4
  • Create a virtual environment

    • python3 -m venv oteann
  • Activate virtual environment

    • source oteann/bin/activate
  • Load the python librairies needed for OTEANN (e.g. numpy, pandas, torch...) from the requirements file

    • python -m pip install -r requirements.txt
  • Extract the subdatasets (required local free spacesize=334Mo)

    • python extract_subdatasets.py

Evaluation and Training

In order to run this code, you need to:

  • Run the oteann.ipynb in order to create the ANN model, train it, test it and display the results of the paper.

Results

OTEANN achieves the following performance:

orthography write score read score
ent 99.6 ± 0.3 99.8 ± 0.1
eno 00.0 ± 0.0 00.0 ± 0.0
ar 84.3 ± 0.8 99.4 ± 0.3
br 80.6 ± 0.6 77.2 ± 1.6
de 69.1 ± 1.0 78.0 ± 1.5
en 36.1 ± 1.5 31.1 ± 1.3
eo 99.3 ± 0.2 99.7 ± 0.1
es 66.9 ± 2.0 85.3 ± 1.3
fi 97.7 ± 0.3 92.3 ± 0.8
fr 28.0 ± 1.4 79.6 ± 1.7
fro 99.0 ± 0.3 89.7 ± 1.1
it 94.5 ± 0.8 71.6 ± 0.9
ko 81.9 ± 1.0 97.5 ± 0.5
nl 72.9 ± 1.7 55.7 ± 2.2
pt 75.8 ± 1.0 82.4 ± 0.9
ru 41.3 ± 1.6 97.2 ± 0.5
sh 99.2 ± 0.3 99.3 ± 0.3
tr 95.4 ± 0.7 95.9 ± 0.6
zh 19.9 ± 1.4 78.7 ± 0.9

License

OTEANN uses minGTPT (https://github.com/karpathy/minGPT) which is released under the MIT licence.

Citation

This code was used for the following paper:

@misc{marjou2020oteann,
      title={OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network}, 
      author={Xavier Marjou},
      year={2021},
      eprint={1912.13321v4},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}