If you want to compare results in exactly the same environment as the paper, please try the demo page (https://yusuke1997.com/tatar)!
The code has been run in the following environment:
- macOS Catalina
- zsh
- Python 3.9.4
This code uses the shuf command (part of GNU coreutils), so please install it:
brew install coreutils # if you use Homebrew as your package manager
or
sudo apt install coreutils # Ubuntu, Debian, Kali Linux, ...
# on any other OS, see the following page:
https://command-not-found.com/shuf
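Because the scripts rely on shuf, it can help to confirm that the command is reachable before running anything else. The sketch below uses only the Python standard library; the helper name find_shuf is hypothetical and not part of this repository. It also looks for gshuf, the g-prefixed name under which Homebrew's coreutils has provided the command on some macOS setups. If only gshuf is found, the scripts (which presumably call shuf by that name) may still fail until coreutils' gnubin directory is on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def find_shuf() -> Optional[str]:
    """Return the path of a shuf binary on PATH, or None if none is found.

    Hypothetical helper: checks `shuf` first, then `gshuf` (the g-prefixed
    name some Homebrew coreutils installs use on macOS).
    """
    for name in ("shuf", "gshuf"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    shuf = find_shuf()
    if shuf is None:
        print("shuf not found: please install coreutils.")
    else:
        # Shuffle three lines as a quick smoke test; the output order is random.
        result = subprocess.run(
            [shuf],
            input="a\nb\nc\n",
            capture_output=True,
            text=True,
            check=True,
        )
        print(f"{shuf} works:", result.stdout.split())
```

Running this file with python3 prints the resolved path and a shuffled list of a, b, c, or a hint to install coreutils if neither name is on PATH.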
Next, clone this repository and install the Python dependencies using pip:
pip install -r requierement.txt
Then run the following shell script to complete all the preparations:
./prepare.sh
Next, run the following Python file to perform the transliteration:
python3 predict.py
You can then type Tatar sentences written in Cyrillic; the input is processed line by line.
To check whether the subwords in each sentence are recognized as Russian or Tatar, run the following Python file:
python3 inference.py
However, inference.py only reports the labels of the subwords in their initial state, whereas predict.py applies further processing to those labels.
To see the probabilities and labels of the final subwords, uncomment print(elm,lang) at around line 188 of predict.py.
The whole process can be seen in prepare.sh and the predict function in predict.py.
To start the experiment from scratch, just run ./delete.sh.
To start from an intermediate step, edit ./delete.sh accordingly.
The evaluation data were kindly provided by their authors and are therefore treated as closed data.
The entire evaluation directory is listed in .gitignore, so if you would like the evaluation script, please contact us!
If you have any other questions, please feel free to contact us in any language!
@inproceedings{taguchi-etal-2021-transliteration,
title = "Transliteration for Low-Resource Code-Switching Texts: Building an Automatic {C}yrillic-to-{L}atin Converter for {T}atar",
author = "Taguchi, Chihiro and
Sakai, Yusuke and
Watanabe, Taro",
booktitle = "Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.calcs-1.18",
pages = "133--140",
}