Official code for the paper "A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages". If you use this code, please cite our paper.
- Python 3.7
- PyTorch 1.1.0
- CUDA 9.0
- Gensim 3.8.1
We assume that you have installed conda beforehand.
```
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
pip install gensim==3.8.1
```
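As a quick sanity check (our suggestion, not part of the repo), you can confirm from Python that the pinned versions were installed:

```python
# Hypothetical sanity check: confirm the pinned versions are importable
# before running any experiments.
import torch
import gensim

print(torch.__version__)          # expected: 1.1.0
print(torch.cuda.is_available())  # True if the CUDA 9.0 toolchain is visible
print(gensim.__version__)         # expected: 3.8.1
```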
- Pretrained FastText embeddings for Sanskrit can be obtained from here. Make sure that the `.vec` file is placed at the appropriate location (see the loading sketch after this list).
- For the multilingual experiments, we use UD treebanks and pretrained FastText embeddings.
To run the complete model pipeline, i.e., (1) pretraining followed by (2) integration, simply run the bash script `run_san_LCM.sh`:

```
bash run_san_LCM.sh
```
If you use our tool, we'd appreciate it if you cite the following paper:
```
@misc{sandhan2021little,
      title={A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages},
      author={Jivnesh Sandhan and Amrith Krishna and Ashim Gupta and Laxmidhar Behera and Pawan Goyal},
      year={2021},
      eprint={2102.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Much of the base code is adapted from the "DCST Implementation".