/deepQuest

Framework for neural-based Quality Estimation

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

DeepQuest -- Framework for neural-based Quality Estimation

Developed at the University of Sheffield, DeepQuest provides state-of-the-art models for multi-level Quality Estimation.

If you use this, please cite:

DeepQuest: a framework for neural-based Quality Estimation. Julia Ive, Frédéric Blain, Lucia Specia (2018).

@article{ive2018deepquest,
  title={DeepQuest: a framework for neural-based Quality Estimation},
  author={Julia Ive and Frédéric Blain and Lucia Specia},
  journal={In the Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA},
  year={2018}
}

Documentation:

More information on https://sheffieldnlp.github.io/deepQuest

Acknowledgements

The development of DeepQuest received funding from the European Association for Machine Translation and the Amazon Academic Research Awards program.

Usage Example

Requirements

  • Python 2.7

Clone

cd
git clone https://github.com/takatsugu-kato/deepQuest.git

This repository includes the qe-2017 example data.

Training

cd deepQuest/quest
./train-test-sentQEbRNN.sh --task qe-2017 --source src --target mt --score hter --activation sigmoid --device cpu

Scoring

rm -rf config.py
ln -s ../configs/config-sentQEbRNNEval.py config.py
THEANO_FLAGS=device=cpu python main.py

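Notes

  1. Prepare train.src and train.mt.
  2. Post-edit (PE) train.mt to create train.pe.
  3. Compute the TER (git) between train.mt and train.pe and save it as train.hter.
  4. Place these files in a folder of your choice and edit config-sentQEbRNN.py to match.
  5. Run train-test-sentQEbRNN.sh to train the model.
  6. Once training has finished, edit config-sentQEbRNNEval.py as needed.
  7. Prepare test.mt and test.src and run the scoring step.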
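
Before running the training step, it can help to confirm that the three training files are line-aligned, since each line of train.hter is meant to score the matching line of train.src and train.mt. The following is a minimal sketch and not part of DeepQuest: the function name `check_parallel_files` is hypothetical, and it assumes one segment per line in the text files and one floating-point score per line in train.hter.

```python
import io


def check_parallel_files(src_path, mt_path, score_path):
    """Return the shared line count, or raise ValueError on a mismatch.

    Assumption (not confirmed by DeepQuest): one segment per line in the
    text files and one float score per line in the score file.
    """
    counts = {}
    for path in (src_path, mt_path, score_path):
        with io.open(path, encoding="utf-8") as handle:
            counts[path] = sum(1 for _ in handle)
    if len(set(counts.values())) != 1:
        raise ValueError("line counts differ: %r" % counts)

    # Every line of the score file must parse as a number.
    with io.open(score_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, 1):
            try:
                float(line)
            except ValueError:
                raise ValueError("%s line %d is not a number: %r"
                                 % (score_path, number, line.strip()))
    return counts[src_path]
```

Calling `check_parallel_files("train.src", "train.mt", "train.hter")` from the data folder returns the number of segments when the files agree and raises `ValueError` otherwise.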

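How to compute TER

Prepare train.mt and train.pe, and append an ID in parentheses to the end of each line. The ID format is arbitrary, as long as each ID is unique and the IDs correspond between the two files.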

$> cat data/train.mt
i am nice (hoge1)
i am good (hoge2)
i am bad (foo3)

$> cat data/train.pe
i am nice (hoge1)
i am good (hoge2)
i am wild (foo3)
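
The ID suffixes shown above can be added with a short script. This is a sketch under assumptions: the helper name `append_ids` and the `seg` prefix are made up for illustration, and any other unique ID scheme works equally well as long as both files carry the same ID on matching lines.

```python
def append_ids(lines, prefix="seg"):
    """Append a unique "(prefixN)" ID to each line, numbering from 1."""
    return ["%s (%s%d)" % (line.rstrip("\n"), prefix, number)
            for number, line in enumerate(lines, 1)]


mt_lines = ["i am nice\n", "i am good\n", "i am bad\n"]
pe_lines = ["i am nice\n", "i am good\n", "i am wild\n"]

# The same numbering is applied to both files, so line N of the MT output
# and line N of the post-edit end with the same ID.
mt_with_ids = append_ids(mt_lines)
pe_with_ids = append_ids(pe_lines)

for line in mt_with_ids:
    print(line)
```

In practice the two lists would be read from the raw MT and post-edited files and the results written to train.mt and train.pe.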

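Running the command below then creates several ter_data.* files in the data folder. Swapping the -r and -h files gives the same result for this example, although in general TER is normalized by the reference length, so the post-edited file should stay as the reference (-r).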

cd data/
java -jar ../tercom.jar -r train.pe -h train.mt -n ter_data
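
As a rough cross-check of the tercom scores, a shift-free word-level edit rate can be computed directly. This is not TER itself: tercom additionally counts a block shift as a single edit, so the value below should generally be equal to or higher than the per-segment score that tercom reports. The function name `edit_rate` is hypothetical, and the inputs are expected without the trailing ID suffixes, with the MT output as hypothesis and the post-edit as reference.

```python
def edit_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by the reference length.

    Shift-free approximation only; it does not reproduce tercom's TER.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the distance between the hyp words seen so far and ref[:j].
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, 1):
        current = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete a hyp word
                               current[j - 1] + 1,       # insert a ref word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1] / float(len(ref))


# One substitution over a three-word reference.
print(edit_rate("i am bad", "i am wild"))
```

For the third example segment this gives one substitution over three reference words, and identical segments such as "i am nice" score 0.0.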