WODQA-CHT and its Generator for Traditional Chinese Open Domain Question Answering
The dataset can be downloaded here.
Col | Field | Description |
---|---|---|
1 | question | Question |
2 | qitm | Question keywords |
3 | article | Wiki Article |
4 | aitm | Article keywords |
5 | answer | Answer |
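
As a rough illustration of the column layout above, the following sketch parses records into a small dataclass. It assumes the download is a CSV file whose header row uses the field names from the table; the actual file format is not stated here, so `WodqaRecord`, `read_records`, and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WodqaRecord:
    """One dataset row; field names follow the column table (hypothetical type)."""
    question: str  # question text
    qitm: str      # question keywords
    article: str   # Wiki article
    aitm: str      # article keywords
    answer: str    # answer text


def read_records(handle):
    """Parse rows from an open CSV handle (assumed format) into WodqaRecord objects."""
    reader = csv.DictReader(handle)
    return [
        WodqaRecord(
            question=row["question"],
            qitm=row["qitm"],
            article=row["article"],
            aitm=row["aitm"],
            answer=row["answer"],
        )
        for row in reader
    ]


# Tiny in-memory sample standing in for the real file (contents are made up).
sample = io.StringIO(
    "question,qitm,article,aitm,answer\n"
    "台灣最高的山是哪一座？,台灣 山,玉山是台灣最高的山。,玉山 台灣,玉山\n"
)
records = read_records(sample)
print(records[0].answer)
```

For a real file, the same function could be called with `open(path, encoding="utf-8", newline="")`, provided the format assumption holds.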
pip install -q -r requirement.txt
python -m pyserini.index.lucene \
  --collection JsonCollection \
  --input corpus \
  --language zh \
  --index Index/Wiki_Chinese \
  --generator DefaultLuceneDocumentGenerator \
  --threads 1 \
  --storePositions --storeDocvectors --storeRaw
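
The indexing command reads its input from the `corpus` directory. Pyserini's `JsonCollection` expects JSON or JSON Lines files in which each document has an `id` and a `contents` field. The sketch below shows one way such a file could be produced; `write_corpus`, the file name `docs.jsonl`, and the sample articles are assumptions for illustration, not part of this repository.

```python
import json
import tempfile
from pathlib import Path


def write_corpus(articles, out_dir="corpus", filename="docs.jsonl"):
    """Write (doc_id, text) pairs as JSON Lines with `id` and `contents` keys."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    with target.open("w", encoding="utf-8") as f:
        for doc_id, text in articles:
            record = {"id": str(doc_id), "contents": text}
            # ensure_ascii=False keeps Chinese characters readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return target


# Demo with made-up articles, written to a temporary directory.
# For real indexing, pass out_dir="corpus" to match the --input argument.
articles = [("1", "玉山是台灣最高的山。"), ("2", "日月潭位於南投縣。")]
path = write_corpus(articles, out_dir=tempfile.mkdtemp())
print(path.read_text(encoding="utf-8"))
```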
python main.py -t naive
python main.py -t f1
python main.py -t f2
python main.py -t all
Dataset | EM | F1 | Avg. Len(q) |
---|---|---|---|
Without filter | 29.30 % | 26.90 % | 21.58 |
F1 filter | 32.23 % | 30.77 % | 21.17 |
F1+F2 filter | 35.40 % | 34.35 % | 26.25 |
F1+F2+F3 filter | 61.80 % | 57.30 % | 26.73 |
EM: exact match
F1: F1 score (the evaluation metric, distinct from the F1 filter named in the first column)
Avg. Len(q): average question length
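
For readers unfamiliar with these metrics, here is a minimal sketch of exact match and a character-level F1, a common choice for Chinese answers. It is an assumption about how such scores are typically computed, not the scoring code used for the table. Under this definition a per-example F1 is never lower than EM, while the table reports F1 below EM, so the repository presumably uses a different formulation.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of character-level precision and recall."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    if not pred_chars or not ref_chars:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_chars == ref_chars)
    # Multiset intersection counts characters shared by both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the prediction contains the reference plus two extra characters.
print(exact_match("玉山主峰", "玉山"))
print(round(char_f1("玉山主峰", "玉山"), 4))
```

Corpus-level scores would then be the mean of these per-example values over all questions.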