CompareModels_TRECQA

In a QA system that must draw its answers from an unstructured corpus, one challenge is choosing the sentence that contains the best answer to a given question.

These files provide six baseline models for the TrecQA task (Wang et al., 2007): average pooling, RNN, CNN, RNN+CNN, QA-LSTM/CNN+attention (Tan et al., 2015; state of the art in 2015), and AP-LSTM/CNN (dos Santos et al., 2016; state of the art in 2016).
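
To illustrate the answer selection task with the simplest baseline idea, the sketch below averages word vectors for the question and for each candidate sentence, then ranks candidates by cosine similarity. It is only an illustration, not the trained Keras model from this repository: the function names and the two-dimensional toy embeddings are hypothetical, and a real model would learn a scoring function on top of the pooled vectors.

```python
import math

def average_pool(tokens, embeddings, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(question, candidates, embeddings, dim):
    """Return (score, sentence) pairs sorted from best to worst match."""
    q = average_pool(question.lower().split(), embeddings, dim)
    scored = [
        (cosine(q, average_pool(c.lower().split(), embeddings, dim)), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical toy embeddings, for illustration only.
toy_embeddings = {
    "capital": [1.0, 0.0],
    "france": [0.9, 0.1],
    "paris": [0.8, 0.2],
    "banana": [0.0, 1.0],
    "yellow": [0.1, 0.9],
}
ranking = rank_candidates(
    "capital france", ["banana yellow", "paris capital"], toy_embeddings, dim=2
)
```

The first element of `ranking` is the candidate the sketch considers most likely to contain the answer.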

Model Comparison

All models were trained on train-all using Keras 2.1.2.
The GloVe embeddings can be downloaded from http://nlp.stanford.edu/data/glove.6B.zip
Batch normalization was used to improve the models' performance over the results of pasky's experiments:
https://github.com/brmson/dataset-sts/tree/master/data/anssel/wang
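
The GloVe archive unpacks into plain-text files in which each line holds a word followed by its space-separated vector components. A minimal sketch of reading such a file into a dictionary follows; the function names are hypothetical, and this is not necessarily how this repository loads the vectors.

```python
def parse_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ...") into a dict of float lists."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

def load_glove(path):
    """Read a GloVe text file from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove(handle)
```

For example, `load_glove("glove.6B.300d.txt")` would load the 300-dimensional vectors from the unpacked archive. Which dimensionality the models here were trained with is not stated, so that file name is an assumption.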

For other performance records on this dataset, see https://aclweb.org/aclwiki/Question_Answering_(State_of_the_art)

Model	devMRR	testMRR	Settings
Avg.	0.855998	0.810032	pdim=0.5, Ddim=1
CNN	0.865507	0.859114	pdim=0.5, p_layers=1, Ddim=1
RNN(LSTM)	0.842302	0.827154	sdim=5~7, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, Ddim=2, proj=False
RNN+CNN	0.862692	0.803874	Ddim=2, p_layers=2, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=1
QA-LSTM/CNN+attention	0.875321	0.832281	Ddim=[1, 1/2], p_layers=2, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=1, adim=0.5, state of the art in 2015
AP-LSTM/CNN (Attentive Pooling)	0.883974	0.850000	Ddim=0.1, p_layers=1, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=5, w_feat_model=rnn, sdim=4, state of the art in 2016
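
The devMRR and testMRR columns report mean reciprocal rank: for each question, take the reciprocal of the rank of the first correct candidate, then average over questions. A minimal sketch of that computation is below. The function name and input layout are hypothetical, and it scores a question with no correct candidate as 0; evaluation scripts for this dataset may instead exclude such questions, so the numbers above need not match this exact convention.

```python
def mean_reciprocal_rank(ranked_labels_per_question):
    """Compute MRR.

    Each item is the list of relevance labels (1 = correct, 0 = incorrect)
    for one question's candidates, already sorted by descending model score.
    """
    reciprocal_ranks = []
    for labels in ranked_labels_per_question:
        rr = 0.0  # stays 0.0 if no candidate is correct
        for rank, label in enumerate(labels, start=1):
            if label:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Three questions: first correct answer at rank 2, rank 1, and rank 3.
example_mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0], [0, 0, 1]])
```

Here the reciprocal ranks are 1/2, 1, and 1/3, so `example_mrr` is their average, 11/18.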

New results from 2017 (to do: not yet implemented here)

Model	testMRR	Reference
HyperQA	0.865	Tay et al. (2017)
BiMPM	0.875	Wang et al. (2017)
Compare-Aggregate	0.899	Bian et al. (2017)
IWAN	0.889	Shen et al. (2017)

Reference

Mengqiu Wang, Noah A. Smith, Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In EMNLP-CoNLL 2007.
Ming Tan, Cicero dos Santos, Bing Xiang, Bowen Zhou. 2015. LSTM-Based Deep Learning Models for Nonfactoid Answer Selection. In eprint arXiv:1511.04108.
Sergey Ioffe, Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML 2015.
Cicero dos Santos, Ming Tan, Bing Xiang, Bowen Zhou. 2016. Attentive Pooling Networks. In eprint arXiv:1602.03609.
Yi Tay, Luu Anh Tuan, Siu Cheung Hui. 2017. Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks. In eprint arXiv:1707.07847.
Zhiguo Wang, Wael Hamza, Radu Florian. 2017. Bilateral Multi-Perspective Matching for Natural Language Sentences. In eprint arXiv:1702.03814.
Weijie Bian, Si Li, Zhao Yang, Guang Chen, Zhiqing Lin. 2017. A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection. In CIKM 2017.
Gehui Shen, Yunlun Yang, Zhi-Hong Deng. 2017. Inter-Weighted Alignment Network for Sentence Pair Modeling. In EMNLP 2017.
