Learning to generate Gap-Filling questions from teaching material.
git clone https://github.com/bwanglzu/Automatic-Question-Generation.git
cd Automatic_Question_Generation
pip install -r requirements.txt
- Create a folder to host all the stanford models, e.g.
mkdir /your-path-to-stanford-models/stanford-models
.
- Download Stanford Parser at here, unzip, and:
- Move
stanford-parser.jar
to stanford models folder, e.g./your-path-to-stanford-models/stanford-models/stanford-parser.jar
- Move
stanford-parser-x-x-x-models.jar
to stanford models folder. - Unzip
stanford-parser-x-x-x-models.jar
, move/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz
tostanford-models/
- Move
- Download Stanford NER at here, unzip, and:
- Move
stanford-ner.jar
to stanford models folder. - Move
stanford-ner-x-x-x.jar
to stanford models folder (e.g. 3.7.0). - Move
/classifiers/english.all.3class.distsim.crf.ser.gz
to stanford models folder.
- Move
The stanford models folder should looks like this:
- stanford-models/
| - stanford-parser.jar
| - stanford-parser-x-x-x-models.jar
| - englishPCFG.ser.gz
| - stanford-ner.jar
| - stanford-ner-x-x-x.jar
| - english.all.3class.distsim.crf.ser.gz
Create environment variable file with: touch .env
for configuration (in project root).
SENTENCE_RATIO = 0.05 #The threshold of important sentences
STANFORD_JARS=/path-to-your-stanford-models/stanford-models/
STANFORD_PARSER_CLASSPATH=/path-to-your-stanford-models/stanford-models/stanford-parser-x.x.x-models.jar
STANFORD_NER_CLASSPATH=/path-to-your-stanford-models/stanford-models/stanford-ner.jar
cd aqg
python app.py -f name-of-the-document.txt
nosetests --with-coverage
- Sentence Selection: select topically important sentences from text document.
- Gap Selection: Exmploy standford parser extract NP (noun phrase) and ADJP from important sentences as candidate gaps
- Question Classification: Classify question quality based on pre-trained SVM classifier
- Achieved 77% accuracy on training set.
- Achieved 64% accuracy on 10-fold CV (without parameter tunning).
Learning curve:
- SVM trained with Scikit Learn, C = 300, gamma = 0.001, rbf kernel.