Experiments on the SQuAD dataset with Keras. The goal is a clean, efficient architecture that is easy to reproduce and achieves close to state-of-the-art results.
- Parse and split the data
python parse_data.py data/train-v1.1.json --train_ratio 0.9 --outfile data/train_parsed.json --outfile_valid data/valid_parsed.json
python parse_data.py data/dev-v1.1.json --outfile data/dev_parsed.json
- Preprocess the data
python preprocessing.py data/train_parsed.json --outfile data/train_data.pkl
python preprocessing.py data/valid_parsed.json --outfile data/valid_data.pkl
python preprocessing.py data/dev_parsed.json --outfile data/dev_data.pkl
- Train the model
python train.py --hdim 40 --batch_size 70 --nb_epochs 50 --optimizer adam --dropout 0.2
- Predict on dev/test set samples
python predict.py model/your-model prediction.json
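
The train/validation split requested with `--train_ratio 0.9` in the first step can be sketched roughly as follows. This is a minimal illustration, not the actual logic of `parse_data.py`: the function name `split_squad`, the `seed` parameter, and the choice to split at article level are all assumptions. Only the SQuAD v1.1 layout (a top-level `data` list of articles, each with `title` and `paragraphs`) is taken from the dataset format.

```python
import random


def split_squad(dataset, train_ratio=0.9, seed=0):
    """Split a SQuAD v1.1-style dict into train and validation dicts.

    Hypothetical helper: splits whole articles so that no article
    contributes paragraphs to both sides of the split.
    """
    articles = list(dataset["data"])
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(articles)
    n_train = int(round(len(articles) * train_ratio))
    version = dataset.get("version")
    train = {"version": version, "data": articles[:n_train]}
    valid = {"version": version, "data": articles[n_train:]}
    return train, valid


# Toy dataset with ten empty articles, only to show the proportions.
toy = {
    "version": "1.1",
    "data": [{"title": f"article-{i}", "paragraphs": []} for i in range(10)],
}
train, valid = split_squad(toy, train_ratio=0.9)
print(len(train["data"]), len(valid["data"]))  # prints: 9 1
```

Splitting by article rather than by individual question keeps questions about the same passage out of both sets, which may give a more honest validation score. Whether the script does this is not stated here.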