A Docker/Python/Keras/TensorFlow utility to train a model and predict subject areas for the openeduhub dataset.
- Install Docker.
- (The training script runTraining.sh requires the Nvidia Docker runtime to be installed. For processing without a GPU, remove the --runtime=nvidia parameter from the script's docker command.)
- Build the Docker container.
sh build.sh
(The data folder already contains a pretrained model.)
- The following script retrieves and processes the latest dataset, producing the file data/wirlernenonline.oeh.csv, which contains the relevant documents (documents with a discipline property).
sh prepareData.sh
- This script starts the training, which produces the model file data/wirlernenonline.oeh.h5, the class label file data/wirlernenonline.oeh.npy, and the tokenizer serialization data/wirlernenonline.oeh.pickle (existing files will be overwritten without warning).
sh runTraining.sh
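Because the training step overwrites existing files without warning, it can be useful to copy them aside first. The following is a minimal sketch, not part of the repository: the function name backup_artifacts and the ".bak" naming scheme are assumptions made for illustration; only the three file names come from the description above.

```python
import shutil
import time
from pathlib import Path

# File names as listed for the training step; everything else here is illustrative.
ARTIFACTS = (
    "wirlernenonline.oeh.h5",      # model file
    "wirlernenonline.oeh.npy",     # class labels
    "wirlernenonline.oeh.pickle",  # tokenizer serialization
)


def backup_artifacts(data_dir="data", suffix=None):
    """Copy each existing artifact to '<name>.<suffix>.bak' and return the copies."""
    # Hypothetical naming scheme: a timestamp suffix unless one is given.
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    copies = []
    for name in ARTIFACTS:
        source = Path(data_dir) / name
        if source.is_file():  # silently skip artifacts that do not exist yet
            target = source.with_name(f"{name}.{suffix}.bak")
            shutil.copy2(source, target)
            copies.append(target)
    return copies


if __name__ == "__main__":
    for path in backup_artifacts():
        print(f"backed up to {path}")
```

Running this from the repository root before sh runTraining.sh would leave timestamped copies next to the originals in the data folder.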
- To test the prediction, query the model with an arbitrary text.
sh runPrediction.sh "Der Satz des Pythagoras lautet: a^2 + b^2 = c^2."
The result is a list of tuples, each consisting of a score and its corresponding class name (the name of the discipline). Only the top three items are returned, in descending order of score.
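The shape of that result can be illustrated with a small sketch. This is not the repository's prediction code: the helper top_predictions, the scores, and the discipline names below are made up for illustration, and the selection uses the standard library's heapq.nlargest.

```python
import heapq


def top_predictions(scores, class_names, k=3):
    """Return the k highest (score, class name) tuples, best first."""
    if len(scores) != len(class_names):
        raise ValueError("scores and class_names must have the same length")
    # nlargest returns its result sorted from largest to smallest score.
    return heapq.nlargest(k, zip(scores, class_names), key=lambda pair: pair[0])


# Made-up scores and class names, for illustration only.
example_scores = [0.02, 0.91, 0.05, 0.01]
example_classes = ["Biologie", "Mathematik", "Physik", "Geschichte"]
print(top_predictions(example_scores, example_classes))
# prints [(0.91, 'Mathematik'), (0.05, 'Physik'), (0.02, 'Biologie')]
```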
- To run the subject prediction tool as a simple REST-based webservice, use the following script:
sh runService.sh
- The script deploys a CherryPy webservice in a Docker container, listening at:
http://localhost:8080/predict_subject
- To retrieve the recommendations, send a POST request with a JSON document containing a text, for example:
curl -d '{"text" : "Der Satz des Pythagoras lautet: a^2 + b^2 = c^2."}' -H "Content-Type: application/json" -X POST http://0.0.0.0:8080/predict_subjects
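The same request can be sent from Python using only the standard library. This is a hedged sketch rather than a client shipped with the tool: the function names build_request and predict_subjects are hypothetical, the URL is copied from the curl example, and the assumption that the service answers with a JSON body is not confirmed by this document.

```python
import json
import urllib.request

# Endpoint copied from the curl example; adjust host and port if needed.
SERVICE_URL = "http://0.0.0.0:8080/predict_subjects"


def build_request(text, url=SERVICE_URL):
    """Build a POST request carrying {"text": ...} as a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_subjects(text, url=SERVICE_URL, timeout=10):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the service from runService.sh running:
# print(predict_subjects("Der Satz des Pythagoras lautet: a^2 + b^2 = c^2."))
```

Keeping request construction separate from sending makes it possible to inspect the method, headers, and body without a running service.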