/wlo-classification

WLO discipline classification

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Discipline Classification

A Docker/Python/Keras/Tensorflow utility to train and predict subject areas for the openeduhub dataset.

Prerequisites

  • Install Docker.

  • (The training script runTraining.sh requires the Nvidia Docker runtime installed. For processing without a GPU remove the --runtime=nvidia parameter in the script's docker command.)

  • Build the Docker container.

sh build.sh

Training

(The data folder already containes a pretrained model.)

  • The following script retrieves and processes the latest dataset, which results in the data/wirlernenonline.oeh.csv file containing the relevant documents (documents with a discipline property).
sh prepareData.sh
  • This script initiates the training, which results in the model file data/wirlernenonline.oeh.h5, the file with class labels data/wirlernenonline.oeh.npy, and the tokenizer serialization data/wirlernenonline.oeh.pickle (existing files will be overwritten without warning).
sh runTraining.sh

Prediction

  • To test the prediction just query the model with an arbitrary text.
sh runPrediction.sh "Der Satz des Pythagoras lautet: a^2 + b^2 = c^2."

The result is a list of tuples of a score and its corresponding class name (name of discipline). Only the top three items are retrieved, in descending order.

Webservice

  • To run the subject prediction tool as a simple REST based webservice, the following script can be used:
sh runService.sh
  • The scripts deploys a CherryPy webservice in a docker container listening at http://localhost:8080/predict_subject.

  • To retrieve the recommendations, create a POST request and submit a json document with a text as for example:

curl -d '{"text" : "Der Satz des Pythagoras lautet: a^2 + b^2 = c^2."}' -H "Content-Type: application/json" -X POST http://0.0.0.0:8080/predict_subjects