The selected base Word2Vec gensim model is taken from Materials Science Word Embeddings.
It is a Word2Vec model trained on 640k+ materials science journal articles.
The model files are placed in models/base.
The data is preprocessed using the Jupyter notebook Notebooks/0. Preprocess data.ipynb.
The dataset generated by the notebook is stored in data/preprocessed_dataset.pickle.
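The exact structure of the pickled dataset is not documented here; a list of tokenized sentences (the input format gensim's Word2Vec expects) is assumed for this illustrative round-trip sketch, which writes to a temporary path rather than the real data/preprocessed_dataset.pickle:

```python
import os
import pickle
import tempfile

# Assumed structure: a list of tokenized sentences, as Word2Vec consumes.
dataset = [["thermal", "conductivity"], ["band", "gap", "energy"]]

# Write and read back the pickle, mirroring how the notebook's output
# (data/preprocessed_dataset.pickle) would later be loaded for training.
path = os.path.join(tempfile.mkdtemp(), "preprocessed_dataset.pickle")
with open(path, "wb") as f:
    pickle.dump(dataset, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == dataset)  # True
```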
Models are trained from the pre-trained base model (see Base Model) with a given set of hyperparameters.
The newly trained models are stored in models/.
pip install -r requirements.txt
cd w2v_service
python manage.py makemigrations
python manage.py migrate
python manage.py runserver
Or use the provided Dockerfile.
Description: Add hyperparameter(s) for re-training model(s)
Note: Only one combination of unique hyperparameters is stored in the database.
Note: If any of the supplied parameters is a list, the hyperparameter search space is expanded and all possible combinations of the given hyperparameters are created.
Parameters:
- start_alpha: float or comma separated list of floats
- end_alpha: float or comma separated list of floats
- epochs: int or comma separated list of ints
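The expansion rule above can be sketched as follows. This is an illustration only: the helper name and the sample values are assumptions, not the service's actual code.

```python
from itertools import product

def parse_values(raw, cast):
    """Parse a scalar or comma-separated list into a list of values."""
    return [cast(v) for v in str(raw).split(",")]

# Two start_alpha values, one end_alpha, two epochs values.
params = {
    "start_alpha": parse_values("0.025,0.01", float),
    "end_alpha": parse_values("0.0001", float),
    "epochs": parse_values("5,10", int),
}

# Expand the search space: every combination of the supplied values
# becomes one unique hyperparameter set.
combinations = [
    dict(zip(params, values)) for values in product(*params.values())
]

print(len(combinations))  # 4 hyperparameter sets (2 x 1 x 2)
```

Since only unique combinations are stored, submitting the same values twice would not add new rows to the database.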
List all hyperparameters stored in the database
Triggers the training of models for all hyperparameters
Triggers the training of a single set of hyperparameters given by id
List all training sessions
Show details of the training session for a given id
Shows statistics for the models trained per hyperparameter set.
Filters:
- start_alpha (float): shows information for a specific start_alpha value
- end_alpha (float): shows information for a specific end_alpha value
- epochs (int): shows information for a specific epochs value

Note: The provided information is only for successful training sessions.
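A filtered statistics request can be built as a query string. The host, port, and endpoint path below are assumptions (the README lists only the filter names); only the filter parameters come from the list above:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host/path to your deployment.
base_url = "http://localhost:8000/statistics/"

# Filter to one specific hyperparameter combination.
filters = {"start_alpha": 0.025, "end_alpha": 0.0001, "epochs": 5}

url = base_url + "?" + urlencode(filters)
print(url)
```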