Spam classification machine learning models for Zenodo records and communities.
First, create a virtualenv, install the dependencies, and run the Jupyter notebook server:
mkvirtualenv --python python3.6 zenodo-classifier
(zenodo-classifier) pip install -r requirements.txt
# This will also open Jupyter notebook in your browser
(zenodo-classifier) jupyter notebook
To re-train the model:
- Run the dump_zenodo_open_metadata.py script in production to generate the latest dump
- Download the dump locally
- Open the model_spam_detection_record.ipynb notebook
- Update the data_file and model_path variables to point to the new dump location
- Run all the cells up to "4. Dump model"
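As a rough illustration of the "update the data_file and model_path variables" step, the notebook cell could look something like the sketch below. The directory names, the file-name patterns, and the build_paths helper are all hypothetical and only serve as an example; use whatever names your dump and the notebook actually expect (the notebook may, for instance, want plain strings rather than Path objects).

```python
from datetime import date
from pathlib import Path

# Hypothetical locations: adjust to wherever you downloaded the dump
# and wherever the trained model should be written.
DATA_DIR = Path("data")
MODELS_DIR = Path("models")


def build_paths(day, data_dir=DATA_DIR, models_dir=MODELS_DIR):
    """Return (data_file, model_path) for a dump taken on `day`.

    The file-name patterns are illustrative assumptions, not the names
    the dump script or the notebook actually use.
    """
    stamp = day.isoformat()
    data_file = data_dir / f"zenodo_open_metadata_{stamp}.jsonl"
    model_path = models_dir / f"spam_model_record_{stamp}.pkl"
    return data_file, model_path


data_file, model_path = build_paths(date.today())

# Warn early if the dump has not been downloaded yet.
if not data_file.exists():
    print(f"Dump not found at {data_file}; download it before running the cells")
```

Putting the date in both names keeps each trained model paired with the dump it was trained on, which may help later when comparing against older models.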
To compare with older models:
TODO
experiments/ - Experimental model notebooks
legacy/ - Legacy model notebooks
dump_zenodo_open_metadata.py - Generates a dump from the database that can be used to train classifier models
download_zenodo_open_metadata_archive.py - Downloads and extracts older dumps
clean_zenodo_open_metadata.py - Cleans downloaded datasets
model_spam_detection_record.ipynb + model_spam_detection_communities.ipynb - Currently used classifier model notebooks for producing a trained model
run_model_py.ipynb
- Improve metrics
- Update README