# Open Skills Project - Machine Learning
This library holds the methods used by the Open Skills API, including processing algorithms and utilities for computing our jobs and skills taxonomy.
New to Skills-ML? Check out the Skills-ML Tour! It will get you started with the concepts. You can also check out the notebook version of the tour, which you can run on your own.
skills-ml depends on Python 3.6, so create a virtual environment using a python3.6 executable:

```bash
virtualenv venv -p /usr/bin/python3.6
```

Activate your virtualenv:

```bash
source venv/bin/activate
```

Install the package:

```bash
pip install skills-ml
```

Then import it in Python:

```python
import skills_ml
```
- There are examples of using individual components to perform specific tasks in examples.
- Check out the descriptions of different algorithm types in algorithms/, and look at any individual directories that match what you'd like to do (e.g. skill extraction, job title normalization).
- skills-airflow is the open-source production system that uses skills-ml algorithms in an Airflow pipeline to generate open datasets.
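To give a flavor of a task like skill extraction mentioned above, here is a minimal, self-contained sketch of an exact-match skill extractor. The class and method names are invented for illustration and are not skills-ml's actual API; the real extractors are ontology-backed and considerably richer.

```python
import re

class ExactMatchSkillExtractor:
    """Illustrative sketch (NOT the skills-ml API): tags skills in
    free text by exact phrase lookup against a known skill list."""

    def __init__(self, skill_phrases):
        # Sort longest-first so multi-word skills win over substrings
        escaped = sorted((re.escape(p) for p in skill_phrases),
                         key=len, reverse=True)
        self.pattern = re.compile(r'\b(' + '|'.join(escaped) + r')\b',
                                  re.IGNORECASE)

    def extract(self, job_posting_text):
        # Return matched skills, lowercased, in order of appearance
        return [m.group(0).lower()
                for m in self.pattern.finditer(job_posting_text)]

extractor = ExactMatchSkillExtractor(['machine learning', 'python', 'sql'])
posting = "Looking for a data analyst with SQL and Python; machine learning a plus."
print(extractor.extract(posting))  # ['sql', 'python', 'machine learning']
```

A production pipeline would plug a component like this into a corpus of job postings and aggregate the tags; see algorithms/ for the real implementations.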
skills-ml uses a forked version of pydocmd and a custom script to keep the pydocmd config file up to date. Here's how to keep the docs updated before you push:
```bash
$ cd docs
$ PYTHONPATH="../" python update_docs.py  # update docs/pydocmd.yml with the package/module structure and export the Skills-ML Tour notebook to the documentation directory
$ pydocmd serve      # serve local documentation that you can check in your browser
$ pydocmd gh-deploy  # update the gh-pages branch
```
- algorithms/ - Core algorithmic module. Each submodule is meant to contain a different type of component, such as a job title normalizer or a skill tagger, with a common interface so different pipelines can try out different versions of the components.
- datasets/ - Wrappers for interfacing with different datasets, such as ONET, Urbanized Area.
- evaluation/ - Code for testing different components against each other.
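To illustrate the "common interface" idea behind algorithms/ and the kind of head-to-head comparison evaluation/ enables, here is a stdlib-only sketch. The interface, component classes, and metric below are invented for illustration; they are not skills-ml's actual classes.

```python
from abc import ABC, abstractmethod

class JobTitleNormalizer(ABC):
    """Illustrative common interface (NOT the skills-ml API): every
    normalizer maps a raw title to a canonical form, so pipelines can
    swap one implementation for another."""

    @abstractmethod
    def normalize(self, raw_title: str) -> str:
        ...

class LowercaseNormalizer(JobTitleNormalizer):
    def normalize(self, raw_title):
        return raw_title.strip().lower()

class SeniorityStrippingNormalizer(JobTitleNormalizer):
    PREFIXES = ('senior ', 'sr. ', 'junior ', 'jr. ', 'lead ')

    def normalize(self, raw_title):
        title = raw_title.strip().lower()
        for prefix in self.PREFIXES:
            if title.startswith(prefix):
                return title[len(prefix):]
        return title

def accuracy(normalizer, labeled_titles):
    """Toy evaluation: fraction of titles normalized to the expected form."""
    hits = sum(normalizer.normalize(raw) == expected
               for raw, expected in labeled_titles)
    return hits / len(labeled_titles)

labeled = [('Senior Data Scientist', 'data scientist'),
           ('data scientist', 'data scientist'),
           ('Jr. Web Developer', 'web developer')]

for normalizer in (LowercaseNormalizer(), SeniorityStrippingNormalizer()):
    print(type(normalizer).__name__, accuracy(normalizer, labeled))
```

Because both components satisfy the same interface, the evaluation harness can compare them without knowing anything about their internals — the same design choice that lets skills-ml pipelines try out different versions of a component.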
This project is licensed under the MIT License - see the LICENSE.md file for details.