
Primary language: C++. License: Apache License 2.0 (Apache-2.0).

MindAlpha

MindAlpha is a machine learning platform integrating PySpark, PyTorch and a parameter server implementation. The platform contains native support for sparse parameters, making it easy for users to develop large-scale models. Together with MindAlpha Serving, the platform provides a one-stop solution for data preprocessing, model training and online prediction.

Features

  • Efficient IO with PySpark. Minibatches read by PySpark as pandas DataFrames can be fed directly to models.

  • The API is similar to those of PyTorch and Spark MLlib, so users familiar with PyTorch and PySpark can get started quickly.

  • Custom sparse layers are wrapped as PyTorch modules, making them easy to use. These sparse layers can contain billions of parameters.

  • Models can be developed interactively in Jupyter Notebook, and periodic model training can be scheduled with Airflow.

  • The trained model can be exported via one method call and loaded by MindAlpha Serving for online prediction.
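To make the idea of sparse parameters more concrete, here is a minimal plain-Python sketch of a sparse embedding table. It is not the MindAlpha API: the class `SparseEmbeddingTable` and its methods `lookup` and `apply_gradients` are hypothetical names chosen for illustration. The point it shows is that rows are created only for feature IDs that actually appear in the data, so the ID space can be huge while memory grows only with the number of IDs seen.

```python
import random


class SparseEmbeddingTable:
    """Toy sparse parameter table (hypothetical, not the MindAlpha API).

    Rows are created lazily on first lookup, so memory grows with the
    number of distinct feature IDs seen, not with the size of the ID space.
    """

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rows = {}  # feature ID -> list of floats
        self._rng = random.Random(seed)

    def lookup(self, feature_ids):
        """Return a copy of the embedding row for each ID, creating missing rows."""
        out = []
        for fid in feature_ids:
            if fid not in self.rows:
                self.rows[fid] = [
                    self._rng.uniform(-0.01, 0.01) for _ in range(self.dim)
                ]
            out.append(list(self.rows[fid]))
        return out

    def apply_gradients(self, feature_ids, grads, lr=0.1):
        """Plain SGD step that touches only the rows named in feature_ids."""
        for fid, grad in zip(feature_ids, grads):
            row = self.rows[fid]
            for i in range(self.dim):
                row[i] -= lr * grad[i]


table = SparseEmbeddingTable(dim=4)

# The IDs may come from a very large space; only two rows are materialized.
batch_ids = [17, 9_000_000_000, 17]
vectors = table.lookup(batch_ids)

# Update only the row for feature 17; the other row is left untouched.
table.apply_gradients([17], [[1.0, 1.0, 1.0, 1.0]], lr=0.1)
print(len(table.rows), "rows materialized")
```

In a real system such a table would typically be sharded across parameter server nodes and exposed to the model as a layer; the sketch only illustrates the lazy, per-ID storage and update pattern.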

Build

First, run the build script to build a Docker image:

sh run_build.sh -i

For more details, please refer to docker/ubuntu20.04/Dockerfile and docker/centos7/Dockerfile.

Then run the script to compile the sources (C++ and Python). This produces the dynamic-link library (*.so) and the Python installation packages (*.whl), which are generated in the build directory by default:

sh run_build.sh -m
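After the build finishes, it can be handy to check which artifacts were produced. The following is a small sketch that searches a build directory for wheel and shared-library files. The helper name `find_build_artifacts` is hypothetical, and the recursive search is an assumption, since the exact layout inside the build directory is not described here.

```python
from pathlib import Path


def find_build_artifacts(build_dir="build"):
    """Collect *.whl and *.so files under build_dir (searched recursively).

    Returns empty lists if the directory does not exist.
    """
    root = Path(build_dir)
    if not root.is_dir():
        return {"wheels": [], "shared_libs": []}
    return {
        "wheels": sorted(root.rglob("*.whl")),
        "shared_libs": sorted(root.rglob("*.so")),
    }


# Assumes the default output directory named "build" in the current directory.
artifacts = find_build_artifacts("build")
for kind, paths in artifacts.items():
    print(kind, [str(p) for p in paths])
```

Any wheel found this way can then be installed with `pip install` followed by the path to the wheel file.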

Tutorials

Two tutorials are given: