
An end-to-end machine learning and data mining framework on Hadoop

Primary language: Java. License: Apache License 2.0.

Shifu


Getting Started

Please visit shifu.ml for download information and installation instructions, and see our wiki pages for the current tutorial.

Conference

QCON Shanghai 2015 Slides

What is Shifu?

Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. It is designed for data scientists and simplifies the life cycle of building machine learning models. Although originally built for fraud modeling, Shifu has been generalized to many other modeling domains.

Shifu provides a simple command-line interface for each step of the model building process, including

Shifu's Hadoop-based distributed training for neural networks, logistic regression, and gradient boosted trees can reduce model training time from days to hours on terabyte-scale data sets. Shifu integrates with Pig workflows on Hadoop, and Shifu-trained models can be integrated into production code through a simple Java API. Shifu builds on Pig, Akka, Encog, and other open-source projects.
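As a rough illustration of the step-by-step command-line workflow, the sketch below prints a plausible sequence of Shifu commands without running them. The step names and their order (`new`, `init`, `stats`, `norm`, `varselect`, `train`, `posttrain`, `eval`) are recalled from the project tutorial and should be treated as assumptions to check against the wiki; the model name `DemoModel` is hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the assumed Shifu commands instead of executing them.
set -euo pipefail

# Hypothetical model name, used only for illustration.
MODEL_NAME="${MODEL_NAME:-DemoModel}"

# Assumed step order; verify against the wiki tutorial before relying on it.
STEPS=(init stats norm varselect train posttrain eval)

# Print one command per line: create the model, enter its directory,
# then run each pipeline step in order.
plan() {
  echo "shifu new ${MODEL_NAME}"
  echo "cd ${MODEL_NAME}"
  local step
  for step in "${STEPS[@]}"; do
    echo "shifu ${step}"
  done
}

plan
```

Printing the plan rather than executing it keeps the sketch safe to run on a machine without Shifu or Hadoop installed. Once the commands are confirmed for your installed version, the output can be reviewed and run by hand.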

More details about Shifu can be found in our wiki pages.

Contributors

Google Group

Please join the Shifu group if you have questions, bug reports, or anything else to discuss.

Copyright and License

Copyright 2012-2016 PayPal Software Foundation. Licensed under the Apache License, Version 2.0.