Please visit shifu.ml for download information and installation instructions, and see our wiki page for the current tutorial.
Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. Shifu is designed for data scientists, simplifying the life-cycle of building machine learning models. While originally built for fraud modeling, Shifu is generalized for many other modeling domains.
Shifu provides a simple command-line interface for each step of the model building process, including:
- Statistic calculation & variable selection to determine the most predictive variables in your data
- Variable normalization
- Distributed variable selection based on sensitivity analysis
- Distributed neural network model training
- Distributed tree ensemble model training
- Post-training analysis & model evaluation
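
To give a feel for what "variable selection based on sensitivity analysis" means in general, here is a minimal single-machine sketch in Python. It is not Shifu's implementation and does not use Shifu's API: the function `sensitivity_ranking`, the stand-in `toy_model`, and the sample data are all hypothetical. The sketch assumes one common form of the technique, where each variable is replaced by its mean in turn and variables are ranked by how much the model's mean squared error grows.

```python
from statistics import mean


def sensitivity_ranking(model, rows, targets):
    """Rank column indices, most influential first.

    A column's score is the increase in mean squared error when that
    column is replaced by its mean (so it carries no information).
    """
    def mse(data):
        return mean((model(r) - t) ** 2 for r, t in zip(data, targets))

    baseline = mse(rows)
    scores = {}
    for col in range(len(rows[0])):
        col_mean = mean(r[col] for r in rows)
        # Copy every row with only this one column neutralised.
        masked = [r[:col] + [col_mean] + r[col + 1:] for r in rows]
        scores[col] = mse(masked) - baseline
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical stand-in for a trained model:
# strong dependence on x0, weak on x1, none on x2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


rows = [[1.0, 4.0, 7.0], [2.0, 1.0, 9.0], [3.0, 3.0, 8.0], [4.0, 2.0, 6.0]]
targets = [toy_model(r) for r in rows]
ranking = sensitivity_ranking(toy_model, rows, targets)
print(ranking)  # [0, 1, 2]
```

Variables at the end of the ranking are the candidates to drop. A distributed version would presumably compute the per-variable error sums in parallel over data partitions, but that detail is an assumption and is not shown here.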
Shifu’s fast, Hadoop-based distributed training for neural networks, logistic regression, and gradient boosted trees can reduce model training time from days to hours on terabyte-scale data sets. Shifu integrates with Pig workflows on Hadoop, and Shifu-trained models can be integrated into production code with a simple Java API. Shifu leverages Pig, Akka, Encog and other open-source projects.
More details about Shifu can be found on our wiki pages.
- Zhanghao Hu (zhanhu@paypal.com)
- Grahame Jastrebski (gjastrebski@paypal.com)
- Lavar Li (lulli@paypal.com)
- Mark Liu (yliu15@paypal.com)
- David Zhang (pengzhang@paypal.com)
- Xin Zhong (xinzhong@paypal.com)
- Simon Zhang (jzhang13@paypal.com)
- Sharma Nitin (nsharma1@paypal.com)
Please join the Shifu group if you have questions, bug reports, or anything else to discuss.
Copyright 2012-2016, PayPal Software Foundation under the Apache License.