Automated-Manual-Comparison

Automated vs Manual Feature Engineering Comparison. Implemented using Featuretools.

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Manual vs Automated Feature Engineering Comparison

The traditional process of manual feature engineering requires building one feature at a time by hand, informed by domain knowledge. This is tedious, time-consuming, error-prone, and, perhaps most importantly, specific to each dataset, which means the code has to be rewritten for each problem.

Automated feature engineering with Featuretools allows one to create thousands of features automatically from a set of related tables using a framework that can be easily applied to any problem.
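As a rough illustration of the idea (this is not the Featuretools API), the sketch below applies a small set of aggregation functions to every numeric column of a child table, grouped by the parent key. The table contents, the column names, and the `aggregate_children` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: one row per loan, linked to a client by client_id.
loans = [
    {"client_id": 1, "amount": 1000.0, "rate": 0.05},
    {"client_id": 1, "amount": 3000.0, "rate": 0.07},
    {"client_id": 2, "amount": 500.0, "rate": 0.10},
]

# Aggregation functions applied to every numeric column of the child table.
AGGREGATIONS = {"mean": mean, "max": max, "min": min, "sum": sum, "count": len}


def aggregate_children(rows, parent_key, child_name):
    """Build one feature row per parent by aggregating each numeric child column."""
    # parent id -> column name -> list of values seen for that parent
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != parent_key and isinstance(value, (int, float)):
                grouped[row[parent_key]][column].append(value)

    features = {}
    for parent_id, columns in grouped.items():
        features[parent_id] = {
            f"{agg_name.upper()}({child_name}.{column})": agg(values)
            for column, values in columns.items()
            for agg_name, agg in AGGREGATIONS.items()
        }
    return features


feature_matrix = aggregate_children(loans, parent_key="client_id", child_name="loans")
```

Adding a column to `loans` or an entry to `AGGREGATIONS` yields new features without any further code, which is the property that lets the same framework be reused across datasets.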

Highlights

Featuretools offers us the following benefits:

  1. Up to 10x reduction in development time
  2. Better predictive performance
  3. Interpretable features with real-world significance
  4. Fits into existing machine learning pipelines
  5. Ensures data is valid in time-series problems

Automated feature engineering will change the way you do machine learning by allowing you to develop better predictive models in a fraction of the time required by the traditional approach.

Article

For the highlights of the project, check out "Why Automated Feature Engineering Will Change the Way You Do Machine Learning" on Towards Data Science.

Results

Each of the 3 projects in this repository demonstrates a different benefit of using automated feature engineering.

  1. Loan Repayment Prediction: Build Better Models Faster

Given a dataset of 58 million rows spread across 7 tables and the task of predicting whether or not a client will default on a loan, Featuretools delivered a better predictive model in a fraction of the time required by manual feature engineering. The features built by Featuretools are also human-interpretable and can give us insight into the problem.

  2. Retail Spending Prediction: Ensure Models Use Valid Data

When we have time-series data, we traditionally have to be extremely careful to make sure our model trains only on valid data. Often, a model will work in development only to fail completely in deployment because the training data was not properly filtered by time. Featuretools can take care of time filters automatically, allowing us to focus on other aspects of the machine learning pipeline and delivering better overall predictive models.
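The time filtering described above can be sketched in a few lines. This is a minimal illustration of the idea of a cutoff time, not Featuretools code: for each customer, only rows recorded before that customer's cutoff are allowed to contribute to the feature. The purchase records, customer IDs, and the `total_spent_before` helper are hypothetical, and whether the boundary itself is included is a design choice (this sketch excludes it).

```python
from datetime import datetime

# Hypothetical purchase log: (customer_id, purchase_time, amount).
purchases = [
    ("c1", datetime(2011, 1, 5), 20.0),
    ("c1", datetime(2011, 2, 10), 35.0),
    ("c1", datetime(2011, 3, 15), 50.0),
    ("c2", datetime(2011, 2, 20), 10.0),
]

# Hypothetical cutoff times: the moment each prediction would be made.
cutoff_times = {
    "c1": datetime(2011, 3, 1),
    "c2": datetime(2011, 2, 1),
}


def total_spent_before(customer_id, cutoff, rows):
    """Sum a customer's purchases made strictly before the cutoff time."""
    return sum(
        amount
        for cid, when, amount in rows
        if cid == customer_id and when < cutoff
    )


# One feature value per customer, computed only from data available at the cutoff.
features = {
    cid: total_spent_before(cid, cutoff, purchases)
    for cid, cutoff in cutoff_times.items()
}
```

The March purchase for `c1` is excluded because it occurs after that customer's cutoff, so it cannot leak into the training features. Customer `c2` has no purchases before its cutoff and gets a total of zero.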

  3. Engine Life Prediction: Automatically Create Meaningful Features

In this problem of predicting how long an engine will run until it fails, Featuretools creates meaningful features that can inform our thinking about real-world problems, as seen in the most important features.

Scaling with Dask

For examples of how Featuretools can scale, either on a single machine or on a cluster, see the Feature Matrix with Dask EntitySet and Featuretools on Dask notebooks.

Feature Labs

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to help@featurelabs.com