nydssg_pipelines

Presentation at the NYC Data Science Study Group on how to streamline your cross-validation and classification workflow using scikit-learn's Pipeline and FeatureUnion classes.

Primary language: Jupyter Notebook

As a prediction model grows in complexity, scikit-learn's Pipeline and FeatureUnion classes offer a convenient way to organize our data extraction, transformation, normalization, and training steps. By chaining transformers and estimators together, we can wrap the extraction of each set of features into a single pipeline. These feature pipelines can then be reordered and combined using FeatureUnion, which concatenates their outputs. This not only saves time but also keeps our code better organized while we look for the best combination of techniques for a modeling task.
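The idea above can be sketched in a few lines. This is only an illustrative example, not the code from the presentation: the iris dataset, the choice of PCA and SelectKBest as feature extractors, the logistic regression classifier, and all step names (`scale`, `pca`, `select`, `features`, `clf`) are assumptions made for the sake of a small, runnable demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Example data (an assumption): 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Feature pipeline 1: standardize, then project onto 2 principal components.
pca_features = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Feature pipeline 2: keep the single original feature with the best
# univariate score (SelectKBest uses the ANOVA F-test by default).
kbest_features = Pipeline([
    ("select", SelectKBest(k=1)),
])

# FeatureUnion fits each feature pipeline on the same input and
# concatenates their outputs column-wise: 2 + 1 = 3 columns.
features = FeatureUnion([
    ("pca", pca_features),
    ("kbest", kbest_features),
])
combined = features.fit_transform(X, y)

# The full model is itself a pipeline: combined features, then a classifier.
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# cross_val_score clones and refits the whole pipeline on each training fold,
# so scaling, PCA, and feature selection never see the held-out fold.
scores = cross_val_score(model, X, y, cv=5)
print(combined.shape, scores.mean())
```

Because the whole workflow is a single estimator, swapping a step (for example, a different classifier in place of `clf`) or reordering the feature pipelines only requires changing the list passed to `Pipeline` or `FeatureUnion`; the cross-validation call stays the same.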