Neuraxle Pipelines

Code Machine Learning Pipelines - The Right Way.

Neuraxle is a Machine Learning (ML) library for building machine learning pipelines.

Component-Based: Build encapsulated steps, then compose them to build complex pipelines.
Evolving State: Each pipeline step can fit and evolve through the learning process.
Hyperparameter Tuning: Optimize your pipelines using AutoML, where each pipeline step has its own hyperparameter space.
Compatible: Use your favorite machine learning libraries inside and outside Neuraxle pipelines.
Production Ready: Pipeline steps can manage how they are saved by themselves, and the lifecycle of the objects allows for train and test modes.
Streaming Pipeline: Transform data in many pipeline steps at the same time, in parallel, using multiprocessing Queues.
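
To make the "Component-Based" and "Evolving State" ideas above concrete, here is a minimal plain-Python sketch. It does not use Neuraxle's API: the Step, MeanCenterer, Scaler, and SimplePipeline classes are hypothetical names invented for this illustration, and they only show the general pattern of steps that learn state in fit and apply it in transform.

```python
from typing import List, Sequence


class Step:
    """Hypothetical base step: fit learns state, transform applies it."""

    def fit(self, data: Sequence[float]) -> "Step":
        return self

    def transform(self, data: Sequence[float]) -> List[float]:
        return list(data)


class MeanCenterer(Step):
    """Learns the mean during fit, then subtracts it during transform."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, data: Sequence[float]) -> "MeanCenterer":
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data: Sequence[float]) -> List[float]:
        return [x - self.mean for x in data]


class Scaler(Step):
    """Stateless step: multiplies every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, data: Sequence[float]) -> List[float]:
        return [x * self.factor for x in data]


class SimplePipeline(Step):
    """Composes steps; a pipeline is itself a step, so pipelines can nest."""

    def __init__(self, steps: Sequence[Step]) -> None:
        self.steps = list(steps)

    def fit(self, data: Sequence[float]) -> "SimplePipeline":
        # Each step is fitted on the output of the previous steps.
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data: Sequence[float]) -> List[float]:
        for step in self.steps:
            data = step.transform(data)
        return list(data)


pipeline = SimplePipeline([MeanCenterer(), Scaler(2.0)])
pipeline.fit([1.0, 2.0, 3.0])
centered_and_scaled = pipeline.transform([2.0, 4.0])
```

Because the pipeline exposes the same fit and transform methods as its steps, it can be placed inside another pipeline, which is the composition idea described in the list above.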

Documentation

You can find the Neuraxle documentation on the website.

The documentation is divided into several sections:

Installation

Simply do:

pip install neuraxle

Examples

We have several examples on the website.

For example, you can build a time series processing pipeline like this:

p = Pipeline([
    TrainOnly(DataShuffler()),
    WindowTimeSeries(),
    MiniBatchSequentialPipeline([
        Tensorflow2ModelStep(
            create_model=create_model,
            create_optimizer=create_optimizer,
            create_loss=create_loss
        ).set_hyperparams(HyperparameterSamples({
            'hidden_dim': 12,
            'layers_stacked_count': 2,
            'lambda_loss_amount': 0.0003,
            'learning_rate': 0.001,
            'window_size_future': sequence_length,
            'output_dim': output_dim,
            'input_dim': input_dim
        })).set_hyperparams_space(HyperparameterSpace({
            'hidden_dim': RandInt(6, 750),
            'layers_stacked_count': RandInt(1, 4),
            'lambda_loss_amount': Uniform(0.0003, 0.001),
            'learning_rate': Uniform(0.001, 0.01),
            'window_size_future': FixedHyperparameter(sequence_length),
            'output_dim': FixedHyperparameter(output_dim),
            'input_dim': FixedHyperparameter(input_dim)
        }))
    ])
])

# Load data
X_train, y_train, X_test, y_test = generate_classification_data()

# The pipeline will learn on the data and acquire state.
p = p.fit(X_train, y_train)

# Once it has learned, the pipeline can process new and
# unseen data to make predictions.
y_test_predicted = p.predict(X_test)
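
The WindowTimeSeries step used above is not defined in this README. As a rough idea of what a windowing step for time series typically does, here is a plain-Python sketch. The make_windows function and its window_size_past parameter are assumptions made for this illustration, not part of Neuraxle; only the window_size_future name is borrowed from the hyperparameters above.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    window_size_past: int,
    window_size_future: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a series into (past window, future window) training pairs.

    Hypothetical helper: each input holds `window_size_past` consecutive
    values, and its target holds the `window_size_future` values that follow.
    """
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # Last start index that still leaves room for a full past + future window.
    last_start = len(series) - window_size_past - window_size_future
    for start in range(last_start + 1):
        split = start + window_size_past
        inputs.append(list(series[start:split]))
        targets.append(list(series[split:split + window_size_future]))
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5, 6], window_size_past=3, window_size_future=2)
# X == [[1, 2, 3], [2, 3, 4]]
# y == [[4, 5], [5, 6]]
```

A series shorter than one full past-plus-future window yields no pairs, so the function returns two empty lists in that case.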

You can also tune your hyperparameters using AutoML algorithms such as the Tree-structured Parzen Estimator (TPE):

auto_ml = AutoML(
    pipeline=pipeline,
    hyperparams_optimizer=TreeParzenEstimatorHyperparameterSelectionStrategy(
        number_of_initial_random_step=10,
        quantile_threshold=0.3,
        number_good_trials_max_cap=25,
        number_possible_hyperparams_candidates=100,
        prior_weight=0.,
        use_linear_forgetting_weights=False,
        number_recent_trial_at_full_weights=25
    ),
    validation_splitter=ValidationSplitter(test_size=0.20),
    scoring_callback=ScoringCallback(accuracy_score, higher_score_is_better=True),
    callbacks=[
        MetricCallback(f1_score, higher_score_is_better=True),
        MetricCallback(precision_score, higher_score_is_better=True),
        MetricCallback(recall_score, higher_score_is_better=True)
    ],
    n_trials=7,
    epochs=10,
    hyperparams_repository=HyperparamsJSONRepository(cache_folder='cache'),
    refit_trial=True,
)

# Load data, and launch the AutoML loop!
X_train, y_train, X_test, y_test = generate_classification_data()
auto_ml = auto_ml.fit(X_train, y_train)

# Get the model from the best trial, and make predictions using predict.
best_pipeline = auto_ml.get_best_model()
y_pred = best_pipeline.predict(X_test)
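
To show the shape of the loop that an AutoML search runs (sample hyperparameters, score the trial, keep the best), here is a small plain-Python sketch. It uses plain random search rather than TPE, and it does not call Neuraxle: random_search, toy_score, and the lambda samplers are hypothetical names for this illustration, and the objective is made up.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Sampler = Callable[[random.Random], float]


def random_search(
    space: Dict[str, Sampler],
    score_fn: Callable[[Params], float],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Run n_trials random trials and return the best params and score.

    Higher scores are treated as better. Requires n_trials >= 1.
    """
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its sampler.
        params = {name: sample(rng) for name, sample in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Ranges loosely mirror the hyperparameter space shown earlier.
space: Dict[str, Sampler] = {
    "hidden_dim": lambda rng: rng.randint(6, 750),
    "learning_rate": lambda rng: rng.uniform(0.001, 0.01),
}


def toy_score(params: Params) -> float:
    """Made-up objective that peaks at hidden_dim=100, learning_rate=0.005."""
    return (
        -abs(params["hidden_dim"] - 100) / 750
        - abs(params["learning_rate"] - 0.005)
    )


best_params, best_score = random_search(space, toy_score, n_trials=7)
```

In a real search, the scoring function would fit the pipeline on a training split and evaluate it on a validation split. A strategy such as TPE differs from this sketch in that it uses the scores of past trials to decide where to sample next.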

Why Neuraxle?

Most research projects don't ever get to production. However, you want your project to be production-ready and already adaptable (clean) by the time you finish it. You also want things to be simple so that you can get started quickly. Read more about the why of Neuraxle here.

Community

Join our Slack workspace and our Gitter! We <3 collaborators. You can also subscribe to our mailing list where we will post updates and news.

For technical questions, we recommend posting them on Stack Overflow first with neuraxle in the tags (probably alongside python and machine-learning), and then opening an issue that links to your Stack Overflow question.

For suggestions, comments, and issues, don't hesitate to open an issue.

For contributors, we recommend using the PyCharm code editor, letting it manage the virtual environment, using its default code auto-formatter, and using pytest as the test runner. To contribute, first fork the project, make your changes, and then open a pull request against the main repository. Please make your pull request(s) editable so that we can, for example, add you to the list of contributors if you didn't add the entry yourself. Ensure that all tests pass before opening a pull request. By contributing, you also agree that your contributions will be licensed under the Apache 2.0 License, which is required for everyone to be able to use your open-source contributions.

License

Neuraxle is licensed under the Apache License, Version 2.0.

Citation

You may cite our extended abstract, which was presented at the Montreal Artificial Intelligence Symposium (MAIS) 2019. Here is the BibTeX entry to cite:

@misc{neuraxle,
author = {Chevalier, Guillaume and Brillant, Alexandre and Hamel, Eric},
year = {2019},
month = {09},
pages = {},
title = {Neuraxle - A Python Framework for Neat Machine Learning Pipelines},
doi = {10.13140/RG.2.2.33135.59043}
}