A data pipeline framework for machine learning

Primary language: Python. License: MIT.

Fuel

Fuel provides your machine learning models with the data they need to learn.

  • Interfaces to common datasets such as MNIST, CIFAR-10 (image datasets), Google's One Billion Words (text), and many more
  • The ability to iterate over your data in a variety of ways, such as in minibatches with shuffled/sequential examples
  • A pipeline of preprocessors that allows you to edit your data on-the-fly, for example by adding noise, extracting n-grams from sentences, or extracting patches from images
  • A pipeline that is entirely serializable with pickle, which is a requirement for checkpointing and resuming long-running experiments. For this, we rely heavily on the picklable_itertools library.
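
To make the ideas in the list above concrete, here is a minimal sketch in plain Python. It does not use Fuel's API: the names `MinibatchStream`, `double`, `get_state` and `set_state` are hypothetical and exist only for this illustration. The sketch iterates over examples in minibatches (sequential or shuffled), applies a preprocessing function on-the-fly, and checkpoints the iteration with pickle. As a simplification, it pickles only a plain state dictionary, whereas Fuel aims to make the entire pipeline picklable.

```python
import pickle
import random


def double(x):
    # Hypothetical on-the-fly preprocessing step.
    return 2 * x


class MinibatchStream:
    """Illustrative stand-in for a data stream; not part of Fuel's API."""

    def __init__(self, examples, batch_size, shuffle=False, seed=0,
                 transform=None):
        self.examples = list(examples)
        self.batch_size = batch_size
        self.transform = transform
        # The visiting order and the position are stored as plain data,
        # so the iteration state can be saved and restored.
        self.order = list(range(len(self.examples)))
        if shuffle:
            random.Random(seed).shuffle(self.order)
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.order):
            raise StopIteration
        indices = self.order[self.position:self.position + self.batch_size]
        self.position += self.batch_size
        batch = [self.examples[i] for i in indices]
        if self.transform is not None:
            batch = [self.transform(x) for x in batch]
        return batch

    def get_state(self):
        # Copy the iteration state into a dictionary of built-in types.
        return {"order": list(self.order), "position": self.position}

    def set_state(self, state):
        self.order = list(state["order"])
        self.position = state["position"]


# Sequential minibatches of two examples, doubled on-the-fly.
stream = MinibatchStream(range(6), batch_size=2, transform=double)
first_batch = next(stream)

# Checkpoint in the middle of the epoch.
checkpoint = pickle.dumps(stream.get_state())

# Later: rebuild the stream and resume where the checkpoint left off.
resumed = MinibatchStream(range(6), batch_size=2, transform=double)
resumed.set_state(pickle.loads(checkpoint))
remaining = list(resumed)
```

Passing `shuffle=True` visits the same examples in a seeded random order. Ordinary Python generators cannot be pickled, which is why this sketch keeps its position in explicit attributes instead.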

Fuel is developed primarily for use by Blocks, a Theano toolkit that helps you train neural networks.

If you have questions, don't hesitate to write to the mailing list.

Citing Fuel

If you use Blocks or Fuel in your work, we'd really appreciate it if you could cite the following paper:

Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, and Yoshua Bengio, "Blocks and Fuel: Frameworks for deep learning," arXiv preprint arXiv:1506.00619 [cs.LG], 2015.

Documentation

Please see the documentation for more information.