/fastcore

Python supercharged for the fastai library

Primary LanguageJupyter NotebookApache License 2.0Apache-2.0

Welcome to fastcore

Python is a powerful, dynamic language. Rather than bake everything into the language, it lets the programmer customize it to make it work for them. fastcore uses this flexibility to add to Python features inspired by other languages we’ve loved, like multiple dispatch from Julia, mixins from Ruby, and currying, binding, and more from Haskell. It also adds some “missing features” and clean up some rough edges in the Python standard library, such as simplifying parallel processing, and bringing ideas from NumPy over to Python’s list type.

Installing

To install fastcore run: conda install fastcore (if you use Anaconda, which we strongly recommend) or pip install fastcore. For an editable install, clone this repo and run: pip install -e ".[dev]".

fastcore is tested to work on Ubuntu, Macos and Windows, for the versions tagged with the -latest suffix in these docs.

A tour

fastcore contains many features. See the docs for all the details, which cover the modules provided:

  • test: Simple testing functions
  • foundation: Mixins, delegation, composition, and more
  • xtras: Utility functions to help with functional-style programming, parallel processing, and more
  • dispatch: Multiple dispatch methods
  • transform: Pipelines of composed partially reversible transformations

Here’s a (somewhat) quick tour of a few higlights, showing examples from each of these modules.

Documentation

All fast.ai projects, including this one, are built with nbdev, which is a full literate programming environment built on Jupyter Notebooks. That means that every piece of documentation, including the page you’re reading now, can be accessed as interactive Jupyter notebooks. In fact, you can even grab a link directly to a notebook running interactively on Google Colab - if you want to follow along with this tour, click the link below, or click the badge at the top of the page:

colab_link('index')

Open index in Colab

The full docs are available at fastcore.fast.ai. The code in the examples and in all fast.ai libraries follow the fast.ai style guide. In order to support interactive programming, all fast.ai libraries are designed to allow for import * to be used safely, particular by ensuring that __all__ is defined in all packages. In order to see where a function is from, just type it:

coll_repr
<function fastcore.foundation.coll_repr(c, max_n=10)>

For more details, including a link to the full documentation and source code, use doc, which pops up a window with this information:

doc(coll_repr)

The documentation also contains links to any related functions or classes, which appear like this: coll_repr (in the notebook itself you will just see a word with back-ticks around it; the links are auto-generated in the documentation site). The documentation will generally show one or more examples of use, along with any background context necessary to understand them. As you’ll see, the examples for each function and method are shown as tests, rather than example outputs, so let’s start by explaining that.

Testing

fastcore’s testing module is designed to work well with nbdev, which is a full literate programming environment built on Jupyter Notebooks. That means that your tests, docs, and code all live together in the same notebook. fastcore and nbdev’s approach to testing starts with the premise that all your tests should pass. If one fails, no more tests in a notebook are run.

Tests look like this:

test_eq(coll_repr(range(1000), 5), '(#1000) [0,1,2,3,4...]')

That’s an example from the docs for coll_repr. As you see, it’s not showing you the output directly. Here’s what that would look like:

coll_repr(range(1000), 5)
'(#1000) [0,1,2,3,4...]'

So, the test is actually showing you what the output looks like, because if the function call didn’t return '(#1000) [0,1,2,3,4...]', then the test would have failed.

So every test shown in the docs is also showing you the behavior of the library — and vice versa!

Test functions always start with test_, and then follow with the operation being tested. So test_eq tests for equality (as you saw in the example above). This includes tests for equality of arrays and tensors, lists and generators, and many more:

test_eq([0,1,2,3], np.arange(4))

When a test fails, it prints out information about what was expected:

test_eq([0,1,2,3], np.arange(3))
----
  AssertionError: ==:
  [0, 1, 2, 3]
  [0 1 2]

If you want to check that objects are the same type, rather than the just contain the same collection, use test_eq_type.

You can test with any comparison function using test, e.g test whether an object is less than:

test(2, 3, operator.lt)

You can even test that exceptions are raised:

def divide_zero(): return 1/0
test_fail(divide_zero)

…and test that things are printed to stdout:

test_stdout(lambda: print('hi'), 'hi')

Foundations

fast.ai is unusual in that we often use mixins in our code. Mixins are widely used in many programming languages, such as Ruby, but not so much in Python. We use mixins to attach new behavior to existing libraries, or to allow modules to add new behavior to our own classes, such as in extension modules. One useful example of a mixin we define is Path.ls, which lists a directory and returns an L (an extended list class which we’ll discuss shortly):

p = Path('images')
p.ls()
(#6) [Path('images/mnist3.png'),Path('images/att_00000.png'),Path('images/att_00005.png'),Path('images/att_00007.png'),Path('images/att_00006.png'),Path('images/puppy.jpg')]

You can easily add you own mixins with the patch decorator, which takes advantage of Python 3 function annotations to say what class to patch:

@patch
def num_items(self:Path): return len(self.ls())

p.num_items()
6

We also use **kwargs frequently. In python **kwargs in a parameter like means “put any additional keyword arguments into a dict called kwargs”. Normally, using kwargs makes an API quite difficult to work with, because it breaks things like tab-completion and popup lists of signatures. utils provides use_kwargs and delegates to avoid this problem. See our detailed article on delegation on this topic.

GetAttr solves a similar problem (and is also discussed in the article linked above): it’s allows you to use Python’s exceptionally useful __getattr__ magic method, but avoids the problem that normally in Python tab-completion and docs break when using this. For instance, you can see here that Python’s dir function, which is used to find the attributes of a python object, finds everything inside the self.default attribute here:

class Author:
    def __init__(self, name): self.name = name

class ProductPage(GetAttr):
    _default = 'author'
    def __init__(self,author,price,cost): self.author,self.price,self.cost = author,price,cost

p = ProductPage(Author("Jeremy"), 1.50, 0.50)
[o for o in dir(p) if not o.startswith('_')]
['author', 'cost', 'name', 'price']

Looking at that ProductPage example, it’s rather verbose and duplicates a lot of attribute names, which can lead to bugs later if you change them only in one place. fastcore provides store_attr to simplify this common pattern. It also provides basic_repr to give simple objects a useful repr:

class ProductPage:
    def __init__(self,author,price,cost): store_attr()
    __repr__ = basic_repr('author,price,cost')

ProductPage("Jeremy", 1.50, 0.50)
ProductPage(author='Jeremy', price=1.5, cost=0.5)

One of the most interesting fastcore functions is the funcs_kwargs decorator. This allows class behavior to be modified without sub-classing. This can allow folks that aren’t familiar with object-oriented programming to customize your class more easily. Here’s an example of a class that uses funcs_kwargs:

@funcs_kwargs
class T:
    _methods=['some_method']
    def __init__(self, **kwargs): assert not kwargs, f'Passed unknown args: {kwargs}'

p = T(some_method = print)
p.some_method("hello")
hello

The assert not kwargs above is used to ensure that the user doesn’t pass an unknown parameter (i.e one that’s not in _methods). fastai uses funcs_kwargs in many places, for instance, you can customize any part of a DataLoader by passing your own methods.

fastcore also provides many utility functions that make a Python programmer’s life easier, in fastcore.utils. We won’t look at many here, since you can easily look at the docs yourself. To get you started, have a look at the docs for chunked (remember, if you’re in a notebook, type doc(chunked)), which is a handy function for creating lazily generated batches from a collection.

Python’s ProcessPoolExecutor is extended to allow max_workers to be set to 0, to easily turn off parallel processing. This makes it easy to debug your code in serial, then run it in parallel. It also allows you to pass arguments to your parallel function, and to ensure there’s a pause between calls, in case the process you are running has race conditions. parallel makes parallel processing even easier to use, and even adds an optional progress bar.

L

Like most languages, Python allows for very concise syntax for some very common types, such as list, which can be constructed with [1,2,3]. Perl’s designer Larry Wall explained the reasoning for this kind of syntax:

In metaphorical honor of Huffman’s compression code that assigns smaller numbers of bits to more common bytes. In terms of syntax, it simply means that commonly used things should be shorter, but you shouldn’t waste short sequences on less common constructs.

On this basis, fastcore has just one type that has a single letter name: L. The reason for this is that it is designed to be a replacement for list, so we want it to be just as easy to use as [1,2,3]. Here’s how to create that as an L:

L(1,2,3)
(#3) [1,2,3]

The first thing to notice is that an L object includes in its representation its number of elements; that’s the (#3) in the output above. If there’s more than 10 elements, it will automatically truncate the list:

p = L.range(20).shuffle()
p
(#20) [0,10,7,16,5,1,14,17,9,8...]

L contains many of the same indexing ideas that NumPy’s array does, including indexing with a list of indexes, or a boolean mask list:

p[2,4,6]
(#3) [7,5,14]

It also contains other methods used in array, such as L.argwhere:

p.argwhere(ge(15))
(#5) [3,7,11,14,18]

As you can see from this example, fastcore also includes a number of features that make a functional style of programming easier, such as a full range of boolean functions (e.g ge, gt, etc) which give the same answer as the functions from Python’s operator module if given two parameters, but return a curried function if given one parameter.

There’s too much functionality to show it all here, so be sure to check the docs. Many little things are added that we thought should have been in list in the first place, such as making this do what you’d expect (which is an error with list, but works fine with L):

1 + L(2,3,4)
(#4) [1,2,3,4]

Function dispatch and Transforms

Most Python programmers use object oriented methods and inheritance to allow different objects to behave in different ways even when called with the same method name. Some languages use a very different approach, such as Julia, which uses multiple dispatch generic functions. Python provides single dispatch generic functions as part of the standard library. fastcore provides multiple dispatch, with the typedispatch decorator (which is actually an instance of DispatchReg):

@typedispatch
def _f(x:numbers.Integral, y): return x+1
@typedispatch
def _f(x:int, y:float): return x+y

_f(3,2.0), _f(3,2)
(5.0, 4)

This approach to dispatch is particularly useful for adding implementations of functionality in extension modules or user code. It is heavily used in the Transform class. A Transform is the main building block of the fastai data pipelines. In the most general terms a transform can be any function you want to apply to your data, however the Transform class provides several mechanisms that make the process of building them easy and flexible (see the docs for information about each of these):

  • Type dispatch
  • Dispatch over tuples
  • Reversability
  • Type propagation
  • Preprocessing
  • Filtering based on the dataset type
  • Ordering
  • Appending new behavior with decorators

Transform looks for three special methods, encodes, decodes, and setups, which provide the implementation for __call__, decode, and setup respectively. For instance:

class A(Transform):
    def encodes(self, x): return x+1

A()(1)
2

For simple transforms like this, you can also use Transform as a decorator:

@Transform
def f(x): return x+1

f(1)
2

Transforms can be composed into a Pipeline:

@Transform
def g(x): return x/2

pipe = Pipeline([f,g])
pipe(3)
2.0

The power of Transform and Pipeline is best understood by seeing how they’re used to create a complete data processing pipeline. This is explained in chapter 11 of the fastai book, which is available for free in Jupyter Notebook format.

Contributing

After you clone this repository, please run nbdev_install_git_hooks in your terminal. This sets up git hooks, which clean up the notebooks to remove the extraneous stuff stored in the notebooks (e.g. which cells you ran) which causes unnecessary merge conflicts.

To run the tests in parallel, launch nbdev_test_nbs or make test.

Before submitting a PR, check that the local library and notebooks match. The script nbdev_diff_nbs can let you know if there is a difference between the local library and the notebooks. * If you made a change to the notebooks in one of the exported cells, you can export it to the library with nbdev_build_lib or make fastcore. * If you made a change to the library, you can export it back to the notebooks with nbdev_update_lib.