Most current deep learning implementations use GPUs, but GPUs have some limitations:
- SIMD (Single Instruction, Multiple Data): a single instruction decoder, so all cores must do the same work
  - Divergence kills performance
- Parallelization is done per convolution
  - Direct convolution is computationally expensive
  - FFT-based convolution can't efficiently utilize all cores
- Memory limitations
  - FFT transforms can't be cached for reuse (see the sketch after this list)
  - Limits the dense output size (few alternatives exist for this feature)
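To make the caching point concrete, here is a minimal NumPy sketch (not ZNN's actual C++ implementation; names and sizes are illustrative) of FFT-based "valid" 3D convolution where the kernel's transform is computed once and reused across calls. Keeping such transforms resident is cheap in CPU RAM but hard within GPU memory budgets:

```python
import numpy as np

def fft_conv3d_cached(volume, kernel_fft, kernel_shape):
    """'Valid' 3D convolution via FFT, reusing a precomputed kernel transform."""
    vol_fft = np.fft.rfftn(volume)                        # transform the input
    full = np.fft.irfftn(vol_fft * kernel_fft, s=volume.shape)
    # Circular convolution equals linear convolution away from the wraparound
    # region; crop to the valid part (size N - k + 1 per axis).
    return full[tuple(slice(k - 1, None) for k in kernel_shape)]

volume = np.random.rand(64, 64, 64)
kernel = np.random.rand(7, 7, 7)

# Zero-pad the kernel to the volume size and transform it ONCE; on a CPU with
# ample RAM this transform can be cached and reused every iteration, which is
# exactly the reuse that GPU memory limits make difficult.
padded = np.zeros_like(volume)
padded[tuple(slice(0, k) for k in kernel.shape)] = kernel
kernel_fft = np.fft.rfftn(padded)

out = fft_conv3d_cached(volume, kernel_fft, kernel.shape)  # shape (58, 58, 58)
```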
ZNN shines when filter sizes are large enough that FFT-based convolution pays off (a rough cost comparison follows this list):
- Wide and deep networks
- Bigger output patches: ZNN is the only (reasonable) open-source solution
- Very deep networks with large filters
- FFTs of the feature maps and gradients can fit in RAM, but could not fit in GPU memory
- Runs out of the box on future many-core machines
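A back-of-the-envelope sketch of why large filters favor FFTs (this is an assumed cost model, not ZNN's published benchmarks; the constant `c` for FFT overhead is a guess): direct 3D convolution costs roughly k^3 multiply-adds per output voxel, while FFT-based convolution costs roughly a fixed multiple of N^3 log N regardless of filter size k.

```python
import math

def direct_cost(n, k):
    # Multiply-adds for 'valid' direct 3D convolution of an n^3 volume
    # with a k^3 filter.
    return (n - k + 1) ** 3 * k ** 3

def fft_cost(n, c=5.0):
    # Assumed ~c * N^3 log2(N) per 3D FFT, times three transforms
    # (forward input, forward kernel, inverse output).
    return 3 * c * n ** 3 * math.log2(n)

for k in (3, 5, 7, 9):
    ratio = direct_cost(64, k) / fft_cost(64)
    print(f"k={k}: direct/FFT cost ratio ~ {ratio:.1f}x")
```

Under these assumptions the ratio grows from well below 1 at k=3 to several-fold at k=9, i.e. the bigger the filters, the bigger ZNN's FFT advantage.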
- Zlateski, A., Lee, K. & Seung, H. S. (2015) ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines. (arXiv link)
- Lee, K., Zlateski, A., Vishwanathan, A. & Seung, H. S. (2015) Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection. (arXiv link)
C++ Core
- Aleksander Zlateski <zlateski@mit.edu>
- Kisuk Lee <kisuklee@mit.edu>
Python Interface
- Jingpeng Wu <jingpeng@princeton.edu>
- Nicholas Turner <nturner@cs.princeton.edu>