Most current deep learning implementations use GPUs, but GPUs have some limitations:
- SIMD (Single Instruction, Multiple Data): a single instruction decoder, so all cores must do the same work
  - Divergence kills performance
- Parallelization is done per convolution
  - Direct convolution is computationally expensive
  - FFT-based convolution can't efficiently utilize all cores
- Memory limitations
  - FFT transforms can't be cached for reuse (see the sketch after this list)
  - Limits the dense output size (few alternatives exist for this feature)
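To make the caching point concrete, here is a minimal NumPy sketch (not ZNN's actual C++ implementation; names and sizes are illustrative) of FFT-based "valid" 3D convolution where the kernel's transform is computed once and reused across calls. Keeping such transforms resident is cheap in CPU RAM but hard within GPU memory budgets:

```python
import numpy as np

def fft_conv3d_cached(volume, kernel_fft, kernel_shape):
    """'Valid' 3D convolution via FFT, reusing a precomputed kernel transform."""
    vol_fft = np.fft.rfftn(volume)                        # transform the input
    full = np.fft.irfftn(vol_fft * kernel_fft, s=volume.shape)
    # Circular convolution equals linear convolution away from the wraparound
    # region; crop to the valid part (size N - k + 1 per axis).
    return full[tuple(slice(k - 1, None) for k in kernel_shape)]

volume = np.random.rand(64, 64, 64)
kernel = np.random.rand(7, 7, 7)

# Zero-pad the kernel to the volume size and transform it ONCE; on a CPU with
# ample RAM this transform can be cached and reused every iteration, which is
# exactly the reuse that GPU memory limits make difficult.
padded = np.zeros_like(volume)
padded[tuple(slice(0, k) for k in kernel.shape)] = kernel
kernel_fft = np.fft.rfftn(padded)

out = fft_conv3d_cached(volume, kernel_fft, kernel.shape)  # shape (58, 58, 58)
```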
ZNN shines when filter sizes are large enough that FFT-based convolution pays off (a rough cost comparison follows this list):
- Wide and deep networks
- Bigger output patches: ZNN is the only (reasonable) open-source solution
- Very deep networks with large filters
- FFTs of the feature maps and gradients can fit in RAM, but could not fit in GPU memory
- Runs out of the box on future many-core machines
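A back-of-the-envelope sketch of why large filters favor FFTs (this is an assumed cost model, not ZNN's published benchmarks; the constant `c` for FFT overhead is a guess): direct 3D convolution costs roughly k^3 multiply-adds per output voxel, while FFT-based convolution costs roughly a fixed multiple of N^3 log N regardless of filter size k.

```python
import math

def direct_cost(n, k):
    # Multiply-adds for 'valid' direct 3D convolution of an n^3 volume
    # with a k^3 filter.
    return (n - k + 1) ** 3 * k ** 3

def fft_cost(n, c=5.0):
    # Assumed ~c * N^3 log2(N) per 3D FFT, times three transforms
    # (forward input, forward kernel, inverse output).
    return 3 * c * n ** 3 * math.log2(n)

for k in (3, 5, 7, 9):
    ratio = direct_cost(64, k) / fft_cost(64)
    print(f"k={k}: direct/FFT cost ratio ~ {ratio:.1f}x")
```

Under these assumptions the ratio grows from well below 1 at k=3 to several-fold at k=9, i.e. the bigger the filters, the bigger ZNN's FFT advantage.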
- Zlateski, A., Lee, K. & Seung, H. S. (2015) ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines. (arXiv link)
- Lee, K., Zlateski, A., Vishwanathan, A. & Seung, H. S. (2015) Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection. (arXiv link)
C++ Core
- Aleksander Zlateski <zlateski@mit.edu>
- Kisuk Lee <kisuklee@mit.edu>
Python Interface
- Jingpeng Wu <jingpeng@princeton.edu>
- Nicholas Turner <nturner@cs.princeton.edu>