This repo is still being tested. However, most of the functionality has been completed in the 'test' branch, so please run

git checkout test

and compile the whole thing.

NVCaffe

16-bit (half) floating-point training and inference support.
Mixed-precision support. Data can be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be defined for every layer (forward and backward passes may also differ), or it can be set for the whole Net.
Integration with cuDNN v6.
Automatic selection of the best cuDNN convolution algorithm.
Integration with v1.3.4 of the NCCL library for improved multi-GPU scaling.
Optimized GPU memory management for data and parameter storage, I/O buffers, and workspace for convolutional layers.
Parallel data parser and transformer for improved I/O performance.
Parallel back propagation and gradient reduction on multi-GPU systems.
Fast solver implementations with fused CUDA kernels for weight and history updates.
Multi-GPU test phase for even memory load across multiple GPUs.
Backward compatibility with BVLC Caffe and NVCaffe 0.15.
Extended set of optimized models (including 16-bit floating-point examples).
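
To illustrate why the mixed-precision item above separates storage precision from compute precision, here is a minimal sketch in plain Python. It does not use NVCaffe or any of its APIs; the helper names (`to_half`, `sum_half_math`, `sum_wide_math`) and the sample data are hypothetical and exist only for this illustration. It stores values as 16-bit floats and then compares accumulating them in 16-bit math against accumulating them in a wider type.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_half_math(values):
    """Store AND accumulate in 16-bit: the running sum is rounded after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sum_wide_math(values):
    """Store inputs in 16-bit, but accumulate in a wider type (Python's 64-bit float)."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    return acc


# Hypothetical data: many small, equal contributions (think of summed gradients).
values = [0.1] * 10000

half_sum = sum_half_math(values)
wide_sum = sum_wide_math(values)

print("16-bit storage, 16-bit math:", half_sum)
print("16-bit storage, wide math:  ", wide_sum)
```

With 16-bit math the running sum stops growing once a single addend becomes smaller than half the spacing between neighbouring half-precision values, so the result falls far short of the wide-math total. This is the general motivation for keeping storage compact while doing the arithmetic in a wider format; how this repo configures precision per layer or per Net is not shown here.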

Luoyadan/ssd-NVcaffe