CACUE is a lightweight deep learning framework based on standard C++11, aimed at the engineering side of deep learning projects. It ships with several kinds of released models, including classification models ('lenet', 'vgg16', 'res18', 'res50', 'mobilenet'), face detection ('mtcnn'), and GANs ('DCGAN' on cifar10, 'CycleGAN', etc.). The framework is written by David Lu.
We intend to create a DNN framework that is easy to read and easy to adopt. With the same simple logic code, you can compile your DNN model for different kinds of devices. CACUE does not introduce many definitions: we have decoupled the operator algorithm logic from the mathematical calculation, so when you want to create a new compute operator, you only need to focus on the operator's compute logic. By setting different definitions, CACUE helps you compute fast on different devices. CACUE also supports both dynamic and static computing, its operators can be used as differentiable operators, and it supplies a range of mathematical operators.
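As a purely illustrative sketch of that decoupling (the class and the example_* math calls below are hypothetical, not CACUE's actual API), a new operator supplies only its compute logic and leaves the arithmetic to unified math primitives:

//hypothetical sketch: the operator holds only its logic, while unified
//math primitives hide the device-specific implementations
class my_scale_op {
public:
    //y = alpha * x, expressed purely in terms of assumed math primitives
    void op(float *x, float *y, int length, float alpha) {
        example_copy(x, length, y);       //assumed unified copy routine
        example_scale(y, length, alpha);  //assumed unified scaling routine
    }
};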
- Easy to include in your system.
#include "cacu.h" using namespace cacu;
That's all you need to do. If you want to compile with BLAS, edit ROOT_PATH/config.h:
#define __CBLASTYPE__ __OPENBLAS__      // for cblas usage
#define __PARALLELTYPE__ __OPENBLAS__   // for parallel type usage
You can switch __USE_DEVICE__ on if you want to use a GPU or another available computing device to compile CACUE. Few dependencies (opencv, openblas, mkl, cuda, cudnn) or no dependency at all: it all depends on your project.
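A hedged sketch of the corresponding switches in ROOT_PATH/config.h (the macro names are taken from the training example below; the exact set of values may differ between releases):

#define __USE_DEVICE__   ON         //compile with device (e.g. GPU) support
#define __PARALLELTYPE__ __CUDA__   //parallel backend, e.g. __CUDA__ or __OPENBLAS__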
- Switch between static and dynamic computing.
#define __OPERATOR__TYPE__ __DYNAMIC_GRAPH__

cacu_op *conv = new cacu_op(CACU_CONVOLUTION, new data_args(32, 32, 3, 3, 3), train);
conv->get_param(0)->set_init_type(gaussian, 0.1);
conv->forward(blobs);
Dynamic computing is an important feature for many algorithms, though not for every scenario, so CACUE provides an easy way to switch between the two modes. This makes operator usage flexible.
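For example (a minimal sketch reusing the cacu_op calls above; the flag and the second operator are made up), a dynamic graph follows ordinary C++ control flow, so what runs can be decided at runtime:

//sketch: under __DYNAMIC_GRAPH__ operators execute as they are called,
//so ordinary branching decides the graph structure (flag is hypothetical)
if (use_conv_branch)
    conv->forward(blobs);
else
    other_op->forward(blobs);  //another cacu_op, built the same way as conv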
- Unified math logic functions: you do not need to worry about the heterogeneous environment; every operator only needs to implement its operator logic.
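To make that concrete (a hedged sketch: example_scale is not a real CACUE function, and the macro values follow config.h as used in the training example below), a unified math routine picks its backend at compile time so operator code never mentions the device:

//hypothetical unified math routine: one entry point, backend selected
//at compile time through config.h
void example_scale(float *x, int length, float alpha) {
#if __USE_DEVICE__ == ON
#if __PARALLELTYPE__ == __CUDA__
    example_scale_gpu(x, length, alpha);  //assumed CUDA kernel wrapper
#endif
#else
    for (int i = 0; i < length; ++i)      //plain CPU fallback
        x[i] *= alpha;
#endif
}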
We provide several example models trained with CACUE.
Create the mean file:
#include "example/mnist/mnist_data_proc.h"
//generate mean data
make_mean_mnist("/path/to/mnist/data/", "/path/to/mean.data");
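Conceptually (a sketch only; the actual file layout written by make_mean_mnist is defined by CACUE), the mean file holds the per-pixel average over the training images:

//per-pixel mean over the training set; vec_t is CACUE's float vector type
//and `images` stands for the raw 28x28 MNIST training images
vec_t mean(28 * 28, 0.0f);
for (const vec_t &img : images)
    for (size_t p = 0; p < mean.size(); ++p)
        mean[p] += img[p] / images.size();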
Train the MNIST model (CIFAR-10 is almost the same):
//train model
#include <time.h>
#include "../../cacu/solvers/sgd_solver.h"
#include "../../cacu/solvers/adam_solver.h"
#include "../../cacu/cacu.h"
#include "../../cacu/config.h"
#include "../../tools/imageio_utils.h"
#include "../../tools/time_utils.h"
#include "lenet.h"
#include "mnist_data_proc.h"
using namespace cacu;
using namespace cacu_tools;
void train_net()
{
    int batch_size = 100;
    int max_iter = 5000;

#if __USE_DEVICE__ == ON
#if __PARALLELTYPE__ == __CUDA__
    cuda_set_device(0);
#endif
#endif

    //set random seed
    set_rand_seed();

    //build lenet in the training phase
    network *net = create_lenet(batch_size, train);

    //configure the SGD solver
    sgd_solver *sgd = new sgd_solver(net);
    sgd->set_lr(0.01f);
    sgd->set_momentum(0.9f);
    sgd->set_weight_decay(0.0005f);

    string datapath = "/home/luhaofang/git/caffe/data/mnist/";
    std::ofstream logger(datapath + "loss.txt", ios::binary);
    logger.precision(std::numeric_limits<cacu::float_t>::digits10);

    string meanfile = datapath + "mean.binproto";
    vector<vec_t> full_data;
    vector<vec_i> full_label;
    //load the mean-subtracted training data
    load_data_bymean_mnist(datapath, meanfile, full_data, full_label);
    //load_data(datapath, full_data, full_label);

    blob *input_data = (blob*)net->input_blobs()->at(0);
    bin_blob *input_label = (bin_blob*)net->input_blobs()->at(1);

    int step_index = 0;
    time_utils *timer = new time_utils();

    for (int i = 1; i < max_iter; ++i)
    {
        timer->start();
        //fill the input blobs with the next mini-batch
        for (int j = 0; j < batch_size; ++j)
        {
            if (step_index == kMNISTDataCount)
                step_index = 0;
            input_data->copy2data(full_data[step_index], j);
            input_label->copy2data(full_label[step_index], j);
            step_index += 1;
        }
        sgd->train_iter(i);
        //cacu_print(net->get_op<inner_product_op>(net->op_count() - 2, CACU_INNERPRODUCT)->out_data<blob>()->s_data(), 10);
        timer->end();

        //log the loss every 10 iterations
        if (i % 10 == 0) {
            LOG_INFO("iter_%d, lr: %f, %ld ms/iter", i, sgd->lr(), timer->get_time_span() / 1000);
            net->get_op<softmax_with_loss_op>(net->op_count() - 1, CACU_SOFTMAX_LOSS)->echo();
            logger << net->get_op<softmax_with_loss_op>(net->op_count() - 1, CACU_SOFTMAX_LOSS)->loss() << endl;
            logger.flush();
        }
        //decay the learning rate by 0.1 at iteration 4000
        if (i % 4000 == 0)
            sgd->set_lr_iter(0.1f);
    }

    LOG_INFO("optimization is done!");
    net->save_weights(datapath + "lenet.model");

    vector<vec_t>().swap(full_data);
    vector<vec_i>().swap(full_label);
    logger.close();

    delete net;
    delete sgd;
    delete timer;

#if __USE_DEVICE__ == ON
#if __PARALLELTYPE__ == __CUDA__
    cuda_release();
#endif
#endif
}
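After training, inference could look like the hedged sketch below (create_lenet, the input blob access, and save_weights all appear above; the test phase flag and the load_weights/predict calls are assumptions mirroring the training API):

//hedged inference sketch
network *net = create_lenet(1, test);         //batch size 1, test phase (assumed flag)
net->load_weights(datapath + "lenet.model");  //assumed counterpart of save_weights
blob *input_data = (blob*)net->input_blobs()->at(0);
input_data->copy2data(sample, 0);             //`sample`: one preprocessed vec_t image
net->predict();                               //assumed forward-only pass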
Inference running time cost:

- CPU

model | ave (ms) | max (ms) | min (ms) | acc |
---|---|---|---|---|
res18net | 99 | 123 | 95 | 66.71% |
res50net | 192 | 204 | 187 | 72.15% |
vgg16net | 702 | 732 | 679 | 66.41% |
mobilenet | 110 | 127 | 106 | 67.85% |
- GPU

model | ave (ms) | max (ms) | min (ms) | acc |
---|---|---|---|---|
res18net | 8 | 8 | 8 | 66.87% |
res50net | 18 | 19 | 18 | 71.80% |
vgg16net | 19 | 20 | 19 | 65.98% |
mobilenet | 32 | 37 | 32 | 67.73% |
All the models were trained without data augmentation.
vgg16net feature map demonstration.
The face detection implementation follows MTCNN [6].
DCGAN on CIFAR-10 demonstration.
CycleGAN on the ImageNet dataset demonstration.
Loss function: sigmoid with cross entropy.
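For reference, the sigmoid cross-entropy loss for a logit $x$ and a label $y \in \{0, 1\}$ is

$$\ell(x, y) = -\,y \log \sigma(x) - (1 - y) \log\left(1 - \sigma(x)\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}.$$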
Translation examples: zebra->horse and horse->zebra.
[1] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012: 1097-1105.
[2] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. arXiv preprint arXiv:1603.05279, 2016.
[3] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[4] Courbariaux M, Bengio Y. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.
[5] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[6] Zhang K, Zhang Z, Li Z, Qiao Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[7] Howard AG, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[8] He K, et al. Deep residual learning for image recognition. CVPR, 2016.
[9] Zhu JY, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV, 2017.
[10] Krizhevsky A, Nair V, Hinton G. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/kriz/cifar.html, 2014.
[11] LeCun Y, Cortes C, Burges CJ. MNIST handwritten digit database. AT&T Labs. Online: http://yann.lecun.com/exdb/mnist, 2010.
[12] Deng J, et al. ImageNet: A large-scale hierarchical image database. CVPR, 2009.