CACUE is a lightweight deep learning framework based on standard C++11, aimed at the engineering side of deep learning projects. It ships with several released models: classification models ('lenet', 'vgg16', 'res18', 'res50', 'mobilenet'), face detection ('mtcnn'), and GANs ('DCGAN' on cifar10, 'CycleGAN'), among others. The framework is written by David Lu.
We intend to create a DNN framework that is easy to read and easy to get started with. Using the sample logic code, you can compile your DNN model on different kinds of devices. CACUE does not have many definitions: we have decoupled the operator algorithm logic from the mathematical calculation, so when you create a new compute operator you only need to focus on its compute logic. By setting different definitions, CACUE can compute quickly on different devices. CACUE also supports both dynamic and static computing. CACUE's operators can be used as differentiable operators, and we also supply a range of mathematical operators.
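To illustrate what decoupling operator logic from the math calculation can look like, here is a minimal, self-contained sketch. It is not CACUE's API: the `toy_math` namespace and the `scale_relu_op` struct are hypothetical names invented for this example. The operator is written only in terms of math-layer calls, so the math layer could be swapped for another backend without touching the operator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical math layer (not part of CACUE): the only code that knows how
// the numbers are actually computed. A different build could replace these
// bodies with BLAS or GPU calls.
namespace toy_math {

// y += alpha * x, element by element. x and y must have the same size.
inline void axpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Clamp negative entries to zero in place.
inline void relu(std::vector<float>& x) {
    for (float& v : x) {
        if (v < 0.0f) v = 0.0f;
    }
}

}  // namespace toy_math

// Hypothetical operator: y = relu(bias + scale * x).
// It contains operator logic only and never loops over raw data itself.
struct scale_relu_op {
    float scale;

    std::vector<float> forward(const std::vector<float>& x,
                               const std::vector<float>& bias) const {
        std::vector<float> y = bias;      // start from the bias
        toy_math::axpy(scale, x, y);      // y = bias + scale * x
        toy_math::relu(y);                // y = max(y, 0)
        return y;
    }
};
```

With this split, adding a new operator means writing a new `forward` in terms of existing math calls, while porting to a new device means reimplementing only the math layer.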
Easily included in your system.
#include "cacu.h"
using namespace cacu;
That's all you need to do. If you want to compile with BLAS, open ROOT_PATH/config.h.
#define CBLASTYPE OPENBLAS // for cblas usage.
#define PARALLELTYPE OPENBLAS // for parallel type usage.
You can set __USE_DEVICE__ to ON if you want to compile CACUE for a GPU or another available computing device. CACUE can be built with a few dependencies (opencv, openblas, mkl, cuda, cudnn) or with no dependency at all, depending on your project.
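The general idea of choosing a compute path through a preprocessor definition can be sketched as follows. This is an illustration only: `TOY_BACKEND`, `TOY_BACKEND_NAIVE`, `TOY_BACKEND_UNROLLED`, `toy_backend_name` and `toy_dot` are hypothetical names, and CACUE's real switches are the ones defined in its config.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical backend identifiers (not CACUE's macros).
#define TOY_BACKEND_NAIVE 0
#define TOY_BACKEND_UNROLLED 1

// Default backend when none is given on the compiler command line,
// e.g. -DTOY_BACKEND=TOY_BACKEND_UNROLLED.
#ifndef TOY_BACKEND
#define TOY_BACKEND TOY_BACKEND_NAIVE
#endif

// Reports which implementation was compiled in.
inline std::string toy_backend_name() {
#if TOY_BACKEND == TOY_BACKEND_UNROLLED
    return "unrolled";
#else
    return "naive";
#endif
}

// Dot product with one public signature and a body chosen at compile time.
// a and b must have the same size.
inline float toy_dot(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
#if TOY_BACKEND == TOY_BACKEND_UNROLLED
    std::size_t i = 0;
    for (; i + 1 < a.size(); i += 2) sum += a[i] * b[i] + a[i + 1] * b[i + 1];
    for (; i < a.size(); ++i) sum += a[i] * b[i];  // leftover element, if any
#else
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
#endif
    return sum;
}
```

Callers only ever see `toy_dot`, so switching the definition changes the implementation without changing any calling code.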
Switch between static computing and dynamic computing.
#define __OPERATOR__TYPE__ __DYNAMIC_GRAPH__
cacu_op *conv = new cacu_op(CACU_CONVOLUTION, new data_args(32, 32, 3, 3, 3), train);
conv->get_param(0)->set_init_type(gaussian,0.1);
conv->forward(blobs);
Dynamic computing is an important feature for many algorithms, but not in every scenario, so CACUE provides an easy way to switch between the two modes. This keeps operator usage flexible.
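The difference between the two modes can be shown with a small toy example that does not use CACUE at all. The names `toy_op`, `static_graph` and `dynamic_run` are hypothetical. In the static style the sequence of operators is fixed before any data is seen; in the dynamic style each step is decided while running, so the number of operators applied can depend on the data.

```cpp
#include <functional>
#include <vector>

// A toy operator: maps one float to another.
using toy_op = std::function<float(float)>;

// Static style: the operator sequence is built once, then replayed
// unchanged for every input.
struct static_graph {
    std::vector<toy_op> ops;

    float run(float x) const {
        for (const toy_op& op : ops) x = op(x);
        return x;
    }
};

// Dynamic style: operators are applied as the computation proceeds, so
// control flow can depend on intermediate values.
inline float dynamic_run(float x) {
    toy_op doubler = [](float v) { return 2.0f * v; };
    int steps = 0;
    // Keep doubling until the value reaches 10; the cap of 16 steps
    // guarantees termination for zero or negative inputs.
    while (x < 10.0f && steps < 16) {
        x = doubler(x);
        ++steps;
    }
    return x;
}
```

For example, a `static_graph` holding "add 1" then "multiply by 3" always applies exactly those two steps, while `dynamic_run(3.0f)` doubles twice and `dynamic_run(20.0f)` applies no operator at all.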
CACUE supports unified math logic functions, so you don't need to worry about the heterogeneous environment. Each operator only needs to implement its own logic.
We provide some example models that were trained with CACUE.
create mean file:
#include "example/mnist/mnist_data_proc.h"
//generate mean data
make_mean_mnist("/path/to/mnist/data/", "/path/to/");
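The internals of `make_mean_mnist` are not shown here. As a rough sketch of what a mean file typically holds, the code below assumes it is the per-pixel average over the training images, which is then subtracted from each sample at load time. That is an assumption, and `toy_mean_image` and `toy_subtract_mean` are hypothetical helpers, not CACUE functions.

```cpp
#include <cstddef>
#include <vector>

// Per-pixel mean over a set of images stored as flat float vectors.
// All images are assumed to have the same number of pixels.
inline std::vector<float> toy_mean_image(const std::vector<std::vector<float>>& images) {
    if (images.empty()) return {};
    std::vector<float> mean(images[0].size(), 0.0f);
    for (const std::vector<float>& img : images) {
        for (std::size_t p = 0; p < mean.size(); ++p) mean[p] += img[p];
    }
    for (float& m : mean) m /= static_cast<float>(images.size());
    return mean;
}

// Centers one image in place by subtracting the mean image.
inline void toy_subtract_mean(std::vector<float>& image, const std::vector<float>& mean) {
    for (std::size_t p = 0; p < image.size(); ++p) image[p] -= mean[p];
}
```

Computing the mean once and storing it avoids a full pass over the data set every time training starts.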
train mnist model (cifar10 is almost the same):
//train model
#include <time.h>
#include "../../cacu/solvers/sgd_solver.h"
#include "../../cacu/solvers/adam_solver.h"
#include "../../cacu/cacu.h"
#include "../../cacu/config.h"
#include "../../tools/imageio_utils.h"
#include "../../tools/time_utils.h"
#include "lenet.h"
#include "mnist_data_proc.h"
using namespace cacu;
using namespace cacu_tools;
void train_net()
{
	int batch_size = 100;
	int max_iter = 5000;

#if __USE_DEVICE__ == ON
#if __PARALLELTYPE__ == __CUDA__
	// device-specific call not shown in this listing
#endif
#endif
	//set random seed

	network *net = create_lenet(batch_size,train);
	sgd_solver *sgd = new sgd_solver(net);

	string datapath = "/home/luhaofang/git/caffe/data/mnist/";
	std::ofstream logger(datapath + "loss.txt", ios::binary);
	string meanfile = datapath + "mean.binproto";

	vector<vec_t> full_data;
	vector<vec_i> full_label;
	load_data_bymean_mnist(datapath, meanfile, full_data, full_label);
	//load_data(datapath, full_data, full_label);

	blob *input_data = (blob*)net->input_blobs()->at(0);
	bin_blob *input_label = (bin_blob*)net->input_blobs()->at(1);

	int step_index = 0;
	time_utils *timer = new time_utils();
	unsigned long diff;

	for (int i = 1 ; i < max_iter; ++i)
	{
		// fill one batch, wrapping around at the end of the data set
		for (int j = 0 ; j < batch_size ; ++j)
		{
			if (step_index == kMNISTDataCount)
				step_index = 0;
			input_data->copy2data(full_data[step_index], j);
			step_index += 1;
		}
		//cacu_print(net->get_op<inner_product_op>(net->op_count() - 2, CACU_INNERPRODUCT)->out_data<blob>()->s_data(), 10);

		// report and log the loss every 10 iterations
		if(i % 10 == 0){
			LOG_INFO("iter_%d, lr: %f, %ld ms/iter", i, sgd->lr(), timer->get_time_span() / 1000);
			net->get_op<softmax_with_loss_op>(net->op_count() - 1, CACU_SOFTMAX_LOSS)->echo();
			logger << net->get_op<softmax_with_loss_op>(net->op_count() - 1, CACU_SOFTMAX_LOSS)->loss() << endl;
		}
		if(i % 4000 == 0)
		{
			// body not shown in this listing
		}
	}
	LOG_INFO("optimization is done!");
	net->save_weights(datapath + "lenet.model");

	delete net;
	delete sgd;
	delete timer;
#if __USE_DEVICE__ == ON
#if __PARALLELTYPE__ == __CUDA__
	// device-specific call not shown in this listing
#endif
#endif
}
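The batch-filling logic in the training loop, where `step_index` walks through the data and resets to 0 when it reaches `kMNISTDataCount`, can be isolated into a small standalone sketch. `toy_fill_batch` is a hypothetical helper written for this example, using plain `std::vector` storage instead of CACUE blobs.

```cpp
#include <cstddef>
#include <vector>

// Copies batch_size samples from data into batch, starting at cursor and
// wrapping back to the first sample when the end of the data set is reached.
// Returns the cursor to use for the next batch. data must not be empty.
inline std::size_t toy_fill_batch(const std::vector<std::vector<float>>& data,
                                  std::size_t cursor,
                                  std::size_t batch_size,
                                  std::vector<std::vector<float>>& batch) {
    batch.clear();
    for (std::size_t j = 0; j < batch_size; ++j) {
        if (cursor == data.size()) cursor = 0;  // same test as step_index == kMNISTDataCount
        batch.push_back(data[cursor]);
        ++cursor;
    }
    return cursor;
}
```

Carrying the cursor across iterations means every sample is visited once per pass even when the data set size is not a multiple of the batch size.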
Inference running time cost:
model | ave(ms) | max(ms) | min(ms) | acc |
res18net | 99 | 123 | 95 | 66.71% |
res50net | 192 | 204 | 187 | 72.15% |
vgg16net | 702 | 732 | 679 | 66.41% |
mobilenet | 110 | 127 | 106 | 67.85% |
model | ave(ms) | max(ms) | min(ms) | acc |
res18net | 8 | 8 | 8 | 66.87% |
res50net | 18 | 19 | 18 | 71.80% |
vgg16net | 19 | 20 | 19 | 65.98% |
mobilenet | 32 | 37 | 32 | 67.73% |
All the models are trained without data augmentation.
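How the timings above were collected is not described. One common way to obtain average, maximum and minimum latency is to time repeated runs with `std::chrono::steady_clock`, as in the sketch below. `toy_latency`, `toy_summarize` and `toy_benchmark` are hypothetical names, and the callable passed in stands for a single forward pass.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Summary statistics matching the ave/max/min columns, in milliseconds.
struct toy_latency {
    double ave_ms;
    double max_ms;
    double min_ms;
};

// Reduces a list of per-run timings to average, maximum and minimum.
inline toy_latency toy_summarize(const std::vector<double>& samples_ms) {
    toy_latency s{0.0, 0.0, 0.0};
    if (samples_ms.empty()) return s;
    double total = 0.0;
    for (double v : samples_ms) total += v;
    s.ave_ms = total / static_cast<double>(samples_ms.size());
    s.max_ms = *std::max_element(samples_ms.begin(), samples_ms.end());
    s.min_ms = *std::min_element(samples_ms.begin(), samples_ms.end());
    return s;
}

// Times run_once for the given number of repeats with a monotonic clock.
inline toy_latency toy_benchmark(const std::function<void()>& run_once, int repeats) {
    std::vector<double> samples_ms;
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        run_once();
        auto t1 = std::chrono::steady_clock::now();
        samples_ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return toy_summarize(samples_ms);
}
```

A monotonic clock is used so that system clock adjustments cannot distort the measurements; in practice a few warm-up runs are often discarded before timing starts.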
vgg16net feature map demonstration.
This implementation refers to MTCNN.
DCGAN on cifar10 demonstration.
CycleGAN on imagenet dataset demonstration.
Loss function: sigmoid with cross entropy.
Example translations (images): zebra->horse | horse->zebra |
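For reference, the sigmoid cross entropy loss mentioned above is usually written for a logit x and a target t in [0, 1] as max(x, 0) - x*t + log(1 + exp(-|x|)), a rearrangement that avoids overflow for large |x|. The sketch below implements that standard form and its gradient; it is not taken from CACUE's source, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Numerically stable sigmoid cross entropy for one logit and one target.
// Mathematically equal to -t*log(s) - (1-t)*log(1-s) with s = sigmoid(logit).
inline float toy_sigmoid_cross_entropy(float logit, float target) {
    return std::max(logit, 0.0f) - logit * target
           + std::log1p(std::exp(-std::fabs(logit)));
}

// Gradient of the loss with respect to the logit: sigmoid(logit) - target.
inline float toy_sigmoid_cross_entropy_grad(float logit, float target) {
    float s = 1.0f / (1.0f + std::exp(-logit));
    return s - target;
}
```

Working on the logit directly, rather than on the sigmoid output, is what keeps the loss finite when the discriminator is very confident.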
