尝试使用CNN实现视线预测 会逐步从简单的神经网络实现入手,搭起自己的CNN模型。 项目进展参考网站: http://paulweihan.github.io 实现LeNet5 model00 done 改lenet5 model01-03 done 改googlenet model04,05 done model00 得到和[ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild 近似的结果 主要参考: 1,深度学习教程网站的python代码 http://deeplearning.net/tutorial/lenet.html#lenet http://blog.csdn.net/u012162613/article/details/43225445 基于python theano实现的经典LeNet结构,第二个链接对第一个链接中给出的代码进行了注释: 代码来自于深度学习教程:Convolutional Neural Networks (LeNet) [第一个链接],这个代码实现的是一个简化了的LeNet5,具体如下: 没有实现location-specific gain and bias parameters 用的是maxpooling,而不是average_pooling 分类器用的是softmax,LeNet5用的是rbf LeNet5第二层并不是全连接的,本程序实现的是全连接 另外,代码里将卷积层和子采用层合在一起,定义为“LeNetConvPoolLayer“(卷积采样层),这好理解,因为它们总是成对出现。但是有个地方需要注意,代码中将卷积后的输出直接作为子采样层的输入,而没有加偏置b再通过sigmoid函数进行映射,即没有了下图中fx后面的bx以及sigmoid映射,也即直接由fx得到Cx。 最后,代码中第一个卷积层用的卷积核有20个,第二个卷积层用50个,而不是上面那张LeNet5图中所示的6个和16个。 2,斯坦福的cs231n课程网站: http://cs231n.stanford.edu http://cs231n.stanford.edu/syllabus.html * 3,Hacker's guide to Neural Networks 理解BP,SVM,NN算法及其实现 http://karpathy.github.io/neuralnets/ * 4,Unsupervised Feature Learning and Deep Learning http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial 注:最新的教程网址是:http://ufldl.stanford.edu/ 吴恩达/Andrew Ng主编 学之前先看:他的机器学习 http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning 5,代码基于githup的Deep Learning的toolbox https://github.com/rasmusbergpalm/DeepLearnToolbox http://blog.csdn.net/zouxy09/article/details/9993743/ * 6,Caffe网站: http://caffe.berkeleyvision.org Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional Architecture for Fast Feature Embedding[J]. Eprint Arxiv, 2014. http://www.csdn.net/article/2015-01-22/2823663 CUDA:https://developer.nvidia.com/cuda-downloads 7, 菜鸟从零开始学习Deep learning http://blog.csdn.net/yihaizhiyan/article/category/2388401 主要论文: [ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild [ CVPR 2014 ] Learning-by-Synthesis for Appearance-based 3D Gaze Estimation GoogLenet Szegedy C, Liu W, Jia Y, et al. Going Deeper with Convolutions[J]. Eprint Arxiv, 2014. DeepID-Net: http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html Ouyang W, Luo P, Zeng X, et al. DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection[C]. Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015. Ouyang W, Luo P, Zeng X, et al. DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection[J]. Eprint Arxiv, 2014. . Sun Y, Wang X, Tang X. Deep Learning Face Representation from Predicting 10,000 Classes[C]// Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014:1891-1898. RCNN: https://github.com/rbgirshick/rcnn bibtex @inproceedings{girshick14CVPR, Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra}, Title = {Rich feature hierarchies for accurate object detection and semantic segmentation}, Booktitle = {Computer Vision and Pattern Recognition}, Year = {2014} } 经典论文: Notes on Convolutional Neural Networks Gradient-Based Learning Applied to Document Recognition My notes: paulweihan.github.io