Foundations for Deep Learning:

  1. I emphasise mathematical/conceptual foundations because implementations of ideas (e.g. Torch, TensorFlow) will keep evolving, but the underlying theory must be sound. Anybody with an interest in deep learning can and should try to understand why things work.
  2. I include neuroscience as a useful conceptual foundation for two reasons. First, it may provide inspiration for future models and algorithms. Second, the success of deep learning can contribute to useful hypotheses and models for computational neuroscience.
  3. Information Theory is also a very useful foundation, as there's a strong connection between data compression and statistical prediction. In fact, data compressors and machine learning models can both be seen as computable approximations to Kolmogorov Complexity: the length of the shortest program that outputs the data, which is the ideal (though uncomputable) limit of compression.
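
The link between compression and Kolmogorov Complexity in point 3 can be made concrete with a small sketch. It is only an illustration under stated assumptions: zlib is my choice of off-the-shelf compressor, the helper name compressed_length is hypothetical, and a compressed length is merely an upper bound on Kolmogorov Complexity (up to an additive constant for the decompressor), not the quantity itself.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Bytes needed by zlib at its highest compression level.

    Hypothetical helper: a computable upper-bound proxy for the
    Kolmogorov Complexity of `data`, not the true value.
    """
    return len(zlib.compress(data, 9))


# Highly regular data: a very short program ("repeat 'ab' 5000 times")
# generates it, so a good compressor should shrink it dramatically.
regular = b"ab" * 5000

# Random bytes: with overwhelming probability there is no description
# much shorter than the data itself, so compression should not help.
random_bytes = os.urandom(10000)

print("regular:", len(regular), "->", compressed_length(regular))
print("random: ", len(random_bytes), "->", compressed_length(random_bytes))
```

Both inputs are 10000 bytes long, yet the regular one should compress to a tiny fraction of its size while the random one should stay at roughly its original length. A model that predicts the next symbol well plays the same role as the compressor here: better predictions mean shorter codes.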

You might notice that I haven't emphasised the latest benchmark-beating paper. My reason is that a good theory ought to be scalable: it should be capable of explaining why deep models generalise, and we should be able to bootstrap these explanations for more complex models (e.g. sequences of deep models, a.k.a. RNNs). This is how all good science is done.

For an excellent historical overview of deep learning, I would recommend reading Deep Learning in Neural Networks as well as R. Salakhutdinov's Deep Learning Tutorials.

Deep Learning:

  1. History:
  2. Optimisation:
  3. Regularisation:
  4. Inference:
  5. Representation Learning:
  6. Deep Generative Models:
  7. Continual Learning:
  8. Hyperparameter Optimisation:

Mathematics:

  1. Optimisation:

  2. Representation Learning:

  3. Learning theory:

  4. Learning behaviour:

Information Theory:

  1. Shannon Information and Kolmogorov Complexity (Grünwald 2010)
  2. Discovering Neural Nets with Low Kolmogorov Complexity (Schmidhuber 1997. Neural Networks.)
  3. Opening the black box of Deep Neural Networks via Information (Shwartz-Ziv 2017)

Neuroscience:

  1. Towards an integration of deep learning and neuroscience (Marblestone 2016. Frontiers in Computational Neuroscience.)
  2. Equilibrium Propagation (Scellier 2016. Frontiers in Computational Neuroscience.)
  3. Towards biologically plausible deep learning (Bengio 2015. CoRR.)
  4. Random synaptic feedback weights support error backpropagation for deep learning (Lillicrap 2016. Nature Communications.)
  5. Towards deep learning with spiking neurons (Mesnard 2016. NIPS.)
  6. Towards deep learning with segregated dendrites (Guerguiev 2017)
  7. Variational learning for recurrent spiking networks (Rezende 2011. NIPS.)
  8. A view of Neural Networks as dynamical systems (Cessac 2009. I. J. Bifurcation and Chaos)
  9. Convolutional network layers map the function of the human visual system (M. Eickenberg. 2016. NeuroImage Elsevier.)
  10. Cortical Algorithms for Perceptual Grouping (P. Roelfsema. 2006. Annual Review of Neuroscience.)

Statistical Physics:

  1. Phase Transitions of Neural Networks (W. Kinzel. 1997. Universität Würzburg.)
  2. Convolutional Neural Networks Arise From Ising Models and Restricted Boltzmann Machines (S. Pai)
  3. Non-equilibrium statistical mechanics: From a paradigmatic model to biological transport (T. Chou et al. 2011.)
  4. Replica Theory and Spin Glasses (F. Morone et al. 2014.)

Note 1: Many people love quoting Richard Feynman and Albert Einstein whenever it suits their purpose. However, Feynman's popular quote, "What I cannot create, I do not understand", has been taken out of context by many AI researchers. There are many things we can build that we can't understand, and many things we can't build that we understand very well; take any non-constructive proof in mathematical physics, for example. It follows that it's important to create, but it's essential to understand. In fact, I think it makes more sense to consider the perspective of Marie Curie: "Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less." I believe that this is the attitude we should have towards artificial intelligence.

Note 2: This is a work in progress. I have more papers to add.