
Predictive Coding Paper Repository

This repository provides a list of interesting or influential papers on predictive coding. If you believe I have missed any papers, please contact me at beren@millidge.name or make a pull request with the paper's information; I will be happy to include it.

Predictive Coding

Predictive Coding is a neurophysiologically-grounded theory of perception and learning in the brain. The core idea is that the brain always maintains a prediction of the expected state of the world, and that this prediction is then compared against the true sensory data. Where this prediction is wrong, prediction errors are generated and propagated throughout the brain. The brain's 'task' then is simply to minimize prediction errors.
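The error-minimization loop can be illustrated with a toy scalar example (my own sketch, not drawn from any cited paper): a prediction is compared against sensory data, and nudged by gradient descent on the squared prediction error.

```python
# Toy sketch of prediction-error minimization: a single prediction mu is
# repeatedly compared against a sensory input, and updated in proportion to
# the resulting prediction error until the error vanishes.
def minimise_prediction_error(sensory_input, mu=0.0, lr=0.1, steps=200):
    for _ in range(steps):
        error = sensory_input - mu  # prediction error
        mu += lr * error            # nudge the prediction to reduce the error
    return mu
```

Calling `minimise_prediction_error(3.0)` drives the prediction towards the sensory value 3.0, at which point the prediction error is zero.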

The key distinction of this theory is its proposal that prediction errors, rather than predictions or direct representations of sense-data, are in some sense the core computational primitive of the brain.

Predictive coding originated in studies of ganglion cells in the retina, in light of theories from signal processing about how it is much more efficient to transmit only 'different' or 'unpredicted' signals than to repeat the whole signal every time -- see delta-encoding.

Predictive coding has several potentially neurobiologically plausible process theories proposed for it -- see the 'Neurobiological Process Theories' section below -- although the empirical evidence for precise prediction-error minimization in the brain is mixed.

Predictive coding has also been extended in several ways. It can be understood as a variational inference algorithm under a Gaussian generative model and variational distribution. It can be set up as an autoencoder (predicting its own input, or the next state), or else in a supervised learning fashion.

Predictive coding can also be extended to a hierarchical model of multiple predictive coding layers -- as in the brain -- as well as to 'generalised coordinates', which explicitly model the higher-order derivatives of a state in order to be able to explicitly model dynamical systems.
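A minimal two-level hierarchical sketch, loosely following the running example in Bogacz's tutorial (a hidden cause predicting an observation through g(phi) = phi**2, under a higher-level prior; the variable names and constants here are my own choices):

```python
# Two-level hierarchical sketch: a hidden cause phi predicts the observation
# u through g(phi) = phi**2, and is itself predicted by a higher level with
# prior mean v_p. Inference is gradient descent on the sum of squared
# prediction errors at both levels (unit precisions assumed).
def infer_cause(u, v_p=3.0, lr=0.01, steps=5000):
    phi = v_p  # start the estimate at the prior mean
    for _ in range(steps):
        eps_u = u - phi ** 2  # sensory-level prediction error
        eps_p = phi - v_p     # higher-level prediction error
        phi += lr * (2 * phi * eps_u - eps_p)  # descend the error landscape
    return phi
```

At convergence the two error terms balance: the inferred cause is a compromise between what the prior expects and what the observation demands.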

More recent work has also focused on the relationship between predictive coding and the backpropagation of error algorithm in machine learning: under certain assumptions, predictive coding can approximate this fundamental algorithm in a biologically plausible fashion, although the exact details and conditions are still being worked out.
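One way to see the link (my own illustration, not taken from any specific paper): under the so-called fixed prediction assumption, propagating prediction errors top-down with activities clamped to their feedforward values reproduces backprop's error signals, so the resulting weight updates match the true loss gradients.

```python
import numpy as np

# Tiny two-layer network: x -> tanh(W1 x) -> W2 h. Prediction errors are
# propagated top-down from the output; here they coincide with backprop's
# deltas, which we can check against numerical gradients of the loss.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
target = rng.normal(size=2)

# Feedforward pass.
a1 = W1 @ x
h1 = np.tanh(a1)
a2 = W2 @ h1  # linear output layer
loss = 0.5 * np.sum((a2 - target) ** 2)

# Prediction errors, propagated top-down.
eps2 = a2 - target                          # output-level error
eps1 = (W2.T @ eps2) * (1 - np.tanh(a1) ** 2)  # hidden-level error (tanh')

# Local, Hebbian-looking weight updates: error times presynaptic activity.
grad_W1 = np.outer(eps1, x)
grad_W2 = np.outer(eps2, h1)
```

The updates are local (each uses only the error at one layer and the activity below it), which is what makes this biologically more plausible than a global backward pass.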

There has also been much exciting work trying to merge predictive coding with machine learning to produce highly performant predictive-coding-inspired architectures.

Surveys and Tutorials

This is a great review which introduces the basics of predictive coding and its interpretation as variational inference. It also contains sample MATLAB code that implements a simple predictive coding network. I would start here.

This review walks through the mathematical framework and potential neural implementations in predictive coding, and also covers much recent work on the relationship between predictive coding and machine learning.

This is a fantastic review which presents a complete walkthrough of the mathematical basis of the Free Energy Principle and Variational Inference, and derives predictive coding and (continuous time and state) active inference. It also presents the 'full-construct' predictive coding, including hierarchical layers and generalised coordinates, in an accessible fashion. I would recommend reading this after Bogacz's tutorial (although be prepared -- it is a long and serious read).

A short and concise review of predictive coding algorithms up to 2017.

A nice review of simple predictive coding architectures with a focus on their potential implementation in the brain.

Classics

A key influential early paper proposing predictive coding as a general theory of cortical function.

One of the earliest works proposing predictive coding in the retina.

An early but complete description of predictive coding as an application of the FEP and variational inference under Gaussian and Laplace assumptions. Also surprisingly readable. This is core reading on predictive coding and the FEP.

The first paper establishing the links between predictive coding and variational inference.

Makes a conjectured link between precision in predictive coding and attention in the brain.

Presents the 'full-construct' predictive coding model with both hierarchies and generalised coordinates.

Extends predictive coding to generalised coordinates, and derives the necessary inference algorithms for working with them -- i.e. DEM, dynamic expectation maximisation.

Foundational treatment of variational inference for dynamical systems, as represented in generalised coordinates. Also relates variational filtering to other non-variational schemes like particle filtering and Kalman filtering.

Andy's book is great for a high-level overview, strong intuition pumps for understanding the theory, and a fantastic review of potential evidence and neuropsychiatric applications.

Neurobiological Process Theories

A key process theory paper, proposing perhaps the default implementation of predictive coding in cortical layers.

Demonstrates that predictive coding is equivalent to popular biased competition models of neural function.

Another great overview of a potentially neurobiologically plausible process theory for predictive coding.

A process theory of predictive coding including action predictions which implement active inference (continuous version).

A great review delving deep into the evidence for predictive coding being implemented in the brain. Evidence is currently somewhat lacking, although the flexibility of the predictive coding framework allows it to encompass a lot of the findings here.

Neuroscience applications

Relationship to Backpropagation

PC-inspired machine learning

Extensions and Developments

This paper investigates how several biologically implausible aspects of the standard predictive coding algorithm -- namely the requirements for symmetric forward and backward weights, nonlinear derivatives, and one-to-one error-unit connections -- can be relaxed without unduly harming the performance of the network.

This paper looks further at how various implausibilities of the predictive coding algorithm can be relaxed. It focuses especially on the question of how negative prediction errors could be represented, and introduces a divisive prediction error scheme, in which prediction errors are the activities divided by the predictions.
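The contrast between the two error schemes is easy to state concretely (a toy illustration of my own, not code from the paper):

```python
# Subtractive vs divisive prediction errors. A divisive error is always
# positive for positive activities -- sidestepping the problem of how
# neurons, whose firing rates cannot go negative, would signal negative
# errors -- and a perfect prediction yields 1 rather than 0.
def subtractive_error(activity, prediction):
    return activity - prediction

def divisive_error(activity, prediction):
    return activity / prediction
```

For example, an activity of 3.0 against a prediction of 2.0 gives a subtractive error of 1.0 and a divisive error of 1.5, while a perfect prediction gives 0.0 and 1.0 respectively.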

Contributing

To contribute, please make pull requests adding entries to the bibtex file.

The README file was generated from the bibtex file using the bibtex_to_md.py script. The keywords used for each classification ('Classic', 'Backprop') can be found at the bottom of the .py file.

This code and structure is heavily inspired by https://github.com/optimass/continual_learning_papers.