A curated list of resources on implicit neural representations, inspired by awesome-computer-vision.
I am looking for graduate students to join my new lab at MIT CSAIL in July 2022. If you are excited about neural implicit representations, neural rendering, neural scene representations, and their applications in vision, graphics, and robotics, apply here! In the webform, you can choose me as "Potential Adviser", and in your SoP, please describe how our research interests are well-aligned. The deadline is Dec 15th!
This list does not aim to be exhaustive, as implicit neural representations are a rapidly growing research field with hundreds of papers to date. Instead, it lists the papers that I give my students to read, which introduce key concepts & foundations of implicit neural representations across applications. I will therefore generally not merge pull requests. This is not an evaluation of the quality or impact of a paper, but rather the result of my and my students' research interests.
However, if you see potential for another list that is broader or narrower in scope, get in touch, and I'm happy to link to it right here and contribute to it as well as I can!
Disclosure: I am an author on the following papers.
- Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
- MetaSDF: Meta-Learning Signed Distance Functions
- Implicit Neural Representations with Periodic Activation Functions
- Inferring Semantic Information with 3D Neural Scene Representations
- Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering
Implicit Neural Representations (sometimes also referred to as coordinate-based representations) are a novel way to parameterize signals of all kinds. Conventional signal representations are usually discrete - for instance, images are discrete grids of pixels, audio signals are discrete samples of amplitudes, and 3D shapes are usually parameterized as grids of voxels, point clouds, or meshes. In contrast, Implicit Neural Representations parameterize a signal as a continuous function that maps the domain of the signal (i.e., a coordinate, such as a pixel coordinate for an image) to whatever is at that coordinate (for an image, an R,G,B color). Of course, these functions are usually not analytically tractable - it is impossible to "write down" the function that parameterizes a natural image as a mathematical formula. Implicit Neural Representations thus approximate that function via a neural network.
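As a concrete illustration, here is a minimal coordinate-based MLP in PyTorch that maps a 2D pixel coordinate to an RGB color. This is a sketch, not any particular paper's architecture; the layer widths and depth are arbitrary choices:

```python
import torch
import torch.nn as nn

class ImplicitImage(nn.Module):
    """A coordinate-based MLP: maps a pixel coordinate (x, y) to a color (R, G, B)."""
    def __init__(self, hidden_features=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_features), nn.ReLU(),
            nn.Linear(hidden_features, hidden_features), nn.ReLU(),
            nn.Linear(hidden_features, 3),  # R, G, B
        )

    def forward(self, coords):
        # coords: (N, 2) tensor of pixel coordinates, e.g., normalized to [-1, 1]
        return self.net(coords)
```

Fitting such a network to a single image amounts to regressing its output against the image's pixel colors at the corresponding coordinates.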
Implicit Neural Representations have several benefits: First, they are no longer coupled to spatial resolution the way, for instance, an image is coupled to its number of pixels. This is because they are continuous functions! Thus, the memory required to parameterize the signal is independent of spatial resolution, and only scales with the complexity of the underlying signal. Another corollary of this is that implicit representations have "infinite resolution" - they can be sampled at arbitrary spatial resolutions.
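Continuing the sketch above, sampling at an arbitrary resolution is just evaluating the same network on a denser coordinate grid; `ImplicitImage` is the hypothetical model from the previous snippet:

```python
import torch

def sample_image(model, height, width):
    # Build a (height * width, 2) grid of coordinates in [-1, 1] and query the INR.
    ys = torch.linspace(-1.0, 1.0, height)
    xs = torch.linspace(-1.0, 1.0, width)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)  # (H, W, 2)
    with torch.no_grad():
        colors = model(grid.reshape(-1, 2))
    return colors.reshape(height, width, 3)

# The same weights can be sampled at any resolution:
# img_64 = sample_image(model, 64, 64)
# img_1k = sample_image(model, 1024, 1024)
```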
This is immediately useful for a number of applications, such as super-resolution, or in parameterizing signals in 3D and higher dimensions, where memory requirements grow intractably fast with spatial resolution. Further, generalizing across neural implicit representations amounts to learning a prior over a space of functions, implemented via learning a prior over the weights of neural networks - this is commonly referred to as meta-learning and is an extremely exciting intersection of two very active research areas! Another exciting overlap is between neural implicit representations and the study of symmetries in neural network architectures - for instance, creating a neural network architecture that is 3D rotation-equivariant immediately yields a viable path to rotation-equivariant generative models via neural implicit representations.
Another key promise of implicit neural representations lies in algorithms that directly operate in the space of these representations. In other words: What's the "convolutional neural network" equivalent of a neural network operating on images represented by implicit representations?
This is a list of Google Colabs that immediately allow you to jump in and toy around with implicit neural representations!
- Implicit Neural Representations with Periodic Activation Functions shows how to fit images, audio signals, and even solve simple Partial Differential Equations with the SIREN architecture (a minimal sine-layer sketch follows this list).
- Neural Radiance Fields (NeRF) shows how to fit a neural radiance field, allowing novel view synthesis of a single 3D scene.
- MetaSDF & MetaSiren shows how you can leverage gradient-based meta-learning to generalize across neural implicit representations.
- Neural Descriptor Fields shows how you can use globally conditioned neural implicit representations as self-supervised correspondence learners, enabling robot imitation tasks.
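For reference, the heart of SIREN is an MLP of sine layers with a carefully scaled initialization. A minimal sketch of a single layer, following the omega_0 = 30 frequency scaling and the uniform initialization described in the paper:

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer: y = sin(omega_0 * (W x + b))."""
    def __init__(self, in_features, out_features, is_first=False, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            # The first layer spans the input domain; later layers compensate
            # for the omega_0 scaling to keep activations well-distributed.
            bound = (1.0 / in_features if is_first
                     else math.sqrt(6.0 / in_features) / omega_0)
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
```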
The following three papers first (and concurrently) demonstrated that implicit neural representations outperform grid-, point-, and mesh-based representations in parameterizing geometry and seamlessly allow for learning priors over shapes (see the conditioning sketch after this list).
- DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (Park et al. 2019)
- Occupancy Networks: Learning 3D Reconstruction in Function Space (Mescheder et al. 2019)
- IM-Net: Learning Implicit Fields for Generative Shape Modeling (Chen et al. 2018)
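For instance, DeepSDF represents a whole family of shapes with a single network by concatenating a per-shape latent code to the query coordinate. A minimal sketch of that conditioning, with placeholder layer sizes:

```python
import torch
import torch.nn as nn

class ConditionalSDF(nn.Module):
    """DeepSDF-style conditioning via concatenation: one network, many shapes."""
    def __init__(self, latent_dim=256, hidden_features=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_features), nn.ReLU(),
            nn.Linear(hidden_features, hidden_features), nn.ReLU(),
            nn.Linear(hidden_features, 1),  # signed distance to the surface
        )

    def forward(self, xyz, z):
        # xyz: (N, 3) query points; z: (latent_dim,) latent code of one shape
        z = z.unsqueeze(0).expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, z], dim=-1))
```

In DeepSDF's auto-decoder framework, the latent codes z are optimized jointly with the network weights at training time.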
Since then, implicit neural representations have achieved state-of-the-art results in 3D computer vision:
- Sal: Sign agnostic learning of shapes from raw data (Atzmon et al. 2019) shows how we may learn SDFs from raw data (i.e., without ground-truth signed distance values)
- Implicit Geometric Regularization for Learning Shapes (Gropp et al. 2020) likewise learns SDFs from raw data, by regularizing the norm of the SDF's spatial gradient (the Eikonal constraint).
- Local Implicit Grid Representations for 3D Scenes, Convolutional Occupancy Networks, and Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction concurrently proposed hybrid voxel-grid/implicit representations to fit large-scale 3D scenes.
- Implicit Neural Representations with Periodic Activation Functions (Sitzmann et al. 2020) demonstrates how we may parameterize room-scale 3D scenes via a single implicit neural representation by leveraging sinusoidal activation functions.
- Neural Unsigned Distance Fields for Implicit Function Learning (Chibane et al. 2020) proposes to learn unsigned distance fields from raw point clouds, doing away with the requirement of water-tight surfaces.
3D scenes can be represented as 3D-structured neural scene representations, i.e., neural implicit representations that map a 3D coordinate to a representation of whatever is at that 3D coordinate. This then requires the formulation of a neural renderer, in particular, a ray-marcher, which performs rendering by repeatedly sampling the neural implicit representation along a ray.
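A minimal sketch of such a ray-marcher, here in its simplest form as sphere tracing against a signed distance function (SRNs instead learn the step lengths with an LSTM; `sdf` is assumed to be any callable mapping (N, 3) points to (N, 1) distances):

```python
import torch

def sphere_trace(sdf, ray_origins, ray_dirs, num_steps=50, eps=1e-3):
    """March rays to the surface: a point at signed distance d from the
    surface can safely advance by d along its ray without overshooting."""
    t = torch.zeros(ray_origins.shape[0], 1)   # distance traveled per ray
    for _ in range(num_steps):
        points = ray_origins + t * ray_dirs
        d = sdf(points)                        # (N, 1) distances at current points
        t = t + d                              # advance each ray by its estimate
        if (d.abs() < eps).all():              # all rays have converged
            break
    return ray_origins + t * ray_dirs          # estimated surface intersections
```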
- Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations learns implicit representations of 3D geometry and appearance given only 2D images, via a differentiable ray-marcher, and generalizes across 3D scenes for reconstruction from a single image via hyper-networks. This was demonstrated for single-object scenes, but also for simple room-scale scenes (see talk).
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision (Niemeyer et al. 2020) replaces the LSTM-based ray-marcher of SRNs with a fully-connected neural network and analytical gradients, enabling easy extraction of the final 3D geometry.
- Neural Radiance Fields (NeRF) (Mildenhall et al. 2020) proposes positional encodings, volumetric rendering & ray-direction conditioning for high-quality reconstruction of single scenes, and has spawned a large amount of follow-up work on volumetric rendering of 3D implicit representations (a minimal volume-rendering sketch follows this list). For a curated list of NeRF follow-up work specifically, see awesome-NeRF.
- SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images (Lin et al. 2020), demonstrates how we may train Scene Representation Networks from a single observation only.
- Pixel-NERF (Yu et al. 2020) proposes to condition a NeRF on local features lying on camera rays, extracted from context images, as proposed in PiFU (see "from 3D supervision").
- Multiview neural surface reconstruction by disentangling geometry and appearance (Yariv et al. 2020) demonstrates sphere-tracing with positional encodings for reconstruction of complex 3D scenes, and proposes a surface normal and view-direction dependent rendering network for capturing view-dependent effects.
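As referenced above, here is a minimal sketch of the volume-rendering quadrature at the core of NeRF: densities and colors sampled along each ray are alpha-composited into a pixel color (ray sampling and the networks themselves are omitted):

```python
import torch

def volume_render(sigmas, rgbs, deltas):
    # sigmas: (n_rays, n_samples)    volume density at each sample
    # rgbs:   (n_rays, n_samples, 3) color at each sample
    # deltas: (n_rays, n_samples)    distances between adjacent samples
    alphas = 1.0 - torch.exp(-sigmas * deltas)            # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)   # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[..., :1]),   # shift: exclusive product
                       trans[..., :-1]], dim=-1)
    weights = alphas * trans                              # compositing weights
    return (weights.unsqueeze(-1) * rgbs).sum(dim=-2)     # (n_rays, 3) pixel colors
```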
One may also encode geometry and appearance of a 3D scene via its 360-degree, 4D light field. This obviates the need for ray-marching and enables real-time rendering and fast training with minimal memory footprint, but requires additional machinery to ensure multi-view consistency.
- Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering (Sitzmann et al. 2021) proposes to represent 3D scenes via their 360-degree light field parameterized as a neural implicit representation.
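A minimal sketch of the light field idea: one network evaluation per ray instead of many samples along it. Light Field Networks parameterize rays by 6D Plücker coordinates; the plain ReLU MLP below is an illustrative placeholder (the paper itself uses a hypernetwork-conditioned architecture):

```python
import torch
import torch.nn as nn

def plucker_coordinates(origins, directions):
    # 6D Plücker parameterization of a ray: (direction, origin x direction).
    directions = directions / directions.norm(dim=-1, keepdim=True)
    moments = torch.cross(origins, directions, dim=-1)
    return torch.cat([directions, moments], dim=-1)  # (N, 6)

# A single evaluation of this network renders one ray - no ray-marching.
light_field = nn.Sequential(
    nn.Linear(6, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),  # RGB along the ray
)
# rgb = light_field(plucker_coordinates(ray_origins, ray_dirs))
```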
- Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization (Saito et al. 2019) first introduced the concept of conditioning an implicit representation on local features extracted from context images. Follow-up work achieves photo-realistic, real-time re-rendering.
- Texture Fields: Learning Texture Representations in Function Space (Oechsle et al. 2019)
- Occupancy flow: 4d reconstruction by learning particle dynamics (Niemeyer et al. 2019) first proposed to learn a space-time neural implicit representation by representing a 4D warp field with an implicit neural representation.
The following papers concurrently proposed to leverage a similar approach for the reconstruction of dynamic scenes from 2D observations only via Neural Radiance Fields.
- D-NeRF: Neural Radiance Fields for Dynamic Scenes
- Deformable Neural Radiance Fields
- Neural Radiance Flow for 4D View Synthesis and Video Processing
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
- Space-time Neural Irradiance Fields for Free-Viewpoint Video
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video
- Vector Neurons: A General Framework for SO(3)-Equivariant Networks (Deng et al. 2021) makes conditional implicit neural representations equivariant to SO(3), enabling the learning of a rotation-equivariant shape space and subsequent reconstruction of 3D geometry of single objects in unseen poses.
The following four papers concurrently proposed to condition an implicit neural representation on local features stored in a voxel grid (see the sketch after this list):
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion
- Local Implicit Grid Representations for 3D Scenes
- Convolutional Occupancy Networks
- Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction
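A minimal sketch of this kind of local conditioning: trilinearly interpolate a feature from a voxel grid at each query point and decode it, together with the coordinate, with a small MLP. The grid here is a free parameter for simplicity; in the papers above it is typically predicted by a convolutional encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridConditionedOccupancy(nn.Module):
    def __init__(self, feature_dim=32, resolution=64, hidden_features=256):
        super().__init__()
        # Voxel grid of local features (a free parameter in this sketch).
        self.grid = nn.Parameter(torch.randn(1, feature_dim, resolution,
                                             resolution, resolution))
        self.net = nn.Sequential(
            nn.Linear(3 + feature_dim, hidden_features), nn.ReLU(),
            nn.Linear(hidden_features, 1),  # occupancy logit
        )

    def forward(self, xyz):
        # xyz: (N, 3) query points in [-1, 1]; grid_sample expects (x, y, z) order.
        g = xyz.view(1, 1, 1, -1, 3)
        feats = F.grid_sample(self.grid, g, align_corners=True)  # (1, C, 1, 1, N)
        feats = feats.view(self.grid.shape[1], -1).t()           # (N, C)
        return self.net(torch.cat([xyz, feats], dim=-1))
```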
This has since been leveraged for inverse graphics as well:
- Neural Sparse Voxel Fields applies a similar concept to neural radiance fields.
- Pixel-NERF (Yu et al. 2020) proposes to condition a NeRF on local features lying on camera rays, extracted from context images, as proposed in PiFU (see "from 3D supervision").
The following papers condition a deep signed distance function on local patches:
- Local Deep Implicit Functions for 3D Shape
- PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations
- Inferring Semantic Information with 3D Neural Scene Representations leverages features learned by Scene Representation Networks for weakly supervised semantic segmentation of 3D objects.
- Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation leverages features learned by occupancy networks to establish correspondence, used for robotics imitation learning.
- 3D Neural Scene Representations for Visuomotor Control learns latent state space for robotics tasks using neural rendering, and subsequently expresses policies in that latent space.
- Full-Body Visual Self-Modeling of Robot Morphologies uses neural implicit geometry representation for learning a robot self-model, enabling space occupancy queries for given joint angles.
- Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation leverages neural fields & vector neurons as an object-centric representation that enables imitation learning of pick-and-place tasks, generalizing across SE(3) poses.
- Geodesy of irregular small bodies via neural density fields: geodesyNets learns an implicit representation using the gravitational signature of irregular bodies (code available on GitHub).
- Study of the asteroid Bennu using geodesyANNs and Osiris-Rex data shows the possibility to learn an implicit representation of the asteroid Bennu from data of the NASA OSIRIS-REx mission.
- DeepSDF, Occupancy Networks, IM-Net concurrently proposed conditioning via concatenation.
- Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization (Saito et al. 2019) proposed to locally condition implicit representations on ray features extracted from context images.
- Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations (Sitzmann et al. 2019) proposed meta-learning via hypernetworks.
- MetaSDF: Meta-Learning Signed Distance Functions (Sitzmann et al. 2020) proposed gradient-based meta-learning for implicit neural representations (a minimal inner-loop sketch follows this list).
- SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images (Lin et al. 2020) shows how to learn 3D implicit representations from single-image supervision only.
- Learned Initializations for Optimizing Coordinate-Based Neural Representations (Tancik et al. 2020) explored gradient-based meta-learning for NeRF.
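A sketch of the gradient-based meta-learning idea behind MetaSDF and the learned-initializations paper, reduced to its inner loop: specialize shared initial weights to a new signal with a few gradient steps. The full methods also train the initialization in an outer loop (omitted here), e.g., by backpropagating through this specialization:

```python
import copy
import torch

def specialize(meta_model, coords, targets, inner_steps=3, inner_lr=1e-2):
    """Inner loop: adapt meta-learned initial weights to one new signal."""
    model = copy.deepcopy(meta_model)  # start from the shared initialization
    optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        loss = ((model(coords) - targets) ** 2).mean()  # reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model  # a signal-specific implicit representation
```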
- Neural Radiance Fields (NeRF) (Mildenhall et al. 2020) proposed positional encodings.
- Implicit Neural Representations with Periodic Activation Functions (Sitzmann et al. 2020) proposed implicit representations with periodic nonlinearities.
- Fourier features let networks learn high frequency functions in low dimensional domains (Tancik et al. 2020) explores positional encodings in an NTK framework (a minimal encoding sketch follows below).
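For reference, a minimal sketch of the positional encoding as used in NeRF (the Fourier-features paper studies a closely related mapping with random Gaussian frequencies):

```python
import math
import torch

def positional_encoding(x, num_frequencies=10):
    # Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for
    # k = 0 .. num_frequencies - 1, so an MLP can fit high-frequency detail.
    freqs = (2.0 ** torch.arange(num_frequencies)) * math.pi
    angles = x.unsqueeze(-1) * freqs                    # (..., dim, num_frequencies)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)                    # (..., dim * 2 * num_freqs)
```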
- Compositional Pattern-Producing Networks: A novel abstraction of development (Stanley et al. 2007) first proposed to parameterize images implicitly via neural networks.
- Implicit Neural Representations with Periodic Activation Functions (Sitzmann et al. 2020) proposed to generalize across implicit representations of images via hypernetworks.
- X-Fields: Implicit Neural View-, Light- and Time-Image Interpolation (Bemana et al. 2020) parameterizes the Jacobian of pixel position with respect to view, time, illumination, etc. to naturally interpolate images.
- Learning Continuous Image Representation with Local Implicit Image Function (Chen et al. 2020) proposed to condition an implicit image representation on local deep features, enabling continuous image super-resolution.
- Alias-Free Generative Adversarial Networks (StyleGAN3) uses a FiLM-conditioned MLP as an image GAN.
The following papers propose to assemble scenes from per-object 3D implicit neural representations.
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (Niemeyer et al. 2021)
- Object-centric Neural Rendering (Guo et al. 2020)
- Unsupervised Discovery of Object Radiance Fields (Yu et al. 2021)
- Implicit Geometric Regularization for Learning Shapes (Gropp et al. 2020) learns SDFs by enforcing the Eikonal constraint via the loss (sketched after this list).
- Implicit Neural Representations with Periodic Activation Functions (Sitzmann et al. 2020) proposes to leverage the periodic sine as an activation function, enabling the parameterization of functions with non-trivial higher-order derivatives and the solution of complicated PDEs.
- AutoInt: Automatic Integration for Fast Neural Volume Rendering (Lindell et al. 2020)
- MeshfreeFlowNet: Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework (Jiang et al. 2020) performs super-resolution for spatio-temporal flow functions using local implicit representations, with auxiliary PDE losses.
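As referenced above, a minimal sketch of such a derivative-based loss: the Eikonal regularizer of Implicit Geometric Regularization penalizes the spatial gradient of an SDF, obtained via autograd, for deviating from unit norm:

```python
import torch

def eikonal_loss(sdf, points):
    """A signed distance function satisfies ||grad_x f(x)|| = 1 everywhere."""
    points = points.clone().requires_grad_(True)
    distances = sdf(points)
    (grads,) = torch.autograd.grad(
        distances, points,
        grad_outputs=torch.ones_like(distances),
        create_graph=True,  # allows backpropagating through the gradient
    )
    return ((grads.norm(dim=-1) - 1.0) ** 2).mean()
```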
- Generative Radiance Fields for 3D-Aware Image Synthesis (Schwarz et al. 2020)
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis (Chan et al. 2020)
- Unconstrained Scene Generation with Locally Conditioned Radiance Fields (DeVries et al. 2021) leverages a hybrid implicit-explicit representation, generating a 2D feature-grid floorplan with a classic convolutional GAN and then conditioning a 3D neural implicit representation on these features. This enables generation of room-scale 3D scenes.
- Alias-Free Generative Adversarial Networks (StyleGAN3) uses a FiLM-conditioned MLP as an image GAN.
For 2D image synthesis, neural implicit representations enable the generation of high-resolution images, while also allowing the principled treatment of symmetries such as rotation and translation equivariance.
- Adversarial Generation of Continuous Images (Skorokhodov et al. 2020)
- Learning Continuous Image Representation with Local Implicit Image Function (Chen et al. 2020)
- Image Generators with Conditionally-Independent Pixel Synthesis (Anokhin et al. 2020)
- Alias-Free GAN (Karras et al. 2021)
- Spatially-Adaptive Pixelwise Networks for Fast Image Translation (Shaham et al. 2020) leverages a hybrid implicit-explicit representation for fast high-resolution image2image translation.
- NASA: Neural Articulated Shape Approximation (Deng et al. 2020) represents an articulated object as a composition of local, deformable implicit elements.
The following talks give an overview of implicit neural representations and their applications:
- Vincent Sitzmann: Implicit Neural Scene Representations (Scene Representation Networks, MetaSDF, Semantic Segmentation with Implicit Neural Representations, SIREN)
- Andreas Geiger: Neural Implicit Representations for 3D Vision (Occupancy Networks, Texture Fields, Occupancy Flow, Differentiable Volumetric Rendering, GRAF)
- Gerard Pons-Moll: Shape Representations: Parametric Meshes vs Implicit Functions
- Yaron Lipman: Implicit Neural Representations
- awesome-NeRF - a curated list of papers specifically on neural radiance fields (NeRF)
License: MIT