Hello, I am Naga Karthik, a second-year PhD student at Polytechnique Montreal. My research broadly focuses on developing deep learning-based methods for medical image analysis. My current research project aims to design novel methods for automatically segmenting lesions in patients with traumatic spinal cord (SC) injury. During the first year of my PhD, I worked on continual/lifelong learning methods for segmenting multiple sclerosis lesions in the brain.
Naga Karthik
This project explores the use of implicit neural representations (INRs) for reconstructing super-resolution (i.e., high-resolution) spinal cord MR images.
Acquiring MR images requires a delicate balance between scanning time, spatial resolution, and signal-to-noise ratio (SNR) [1]. While images with isotropic resolution are ideal, they cannot always be acquired due to constraints such as the patient's condition, motion artifacts, and limited MRI resources [2]. Moreover, improving spatial resolution comes at the cost of lowering SNR and/or increasing acquisition time. As a result, it is common practice to acquire multiple anisotropic images, each with high in-plane resolution but low through-plane resolution (i.e., thick slices).
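To make the trade-off concrete, a common rule of thumb is that SNR scales linearly with voxel volume and with the square root of the total acquisition time. The short sketch below uses only this proportionality (a simplification that ignores field strength, coils, and sequence details); the function name `relative_snr` is hypothetical.

```python
import math

def relative_snr(voxel_mm, time_factor=1.0, ref_voxel_mm=(1.0, 1.0, 1.0)):
    """Approximate SNR relative to a reference acquisition.

    Assumes SNR is proportional to voxel volume * sqrt(acquisition time),
    with all other imaging parameters held equal.
    """
    volume = math.prod(voxel_mm)
    ref_volume = math.prod(ref_voxel_mm)
    return (volume / ref_volume) * math.sqrt(time_factor)

# Halving the voxel size along every axis shrinks the voxel volume 8-fold,
# so at the same scan time the SNR drops to 1/8 of the reference.
snr_hr = relative_snr((0.5, 0.5, 0.5))
print(f"Relative SNR at 0.5 mm isotropic, same scan time: {snr_hr:.3f}")  # 0.125

# Recovering the reference SNR requires (1 / 0.125) ** 2 = 64x the scan time.
time_needed = (1.0 / snr_hr) ** 2
print(f"Scan time factor to recover the SNR: {time_needed:.0f}")  # 64
```

The quadratic cost in scan time is why acquiring several fast anisotropic images and combining them afterwards is attractive.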
Super-resolution: Given multiple low-resolution images as inputs, can we combine them to generate a high-resolution output? And does this approach offer a better trade-off between resolution and acquisition time than directly acquiring a high-resolution image? While several super-resolution methods exist in the literature (e.g., iterative back-projection and regularized least squares), this project explores INRs for super-resolution.
INRs: Implicit neural representations offer a novel way of parameterizing images. Instead of treating an image as a discrete, grid-based representation of an object or scene, INRs parameterize it as a continuous function that maps a 3D coordinate to the intensity value at that coordinate. Because such continuous functions have no closed-form expression, INRs approximate them with neural networks.
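The idea can be sketched as a small multi-layer perceptron (MLP) that takes a batch of 3D coordinates and returns one intensity per coordinate. This is a minimal PyTorch sketch; the class name `CoordinateMLP`, the ReLU activations, and the layer sizes are illustrative assumptions, not the project's actual architecture.

```python
import torch
from torch import nn

class CoordinateMLP(nn.Module):
    """Hypothetical INR: maps (x, y, z) coordinates to a scalar intensity."""

    def __init__(self, in_dim=3, hidden_dim=256, num_layers=4):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, 1))  # one intensity value per coordinate
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        # coords: (N, 3) tensor of continuous coordinates -> (N,) intensities
        return self.net(coords).squeeze(-1)

model = CoordinateMLP()
coords = torch.rand(1024, 3) * 2 - 1  # 1024 random points in [-1, 1]^3
intensities = model(coords)
print(intensities.shape)  # torch.Size([1024])
```

Since the network accepts any real-valued coordinate, it can be queried between voxel centres, which is what makes resolution-independent reconstruction possible.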
Intuitively, going from discrete to continuous representations is not new. For instance, a 1D discrete-time signal is an approximation of a continuous function sampled at discrete points in time. For images, this means moving from pixels with discrete boundaries to a continuous intensity (or RGB) field in which pixel boundaries are no longer visible and transitions are smooth, as shown in the figure below.
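In practice, a discrete volume is turned into training samples by assigning each voxel a continuous coordinate. The NumPy sketch below maps voxel centres to the cube [-1, 1]^3; this normalisation is a common INR convention but an assumption here, and a real pipeline may instead use the image affine to obtain physical (mm) coordinates so that views acquired on different planes share one coordinate system. The helper name `voxel_grid_coords` is hypothetical.

```python
import numpy as np

def voxel_grid_coords(shape):
    """Return an (N, 3) array of voxel-centre coordinates normalised to [-1, 1].

    Voxel index i along an axis of length n is mapped to
    -1 + (2 * i + 1) / n, i.e. the centre of the i-th of n equal bins.
    """
    axes = [-1 + (2 * np.arange(n) + 1) / n for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

coords = voxel_grid_coords((4, 4, 2))
print(coords.shape)              # (32, 3)
print(np.unique(coords[:, 2]))   # the two slice centres, -0.5 and 0.5
```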
Given the background on super-resolution and INRs, the objectives of my project were as follows:
- Understand the concept of implicit representations of images and their potential applications in the medical domain. ✅
- Train a neural network (specifically, a multi-layer perceptron) to reconstruct a high-resolution MRI of the spinal cord given two different low-resolution views as inputs. ✅
- Perform an ablation study on the neural network parameters to analyse their effect on the reconstruction accuracy. ✅
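The second objective can be sketched as a standard regression loop: the voxels of both low-resolution views are pooled into one set of (coordinate, intensity) pairs, and the MLP is fitted with a voxel-wise mean-squared-error loss. The synthetic data, network size, Adam optimiser, learning rate, and the helper `fake_view` are assumptions for illustration; the Fourier feature mapping studied in the ablation is omitted here for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two low-resolution views: each is a set of
# coordinates (N, 3) and the intensities measured there (N,).
def fake_view(n):
    coords = torch.rand(n, 3) * 2 - 1
    intensities = torch.sin(3 * coords).sum(dim=1)  # synthetic "anatomy"
    return coords, intensities

coords_a, vals_a = fake_view(2000)  # e.g. view with thick axial slices
coords_b, vals_b = fake_view(2000)  # e.g. view with thick sagittal slices

# Both views supervise the same continuous function, so pool their samples.
coords = torch.cat([coords_a, coords_b])
targets = torch.cat([vals_a, vals_b])

model = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # voxel-wise mean-squared error

with torch.no_grad():
    initial_mse = loss_fn(model(coords).squeeze(-1), targets).item()

for step in range(500):
    idx = torch.randint(0, coords.shape[0], (512,))  # random mini-batch of voxels
    pred = model(coords[idx]).squeeze(-1)
    loss = loss_fn(pred, targets[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    final_mse = loss_fn(model(coords).squeeze(-1), targets).item()
print(f"MSE before training: {initial_mse:.4f}, after: {final_mse:.4f}")
```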
The Spine Generic Public Database [3,4] was used in this project. This is a BIDS-standardized, multi-site dataset consisting of 6 contrasts from a single healthy subject, acquired using the spine-generic protocol. The dataset is open-source and can be downloaded via git-annex. The installation procedure, along with the specific subjects used, is described in the README inside the data/ folder.
Only the T2w contrast was used in this project because it is an isotropic image. Super-resolution studies typically rely on isotropic images because they can serve as the ground truth, which makes it possible to quantify reconstruction accuracy.
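Because the isotropic T2w image serves as ground truth, low-resolution inputs can be simulated from it. The NumPy sketch below creates two anisotropic views by averaging groups of slices along two different axes; block-averaging, the downsampling factor of 4, and the helper `downsample_along_axis` are assumptions for illustration, and the random array stands in for the real image, which would be loaded with nibabel.

```python
import numpy as np

def downsample_along_axis(volume, axis, factor):
    """Simulate thick slices by averaging `factor` consecutive slices along `axis`.

    Hypothetical helper: block-averaging is one simple model of a thicker
    slice profile; the project's actual downsampling may differ.
    """
    vol = np.moveaxis(volume, axis, 0)
    n = (vol.shape[0] // factor) * factor  # drop leftover slices, if any
    vol = vol[:n].reshape(n // factor, factor, *vol.shape[1:]).mean(axis=1)
    return np.moveaxis(vol, 0, axis)

rng = np.random.default_rng(0)
gt = rng.random((64, 64, 64))  # stand-in for the isotropic T2w ground truth

view_1 = downsample_along_axis(gt, axis=2, factor=4)  # thick slices along axis 2
view_2 = downsample_along_axis(gt, axis=0, factor=4)  # thick slices along axis 0
print(view_1.shape, view_2.shape)  # (64, 64, 16) (16, 64, 64)
```

Each view keeps full resolution in two directions and loses it in the third, so the two views carry complementary information about the same volume.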
The following tools were used in the project:
- bash for all terminal-related commands and for running python scripts from the command line
- git and GitHub for version control
- git-annex for downloading the spine-generic dataset
- python for all the code. Notable packages include:
  - scikit-learn for computing the reconstruction accuracy metrics
  - nibabel and nilearn for data I/O
  - pytorch for training INRs (i.e., deep learning)
- jupyter (notebook) for plotting and analysing the results
The deliverables for this project are:
- Introduction to INRs and references to existing literature --> can be found here
- Data and preprocessing scripts used --> can be found here and here
- Code and the related documentation for training an INR --> can be found here
- Jupyter notebook containing the ablation study analysis --> can be found here
This section presents the super-resolution results and the ablation study analysis. A brief explanation of the method, including the inputs and the model used, can be found here.
The figure below compares the original (ground-truth, GT) image (left) with the reconstructed image (right), both at 0.8 mm isotropic resolution. In the zoomed patch, the prediction appears smoother than the original image.
One advantage of INRs is that images can be reconstructed at an arbitrary resolution, so the output is not bounded by the resolution of the GT image. The figure below shows the model's prediction of a 0.5 mm isotropic image (0.5 × 0.5 × 0.5 mm^3 voxels). For such outputs, only a visual assessment of reconstruction quality is possible, because a quantitative assessment requires a GT image at the same resolution, which is not available.
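Because the INR is a function of continuous coordinates, changing the output resolution only means changing the grid of query points. The sketch below evaluates the same network on a 0.8 mm grid and on a finer 0.5 mm grid; it assumes the model was trained on physical coordinates in millimetres, uses an untrained stand-in network and an illustrative field of view, and the helpers `make_grid` and `render` are hypothetical.

```python
import numpy as np
import torch
from torch import nn

def make_grid(fov_mm, spacing_mm):
    """Voxel-centre coordinates (in mm) covering `fov_mm` at `spacing_mm`."""
    axes = [np.arange(spacing_mm / 2, f, spacing_mm) for f in fov_mm]
    shape = tuple(len(a) for a in axes)
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    return torch.from_numpy(grid).float(), shape

@torch.no_grad()
def render(model, coords, shape, batch_size=65536):
    """Query the INR in batches (to bound memory) and reshape into a volume."""
    out = [model(chunk).squeeze(-1) for chunk in torch.split(coords, batch_size)]
    return torch.cat(out).reshape(shape)

# Stand-in for a trained INR (hypothetical; untrained weights here). Query
# coordinates must be in the same system used during training (mm, assumed).
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))

fov = (16.0, 16.0, 8.0)                    # hypothetical field of view in mm
coords_08, shape_08 = make_grid(fov, 0.8)  # GT resolution
coords_05, shape_05 = make_grid(fov, 0.5)  # finer than the GT
print(render(model, coords_08, shape_08).shape)  # torch.Size([20, 20, 10])
print(render(model, coords_05, shape_05).shape)  # torch.Size([32, 32, 16])
```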
A crucial step when training a neural network to reconstruct a high-resolution image is projecting the 3D coordinates to a higher-dimensional space using a Fourier feature mapping. As explained in the methods section, this projection helps the network learn (and reconstruct) the high-frequency content of the image. Naturally, the dimensionality of the Fourier feature space plays an important role in reconstruction quality. The purpose of this ablation study is therefore to vary this parameter and observe its effect on reconstruction accuracy, using two quantitative metrics: the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR).
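A common form of this mapping is the Gaussian random Fourier features of Tancik et al. (2020), sketched below in PyTorch. It is an assumption that the project uses this exact variant. Here `num_features` plays the role of the dimensionality varied in the ablation (the mapped vector has 2 × `num_features` entries), and `scale`, the standard deviation of the random frequencies, is an illustrative value.

```python
import math
import torch
from torch import nn

class GaussianFourierFeatures(nn.Module):
    """Random Fourier feature mapping gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)].

    B is a fixed (num_features, in_dim) matrix with entries drawn from
    N(0, scale**2); the output dimensionality is 2 * num_features.
    """

    def __init__(self, in_dim=3, num_features=256, scale=10.0):
        super().__init__()
        # Registered as a buffer, not a parameter: B is sampled once and not trained.
        self.register_buffer("B", torch.randn(num_features, in_dim) * scale)

    def forward(self, coords):
        proj = 2 * math.pi * coords @ self.B.T  # (N, num_features)
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

mapping = GaussianFourierFeatures(in_dim=3, num_features=128)
features = mapping(torch.rand(10, 3))
print(features.shape)  # torch.Size([10, 256])
```

The first linear layer of the coordinate MLP would then take 2 × `num_features` inputs instead of 3.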
SSIM: SSIM measures the similarity between two images. It is designed to evaluate the perceptual quality of an image by comparing its structural information with that of a reference image, taking into account the luminance, contrast, and structural similarities between corresponding patches of the two images.
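The SSIM formula combines these three comparisons into a single ratio built from the means, variances, and covariance of the two images. The NumPy sketch below applies it to the whole image as one window, which is a simplification; established implementations such as scikit-image's `structural_similarity` evaluate it over sliding local windows and average the result. The constants K1 = 0.01 and K2 = 0.03 are the usual defaults, and the function name `global_ssim` is hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed over the whole image as a single window.

    Simplified for illustration: standard SSIM evaluates this formula in
    local windows and averages the resulting SSIM map.
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * data_range) ** 2  # stabilises the luminance term
    c2 = (k2 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
gt = rng.random((32, 32))
noisy = np.clip(gt + rng.normal(0, 0.1, gt.shape), 0, 1)
print(global_ssim(gt, gt))     # ~1.0 for identical images
print(global_ssim(gt, noisy))  # below 1.0
```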
PSNR: PSNR is a commonly used metric for measuring the fidelity of a reconstructed or compressed image (or video) relative to the original reference signal. It quantifies, in decibels, the ratio between the maximum possible power of a signal (usually derived from the maximum possible pixel value) and the power of the distortion introduced by compression or reconstruction. Note that PSNR is based solely on pixel-wise differences and does not always correlate well with human perception, which is why perceptual metrics such as SSIM are often used alongside it for a more comprehensive assessment of visual quality.
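Written out, PSNR = 10 · log10(MAX² / MSE), where MAX is the maximum possible intensity (or the dynamic range of the image) and MSE is the mean-squared error between the reference and the reconstruction. A NumPy sketch of this definition follows; defaulting MAX to the reference image's intensity range is an assumption, since MR intensities have no fixed maximum, and scikit-image's `peak_signal_noise_ratio` provides an established implementation.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX**2 / MSE)."""
    reference = reference.astype(np.float64)
    reconstruction = reconstruction.astype(np.float64)
    if data_range is None:
        # Assumption: use the dynamic range of the reference image as MAX.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(data_range**2 / mse)

ref = np.zeros((8, 8))
rec = ref + 0.1                        # uniform error of 0.1 everywhere
print(psnr(ref, rec, data_range=1.0))  # MSE ~0.01 -> ~20 dB
```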
The plots below show SSIM (left) and PSNR (right) as a function of the dimensionality of the Fourier features (higher is better for both metrics). In both cases, reconstruction accuracy increases with the Fourier feature dimensionality.
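The ablation procedure can be sketched as a loop that retrains the same network while only the number of Fourier features changes, scoring each run on held-out samples. To stay fast and self-contained, this PyTorch sketch uses a synthetic 1D signal instead of the spinal cord MRI; the feature counts, `scale`, network width, training length, and the helper `fit_and_score` are assumptions. It illustrates the procedure and does not reproduce the project's numbers.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

# Toy 1D "image" with low- and high-frequency content (assumption).
x = torch.linspace(0, 1, 512).unsqueeze(-1)
y = torch.sin(2 * math.pi * 3 * x) + 0.5 * torch.sin(2 * math.pi * 40 * x)
train_x, train_y = x[::2], y[::2]  # fit on every other sample
test_x, test_y = x[1::2], y[1::2]  # evaluate on the held-out samples

def fit_and_score(num_features, scale=10.0, steps=1000):
    """Train a small MLP on Fourier features and return test PSNR in dB."""
    B = torch.randn(num_features, 1) * scale  # fixed random frequencies

    def encode(v):
        proj = 2 * math.pi * v @ B.T
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

    model = nn.Sequential(
        nn.Linear(2 * num_features, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    feats = encode(train_x)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(feats), train_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        mse = nn.functional.mse_loss(model(encode(test_x)), test_y).item()
    data_range = (y.max() - y.min()).item()
    return 10 * math.log10(data_range**2 / mse)

results = {m: fit_and_score(m) for m in (2, 8, 32, 128)}
for m, score in results.items():
    print(f"{2 * m:4d} Fourier features -> test PSNR {score:.1f} dB")
```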
Implicit neural representations offer a powerful alternative to discrete voxel-grid representations of 3D images. The parameterization is obtained by learning (approximating) a continuous function that maps an input coordinate to the intensity value at that coordinate. Given two downsampled views of the same contrast on different anatomical planes, the model was trained to reconstruct a high-resolution image using a voxel-wise mean-squared error loss. Finally, an ablation study on the dimensionality of the Fourier features showed that projecting the input coordinates to higher dimensions improves reconstruction accuracy and yields better high-resolution images.
I would like to thank the course instructor Prof. Eva Alonso Ortiz and the TAs Jan Valošek and Andjela Dimitrijevic for their feedback and quick responses to my questions during the course. I also thank the global organizers of BrainHack School for creating and managing the training modules.
- [1] Plenge, E., et al. "Super-resolution methods in MRI: can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time?" Magnetic Resonance in Medicine, 68(6), 1983-1993, 2012.
- [2] McGinnis, J., et al. "Multi-contrast MRI super-resolution via implicit neural representations." arXiv:2303.15065, 2023.
- [3] Cohen-Adad, J., et al. "Generic acquisition protocol for quantitative MRI of the spinal cord." Nature Protocols, 2021. doi: 10.1038/s41596-021-00588-0.
- [4] Spine Generic Public Dataset (Single Subject).