Use a Deep-Learning Transformer to reconstruct images from spike camera files
main_img.py, main_fs.py: reconstruction from DSFT spike streams to greyscale images
main.py: reconstruction from optical flows to greyscale images
main_img_evaT.py: reconstruction from raw (non-DSFT) spike streams to greyscale images
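A typical way to launch one of the scripts above and capture its output in a log file (the log-file name is an arbitrary example, and any arguments the script may require are omitted here):

    python main_img.py > reconstruction.log 2>&1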
Amid rapid industrial progress in visual perception and processing, extensive research has been devoted to target detection, motion recognition, edge annotation, and semantic segmentation in both images and videos. Traditional methods operate on pixel-level image frames; neuromorphic vision sensors, inspired by biological retinal mechanisms, have changed this picture. Free from fixed frame rates, these sensors are particularly well suited to detecting and tracking high-speed targets in scenarios such as chemical reaction playback, autonomous driving, and wind tunnel experiments. Spike sequences, naturally ordered in time, carry rich optical dynamics that widen the dynamic range of imaging and make the resulting images more vivid. Image reconstruction and recognition algorithms designed specifically for spike signals achieve higher accuracy and reconstruct motion and texture faster.
However, the information in a spike stream does not stem from a single moment: reconstructing an image requires modeling correlations across different parts of the sensory domain and between different moments in time. This complexity poses a significant challenge, especially for large-scale video processing. This thesis tackles it by delving into the structure of spike signals and the design of reconstruction algorithms.
To address the limitations of single data types and features in the algorithm experiments, this study unifies data generated by the spike camera proposed by Professor Huang Tiejun's team at the Institute of Digital Media, Peking University, including the SpikeCV and RSSF datasets from Peking University, which enriches the variety and quantity of data. Differential of Spike Firing Time (DSFT) calculations are performed to optimize how spike streams are loaded, enhancing feature extraction.
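As a rough illustration, below is a minimal NumPy sketch of the DSFT idea, assuming DSFT(x, y, t) is the inter-spike interval bracketing time t at each pixel; the exact definition and vectorised implementation used in this codebase may differ.

    import numpy as np

    def dsft(spikes):
        # spikes: binary array of shape (T, H, W); spikes[t, y, x] == 1 means
        # pixel (x, y) fired at time step t.
        T, H, W = spikes.shape
        out = np.full((T, H, W), T, dtype=np.float32)  # fallback: maximal window
        times = np.arange(T)
        for y in range(H):
            for x in range(W):
                ts = np.flatnonzero(spikes[:, y, x])  # firing times at this pixel
                if len(ts) < 2:
                    continue  # fewer than two spikes: keep the fallback value
                intervals = np.diff(ts).astype(np.float32)
                # Index of the inter-spike interval that brackets each time step;
                # clamp so times before the first spike or after the last spike
                # reuse the nearest interval.
                idx = np.clip(np.searchsorted(ts, times, side="right") - 1,
                              0, len(intervals) - 1)
                out[:, y, x] = intervals[idx]
        return out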
Concerning spatio-temporal correlation, traditional reconstruction algorithms based on fixed time windows or intervals often struggle to select a time window appropriate for every pixel position and moment. In addition, deep learning models based on mutual learning carry an excessive number of parameters and are hard to deploy. To address these issues, this thesis adopts a recurrent-neural-network transformation model for optical flow estimation and introduces the Reconstructed Spike Transformer (RSFM), a deep learning model that analyzes global and local dynamics end to end. The model is optimized to process large volumes of spike-stream image reconstructions with low time complexity, producing clearly reconstructed images. Training is supervised: original .png images serve as ground truth, and .dat spike streams serve as input.
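A minimal sketch of this supervised setup, assuming PyTorch: the .dat unpacking assumes binary spikes packed 8-per-byte (a common spike-camera layout, to be checked against the repo's loader), and `model` stands in for the RSFM network, whose actual constructor and I/O shapes are not shown here.

    import numpy as np
    import torch
    import torch.nn.functional as F

    def load_dat_spikes(path, height=250, width=400):
        # Assumed layout: each frame is H*W binary spikes packed 8-per-byte,
        # frames concatenated along time. Bit order may differ per camera model.
        raw = np.fromfile(path, dtype=np.uint8)
        frames = np.unpackbits(raw).reshape(-1, height, width).astype(np.float32)
        return torch.from_numpy(frames)  # (T, H, W)

    def train_step(model, optimizer, spikes, target):
        # spikes: (B, T, H, W) DSFT-transformed stream; target: (B, 1, H, W)
        # greyscale ground truth decoded from the .png files.
        model.train()
        optimizer.zero_grad()
        pred = model(spikes)            # (B, 1, H, W) reconstruction
        loss = F.l1_loss(pred, target)  # L1 is one common choice of pixel loss
        loss.backward()
        optimizer.step()
        return loss.item()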
To balance memory consumption against feature restoration, this study effectively leverages motion features, achieving a maximum PSNR of 31.83 dB and an SSIM of 0.82. These results surpass classical reconstruction algorithms, more than doubling their performance on both PSNR and SSIM, and reconstructions across different scenarios are observed to be reliable.
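For reference, these metrics can be computed with standard scikit-image implementations; a minimal sketch, where `gt` and `recon` are assumed to be greyscale uint8 arrays of the same shape (ground-truth .png and model output):

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate(gt: np.ndarray, recon: np.ndarray):
        # data_range=255 because the images are 8-bit greyscale.
        psnr = peak_signal_noise_ratio(gt, recon, data_range=255)
        ssim = structural_similarity(gt, recon, data_range=255)
        return psnr, ssim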