
Visual Decoding from EEG explores reconstructing images from EEG signals using models such as BLIP-2, MiDaS, a VAE, CLIP, and Stable Diffusion. This multimodal pipeline aligns EEG, textual, and depth features to decode visual stimuli, advancing brain-computer interfaces and assistive technology.


🧠 Visual Decoding from EEG

Author

Faculty Guide:
Prof. Arnav Bhaskar


📝 Overview

This project explores decoding EEG (electroencephalogram) signals to reconstruct the visual stimuli experienced by the brain, using text and image generation models. It leverages deep learning and multimodal alignment to generate semantically faithful image reconstructions from brain signals.

This work opens new pathways in brain-computer interfaces, neuroscience, and thought-driven AI systems.


🎯 Objectives

  • EEG-based Textual Encoding: Extract meaningful embeddings from EEG data.
  • Image Reconstruction: Use generated captions (via BLIP-2) and image synthesis (via Stable Diffusion) to reconstruct what the subject saw.
  • Direct Thought-to-Image: Create an end-to-end pipeline from EEG → Text → Image.

🧠 Dataset

  • EEG Signals: 16,740 EEG samples (17 channels, 100 timepoints each).
  • Images: Each of the 16,740 stimulus images was shown to 10 subjects.
  • Labels: Class labels used for supervised and alignment training (see the loading sketch below).
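
A minimal loading sketch in Python, assuming the trials are stored as NumPy arrays; the file names below are hypothetical and the repository's actual storage format may differ:

```python
import numpy as np

# Hypothetical file names; adjust to the repository's actual data layout.
eeg = np.load("eeg_signals.npy")    # expected shape: (16740, 17, 100)
labels = np.load("labels.npy")      # expected shape: (16740,)

assert eeg.shape == (16740, 17, 100), eeg.shape
print(f"{eeg.shape[0]} trials, {eeg.shape[1]} channels, {eeg.shape[2]} timepoints")
```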

🔧 Methodology

🔹 Step 1: EEG Embedding (VAE)

  • A VAE trained on the DEAP dataset extracts EEG embeddings.
  • This yields a compact, meaningful signal representation (a minimal sketch follows below).
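
A minimal PyTorch sketch of such a VAE; the latent size, layer widths, and MLP architecture are assumptions for illustration, not the repository's trained model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGVAE(nn.Module):
    """Sketch: compresses a (channels, timepoints) EEG trial into a latent embedding."""
    def __init__(self, channels=17, timepoints=100, latent_dim=128):
        super().__init__()
        flat = channels * timepoints
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(flat, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, flat), nn.Unflatten(1, (channels, timepoints)),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```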

🔹 Step 2: Caption Generation (BLIP-2)

  • BLIP-2 generates captions from the original image.

🧾 "A small armadillo walking on the dirt"

🔹 Step 3: Cross-Modal Alignment (CLIP / Masked CLIP)

  • Align EEG and text embeddings via CLIP.
  • The encoders are trained to bring both modalities into a common latent space (a contrastive-loss sketch follows below).
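
A minimal PyTorch sketch of a symmetric CLIP-style contrastive loss over paired EEG and caption embeddings; the temperature value and embedding handling are assumptions, not the repository's exact training code. The strong diagonal of the resulting similarity matrix is what the Quantitative Analysis section refers to.

```python
import torch
import torch.nn.functional as F

def clip_alignment_loss(eeg_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss for paired EEG and text embeddings.

    eeg_emb, text_emb: (batch, dim) tensors where row i of each corresponds
    to the same stimulus. The temperature value is an assumption.
    """
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = eeg_emb @ text_emb.t() / temperature          # cosine-similarity matrix
    targets = torch.arange(len(logits), device=logits.device)
    loss_e2t = F.cross_entropy(logits, targets)            # EEG -> text direction
    loss_t2e = F.cross_entropy(logits.t(), targets)        # text -> EEG direction
    return (loss_e2t + loss_t2e) / 2
```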

🔹 Step 4: Text Generation (GPT-2)

  • GPT-2 decodes EEG → Text via autoregressive generation.

🧠 ➑️ GPT-2 ➑️ "A baby armadillo in its enclosure at the zoo"

🔹 Step 5: Depth Estimation (GCNN/GAT)

  • A graph neural network (GCNN/GAT) captures spatial relations between EEG channels to estimate image depth features (a sketch follows below).
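
A sketch of a graph attention network over the EEG channels using PyTorch Geometric; the fully connected channel graph, layer sizes, and the 32×32 depth-map resolution are assumptions, not the repository's configuration.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class EEGDepthGAT(nn.Module):
    """Sketch: graph attention over the 17 EEG channels to predict a coarse depth map.

    Each channel is a node; its 100 timepoints are the node features. A fully
    connected channel graph is assumed (an electrode-adjacency graph is equally
    plausible), and the pooled output is projected to a small depth map.
    """
    def __init__(self, timepoints=100, hidden=64, depth_hw=(32, 32)):
        super().__init__()
        self.gat1 = GATConv(timepoints, hidden, heads=4, concat=True)
        self.gat2 = GATConv(hidden * 4, hidden, heads=1, concat=False)
        self.to_depth = nn.Linear(hidden, depth_hw[0] * depth_hw[1])
        self.depth_hw = depth_hw

    def forward(self, x, edge_index):
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        g = h.mean(dim=0)                                  # mean-pool over channel nodes
        return self.to_depth(g).view(*self.depth_hw)

# Fully connected edge index over the 17 channel nodes (self-loops excluded).
idx = torch.arange(17)
edge_index = torch.stack(torch.meshgrid(idx, idx, indexing="ij")).reshape(2, -1)
edge_index = edge_index[:, edge_index[0] != edge_index[1]]
```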

🔹 Step 6: Image Reconstruction (Stable Diffusion)

  • Prompt + depth map → Stable Diffusion (v2.1 base) synthesizes the visual output (see the sketch below).
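
A sketch of the synthesis step. The README names Stable Diffusion v2.1 base; purely as an assumption, the snippet below substitutes a ControlNet depth adapter on SD 1.5 (`lllyasviel/sd-controlnet-depth` on `runwayml/stable-diffusion-v1-5`), since that is a well-documented way to condition generation on a prompt plus a depth map.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Checkpoints below are assumptions, not the repository's exact setup.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

prompt = "a baby armadillo in its enclosure at the zoo"   # EEG-decoded caption
depth_map = Image.open("depth.png").convert("RGB")        # depth features from Step 5
image = pipe(prompt, image=depth_map, num_inference_steps=30).images[0]
image.save("reconstruction.png")
```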

🧩 Model Architecture

(Figure: model architecture diagram.)


📊 Results

✅ Caption Alignment Results

| EEG Caption (GPT-2) | BLIP Caption | ROUGE Score |
|---------------------|--------------|-------------|
| "a man holding an accordion..." | "a person playing an accordion..." | 0.44 |
| "a floral air mattress..." | "an air mattress with a floral pattern..." | 0.52 |

✅ Image Reconstruction Results

| EEG Signal | Original Image | Caption | Generated Text | Reconstructed Image | SSIM |
|------------|----------------|---------|----------------|---------------------|------|
| (figure) | (figure) | "a small armadillo walking in the dirt" | "a baby armadillo enclosure at the zoo" | (figure) | 11.02% |
| (figure) | (figure) | "a group of people riding on a boat" | "a group of people in an airboat" | (figure) | 14.32% |

🔬 Quantitative Analysis

  • CLIP Loss: Dropped from 3.48 to 0.12 over 30 epochs of training.
  • Cosine Similarity Matrix: Strong diagonal entries, indicating high EEG-text alignment.
  • ROUGE Scores: ROUGE-1 between 0.44 and 0.52.
  • SSIM: Pixel-level image similarity remains low (~10–15%), but the reconstructions are semantically accurate (a metric sketch follows below).
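
A sketch of how these two metrics can be computed, using `rouge_score` and `scikit-image` as assumed implementations; file names and the example captions are illustrative:

```python
import numpy as np
from PIL import Image
from rouge_score import rouge_scorer
from skimage.metrics import structural_similarity as ssim

# ROUGE-1 between the BLIP caption (reference) and the EEG-decoded caption.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
rouge1 = scorer.score(
    "a person playing an accordion", "a man holding an accordion"
)["rouge1"].fmeasure

# SSIM between the original stimulus and the reconstruction, as grayscale arrays.
original = np.asarray(Image.open("original.png").convert("L"))
recon = np.asarray(
    Image.open("reconstruction.png").convert("L").resize(original.shape[::-1])
)
similarity = ssim(original, recon, data_range=255)
print(f"ROUGE-1: {rouge1:.2f}, SSIM: {similarity:.2%}")
```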

🙏 Acknowledgements

Special thanks to our guide, Prof. Arnav Bhaskar, for the constant support and insights.