CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

Introduction

Understanding how humans process visual objects and extract low-dimensional concept representations from high-dimensional visual stimuli is a central problem in cognitive science. The Concept-based Controllable Generation (CoCoG) framework addresses it through two components:

  • Extracting Interpretable Concepts: CoCoG includes an AI agent that efficiently predicts human decision-making in visual similarity tasks, offering insights into human concept representations.

  • Controllable Visual Stimuli Generation: It employs a conditional generation model to produce visual stimuli based on extracted concepts, facilitating studies on causality in human cognition.

CoCoG's notable achievements are:

  • High Prediction Accuracy: It outperforms current models in predicting human behavior in similarity judgment tasks.
  • Diverse Object Generation: The framework can generate a wide range of objects controlled by concept manipulation.
  • Behavioral Manipulation Capability: CoCoG demonstrates the ability to influence human decision-making through concept intervention.

Method

Concept Encoder for Embedding Low-Dimensional Concepts:

  • Process: Begins with training a concept encoder to map visual objects to concept embeddings. This involves using the CLIP image encoder to extract CLIP embeddings, then transforming these into concept embeddings via a learnable concept projector.
  • Training Task: Utilizes the 'odd-one-out' similarity judgment task from the THINGS dataset, predicting human decisions based on the similarity between concept embeddings of visual objects.
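
The odd-one-out objective can be illustrated with a minimal sketch. It assumes that similarity is the dot product between concept embeddings and that the choice probabilities come from a softmax over the three pairwise similarities; the README does not confirm either detail, and the function names and toy embeddings below are hypothetical.

```python
import math

def dot(u, v):
    """Dot-product similarity between two concept embeddings (an assumption)."""
    return sum(a * b for a, b in zip(u, v))

def odd_one_out_probs(c_i, c_j, c_k):
    """Return [P(i is odd), P(j is odd), P(k is odd)] for one triplet."""
    s_ij, s_ik, s_jk = dot(c_i, c_j), dot(c_i, c_k), dot(c_j, c_k)
    # The item left out of the most similar pair is the odd one:
    # if (j, k) is the most similar pair, then i is the odd one, and so on.
    logits = [s_jk, s_ik, s_ij]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triplet_loss(probs, human_choice):
    """Cross-entropy against the index of the item a participant picked."""
    return -math.log(probs[human_choice])

# Toy 3-dimensional concept embeddings (made-up values, not from THINGS).
dog = [0.9, 0.1, 0.0]
cat = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

probs = odd_one_out_probs(dog, cat, car)
predicted = max(range(3), key=lambda idx: probs[idx])  # index of the predicted odd item
loss = triplet_loss(probs, human_choice=2)
```

In this toy triplet the dog and cat embeddings share their dominant dimension, so the third item (the car) receives the highest odd-one-out probability. In the actual model the embeddings would come from the CLIP image encoder followed by the learnable concept projector.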

Two-Stage Concept Decoder for Controllable Visual Stimuli Generation:

  • Stage I - Prior Diffusion: Adopts a diffusion model conditioned on concept embeddings to approximate the distribution of CLIP embeddings, serving as the prior for the subsequent generation stage.
  • Stage II - CLIP Guided Generation: Leverages pre-trained models to generate visual objects from CLIP embeddings, utilizing the concept embeddings as a guiding condition for the generation process.

Figure 1: The framework of CoCoG.

Results

Model Validation

Predicting and Explaining Human Behaviors:

  • Accuracy: Achieved 64.07% accuracy in predicting human behavior on the THINGS Odd-one-out dataset, surpassing the previous SOTA model.
  • Interpretability: Demonstrated through the activation of concepts within visual objects, aligning with human intuition and providing clear explanations of visual object characteristics.

Figure 2: The performance of the concept encoder in predicting and explaining human behavior.

Generative Effectiveness of Concept Decoder:

  • Consistency with Concept Embedding: Generated visual objects are consistent with their concept embeddings, showcasing the model's ability to conditionally generate visuals aligned with specific concepts.
  • Control Over Diversity: By adjusting the guidance scale, the model can balance the similarity and diversity of generated visual objects.
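
The README does not state which guidance mechanism is used. As a sketch only, assuming the classifier-free guidance rule that is common in diffusion models, the guidance scale blends an unconditional prediction with a concept-conditioned one. The function name and the toy vectors are hypothetical.

```python
def guided_prediction(uncond, cond, guidance_scale):
    """Classifier-free guidance (assumed here, not confirmed by the source).

    uncond: model prediction without the concept condition
    cond:   model prediction with the concept condition
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# Toy 2-dimensional predictions (made-up values).
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

no_guidance = guided_prediction(uncond, cond, 0.0)    # ignores the concept condition
plain_cond = guided_prediction(uncond, cond, 1.0)     # equals the conditional prediction
strong_guidance = guided_prediction(uncond, cond, 2.0)  # extrapolates past it
```

Under this rule, a larger scale pushes samples further toward the concept condition, which typically increases consistency with the concept embedding at the cost of diversity; a smaller scale does the opposite.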

Figure 3: The performance of generated visual objects by controlling concept embeddings.

CoCoG for Counterfactual Explanations

Flexible Control with Text Prompts:

  • Using the same concept embedding with different text prompts, CoCoG generates visual objects that retain the concept's characteristics, demonstrating its utility in exploring counterfactual questions.

Figure 4: Visual objects generated with the same concept embedding combined with different text prompts.

Figure 5: Visual objects generated with different concept embeddings combined with the same text prompt.

Manipulating Similarity Judgments:

  • CoCoG can directly influence human similarity judgment by intervening with key concepts, offering a powerful tool for analyzing the causal mechanisms of concepts in human cognition and decision-making processes.
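
The idea of a concept intervention can be sketched on the embedding level. The example assumes dot-product similarity and uses hypothetical helper names and made-up 2-dimensional embeddings; in CoCoG the edited embedding would additionally be passed to the concept decoder to generate a new stimulus, which this sketch does not do.

```python
def dot(u, v):
    """Dot-product similarity between two concept embeddings (an assumption)."""
    return sum(a * b for a, b in zip(u, v))

def intervene(embedding, dim, value):
    """Return a copy of the embedding with one concept dimension overwritten."""
    edited = list(embedding)
    edited[dim] = value
    return edited

def more_similar(reference, options):
    """Name of the option whose embedding is most similar to the reference."""
    return max(options, key=lambda name: dot(reference, options[name]))

# Toy embeddings (made-up values); dimension 0 plays the role of the key concept.
reference = [1.0, 0.0]
options = {"a": [0.2, 0.8], "b": [0.5, 0.5]}

before = more_similar(reference, options)

# Raise the key concept in option "a" and repeat the judgment.
options_after = dict(options, a=intervene(options["a"], dim=0, value=0.9))
after = more_similar(reference, options_after)
```

Before the intervention option "b" is closer to the reference; after the key concept in "a" is raised, "a" becomes the closer one. This flip is the embedding-level analogue of the behavioral manipulation described above.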

Figure 7: Manipulating similarity judgment behavior by key concepts intervention.

Conclusion

The CoCoG model combines AI and cognitive science to improve both our understanding of human cognition and our ability to interact with it:

  • AI and Human Cognition: Merges DNNs with human cognition, improving visual understanding and safety in AI-human interactions.

  • Cognitive Science Advancement: Offers a novel approach for cognitive research, enabling the generation of diverse stimuli to study human behavior and cognitive mechanisms.

  • Future Directions: Promises to align AI models with human cognition more closely and improve research efficiency through optimal experimental design.

Code

Get started

You can set up a conda environment with all dependencies as follows:

conda env create -f cocog.yml
conda activate cocog

File structure

The code is organized as follows:

  • ConceptDecoder.ipynb: concept-based image generation and decision intervention
  • ConceptEncoder.ipynb: extraction of concept embeddings
  • dataset.py: image loading
  • diffusion_prior.py: diffusion model and pipeline that generate CLIP embeddings from concept embeddings
  • networks.py: concept encoder model
  • utils.py: utility functions for plotting