AImageLab
AImageLab is a research laboratory of the Dipartimento di Ingegneria "Enzo Ferrari" at the University of Modena and Reggio Emilia, Italy.
Modena, Italy
Pinned Repositories
art2real
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation. CVPR 2019
dress-code
Dress Code: High-Resolution Multi-Category Virtual Try-On. ECCV 2022
LLaVA-MORE
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
mammoth
An Extensible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
multimodal-garment-designer
This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023
novelty-detection
Latent Space Autoregression for Novelty Detection
show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
VATr
VKD
PyTorch code for ECCV 2020 paper: "Robust Re-Identification by Multiple Views Knowledge Distillation"
AImageLab's Repositories
aimagelab/mammoth
An Extensible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
aimagelab/LLaVA-MORE
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
aimagelab/pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
aimagelab/awesome-human-visual-attention
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
aimagelab/ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
aimagelab/CoDE
[ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
aimagelab/Alfie
Democratising RGBA Image Generation With No $$$ (AI4VA@ECCV24)
aimagelab/HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
aimagelab/ReT
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
aimagelab/awesome-captioning-evaluation
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
aimagelab/TransFusion
Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
aimagelab/MaPeT
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
aimagelab/MAD
Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Attend-Diffuse operator (ECCV24)
aimagelab/FourBi
Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
aimagelab/ScanDiff
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
aimagelab/Emuru-autoregressive-text-img
Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
aimagelab/MissRAG
[ICCV 2025] MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
aimagelab/COGT
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
aimagelab/DICE
[ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
aimagelab/ReT-2
Recurrence Meets Transformers for Universal Multimodal Retrieval
aimagelab/fed-mammoth
General Federated Continual Learning Framework
aimagelab/mammoth-lite
aimagelab/CHAIR-DPO
[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
aimagelab/DitHub
aimagelab/MLLMs-FlowTracker
[CAIP 2025] Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding
aimagelab/itserr-wp8-latin-embeddings
ITSERR WP8 - Code for Latin embeddings semantic search
aimagelab/Sanctuaria-Gaze
Sanctuaria-Gaze is a multimodal dataset of egocentric recordings from visits to four sanctuaries in Northern Italy. Alongside the data, we release an open-source framework for automatic detection and analysis of Areas of Interest (AOIs), enabling gaze-based research in dynamic, real-world settings without manual annotation.
aimagelab/biblical-retrieval-synthesis
[TPDL 2025] Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval
aimagelab/coldfront
HPC Resource Allocation System
aimagelab/synthcap_pp
Official implementation of "Augmenting and Mixing Transformers with Synthetic Data for Image Captioning"