AImageLab
AImageLab is a research laboratory of the Dipartimento di Ingegneria "Enzo Ferrari" at the University of Modena and Reggio Emilia, Italy.
Modena, Italy
Pinned Repositories
art2real
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation. CVPR 2019
dress-code
Dress Code: High-Resolution Multi-Category Virtual Try-On. ECCV 2022
LLaVA-MORE
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
mammoth
An Extensible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
multimodal-garment-designer
This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023
novelty-detection
Latent Space Autoregression for Novelty Detection
show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
VATr
VKD
PyTorch code for ECCV 2020 paper: "Robust Re-Identification by Multiple Views Knowledge Distillation"
AImageLab's Repositories
aimagelab/mammoth
An Extensible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
aimagelab/LLaVA-MORE
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
aimagelab/pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
aimagelab/awesome-human-visual-attention
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
aimagelab/ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
aimagelab/CoDE
[ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
aimagelab/Alfie
Democratising RGBA Image Generation With No $$$ (AI4VA@ECCV24)
aimagelab/HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
aimagelab/ReT
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
aimagelab/awesome-captioning-evaluation
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
aimagelab/TransFusion
Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
aimagelab/MaPeT
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
aimagelab/MAD
Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Attend-Diffuse operator (ECCV24)
aimagelab/FourBi
Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
aimagelab/ScanDiff
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
aimagelab/Emuru-autoregressive-text-img
Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
aimagelab/MissRAG
[ICCV 2025] MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
aimagelab/COGT
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
aimagelab/DICE
[ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
aimagelab/ReT-2
Recurrence Meets Transformers for Universal Multimodal Retrieval
aimagelab/fed-mammoth
General Federated Continual Learning Framework
aimagelab/mammoth-lite
aimagelab/CHAIR-DPO
[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
aimagelab/DitHub
aimagelab/MLLMs-FlowTracker
[CAIP 2025] Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding
aimagelab/itserr-wp8-latin-embeddings
ITSERR WP8 - Code for Latin embeddings semantic search
aimagelab/Sanctuaria-Gaze
Sanctuaria-Gaze is a multimodal dataset of egocentric recordings from visits to four sanctuaries in Northern Italy. Alongside the data, we release an open-source framework for automatic detection and analysis of Areas of Interest (AOIs), enabling gaze-based research in dynamic, real-world settings without manual annotation.
aimagelab/biblical-retrieval-synthesis
[TPDL 2025] Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval
aimagelab/coldfront
HPC Resource Allocation System
aimagelab/synthcap_pp
Official implementation of "Augmenting and Mixing Transformers with Synthetic Data for Image Captioning"