Awesome-Compositional-Zero-Shot

Papers and codes about Compositional Zero Shot Learning(CZSL) for computer vision are present on this page. Besides, the commonly-used datasets for CZSL are also introduced.

Papers

2024

Title Venue Dataset PDF CODE
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition ECCV 2024 C-GQA & Sth-com PDF CODE
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning ECCV 2024 MIT-States & C-GQA & VAW-CZSL PDF CODE
Beyond Seen Primitive Concepts for Attributes-Objects Compositional Learning CVPR 2024 MIT-States & C-GQA & VAW-CZSL PDF -
Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning CVPR 2024 MIT-States & UT-Zappos & C-GQA PDF -
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning CVPR 2024 MIT-States & UT-Zappos & C-GQA PDF -
Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning AAAI 2024 MIT-States & UT-Zappos & C-GQA PDF -
ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning AAAI 2024 MIT-States & UT-Zappos & C-GQA PDF CODE
Revealing the Proximate Long-Tail Distribution in Compositional Zero-Shot Learning AAAI 2024 MIT-States & UT-Zappos & C-GQA PDF -
A Dynamic Learning Method towards Realistic Compositional Zero-Shot Learning AAAI 2024 MIT-States & UT-Zappos & C-GQA PDF -
Simple Primitives With Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-Shot Learning TPAMI 2024 MIT-States & UT-Zappos & C-GQA PDF -
CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning ICASSP 2024 MIT-States & C-GQA PDF CODE
LVAR-CZSL: Learning Visual Attributes Representation for Compositional Zero-Shot Learning TCSVT 2024 MIT-States & UT-Zappos & C-GQA PDF CODE
Agree to Disagree: Exploring Partial Semantic Consistency against Visual Deviation for Compositional Zero-Shot Learning TCDS 2024 MIT-States & UT-Zappos & C-GQA PDF -
Compositional Zero-Shot Learning using Multi-Branch Graph Convolution and Cross-layer Knowledge Sharing PR 2024 MIT-States & UT-Zappos & C-GQA PDF -
Visual primitives as words: Alignment and interaction for compositional zero-shot learning PR 2024 MIT-States & UT-Zappos & C-GQA PDF -
Mutual Balancing in State-Object Components for Compositional Zero-Shot Learning PR 2024 MIT-States & UT-Zappos & C-GQA PDF -
GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-ShotLearning WACV 2024 MIT-States & UT-Zappos & C-GQA PDF CODE
CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning WACV 2024 MIT-States & UT-Zappos & C-GQA PDF -

2023

Title Venue Dataset PDF CODE
Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning ICCV 2023 MIT-States & UT-Zappos & C-GQA PDF -
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning ICCV 2023 MIT-States & C-GQA &VAW-CZSL PDF CODE
Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? TMLR 2023 MIT-States PDF CODE
Reference-Limited Compositional Zero-Shot Learning ICMR 2023 RL-CZSL-ATTR & RL-CZSL-ACT PDF CODE
Learning Conditional Attributes for Compositional Zero-Shot Learning CVPR 2023 MIT-States & UT-Zappos & C-GQA PDF CODE
Learning Attention as Disentangler for Compositional Zero-shot Learning CVPR 2023 Clothing16K & UT-Zappos & C-GQA PDF CODE
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning CVPR 2023 MIT-States & UT-Zappos & C-GQA PDF CODE
Learning to Compose Soft Prompts for Compositional Zero-Shot Learning ICLR 2023 MIT-States & UT-Zappos & C-GQA PDF CODE
Compositional Zero-Shot Artistic Font Synthesis IJCAI 2023 SSAF & Fonts PDF CODE
Hierarchical Prompt Learning for Compositional Zero-Shot Recognition IJCAI 2023 MIT-States & UT-Zappos & C-GQA PDF -
Leveraging Sub-Class Discrimination for Compositional Zero-shot Learning AAAI 2023 UT-Zappos & C-GQA PDF CODE
Dual-Stream Contrastive Learning for Compositional Zero-Shot Recognition TMM 2023 MIT-States & UT-Zappos PDF -
Isolating Features of Object and Its State for Compositional Zero-Shot Learning TETCI 2023 MIT-States & UT-Zappos & C-GQA PDF -
Learning Attention Propagation for Compositional Zero-Shot Learning WACV 2023 MIT-States & UT-Zappos & C-GQA PDF -

2022

Title Venue Dataset PDF CODE
A Decomposable Causal View of Compositional Zero-Shot Learning TMM 2022 MIT-States & UT-Zappos PDF CODE
KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning CVPR 2022 MIT-States & UT-Zappos & C-GQA PDF CODE
Disentangling Visual Embeddings for Attributes and Objects CVPR 2022 MIT-States & UT-Zappos & VAW-CZSL PDF CODE
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning CVPR 2022 MIT-States & UT-Zappos & C-GQA PDF CODE
On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning ACM MM 2022 MIT-States & UT-Zappos & C-GQA PDF -
3D Compositional Zero-shot Learning with DeCompositional Consensus ECCV 2022 C-PartNet PDF CODE
Learning Invariant Visual Representations for Compositional Zero-Shot Learning ECCV 2022 UT-Zappos & AO-CLEVr & Clothing16K PDF CODE
Learning Graph Embeddings for Open World Compositional Zero-Shot Learning TPAMI 2022 MIT-States & UT-Zappos & C-GQA PDF -
Bi-Modal Compositional Network for Feature Disentanglement ICIP 2022 MIT-States & UT-Zappos PDF -

2021

Title Venue Dataset PDF CODE
Learning Graph Embeddings for Compositional Zero-Shot Learning CVPR 2021 MIT-States & UT-Zappos & C-GQA PDF CODE
Open World Compositional Zero-Shot Learning CVPR 2021 MIT-States & UT-Zappos PDF CODE
Independent Prototype Propagation for Zero-Shot Compositionality NeurIPS 2021 AO-Clevr & UT-Zappos PDF CODE
Revisiting Visual Product for Compositional Zero-Shot Learning NeurIPS 2021 MIT-States & UT-Zappos & C-GQA PDF -
Learning Single/Multi-Attribute of Object with Symmetry and Group TPAMI 2021 MIT-States & UT-Zappos PDF CODE
Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition TMM 2021 MIT-States & UT-Zappos PDF CODE
A Contrastive Learning Approach for Compositional Zero-Shot Learning ICMI 2021 MIT-States & UT-Zappos & Fashion200k PDF -

2020

Title Venue Dataset PDF CODE
Symmetry and Group in Attribute-Object Compositions CVPR 2020 MIT-States & UT-Zappos PDF CODE
Learning Unseen Concepts via Hierarchical Decomposition and Composition CVPR 2020 MIT-States & UT-Zappos PDF -
A causal view of compositional zero-shot recognition NeurIPS 2020 UT-Zappos & AO-Clevr PDF CODE
Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition NeurIPS 2020 DFashion & AWA2 & CUB & SUN PDF CODE

2019

Title Venue Dataset PDF CODE
Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition ICCV 2019 MIT-States & UT-Zappos PDF -
Task-Driven Modular Networks for Zero-Shot Compositional Learning ICCV 2019 MIT-States & UT-Zappos PDF CODE
Recognizing Unseen Attribute-Object Pair with Generative Model AAAI 2019 MIT-States & UT-Zappos PDF -

2018

Title Venue Dataset PDF CODE
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions CVPR 2018 MIT-States & UT-Zappos PDF CODE

2017

Title Venue Dataset PDF CODE
From Red Wine to Red Tomato: Composition with Context CVPR 2017 MIT-States & UT-Zappos PDF CODE

Datasets

Most CZSL papers usually conduct experiments on MIT-States and UT-Zappos datasets. However, as CZSL receives more attention, some new datasets are proposed and used in recent papers, such as C-GQA, AO-CLEVr, etc.

MIT-States

Introduced by Isola et al. in Discovering States and Transformations in Image Collections.

The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of the 9 attributes it affords.

Source:http://web.mit.edu/phillipi/Public/states_and_transformations/index.html

UT-Zappos

Introduced by Yu et al. in Fine-Grained Visual Comparisons with Local Learning.

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis.

Source:https://vision.cs.utexas.edu/projects/finegrained/utzap50k/

C-GQA

Introduced by Naeem et al. in Learning Graph Embeddings for Compositional Zero-shot Learning.

Compositional GQA (C-GQA) dataset is curated from the recent Stanford GQA dataset originally proposed for VQA. C-GQA includes 413 attribute classes and 674 object classes, contains over 9.5k compositional labels with diverse compositional classes and clean annotations, making it the most extensive dataset for CZSL.

Source:https://github.com/ExplainableML/czsl

AO-CLEVr

Introduced by Atzmon et al. in A causal view of compositional zero-shot recognition.

AO-CLEVr is a new synthetic-images dataset containing images of "easy" Attribute-Object categories, based on the CLEVr. AO-CLEVr has attribute-object pairs created from 8 attributes: { red, purple, yellow, blue, green, cyan, gray, brown } and 3 object shapes {sphere, cube, cylinder}, yielding 24 attribute-object pairs. Each pair consists of 7500 images. Each image has a single object that consists of the attribute-object pair. The object is randomly assigned one of two sizes (small/large), one of two materials (rubber/metallic), a random position, and random lightning according to CLEVr defaults.

Source:https://github.com/nv-research-israel/causal_comp

VAW-CZSL

Introduced by Nirat Saini et al. in Disentangling Visual Embeddings for Attributes and Objects.

VAW-CZSL, a subset of VAW, which is a multilabel attribute-object dataset. Sample one attribute per image, leading to much larger dataset in comparison to previous datasets. The images in the VAW dataset come from the Visual Genome dataset which is also the source of the images in the GQA and the VG-Phrasecut datasets.

Source:https://github.com/nirat1606/OADis.

Compositional PartNet

Introduced by Naeem et al. in 3D Compositional Zero-shot Learning with DeCompositional Consensus.

Compositional PartNet (C-PartNet) is refined from PartNet with a new labeling scheme that relates the compositional knowledge between objects by merging and renaming the repeated labels. The relabelled C-PartNet consists of 96 parts compared to 128 distinct part labels in the original PartNet.

Source:https://github.com/ferjad/3DCZSL

Results

Experimental results of some methods on the two most commonly-used datasets(MIT-States, UT-Zappos) and the most challenging dataset(C-GQA) are collected and presented.

All the results are obtained under the setting of Closed World Generalized Compositional Zero-Shot Learning. The current optimal metrics are in bold.

MIT-States

Method Seen Unseen HM AUC
DECA 32.2 27.4 20.3 6.6
OADis 31.1 25.6 18.9 5.9
SCEN 29.9 25.2 18.4 5.3
CVGAE 28.5 25.5 18.2 5.3
Co-CGE 31.1 5.8 6.4 1.1
CGE 32.8 28.0 21.4 6.5
BMP-Net 32.9 19.3 16.5 4.3
CompCos 25.3 24.6 16.4 4.5
SymNet(CVPR) 24.4 25.2 16.1 3.0
SymNet(TPAMI) 26.2 26.3 16.8 4.5
HiDC - 15.4 15.0 -
AdvFineGrained - 13.5 14.0 -
TMN 20.2 20.1 13.0 2.9
GenModel 24.8 13.4 11.2 2.3
AttrAsOp 14.3 17.4 9.9 1.6
RedWine 20.7 17.9 11.6 2.4

UT-Zappos

Method Seen Unseen HM AUC
DECA 64.0 68.8 51.7 37.0
IVR(fixed) 56.9 65.5 46.2 30.6
OADis 59.5 65.5 44.4 30.0
SCEN 63.5 63.1 47.8 32.0
CVGAE 65.0 62.4 49.8 34.6
ProtoProp 62.1 65.5 50.2 34.7
Co-CGE 62.0 44.3 40.3 23.1
CGE 64.5 71.5 60.5 33.5
BMP-Net 83.9 60.9 56.9 44.7
CompCos 59.8 62.5 43.1 28.7
SymNet(TPAMI) 10.3 56.3 24.1 26.8
HiDC - 53.4 52.4 -
CAUSAL 39.7 26.6 31.8 23.3
AdvFineGrained - 48.5 50.7 -
TMN 58.7 60.0 45.0 29.3
AttrAsOp 59.8 54.2 40.8 25.9
RedWine 57.3 62.3 41.0 27.1

C-GQA

Method Seen Unseen HM AUC
SCEN 28.9 25.4 17.5 5.5
CVGAE 28.2 11.9 13.9 2.8
Co-CGE 32.1 2.0 3.4 0.78
CGE 31.4 14.0 14.5 3.6

Acknowledgements

This page is made by Yanyi Zhang and Jianghao Li, both of whom are graduate students of Dalian University of Technology.