
Referring Image Matting


This is the official repository of the paper Referring Image Matting.

Jizhizi Li, Jing Zhang, and Dacheng Tao

Introduction | RefMatte | Results | Statement

Introduction

Image matting refers to extracting an accurate foreground from an image. Current automatic methods tend to extract all salient objects in the image indiscriminately. In this paper, we propose a new task named Referring Image Matting (RIM), which aims to extract the meticulous alpha matte of the specific object that best matches a given natural language description. We also propose a large-scale dataset, RefMatte, to serve as a test bed for RIM. We define RIM under two settings, i.e., prompt-based and expression-based, and then benchmark several representative methods together with task-specific model designs for image matting. The results provide empirical insights into the limitations of existing methods as well as possible solutions. We believe the new task RIM, along with the RefMatte dataset, will open new research directions in this area and facilitate future studies.

RefMatte

Prevalent visual grounding methods are all limited to the segmentation level, probably due to the lack of high-quality datasets for RIM. To fill the gap, we establish RefMatte, the first large-scale challenging dataset for RIM, by designing a comprehensive image composition and expression generation engine that produces synthetic images on top of current public high-quality matting foregrounds, with flexible logic and re-labelled diverse attributes. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions, and can easily be extended in the future. Besides this, we also construct a real-world test set, RefMatte-RW100, consisting of 100 natural images with manually generated phrase annotations, to further evaluate the generalization of RIM models. We show some examples from the RefMatte train and test sets below, including the images, the alpha mattes, and the input texts.
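
At its core, an image composition engine like the one above relies on the standard matting equation I = αF + (1 − α)B to blend a matting foreground onto a new background. Below is a minimal sketch of that compositing step; the file paths, resizing policy, and image formats are illustrative assumptions rather than the actual RefMatte engine.

import numpy as np
import cv2

def composite(fg_path, alpha_path, bg_path, size=(1024, 1024)):
    """Composite a matting foreground onto a background image.

    Implements I = alpha * F + (1 - alpha) * B. The alpha matte is assumed
    to be stored as a single-channel 8-bit image in [0, 255]; paths and the
    output size are illustrative, not the official dataset layout.
    """
    fg = cv2.imread(fg_path).astype(np.float32)
    bg = cv2.imread(bg_path).astype(np.float32)
    alpha = cv2.imread(alpha_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

    fg = cv2.resize(fg, size)
    bg = cv2.resize(bg, size)
    alpha = cv2.resize(alpha, size)[..., None]  # add channel axis for broadcasting

    image = alpha * fg + (1.0 - alpha) * bg
    return image.astype(np.uint8), (alpha[..., 0] * 255).astype(np.uint8)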

We also generate word clouds of the prompts, attributes, and relationships in RefMatte, shown below. As can be seen, the dataset contains a large portion of humans and animals, since they are very common in image matting tasks. The most frequent attributes in RefMatte are male, gray, transparent, and salient, while the relationship words are more balanced.

Results

We show some example results of our CLIPIMat on the RefMatte test set and RefMatte-RW100, given the images and text inputs, under both the prompt-based and expression-based settings.
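
For reference, predicted alpha mattes in this setting are usually evaluated with the standard matting metrics SAD and MSE. The sketch below reflects common practice in the matting literature and is not necessarily the exact evaluation script used for these results.

import numpy as np

def sad(pred, gt):
    """Sum of absolute differences between alpha mattes in [0, 1],
    conventionally reported divided by 1000."""
    return np.abs(pred - gt).sum() / 1000.0

def mse(pred, gt):
    """Mean squared error over all pixels of the alpha mattes."""
    return ((pred - gt) ** 2).mean()

# pred and gt are alpha mattes scaled to [0, 1], e.g. loaded with
# cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0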

Statement

If you are interested in our work, please consider citing the following:

@article{li2022rim,
  title={Referring Image Matting},
  author={Jizhizi Li and Jing Zhang and Dacheng Tao},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.05149}
}

This project is under the CC BY-NC license. For further questions, please contact Jizhizi Li at jili8515@uni.sydney.edu.au.

Relevant Projects

[1] Bridging Composite and Real: Towards End-to-end Deep Image Matting, IJCV, 2022 | Paper | Github
     Jizhizi Li, Jing Zhang, Stephen J. Maybank, Dacheng Tao

[2] Deep Automatic Natural Image Matting, IJCAI, 2021 | Paper | Github
     Jizhizi Li, Jing Zhang, and Dacheng Tao

[3] Privacy-Preserving Portrait Matting, ACM MM, 2021 | Paper | Github
     Jizhizi Li, Sihan Ma, Jing Zhang, and Dacheng Tao

[4] Rethinking Portrait Matting with Privacy Preserving, arXiv, 2022 | Paper | Github
     Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, and Dacheng Tao