A Unified Knowledge Protocol

Title: UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training
Authors: Biao Gong, Xiaoying Xie, Yutong Feng, Yiliang Lv, Yujun Shen, Deli Zhao
Institutes: Alibaba Group, Ant Group
More details: arXiv / Home Page / ModelScope Community (魔搭社区)

The code, dataset, arXiv page, and website will be available upon acceptance.

We are cleaning up the source code and awaiting corporate review before open-sourcing it. We will release a basic version of UKnow on ModelScope. ModelScope is built on the notion of "Model-as-a-Service" (MaaS). It seeks to bring together the most advanced machine learning models from the AI community and to streamline the process of using AI models in real-world applications.

Overview

Figure 1: Overview of the UKnow protocol, which consists of five unit knowledge types, namely, in-image I_{in} (e.g., object), in-text T_{in} (e.g., entity), cross-image I_{cross} (e.g., image similarity), cross-text T_{cross} (e.g., text continuity), and image-text IT_{cross} (e.g., description).
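The five unit types in Figure 1 can be read as a classification of each edge by the modalities of its two endpoints and by whether both endpoints come from the same image or text. The sketch below encodes that reading. It is an illustrative assumption, not the authors' construction pipeline: the `Node` fields (especially the hypothetical `source_id`, meaning the image or text a node was extracted from) and the `knowledge_type` rule are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    IMAGE = "image"
    TEXT = "text"


class KnowledgeType(Enum):
    IN_IMAGE = "I_in"        # e.g., an object inside one image
    IN_TEXT = "T_in"         # e.g., an entity inside one text
    CROSS_IMAGE = "I_cross"  # e.g., similarity between two images
    CROSS_TEXT = "T_cross"   # e.g., continuity between two texts
    IMAGE_TEXT = "IT_cross"  # e.g., a text describing an image


@dataclass(frozen=True)
class Node:
    node_id: str
    modality: Modality
    source_id: str  # hypothetical: id of the image/text this node came from


def knowledge_type(head: Node, tail: Node) -> KnowledgeType:
    """Assign an edge to one of the five unit types (illustrative rule)."""
    if head.modality != tail.modality:
        return KnowledgeType.IMAGE_TEXT
    same_source = head.source_id == tail.source_id
    if head.modality is Modality.IMAGE:
        return KnowledgeType.IN_IMAGE if same_source else KnowledgeType.CROSS_IMAGE
    return KnowledgeType.IN_TEXT if same_source else KnowledgeType.CROSS_TEXT


if __name__ == "__main__":
    img = Node("img_1", Modality.IMAGE, "img_1")
    obj = Node("img_1/obj_0", Modality.IMAGE, "img_1")  # region inside img_1
    cap = Node("txt_1", Modality.TEXT, "txt_1")
    print(knowledge_type(img, obj).value)  # I_in
    print(knowledge_type(img, cap).value)  # IT_cross
```

Under this rule, every edge falls into exactly one type. The only modality-mixing type is image-text, and the other four differ only in whether the two endpoints share a source.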

This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data. Focusing on visual and linguistic modalities, we categorize data knowledge into five unit types, namely, in-image, in-text, cross-image, cross-text, and image-text, and set up an efficient pipeline to help construct a multimodal knowledge graph from any data collection. Thanks to the logical information naturally contained in a knowledge graph, organizing datasets in the UKnow format opens up more possibilities for data usage than the commonly used image-text pairs. Following the UKnow protocol, we collect, from public international news, a large-scale multimodal knowledge graph dataset that consists of 1,388,568 nodes (571,791 of them vision-related) and 3,673,817 triplets. The dataset is also annotated with rich event tags, including 11 coarse labels and 9,185 fine labels. Experiments on four benchmarks demonstrate the potential of UKnow in supporting common-sense reasoning and boosting vision-language pre-training with a single dataset, benefiting from its unified form of knowledge organization. Code, dataset, and models will be made publicly available.
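The dataset described above is a multimodal knowledge graph made of nodes and (head, relation, tail) triplets, some of whose nodes are vision-related. The minimal in-memory sketch below shows such a graph. It assumes a hypothetical schema: the `MultimodalKG` class, the `"image"`/`"text"` modality strings, and relation names such as `describes` are illustrative and are not the released UKnow format. The `within_hops` query hints at why a graph offers more than isolated image-text pairs, because it can reach nodes that no single pair connects.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


class MultimodalKG:
    """Toy triplet store; the schema is an illustrative assumption."""

    def __init__(self) -> None:
        self.modality: Dict[str, str] = {}  # node id -> "image" or "text"
        self.triplets: List[Triplet] = []
        self._adj: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_node(self, node_id: str, modality: str) -> None:
        if modality not in ("image", "text"):
            raise ValueError(f"unsupported modality: {modality}")
        self.modality[node_id] = modality

    def add_triplet(self, head: str, relation: str, tail: str) -> None:
        for node in (head, tail):
            if node not in self.modality:
                raise KeyError(f"unknown node: {node}")
        self.triplets.append((head, relation, tail))
        # Store both directions so neighbours can be reached from either end.
        self._adj[head].add(tail)
        self._adj[tail].add(head)

    def within_hops(self, start: str, hops: int) -> Set[str]:
        """Nodes reachable from `start` in at most `hops` edges (breadth-first)."""
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            frontier = {n for f in frontier for n in self._adj[f]} - seen
            seen |= frontier
        return seen - {start}

    def stats(self) -> Dict[str, int]:
        vision = sum(1 for m in self.modality.values() if m == "image")
        return {"nodes": len(self.modality), "vision_nodes": vision,
                "triplets": len(self.triplets)}


if __name__ == "__main__":
    kg = MultimodalKG()
    for node_id, mod in [("img_1", "image"), ("img_2", "image"),
                         ("caption_1", "text"), ("entity:Paris", "text")]:
        kg.add_node(node_id, mod)
    kg.add_triplet("caption_1", "describes", "img_1")     # image-text
    kg.add_triplet("caption_1", "mentions", "entity:Paris")  # in-text
    kg.add_triplet("img_2", "similar_to", "img_1")         # cross-image
    print(kg.stats())  # {'nodes': 4, 'vision_nodes': 2, 'triplets': 3}
    print(sorted(kg.within_hops("img_2", 3)))  # ['caption_1', 'entity:Paris', 'img_1']
```

In this toy graph, `img_2` reaches `entity:Paris` only by chaining a cross-image edge, an image-text edge, and an in-text edge. A flat list of image-text pairs would not record that path. For scale, the reported counts correspond to roughly 41% vision-related nodes (571,791 of 1,388,568) and about 2.6 triplets per node.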

Citation

If you find this work useful in your research, please cite our paper:

@article{Gong2023UKnow,
    title={UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training},
    author={Biao Gong and Xiaoying Xie and Yutong Feng and Yiliang Lv and Yujun Shen and Deli Zhao},
    journal={arXiv:2302.06891},
    year={2023}
}
