stellar-MM

stellar for the 4th China AI Competition (2023)

Pre-trained Models

CLIP: github.com/OpenAI/CLIP
Chinese-CLIP: github.com/OFA-Sys/Chinese-CLIP

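References

CVPR 2022: Image Classification + Image-Text Matching = A Unified Multimodal Contrastive Learning Framework

Dataset Structure

Dataset directory layout, using COCO as an example: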

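- Datasets: 		# dataset root directory
 - COCO:		# COCO dataset; every dataset uses the same file structure and naming rules
  - train:		# training set
   - P0000001.PNG
   ...
  - test:		# test set
   - T0000001.PNG
   ...
  - val:		# validation set
   - V0000001.PNG
   ...
  - captions:		# caption files
   - train.jsonl
   - test.jsonl
   - val.jsonl
 ...			# other datasets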

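Images

Images are named by number and stored in separate directories according to the split they belong to.

Specifically:

1. Training set (./Datasets/{DATASET}/train/)

Each image name consists of an identifier letter, a zero-padded 7-digit auto-incrementing number, and the image format; numbering starts at 1. The image format should preferably be one of JPG, PNG, or JPEG.

Positive samples use the identifier letter P, for example: P0253181.JPG.
Negative samples use the identifier letter N, for example: N0253181.JPG.

2. Test set (./Datasets/{DATASET}/test/)

Each image name consists of an identifier letter, a zero-padded 7-digit auto-incrementing number, and the image format; numbering starts at 1. The image format should preferably be one of JPG, PNG, or JPEG.

Test images use the identifier letter T, for example: T0028341.PNG.

3. Validation set (./Datasets/{DATASET}/val/)

Each image name consists of an identifier letter, a zero-padded 7-digit auto-incrementing number, and the image format; numbering starts at 1. The image format should preferably be one of JPG, PNG, or JPEG.

Validation images use the identifier letter V, for example: V0000316.JPEG.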
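The naming rule above can be sketched as a pair of small helpers. This is only an illustration: make_image_name and parse_image_name are hypothetical names that are not part of the stellar package, and the sketch assumes that only JPG, PNG, and JPEG are accepted, which is stricter than the "preferably" wording above.

```python
import re

# Identifier letters from the rules above: P/N for positive/negative training
# samples, T for the test set, V for the validation set.
PREFIXES = {"train_p": "P", "train_n": "N", "test": "T", "val": "V"}
ALLOWED_FORMATS = {"JPG", "PNG", "JPEG"}  # assumption: treated as a strict list

# One identifier letter, exactly seven ASCII digits, a dot, then the format.
NAME_PATTERN = re.compile(r"([PNTV])([0-9]{7})\.([A-Za-z]+)")


def make_image_name(kind: str, number: int, fmt: str = "PNG") -> str:
    """Build a file name such as P0253181.JPG for the given image kind."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown image kind: {kind}")
    if not 1 <= number <= 9_999_999:
        raise ValueError("number must start from 1 and fit in 7 digits")
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt}")
    return f"{PREFIXES[kind]}{number:07d}.{fmt}"


def parse_image_name(name: str) -> tuple[str, int, str]:
    """Split a file name into (letter, number, format) or raise ValueError."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid image name: {name}")
    letter, digits, fmt = match.groups()
    fmt = fmt.upper()
    if int(digits) == 0 or fmt not in ALLOWED_FORMATS:
        raise ValueError(f"not a valid image name: {name}")
    return letter, int(digits), fmt
```

For instance, make_image_name("train_p", 253181, "jpg") yields the P0253181.JPG name used as an example above, while a name with only six digits is rejected by parse_image_name.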

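Caption Files

Caption files are stored in JSONL format: each line of the file is a JSON object holding the caption description of one image, which makes the files easy to read and process as a stream.

For example, train.jsonl: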

{"type": "train_p", "id": "P0000001", "ftype": "PNG", "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"], "tags": ["dog", "sky"]}
{"type": "train_p", "id": "P0000001", "ftype": "PNG", "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"], "tags": ["dog", "sky"]}
{"type": "train_p", "id": "P0000001", "ftype": "PNG", "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"], "tags": ["dog", "sky"]}
{"type": "train_p", "id": "P0000001", "ftype": "PNG", "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"], "tags": ["dog", "sky"]}
{"type": "train_p", "id": "P0000001", "ftype": "PNG", "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"], "tags": ["dog", "sky"]}
...
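Because each line is an independent JSON object, a caption file can be consumed one record at a time without loading it fully into memory. A minimal sketch, assuming UTF-8 text input; iter_captions is a hypothetical helper name, not an existing function of this repository:

```python
import json
from typing import Iterable, Iterator


def iter_captions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one caption record per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is malformed instead of failing silently.
            raise ValueError(f"line {line_number}: invalid JSON") from exc
```

The function accepts any iterable of strings, so an open file object works directly, for example the result of open("./Datasets/COCO/captions/train.jsonl", encoding="utf-8").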

JSON format

{
    "type": "train_p",		// image type
    "id": "P0000001",		// image ID, i.e. the image name
    "ftype": "PNG",		// image format; PNG, JPG, and JPEG are supported
    "caption": ["A dog is running in the sky", "In the sunny sky, a dog is running"],	// list of captions
    "tags": ["dog", "sky"]	// key tags for the image content
}

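Where:

type: image type; one of train_p, train_n, test, or val.
id: image ID, i.e. the image name (see the Images section).
ftype: image format; PNG, JPG, and JPEG are supported.
caption: list of captions.
tags: key tags for the image content.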
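The field rules above can be checked mechanically before training. The sketch below makes several assumptions that the README does not state explicitly: all five fields are required, the identifier letter of id must match type, and the image file is located at {root}/{dataset}/{split}/{id}.{ftype}. validate_record and image_path are hypothetical helper names.

```python
import re

# Maps each record type to (split directory, identifier letter).
TYPE_INFO = {
    "train_p": ("train", "P"),
    "train_n": ("train", "N"),
    "test": ("test", "T"),
    "val": ("val", "V"),
}
SUPPORTED_FTYPES = {"PNG", "JPG", "JPEG"}
REQUIRED_FIELDS = {"type", "id", "ftype", "caption", "tags"}


def validate_record(record: dict) -> None:
    """Raise ValueError if a caption record breaks the rules listed above."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if record["type"] not in TYPE_INFO:
        raise ValueError(f"unknown type: {record['type']}")
    if record["ftype"] not in SUPPORTED_FTYPES:
        raise ValueError(f"unsupported ftype: {record['ftype']}")
    # Assumption: the id starts with the letter that belongs to the type.
    letter = TYPE_INFO[record["type"]][1]
    if not re.fullmatch(letter + r"[0-9]{7}", str(record["id"])):
        raise ValueError(f"id {record['id']!r} does not match type {record['type']}")
    for field in ("caption", "tags"):
        value = record[field]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{field} must be a list of strings")


def image_path(record: dict, dataset: str, root: str = "./Datasets") -> str:
    """Build the assumed image location for a validated record."""
    split = TYPE_INFO[record["type"]][0]
    return f"{root}/{dataset}/{split}/{record['id']}.{record['ftype']}"
```

With the first train.jsonl record shown earlier and dataset "COCO", image_path returns ./Datasets/COCO/train/P0000001.PNG.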

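Coding Conventions

Code is organized as modules that are integrated into a single package.

1. File names of modules you write are all lowercase, e.g. log.py, the module used for printing logs.

2. Once your module has been tested and works correctly, add its name to __all__ = [] in ./stellar/__init__.py. For example, if you wrote a coco_cn.py module, the line becomes __all__ = ['log', 'coco_cn'].

The competition code is integrated into a Python package named stellar, with the following file structure: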

- stellar:
 - __init__.py	# package initialization module
 - log.py	# logging module
 - coco_cn.py	# a module you wrote yourself

The __init__.py file contains:

from . import *

__all__ = ['log']  # add newly written modules here
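To illustrate why a new module has to be registered, the following sketch builds a throwaway copy of the package layout in a temporary directory and reports which names from stellar import * actually binds. It relies only on standard Python import behavior; build_demo_package and star_import_names are hypothetical helpers written for this demonstration, and the generated modules are empty placeholders rather than the real log.py or coco_cn.py.

```python
import importlib
import sys
from pathlib import Path


def _forget_stellar() -> None:
    """Drop any cached stellar modules so the demo package is re-imported."""
    for key in [k for k in sys.modules if k == "stellar" or k.startswith("stellar.")]:
        del sys.modules[key]


def build_demo_package(root: Path, modules: list[str], exported: list[str]) -> None:
    """Write a placeholder stellar package whose __init__.py mirrors the one above."""
    package = root / "stellar"
    package.mkdir()
    (package / "__init__.py").write_text(
        f"from . import *\n\n__all__ = {exported!r}\n", encoding="utf-8"
    )
    for name in modules:
        (package / f"{name}.py").write_text(f"NAME = {name!r}\n", encoding="utf-8")


def star_import_names(root: Path) -> set[str]:
    """Return the public names bound by `from stellar import *`."""
    _forget_stellar()
    importlib.invalidate_caches()
    sys.path.insert(0, str(root))
    try:
        namespace: dict = {}
        exec("from stellar import *", namespace)
        return {name for name in namespace if not name.startswith("__")}
    finally:
        sys.path.remove(str(root))
        _forget_stellar()
```

If coco_cn.py exists on disk but only 'log' is listed in __all__, the returned set contains just log; after adding 'coco_cn' to the list, both modules are exposed.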
