# COIN COIN
## Files

- `COIN.json`: original annotation of COIN
- `anno_tool`: annotation tool
- `data_gen`: code to generate the COIN COIN dataset
  - `gen_frames.py`: extract 16 frames per action
  - `gen_dataset.py`: generate metadata of the COIN COIN dataset
  - `gen_gifs.py`: generate a GIF for each action for visualization
- `data_raw`:
## Usage

### Annotation
Extract 16 frames for each action:

```
python data_gen/gen_frames.py
```
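How the frames are sampled is defined by `gen_frames.py` itself; as a rough illustration, extracting a fixed number of evenly spaced frames from an action segment could look like the sketch below (it assumes OpenCV; `extract_action_frames` and the start/end frame arguments are hypothetical names, not the script's actual interface):

```python
import os
import cv2  # assumption: OpenCV is used for decoding; the real script may rely on other tools


def extract_action_frames(video_path, start_frame, end_frame, out_dir, num_frames=16):
    """Sample `num_frames` evenly spaced frames from [start_frame, end_frame] and save them as JPEGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    span = max(end_frame - start_frame, 1)
    for i in range(num_frames):
        # Evenly spaced frame index inside the action segment.
        idx = start_frame + round(i * span / (num_frames - 1))
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{i:02d}.jpg"), frame)
    cap.release()
```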
After you have finished annotating with `anno_tool`, dump the annotation results to JSON:

```
cd anno_tool
python manage.py dumpdata anno -o ../data_gen/anno.json
```
The format of the annotation is as follows (only useful keys are listed):

```
[
  {
    "pk": 1,                        # primary key
    "fields": {
      "video_name": "4MiubN1oQkg",
      "video_class": "ArcWeld",
      "cut_points": "",
      "state": 2,                   # 2 for finished
      "steps": 3,
      "train": false,               # belongs to the train/val set
      "action_ids": "587,585,587",
    }
  },
]
```
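For reference, the dumped `anno.json` can be read with the standard `json` module; the sketch below keeps only finished annotations (`state == 2`) and parses `action_ids` into a list of integers (field names follow the format above; the helper name is illustrative):

```python
import json


def load_finished_annotations(path="data_gen/anno.json"):
    """Return (video_name, video_class, train, action_ids) tuples for finished videos."""
    with open(path) as f:
        entries = json.load(f)
    results = []
    for entry in entries:
        fields = entry["fields"]
        if fields["state"] != 2:  # 2 means the annotation is finished
            continue
        action_ids = [int(a) for a in fields["action_ids"].split(",") if a]
        results.append((fields["video_name"], fields["video_class"], fields["train"], action_ids))
    return results
```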
### Generate COIN_COIN metadata
```
python data_gen/gen_dataset.py -s <setting> -m <min length of clip> -M <max length of clip> -r <test-ratio>
```
- `setting`: `short` or `long`
- `m`, `M`: the number of actions in each clip must be between `m` and `M`. For the `short` setting, `(m, M) = (1, 1)` by default; for the `long` setting, `(m, M) = (2, 5)` by default
- `test-ratio`: the proportion of the test set is approximately this value (default: 0.1)
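For example, to build the `long` setting with its default clip-length bounds and roughly 10% of the data held out for testing:

```
python data_gen/gen_dataset.py -s long -m 2 -M 5 -r 0.1
```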
`data_gen/gen_dataset.py` will perform the following steps:
- `filter_clips`: Cut the clips longer than `M` and drop the clips shorter than `m`
- `split_train_test`: Split the whole dataset into train and test sets, ensuring that clips from the same video are not separated
- `gen_QA`: Generate questions and choices. There are 5 types of wrong answers:
  - missing step
  - swapped steps
  - extra step (from the same video)
  - replaced step (from other clips of the same video)
  - replaced step (from other videos in the same class)
In the `long` setting, all of the above types of wrong answers can appear; in the `short` setting, only *replaced step* appears. When generating a Q/A pair, we try to construct at most 2 choices for each wrong type and then randomly select 3 wrong choices. Therefore, each sample contains 3 wrong choices and 1 correct choice.
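The wrong-answer construction can be pictured roughly as below. This is a simplified sketch operating on lists of action ids: it applies the missing/swapped/extra/replaced perturbations at most twice each and then randomly keeps 3 wrong choices; the function name and the pools of replacement steps are illustrative assumptions, not the actual `gen_QA` implementation:

```python
import random


def make_wrong_choices(steps, extra_pool, replace_pool, per_type=2, num_wrong=3):
    """Build wrong answers from a correct step sequence `steps` (a list of action ids).

    `extra_pool` / `replace_pool` hold candidate action ids from the same video /
    other clips or videos (the real script distinguishes the replacement sources further).
    """
    candidates = []
    for _ in range(per_type):
        if len(steps) > 1:
            # missing step: drop one step
            i = random.randrange(len(steps))
            candidates.append(steps[:i] + steps[i + 1:])
            # swapped steps: swap two adjacent steps
            j = random.randrange(len(steps) - 1)
            swapped = steps[:]
            swapped[j], swapped[j + 1] = swapped[j + 1], swapped[j]
            candidates.append(swapped)
        if extra_pool:
            # extra step: insert a step taken from elsewhere in the same video
            k = random.randrange(len(steps) + 1)
            candidates.append(steps[:k] + [random.choice(extra_pool)] + steps[k:])
        if replace_pool:
            # replaced step: substitute one step with a foreign one
            r = random.randrange(len(steps))
            candidates.append(steps[:r] + [random.choice(replace_pool)] + steps[r + 1:])
    # Drop candidates identical to the correct answer, deduplicate, then keep 3 of them.
    unique = [list(c) for c in {tuple(c) for c in candidates} if list(c) != steps]
    return random.sample(unique, min(num_wrong, len(unique)))
```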
The generated metadata has the following format:

```
[
  {
    "question": [path/to/img1, path/to/img2],
    "choices": [
      [path/to/choices/1],
      [path/to/choices/2],
      [path/to/choices/3],
      [path/to/choices/4],
    ]
  }
]
```
To visualize the dataset, run

```
python data_gen/gen_gifs.py
```

then visit http://166.111.72.69:61081 and go to the `Dataset` tab.
## Training
`dataset.py` provides the way to load the dataset. Each batch contains:

- `question`: `(batch_size, 2, channel=3, height, width)`
- `choices`: `(batch_size, 4, max_length, num_frames=8, channel=3, height, width)`
- `label`: `(batch_size, 4)`
For the `long` setting, `max_length = M + 1` (the length of a choice with the *extra step* wrong type is `M + 1`); for the `short` setting, `max_length = 1`. Currently, all images are padded and resized to `(256, 256)`.
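The actual loading logic is in `dataset.py`; the following is only a minimal sketch of how the documented shapes could be produced with PyTorch, assuming the metadata format shown above. The zero-padding of short choices up to `max_length`, the nesting of frame paths per step, and the `answer` key are assumptions made for illustration:

```python
import json

import numpy as np
import torch
from PIL import Image, ImageOps
from torch.utils.data import Dataset


class CoinQADataset(Dataset):
    """Minimal sketch of a question/choices/label dataset; dataset.py may differ in details."""

    def __init__(self, metadata_path, max_length, num_frames=8, size=256):
        with open(metadata_path) as f:
            self.samples = json.load(f)
        self.max_length = max_length
        self.num_frames = num_frames
        self.size = size

    def _load_image(self, path):
        # Pad to a square, resize to (size, size), and return a (3, size, size) float tensor.
        img = ImageOps.pad(Image.open(path).convert("RGB"), (self.size, self.size))
        return torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float() / 255.0

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        # question: two images -> (2, 3, H, W)
        question = torch.stack([self._load_image(p) for p in sample["question"]])
        # choices: (4, max_length, num_frames, 3, H, W), zero-padded for short choices
        choices = torch.zeros(4, self.max_length, self.num_frames, 3, self.size, self.size)
        for c, steps in enumerate(sample["choices"]):
            for s, frame_paths in enumerate(steps[: self.max_length]):
                for f, path in enumerate(frame_paths[: self.num_frames]):
                    choices[c, s, f] = self._load_image(path)
        # label: one-hot over the 4 choices; the index of the correct choice is assumed
        # to be stored under an "answer" key (illustrative).
        label = torch.zeros(4)
        label[sample.get("answer", 0)] = 1.0
        return question, choices, label
```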