taoyang1122/adapt-image-models

About input frames and sampling interval

Closed this issue · 1 comment

Thank you for your excellent work! I would like to ask about clip_len and frame_interval for Kinetics. Appendix A.1 says: "We evaluate the model on 8, 16, 32 frames and the sampling interval is 16, 8, 4, respectively." Does this mean that for Kinetics-400/700 the data pipelines (train, val, test) should all use the same settings? For example, in configs/recognition/vit/vit_imagenet_k400.py the data pipeline does match the paper.

That is, clip_len=8 and frame_interval=16 for the train/val/test pipelines, which matches the paper:

train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=8, frame_interval=16, num_clips=1),
    ...
]

val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=16,
        num_clips=1,
        test_mode=True),
    ...
]

test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=16,
        num_clips=3,
        test_mode=True),
    ...
]
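For context, the paper's three settings all cover the same temporal span: clip_len × frame_interval = 128 frames. A quick sketch (my own arithmetic, not code from this repo) that derives the interval the paper implies for each clip length:

```python
# The paper pairs (clip_len, frame_interval) as (8, 16), (16, 8), (32, 4):
# in every case a clip spans clip_len * frame_interval = 128 frames.
SPAN = 8 * 16  # 128 frames, taken from the paper's 8-frame setting

def paper_interval(clip_len):
    """Frame interval implied by the paper for a given clip length."""
    assert SPAN % clip_len == 0
    return SPAN // clip_len

for clip_len in (8, 16, 32):
    print(clip_len, paper_interval(clip_len))  # 8 -> 16, 16 -> 8, 32 -> 4
```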

But the configs for the CLIP-pretrained models are confusing:

  1. vitclip_base_k400: clip_len=32, frame_interval=16 for the train pipeline, but clip_len=32, frame_interval=8 for the val/test pipelines. However, if clip_len=32, shouldn't frame_interval be 4?

train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=32, frame_interval=16, num_clips=1),
    ...
]

val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=8,
        num_clips=1,
        test_mode=True),
    ...
]

test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=8,
        num_clips=3,
        test_mode=True),
    ...
]
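To see why frame_interval=16 at clip_len=32 looks off: a Kinetics clip is roughly 10 s (about 300 frames at 30 fps), while 32 frames at interval 16 span 32 × 16 = 512 frames, so a dense sampler would have to wrap around the video. A simplified sketch of the sampling arithmetic (not mmaction2's actual SampleFrames implementation, which also handles random/centered offsets):

```python
import numpy as np

def sample_indices(total_frames, clip_len, frame_interval, start=0):
    """Simplified dense sampling in the spirit of SampleFrames:
    clip_len frames spaced frame_interval apart, wrapped onto the video."""
    raw = start + np.arange(clip_len) * frame_interval
    return raw, raw % total_frames

# A ~10 s Kinetics clip at 30 fps has about 300 frames.
raw, idx = sample_indices(total_frames=300, clip_len=32, frame_interval=16)
print(raw[-1])  # 496: the clip "wants" 497 frames, more than the video has
print(idx[-1])  # 196: the last index after wrapping back into the video
```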

  2. vitclip_large_k400: clip_len=16, frame_interval=16 for the train/val/test pipelines. However, if clip_len=16, shouldn't frame_interval be 8?

train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=16, frame_interval=16, num_clips=1),
    ...
    dict(type='ToTensor', keys=['imgs', 'label'])
]

val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=16,
        frame_interval=16,
        # ...
    ),
    ...
    dict(type='ToTensor', keys=['imgs'])
]

test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=16,
        frame_interval=16,
        # ...
    ),
    ...
]
Thank you.

Hi @BinhuiXie, thanks for your interest in our work. You can safely follow the settings described in the paper. I will update the code.