This repository contains our champion solutions to the Ego4D challenges at the ECCV 2022 workshop.
- Ego4D Slides (in Chinese)
- Ego4D Solutions (in Chinese)
(2023/10/10) We use the weights pre-trained on the verb subset for TAL on the Perception Test task, bringing a 2-point performance improvement. The extracted features can be downloaded here.
(2023/07/13) We release the ViT-L weights finetuned on the Ego4D-MQ dataset.
(2023/04/11) 🚀We release the leading model for the SCOD task.
(2022/12/11) 🚀🚀We release code and checkpoints for pretraining, the FHP task, and the SCOD task.
(2022/12/01) 🚀The VideoMAE features for MQ and NLQ are released.
(2022/11/17) 🔄The repository is created.
- Code for the feature extractor
- Verb and noun features (VideoMAE-L) for MQ and NLQ
- Code for pretraining
- Code for STA
- Code for Hands (FHP)
- Code and checkpoints for SCOD
We provide the video features extracted by VideoMAE-L pretrained on the verb and noun subsets.
| Feature | Baidu Netdisk | Zenodo |
| --- | --- | --- |
| MQ (verb) | Download (code: sxda) | Download |
| NLQ (verb) | Download (code: teod) | Download |
| NLQ (noun) | Download (code: wrop) | Download |
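The exact layout inside the archives may differ; as a minimal loading sketch, assuming one `torch`-loadable feature tensor per clip (the file name below is hypothetical):

```python
import torch

# Hypothetical per-clip feature file; adjust the path to match
# the archive you downloaded.
feat = torch.load("mq_verb_features/<clip_uid>.pt", map_location="cpu")
print(feat.shape)  # typically (num_temporal_segments, feature_dim)
```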
You can find more details in our technical report.
Our training strategy is based on the vanilla method and is easy to follow. We use the VideoMAE codebase for training and validation; before training, follow its instructions to install the Python environment. For rapid development, we further split the training annotations filtered by EgoVLP; the resulting annotation files are available here. We release the checkpoints in the table below.
| Method | Pretrain | Resolution | Subset | Top-1 | Top-5 | Weights |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-L | K700 | 224x224 | verb | 52.51 | 86.05 | Download |
| ViT-L | K700 | 224x224 | noun | 33.41 | 85.51 | Download |
| ViT-L | K700+verb | 224x224 | MQ | - | - | Download |
| UniFormer-B | K600 | 320x320 | verb | 49.30 | 83.61 | Download |
Note: for the ViT-L weights finetuned on the MQ task, some keys in the state_dict may need to be renamed to match the model code.
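As a minimal sketch of such a rename (assuming the mismatch is a wrapper prefix such as `module.` left over from DDP; the file name below is hypothetical):

```python
import torch

# Hypothetical file name: point this at the downloaded MQ checkpoint.
ckpt = torch.load("vit_l_mq_finetuned.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under "model"

# Inspect the keys first, then strip whatever wrapper prefix does not
# match your model definition (e.g. "module." or "backbone.").
print(list(state_dict)[:5])
state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}

# Load with strict=False and check what is still mismatched:
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
```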
We provide training scripts for SLURM. If you want to use PyTorch-DDP mode, use the scripts in `scripts/pytorch_ddp`.

```bash
bash scripts/slurm/ego4d_verb_slurm_pretrain_vitl_k400.sh
```

In the script, you need to set the appropriate `OUTPUT_DIR` and `MODEL_PATH`.
We use the ViT-Large model to train the STA task.
```bash
sh scripts/slurm/sta_train.sh
```

For validation:

```bash
cd forecasting_eval
sh sta_val.sh
```
We train the FHP task using UniFormer-B and the weights pretrained on the Ego4D verb subset.
We provide training scripts for SLURM. If you want to use PyTorch-DDP mode, use the scripts in `scripts/pytorch_ddp`.

```bash
bash scripts/slurm/ego4d_hands_uniformer.sh
```

In the script, you need to set the appropriate `OUTPUT_DIR` and `MODEL_PATH`.
We also provide scripts for validation and testing. You can launch the script below to validate a specific checkpoint's performance.

```bash
bash scripts/slurm/ego4d_hands_uniformer_val.sh
```

In the script, you need to set the appropriate `OUTPUT_DIR`, `MODEL_PATH`, `--test_subset`, and `--test_num_segment`.
Our detection code for SCOD is developed on top of MMDetection.
Download the converted annotations for SCOD: download.
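Since MMDetection consumes COCO-style JSON by default, a quick sanity check of the converted annotations might look like this (the file name is hypothetical, and we assume the standard COCO layout):

```python
import json

# Hypothetical file name; we assume the converted annotations follow the
# standard COCO layout (images / annotations / categories), which is what
# MMDetection reads by default.
with open("scod_train_coco.json") as f:
    ann = json.load(f)

print(len(ann["images"]), "images,", len(ann["annotations"]), "boxes")
print("first categories:", [c["name"] for c in ann["categories"]][:10])
```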
We report the performance on the validation set and release the checkpoints in the table below.
| Method | Pretrain | Resolution | AP | AP50 | AP75 | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-L | IN-1K | 800-1600/2000 | 24.8 | 44.2 | 24.0 | config | ckpt \| log |
| Swin-L | IN-22K+O365 | 800-1600/2000 | 36.4 | 56.5 | 37.6 | config | ckpt \| log |
To train UniFormer-L + DINO on the SCOD training set with 8 GPUs for 12 epochs:

```bash
sh tools/dist_train.sh configs/scod/dino_5scale_uniformer-l_8x2_12e_scod_imagenet1k.py 8
```

To test UniFormer-L + DINO on the SCOD validation set with 8 GPUs:

```bash
sh tools/dist_test.sh configs/scod/dino_5scale_uniformer-l_8x2_12e_scod_imagenet1k.py <ckpt-path> 8 --eval bbox
```
It should give:
```
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.248
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.442
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.240
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.075
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.282
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.638
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300  ] = 0.638
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.638
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.054
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.321
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.697
```
If this work is helpful for your research, please consider citing our technical report.
```bibtex
@article{chen2022ego4d,
  title={InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges},
  author={Chen, Guo and Xing, Sen and Chen, Zhe and Wang, Yi and Li, Kunchang and Li, Yizhuo and Liu, Yi and Wang, Jiahao and Zheng, Yin-Dong and Huang, Bingkun and others},
  journal={arXiv preprint arXiv:2211.09529},
  year={2022}
}
```