Chaotic-World

This is the official repository for
[ICCV 2023] Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events
Kian Eng ONG, Xun Long NG, Yanchao LI, Wenjie AI, Kuangyi ZHAO, Si Yong YEO, Jun LIU

[NEW - 06 Feb 2024] We are organizing the 2024 ICME Grand Challenge: Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) based on this dataset. The challenge runs from 06 Feb 2024 to 25 March 2024. More details can be found at https://sutdcv.github.io/MMVRAC

Download the dataset and code here

Paper

  • ICCV2023
  • ResearchGate

Citation

@InProceedings{Ong_2023_ICCV,
author    = {Ong, Kian Eng and Ng, Xun Long and Li, Yanchao and Ai, Wenjie and Zhao, Kuangyi and Yeo, Si Yong and Liu, Jun},
title     = {Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month     = {October},
year      = {2023},
pages     = {20213-20223}
}

Abstract

Understanding and analyzing human behaviors (actions and interactions of people), voices, and sounds in chaotic events is crucial in many applications, e.g., crowd management and emergency response services. Unlike human behaviors in daily life, behaviors in chaotic events generally differ in how people act and influence others, and hence are often much more complex. However, there is currently a lack of a large video dataset for analyzing human behaviors in chaotic situations. To this end, we create the first large and challenging multi-modal dataset, Chaotic World, which simultaneously provides different levels of fine-grained and dense spatio-temporal annotations of sounds, individual actions and group interaction graphs, and even text descriptions for each scene in each video, thereby enabling a thorough analysis of complicated behaviors in crowds and chaos. Our dataset of chaotic events consists of 299,923 annotated instances for detecting human behaviors for Spatiotemporal Action Localization, 224,275 instances for identifying interactions between people for Behavior Graph Analysis, 336,390 instances for localizing relevant scenes of interest in long videos for Spatiotemporal Event Grounding, and 378,093 instances for triangulating the source of sound for Event Sound Source Localization. Given the practical complexity and challenges of chaotic events (e.g., large crowds, serious occlusions, complicated interaction patterns), our dataset shall facilitate the community in developing, adapting, and evaluating various types of advanced models for analyzing human behaviors in chaotic events. We also design a simple yet effective IntelliCare model with a Dynamic Knowledge Pathfinder module that intelligently learns from multiple tasks and can analyze various aspects of a chaotic scene in a unified architecture. This method achieves promising results in experiments. Dataset and code can be found at https://github.com/sutdcv/Chaotic-World.
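
The abstract describes the Dynamic Knowledge Pathfinder as a module that learns across the four tasks within one unified architecture. Below is a minimal, hypothetical PyTorch sketch of that general idea, a shared bank of layers with a learned per-task soft routing over them. It is not the official IntelliCare implementation, and all module names, dimensions, and the gating design are illustrative assumptions; please refer to the released code for the actual model.

# Minimal, hypothetical sketch of a multi-task model with learned per-task
# routing over shared layers. This is NOT the official IntelliCare code;
# all names, dimensions, and the gating design are illustrative assumptions.
import torch
import torch.nn as nn

class PathfinderGate(nn.Module):
    # Learns, for each task, a soft weighting over a bank of shared layers.
    def __init__(self, num_tasks, num_experts):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_tasks, num_experts))

    def forward(self, task_id):
        return torch.softmax(self.logits[task_id], dim=-1)  # shape: (num_experts,)

class UnifiedMultiTaskModel(nn.Module):
    def __init__(self, feat_dim=256, num_experts=4, task_out_dims=(50, 30, 2, 3)):
        super().__init__()
        # Shared "knowledge" layers that all tasks can draw on.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        self.gate = PathfinderGate(len(task_out_dims), num_experts)
        # One output head per task (hypothetical output sizes).
        self.heads = nn.ModuleList(nn.Linear(feat_dim, d) for d in task_out_dims)

    def forward(self, feats, task_id):
        weights = self.gate(task_id)
        # Mix the shared layers according to the task's learned route.
        mixed = sum(w * expert(feats) for w, expert in zip(weights, self.experts))
        return self.heads[task_id](mixed)

model = UnifiedMultiTaskModel()
feats = torch.randn(8, 256)              # stand-in for backbone video/audio features
action_logits = model(feats, task_id=0)  # e.g. task 0 = action localization head
print(action_logits.shape)               # torch.Size([8, 50])

In this sketch, every task shares the same expert layers but mixes them with its own learned weights, so knowledge common across tasks can be reused while the per-task routing keeps each output head specialized.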

Acknowledgements and Contributors

  • Foo Lin Geng
  • Goh Jet Wei
  • Hui Xiaofei
  • Li Rui
  • Lu Mingqi
  • Peng Duo
  • Qu Haoxuan
  • Shu Xiu
  • Wang Pengfei
  • Umali Mike Guil Anonuevo
  • Xu Li
  • Zhang Wenxiao