Welcome to the repository for the Cross-platform News Event Detection (CNED) Dataset. This dataset is a comprehensive collection designed for the advancement of cross-platform news event detection algorithms within the field of computational journalism and social media analysis.
The CNED dataset consists of 37,711 multimodal samples from three platforms (24,607 from Twitter, 9,191 from Flickr, and 3,913 from online news). Each sample is paired with corresponding images, and the collection is meticulously annotated with 40 real-world events. The samples span 109 languages, with the aim of keeping the dataset as objective as possible. CNED covers a wide range of event themes, including political events (elections, referendums, political crises, protests), sports events (the Olympics, soccer matches), and natural disasters (hurricanes, floods). Because it includes events with subtle differences (e.g., “2016 Summer Olympics” vs. “2018 Winter Olympics”), CNED is a challenging benchmark for detection algorithms and reflects the complexity encountered in real-world scenarios.
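As a quick sanity check on the composition described above, the following minimal sketch tallies the per-platform counts and reports each platform's share of the total. The numbers come directly from this description. The names `PLATFORM_COUNTS` and `platform_shares` are illustrative only and are not part of any official CNED tooling.

```python
# Per-platform sample counts as stated in the dataset description.
# The dictionary name and keys are illustrative, not an official schema.
PLATFORM_COUNTS = {
    "Twitter": 24_607,
    "Flickr": 9_191,
    "Online News": 3_913,
}


def platform_shares(counts):
    """Return the total sample count and each platform's fraction of it."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


total, shares = platform_shares(PLATFORM_COUNTS)
print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The split is uneven across platforms, so it may be worth reporting results per platform as well as overall when evaluating a detection algorithm.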
For any inquiries about the dataset, including issues with the dataset links or further questions about the data, please contact Mr. Zehang Lin at cszlin@comp.polyu.edu.hk.
Contributions to the dataset and the related benchmarks are welcome. If you have suggestions or updates, please contact the repository administrators.
We appreciate all contributors and collaborators who have made this dataset possible and look forward to seeing the innovations it will enable in the realm of cross-platform news event detection.
Please note that the CNED dataset is provided for research purposes only. Commercial use is strictly prohibited without prior consent.