TIB dataset for abstractive summarization of long multimodal videoconference records

TIB is an English dataset for abstractive summarization of long multimodal presentations, introduced in the paper "TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records", published at CBMI 2023.

It is a collection of 9,103 videoconference records extracted from the archive of the German National Library of Science and Technology (TIB). Each record comes with its metadata, an abstract, and automatically processed transcripts and key frames.

It is hosted on the Hugging Face dataset hub as the repository gigant/tib and can be loaded with the datasets library.
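As a minimal sketch of what loading looks like, the snippet below wraps the standard `load_dataset` call from the datasets library. It assumes the library is installed (`pip install datasets`) and that network access is available. The `load_tib` helper and the `REPO_ID` constant are hypothetical names introduced here for illustration; only the repository name gigant/tib comes from this README. Split and column names are not listed here, so the sketch prints the loaded object to inspect them rather than assuming any.

```python
# Minimal sketch: load the TIB dataset from the Hugging Face Hub.
# Assumes `pip install datasets` and network access.

REPO_ID = "gigant/tib"  # repository name given in this README


def load_tib(split=None):
    """Load the TIB dataset (hypothetical helper, not part of any library).

    With split=None, load_dataset returns a DatasetDict holding every
    available split; with a split name it returns a single Dataset.
    """
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


# Example usage (downloads data on first call, so it is left commented out):
# tib = load_tib()
# print(tib)  # shows the available splits, their columns, and row counts
```

Printing the returned object is a quick way to discover which splits and fields the repository exposes before writing any processing code against them.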

Relevant links:

The dataset
The conference paper
A blog post introducing the dataset
The TIB AV-Portal
