ESSumm: Extractive Speech Summarization from Untranscribed Meeting

The code base for ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Jun Wang

NEWS

[22-06-15] 🔥 ESSumm is accepted at INTERSPEECH 2022.

Abstract

In this paper, we propose ESSumm, a novel architecture for direct extractive speech-to-speech summarization. It is an unsupervised model that does not depend on intermediate transcribed text. Unlike previous methods that rely on a text representation, we aim to generate a summary directly from speech, without transcription. First, a set of smaller speech segments is extracted based on the acoustic features of the speech signal. For each candidate speech segment, we design a distance-based summarization confidence score that is measured on its latent speech representation. Specifically, we leverage an off-the-shelf self-supervised convolutional neural network to extract deep speech features from raw audio. Our approach automatically predicts the optimal sequence of speech segments that captures the key information within a target summary length. Extensive results on two well-known meeting datasets (the AMI and ICSI corpora) show that our direct speech-based method improves summarization quality on untranscribed data. We also observe that our unsupervised speech-based method performs on par with recent transcript-based summarization approaches, which require an extra speech recognition step.
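
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the released implementation: the abstract does not spell out the confidence score, so the score below (negative Euclidean distance to the mean feature vector) is an assumption, as are the function names, the greedy selection rule, and the toy feature vectors, which stand in for pooled wav2vec 2.0 features.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_segments(
    features: Sequence[Sequence[float]],
    durations: Sequence[float],
    target_seconds: float,
) -> List[int]:
    """Return indices of the chosen segments in chronological order.

    Hypothetical scoring: a segment scores higher the closer its feature
    vector lies to the centroid of all segments. This is an assumed
    stand-in for the paper's distance-based confidence score.
    """
    center = centroid(features)
    scores = [-math.dist(vec, center) for vec in features]

    # Visit segments from highest to lowest score and keep each one
    # that still fits into the target summary length.
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    chosen, total = [], 0.0
    for i in ranked:
        if total + durations[i] <= target_seconds:
            chosen.append(i)
            total += durations[i]

    # Restore the original order so the summary plays back chronologically.
    return sorted(chosen)


# Toy data: one 2-D feature vector and one duration (seconds) per segment.
features = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.8], [5.0, 5.0]]
durations = [4.0, 3.0, 5.0, 2.0]
print(select_segments(features, durations, target_seconds=8.0))  # [1, 2]
```

In this toy run, segments 1 and 2 lie closest to the centroid and together fill the 8-second budget, so they form the summary. In a real pipeline each feature vector would come from a speech encoder rather than being written by hand.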

Features

The first automatic speech summarization system with Wav2vec 2.0.
Supports the major meeting datasets: AMI and ICSI.

Installation

See installation instructions.

Getting Started

See Getting Started with ESSumm.

Citation

Please cite our work if you find it useful:

@inproceedings{wang22n_interspeech,
  author={Jun Wang},
  title={{ESSumm: Extractive Speech Summarization from Untranscribed Meeting}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3243--3247},
  doi={10.21437/Interspeech.2022-945}
}

License

This project is released under the Apache 2.0 license.

Acknowledgement

The source code of ESSumm is based on CoreRank and wav2vec 2.0.
