

Speech Signal Improvement Challenge – ICASSP 2024

The Speech Signal Improvement Grand Challenge at ICASSP 2024 is intended to stimulate research on improving speech signal quality in communication systems. Speech signal quality is measured with SIG in ITU-T P.804 and remains a top issue in audio communication and conferencing systems.

This challenge benchmarks the performance of real-time speech enhancement models on a real (not simulated) test set. The audio scenario is the send signal in telecommunication; it does not include echo impairments. Participants run their speech enhancement model on the test set and submit the resulting clips for evaluation.

For more details about the challenge, please visit the challenge website. The paper will be released soon.

Training data

The datasets are provided under the original terms under which Microsoft received them. For training, we suggest that participants use the AEC-Challenge and DNS-Challenge data, which are listed in the Dataset licenses section. Participants may also use any other publicly available data for training.

Data synthesizer

We released a demo data synthesizer that generates distorted and noisy samples from clean audio files. We strongly encourage participants to use and improve this synthesizer, but they are free to use any other method they prefer.
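To illustrate the general idea of synthesizing noisy training samples, here is a minimal sketch that mixes a noise signal into clean speech at a chosen signal-to-noise ratio. It is not the repository's synthesizer: the function name `mix_at_snr` and its parameters are hypothetical, and it covers additive noise only, not the other distortions the synthesizer may apply.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so that the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not part of the challenge code.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both contain non-zero samples")

    # Choose the gain so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for real audio: a 220 Hz tone as "speech", white noise as "noise".
    clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    noise = rng.standard_normal(4000)
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    print(noisy.shape)
```

In practice the clean and noise arrays would be loaded from audio files at a common sampling rate, and the SNR would typically be drawn at random per sample to cover a range of conditions.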

Global processing and latency checker

We released a Python script that verifies whether your model complies with the latency requirements specified by the challenge. We strongly recommend that participants check their architecture with this script. For generative models, this check can be ignored.
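One common way to probe a model for look-ahead or global processing is to change only the future part of the input and see how early the output changes. The sketch below shows that idea under stated assumptions; it is not the released script. The name `estimate_lookahead` is hypothetical, and `process` is assumed to be a deterministic function that maps a 1-D array to an output array of the same length.

```python
import numpy as np


def estimate_lookahead(process, num_samples=16000, change_at=8000, seed=0):
    """Estimate how many future samples `process` depends on.

    Two inputs are identical before `change_at` and different from there on.
    If the outputs already differ at index k < change_at, the model used at
    least (change_at - k) samples of look-ahead. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)
    x_mod = x.copy()
    x_mod[change_at:] = rng.standard_normal(num_samples - change_at)

    y = np.asarray(process(x))
    y_mod = np.asarray(process(x_mod))

    # Indices at which the two outputs disagree.
    differs = np.flatnonzero(~np.isclose(y, y_mod))
    if differs.size == 0:
        return 0
    return max(0, change_at - int(differs[0]))


def causal_moving_average(x, width=4):
    """Example causal system: each output uses only current and past samples."""
    return np.convolve(x, np.ones(width) / width)[: len(x)]


if __name__ == "__main__":
    print(estimate_lookahead(causal_moving_average))
```

A result close to `change_at` suggests global processing (for example, statistics computed over the whole file), while a small positive value is the look-ahead in samples. Dividing by the sampling rate converts it to seconds, which can then be compared with the limit stated in the challenge rules.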

Evaluation metrics

Our evaluation will be based on a subjective listening test. We suggest that participants also evaluate their models with DNSMOS P.835, whose SIG metric is directly correlated with signal quality. We have also developed the SigMOS estimator, which estimates the P.804 audio quality dimensions. This model was trained on subjectively annotated P.804 data to mimic human perception of audio quality. Participants may nevertheless use any metrics to evaluate their models.

We provide an example subjectively annotated with MOS 5 for the LOUDNESS dimension. This example might help participants to tune their algorithm in terms of loudness.
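As a rough illustration of how a reference clip could be used for loudness tuning, the sketch below measures RMS level in dBFS and scales a signal to match a reference. RMS level is only a crude proxy for perceived loudness and is an assumption here, not the measure used in the challenge; the function names are hypothetical.

```python
import numpy as np


def rms_dbfs(x):
    """RMS level in dB relative to full scale (1.0) for a float signal."""
    x = np.asarray(x, dtype=np.float64)
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0:
        raise ValueError("signal is silent")
    return 20 * np.log10(rms)


def match_rms(x, reference):
    """Scale x so that its RMS level equals that of the reference signal.

    The result may exceed the [-1, 1] range; check the peak before saving.
    """
    gain_db = rms_dbfs(reference) - rms_dbfs(x)
    return np.asarray(x, dtype=np.float64) * 10 ** (gain_db / 20)


if __name__ == "__main__":
    t = np.arange(16000) / 16000
    reference = 0.5 * np.sin(2 * np.pi * 200 * t)  # stand-in for the MOS 5 example
    quiet = 0.05 * np.sin(2 * np.pi * 300 * t)     # stand-in for a model output
    matched = match_rms(quiet, reference)
    print(rms_dbfs(matched) - rms_dbfs(reference))
```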

Datasets

  • The test set is available in the test_data directory. We also release the transcripts for the test set so that participants can compute Word Error Rate (WER) on it.
  • The blind set is available in the blind_data directory.
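For reference, WER is the word-level edit distance between a reference transcript and a recognized hypothesis, divided by the number of reference words. The sketch below is a minimal implementation with no external dependencies. The function name is hypothetical, text normalization is limited to lowercasing and whitespace splitting (an assumption), and the hypothesis is assumed to come from a speech recognizer of the participant's choice run on the enhanced clip.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word ("the") is missing from the hypothesis: 1 error over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.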

Citation

If you use this dataset in a publication, please cite the following paper:

@inproceedings{ristea2024icassp,
  title={ICASSP 2024 Speech Signal Improvement Challenge},
  author={Ristea, Nicolae Catalin and Saabas, Ando and Cutler, Ross and Naderi, Babak and Braun, Sebastian and Branets, Solomiya},
  booktitle={ICASSP},
  year={2024}
}

The previous challenge is described in:

@article{cutler2024icassp,
  title={ICASSP 2023 speech signal improvement challenge},
  author={Cutler, Ross and Saabas, Ando and Naderi, Babak and Ristea, Nicolae-C{\u{a}}t{\u{a}}lin and Braun, Sebastian and Branets, Solomiya},
  journal={IEEE Open Journal of Signal Processing},
  year={2024},
  publisher={IEEE}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.