This is the offical implemenet of the paper KNN-CTC: ENHANCING ASR VIA RETRIEVAL OF CTC PSEUDO LABELS
To get started, follow these steps:
- Use the provided recipe from Wenet to download and prepare AISHELL-1 in the
.list
format. Alternatively, you can manually download the AISHELL-1 dataset from OpenSLR. The format should adhere to the following structure:
{"key": "BAC009S0764W0121", "wav": "/mnt/sda/jiaming_space/datasets/aishell/data_aishell/wav/test/S0764/BAC009S0764W0121.wav", "txt": "甚至出现交易几乎停滞的情况"}
{"key": "BAC009S0764W0122", "wav": "/mnt/sda/jiaming_space/datasets/aishell/data_aishell/wav/test/S0764/BAC009S0764W0122.wav", "txt": "一二线城市虽然也处于调整中"}
{"key": "BAC009S0764W0123", "wav": "/mnt/sda/jiaming_space/datasets/aishell/data_aishell/wav/test/S0764/BAC009S0764W0123.wav", "txt": "但因为聚集了过多公共资源"}
- Download the AISHELL-1 checkpoint from wenet_pretrain_models
Once the data is prepared, you can run the in-domain KNN-CTC on AISHELL-1 using the following script:
# for knn-CTC (pruned)
bash knn_run_aishell-1.sh --stage 1 --stop-stage 2 --lmbda 0.5 --use_null_mask True --decode_skip_blank True --dstore_size 1798000
# for knn-CTC (full)
bash knn_run_aishell-1.sh --stage 1 --stop-stage 2 --lmbda 0.4 --use_null_mask Fasle --decode_skip_blank False --dstore_size 13000000
This project relies on the following open-source frameworks:
Special thanks to the developers and contributors of Wenet and knn-transformers for their incredible work and dedication.
@INPROCEEDINGS{10447075,
author={Zhou, Jiaming and Zhao, Shiwan and Liu, Yaqi and Zeng, Wenjia and Chen, Yong and Qin, Yong},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels},
year={2024},
volume={},
number={},
pages={11006-11010},
keywords={Codes;Signal processing;Natural language processing;Acoustics;Speech processing;Task analysis;Automatic speech recognition;speech recognition;CTC;retrieval-augmented method;datastore construction},
doi={10.1109/ICASSP48485.2024.10447075}}