
Single-blind supplementary materials for NeurIPS 2023 submission


Hypo2Trans

  • Hypotheses-to-transcription (H2T) training code from our NeurIPS 2023 and IEEE ASRU 2023 papers

  • Fine-tuning LLaMA-7b for LLM-based ASR error correction

git clone https://github.com/Hypotheses-Paradise/Hypo2Trans.git

cd Hypo2Trans/H2T-LoRA

python finetune.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path './data/train_wsj.json' \
    --output_dir './wsj' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --learning_rate 2e-4 \
    --micro_batch_size=64 \
    --batch_size=256 \
    --lora_r=16
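
For readers who want to see what those flags amount to, here is a minimal sketch (not the repository's own code) of the LoRA setup they imply, assuming the Hugging Face transformers and peft packages:

import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Load the frozen base model in half precision to reduce GPU memory.
base = LlamaForCausalLM.from_pretrained(
    'yahma/llama-7b-hf', torch_dtype=torch.float16
)

# Mirror the command-line flags: rank-16 adapters on the four
# attention projections (--lora_r=16, --lora_target_modules).
config = LoraConfig(
    r=16,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable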
  • Inference with LLaMA-7b + the trained LoRA adapter
python inference.py \
    --ckpt_path './wsj' \
    --test_data_path './data/test_wsj.json'
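
Under the same assumptions (transformers + peft, checkpoint paths as above), the sketch below shows how a trained adapter is typically attached to the base model for generation; the prompt format is left abstract, since it depends on the H2T data files:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained('yahma/llama-7b-hf')
base = LlamaForCausalLM.from_pretrained(
    'yahma/llama-7b-hf', torch_dtype=torch.float16
)
# Attach the LoRA weights saved during fine-tuning.
model = PeftModel.from_pretrained(base, './wsj')
model.eval()

prompt = '...'  # an H2T prompt built from the n-best ASR hypotheses
inputs = tokenizer(prompt, return_tensors='pt')
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))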

The table below reports the WER (%) of H2T-ft and H2T-LoRA in the fine-tuning setting; each cell shows the WER followed, in parentheses, by the relative change from the baseline. $o_{nb}$ and $o_{cp}$ denote the n-best oracle and the compositional oracle, respectively (a sketch of how these bounds can be computed follows the table):

| Test Set | Baseline | LM$_{rank}$ | T5-ft | LLaMA-ft | T5-LoRA | LLaMA-LoRA | $o_{nb}$ | $o_{cp}$ |
|---|---|---|---|---|---|---|---|---|
| WSJ | 4.5 | 4.3 (-4.4%) | 4.0 (-11.1%) | 3.8 (-15.6%) | 2.7 (-40.0%) | 2.2 (-51.1%) | 4.1 | 1.2 |
| ATIS | 8.3 | 6.9 (-16.9%) | 2.7 (-67.5%) | 3.4 (-59.0%) | 1.7 (-79.5%) | 1.9 (-77.1%) | 5.2 | 1.1 |
| CHiME-4 | 11.1 | 11.0 (-0.9%) | 7.9 (-28.8%) | 8.2 (-26.1%) | 7.0 (-36.9%) | 6.6 (-40.5%) | 9.1 | 2.8 |
| Tedlium-3 | 8.5 | 8.0 (-5.8%) | 6.6 (-22.4%) | 5.2 (-38.8%) | 7.4 (-12.9%) | 4.6 (-45.9%) | 3.0 | 0.7 |
| CV-accent | 14.8 | 16.0 (+8.1%) | 12.9 (-12.8%) | 15.5 (+4.7%) | 11.0 (-25.7%) | 11.0 (-25.7%) | 11.4 | 7.9 |
| SwitchBoard | 15.7 | 15.4 (-1.9%) | 15.9 (+1.3%) | 18.4 (+17.1%) | 14.9 (-5.1%) | 14.1 (-10.2%) | 12.6 | 4.2 |
| LRS2 | 10.1 | 9.6 (-5.0%) | 9.5 (-5.9%) | 10.2 (+1.0%) | 6.6 (-34.7%) | 8.8 (-12.9%) | 6.9 | 2.6 |
| CORAAL | 21.4 | 21.4 (0.0%) | 23.1 (+7.9%) | 22.9 (+7.0%) | 20.9 (-2.3%) | 19.2 (-10.3%) | 21.8 | 10.7 |
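
As a rough illustration of the two oracle bounds, here is a minimal sketch (not from the repository; jiwer is an assumed dependency). The n-best oracle picks the best single hypothesis; the second function is only a crude word-pool proxy for the compositional oracle, which the paper defines more carefully:

import jiwer

def nbest_oracle_wer(reference, hypotheses):
    # o_nb: WER of the single best hypothesis in the n-best list.
    return min(jiwer.wer(reference, hyp) for hyp in hypotheses)

def compositional_oracle_proxy(reference, hypotheses):
    # Crude proxy for o_cp: a reference word counts as recoverable
    # if it appears anywhere in the n-best list.
    ref_words = reference.split()
    pool = {w for hyp in hypotheses for w in hyp.split()}
    return sum(w not in pool for w in ref_words) / len(ref_words)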

References

  • Please consider citing our NeurIPS 2023 and ASRU 2023 papers. Thank you!
@inproceedings{yang2023generative,
  title={Generative speech recognition error correction with large language models and task-activating prompting},
  author={Yang, Chao-Han Huck and Gu, Yile and Liu, Yi-Chieh and Ghosh, Shalini and Bulyko, Ivan and Stolcke, Andreas},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

@inproceedings{chen2023hyporadise,
  title={HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models},
  author={Chen, Chen and Hu, Yuchen and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}