- The hypotheses-to-transcription (H2T) training recipes from our NeurIPS 2023 and IEEE ASRU 2023 papers.
- Fine-tuning LLaMA-7b for LLM-based ASR error correction:
```bash
git clone https://github.com/Hypotheses-Paradise/Hypo2Trans.git
cd Hypo2Trans/H2T-LoRA

python finetune.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path './data/train_wsj.json' \
    --output_dir './wsj' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --learning_rate 2e-4 \
    --micro_batch_size=64 \
    --batch_size=256 \
    --lora_r=16
```
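Each H2T training record pairs an N-best ASR hypotheses list with its ground-truth transcription. The sketch below shows roughly what one such record could look like; the field names and the 5-best example sentences are illustrative assumptions rather than the repo's confirmed schema, so check the files under `./data` for the actual format.

```python
# Illustrative sketch of an H2T training record; the field names ("input", "output")
# and the example sentences are assumptions, not the repo's confirmed schema.
import json

record = {
    # Hypothetical 5-best list produced by the ASR decoder for one utterance.
    "input": [
        "he expects the fed to keep interest rates steady",
        "he expects the fed to keep interest rate steady",
        "he expect the fed to keep interest rates steady",
        "he expects the fed to keep interest rates study",
        "he expects the fed to give interest rates steady",
    ],
    # Ground-truth transcription the model is trained to generate.
    "output": "he expects the fed to keep interest rates steady",
}

# A training file would then be a list of such records.
with open("example_record.json", "w") as f:
    json.dump([record], f, indent=2)
```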
- Inference with LLaMA-7b and the trained LoRA adapter:
```bash
python inference.py \
    --ckpt_path './wsj' \
    --test_data_path './data/test_wsj.json'
```
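To evaluate the corrected transcriptions, WER can be computed against the references, for example with the `jiwer` package. This is a hedged sketch rather than part of the repo: the output file name and field names below are placeholders for whatever `inference.py` actually writes.

```python
# Hedged sketch: corpus-level WER with jiwer (pip install jiwer).
# "results_wsj.json", "reference", and "prediction" are placeholder names.
import json
import jiwer

with open("results_wsj.json") as f:
    results = json.load(f)

references = [r["reference"] for r in results]
predictions = [r["prediction"] for r in results]

print(f"WER: {100 * jiwer.wer(references, predictions):.2f}%")
```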
The table below presents the WER (%) results of H2T-ft and H2T-LoRA in the fine-tuning setting, where the percentage in parentheses is the relative WER change with respect to the ASR baseline; the last two columns report the N-best oracle (o_nb) and the compositional oracle (o_cp).

| Test Set | Baseline | LM | T5-ft | LLaMA-ft | T5-LoRA | LLaMA-LoRA | o_nb | o_cp |
|---|---|---|---|---|---|---|---|---|
| WSJ | 4.5 | 4.3 (-4.4%) | 4.0 (-11.1%) | 3.8 (-15.6%) | 2.7 (-40.0%) | 2.2 (-51.1%) | 4.1 | 1.2 |
| ATIS | 8.3 | 6.9 (-16.9%) | 2.7 (-67.5%) | 3.4 (-59.0%) | 1.7 (-79.5%) | 1.9 (-77.1%) | 5.2 | 1.1 |
| CHiME-4 | 11.1 | 11.0 (-0.9%) | 7.9 (-28.8%) | 8.2 (-26.1%) | 7.0 (-36.9%) | 6.6 (-40.5%) | 9.1 | 2.8 |
| Tedlium-3 | 8.5 | 8.0 (-5.8%) | 6.6 (-22.4%) | 5.2 (-38.8%) | 7.4 (-12.9%) | 4.6 (-45.9%) | 3.0 | 0.7 |
| CV-accent | 14.8 | 16.0 (+8.1%) | 12.9 (-12.8%) | 15.5 (+4.7%) | 11.0 (-25.7%) | 11.0 (-25.7%) | 11.4 | 7.9 |
| SwitchBoard | 15.7 | 15.4 (-1.9%) | 15.9 (+1.3%) | 18.4 (+17.1%) | 14.9 (-5.1%) | 14.1 (-10.2%) | 12.6 | 4.2 |
| LRS2 | 10.1 | 9.6 (-5.0%) | 9.5 (-5.9%) | 10.2 (+1.0%) | 6.6 (-34.7%) | 8.8 (-12.9%) | 6.9 | 2.6 |
| CORAAL | 21.4 | 21.4 (-0.0%) | 23.1 (+7.9%) | 22.9 (+7.0%) | 20.9 (-2.3%) | 19.2 (-10.3%) | 21.8 | 10.7 |
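As a quick check of the notation, the relative change is computed against the baseline WER of the same test set; for WSJ, LLaMA-LoRA reduces 4.5 to 2.2, i.e. a 51.1% relative WER reduction:

```python
# Relative WER reduction for the WSJ / LLaMA-LoRA cell in the table above.
baseline, corrected = 4.5, 2.2
print(f"{(baseline - corrected) / baseline * 100:.1f}% relative reduction")  # 51.1%
```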
- Please consider citing our NeurIPS 2023 and ASRU 2023 works, thank you.
```bibtex
@inproceedings{yang2023generative,
  title={Generative speech recognition error correction with large language models and task-activating prompting},
  author={Yang, Chao-Han Huck and Gu, Yile and Liu, Yi-Chieh and Ghosh, Shalini and Bulyko, Ivan and Stolcke, Andreas},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

@inproceedings{chen2023hyporadise,
  title={HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models},
  author={Chen, Chen and Hu, Yuchen and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}
```