A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus.
- Built-in Speech Enhancement
- Word-level Alignment
pip install kfa
Note: The input audio (audio.wav) must have a sample rate of 16 kHz. Use ffmpeg or any other tool to resample the audio before processing.
ffmpeg -i audio_orig.wav -ac 1 -ar 16000 audio.wav
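To confirm that a resampled file meets the requirement above before running the aligner, a minimal sketch using only Python's standard `wave` module might look like the following. The helper name `check_wav_format` is hypothetical and not part of kfa. The mono check is an assumption inferred from the `-ac 1` flag in the ffmpeg command, and the sketch only handles uncompressed PCM WAV files.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper, not part of kfa. The mono default is an
    assumption based on the `-ac 1` flag of the ffmpeg command.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels
```

Calling `check_wav_format("audio.wav")` returns a tuple whose first element is `True` only when the file is 16 kHz mono; otherwise the remaining elements show the actual values so the file can be resampled.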
kfa -a audio.wav -t text.txt -o alignments.jsonl
# Output in Whisper-style JSON format
kfa -a audio.wav -t text.txt --format whisper -o alignments.json
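The default output file from the first command can be read back in Python. The sketch below assumes, based only on the `.jsonl` extension, that the file holds one JSON object per line. The record fields are not documented here, so the helper (a hypothetical `read_jsonl`, not part of kfa) returns each record as-is without interpreting it.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON record per non-empty line.

    Hypothetical helper, not part of kfa. Assumes the file is in
    JSON Lines layout (one JSON value per line).
    """
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `for record in read_jsonl("alignments.jsonl"): print(record)` prints every alignment record in the file.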
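from kfa import align, create_session
import librosa

# Read the transcript to align against the audio
with open("text.txt", encoding="utf-8") as infile:
    text = infile.read()

# Load the audio as 16 kHz mono, the sample rate the aligner expects
y, sr = librosa.load("audio.wav", sr=16000, mono=True)
session = create_session()

# Print each alignment produced for the transcript
for alignment in align(y, sr, text, session=session):
    print(alignment)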
- MMS: Scaling Speech Technology to 1000+ languages
- CTC FORCED ALIGNMENT API TUTORIAL
- Phonetisaurus
- Fine-Tune Wav2Vec2 for English ASR with 🤗 Transformers
- Thai Wav2vec2 model to ONNX model
Apache-2.0