SyllableLM

Official Public Code for "SyllableLM: Learning Coarse Semantic Units for Speech Language Models"

Paper: https://arxiv.org/abs/2410.04029

In submission to ICLR 2025

Setup:

conda create -n syllablelm python=3.9
conda activate syllablelm

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu115

pip install omegaconf
pip install timm
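
After installing, it may help to confirm that the environment can see every package before running any extraction. The following is a minimal sketch, not part of the repository: the helper name missing_packages and the REQUIRED list are illustrative assumptions based only on the pip commands above.

```python
import importlib.util

# Import names of the packages installed by the setup commands above.
# This list is an assumption drawn from those commands, not from the repo itself.
REQUIRED = ["torch", "torchvision", "torchaudio", "omegaconf", "timm"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Because find_spec only locates a package without importing it, this check is quick and does not initialize CUDA.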

SylBoost:

Checkpoints

SylBoost | Model | KMeans | Agglomerative Clustering
8.33Hz | Model | KMeans | Agglom
6.25Hz | Model | KMeans | Agglom
5.0Hz | Model | KMeans | Agglom

Usage

SylBoost inference and efficient unit-extraction code is in extract_units.py.

People have had trouble setting up Data2Vec2, so I copied it and stripped it down. No Fairseq required!

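# Assumes SylBoostFeatureReader is defined in extract_units.py
from extract_units import SylBoostFeatureReader

sylboost_reader = SylBoostFeatureReader(
    '/path/to/model.pt',
    '/path/to/kmeans.npy',
    '/path/to/agglom.npy',
    '8.33Hz',  # or '6.25Hz', '5.0Hz'
)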
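
To get a feel for what the three unit rates mean in practice, the sketch below estimates how many units a clip would produce at each nominal rate. It is plain arithmetic and does not call the repository code: the names UNIT_RATES_HZ and approx_num_units are hypothetical. The rates are treated as averages, so the actual number of units for a given utterance may differ.

```python
# Nominal unit rates of the released SylBoost checkpoints, in units per second.
# Treating these as exact averages is an assumption made for illustration.
UNIT_RATES_HZ = {"8.33Hz": 8.33, "6.25Hz": 6.25, "5.0Hz": 5.0}


def approx_num_units(duration_seconds, rate_name):
    """Estimate the number of units for a clip of the given duration."""
    return round(duration_seconds * UNIT_RATES_HZ[rate_name])


if __name__ == "__main__":
    # Print the estimated sequence length of a 10-second clip at each rate.
    for name in UNIT_RATES_HZ:
        print(name, approx_num_units(10.0, name))
```

Lower rates give shorter sequences for the same audio, which is the main reason to pick a coarser checkpoint when language model context length is the bottleneck.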

SyllableLM:

Checkpoints

SyllableLM | Model
6.25Hz Base | Model
6.25Hz Large | Model
6.25Hz Interleaved Vocoder LM | Model

Usage

Todo: migrate the code over and sort out the TWIST dependency.

Resynthesis:

Todo

Continuation Pipeline:

Todo

LossPred:

This will be provided as-is.

SylBoost training:

This will be provided as-is.

SyllableLM training:

This is standard language model training and will be provided as-is.