SyllableLM

Official Public Code for "SyllableLM: Learning Coarse Semantic Units for Speech Language Models"

In submission to ICLR 2025

Setup:

conda create -n syllablelm python=3.9
conda activate syllablelm

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu115

pip install omegaconf
pip install timm
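To confirm the environment is complete before downloading checkpoints, you can run a quick check like the one below. This is a minimal sketch and is not part of the repository; `check_env` is a hypothetical helper name, and the package list simply mirrors the pip commands above.

```python
import importlib.util

# Import names of the packages installed by the setup commands above.
REQUIRED = ("torch", "torchvision", "torchaudio", "omegaconf", "timm")


def check_env(packages=REQUIRED):
    """Return {package: True/False} for whether each package can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_env()
    for name, ok in status.items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
    if status["torch"]:
        import torch

        # False means a CPU-only build, or no visible GPU/driver.
        print("CUDA available:", torch.cuda.is_available())
```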

SylBoost:

Checkpoints

SylBoost rate	Model	KMeans	Agglomerative Clustering
8.33Hz	Model	KMeans	Agglom
6.25Hz	Model	KMeans	Agglom
5.0Hz	Model	KMeans	Agglom
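The rates in the table are average units per second, which makes it easy to estimate sequence lengths. The sketch below is illustrative only: the helper names are hypothetical, and the 50 Hz backbone frame rate (20 ms frames at 16 kHz, as in the wav2vec 2.0 / Data2Vec2 feature encoder) is an assumption used to show how many backbone frames, on average, each unit covers.

```python
# Hypothetical helpers (not part of the repository) for reasoning about SylBoost unit rates.
BACKBONE_HZ = 50.0  # assumption: 20 ms backbone feature frames at 16 kHz


def rate_hz(label: str) -> float:
    """Parse a rate label such as '8.33Hz' into a float (8.33)."""
    if not label.endswith("Hz"):
        raise ValueError(f"expected a label like '6.25Hz', got {label!r}")
    return float(label[:-2])


def expected_units(label: str, seconds: float) -> float:
    """Average number of units for an utterance of the given duration."""
    return rate_hz(label) * seconds


def frames_per_unit(label: str, backbone_hz: float = BACKBONE_HZ) -> float:
    """Average number of backbone frames covered by one unit."""
    return backbone_hz / rate_hz(label)


if __name__ == "__main__":
    for label in ("8.33Hz", "6.25Hz", "5.0Hz"):
        print(label, expected_units(label, 10.0), round(frames_per_unit(label), 2))
```

For example, a 10-second utterance at 6.25Hz yields about 62.5 units on average, roughly one unit per 8 backbone frames under the 50 Hz assumption.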

Usage

SylBoost inference and efficient unit-extraction code is in extract_units.py.

Setting up Data2Vec2 has caused trouble for some users, so a stripped-down copy is included in this repository. No Fairseq required!

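# Run from the repository root so that extract_units.py is importable.
from extract_units import SylBoostFeatureReader

sylboost_reader = SylBoostFeatureReader(
    '/path/to/model.pt',    # SylBoost model checkpoint
    '/path/to/kmeans.npy',  # KMeans checkpoint for the same rate
    '/path/to/agglom.npy',  # Agglomerative clustering checkpoint for the same rate
    '8.33Hz',  # '6.25Hz', '5.0Hz'
)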
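Because the model, KMeans, and agglomerative clustering files must all correspond to the same rate and are passed positionally, a small helper can assemble the arguments in the order shown above and fail early on a mismatch. This is a sketch only: `sylboost_args` and the directory layout `<ckpt_dir>/<rate>/{model.pt, kmeans.npy, agglom.npy}` are hypothetical, not something the repository prescribes.

```python
from pathlib import Path

VALID_RATES = ("8.33Hz", "6.25Hz", "5.0Hz")


def sylboost_args(ckpt_dir, rate):
    """Return (model path, KMeans path, agglom path, rate) for SylBoostFeatureReader.

    Hypothetical layout: <ckpt_dir>/<rate>/{model.pt, kmeans.npy, agglom.npy}.
    """
    if rate not in VALID_RATES:
        raise ValueError(f"rate must be one of {VALID_RATES}, got {rate!r}")
    rate_dir = Path(ckpt_dir) / rate
    paths = [rate_dir / "model.pt", rate_dir / "kmeans.npy", rate_dir / "agglom.npy"]
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing checkpoint files: {missing}")
    return (*[str(p) for p in paths], rate)
```

Usage, under the same assumed layout: `sylboost_reader = SylBoostFeatureReader(*sylboost_args('checkpoints', '6.25Hz'))`.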

SyllableLM:

Checkpoints

SyllableLM variant	Model
6.25Hz Base	Model
6.25Hz Large	Model
6.25Hz Interleaved Vocoder LM	Model

Usage

Todo: migrate the code over and set up the TWIST dependency.

Resynthesis:

Todo

Continuation Pipeline:

Todo

LossPred:

This will be provided as-is.

SylBoost training:

This will be provided as-is.

SyllableLM training:

This is standard language model training and will be provided as-is.
