This repository contains the source code for DM-Codec.
As illustrated in Figure 1, DM-Codec introduces a speech tokenization approach that uses discrete acoustic, semantic, and contextual tokens, integrating these multimodal representations to learn comprehensive and robust speech representations.
The DM-Codec framework is detailed in Figure 2. It consists of an encoder that extracts latent representations from the input speech signal; these latent vectors are then quantized with a Residual Vector Quantizer (RVQ). We design two distinct distillation approaches: (i) distillation from a language model (LM), and (ii) combined distillation from both a language model and a speech model (SM). These approaches integrate acoustic, semantic, and contextual representations into the quantized vectors, improving speech representation for downstream tasks.
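The sketch below illustrates this pipeline in PyTorch for intuition only: module names, dimensions, teacher models, and the loss form are our assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualVectorQuantizer(nn.Module):
    """Simplified RVQ: each stage quantizes the residual left by the previous stage."""

    def __init__(self, dim=256, codebook_size=1024, num_quantizers=8):
        super().__init__()
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_quantizers)]
        )

    def forward(self, z):  # z: (batch, time, dim)
        residual = z
        quantized = torch.zeros_like(z)
        for codebook in self.codebooks:
            # Nearest codebook entry for the current residual
            flat = residual.reshape(-1, residual.size(-1))
            codes = torch.cdist(flat, codebook.weight).argmin(dim=-1)
            q = codebook(codes).view_as(residual)
            quantized = quantized + q
            residual = residual - q.detach()
        # Straight-through estimator: gradients pass to the encoder unchanged
        return z + (quantized - z).detach()


class DMCodecSketch(nn.Module):
    """Toy encoder -> RVQ -> decoder, with projection heads used to distill
    LM (contextual) and SM (semantic) hidden states into the quantized vectors."""

    def __init__(self, dim=256, lm_dim=768, sm_dim=768):
        super().__init__()
        self.encoder = nn.Conv1d(1, dim, kernel_size=320, stride=320)
        self.decoder = nn.ConvTranspose1d(dim, 1, kernel_size=320, stride=320)
        self.rvq = ResidualVectorQuantizer(dim)
        self.to_lm = nn.Linear(dim, lm_dim)  # project onto the LM hidden size
        self.to_sm = nn.Linear(dim, sm_dim)  # project onto the SM hidden size

    def forward(self, wav):  # wav: (batch, 1, samples)
        z = self.encoder(wav).transpose(1, 2)     # (batch, time, dim)
        zq = self.rvq(z)
        recon = self.decoder(zq.transpose(1, 2))  # (batch, 1, samples)
        return recon, zq


def distillation_loss(model, zq, lm_hidden, sm_hidden=None):
    """Cosine-distance distillation toward teacher hidden states, assumed to be
    time-aligned with the codec frames. sm_hidden=None gives the LM-only variant."""
    loss = 1 - F.cosine_similarity(model.to_lm(zq), lm_hidden, dim=-1).mean()
    if sm_hidden is not None:
        loss = loss + 1 - F.cosine_similarity(model.to_sm(zq), sm_hidden, dim=-1).mean()
    return loss
```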
- We have released the code and trained model checkpoints. More detailed instructions will be provided soon.
| Model | Description |
|---|---|
| DM-Codec_checkpoint_LM_SM | Uses the combined LM- and SM-guided representation distillation approach, uniting acoustic, semantic, and contextual representations in DM-Codec. |
| DM-Codec_checkpoint_LM | Uses the LM-guided representation distillation approach, incorporating acoustic and contextual representations into DM-Codec. |
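Until the full instructions are released, a downloaded checkpoint might be used along the lines of the following sketch; the file names, the `DMCodecSketch` class from the sketch above, and the loading details are assumptions, not the repository's documented API.

```python
import torch
import torchaudio

# Hypothetical paths and entry points; reuses DMCodecSketch from the sketch above.
model = DMCodecSketch()
state = torch.load("DM-Codec_checkpoint_LM_SM.pt", map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False: the sketch is not the real architecture
model.eval()

wav, sr = torchaudio.load("input.wav")      # (channels, samples), assumed mono
with torch.no_grad():
    recon, _ = model(wav.unsqueeze(0))      # add a batch dimension
torchaudio.save("reconstructed.wav", recon.squeeze(0), sr)
```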
Below, we provide reconstructed speech samples from DM-Codec and compare them with reconstructions from EnCodec, SpeechTokenizer, and FACodec. Download the audio files to listen.
| Codec | Reconstructed Sample 1 | Reconstructed Sample 2 |
|---|---|---|
| Original | Download Sample 1 | Download Sample 2 |
| DM-Codec | Download Sample 1 | Download Sample 2 |
| EnCodec | Download Sample 1 | Download Sample 2 |
| SpeechTokenizer | Download Sample 1 | Download Sample 2 |
| FACodec | Download Sample 1 | Download Sample 2 |