🐶 MLX Bark

A port of Suno's Bark model to MLX, Apple's machine learning framework. Bark is a transformer-based text-to-audio model that can generate speech and other audio, such as background noise and music.

Disclaimer

This repository is under active development, but the model is functional. The model currently relies on a few dependencies that are not yet implemented in MLX, such as Encodec and the tokenizer. I am working on ports of these dependencies and will update the repository as soon as I have a working solution.

Example

Hello World! My name is Bark and I'm running on Apple's new machine learning framework MLX

generation.mp4

TODO

Sorted by priority

Add support for MLX-based Encodec
Add support for MLX-based tokenizer
Fix softmax and multinomial sampling issue
Add support for large model
Add support for max_gen_duration and history prompts

Setup

First, install the dependencies:

pip install -r requirements.txt

To convert a model, first download the Bark PyTorch checkpoints. For example, to download the small model, use:

huggingface-cli download suno/bark coarse.pt fine.pt text.pt --local-dir weights/
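
The same download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that the checkpoints should end up in `weights/`, the directory later passed to `convert.py`. The helper names `checkpoint_files` and `download_checkpoints` are hypothetical, and the large-model file names (`*_2.pt`) are an assumption about the layout of the `suno/bark` repository, not something stated in this README.

```python
from pathlib import Path

# File names for the small model come from the command above; the "_2"
# names for the large model are an assumption about the suno/bark repo.
CHECKPOINTS = {
    "small": ["coarse.pt", "fine.pt", "text.pt"],
    "large": ["coarse_2.pt", "fine_2.pt", "text_2.pt"],
}


def checkpoint_files(model: str = "small") -> list[str]:
    """Return the checkpoint file names for the given model size."""
    if model not in CHECKPOINTS:
        raise ValueError(f"unknown model size: {model!r}")
    return list(CHECKPOINTS[model])


def download_checkpoints(model: str = "small", out_dir: str = "weights") -> list[Path]:
    """Download the checkpoints into out_dir and return their local paths."""
    # Imported lazily so checkpoint_files() works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id="suno/bark", filename=name, local_dir=out_dir))
        for name in checkpoint_files(model)
    ]
```

Calling `download_checkpoints("small")` should place the three small-model checkpoints under `weights/`; nothing is downloaded until that function is called.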

Then, convert the weights to the MLX format:

# for large model, specify --model large instead of small
python convert.py --torch_weights_dir weights/ --model small
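
For readers curious what a conversion step of this kind involves, here is a rough sketch of the general idea. It is not the repository's `convert.py`, whose contents are not shown here. It assumes the checkpoint tensors have already been turned into NumPy arrays (for example via `tensor.numpy()`), that float16 output is acceptable, and that weights are stored as a single `.npz` archive; `cast_weights` and `save_npz` are hypothetical names.

```python
import numpy as np


def cast_weights(state_dict, dtype=np.float16):
    """Cast floating-point arrays to dtype; leave integer arrays untouched."""
    converted = {}
    for name, value in state_dict.items():
        array = np.asarray(value)
        if np.issubdtype(array.dtype, np.floating):
            array = array.astype(dtype)
        converted[name] = array
    return converted


def save_npz(state_dict, path, dtype=np.float16):
    """Write the cast weights to a single .npz archive at path."""
    np.savez(path, **cast_weights(state_dict, dtype))
```

MLX's `mx.load` can read `.npz` archives, which is why that container is used in this sketch; the actual script may rename parameters or choose a different format.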

Running the model

# Run the model
python model.py --path weights/ --model small --text "hello world my name is bark"
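
To call the script from other Python code, one option is a thin wrapper around the command above. This is a sketch under the assumption that `model.py` is in the current working directory and accepts exactly the `--path`, `--model` and `--text` flags shown; `build_command` and `generate` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys


def build_command(text, path="weights/", model="small"):
    """Build the argument list for model.py using the flags shown above."""
    return [sys.executable, "model.py", "--path", path, "--model", model, "--text", text]


def generate(text, path="weights/", model="small"):
    """Run model.py as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_command(text, path, model), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the prompt contains apostrophes or quotation marks, as in "I'm running on MLX".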

Requirements

All dependencies are listed in requirements.txt.

Acknowledgements

Thanks to Suno for the original model, weights and training code repository. Also thanks to the MLX team for the MLX framework and examples.

Links

j-csc/mlx_bark