Port of Suno's Bark TTS transformer to Apple's MLX framework

🐶 MLX Bark

A port of Suno's Bark model to MLX, Apple's machine learning framework. Bark is a transformer-based text-to-audio model that can generate speech as well as other audio, e.g. background noise and music.

Disclaimer

This repository is under active development, but the model is functional. The model currently relies on a few components that have not yet been ported to MLX, such as Encodec and the tokenizer. I am working on porting them and will update the repository once I have a working solution.

Example

Hello World! My name is Bark and I'm running on Apple's new machine learning framework MLX

generation.mp4

TODO

Sorted by priority

  • Add support for MLX based Encodec
  • Add support for MLX based Tokenizer
  • Fix softmax and multinomial sampling issue
  • Add support for large model
  • Support for max_gen_duration and history prompts

Setup

First, install the dependencies:

pip install -r requirements.txt

To use a model, first download the Bark PyTorch checkpoints, then convert the weights to the MLX format. For example, to download the checkpoints for the small model into weights/, use:

huggingface-cli download suno/bark coarse.pt fine.pt text.pt --local-dir weights/

Then, convert the weights to the MLX format:

# For the large model, pass --model large instead of small
python convert.py --torch_weights_dir weights/ --model small
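
Before running the conversion, it can help to confirm that all three checkpoint files are in place. The following is a minimal sketch, not part of this repository: the helper name missing_checkpoints is hypothetical, and it assumes the small-model file names from the download command above and a weights/ directory as passed to convert.py.

```python
from pathlib import Path

# File names taken from the download command above (small model).
SMALL_CHECKPOINTS = ("coarse.pt", "fine.pt", "text.pt")


def missing_checkpoints(weights_dir, names=SMALL_CHECKPOINTS):
    """Return the checkpoint file names that are not present in weights_dir.

    Hypothetical helper for illustration; convert.py may perform its own checks.
    """
    root = Path(weights_dir)
    # Keep the order of `names` so the report matches the download command.
    return [name for name in names if not (root / name).is_file()]
```

Calling missing_checkpoints("weights/") returns an empty list when every file is present, so a script can stop early and report which downloads to repeat otherwise.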

Running the model

# Run the model
python model.py --path weights/ --model small --text "hello world my name is bark"
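
To generate audio for several prompts from a script, the command above can be wrapped with the standard library. This is a sketch under stated assumptions: it uses only the --path, --model and --text flags shown above, the helper names build_run_command and generate_all are hypothetical, and it assumes the script is run from the repository root where model.py lives.

```python
import subprocess
import sys


def build_run_command(text, weights_dir="weights/", model="small"):
    """Build the argument list for one model.py invocation.

    Passing a list (rather than a shell string) avoids quoting problems
    when the prompt contains quotes or other special characters.
    """
    return [
        sys.executable, "model.py",
        "--path", weights_dir,
        "--model", model,
        "--text", text,
    ]


def generate_all(prompts, weights_dir="weights/", model="small"):
    """Run model.py once per prompt; raises CalledProcessError on failure."""
    for prompt in prompts:
        subprocess.run(build_run_command(prompt, weights_dir, model), check=True)
```

For example, generate_all(["hello world my name is bark", "it's a sunny day"]) runs the model twice in sequence. Each call reloads the weights, so this is convenient rather than fast.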

Requirements:

Listed in requirements.txt

Acknowledgements

Thanks to Suno for the original model, weights and training code repository. Also thanks to the MLX team for the MLX framework and examples.
