/bark.cpp

Port of Suno AI's Bark in C/C++ for fast inference

Primary LanguageC++MIT LicenseMIT

bark.cpp

bark.cpp

Actions Status License: MIT

Roadmap / encodec.cpp / ggml

Inference of SunoAI's bark model in pure C/C++.

Description

With bark.cpp, our goal is to bring real-time realistic multilingual text-to-speech generation to the community.

  • Plain C/C++ implementation without dependencies
  • AVX, AVX2 and AVX512 for x86 architectures
  • CPU and GPU compatible backends
  • Mixed F16 / F32 precision
  • 4-bit, 5-bit and 8-bit integer quantization
  • Metal and CUDA backends

Models supported

Models we want to implement! Please open a PR :)

Demo on Google Colab (#95)


Here is a typical run using bark.cpp:

make -j && ./main -p "This is an audio generated by bark.cpp"

   __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: [========>                                          ] (17%)

bark_print_statistics:   sample time =    10.98 ms / 138 tokens
bark_print_statistics:  predict time =   614.96 ms / 4.46 ms per token
bark_print_statistics:    total time =   633.54 ms

Generating coarse tokens: [==================================================>] (100%)

bark_print_statistics:   sample time =     3.75 ms / 410 tokens
bark_print_statistics:  predict time =  3263.17 ms / 7.96 ms per token
bark_print_statistics:    total time =  3274.00 ms

Generating fine tokens: [==================================================>] (100%)

bark_print_statistics:   sample time =    38.82 ms / 6144 tokens
bark_print_statistics:  predict time =  4729.86 ms / 0.77 ms per token
bark_print_statistics:    total time =  4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main:     load time =   324.14 ms
main:     eval time =  8806.57 ms
main:    total time =  9131.68 ms

Here are typical audio pieces generated by bark.cpp:

audio1.mp4
audio2.mp4

Usage

Here are the steps to use Bark.cpp

Get the code

git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive

Build

In order to build bark.cpp you must use CMake:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Prepare data & Run

# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4

(Optional) Quantize weights

Weights can be quantized using the following strategy: q4_0, q4_1, q5_0, q5_1, q8_0.

Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.

./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0

Seminal papers

Contributing

bark.cpp is a continuous endeavour that relies on the community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be

  • bug report: you may encounter a bug while using bark.cpp. Don't hesitate to report it on the issue section.
  • feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
  • pull request: you may have fixed a bug, added a features, or even fixed a small typo in the documentation, ... you can submit a pull request and a reviewer will reach out to you.

Coding guidelines

  • Avoid adding third-party dependencies, extra files, extra headers, etc.
  • Always consider cross-compatibility with other operating systems and architectures