llama.cpp: https://github.com/ggerganov/llama.cpp
Blog: https://mistral.ai/news/mistral-nemo/
Hugging Face: https://huggingface.co/mistralai/Mistral-Nemo-Base-2407
Text-To-Speech:
Speech-To-Text: https://github.com/mozilla/DeepSpeech/
Rust: https://github.com/utilityai/llama-cpp-rs
Python: https://github.com/abetlen/llama-cpp-python
Download models from the Hugging Face Hub and convert them to llama.cpp's GGML/GGUF format: https://github.com/akx/ggify
Stateful load balancer built for llama.cpp: https://github.com/distantmagic/paddler