bokesyo

LLM engineer

@OpenBMB, Beijing

bokesyo's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 72.1k 587 08.6k
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python 32.7k 188 5653.5k
fishaudio/fish-speech
Brand new TTS solution
Language: Python 14.8k 99 4121.1k
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 12.7k 140 7261.3k
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook 6.4k 71 996791
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language: Python 4.7k 55 117476
philz1337x/clarity-upscaler
Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative
Language: Python 3.9k 31 46409
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language: Python 3.5k 57 71305
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Language: Python 3.1k 98 116281
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language: Python 2.7k 30 130217
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
Language: Python 2.5k 62 174266
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Language: Python 2k 28 131164
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python 1.5k 25 67109
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks
Language: Python 1.4k 11 227195
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language: Python 1.3k 47 5386
lizhe2004/Awesome-LLM-RAG-Application
Resources on applications built with LLMs using the RAG pattern
898 15 154
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Language: Python 832 32 5396
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
728 44 340
lenML/ChatTTS-Forge
🍦 ChatTTS-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Language: Python 703 7 12586
yangdongchao/UniAudio
The Open Source Code of UniAudio
Language: Python 524 36 3332
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Language: Python 486 16 2241
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
Language: Python 298 42 021
haoheliu/SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
Language: Python 156 5 99
fumiama/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
Language: Python 143 5 4220
MagicHub-io/MagicData-RAMC
MagicData-RAMC Dataset and Baseline
Language: Shell 51 2 1512
RicherMans/Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
Language: Python 45 3 33
ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
Language: Python 32 4 63
RhapsodyAILab/MiniCPM-V-Embedding
Language: Python 23 0 61
RhapsodyAILab/Awesome-MiniCPMV-Projects
80
fncokg/UntPlot
A Python plotting package for UntPhesoca tone-value pattern visualization charts
Language: Python 7 1 00
