dionylon

Making change

China

dionylon's Stars

FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python, 3.4k stars, 311 forks
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python, 6.3k stars, 673 forks
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python, 12.5k stars, 1k forks
KoljaB/RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Language: Python, 2.1k stars, 185 forks
Ikaros-521/RealtimeSTT_LLM_TTS
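Real-time STT connected to the OpenAI API / Zhipu AI (streaming LLM) and GPT-SOVITS / Edge-TTS, exposed through a web page so the services can be called across networks for real-time conversation.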
Language:Python25340
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python, 7k stars, 740 forks
chinese-poetry/chinese-poetry
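The most comprehensive database of Chinese poetry 🧶: nearly 14,000 poets from the Tang and Song dynasties, close to 55,000 Tang poems and 260,000 Song poems, plus 21,050 ci lyrics by 1,564 ci poets of the Song period.
Language: JavaScript, 48.2k stars, 9.7k forks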
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Language: Python, 44.3k stars, 7.8k forks
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language: Python, 35.7k stars, 4.1k forks
ultralytics/yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Language: Python, 50.9k stars, 16.4k forks
CASIA-IVA-Lab/FastSAM
Fast Segment Anything
Language: Python, 7.5k stars, 709 forks
CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Language:Python77143
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python, 5.1k stars, 385 forks
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language: Python, 20.3k stars, 2.2k forks
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
14.5k stars, 974 forks
ltdrdata/ComfyUI-Manager
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
Language: JavaScript, 6.9k stars, 905 forks
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
Language: Python, 56.9k stars, 6k forks
KevinWang676/Bark-Voice-Cloning
Bark Voice Cloning and Voice Cloning for Chinese Speech
Language: Jupyter Notebook, 2.8k stars, 401 forks
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
Language: Python, 7.7k stars, 762 forks
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
Language: Python, 24.6k stars, 3.6k forks
OpenBMB/XAgent
An Autonomous LLM Agent for Complex Task Solving
Language: Python, 8.2k stars, 845 forks
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python, 35.5k stars, 4.3k forks
synesthesiam/opentts
Open Text to Speech Server
Language:Python954133
serp-ai/bark-with-voice-clone
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
Language: Jupyter Notebook, 3.2k stars, 426 forks
josStorer/RWKV-Runner
A RWKV management and startup tool, full automation, only 8MB. And provides an interface compatible with the OpenAI API. RWKV is a large language model that is fully open source and available for commercial use.
Language: TypeScript, 5.3k stars, 503 forks
apache/skywalking
APM, Application Performance Monitoring System
Language: Java, 23.9k stars, 6.5k forks
oliverschwendener/ueli
Cross-Platform Keystroke Launcher
Language: TypeScript, 3.7k stars, 242 forks
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Language: Python, 168k stars, 44.4k forks
yihong0618/xiaogpt
Play ChatGPT and other LLM with Xiaomi AI Speaker
Language: Python, 6.3k stars, 877 forks
GuyTevet/motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
Language: Python, 3.1k stars, 342 forks
