buliaoyin

buliaoyin's Stars

FunAudioLLM/CosyVoice
A multilingual large voice generation model with full-stack support for inference, training, and deployment.
Language:Python58143
udlbook/udlbook
Understanding Deep Learning - Simon J.D. Prince
Language:Jupyter Notebook5.3k1.1k
KwaiVGI/LivePortrait
Make one portrait alive!
Language:Python89380
CH563/shot-easy-website
Takes screenshots online and compresses images in the browser with WebAssembly
Language:JavaScript30935
fishaudio/fish-speech
Brand new TTS solution
Language:Python4.4k351
jerry20181228/fake-screenshot
Language:HTML9322
OutofAi/StableFace
Build your own Face App with Stable Diffusion 2.1
Language:Jupyter Notebook11711
Ikaros-521/RealtimeSTT_LLM_TTS
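Real-time STT connected to the OpenAI API or Zhipu AI (streaming LLM) and to GPT-SOVITS/Edge-TTS, with cross-network service calls made through a web page to achieve real-time conversation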
Language:Python11522
CosmosShadow/gptpdf
Using GPT to parse PDF
Language:Python2k127
ByungKwanLee/Full-Segment-Anything
A PyTorch implementation that adds new features to the Segment-Anything code. The features support batch input on the full-grid prompt (automatic mask generation) with post-processing (removing duplicated or small regions and holes) under flexible input image sizes
Language:Python1189
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Language:Python5.9k428
Calcium-Ion/new-api
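An AI model API management and distribution system. It supports converting multiple large models to OpenAI-format calls, supports Midjourney Proxy, Suno, and Rerank, and is compatible with the EPay (易支付) protocol. For personal use or internal enterprise management and channel distribution only; do not use it for commercial purposes. This project is a secondary development based on One API.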
Language:Go2.3k589
TeamWiseFlow/wiseflow
Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and uploads them to the database.
Language:JavaScript892111
DachunKai/EvTexture
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Language:Python81047
PeterH0323/Streamer-Sales
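Streamer-Sales (销冠): a sales-streamer LLM 🛒🎁, a large model that presents products based on their given features, from the angle of stimulating users' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries real-time information online 🌐, and ASR (speech-to-text) 🎙️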
Language:Python1.4k203
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
Language:Python1.3k242
openvpi/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Language:Python2.6k275
zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
2.9k510
niedev/RTranslator
Open source real-time translation app for Android that runs locally
Language:C++5.4k406
ZuodaoTech/everyone-can-use-english
Everyone Can Use English
Language:TypeScript21.4k3.4k
fudan-generative-vision/hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Language:Python6.1k747
pcb9382/PlateRecognition
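License-Plate-Recognition: supports detection and recognition of 12 types of license plates, including yolov5, yolov7, and yolov8 plate detection, plate rectification, and plate recognition, with accuracy of up to 99.5%. A license plate dataset is also available for download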
Language:C++18223
binary-husky/gpt_academic
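Provides a practical interaction interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-translation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen (通义千问), deepseekcoder, iFlytek Spark (讯飞星火), ERNIE Bot (文心一言), llama2, rwkv, claude2, moss, and others.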
Language:Python61.4k7.7k
ali-vilab/MimicBrush
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
Language:Python78259
smalltong02/keras-llm-robot
A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Language:Python19824
hiyouga/LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Language:Python25.7k3.2k
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language:Jupyter Notebook5.5k725
Rudrabha/Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
Language:Python9.7k2.1k
AMAAI-Lab/MidiCaps
A large-scale dataset of caption-annotated MIDI files.
Language:Python33
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection
Language:Python8.1k676