buliaoyin's Stars
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
udlbook/udlbook
Understanding Deep Learning - Simon J.D. Prince
KwaiVGI/LivePortrait
Make one portrait alive!
CH563/shot-easy-website
Take screenshots online and compress images in the browser with WebAssembly
fishaudio/fish-speech
Brand new TTS solution
jerry20181228/fake-screenshot
OutofAi/StableFace
Build your own Face App with Stable Diffusion 2.1
Ikaros-521/RealtimeSTT_LLM_TTS
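Real-time STT connected to the OpenAI API / Zhipu AI (streaming LLM) and GPT-SOVITS / Edge-TTS, with cross-network service calls made through a web page to achieve real-time conversation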
CosmosShadow/gptpdf
Using GPT to parse PDF
ByungKwanLee/Full-Segment-Anything
PyTorch implementation that adds new features to the Segment-Anything code. The features support batch input on the full-grid prompt (automatic mask generation) with post-processing (removing duplicated or small regions and holes) under flexible input image sizes
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Calcium-Ion/new-api
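AI model API management and distribution system. Supports converting multiple large models to OpenAI-format calls, supports Midjourney Proxy, Suno, and Rerank, and is compatible with the EPay (易支付) protocol. Intended only for individuals or for internal enterprise management and distribution of channels; do not use it for commercial purposes. This project is a secondary development based on One API.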
TeamWiseFlow/wiseflow
Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and uploads them to the database.
DachunKai/EvTexture
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
PeterH0323/Streamer-Sales
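Streamer-Sales (销冠): a sales-streamer LLM 🛒🎁 that presents products based on their given features, from the angle of stimulating users' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS text-to-speech 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, and ASR speech-to-text 🎙️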
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
openvpi/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
niedev/RTranslator
Open source real-time translation app for Android that runs locally
ZuodaoTech/everyone-can-use-english
Everyone Can Use English
fudan-generative-vision/hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
pcb9382/PlateRecognition
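License-Plate-Recognition: supports detection and recognition of 12 types of license plates, including yolov5, yolov7, and yolov8 plate detection, plate rectification, and plate recognition, with accuracy up to 99.5%. A license plate dataset is also available for download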
binary-husky/gpt_academic
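Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen (通义千问), deepseekcoder, iFlytek Spark (讯飞星火), ERNIE Bot (文心一言), llama2, rwkv, claude2, moss, and others.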
ali-vilab/MimicBrush
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
smalltong02/keras-llm-robot
A web UI project for learning about large language models. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
hiyouga/LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Rudrabha/Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
AMAAI-Lab/MidiCaps
A large-scale dataset of caption-annotated MIDI files.
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection