lujiale621's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
fishaudio/fish-speech
Brand new TTS solution
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
niedev/RTranslator
Open source real-time translation app for Android that runs locally
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
YaoFANGUK/video-subtitle-extractor
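A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction.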
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
yisol/IDM-VTON
[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
pytorch/torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
NapNeko/NapCatQQ
A modern NTQQ-based implementation of a bot protocol endpoint
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
jsksxs360/How-to-use-Transformers
A quick-start tutorial for the Transformers library
muzishen/IMAGDressing
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
lxfater/BilibiliSummary
A Chrome extension that helps you summarize videos on Bilibili.
X-T-E-R/Uni-TTS
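This project aims to unify how different speech synthesis engines are used. It supports adapters for multiple speech synthesis engines and can be used directly as a module or launched as a backend service.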
R3gm/SoniTranslate
Synchronized Translation for Videos. Video dubbing
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Xiaojiu-z/Stable-Hair
Stable-Hair: Real-World Hair Transfer via Diffusion Model
X-T-E-R/GPT-SoVITS-Inference
Inference Specialization
OpenGVLab/video-mamba-suite
The suite of modeling video with Mamba
yeliudev/R2-Tuning
🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
wenet-e2e/wesubtitle
Extract hard-coded subtitles from videos using OCR
sudo-Boris/mr-Blip
Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"