Pinned Repositories
AI-Youtube-Shorts-Generator
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
api4sensevoice
API and WebSocket server for SenseVoice. It includes several enhanced features, such as voice activity detection (VAD), real-time streaming recognition, and speaker verification.
chatwiki
CompreFace
Leading free and open-source face recognition system
demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
easegen-admin
easegen-front
echomimic
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
echomimic_v2
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
yaojun's Repositories
yaojun/vad
Voice activity detector (VAD) for the browser with a simple API
yaojun/echomimic_v2
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
yaojun/echomimic
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
yaojun/JoyVASA
yaojun/Thinking-Claude
Let your Claude think
yaojun/Hunyuan3D-1
yaojun/TANGO
yaojun/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
yaojun/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
yaojun/LiveTalking
Real time interactive streaming digital human
yaojun/hallo2
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
yaojun/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
yaojun/fish-speech
Brand new TTS solution
yaojun/easegen-front
yaojun/easegen-admin
yaojun/streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
yaojun/api4sensevoice
API and WebSocket server for SenseVoice. It includes several enhanced features, such as voice activity detection (VAD), real-time streaming recognition, and speaker verification.
yaojun/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
yaojun/yt-dlp
A feature-rich command-line audio/video downloader
yaojun/insightface
State-of-the-art 2D and 3D Face Analysis Project
yaojun/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
yaojun/JoyHallo
JoyHallo: Digital human model for Mandarin
yaojun/CompreFace
Leading free and open-source face recognition system
yaojun/Modelscope_Faster_Whisper_Multi_Subtitle
Generate bilingual subtitles with one click using Faster-whisper and modelscope. A bilingual subtitle generator based on offline large models.
yaojun/VideoLingo
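Netflix-level subtitle segmentation, translation, alignment, and even dubbing: a one-click, fully automated AI subtitle team for porting videos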
yaojun/Westlake-Omni
yaojun/AI-Youtube-Shorts-Generator
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
yaojun/PySceneDetect
Python and OpenCV-based scene cut/transition detection program and library.
yaojun/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
yaojun/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.