markgao-916

markgao-916's Stars

hacksider/Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
Language:Python41k 243 5456k
hiroi-sora/Umi-OCR
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片，PDF文档识别，排除水印/页眉页脚，扫描/生成二维码。内置多国语言库。
Language:Python27.4k 146 6022.8k
opendatalab/MinerU
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具，将PDF转换成Markdown和JSON格式。
Language:Python18.1k 97 6041.3k
Cinnamon/kotaemon
An open-source RAG-based tool for chatting with your documents.
Language:Python17.5k 100 2881.4k
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language:Python12.7k 105 589889
TencentARC/PhotoMaker
PhotoMaker [CVPR 2024]
Language:Jupyter Notebook9.6k 101 162768
NirDiamant/RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.
Language:Jupyter Notebook8.7k 110 17894
Huanshere/VideoLingo
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix级字幕切割、翻译、对齐、甚至加上配音，一键全自动视频搬运AI字幕组
Language:Python7k 41 198683
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Language:Python6.1k 53 157524
fudan-generative-vision/hallo2
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Language:Python4.4k 412 46622
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Language:Python3.2k 27 369202
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language:Python2.6k 28 47174
ddean2009/MoneyPrinterPlus
AI一键批量生成各类短视频,自动批量混剪短视频,自动把视频发布到抖音,快手,小红书,视频号上,赚钱从来没有这么容易过! 支持本地语音模型chatTTS,fasterwhisper,GPTSoVITS,支持云语音：Azure,阿里云,腾讯云。支持Stable diffusion,comfyUI直接AI生图。Generate short videos with one click using AI LLM,print money together! support:chatTTS,faster-whisper,GPTSoVITS,Azure,tencent Cloud,Ali Cloud.
Language:Python2.4k 24 79459
Dicklesworthstone/llm_aided_ocr
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
Language:Python2.2k 19 17147
MhLiao/DB
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
Language:Python2.1k 43 365481
linyqh/NarratoAI
利用AI大模型，一键解说并剪辑视频； Using AI models to automatically provide commentary and edit videos with a single click.
Language:Python2k 9 41234
kijai/ComfyUI-SUPIR
SUPIR upscaling wrapper for ComfyUI
Language:Python1.6k 16 15788
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
Language:Python1.4k 18 47145
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language:Python1.4k 28 19187
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Language:Python968 39 5559
lenML/Speech-AI-Forge
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Language:Python859 13 151113
mk-minchul/AdaFace
Language:Jupyter Notebook659 7 152121
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Language:Python539 32 2045
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
Language:Python448 31 2330
yeyupiaoling/AudioClassification-Pytorch
The Pytorch implementation of sound classification supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE and other models, as well as a variety of preprocessing methods.
Language:Python416 7 3485
balazik/ComfyUI-PuLID-Flux
PuLID-Flux ComfyUI implementation
Language:Python400 4 5128
tanshuai0219/EDTalk
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
Language:Python350 15 4032
jnjaby/KEEP
[ECCV'24] Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Language:Python319 25 1215
kepengxu/PGTFormer
[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
Language:Python185 7 1823
lightChaserX/alphaLDM
Matting by Generation
33 13 31