zhang0jhon

zhang0jhon's Stars

black-forest-labs/flux
Official inference repo for FLUX.1 models
Language: Python | Stars: 15.6k | Forks: 1.1k
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Language: Python | Stars: 1.7k | Forks: 82
Fanghua-Yu/SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Language: Python | Stars: 4.4k | Forks: 381
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Language: Python | Stars: 4.4k | Forks: 227
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
Language: Python | Stars: 22.1k | Forks: 2.2k
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Language: Python | Stars: 6.3k | Forks: 560
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language: Jupyter Notebook | Stars: 12.1k | Forks: 1.1k
meta-llama/llama-models
Utilities intended for use with Llama models.
Language: Python | Stars: 4.7k | Forks: 812
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Language: Jupyter Notebook | Stars: 15.1k | Forks: 1.4k
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Language: Python | Stars: 6.7k | Forks: 682
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
Stars: 2.5k | Forks: 216
zhang0jhon/otamatch
Language:Python21
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
Language: HTML | Stars: 11k | Forks: 944
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Language: Python | Stars: 1.2k | Forks: 54
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
Language: Python | Stars: 2.3k | Forks: 167
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Language: Jupyter Notebook | Stars: 7.1k | Forks: 451
FoundationVision/Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Language:Python55659
hustvl/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Language: Python | Stars: 3k | Forks: 196
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Language: Python | Stars: 3.2k | Forks: 277
meta-llama/llama3
The official Meta Llama 3 GitHub site
Language: Python | Stars: 27k | Forks: 3.1k
apple/ml-ferret
Language: Python | Stars: 8.5k | Forks: 497
xai-org/grok-1
Grok open release
Language: Python | Stars: 49.5k | Forks: 8.3k
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
Language: Jupyter Notebook | Stars: 2.8k | Forks: 274
IDEA-Research/T-Rex
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Language: Python | Stars: 2.2k | Forks: 144
babysor/MockingBird
🚀 AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content in real-time
Language: Python | Stars: 35.3k | Forks: 5.2k
CorentinJ/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Language: Python | Stars: 52.6k | Forks: 8.8k
facefusion/facefusion
Industry leading face manipulation platform
Language: Python | Stars: 19.4k | Forks: 2.9k
Rudrabha/Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs.
Language: Python | Stars: 10.7k | Forks: 2.3k
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Language: Python | Stars: 3.2k | Forks: 247
HumanAIGC/EMO
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Stars: 7.5k | Forks: 909