zhang0jhon's Stars
black-forest-labs/flux
Official inference repo for FLUX.1 models
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Fanghua-Yu/SUPIR
SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. A new online demo is also available at suppixel.ai.
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
meta-llama/llama-models
Utilities intended for use with Llama models.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
zhang0jhon/otamatch
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
FoundationVision/Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
hustvl/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
meta-llama/llama3
The official Meta Llama 3 GitHub site
apple/ml-ferret
xai-org/grok-1
Grok open release
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
IDEA-Research/T-Rex
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
babysor/MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
CorentinJ/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
facefusion/facefusion
Industry leading face manipulation platform
Rudrabha/Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try Sync Labs.
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
HumanAIGC/EMO
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions