AItechnology's Stars
DCDmllm/Momentor
ExplainableML/ReNO
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
HandsOnLLM/Hands-On-Large-Language-Models
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
johannakarras/DreamPose
Official implementation of "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion"
mayuelala/FollowYourPose
[AAAI 2024] Official implementation of "Follow-Your-Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos"
xinyu1205/recognize-anything
Strong open-source foundation models for image recognition.
SparksJoe/Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
modelscope/FunClip
An open-source, accurate, and easy-to-use tool for video speech recognition and clipping, with LLM-based AI clipping integrated.
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
levihsu/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
LargeWorldModel/LWM
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
segmind/segmoe
Flode-Labs/vid2densepose
Convert your videos to densepose and use it on MagicAnimate
modelscope/scepter
SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models.
crowsonkb/k-diffusion
Karras et al. (2022) diffusion models for PyTorch
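k-diffusion implements the samplers and noise schedules from Karras et al. (2022, "Elucidating the Design Space of Diffusion-Based Generative Models"). The centerpiece is the rho-spaced sigma schedule from that paper; the sketch below reimplements that formula standalone in NumPy (function name and defaults are illustrative, not the library's API):

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise schedule from Karras et al. (2022), Eq. 5:
    sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    rho > 1 concentrates steps near sigma_min, where sampling accuracy matters most."""
    ramp = np.linspace(0.0, 1.0, n)
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return np.append(sigmas, 0.0)  # trailing 0.0 marks the fully denoised endpoint

sigmas = karras_sigmas(10)  # monotonically decreasing from sigma_max to 0
```

The schedule starts at `sigma_max`, ends at `sigma_min`, and the appended zero is the convention many samplers use to denote the clean-data endpoint.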
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
openai/consistencydecoder
Consistency Distilled Diff VAE
SkyworkAI/Skywork
Skywork series models are pre-trained on 3.2 TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have all been open-sourced.
Zeqiang-Lai/Mini-DALLE3
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
ximinng/DiffSketcher
[NeurIPS 2023] Official implementation of "DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models" https://arxiv.org/abs/2306.14685
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
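PEFT fine-tuning as offered by ms-swift most commonly means LoRA: the base weight `W` stays frozen and only a low-rank update `B @ A` is trained. The sketch below shows the forward-pass math in plain NumPy (a conceptual illustration, not ms-swift's or the `peft` library's API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass of a LoRA-adapted linear layer (conceptual sketch).

    W: frozen (d_out, d_in) base weight.
    A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors;
    only these ~r*(d_in + d_out) parameters are updated during fine-tuning.
    The update is scaled by alpha / r, following the original LoRA paper.
    """
    r = A.shape[0]
    scale = alpha / r
    return x @ W.T + scale * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
x = rng.standard_normal((3, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B initialized to zero, so the adapter starts as a no-op
y = lora_forward(x, W, A, B)  # identical to x @ W.T at initialization
```

Initializing `B` to zero means training starts exactly from the pre-trained model's behavior, which is why LoRA fine-tuning is stable even at small ranks.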
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)