AItechnology's Stars
DCDmllm/Momentor
ExplainableML/ReNO
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
HandsOnLLM/Hands-On-Large-Language-Models
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
johannakarras/DreamPose
Official implementation of "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion"
mayuelala/FollowYourPose
[AAAI 2024] Official implementation of "Follow-Your-Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos"
xinyu1205/recognize-anything
Strong open-source foundation models for image recognition.
SparksJoe/Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
modelscope/FunClip
An open-source, accurate, and easy-to-use tool for video speech recognition and clipping, with LLM-based AI clipping integrated.
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
levihsu/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
LargeWorldModel/LWM
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
segmind/segmoe
Flode-Labs/vid2densepose
Convert your videos to densepose and use it on MagicAnimate
modelscope/scepter
SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models.
crowsonkb/k-diffusion
Karras et al. (2022) diffusion models for PyTorch
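k-diffusion implements the samplers and noise schedules from Karras et al. (2022, "Elucidating the Design Space of Diffusion-Based Generative Models"). The centerpiece is the rho-spaced sigma schedule from that paper; the sketch below reimplements that formula standalone in NumPy (function name and defaults are illustrative, not the library's API):

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise schedule from Karras et al. (2022), Eq. 5:
    sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    rho > 1 concentrates steps near sigma_min, where sampling accuracy matters most."""
    ramp = np.linspace(0.0, 1.0, n)
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return np.append(sigmas, 0.0)  # trailing 0.0 marks the fully denoised endpoint

sigmas = karras_sigmas(10)  # monotonically decreasing from sigma_max to 0
```

The schedule starts at `sigma_max`, ends at `sigma_min`, and the appended zero is the convention many samplers use to denote the clean-data endpoint.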
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
openai/consistencydecoder
Consistency Distilled Diff VAE
SkyworkAI/Skywork
Skywork series models are pre-trained on 3.2 TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have all been open-sourced.
Zeqiang-Lai/Mini-DALLE3
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
ximinng/DiffSketcher
[NeurIPS 2023] Official implementation of "DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models" https://arxiv.org/abs/2306.14685
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
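PEFT fine-tuning as offered by ms-swift most commonly means LoRA: the base weight `W` stays frozen and only a low-rank update `B @ A` is trained. The sketch below shows the forward-pass math in plain NumPy (a conceptual illustration, not ms-swift's or the `peft` library's API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass of a LoRA-adapted linear layer (conceptual sketch).

    W: frozen (d_out, d_in) base weight.
    A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors;
    only these ~r*(d_in + d_out) parameters are updated during fine-tuning.
    The update is scaled by alpha / r, following the original LoRA paper.
    """
    r = A.shape[0]
    scale = alpha / r
    return x @ W.T + scale * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
x = rng.standard_normal((3, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B initialized to zero, so the adapter starts as a no-op
y = lora_forward(x, W, A, B)  # identical to x @ W.T at initialization
```

Initializing `B` to zero means training starts exactly from the pre-trained model's behavior, which is why LoRA fine-tuning is stable even at small ranks.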
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)