Echo0125's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
programthink/books
E-book list collected by 【编程随想】 (Program-Think), spanning multiple disciplines, with download links
ruanyf/free-books
Free books on the Internet
KwaiVGI/LivePortrait
Bring portraits to life!
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
QwenLM/Qwen2
Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.
Dooy/chatgpt-web-midjourney-proxy
A single web UI for ChatGPT, Midjourney, GPTs, Suno, Luma, Runway, Viggle, Flux, Ideogram, realtime, and Pika, with simultaneous support for Web / PWA / Linux / Win / MacOS platforms
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
apple/ml-4m
4M: Massively Multimodal Masked Modeling
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
FoundationVision/OmniTokenizer
OmniTokenizer: a single model and a single set of weights for joint image-video tokenization.
IDEA-Research/MotionLLM
[arXiv 2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
baaivision/DIVA
Diffusion Feedback Helps CLIP See Better
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
zhaoyue-zephyrus/bsq-vit
[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
yuecao0119/MMInstruct
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
brown-palm/AntGPT
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
EasyRy/RepKPU
Point Cloud Upsampling with Kernel Point Representation and Deformation