Echo0125's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
programthink/books
E-book list collected by 【编程随想】 (Program-Think), spanning multiple disciplines, with download links
ruanyf/free-books
Free books on the Internet
KwaiVGI/LivePortrait
Bring portraits to life!
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
QwenLM/Qwen2
Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.
Dooy/chatgpt-web-midjourney-proxy
A single web UI for ChatGPT, Midjourney, GPTs, Suno, Luma, Runway, Viggle, Flux, Ideogram, realtime, and Pika, with simultaneous support for Web / PWA / Linux / Win / MacOS platforms
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
apple/ml-4m
4M: Massively Multimodal Masked Modeling
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
facebookresearch/MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
FoundationVision/OmniTokenizer
OmniTokenizer: a single model and a single set of weights for joint image-video tokenization.
IDEA-Research/MotionLLM
[arXiv 2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
baaivision/DIVA
Diffusion Feedback Helps CLIP See Better
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
zhaoyue-zephyrus/bsq-vit
[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
yuecao0119/MMInstruct
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
brown-palm/AntGPT
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
EasyRy/RepKPU
Point Cloud Upsampling with Kernel Point Representation and Deformation