VincentDENGP's Stars
meta-llama/llama
Inference code for Llama models
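A rough sketch of the repo's batch text-completion entry point, based on its example scripts; the checkpoint and tokenizer paths are placeholders, and the script is meant to be launched under torchrun:

```python
from llama import Llama

# Paths are placeholders; download weights per the repo's instructions.
# Run with: torchrun --nproc_per_node=1 this_script.py
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

results = generator.text_completion(
    ["The capital of France is"],
    max_gen_len=32,
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"])
```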
karpathy/LLM101n
LLM101n: Let's build a Storyteller
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
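Typical usage, following the tiktoken README (the encoding name and sample text are illustrative):

```python
import tiktoken

# Load the BPE encoding used by recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("tiktoken is fast")  # -> list of token ids
text = enc.decode(tokens)                # round-trips back to the string
assert text == "tiktoken is fast"

# Or look up the encoding for a specific model by name.
enc = tiktoken.encoding_for_model("gpt-4")
```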
mlfoundations/open_clip
An open source implementation of CLIP.
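A minimal zero-shot classification sketch in the style of the open_clip README; the model and pretrained tags are illustrative (the repo lists many others), and "cat.jpg" is a placeholder image:

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then score image-text similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)
```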
QwenLM/Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
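Qwen2 checkpoints load through Hugging Face transformers; a minimal chat sketch, assuming one of the released instruct variants as the model id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain BPE tokenisation."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```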
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal dialogue models
Kwai-Kolors/Kolors
Kolors Team
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Many more LMs are supported, including miniGPT4, StableLM, and MOSS.
Doubiiu/DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
OpenLLMAI/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
zhoubolei/bolei_awesome_posters
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
XueFuzhao/OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
mini-sora/minisora
MiniSora: a community project that aims to explore the implementation path and future development directions of Sora.
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Lightning-AI/litdata
Transform datasets at scale. Optimize datasets for fast AI model training.
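A hedged sketch of litdata's two-step workflow (optimize a dataset, then stream it during training); names follow the project README, and the transform function and output path are placeholders:

```python
from litdata import optimize, StreamingDataset, StreamingDataLoader

# Step 1: transform raw samples into an optimized, chunked format.
def square(i):
    return {"index": i, "value": i ** 2}  # any picklable dict works

if __name__ == "__main__":  # optimize() spawns worker processes
    optimize(
        fn=square,
        inputs=list(range(1000)),
        output_dir="optimized_data",  # local path or cloud URI
        chunk_bytes="64MB",
    )

    # Step 2: stream the optimized dataset during training.
    dataset = StreamingDataset("optimized_data")
    loader = StreamingDataLoader(dataset, batch_size=64)
    for batch in loader:
        pass  # feed into the training loop
```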
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
yuweihao/MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
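The "native PyTorch features" named here are FSDP sharding and scaled_dot_product_attention; a generic illustration of the two APIs under assumed toy shapes, not this repo's actual training script:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class TinyBlock(nn.Module):
    """Toy attention block using PyTorch's native SDPA."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Dispatches to Flash Attention v2 kernels when dtype/shape allow.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y)


if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=N this_script.py
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    # FSDP shards parameters, gradients, and optimizer state across ranks.
    model = FSDP(TinyBlock().cuda())
    out = model(torch.randn(2, 16, 64, device="cuda"))
```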
JUNJIE99/MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
imagegridworth/IG-VLM
PKU-YuanGroup/Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
jinyucn/Video-Streaming-Research-Papers
Research materials on multimedia networking and systems, including a paper list, tools, etc.