loveunk's Stars
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
XingangPan/DragGAN
Official Code for DragGAN (SIGGRAPH 2023)
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenBMB/ChatDev
Create customized software from a natural-language idea (through LLM-powered multi-agent collaboration)
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
stas00/ml-engineering
Machine Learning Engineering Open Book
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
huggingface/trl
Train transformer language models with reinforcement learning.
OpenBMB/XAgent
An Autonomous LLM Agent for Complex Task Solving
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
lencx/Noi
🚀 Power Your World with AI - Explore, Extend, Empower.
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
apple/ml-mgie
amazon-science/mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Yuliang-Liu/Monkey
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
punica-ai/punica
Serving multiple LoRA-finetuned LLMs as one
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Efficient-Large-Model/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Ucas-HaoranWei/Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
lupantech/ScienceQA
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
allenai/unified-io-2
Vision-CAIR/MiniGPT4-video
Official code for the Goldfish model (long video understanding) and MiniGPT4-video (short video understanding)
HyperGAI/HPT
HPT - Open Multimodal LLMs from HyperGAI