TheSouthFrog's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Stability-AI/generative-models
Generative Models by Stability AI
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Stability-AI/StableLM
StableLM: Stability AI Language Models
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
chflame163/ComfyUI_LayerStyle
A set of ComfyUI nodes for compositing layers and masks to achieve Photoshop-like functionality.
open-mmlab/Multimodal-GPT
Multimodal-GPT
OpenGenerativeAI/llm-colosseum
Benchmark LLMs by making them fight in Street Fighter 3! A new way to evaluate the quality of an LLM
hotshotco/Hotshot-XL
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
opendilab/LMDrive
[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
hzxie/CityDreamer
The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)
sambanova/bloomchat
This repo contains the data preparation, tokenization, training, and inference code for BLOOMChat. BLOOMChat is a 176-billion-parameter multilingual chat model based on BLOOM.
dvlab-research/3D-Box-Segment-Anything
We extend Segment Anything to 3D perception by combining it with VoxelNeXt.
ID-Animator/ID-Animator
dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
deepcs233/Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
yangjianxin1/LongQLoRA
LongQLoRA: Extend Context Length of LLMs Efficiently
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
LengSicong/Tell2Design
[ACL2023 Area Chair Award] Official repo for the paper "Tell2Design: A Dataset for Language-Guided Floor Plan Generation".
gyxxyg/TRACE
[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
smthemex/ComfyUI_ID_Animator
An ID_Animator node for ComfyUI
pHaeusler/codef-experiments
camenduru/ID-Animator-jupyter