1349949's Stars
xtekky/gpt4free
The official gpt4free repository | a collection of various powerful language models
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
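For reference, a minimal sketch of prompted inference with SAM, assuming the `segment_anything` package is installed and the ViT-H checkpoint (`sam_vit_h_4b8939.pth`) has been downloaded via the repo's links; the random image and the point prompt below are stand-in values, not part of the repo:

```python
# Minimal SAM prompted-inference sketch (assumes segment-anything is
# installed and sam_vit_h_4b8939.pth was downloaded from the repo).
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# set_image expects an HWC uint8 RGB array; a random image stands in
# for a real photo here.
image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
predictor.set_image(image)

# Prompt with one foreground point (label 1); with multimask_output=True
# SAM returns three candidate masks plus a quality score for each.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)  # (3, 480, 640) and three scores
```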
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.
LiLittleCat/awesome-free-chatgpt
🆓 List of free ChatGPT mirror sites, continuously updated.
Stability-AI/StableLM
StableLM: Stability AI Language Models
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
haoel/haoel.github.io
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources for LLMs (LLM tree, examples, papers)
deep-floyd/IF
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
OpenDriveLab/UniAD
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
GT-RIPL/Awesome-LLM-Robotics
A comprehensive list of papers using large language/multimodal models for robotics/RL, including papers, code, and related websites
OpenGVLab/Ask-Anything
[CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many other LMs, such as miniGPT4, StableLM, and MOSS.
xinyu1205/recognize-anything
Strong, open-source foundation models for image recognition.
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
baaivision/Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
lamini-ai/lamini
The Official Python Client for Lamini's API
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
chenking2020/FindTheChatGPTer
ChatGPT's explosive popularity marks a key step toward AGI. This project curates open-source alternatives to ChatGPT, including text-only and multimodal large models, as a convenient reference.
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
open-mmlab/Multimodal-GPT
Multimodal-GPT
microsoft/MM-REACT
Official repo for MM-REACT
exiawsh/StreamPETR
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
facebookresearch/SWAG
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
google-research/slot-attention-video
mit-han-lab/flatformer
[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer