hailin-shi's Stars
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
UX-Decoder/Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
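A minimal sketch of LAVIS's documented load-and-caption pattern; the `blip_caption` model name follows the library's model zoo, while the local image path is a placeholder:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a BLIP captioning model plus its matching image processors
# (any model registered in the LAVIS model zoo can be swapped in).
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")  # placeholder image path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

print(model.generate({"image": image}))  # a list with one caption string
```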
apple/ml-ferret
Surrey-UP-Lab/RegionSpot
Recognize Any Regions
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
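A sketch of the prompted-inference flow the repo documents, assuming the ViT-H checkpoint has already been downloaded; the image path and click coordinates are placeholders:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a downloaded checkpoint (filename from the repo's model zoo).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)  # HWC uint8 RGB
predictor.set_image(image)

# One foreground click (x, y) as the prompt; label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return three candidate masks
)
print(masks.shape, scores)  # (3, H, W) boolean masks with confidence scores
```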
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools so that you can focus on what matters.
baichuan-inc/Baichuan-7B
A large-scale 7B pretrained language model developed by Baichuan Inc.
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
meta-llama/llama
Inference code for Llama models
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment.
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models and generate the data.
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
lutzroeder/netron
Visualizer for neural network, deep learning and machine learning models
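Besides the desktop app, Netron ships a small Python API for launching the viewer from a script; a sketch assuming a local ONNX file:

```python
import netron

# Serve the visualizer for a local model file (ONNX, TFLite, Core ML, ...)
# and open it in the default browser; "model.onnx" is a placeholder path.
netron.start("model.onnx")
```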
CompVis/stable-diffusion
A latent text-to-image diffusion model
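The repo ships its own txt2img sampling scripts; as an alternative sketch (an assumption, not this repo's entry point), the same CompVis v1.4 weights can be sampled through Hugging Face diffusers:

```python
import torch
from diffusers import StableDiffusionPipeline

# Pull the CompVis v1.4 weights from the Hugging Face Hub and sample once.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```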
zalandoresearch/fashion-mnist
An MNIST-like fashion product database and benchmark.
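Because the files follow the MNIST format, any MNIST loader works unchanged; a sketch via torchvision's built-in dataset class:

```python
import torchvision
from torch.utils.data import DataLoader

# Drop-in replacement for MNIST: same 28x28 grayscale layout, 10 classes.
train_set = torchvision.datasets.FashionMNIST(
    root="data", train=True, download=True,
    transform=torchvision.transforms.ToTensor(),
)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels[:5])  # torch.Size([64, 1, 28, 28]) and 5 class ids
```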
cvlab-epfl/EPnP
EPnP: Efficient Perspective-n-Point Camera Pose Estimation
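EPnP is also available off the shelf in OpenCV (a separate reimplementation, not this repo's code); a sketch with made-up 3D-2D correspondences and intrinsics:

```python
import cv2
import numpy as np

# Four or more known 3D points and their 2D projections (placeholder values).
object_points = np.array(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 1]], dtype=np.float64
)
image_points = np.array(
    [[320, 240], [420, 240], [320, 340], [420, 340], [370, 290]], dtype=np.float64
)
# Placeholder pinhole intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(
    object_points, image_points, K, None, flags=cv2.SOLVEPNP_EPNP
)
print(ok, rvec.ravel(), tvec.ravel())  # rotation (Rodrigues vector) + translation
```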
lucasjinreal/yolov7_d2
(An earlier YOLOv7, not the official one) YOLO with Transformers and instance segmentation, with TensorRT acceleration.
Oneflow-Inc/libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
SmallStoneSK/github-star-trend
A Chrome extension for viewing a project's star growth trend.
JDAI-CV/CoTNet
An official implementation of "Contextual Transformer Networks for Visual Recognition".
JDAI-CV/fast-reid
SOTA Re-identification Methods and Toolbox
JDAI-CV/centerX
This repo implements CenterNet on top of detectron2.