zuo1188's Stars
microsoft/TaskMatrix
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
princeton-vl/infinigen
Infinite Photorealistic Worlds using Procedural Generation
amazon-science/mm-cot
Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)
OpenGVLab/Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus support for many more LMs such as MiniGPT-4, StableLM, and MOSS.
gligen/GLIGEN
Open-Set Grounded Text-to-Image Generation
microsoft/X-Decoder
[CVPR 2023] Official implementation of X-Decoder: generalized decoding for pixel, image, and language
wzzheng/TPVFormer
[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.
hustvl/MapTR
[ICLR'23 Spotlight & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
hustvl/VAD
[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
JeffWang987/OpenOccupancy
[ICCV 2023] OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
OpenDriveLab/OccNet
[ICCV 2023] OccNet: Scene as Occupancy
Vision-CAIR/ChatCaptioner
Official Repository of ChatCaptioner
DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
OpenGVLab/Instruct2Act
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
zhangyp15/OccFormer
[ICCV 2023] OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
atfortes/Awesome-Multimodal-Reasoning
Collection of papers and resources on Multimodal Reasoning, including Vision-Language Models, Multimodal Chain-of-Thought, Visual Inference, and others.
autonomousvision/nuplan_garage
[arXiv'23] Parting with Misconceptions about Learning-based Vehicle Motion Planning
PrieureDeSion/drive-any-robot
Official code and checkpoint release for "GNM: A General Navigation Model to Drive Any Robot".
Tsinghua-MARS-Lab/ViP3D
JonDoe-297/cross-view
[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation
JinkyuKimUCB/BDD-X-dataset
Berkeley Deep Drive-X (eXplanation) dataset
PrieureDeSion/visualnav-transformer
Official code and checkpoint release for "ViNT: A Foundation Model for Visual Navigation".
Vision-CAIR/3DCoMPaT-v2
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
tomguluson92/SCAT
SCAT: Stride Consistency with Auto-regressive regressor and Transformer for hand pose estimation (ICCVW 2021)