marinero4972's Stars
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 40+ benchmarks
datawhalechina/self-llm
《开源大模型食用指南》基于Linux环境快速部署开源大模型,更适合**宝宝的部署教程
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
catcathh/UltraPixel
Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
Open3DA/LL3DA
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
TangYuan96/MiniGPT-3D
[MM 2024] [Need a RTX 3090] MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
qizekun/ShapeLLM
[ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
ZiyuGuo99/Point-Bind_Point-LLM
Align 3D Point Cloud with Multi-modalities for Large Language Models
LLaVA-VL/LLaVA-NeXT
UMass-Foundation-Model/3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
GraphPKU/MachineLearning2024
NVIDIA/MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
pengsongyou/openscene
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
YunzeMan/Lexicon3D
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
ScanNet/ScanNet
dk-liang/UniSeg3D
[NeurIPS 2024] A Unified Framework for 3D Scene Understanding
OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
OpenRobotLab/Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
OpenRobotLab/PointLLM
[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds
52CV/CVPR-2024-Papers
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
3DTopia/LGM
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
skyhehe123/ScatterFormer
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024)
black-forest-labs/flux
Official inference repo for FLUX.1 models
deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
davidmrau/mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538