Here is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents. We also include other foundation models (e.g., CLIP, SAM) to give a fuller picture of this area.
This is an actively maintained repository; you can watch it to follow the latest advances. If you find it useful, please kindly star this repo.
- [2023-12-16] Xianzheng Ma and Yash Bhalgat curated this list and published the first version;
- [2024-01-06] Runsen Xu added chronological information, and Xianzheng Ma reorganized the list in reverse-chronological (newest-first) order to make it easier to follow the latest advances.
Date | keywords | Institute (first) | Paper | Publication | Others |
---|---|---|---|---|---|
2023-05-20 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
- | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |
Date | keywords | Institute | Paper | Publication | Others |
---|---|---|---|---|---|
2023-11-29 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | Arxiv | github |
2023-11-27 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | Arxiv | project |
2023-10-19 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | Arxiv | github |
2023-09-21 | LLMR | MIT | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | Arxiv | github |
2023-09-20 | DreamLLM | MEGVII | DreamLLM: Synergistic Multimodal Comprehension and Creation | Arxiv | github |
2023-04-01 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |
Date | keywords | Institute | Paper | Publication | Others |
---|---|---|---|---|---|
2023-12-26 | EmbodiedScan | Shanghai AI Lab | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | Arxiv | github |
2023-12-17 | M3DBench | Fudan University | M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts | Arxiv | github |
2023-11-29 | - | DeepMind | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Arxiv | github |
2022-10-14 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
2021-12-20 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2022 | github |
2020-12-03 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
2020-08-23 | ReferIt3D | Stanford | ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes | ECCV'2020 | github |
2019-12-18 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
Your contributions are always welcome!
I will keep some pull requests open if I'm not sure whether they belong in a list of 3D LLM work; you can vote for them by adding a 👍 reaction.
If you have any questions about this opinionated list, please get in touch at xianzheng@robots.ox.ac.uk or via WeChat ID: mxz1997112.
This repo is inspired by Awesome-LLM.