Text-conditioned 3D Generation

This page maintains a curated list of algorithms for text-conditioned 3D content generation, covering objects, scenes, and humans.

0. Surveys

[2023-arXiv] State of the Art on Diffusion Models for Visual Computing, [paper]

[2023-arXiv] AIGC for Various Data Modalities: A Survey, [paper]

[2023-arXiv] Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era, [paper]

[2023-arXiv] Text-guided Image-and-Shape Editing and Generation: A Short Survey, [paper]

1. Object

1.1 Text-conditioned 3D Object Generation

[2023-arXiv] DreamComposer: Controllable 3D Object Generation via Multi-View Conditions, [paper] [project]

[2023-arXiv] AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation, [paper] [project]

[2023-arXiv] XCube (X³): Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies, [paper] [project]

[2023-arXiv] Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes, [paper] [project]

[2023-arXiv] Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views, [paper] [project]

[2023-arXiv] GPT4Point: A Unified Framework for Point-Language Understanding and Generation, [paper] [project]

[2023-arXiv] X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation, [paper] [project]

[2023-arXiv] ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation, [paper] [project]

[2023-arXiv] ControlDreamer: Stylized 3D Generation with Multi-View ControlNet, [paper] [project]

[2023-arXiv] DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding, [paper]

[2023-arXiv] StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D, [paper]

[2023-arXiv] LucidDreaming: Controllable Object-Centric 3D Generation, [paper]

[2023-arXiv] GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs, [paper] [project]

[2023-SIGGRAPHAsia] HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image, [paper]

[2023-arXiv] DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling, [paper] [project]

[2023-arXiv] GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation, [paper] [project]

[2023-arXiv] RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D, [paper] [project]

[2023-arXiv] ET3D: Efficient Text-to-3D Generation via Multi-View Distillation, [paper]

[2023-arXiv] Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion, [paper] [project]

[2023-arXiv] Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning, [paper]

[2023-arXiv] MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation, [paper] [project]

[2023-arXiv] ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model, [paper] [project]

[2023-arXiv] MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers, [paper] [project]

[2023-arXiv] Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models, [paper] [project]

[2023-arXiv] GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise, [paper]

[2023-arXiv] FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation, [paper]

[2023-arXiv] LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching, [paper] [project]

[2023-arXiv] MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture, [paper] [project]

[2023-arXiv] One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion, [paper] [project]

[2023-arXiv] DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model, [paper] [project]

[2023-arXiv] Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, [paper] [project]

[2023-arXiv] Instant3D: Instant Text-to-3D Generation, [paper] [project]

[2023-arXiv] 3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation, [paper] [project]

[2023-MM] 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models, [paper] [project]

[2023-arXiv] Mesh Neural Cellular Automata, [paper] [project]

[2023-arXiv] Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video, [paper] [project]

[2023-MM] Control3D: Towards Controllable Text-to-3D Generation, [paper]

[2023-arXiv] LRM: Large Reconstruction Model for Single Image to 3D, [paper] [project]

[2023-NeurIPS] ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image, [paper] [project]

[2023-SIGGRAPHAsia&TOG] EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation, [paper] [project]

[2023-arXiv] Text-to-3D with Classifier Score Distillation, [paper] [project]

[2023-arXiv] Generative Neural Fields by Mixtures of Neural Implicit Functions, [paper]

[2023-arXiv] Wonder3D: Single Image to 3D using Cross-Domain Diffusion, [paper] [project]

[2023-arXiv] Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model, [paper] [project]

[2023-arXiv] DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior, [paper] [project]

[2023-arXiv] HyperFields: Towards Zero-Shot Generation of NeRFs from Text, [paper]

[2023-arXiv] Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts, [paper] [project]

[2023-arXiv] Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping, [paper] [project]

[2023-arXiv] ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion, [paper] [project]

[2023-arXiv] IPDreamer: Appearance-Controllable 3D Object Generation with Image Prompts, [paper]

[2023-arXiv] HiFi-123: Towards High-fidelity One Image to 3D Content Generation, [paper] [project]

[2023-arXiv] Consistent123: Improve Consistency for One Image to 3D Object Synthesis, [paper] [project]

[2023-arXiv] GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors, [paper] [project]

[2023-arXiv] SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D, [paper] [project]

[2023-arXiv] TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields, [paper] [project]

[2023-arXiv] DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation, [paper] [project]

[2023-arXiv] Text-to-3D using Gaussian Splatting, [paper] [project]

[2023-arXiv] Progressive Text-to-3D Generation for Automatic 3D Prototyping, [paper]

[2023-ICCVW] Looking at words and points with attention: a benchmark for text-to-shape coherence, [paper]

[2023-arXiv] Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following, [paper] [project]

[2023-arXiv] Chasing Consistency in Text-to-3D Generation from a Single Image, [paper]

[2023-arXiv] HoloFusion: Towards Photo-realistic 3D Generative Modeling, [paper]

[2023-arXiv] EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior, [paper]

[2023-arXiv] MVDream: Multi-view Diffusion for 3D Generation, [paper] [project]

[2023-arXiv] One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization, [paper] [project]

[2023-arXiv] Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors, [paper] [project]

[2023-arXiv] Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior, [paper] [project]

[2023-arXiv] Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation, [paper]

[2023-arXiv] Shap-E: Generating Conditional 3D Implicit Functions, [paper]

[2023-arXiv] MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR, [paper] [project]

[2023-arXiv] Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation, [paper] [project]

[2023-arXiv] Magic3D: High-Resolution Text-to-3D Content Creation, [paper] [project]

[2023-arXiv] TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision, [paper] [project]

[2023-arXiv] 3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models, [paper]

[2023-arXiv] Autodecoding Latent 3D Diffusion Models, [paper] [project]

[2023-arXiv] SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation, [paper] [project]

[2023-arXiv] Pushing the Limits of 3D Shape Generation at Scale, [paper]

[2023-arXiv] 3DGen: Triplane Latent Diffusion for Textured Mesh Generation, [paper]

[2023-arXiv] ATT3D: Amortized Text-to-3D Object Synthesis, [paper] [project]

[2023-arXiv] HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN, [paper]

[2023-arXiv] Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation, [paper]

[2023-arXiv] DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation, [paper]

[2023-arXiv] HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance, [paper] [project]

[2023-arXiv] ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation, [paper] [project]

[2023-arXiv] TextMesh: Generation of Realistic 3D Meshes From Text Prompts, [paper] [project]

[2023-arXiv] Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond, [paper]

[2023-arXiv] Debiasing Scores and Prompts of 2D Diffusion for Robust Text-to-3D Generation, [paper]

[2023-arXiv] Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation, [paper] [project]

[2023-arXiv] DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model, [paper] [project]

[2023-arXiv] Text-driven Visual Synthesis with Latent Diffusion Prior, [paper] [project]

[2022-arXiv] Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures, [paper]

[2022-arXiv] DreamFusion: Text-to-3D using 2D Diffusion, [paper] [project]

[2022-arXiv] Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation, [paper] [project]

[2022-arXiv] Understanding Pure CLIP Guidance for Voxel Grid NeRF Models, [paper] [project]

[2021-arXiv] Zero-Shot Text-Guided Object Generation with Dream Fields, [paper] [project]

[2023-arXiv] Deceptive-NeRF: Enhancing NeRF Reconstruction using Pseudo-Observations from Diffusion Models, [paper]

[2023-AAAI] 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation, [paper]

[2023-CVPR] Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models, [paper] [project]

[2023-arXiv] SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation, [paper] [project]

[2022-arXiv] Point·E: A System for Generating 3D Point Clouds from Complex Prompts, [paper] [project]

[2022-arXiv] LION: Latent Point Diffusion Models for 3D Shape Generation, [paper]

[2022-arXiv] Fast Point Cloud Generation with Straight Flows, [paper]

[2023-arXiv] Learning Versatile 3D Shape Generation with Improved AR Models, [paper]

[2023-arXiv] Zero3D: Semantic-Driven 3D Shape Generation For Zero-shot Learning, [paper]

[2023-ICLR] MeshDiffusion: Score-based Generative 3D Mesh Modeling, [paper] [project]

[2023-arXiv] ISS++: Image as Stepping Stone for Text-Guided 3D Shape Generation, [paper] [project]

[2022-arXiv] ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation, [paper] [project]

[2023-arXiv] CLIP-Mesh: Generating textured meshes from text using pretrained image-text models, [paper] [project]

[2018-arXiv] Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences, [paper]

[2023-arXiv] T2TD: Text-3D Generation Model based on Prior Knowledge Guidance, [paper]

[2023-arXiv] ZeroForge: Feedforward Text-to-Shape Without 3D Supervision, [paper] [project]

[2022-arXiv] Diffusion-SDF: Text-to-Shape via Voxelized Diffusion, [paper]

[2022-arXiv] ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model, [paper]

[2022-arXiv] CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language, [paper] [project]

[2021-arXiv] CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation, [paper] [project]

[2022-arXiv] Towards Implicit Text-Guided 3D Shape Generation, [paper] [project]

[2019-arXiv] Generation High resolution 3D model from natural language by Generative Adversarial Network, [paper]

[2018-arXiv] Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings, [paper] [project]

[2023-arXiv] MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation, [paper] [project]

[2023-arXiv] 3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models, [paper] [project]

[2023-arXiv] Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation, [paper] [project]

[2023-arXiv] Joint Representation Learning for Text and 3D Point Cloud, [paper]

[2023-arXiv] CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data, [paper]

[2023-arXiv] ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding, [paper] [project]

[2022-arXiv] ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding, [paper] [project]

[2022-arXiv] Neural Shape Compiler: A Unified Framework for Transforming between Text, Point Cloud, and Program, [paper]

[2023-arXiv] Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words, [paper]

1.2 Text-conditioned 3D Object Editing

[2023-arXiv] Consistent Latent Diffusion for Mesh Texturing, [paper]

[2023-arXiv] TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes, [paper]

[2023-arXiv] Mesh-Guided Neural Implicit Field Editing, [paper] [project]

[2023-arXiv] SPiC·E: Structural Priors in 3D Diffusion Models using Cross-Entity Attention, [paper] [project]

[2023-arXiv] Posterior Distillation Sampling, [paper] [project]

[2023-arXiv] EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth, [paper]

[2023-arXiv] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions, [paper] [project]

[2023-arXiv] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting, [paper] [project]

[2023-arXiv] Advances in 3D Neural Stylization: A Survey, [paper]

[2023-arXiv] Text-Guided Texturing by Synchronized Multi-View Diffusion, [paper]

[2023-arXiv] TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models, [paper] [project]

[2023-arXiv] ITEM3D: Illumination-Aware Directional Texture Editing for 3D Models, [paper]

[2023-arXiv] InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions, [paper]

[2023-arXiv] FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly, [paper] [project]

[2023-arXiv] Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields, [paper]

[2023-arXiv] TextDeformer: Geometry Manipulation using Text Guidance, [paper]

[2023-arXiv] CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration, [paper]

[2023-arXiv] Vox-E: Text-guided Voxel Editing of 3D Objects, [paper] [project]

[2023-arXiv] DreamBooth3D: Subject-Driven Text-to-3D Generation, [paper] [project]

[2023-arXiv] RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models, [paper]

[2023-arXiv] Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields, [paper] [project]

[2023-arXiv] DreamEditor: Text-Driven 3D Scene Editing with Neural Fields, [paper]

[2023-arXiv] SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field, [paper] [project]

[2023-arXiv] SKED: Sketch-guided Text-based 3D Editing, [paper]

[2022-arXiv] LADIS: Language Disentanglement for 3D Shape Editing, [paper]

[2023-arXiv] MatFuse: Controllable Material Generation with Diffusion Models, [paper]

[2023-arXiv] Texture Generation on 3D Meshes with Point-UV Diffusion, [paper]

[2023-arXiv] Generating Parametric BRDFs from Natural Language Descriptions, [paper]

[2023-arXiv] Text-guided High-definition Consistency Texture Model, [paper]

[2023-arXiv] X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance, [paper] [project]

[2023-arXiv] Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion, [paper] [project]

[2023-arXiv] Text2Tex: Text-driven Texture Synthesis via Diffusion Models, [paper] [project]

[2023-arXiv] TEXTure: Text-Guided Texturing of 3D Shapes, [paper] [project]

[2022-arXiv] 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models, [paper] [project]

[2022-arXiv] TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition, [paper] [project]

[2021-arXiv] Text2Mesh: Text-Driven Neural Stylization for Meshes, [paper] [project]

[2020-arXiv] Convolutional Generation of Textured 3D Meshes, [paper] [project]

2. Scene

2.1 Text-conditioned 3D Scene Generation

[2023-arXiv] CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting, [paper] [project]

[2023-arXiv] 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling, [paper] [project]

[2023-arXiv] A Unified Approach for Text- and Image-guided 4D Scene Generation, [paper]

[2023-arXiv] Animate124: Animating One Image to 4D Dynamic Scene, [paper] [project]

[2023-arXiv] Pyramid Diffusion for Fine 3D Large Scene Generation, [paper] [project]

[2023-arXiv] LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes, [paper]

[2023-NeurIPS] Language-driven Scene Synthesis using Multi-conditional Diffusion Model, [paper] [project]

[2023-arXiv] DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation, [paper] [project]

[2024-3DV] RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation, [paper] [project]

[2023-arXiv] 3D-GPT: Procedural 3D Modeling with Large Language Models, [paper] [project]

[2023-arXiv] Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints, [paper]

[2023-arXiv] Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions, [paper] [project]

[2023-arXiv] DiffuScene: Scene Graph Denoising Diffusion Probabilistic Model for Generative Indoor Scene Synthesis, [paper] [project]

[2023-arXiv] LayoutGPT: Compositional Visual Planning and Generation with Large Language Models, [paper] [project]

[2023-arXiv] Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models, [paper] [project]

[2023-arXiv] Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields, [paper] [project]

[2023-arXiv] Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback, [paper]

[2023-arXiv] Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes, [paper]

[2023-arXiv] Compositional 3D Scene Generation using Locally Conditioned Diffusion, [paper] [project]

[2023-arXiv] CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout, [paper]

[2023-arXiv] Text-To-4D Dynamic Scene Generation, [paper] [project]

[2022-arXiv] GAUDI: A Neural Architect for Immersive 3D Scene Generation, [paper] [project]

[2020-arXiv] SceneFormer: Indoor Scene Generation with Transformers, [paper]

[2020-arXiv] Static and Animated 3D Scene Generation from Free-form Text Descriptions, [paper] [project]

[2017-arXiv] SceneSeer: 3D Scene Design with Natural Language, [paper]

[2017-arXiv] SceneSuggest: Context-driven 3D Scene Design, [paper]

[2015-arXiv] Text to 3D Scene Generation with Rich Lexical Grounding, [paper]

2.2 Text-conditioned 3D Scene Editing

[2023-arXiv] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training, [paper] [project]

[2023-arXiv] SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors, [paper] [project]

[2023-arXiv] ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context, [paper] [project]

[2023-arXiv] ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF, [paper]

[2023-arXiv] Text-driven Editing of 3D Scenes without Retraining, [paper] [project]

[2023-arXiv] Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details, [paper]

[2023-arXiv] CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer, [paper]

[2023-arXiv] OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields, [paper]

[2023-arXiv] InpaintNeRF360: Text-Guided 3D Inpainting on Unbounded Neural Radiance Fields, [paper]

[2023-arXiv] RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture, [paper]

[2023-arXiv] CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding, [paper]

[2023-arXiv] SceneScape: Text-Driven Consistent Scene Generation, [paper] [project]

3. Human

3.1 Text-conditioned 3D Human Generation

[2023-arXiv] Text-Guided 3D Face Synthesis - From Generation to Editing, [paper] [project]

[2023-arXiv] AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text, [paper] [project]

[2023-arXiv] Gaussian Shell Maps for Efficient 3D Human Generation, [paper] [project]

[2023-arXiv] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting, [paper] [project]

[2023-arXiv] Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images, [paper] [project]

[2023-NeurIPS] XAGen: 3D Expressive Human Avatars Generation, [paper] [project]

[2023-arXiv] HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation, [paper] [project]

[2023-arXiv] TECA: Text-Guided Generation and Editing of Compositional 3D Avatars, [paper] [project]

[2023-arXiv] Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model, [paper] [project]

[2023-arXiv] Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images, [paper]

[2023-arXiv] DreamHuman: Animatable 3D Avatars from Text, [paper] [project]

[2023-arXiv] TeCH: Text-guided Reconstruction of Lifelike Clothed Humans, [paper]

[2023-arXiv] Guide3D: Create 3D Avatars from Text and Image Guidance, [paper]

[2023-arXiv] Semantify: Simplifying the Control of 3D Morphable Models using CLIP, [paper]

[2023-arXiv] AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose, [paper]

[2023-arXiv] Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models, [paper]

[2023-arXiv] AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion, [paper]

[2023-arXiv] AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation, [paper] [project]

[2023-arXiv] Text-guided 3D Human Generation from 2D Collections, [paper] [project]

[2023-arXiv] High-Fidelity 3D Face Generation from Natural Language Descriptions, [paper]

[2023-arXiv] DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models, [paper] [project]

[2023-arXiv] DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance, [paper] [project]

[2023-arXiv] 3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion, [paper]

[2023-arXiv] StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation, [paper] [project]

[2023-arXiv] AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control, [paper] [project]

[2023-arXiv] Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation, [paper]

[2023-arXiv] AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation, [paper]

[2023-arXiv] Text-Conditional Contextualized Avatars For Zero-Shot Personalization, [paper]

[2023-arXiv] Text2Face: A Multi-Modal 3D Face Model, [paper]

[2023-arXiv] Towards Realistic Generative 3D Face Models, [paper] [project]

3.2 Text-conditioned 3D Human Editing

[2023-arXiv] PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via Denoised Score Distillation, [paper]

[2023-arXiv] FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields, [paper]

[2023-arXiv] ClipFace: Text-guided Editing of Textured 3D Morphable Models, [paper]

[2023-arXiv] AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars, [paper]

[2023-arXiv] Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions, [paper]

[2023-arXiv] Local 3D Editing via 3D Distillation of CLIP Knowledge, [paper]

[2023-arXiv] Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model, [paper]

[2023-arXiv] HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks, [paper]

[2023-arXiv] DreamWaltz: Make a Scene with Complex 3D Animatable Avatars, [paper]

[2023-arXiv] HeadSculpt: Crafting 3D Head Avatars with Text, [paper] [project]

[2022-arXiv] Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion, [paper] [project]

[2022-arXiv] AvatarGen: A 3D Generative Model for Animatable Human Avatar, [paper] [project]

[2022-arXiv] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars, [paper] [project]

[2023-arXiv] NeRF-Art: Text-Driven Neural Radiance Fields Stylization, [paper] [project]

[2023-arXiv] Text and Image Guided 3D Avatar Generation and Manipulation, [paper] [project]

4. Datasets

[2023-arXiv] T3Bench: Benchmarking Current Progress in Text-to-3D Generation, [paper] [project]

[2023-arXiv] Objaverse-XL: A Universe of 10M+ 3D Objects, [paper]

[2023-arXiv] Infinite Photorealistic Worlds using Procedural Generation, [paper] [project]

[2023-arXiv] Scalable 3D Captioning with Pretrained Models, [paper] [project]

[2023-arXiv] UniG3D: A Unified 3D Object Generation Dataset, [paper] [project]

[2023-arXiv] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation, [paper] [project]

[2023-arXiv] Objaverse: A Universe of Annotated 3D Objects, [paper] [project]

[2023-arXiv] RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars, [paper] [project]

5. Experts

Hao Su (UC San Diego): 3D Deep Learning

Matthias Nießner (TUM): 3D Reconstruction, Semantic 3D Scene Understanding