
CLIPMesh

This is an unofficial implementation of CLIPMesh (https://arxiv.org/abs/2203.13333), a method for generating textured 3D meshes from text prompts using CLIP. This repo is based on CLIPMesh-SMPLX and uses nvdiffmodeling for differentiable mesh rendering.
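
At a high level, the method optimizes mesh parameters (vertex positions and texture maps) so that rendered views of the mesh score highly against the text prompt under CLIP. The sketch below is a minimal illustration of that loop, not this repo's actual code: the CLIP calls use the standard openai/CLIP API, while render_views is a hypothetical stand-in for the nvdiffmodeling rendering path.

import torch
import torch.nn.functional as F
import clip  # openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Target text embedding, computed once (no gradients needed here).
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(["a red chair"]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Differentiable scene parameters (shapes are illustrative).
vertex_offsets = torch.zeros(1000, 3, device=device, requires_grad=True)
texture = torch.rand(1, 3, 512, 512, device=device, requires_grad=True)
optimizer = torch.optim.Adam([vertex_offsets, texture], lr=1e-2)

def render_views(vertex_offsets, texture, batch=4):
    # Hypothetical stand-in for the nvdiffmodeling render path. A real
    # implementation rasterizes the deformed mesh from random camera poses;
    # this dummy just returns a differentiable image batch so the sketch
    # runs end-to-end.
    img = F.interpolate(texture, size=(224, 224), mode="bilinear",
                        align_corners=False)
    img = img + 0.001 * vertex_offsets.mean()  # keep both params in the graph
    return img.expand(batch, -1, -1, -1)

for step in range(200):
    images = render_views(vertex_offsets, texture)
    image_feat = model.encode_image(images)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)

    # Maximize cosine similarity between rendered views and the prompt.
    loss = 1.0 - (image_feat @ text_feat.T).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()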

The results generated using this repo are currently hit-or-miss: for the example prompts shown in the paper and on the project website, some outputs are comparable to the published results, while others are significantly worse. Final mesh quality is not great, but it appears comparable to Figure 3(i) and (j) in the paper. For details on how this implementation may differ from the paper, see the Issues page. If you have suggestions or find particularly good hyperparameters, feel free to open an issue or PR and I'll gladly merge them in.

Install

# Clone recursively
git clone --recurse-submodules git@github.com:cpacker/CLIPMesh.git
cd CLIPMesh

# Set up PyTorch and cudatoolkit using conda
conda create -n clipmesh-py37 python=3.7
conda activate clipmesh-py37
conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=10.2 -c pytorch

# Install the rest of the deps
pip install -r requirements.txt

Usage

Example with a prompt from the paper that works well:

python clipmesh.py --text "a matte painting of a bonsai tree; trending on artstation"

Example with a prompt from the paper that is significantly worse:

python clipmesh.py --text "a cowboy hat"

Options

  • The config file configs/arxiv.yaml is intended to mimic the algorithm described in the paper as closely as possible (there is currently no official public implementation, so many details may be off).
  • The default config (loaded without a --path arg) differs from the algorithm described in the paper in a few ways, e.g., a total variation (TV) weight on the texture maps (used in the CLIPMesh-SMPLX repo) and negative prompts for "face" and "text" (without them, I found CLIP had a tendency to "stamp" faces and text into both textures and geometry); see the sketch after this list.
  • configs/improved.yaml has more experimental changes aimed at matching the performance in the paper (i.e., producing reasonable-looking objects for all the prompts shown in the paper and website).
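
For a rough sense of how those two tweaks enter the objective, here is a sketch of the corresponding loss terms. The negative-prompt strings come from the list above, but the 0.5 weight is an illustrative assumption, not this repo's actual setting.

import torch
import clip  # openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def tv_loss(texture):
    # Total variation on a (1, 3, H, W) texture map: penalizes differences
    # between neighboring texels to discourage high-frequency noise.
    dh = (texture[..., 1:, :] - texture[..., :-1, :]).abs().mean()
    dw = (texture[..., :, 1:] - texture[..., :, :-1]).abs().mean()
    return dh + dw

def clip_loss(image_feat, prompt, negatives=("face", "text"), neg_weight=0.5):
    # Pull renders toward the target prompt and push them away from the
    # negative prompts. The 0.5 weight is illustrative.
    # image_feat: (B, D) normalized CLIP image features.
    with torch.no_grad():
        tokens = clip.tokenize([prompt, *negatives]).to(device)
        text_feat = model.encode_text(tokens)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sims = image_feat @ text_feat.T  # (B, 1 + num_negatives)
    return (1.0 - sims[:, 0].mean()) + neg_weight * sims[:, 1:].mean()

# Schematically, the total objective becomes:
#   loss = clip_loss(image_feat, prompt) + tv_weight * tv_loss(texture)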

To override a setting from the config file, simply pass the setting as an extra argument:

# To see all options:
python clipmesh.py --help
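
For example (the epochs option name below is an illustrative assumption; --help lists the actual options):

# Override the prompt and an optimization setting from the command line:
python clipmesh.py --text "a red chair" --epochs 2000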

Example prompts

Prompts to try from the paper:

  • (Figure 2) "a christmas tree with a star on top"
  • (Figure 3a) "a 🛸"
  • (Figure 3b) "thors hammer"
  • (Figure 3c) "a red and blue fire hydrant with flowers round it."
  • (Figure 3d) "a cowboy hat"
  • (Figure 3e) "a red chair"
  • (Figure 3g) "a matte painting of a bonsai tree; trending on artstation"

Prompts to try from the website:

  • an armchair in the shape of an avocado
  • a lamp shade
  • a wooden table
  • a 🥞
  • a colorful crotchet candle
  • a pyramid of giza
  • a professional high quality emoji of a lovestruck cup of boba.
  • matte painting of a bonsai tree; trending on artstation
  • a red and blue fire hydrant with flowers around it.
  • a cowboy hat
  • a redbull can
  • a UFO
  • a milkshake
  • salvador dali
  • a table with oranges on it

License and acknowledgements

This codebase was built using the CLIPMesh-SMPLX repo (which is also MIT licensed) and similarly makes heavy use of nvdiffmodeling, which is released under the NVIDIA Source Code License. For more details regarding the NVIDIA license, please visit their repo.