/SparseDiff

Official implementation for 'Sparse denoising diffusion for large graph generation'

Primary LanguagePythonMIT LicenseMIT

Sparse denoising diffusion for large graph generation

Official code for the paper, "Sparse Training of Discrete Diffusion Models for Graph Generation," available here.

Checkpoints to reproduce the results can be found at this link. Please refer to the updated version of our paper on arXiv.

Environment installation

This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1

  • Download anaconda/miniconda if needed

  • Create a rdkit environment that directly contains rdkit:

    conda create -c conda-forge -n sparse rdkit=2023.03.2 python=3.9

  • conda activate sparse

  • Check that this line does not return an error:

    python3 -c 'from rdkit import Chem'

  • Install graph-tool (https://graph-tool.skewed.de/):

    conda install -c conda-forge graph-tool=2.45

  • Check that this line does not return an error:

    python3 -c 'import graph_tool as gt'

  • Install the nvcc drivers for your cuda version. For example:

    conda install -c "nvidia/label/cuda-11.8.0" cuda

  • Install a corresponding version of pytorch, for example:

    pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118

  • Install other packages using the requirement file:

    pip install -r requirements.txt

  • Install mini-moses:

    pip install git+https://github.com/igor-krawczuk/mini-moses

  • Run:

    pip install -e .

  • Navigate to the ./sparse_diffusion/analysis/orca directory and compile orca.cpp:

    g++ -O2 -std=c++11 -o orca orca.cpp

Run the code

  • All code is currently launched through python3 main.py. Check hydra documentation (https://hydra.cc/) for overriding default parameters.
  • To run the debugging code: python3 main.py +experiment=debug.yaml. We advise to try to run the debug mode first before launching full experiments.
  • To run a code on only a few batches: python3 main.py general.name=test.
  • You can specify the dataset with python3 main.py dataset=guacamol. Look at configs/dataset for the list of datasets that are currently available
  • You can specify the edge fraction (denoted as $\lambda$ in the paper) with python3 main.py model.edge_fraction=0.2 to control the GPU-usage

Cite the paper

@misc{qin2023sparse,
      title={Sparse Training of Discrete Diffusion Models for Graph Generation}, 
      author={Yiming Qin and Clement Vignac and Pascal Frossard},
      year={2023},
      eprint={2311.02142},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Troubleshooting

PermissionError: [Errno 13] Permission denied: 'SparseDiff/sparse_diffusion/analysis/orca/orca': You probably did not compile orca.