MACE implementation in PyTorch

MACE

This repository contains the MACE reference implementation developed by Ilyes Batatia, Gregor Simm, and David Kovacs.

Installation

Requirements:

conda installation

If you do not have CUDA pre-installed, it is recommended to follow the conda installation process:

# Create a virtual environment and activate it
conda create --name mace_env
conda activate mace_env

# Install PyTorch
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c conda-forge

# Clone and install MACE (and all required packages); use a token if the repository is still private
git clone git@github.com:ACEsuit/mace.git
pip install ./mace

pip installation

To install via pip, follow the steps below:

# Create a virtual environment and activate it
python -m venv mace-venv
source mace-venv/bin/activate

# Install PyTorch (for example, for CUDA 10.2 [cu102])
pip install torch==1.8.2 --extra-index-url "https://download.pytorch.org/whl/lts/1.8/cu102"

# Clone and install MACE (and all required packages)
git clone git@github.com:ACEsuit/mace.git
pip install ./mace

Note: The homonymous package on PyPI has nothing to do with this one.

Usage

Training

To train a MACE model, you can use the run_train.py script:

python ./mace/scripts/run_train.py \
    --name="MACE_model" \
    --train_file="train.xyz" \
    --valid_fraction=0.05 \
    --test_file="test.xyz" \
    --config_type_weights='{"Default":1.0}' \
    --E0s='{1:-13.663181292231226, 6:-1029.2809654211628, 7:-1484.1187695035828, 8:-2042.0330099956639}' \
    --model="MACE" \
    --hidden_irreps='128x0e + 128x1o' \
    --r_max=5.0 \
    --batch_size=10 \
    --max_num_epochs=1500 \
    --swa \
    --start_swa=1200 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --restart_latest \
    --device=cuda

To give a specific validation set, use the argument --valid_file. To set a larger batch size for evaluating the validation set, specify --valid_batch_size.

To control the model's size, you need to change --hidden_irreps. For most applications, the recommended default model size is --hidden_irreps='256x0e' (meaning 256 invariant messages) or --hidden_irreps='128x0e + 128x1o'. If the model is not accurate enough, you can include higher order features, e.g., 128x0e + 128x1o + 128x2e, or increase the number of channels to 256.
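
To get a feel for how --hidden_irreps affects model size, the sketch below counts the number of feature components per atom implied by an irreps string. It assumes the e3nn-style notation `<multiplicity>x<degree><parity>`, where an irrep of degree l contributes 2l + 1 components. The helpers `parse_irreps` and `feature_dimension` are hypothetical and written only for this illustration; they are not part of MACE.

```python
import re


def parse_irreps(irreps):
    """Parse a string such as '128x0e + 128x1o' into (multiplicity, l, parity) tuples."""
    terms = []
    for term in irreps.split("+"):
        match = re.fullmatch(r"\s*(?:(\d+)x)?(\d+)([eo])\s*", term)
        if match is None:
            raise ValueError(f"Cannot parse irrep term: {term!r}")
        # A missing multiplicity is treated as 1
        multiplicity = int(match.group(1)) if match.group(1) else 1
        terms.append((multiplicity, int(match.group(2)), match.group(3)))
    return terms


def feature_dimension(irreps):
    """Total number of components: each irrep of degree l has 2l + 1 of them."""
    return sum(mul * (2 * l + 1) for mul, l, _ in parse_irreps(irreps))


for spec in ("256x0e", "128x0e + 128x1o", "128x0e + 128x1o + 128x2e"):
    print(spec, "->", feature_dimension(spec))
```

Adding the 128x2e block more than doubles the per-atom feature count relative to 128x0e + 128x1o, which is one reason to try the smaller settings first.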

It is usually preferable to add the isolated atoms to the training set rather than passing their energies on the command line as in the example above. To label them in the training set, set config_type=IsolatedAtom in their info fields. If you prefer not to use the isolated-atom energies, or do not know them, you can use the option --E0s="average", which estimates the atomic energies using least squares regression.
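
The idea behind a least squares estimate of atomic energies can be sketched as follows: model each total energy as the sum of per-element energies weighted by how many atoms of that element the configuration contains, then solve for the per-element energies. This is a minimal pure-Python illustration of that idea, not MACE's actual implementation, which may differ in its details. The functions `solve_linear_system` and `estimate_e0s` and the toy data are assumptions made for this example.

```python
from collections import Counter


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Singular system: compositions do not determine all E0s")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        partial = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - partial) / aug[row][row]
    return x


def estimate_e0s(configs):
    """configs: list of (atomic_numbers, total_energy) pairs. Returns {Z: E0}."""
    elements = sorted({z for numbers, _ in configs for z in numbers})
    # Composition matrix: one row per configuration, one column per element
    rows = []
    for numbers, _ in configs:
        counts = Counter(numbers)
        rows.append([counts.get(z, 0) for z in elements])
    energies = [energy for _, energy in configs]
    n = len(elements)
    # Normal equations (A^T A) e0 = A^T y of the least squares problem
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, energies)) for i in range(n)]
    return dict(zip(elements, solve_linear_system(ata, aty)))


# Toy data built from E0(H) = -1.0 and E0(O) = -10.0, which the fit should recover
toy_configs = [([1, 1, 8], -12.0), ([1, 1], -2.0), ([8, 8], -20.0)]
print(estimate_e0s(toy_configs))
```

On real data the total energies also contain binding contributions, so the fitted values are averages rather than true isolated-atom energies, which is why supplying the isolated atoms is usually preferable.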

If the keyword --swa is enabled, the energy weight of the loss is increased for the last ~20% of the training epochs (from --start_swa epochs). This setting usually helps lower the energy errors.
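
As a small worked example of the "last ~20%" rule of thumb, the hypothetical helper below derives a value for --start_swa from --max_num_epochs. The function name and the default of 20 percent are assumptions for illustration; only the two command-line flags come from the text above.

```python
def swa_start_epoch(max_num_epochs, swa_percent=20):
    """Epoch at which the final swa_percent of training begins."""
    if not 0 < swa_percent < 100:
        raise ValueError("swa_percent must be between 0 and 100 (exclusive)")
    # Integer arithmetic avoids floating point rounding surprises
    return max_num_epochs - (max_num_epochs * swa_percent) // 100


print(swa_start_epoch(1500))  # 1200, matching --start_swa=1200 in the example above
```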

The precision can be changed using the keyword --default_dtype. The default is float64, but float32 gives a significant speed-up (usually about a factor of 2 in training).

The keywords --batch_size and --max_num_epochs should be adapted to the size of the training set. As the number of training configurations grows, the batch size should be increased and the number of epochs decreased. A heuristic for the initial settings is to keep the total number of gradient updates at roughly 200 000, which can be computed as $\text{max-num-epochs} \times \frac{\text{num-configs-training}}{\text{batch-size}}$.
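
The heuristic above can be inverted to suggest a starting value for --max_num_epochs. The following is a minimal sketch; the function name `suggest_max_num_epochs` and the dataset sizes in the usage lines are hypothetical, and the result is only an initial setting to be tuned.

```python
def suggest_max_num_epochs(num_train_configs, batch_size, target_updates=200_000):
    """Solve max_num_epochs * num_train_configs / batch_size = target_updates."""
    if num_train_configs <= 0 or batch_size <= 0:
        raise ValueError("num_train_configs and batch_size must be positive")
    # Always train for at least one epoch
    return max(1, round(target_updates * batch_size / num_train_configs))


# Hypothetical dataset sizes, for illustration only
print(suggest_max_num_epochs(num_train_configs=500, batch_size=10))
print(suggest_max_num_epochs(num_train_configs=10_000, batch_size=32))
```

With these inputs the larger dataset ends up with fewer epochs even though its batch size is larger, which is the trend described above.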

Evaluation

To evaluate your MACE model on an XYZ file, run the eval_configs.py script:

python3 ./mace/scripts/eval_configs.py \
    --configs="your_configs.xyz" \
    --model="your_model.model" \
    --output="./your_output.xyz"

Tutorial

You can run our Colab tutorial to quickly get started with MACE.

Development

We use black, isort, pylint, and mypy. Run the following to format and check your code:

bash ./scripts/run_checks.sh

We have CI set up to check this, but we highly recommend that you run those commands before you commit (and push) to avoid accidentally committing bad code.

We are happy to accept pull requests under an MIT license. Please copy/paste the license text as a comment into your pull request.

References

If you use this code, please cite our papers:

@misc{Batatia2022MACE,
  title = {MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields},
  author = {Batatia, Ilyes and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Simm, Gregor N. C. and Ortner, Christoph and Cs{\'a}nyi, G{\'a}bor},
  year = {2022},
  number = {arXiv:2206.07697},
  eprint = {2206.07697},
  eprinttype = {arxiv},
  doi = {10.48550/ARXIV.2206.07697},
  archiveprefix = {arXiv}
}
@misc{Batatia2022Design,
  title = {The Design Space of E(3)-Equivariant Atom-Centered Interatomic Potentials},
  author = {Batatia, Ilyes and Batzner, Simon and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Musaelian, Albert and Simm, Gregor N. C. and Drautz, Ralf and Ortner, Christoph and Kozinsky, Boris and Cs{\'a}nyi, G{\'a}bor},
  year = {2022},
  number = {arXiv:2205.06643},
  eprint = {2205.06643},
  eprinttype = {arxiv},
  doi = {10.48550/arXiv.2205.06643},
  archiveprefix = {arXiv}
}

Contact

If you have any questions, please contact us at ilyes.batatia@ens-paris-saclay.fr.

For bugs or feature requests, please use GitHub Issues.

License

MACE is published and distributed under the Academic Software License v1.0.

Initialize dataset and run experiment

git clone https://github.com/davkovacs/BOTNet-datasets.git
ls BOTNet-datasets/dataset_3BPA
python ./mace/scripts/run_train.py \
  --name="MACE_model" \
  --train_file="BOTNet-datasets/dataset_3BPA/train_300K.xyz" \
  --valid_fraction=0.05 \
  --test_file="BOTNet-datasets/dataset_3BPA/test_300K.xyz" \
  --E0s='{1:-13.663181292231226, 6:-1029.2809654211628, 7:-1484.1187695035828, 8:-2042.0330099956639}' \
  --model="MACE" \
  --hidden_irreps='128x0e + 128x1o' \
  --r_max=5.0 \
  --batch_size=10 \
  --max_num_epochs=1500 \
  --swa \
  --start_swa=1200 \
  --ema \
  --ema_decay=0.99 \
  --amsgrad \
  --default_dtype="float32" \
  --device=cuda \
  --seed=123 \
  | tee mace_original.log &