DeePMD-kit(PaddlePaddle backend)

Important

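This project is the PaddlePaddle version of DeePMD-kit. Parts of the code have been modified so that training, evaluation, model export, and LAMMPS inference can run with PaddlePaddle (GPU) as the backend. The examples currently supported are listed below.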

example	Train	Test	Export	LAMMPS
water/se_e2_a	✅	✅	✅	✅
spin/se_e2_a	✅	✅	✅	TODO

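1. Environment setup

Install TensorFlow 2.12

Much of the DeePMD-kit code is still written against TensorFlow and has not yet been fully migrated to PaddlePaddle, so TensorFlow 2.12 must be installed before running.

# Current stable release for CPU and GPU (the same command covers both; the GPU build is no longer installed through a separate tensorflow-gpu package)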
pip install tensorflow==2.12 -i https://pypi.tuna.tsinghua.edu.cn/simple

Install paddlepaddle-develop

Follow the Paddle official website to install the GPU build of paddlepaddle-develop that matches your machine environment.

Install deepmd-kit

git clone https://github.com/deepmodeling/deepmd-kit.git -b paddle2
cd deepmd-kit
# Install in editable mode to make debugging easier
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
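
Once the three packages above are installed, a quick import check can catch a broken environment early. The sketch below is not part of the project: the helper names can_import and check_environment are hypothetical, and it assumes only that the packages expose the import names tensorflow, paddle, and deepmd.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeePMD-kit): succeed only if the given
# Python module can be imported by the interpreter in ${PYTHON:-python}.
can_import() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Report the import status of each package installed in this section.
check_environment() {
    rc=0
    for module in tensorflow paddle deepmd; do
        if can_import "$module"; then
            echo "ok:      $module"
        else
            echo "missing: $module"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage (commented out so that this block has no side effects):
# check_environment || echo "fix the installs above before continuing"
```

For a fuller check of the GPU setup, Paddle also provides paddle.utils.run_check(), which can be called from a Python prompt.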

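2. Running each feature

2.1 Install the Python custom operators

Before running training, evaluation, or static graph model export, first install the Python custom operator library paddle_deepmd_lib. LAMMPS inference does not need it, because that path compiles the custom operator source code directly as part of the joint build.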

cd ./source/lib/paddle_src
python custom_op_install.py install

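After installation, it is recommended to run the following commands to check that the Python custom operators produce correct results on both CPU and GPU: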

wget -nc https://paddle-org.bj.bcebos.com/paddlescience/deepmd/deepmd_custom_op_test_data.tar
tar -xf deepmd_custom_op_test_data.tar
export UNITTEST_DIR=$PWD/deepmd_custom_op_test_data
python ./custom_op_test.py

Apart from a few deprecation warnings, if every output is True, the Python custom operators are installed and working correctly.
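
The pass criterion above can be checked mechanically. The sketch below assumes that the test script prints the literal words True and False and that its output has been saved to a log file; the exact output format is not specified here, and the function name and log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the log contains at least one "True"
# and no "False". The True/False output format is an assumption.
custom_op_log_ok() {
    log_file=$1
    # Any "False" anywhere in the log counts as a failed check.
    if grep -q "False" "$log_file"; then
        return 1
    fi
    # Require at least one "True" so an empty or crashed run does not pass.
    grep -q "True" "$log_file"
}

# Example usage (commented out so that this block has no side effects):
# python ./custom_op_test.py 2>&1 | tee custom_op_test.log
# custom_op_log_ok custom_op_test.log && echo "custom operators look fine"
```

The match is deliberately strict: a warning line that happens to contain the word False would also be reported as a failure, so a failed check is a prompt to read the log rather than a final verdict.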

2.2 Training

# Enter the example directory
cd examples/water/se_e2_a
# Run training on GPU
dp train ./input.json

# Run training on CPU (extremely slow and not recommended; only useful as a smoke test)
dp train ./input.json --cpu

2.3 Evaluation

# Enter the example directory
cd examples/water/se_e2_a
# Set the path to the weight file
WEIGHT_PATH="path/to/your_model.pdparams"
# Run evaluation
dp test -m ${WEIGHT_PATH} -s ../data/data_3/ -n 30

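2.4 Export a static graph model

# Enter the example directory
cd examples/water/se_e2_a
# Set the path to the weight file
WEIGHT_PATH="path/to/your_model.pdparams"
# Set the path prefix for the exported static graph model (do not add the .pdmodel or .pdiparams suffix)
DUMP_PATH="path/to/your_dump"
# Export the static graph model
dp freeze -i ${WEIGHT_PATH} -o ${DUMP_PATH}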
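
Since the output path is a prefix rather than a file name, the export presumably produces two files, one ending in .pdmodel and one ending in .pdiparams. A small sketch that confirms both exist before moving on to LAMMPS; the helper name check_dump is hypothetical and the two-file layout is an assumption drawn from the suffixes mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: check that both static graph files exist for a prefix.
# Assumes dp freeze writes <prefix>.pdmodel and <prefix>.pdiparams.
check_dump() {
    prefix=$1
    missing=0
    for suffix in pdmodel pdiparams; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (commented out so that this block has no side effects):
# check_dump "${DUMP_PATH}" && echo "export complete"
```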

2.5 Inference in LAMMPS (GPU)

Edit examples/water/lmp/in.lammps and change the path after pair_style deepmd to the DUMP_PATH value set in section 2.4 (static graph model export). Do not append .pdmodel or .pdiparams.
```
pair_style  deepmd "path/to/your_dump"
```
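
This edit can also be scripted. The following is a minimal sketch, assuming the input file contains a line starting with pair_style deepmd and that the prefix contains none of the characters |, &, or backslash; the function name set_deepmd_model is hypothetical. It rewrites the whole line, so any extra arguments after the model path would be dropped.

```shell
#!/bin/sh
# Hypothetical helper: point the "pair_style deepmd" line of a LAMMPS input
# file at a given model prefix. Replaces the entire matching line.
set_deepmd_model() {
    input_file=$1
    prefix=$2
    tmp_file="${input_file}.tmp"
    # "|" is the sed delimiter so that slashes in the path need no escaping.
    sed "s|^pair_style[[:space:]][[:space:]]*deepmd.*|pair_style  deepmd \"${prefix}\"|" \
        "$input_file" > "$tmp_file" && mv "$tmp_file" "$input_file"
}

# Example usage (commented out so that this block has no side effects):
# set_deepmd_model examples/water/lmp/in.lammps "path/to/your_dump"
```

Writing to a temporary file and moving it into place avoids relying on the in-place flag of sed, whose syntax differs between implementations.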

Build Paddle from source to obtain a Paddle inference library with no operators pruned. LAMMPS inference uses xxx_grad backward operators, which is why Paddle has to be compiled manually here.

git clone https://github.com/PaddlePaddle/Paddle.git -b develop
cd Paddle
mkdir build
cd build
# Using Anaconda to create a Python 3.9 environment and running the build inside it is recommended
cmake .. -DPY_VERSION=3.9 -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON -DWITH_TESTING=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

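# After the build finishes, confirm that the paddle_inference_install_dir inference library exists
# (the shell is already inside Paddle/build at this point)
ls ./paddle_inference_install_dir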
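
The joint build later in this section adds three subdirectories of the inference library to LD_LIBRARY_PATH, so it may be worth confirming that they exist right after the Paddle build. A sketch with a hypothetical function name; the subdirectory list is taken from the export lines used later in this section.

```shell
#!/bin/sh
# Hypothetical helper: verify that a Paddle inference directory contains the
# library subdirectories that the LAMMPS build adds to LD_LIBRARY_PATH.
check_inference_dir() {
    root=$1
    result=0
    for sub in paddle/lib \
               third_party/install/mkldnn/lib \
               third_party/install/mklml/lib; do
        if [ -d "${root}/${sub}" ]; then
            echo "found:   ${root}/${sub}"
        else
            echo "missing: ${root}/${sub}"
            result=1
        fi
    done
    return "$result"
}

# Example usage (commented out so that this block has no side effects):
# check_inference_dir "/path/to/paddle_inference_install_dir"
```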

Jointly build and install the Paddle inference library with LAMMPS, then run inference

# Download and extract the LAMMPS source code
wget https://github.com/lammps/lammps/archive/stable_2Aug2023_update1.tar.gz
tar xf stable_2Aug2023_update1.tar.gz

# Set LAMMPS_DIR to the LAMMPS installation directory
export LAMMPS_DIR="/path/to/lammps-stable_2Aug2023_update1"

# Set the GPU device ID used for inference
export CUDA_VISIBLE_DEVICES=0

# Set PADDLE_DIR to the Paddle directory cloned in step 2 (building Paddle)
export PADDLE_DIR="/path/to/Paddle"

# Set DEEPMD_DIR to the root directory of this project
export DEEPMD_DIR="/path/to/deepmd-kit"

# Set PADDLE_INFERENCE_DIR to the Paddle inference library directory built in step 2
export PADDLE_INFERENCE_DIR="/path/to/paddle_inference_install_dir"

# Set TENSORFLOW_DIR to the TensorFlow installation directory, which can be found with pip show tensorflow
export TENSORFLOW_DIR="/path/to/tensorflow"

export LD_LIBRARY_PATH=${PADDLE_DIR}/paddle/fluid/pybind/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEPMD_DIR}/deepmd/op:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${PADDLE_INFERENCE_DIR}/paddle/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${PADDLE_INFERENCE_DIR}/third_party/install/mkldnn/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEPMD_DIR}/source/build:$LD_LIBRARY_PATH
export LIBRARY_PATH=${DEEPMD_DIR}/deepmd/op:$LIBRARY_PATH

cd ${DEEPMD_DIR}/source
# rm -rf build # Uncomment this line if CMakeLists.txt has been modified
mkdir build
cd build

# Set DEEPMD_INSTALL_DIR to the target installation directory for deepmd-lammps; any path may be used
export DEEPMD_INSTALL_DIR="path/to/deepmd_root"

# Start the build
cmake -DCMAKE_INSTALL_PREFIX=${DEEPMD_INSTALL_DIR} \
    -DUSE_CUDA_TOOLKIT=TRUE \
    -DTENSORFLOW_ROOT=${TENSORFLOW_DIR} \
    -DPADDLE_LIB=${PADDLE_INFERENCE_DIR} \
    -DFLOAT_PREC=low ..
make -j4 && make install
make lammps

cd ${LAMMPS_DIR}/src/
\cp -r ${DEEPMD_DIR}/source/build/USER-DEEPMD .
make yes-kspace
make yes-extra-fix
make yes-user-deepmd
make serial -j
export PATH=${LAMMPS_DIR}/src:$PATH

cd ${DEEPMD_DIR}/examples/water/lmp

lmp_serial -in in.lammps
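
Because the build above depends on several path variables, it can help to verify them before starting. The sketch below uses a hypothetical helper, require_dir_vars, that checks each named variable is set and points to an existing directory; it is an illustration only and not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named variable is set and
# points to an existing directory.
require_dir_vars() {
    result=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset:           $name" >&2
            result=1
        elif [ ! -d "$value" ]; then
            echo "not a directory: $name=$value" >&2
            result=1
        fi
    done
    return "$result"
}

# Example usage (commented out so that this block has no side effects):
# require_dir_vars LAMMPS_DIR PADDLE_DIR DEEPMD_DIR \
#     PADDLE_INFERENCE_DIR TENSORFLOW_DIR || exit 1
```

The helper expects literal variable names, since it uses eval for the indirect lookup.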

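[Optional] Run inference directly

If step 3 (jointly building and installing the Paddle inference library with LAMMPS, then running inference) has already been completed and the C++ code has not been modified, there is no need to rebuild. Run the following commands to start inference directly.

# Set the GPU device ID used for inference
export CUDA_VISIBLE_DEVICES=0
# Set LAMMPS_DIR to the LAMMPS installation directory
export LAMMPS_DIR="/path/to/lammps-stable_2Aug2023_update1"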

cd ${LAMMPS_DIR}/src/
export PATH=${LAMMPS_DIR}/src:$PATH

cd ${DEEPMD_DIR}/examples/water/lmp

lmp_serial -in in.lammps
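
The commands above use DEEPMD_DIR without setting it, so in a fresh shell it has to be exported again, and the LD_LIBRARY_PATH exports from the build step may need to be repeated as well. A hedged pre-flight sketch with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical pre-flight check before running inference in a fresh shell.
ready_for_inference() {
    # DEEPMD_DIR is needed by the cd command that precedes lmp_serial.
    if [ -z "${DEEPMD_DIR:-}" ]; then
        echo "DEEPMD_DIR is not set" >&2
        return 1
    fi
    # lmp_serial must be reachable through PATH.
    if ! command -v lmp_serial >/dev/null 2>&1; then
        echo "lmp_serial not found on PATH" >&2
        return 1
    fi
    # The example input file must exist where the cd command expects it.
    if [ ! -f "${DEEPMD_DIR}/examples/water/lmp/in.lammps" ]; then
        echo "in.lammps not found under DEEPMD_DIR" >&2
        return 1
    fi
}

# Example usage (commented out so that this block has no side effects):
# ready_for_inference && lmp_serial -in in.lammps
```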

DeePMD-kit Manual

About DeePMD-kit
Download and install
Use DeePMD-kit
Code structure
Troubleshooting

About DeePMD-kit

DeePMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep learning-based models of interatomic potential energy and force fields and to perform molecular dynamics (MD). This brings new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems.

For more information, check the documentation.

Highlights in DeePMD-kit v2.0

Model compression. Accelerates model inference by 4-15 times.
New descriptors. Including se_e2_r and se_e3.
Hybridization of descriptors. Hybrid descriptor constructed from the concatenation of several descriptors.
Atom type embedding. Enable atom-type embedding to reduce training complexity and refine performance.
Training and inference of the dipole (vector) and polarizability (matrix).
Split of training and validation dataset.
Optimized training on GPUs.

Highlighted features

interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. In addition, TensorBoard can be used to visualize training procedures.
interfaced with high-performance classical MD and quantum (path-integral) MD packages, i.e., LAMMPS and i-PI, respectively.
implements the Deep Potential series models, which have been successfully applied to finite and extended systems including organic molecules, metals, semiconductors, insulators, etc.
supports MPI and GPU, making it highly efficient for high-performance parallel and distributed computing.
highly modularized, easy to adapt to different descriptors for deep learning-based potential energy models.

License and credits

The project DeePMD-kit is licensed under GNU LGPLv3.0. If you use this code in any future publications, please cite the following publications for general purposes:

Han Wang, Linfeng Zhang, Jiequn Han, and Weinan E. "DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics." Computer Physics Communications 228 (2018): 178-184.
Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang. "DeePMD-kit v2: A software package for Deep Potential models." arXiv:2304.09409.

In addition, please follow the bib file to cite the methods you used.

Deep Potential in a nutshell

The goal of Deep Potential is to employ deep learning techniques and realize an inter-atomic potential energy model that is general, accurate, computationally efficient and scalable. The key component is to respect the extensive and symmetry-invariant properties of a potential energy model by assigning a local reference frame and a local environment to each atom. Each environment contains a finite number of atoms, whose local coordinates are arranged in a symmetry-preserving way. These local coordinates are then transformed, through a sub-network, to so-called atomic energy. Summing up all the atomic energies gives the potential energy of the system.
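
In symbols, the construction described above can be summarized as follows. The notation is an assumption made for illustration and is not taken from the text: $\mathcal{R}_i$ denotes the local environment of atom $i$, $\mathcal{D}$ the symmetry-preserving map to local coordinates, $\mathcal{N}$ the sub-network, and $N$ the number of atoms.

```latex
E = \sum_{i=1}^{N} E_i, \qquad
E_i = \mathcal{N}\bigl(\mathcal{D}(\mathcal{R}_i)\bigr), \qquad
\mathbf{F}_i = -\nabla_{\mathbf{r}_i} E
```

The last relation is the standard definition of the force on atom $i$ as the negative gradient of the potential energy with respect to its position $\mathbf{r}_i$. Writing the energy as a sum of per-atom terms is what makes it extensive.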

The initial proof of concept is in the Deep Potential paper, which employed an approach that was devised to train the neural network model with the potential energy only. With typical ab initio molecular dynamics (AIMD) datasets this is insufficient to reproduce the trajectories. The Deep Potential Molecular Dynamics (DeePMD) model overcomes this limitation. In addition, the learning process in DeePMD improves significantly over the Deep Potential method thanks to the introduction of a flexible family of loss functions. The NN potential constructed in this way reproduces accurately the AIMD trajectories, both classical and quantum (path integral), in extended and finite systems, at a cost that scales linearly with system size and is always several orders of magnitude lower than that of equivalent AIMD simulations.

Although highly efficient, the original Deep Potential model satisfies the extensive and symmetry-invariant properties of a potential energy model at the price of introducing discontinuities in the model. This has negligible influence on a trajectory from canonical sampling but might not be sufficient for calculations of dynamical and mechanical properties. These points motivated us to develop the Deep Potential-Smooth Edition (DeepPot-SE) model, which replaces the non-smooth local frame with a smooth and adaptive embedding network. DeepPot-SE shows great ability in modeling many kinds of systems that are of interest in the fields of physics, chemistry, biology, and materials science.

In addition to building up potential energy models, DeePMD-kit can also be used to build up coarse-grained models. In these models, the quantity that we want to parameterize is the free energy, or the coarse-grained potential, of the coarse-grained particles. See the DeePCG paper for more details.

See our latest paper for details of all features.

Download and install

Please follow our GitHub webpage to download the latest released version and development version.

DeePMD-kit offers multiple installation methods. It is recommended to use easy methods like offline packages, conda and docker.

One may manually install DeePMD-kit by following the instructions on installing the Python interface and installing the C++ interface. The C++ interface is necessary when using DeePMD-kit with LAMMPS, i-PI or GROMACS.

Use DeePMD-kit

A quick start on using DeePMD-kit can be found here.

A full document on options in the training input script is available.

Advanced

Code structure

The code is organized as follows:

data/raw: tools manipulating the raw data files.
examples: examples.
deepmd: DeePMD-kit python modules.
source/api_cc: source code of DeePMD-kit C++ API.
source/ipi: source code of i-PI client.
source/lib: source code of DeePMD-kit library.
source/lmp: source code of Lammps module.
source/gmx: source code of Gromacs plugin.
source/op: TensorFlow op implementation, working with the library.

Troubleshooting

Contributing

See DeePMD-kit Contributing Guide to become a contributor! 🤓
