tl;dr
Computational chemistry is the use of computer modelling and simulation, including ab initio approaches based on quantum chemistry as well as empirical approaches, to study the structures and properties of molecules and materials. The term also covers the computational techniques themselves.
- Chemical reactions, kinetics
- Photochemistry
- Catalyst design
- Potential energy surface
- Vibrational and electronic spectroscopy
- Solid-state chemistry
- Drug discovery, biomolecular docking
- Cheminformatics
- Machine learning
- Quantum computing
- Conference: WATOC, ICQC, APATCC
- Journals: A list of well-known journals that regularly publish computational chemistry articles
- Forum: StackExchange, Matter Modeling
- Mailing list: CCL
- Core skills
- General technical skills
- Essential skills for method development
- Essential skills for software development
- Essential skills for HPC
- Essential skills for GPU programming
- Essential skills for machine learning (bonus)
- Essential skills for machine learning chemistry (bonus)
- Understanding of theoretical principles including kinetics, thermodynamics, and electronic structure
- Various levels of programming, code development, and software architecture skills
- Problem-solving skills and an interest in solving basic and applied research problems
- Skills in adapting and integrating computer software to solve new categories of problems
- Critical thinking for analyzing and interpreting computational results and statistical data
- Googling and stackoverflowing 🙃
- For Windows:
- Install programs and modify system variables such as `PATH`
- Install Nvidia CUDA toolkit and driver
- Set up VPN and local network
- Set up dual boot for Linux, or install WSL for Ubuntu
- For Linux and macOS:
- Basic/intermediate commands: `ls`, `cd`, `cp`, `rm`, `ssh`, `scp`, and many more
- Know some important files/folders: `.bashrc`, `.ssh`
- Understand some environment variables: `$PATH`, `$LD_LIBRARY_PATH`
- Scripting programming language
- Bash, awk, perl, Python + Jupyter notebook
- Cluster / HPC
- Understand terminology: master node, compute node, scheduler, CPU cores, processes, memory management
- Scheduler: Slurm, PBS, SGE
- Software manager: module (`avail`, `load`, `unload`, `switch`)
- Quantum chemistry software
- Commercial: Gaussian, Q-Chem, ADF, MOLPRO, MOLCAS, TURBOMOLE, VASP and many more
- Non-commercial: PySCF, Psi4, OpenMolcas, GAMESS, ORCA, NWChem, DIRAC, DALTON, CP2K, LAMMPS, Quantum ESPRESSO and many more
- Full list is here
- Graphic visualization
- Jmol, Molden, GaussView, Avogadro, UCSF Chimera, VMD, OVITO, PyMOL
- 2D and 3D plots
- Other useful tools
- ASE, MDTraj, Pymatgen, RDKit, OpenBabel
- Writing
- Microsoft Word
- LaTeX
- Compiler: pdflatex, xelatex, lualatex
- Distribution: TeX Live, MiKTeX
- Editor: Overleaf, TeXstudio, Texmaker
- Presentation
- PowerPoint
- LaTeX (LuaLaTeX) Beamer
- Linear algebra
- Vectors and matrices
- General properties: Complex conjugate, transpose and conjugate transpose
- Diagonalization
- Matrix multiplication (Dense and sparse)
- Operators & commutators
- Eigen-problem
- Jacobi iteration
- Eigenvalue & eigenvector
- Singular value decomposition
- Optimization algorithms
- Numerical analysis
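The eigen-problem items above (Jacobi iteration, eigenvalues) can be illustrated with a minimal sketch of the cyclic Jacobi eigenvalue method for a real symmetric matrix. It uses only the Python standard library; the function name `jacobi_eigenvalues` and its parameters are hypothetical choices for this illustration, and production code would normally call LAPACK instead.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix (list of lists), sorted ascending."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Norm of the off-diagonal part; zero means the matrix is diagonal.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4.0, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # A <- A J (update columns p and q).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A (update rows p and q).
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Tridiagonal test matrix with known eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
eigs = jacobi_eigenvalues([[2.0, -1.0, 0.0],
                           [-1.0, 2.0, -1.0],
                           [0.0, -1.0, 2.0]])
print(eigs)
```

Each rotation removes one off-diagonal pair, so the off-diagonal norm shrinks sweep after sweep until the diagonal holds the eigenvalues.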
- Calculus
- Numerical methods
- Differential equation & ODE
- Vector calculus
- Data fitting
- Taylor expansion
- Polynomial interpolation
- Least squares approximation
- Finding roots
- Bilinear interpolation
- Newton-Raphson method
- PDE
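As a small illustration of the root-finding items above, here is a hedged sketch of the Newton-Raphson method in plain Python. The function name `newton_raphson`, its tolerance, and the test function x² − 2 are assumptions chosen for this example only.

```python
def newton_raphson(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Find a root of f starting from x0, using the analytic derivative dfdx."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        slope = dfdx(x)
        if slope == 0.0:
            raise ZeroDivisionError("zero derivative; pick another starting guess")
        # Newton step: follow the tangent line to its zero crossing.
        x -= fx / slope
    raise RuntimeError("Newton-Raphson did not converge")


# Example: the positive root of x^2 - 2 is sqrt(2).
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The same update, applied to the gradient of an energy function instead of `f`, is the basis of many geometry optimizers.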
- Quantum chemistry
- Wavefunctions and molecular orbitals
- Wavefunction and its properties, Hilbert space, linearity, Bra-Ket notation
- Born-Oppenheimer approximation, Slater determinant, linear combination of atomic orbitals (LCAO)
- Basis functions, basis sets (Gaussian-type orbitals, GTOs)
- Ab initio (wavefunction-based) method
- HF, MPn, CI, CC, MRCI, MCSCF, CASSCF, CASPT2
- DMRG (matrix product states), FCIQMC
- Density functional theory method
- KS equation, exchange and correlation functionals
- (Real-time) TDDFT
- Gaussian and plane wave method (GPW, GAPW)
- Pseudopotential
- Effective core potential (ECP)
- Semi-empirical
- AM1, PM3, PM6
- Tight-binding methods (e.g. DFTB, xTB)
- Excited state, transition state, atomic/molecular bond
- Adiabatic state, non-adiabatic state, Delta-SCF, constrained DFT
- Surface hopping, quantum dynamics
- Vibrational spectroscopy
- IR, Raman
- Linear response (first, second response)
- Perturbation theory
- Ab initio molecular dynamics (AIMD)
- Car-Parrinello MD (CPMD)
- Born-Oppenheimer MD (BOMD)
- Other methods
- QM/MM
- Energy decomposition analysis
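To make the basis-function item above (Gaussian-type orbitals) concrete, the sketch below evaluates the overlap integral between two normalized s-type GTOs using the Gaussian product theorem. The function name `s_gto_overlap` is hypothetical, distances are assumed to be in bohr, and real codes rely on integral libraries for higher angular momenta.

```python
import math


def s_gto_overlap(alpha, beta, distance):
    """Overlap of two normalized s-type Gaussians exp(-alpha r^2), exp(-beta r^2).

    alpha, beta: orbital exponents (bohr^-2); distance: separation of centers (bohr).
    """
    # Normalization constant of an s-type primitive: (2 * exponent / pi)^(3/4).
    norm = (2.0 * alpha / math.pi) ** 0.75 * (2.0 * beta / math.pi) ** 0.75
    p = alpha + beta
    # Gaussian product theorem: the product of two Gaussians is a Gaussian.
    return norm * (math.pi / p) ** 1.5 * math.exp(-alpha * beta / p * distance ** 2)


# Two identical functions on the same center overlap completely.
print(s_gto_overlap(0.5, 0.5, 0.0))
# The overlap decays as the centers move apart.
print(s_gto_overlap(0.5, 0.5, 1.4))
```

Overlap integrals like this one fill the matrix S in the generalized eigenvalue problem that HF and Kohn-Sham calculations solve in a non-orthogonal basis.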
- Molecular dynamics
- Classical mechanics
- Force field
- Statistical mechanics
- Enhanced sampling: Free energy, umbrella sampling, etc.
- Monte Carlo method
- Material simulation
- Multiscale modeling
- Coarse grained
- Condensed matter simulation
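The classical mechanics and force field items above come together in the time-integration loop of a molecular dynamics code. Below is a minimal sketch of the velocity Verlet integrator for a one-dimensional harmonic oscillator; the names `velocity_verlet`, `harmonic_force`, and the reduced units (mass = 1, force constant = 1) are assumptions made for this illustration.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v; return the list of (x, v) pairs."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


K = 1.0     # force constant (reduced units, illustrative)
MASS = 1.0  # particle mass (reduced units, illustrative)


def harmonic_force(x):
    """Force from the toy force field U(x) = 0.5 * K * x^2."""
    return -K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x


trajectory = velocity_verlet(1.0, 0.0, harmonic_force, MASS, dt=0.01, n_steps=1000)
e_start = total_energy(*trajectory[0])
e_end = total_energy(*trajectory[-1])
print(e_start, e_end)
```

Checking that the total energy stays nearly constant is a common first sanity test for an integrator and time step.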
- Programming (for mathematical proof)
- Scripting language: Bash, Python
- Compute-intensive subroutines with OOP: C++, Fortran
- Symbolic programming (Mathematica, SymPy)
- Code editors
- Vi/Vim, Nano
- VS Code, Atom, Eclipse, Sublime, Notepad++
- File format
- XML, JSON
- General programming skills
- Type of variables
- Loops and conditional statement
- Input/output
- High-level programming
- Python
- Pip and conda: Python helper
- NumPy: Array (vector, matrix) computation
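- Numba: JIT compiler for numerical Python and NumPy code
- JAX: automatic differentiation and JIT compilation for NumPy-style arrays
- SciPy: a collection of math functions/routines
- Scikit-learn: statistical learning routines (regression, classification, clustering, model selection)
- Intel Extension for Scikit-learn can run some estimators much faster than stock Scikit-learn on supported hardware; the often-quoted 10x speedup depends on the algorithm and data size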
- Matplotlib / Plotly for plotting graph
- Theano: numerical computation
- SCOOP: distributed modules for parallel programming
- NetworkX: Graph library
- Low-level programming
- C
- Function, pointer, storage class
- Enum, struct, union
- Preprocessor
- Operator, memory management, array
- File handling
- C++
- C++ 11 or newer
- Type of variable: signed, unsigned, long, double, etc.
- Loops, conditional statement
- Standard libraries: vector, rand
- Understanding header (`.hpp`) and source file (`.cpp` or `.cc`)
- Preprocessor (`#if`, `#ifdef`, `#ifndef`, `#define`, etc.)
- Function, class, struct, template
- Declaration
- namespace, const, attribute, pointer, pass by reference, static_assert
- Initialization
- Misc: casting, lambda expression, encapsulation, file handling, exception handling
- Fortran
- Learn F77, F90, or modern Fortran (2003, 2008, 2018)
- Module, subroutine, function
- Array (allocatable and multidimensional) and string
- Operator overloading
- Flow control
- Derived type
- Callback
- Interfacing with other languages, e.g. Python or C++
- GNU library
- GSL
- Many more libraries here
- Memory allocation
- Stack, heap, global memory
- Math libraries
- BLAS (OpenBLAS)
- LAPACK for linear algebra
- ScaLAPACK: LAPACK routines for distributed-memory parallel machines
- Intel MKL (Intel oneAPI)
- FFTW: for computing the discrete Fourier transform in one or more dimensions, real and complex data
- Eigen: linear algebra library
- Boost: a collection of C++ libraries, e.g. `regex`, `serialization`
- QM libraries
- Libxc: library of exchange-correlation (XC) functionals
- Libint: for computing Gaussian integrals
- libcint: general GTO integrals
- Code optimization
- Benchmarking/scaling
- Complexity (Big O)
- GNU
- Static and dynamic libraries
- Archive
- Compiling (g++, gcc) and linking (ld)
- Useful flags for compiler and linker, e.g. `-O2`, `-O3`, `-fPIC`
- Compiling tools
- autoconf
- configure
- Make, cmake, automake
- Debugging
- gdb for general debugging
- Valgrind for memory leak analysis
- Git (source code control)
- Basic/intermediate commands
- GitHub & GitLab
- Documentation
- Sphinx (for markdown and reStructuredText)
- Doxygen
- Architecture
- Memory management
- Threading, multithreading
- Block
- Parallel computing (SPMD)
- Shared memory: OpenMP
- Distributed memory: MPI
- Implementations: OpenMPI, Intel MPI, MVAPICH
- Intel ecosystem
- OpenMP compiler: icc, ifort
- MPI compiler: mpicc, mpiicc (for Intel C compiler), mpicxx (for C++), mpiifort (for Fortran)
- Cloud computing (bonus)
- Server and database
- Networking
- Intermediate/advanced C or C++ skills
- Programming model: kernels, thread hierarchy, memory hierarchy, heterogeneous programming, asynchronous SIMT
- CUDA
- Understand CUDA operation:
- Declare and allocate host and device memory.
- Initialize host data.
- Transfer data from the host to the device.
- Execute one or more kernels.
- Transfer results from the device to the host.
- CUDA C and CUDA C++ API
- Compiler: nvcc
- Basic math: linear algebra and calculus
- Programming
- Python, R, Julia, Matlab
- TensorFlow, PyTorch, Scikit-learn
- Python lib
- NumPy
- Pandas
- Terminology: regression, classification, descriptor, feature, kernel, activation function
- Data analysis/engineering: EDA, ETL
- Graphical representation
- Histogram, bar plot, heatmaps
- ML algorithms
- Decision tree
- Random forest
- Support vector machine
- Principal component analysis
- Kernel ridge regression
- Neural network
- Feedforward NN
- Autoencoder
- CNN
- RNN (LSTM)
- GNN
- Adversarial NN
- GAN
- Model training and optimization
- Hyperparameter optimization
- Techniques to prevent overfitting
- Data augmentation, early stopping, regularization, dropout, batch normalization
- Deploying model
- Atomic and molecular representation
- Structural-based: SMILES, one-hot encoding, 1D/2D fingerprint
- Electronic-based:
- Coulomb matrix, BoB
- Sine matrix, Ewald sum matrix
- Smooth Overlap of Atomic Positions (SOAP)
- Symmetry and Gaussian functions, and many more
- Many-body tensor representation
- Configurational space, chemical space
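Among the representations listed above, the Coulomb matrix is simple enough to sketch in a few lines. The example below assumes the commonly used definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j / |R_i − R_j|) with coordinates in bohr; the function name `coulomb_matrix` and the H2 geometry are illustrative choices, and dedicated descriptor libraries also handle sorting and padding.

```python
import math


def coulomb_matrix(charges, coords):
    """Coulomb matrix for nuclear charges Z_i at Cartesian positions R_i (bohr)."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted estimate of the free-atom energy.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                matrix[i][j] = (charges[i] * charges[j]
                                / math.dist(coords[i], coords[j]))
    return matrix


# H2 with an assumed bond length of 1.4 bohr.
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in cm:
    print(row)
```

Because the entries depend only on interatomic distances, the matrix is invariant to translation and rotation of the molecule, though not to the ordering of atoms.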
- Target prediction
- Energy and force
- Molecular properties:
- (transition) dipole moment, polarizability
- Electron transfer matrix element
- Database
- PubChem
- GDB
- DrugBank
- QM: QM7, QM8, QM9