- cuQuantum (requires compute capability 7.0+)
- OpenMPI/MPICH
- cmake >= 3.18
- NCCL
- CUDA
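cuQuantum's compute-capability requirement can be checked with `nvidia-smi` (a quick sketch; the `compute_cap` query field needs a reasonably recent driver):

```bash
# Should report 7.0 or higher for every GPU.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```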
Atlas supports two simulation modes. The first is distributed GPU-based simulation (`USE_LEGION=OFF`). The second is CPU-offload-enabled simulation (`USE_LEGION=ON`), which supports simulating more qubits on a single machine. Note that the second mode has not been tested for multi-node execution.
In addition, please replace all hard-coded paths (starting with `/global/homes/m/mingkuan`) with your home directory.
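One way to locate and rewrite those paths (a sketch: it assumes the hard-coded paths live under `scripts/`, so broaden the search if needed, and review the `grep` matches before running the in-place `sed`):

```bash
# Find every file that mentions the hard-coded home directory.
grep -rl "/global/homes/m/mingkuan" scripts/
# Replace it with your own home directory, in place.
grep -rl "/global/homes/m/mingkuan" scripts/ | xargs sed -i "s|/global/homes/m/mingkuan|$HOME|g"
```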
```bash
cd deps/quartz/external/HiGHS
mkdir build
cd build
cmake ..
make -j 12
cd ../../../../.. # cd $TORQUE_HOME
```
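A quick sanity check that the HiGHS build produced the solver binary (a sketch; the path mirrors the `export PATH` line used later in this guide):

```bash
# Run from $TORQUE_HOME; the binary should exist after the build above.
ls deps/quartz/external/HiGHS/build/bin/highs
```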
```bash
mkdir build
cd build
# module load nccl # You may need this on Perlmutter
bash ../config/config.linux
make -j 12
```
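To confirm the build produced the example binaries used below (a sketch; depending on the `USE_LEGION` setting chosen at configure time, one or both may be present):

```bash
# Distributed GPU-based mode (USE_LEGION=OFF):
ls build/examples/mpi-based/simulate
# CPU-offload mode (USE_LEGION=ON):
ls build/examples/legion-based/test
```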
There are `sbatch` scripts for running simulations with Atlas in `scripts/perlmutter/bench`. Run them with:

```bash
sbatch xxx.sh
```
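To monitor a submitted job, the standard Slurm commands apply (nothing Atlas-specific here):

```bash
# List your queued and running jobs.
squeue -u $USER
# Follow a job's output once it starts; slurm-<jobid>.out is
# Slurm's default output name unless the script overrides it.
tail -f slurm-<jobid>.out
```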
- Allocate nodes:

```bash
salloc --nodes 2 -q regular --time 00:20:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
```
- Load modules and set up the environment:

```bash
module load nccl
module load cudatoolkit
conda activate qs
# HiGHS_HOME should point at deps/quartz/external/HiGHS (see the build step above).
export PATH=$PATH:$HiGHS_HOME/build/bin
export MPICH_GPU_SUPPORT_ENABLED=1
```
- Distributed GPU-based simulation is launched with `srun`, for example:

```bash
srun -u \
  --ntasks="$(( SLURM_JOB_NUM_NODES ))" \
  --ntasks-per-node=1 \
  $TORQUE_HOME/build/examples/mpi-based/simulate --import-circuit qft --n 31 --local 28 --device 4 --use-ilp
```
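The example above uses 2 nodes x 4 GPUs = 8 GPU shards, which matches the gap between `--n 31` and `--local 28` (2^3 = 8). That relation is inferred from the example rather than documented here, so treat the following larger-node variant as a sketch and verify the flags against the example program's usage before relying on it:

```bash
# Hypothetical 4-node variant (4 nodes x 4 GPUs = 16 shards,
# hence --n = --local + 4). Values are illustrative only.
salloc --nodes 4 -q regular --time 00:20:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
srun -u \
  --ntasks="$(( SLURM_JOB_NUM_NODES ))" \
  --ntasks-per-node=1 \
  $TORQUE_HOME/build/examples/mpi-based/simulate --import-circuit qft --n 32 --local 28 --device 4 --use-ilp
```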
- Create a Python 3.8 environment with PuLP:

```bash
conda create --name pulp python=3.8
conda activate pulp
pip install pulp
```
- Make sure the `setenv("PYTHONPATH", ...)` call in `examples/legion-based/test_sim_legion.cc` points to the correct location (see the verification sketch after this list).
- Build and run in interactive mode:

```bash
cd build
make -j 12
cd ../scripts/perlmutter/bench
salloc --nodes 1 -q regular --time 00:30:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
bash offload.sh # takes around 25 minutes
```
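If the offload run fails to find PuLP, two quick checks (a sketch: the `grep` only locates the line to edit, and importing `pulp` assumes the environment created above is active):

```bash
# Confirm PuLP is importable in the active environment.
python -c "import pulp; print('PuLP OK')"
# Locate the PYTHONPATH setenv call that must point at that environment.
grep -n 'setenv("PYTHONPATH"' $TORQUE_HOME/examples/legion-based/test_sim_legion.cc
```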
To run the scalability test on an AWS p3.8xlarge instance (an example invocation follows the flag descriptions below):

```bash
cd $TORQUE_HOME/build/examples/legion-based/
./test -ll:gpu NUM_GPU -ll:fsize F_SIZE -ll:zsize Z_SIZE --local-qubits LOCAL_QUBITS_NUM --all-qubits ALL_QUBITS_NUM
```
- `-ll:gpu`: the number of GPUs to use for the simulation.
- `-ll:fsize`: GPU memory available to the simulation, in MB (e.g., 15000).
- `-ll:zsize`: zero-copy DRAM size, in MB (e.g., 100000).
- `--local-qubits`: the number of local qubits (at most 28 for a 16 GB GPU).
- `--all-qubits`: the total number of qubits.
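Putting the flags together, a run on a p3.8xlarge (4 GPUs with 16 GB each) might look like the following; the qubit counts are illustrative values consistent with the limits above, not benchmarked settings:

```bash
# Illustrative values only: 4 GPUs, ~15 GB framebuffer each,
# ~100 GB zero-copy DRAM, 28 local qubits out of 30 total.
./test -ll:gpu 4 -ll:fsize 15000 -ll:zsize 100000 --local-qubits 28 --all-qubits 30
```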