QMCPACK

Introduction

QMCPACK is an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids.

We study QMCPACK version 474062068a9f6348dbf7d55be7d1bd375c24f1fe.

Compiling QMCPACK requires several packages, including clang, OpenMP (with offloading support), HDF5, FFTW, and Boost. These packages can be installed directly via spack.

To compile QMCPACK, we pass the following variables to cmake:

CMAKE_C_COMPILER=mpicc
CMAKE_CXX_COMPILER=mpicxx
ENABLE_OFFLOAD=ON
USE_OBJECT_TARGET=ON
OFFLOAD_ARCH=<gpu-arch>
ENABLE_CUDA=1
CUDA_ARCH=<gpu-arch>
CUDA_HOST_COMPILER=`which gcc`
QMC_DATA=<path/to/qmc/data>
ENABLE_TIMERS=1

The following environment variables are also required so that the Open MPI compiler wrappers (mpicc and mpicxx) invoke clang:

export OMPI_CC=clang
export OMPI_CXX=clang++
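
The settings above could be combined into a single configure step roughly as follows. This is only a sketch: the GPU_ARCH and QMC_DATA_DIR shell variables are hypothetical helpers that default to the placeholders from the text, the source directory is assumed to be the parent of the build directory (".."), and the command is printed for review rather than executed.

```shell
# Hypothetical helper variables; they default to the placeholders used in the
# text and must be replaced with real values before configuring.
GPU_ARCH="${GPU_ARCH:-<gpu-arch>}"
QMC_DATA_DIR="${QMC_DATA_DIR:-<path/to/qmc/data>}"

# Make the Open MPI compiler wrappers invoke clang.
export OMPI_CC=clang
export OMPI_CXX=clang++

# Resolve gcc for the CUDA host compiler; fall back to the bare name if absent.
HOST_GCC="$(command -v gcc || echo gcc)"

# Collect the CMake variables listed above as -D definitions.
set -- \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_CXX_COMPILER=mpicxx \
  -DENABLE_OFFLOAD=ON \
  -DUSE_OBJECT_TARGET=ON \
  -DOFFLOAD_ARCH="$GPU_ARCH" \
  -DENABLE_CUDA=1 \
  -DCUDA_ARCH="$GPU_ARCH" \
  -DCUDA_HOST_COMPILER="$HOST_GCC" \
  -DQMC_DATA="$QMC_DATA_DIR" \
  -DENABLE_TIMERS=1

# Print the configure command; drop "echo" to run it from a build directory
# (assumption: the QMCPACK source tree is the parent directory).
echo cmake "$@" ..
```

Printing the command first makes it easy to check that the same architecture string is passed to both OFFLOAD_ARCH and CUDA_ARCH before starting a long build.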

Profiling

First, follow the instructions in tests/performance/NiO/README to enable and run the NiO tests. The configuration file used is Nio-fcc-S1-dmc.xml under the batched_driver folder.

At runtime, we use four worker threads (export OMP_NUM_THREADS=4). For a small-scale run, one can lower control variables such as warmupSteps in the configuration file to reduce execution time.
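
As a rough illustration of the runtime setup described above, the sketch below sets the thread count and lowers warmupSteps with sed. The XML fragment is a made-up stand-in written to a temporary file, not the contents of the real configuration file, and the values 100 and 5 are arbitrary assumptions; only the parameter name warmupSteps comes from the text.

```shell
# Four OpenMP worker threads, as used in the text.
export OMP_NUM_THREADS=4

# Illustrative stand-in for the real input file (all values are made up).
INPUT="$(mktemp)"
cat > "$INPUT" <<'EOF'
<qmc method="dmc">
  <parameter name="warmupSteps"> 100 </parameter>
  <parameter name="steps"> 10 </parameter>
</qmc>
EOF

# Write a copy with warmupSteps reduced to 5; other parameters are untouched.
sed 's|\(<parameter name="warmupSteps">\)[^<]*|\1 5 |' "$INPUT" > "$INPUT.small"

# Show the edited line for a quick check.
grep warmupSteps "$INPUT.small"
```

Writing the reduced settings to a separate copy keeps the original input available for full-scale runs.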

The data flow pattern can be profiled directly using gvprof. For the value pattern mode, one first has to identify the names of the functions of interest and then use gvprof's whitelist to restrict profiling to those functions.

Optimization

  • data_flow - redundant values

MatrixDelayedUpdateCUDA.h: 627. This line often copies the same base pointers to the arrays on the GPU. Though this is not a performance bottleneck for the current workload, it might be worth attention once the number of arrays increases.