PyTorch
Introduction
PyTorch is a popular machine learning framework.
We use PyTorch version f5788898a928cb2489926c1a5418c94c598c361b. We study the resnet50, bert, and deepwave models.
We apply the following commands to compile PyTorch from source.
spack install miniconda3
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
conda install -c pytorch magma-cuda110
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export USE_CUDA=1
export REL_WITH_DEB_INFO=1
export MAX_JOBS=16
export USE_NINJA=OFF
python setup.py install
resnet
We get the resnet example from the pytorch benchmark repo.
To simplify installation, we provide 1-spatial-convolution-model.py and 1-spatial-convolution-unit.py to check layer-wise and end-to-end performance.
deepwave
We provide the instructions for installing deepwave here.
To make the problematic kernel easier to check, we provide the 2-replication-pad3d.py script, which contains only a single ReplicationPad3d kernel.
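For orientation, a single-kernel script of this kind could look like the sketch below. It is not the contents of 2-replication-pad3d.py: the tensor shape, the padding widths, and the helper names expected_output_shape and run_replication_pad3d are assumptions chosen for illustration. Only torch.nn.ReplicationPad3d comes from the text above.

```python
def expected_output_shape(shape, padding):
    """Shape of the ReplicationPad3d output for an (N, C, D, H, W) input.

    padding uses PyTorch's order: (left, right, top, bottom, front, back).
    """
    n, c, d, h, w = shape
    left, right, top, bottom, front, back = padding
    return (n, c, d + front + back, h + top + bottom, w + left + right)


def run_replication_pad3d(shape=(1, 1, 32, 32, 32), padding=(2, 2, 2, 2, 2, 2)):
    """Run one ReplicationPad3d forward pass on the GPU (hypothetical sizes)."""
    # Imported lazily so the shape helper also works without PyTorch installed.
    import torch

    x = torch.randn(*shape, device="cuda")
    pad = torch.nn.ReplicationPad3d(padding)
    y = pad(x)
    # Wait for the kernel to finish before returning.
    torch.cuda.synchronize()
    assert tuple(y.shape) == expected_output_shape(shape, padding)
    return y
```

Calling run_replication_pad3d() once from a script keeps the profile limited to one padding kernel plus tensor initialization, which makes the kernel of interest easy to locate.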
bert
We get the bert example from the pytorch benchmark.
To make the problematic kernel easier to check, we provide the 3-embedding-unit.py script, which contains only a single Embedding kernel.
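A script that exercises only an Embedding lookup could be sketched as follows. This is an illustrative assumption, not the contents of 3-embedding-unit.py: the batch size, sequence length, vocabulary size, hidden size, and the helper names make_token_ids and run_embedding are all placeholders.

```python
import random


def make_token_ids(batch_size, seq_len, vocab_size, seed=0):
    """Return reproducible random token ids in [0, vocab_size) as a nested list."""
    rng = random.Random(seed)
    return [[rng.randrange(vocab_size) for _ in range(seq_len)]
            for _ in range(batch_size)]


def run_embedding(batch_size=8, seq_len=128, vocab_size=30522, hidden=768):
    """Run one Embedding forward pass on the GPU (hypothetical sizes)."""
    # Imported lazily so make_token_ids also works without PyTorch installed.
    import torch

    ids = torch.tensor(make_token_ids(batch_size, seq_len, vocab_size),
                       dtype=torch.long, device="cuda")
    emb = torch.nn.Embedding(vocab_size, hidden).to("cuda")
    out = emb(ids)
    # Wait for the lookup kernel to finish before returning.
    torch.cuda.synchronize()
    assert tuple(out.shape) == (batch_size, seq_len, hidden)
    return out
```

Fixing the seed keeps the input indices identical across runs, so profiles taken before and after a code change are comparable.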
Profiling
Profiling a Python application takes more steps than profiling a normal application. A general guide to profiling such applications is available on the FAQ page.
An example profiling command is attached below for reference:
LD_LIBRARY_PATH=/path/to/python/install/lib/python<version>/site-packages/torch:$LD_LIBRARY_PATH hpcrun -e gpu=nvidia,data_flow -ck HPCRUN_SANITIZER_READ_TRACE_IGNORE=1 -ck HPCRUN_SANITIZER_DATA_FLOW_HASH=0 -ck HPCRUN_SANITIZER_GPU_ANALYSIS_BLOCKS=1 -ck HPCRUN_SANITIZER_GPU_PATCH_RECORD_NUM=131072 python ./<pytorch-script>.py
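Because the command above is long, it can be convenient to assemble it programmatically when profiling several scripts. The sketch below rebuilds the same argument list in Python. The event string and the control knobs are copied from the example command; the helper names build_hpcrun_command and prepend_library_path are hypothetical and not part of GVProf.

```python
# Control knobs copied verbatim from the example profiling command.
CONTROL_KNOBS = {
    "HPCRUN_SANITIZER_READ_TRACE_IGNORE": "1",
    "HPCRUN_SANITIZER_DATA_FLOW_HASH": "0",
    "HPCRUN_SANITIZER_GPU_ANALYSIS_BLOCKS": "1",
    "HPCRUN_SANITIZER_GPU_PATCH_RECORD_NUM": "131072",
}


def build_hpcrun_command(script, event="gpu=nvidia,data_flow", knobs=None):
    """Return the hpcrun argument list for profiling one Python script."""
    knobs = CONTROL_KNOBS if knobs is None else knobs
    argv = ["hpcrun", "-e", event]
    for name, value in knobs.items():
        # Each knob is passed as its own "-ck NAME=VALUE" pair.
        argv += ["-ck", f"{name}={value}"]
    argv += ["python", script]
    return argv


def prepend_library_path(env, torch_dir):
    """Return a copy of env with torch_dir placed first in LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    new_env["LD_LIBRARY_PATH"] = f"{torch_dir}:{current}" if current else torch_dir
    return new_env
```

The returned list and environment can be passed to subprocess.run(argv, env=env, check=True), where torch_dir is the site-packages/torch directory of the Python installation used to build PyTorch.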
Optimization
We don’t provide an automated performance testing suite for PyTorch in GVProf, because recompiling PyTorch after even small code changes still takes a long time and is painful on low-end servers.
data_flow - redundant values
Please refer to this issue
data_flow - redundant values - value_pattern - single zeros