#!/usr/bin/python3

# Use GNU compilers:
#
#   module load cudatoolkit-standalone PrgEnv-gnu cray-libsci
#
# Note cray-libsci provides BLAS etc. In summary, we have
#
#   module load cudatoolkit-standalone/11.8.0 PrgEnv-gnu gcc/10.3.0 cray-libsci
#
#   $ module list
#   Currently Loaded Modules:
#     1) craype-x86-rome          5) craype-accel-nvidia80           9) cray-dsmml/0.2.2     13) PrgEnv-gnu/8.3.3
#     2) libfabric/1.15.2.0       6) cmake/3.23.2                   10) cray-pmi/6.1.10      14) cray-libsci/23.02.1.1
#     3) craype-network-ofi       7) cudatoolkit-standalone/11.8.0  11) cray-pals/1.2.11     15) gcc/10.3.0
#     4) perftools-base/23.03.0   8) craype/2.7.20                  12) cray-libpals/1.2.11  16) cray-mpich/8.1.25

if __name__ == '__main__':
  import sys
  import os
  sys.path.insert(0, os.path.abspath('config'))
  import configure
  configure_options = [
    '--with-cc=cc',
    '--with-cxx=CC',
    '--with-fc=ftn',
    '--with-debugging=0',
    '--with-cuda',
    '--with-cudac=nvcc',
    '--with-cuda-arch=80', # Since there is no easy way to auto-detect the cuda arch on the gpu-less Polaris login nodes, we explicitly set it.
    '--download-kokkos',
    '--download-kokkos-kernels',
  ]
  configure.petsc_configure(configure_options)

# Use NVHPC compilers
#
# Unset CRAY_ACCEL_TARGET so that Cray won't add -gpu to nvc even when craype-accel-nvidia80 is loaded:
#
#   unset CRAY_ACCEL_TARGET
#   module load nvhpc/22.11 PrgEnv-nvhpc
#
# I ran into two problems with nvhpc and Kokkos (and Kokkos-Kernels) 4.2.0.
# 1) Kokkos-Kernels failed at configuration to find TPL cublas and cusparse from NVHPC.
#    As a workaround, I just load cudatoolkit-standalone/11.8.0 to let KK use cublas and cusparse from cudatoolkit-standalone.
# 2) KK failed at compilation
#    "/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.kokkos-kernels/batched/dense/impl/KokkosBatched_Gemm_Serial_Internal.hpp", line 94: error: expression must have a constant value
#      constexpr int nbAlgo = Algo::Gemm::Blocked::mb();
#      ^
#    "/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.kokkos-kernels/blas/impl/KokkosBlas_util.hpp", line 58: note: cannot call non-constexpr function "__builtin_is_device_code" (declared implicitly)
#      KOKKOS_IF_ON_HOST((return 4;))
#      ^
#    detected during:
#
# It is a KK problem and I have to wait for their fix.