HPC Tutorial notes

Notes compiled from the HPC introduction given by Niclas Jansson on July 3, 2012

Slightly updated to reflect changes in Hydra's installation, 2013

CSC resources

Servers on which you can work

General-purpose machines: you can log in from home and work on them

  • descartes.csc.kth.se
  • riesz.csc.kth.se

Batch server: use it only for batch computing

  • hydra.csc.kth.se

PDC supercomputer:

  • lindgren

To keep your program running after you log out, use the program 'screen'.
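
A typical screen workflow could be wrapped as in the sketch below. The session name 'hpcjob' and the two helper functions are made up for illustration; -dmS (start detached with a session name) and -r (reattach) are standard screen options.

```shell
# Hypothetical session name, pick anything you like.
SESSION=hpcjob

# Run the given command in a detached screen session,
# so that it keeps running after you log out.
start_detached() {
    screen -dmS "$SESSION" "$@"
}

# Reattach to the session after logging in again.
# Inside the session, press Ctrl-a then d to detach.
reattach() {
    screen -r "$SESSION"
}

# Usage:
#   start_detached mpirun -np 2 ./mesh
#   screen -ls        # list the running sessions
#   reattach
```

Note that the session lives on the machine where it was started, so you have to log in to that same server to reattach.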

AFS

When you log in, you land in your home directory, which is managed by AFS.
Whichever server you connect to, you end up in the same directory, which is shared across the network.

First, make sure you have a valid Kerberos ticket:

larcher@hydra:~$ klist
Credentials cache: FILE:/tmp/krb5cc_7993_g13719
        Principal: larcher@NADA.KTH.SE

  Issued           Expires          Principal
Jul  3 14:23:55  Jul  4 00:23:55  krbtgt/NADA.KTH.SE@NADA.KTH.SE
Jul  3 14:23:56  Jul  4 00:23:55  afs@NADA.KTH.SE
Jul  3 14:23:56  Jul  4 00:23:55  afs/pdc.kth.se@NADA.KTH.SE

To get a new Kerberos ticket, use:

larcher@hydra:~$ kinit 

To obtain an AFS token from your Kerberos ticket, or to refresh it (after renewing your credentials, for instance), use:

larcher@hydra:~$ aklog

or 'afslog'.
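
The three commands above can be combined into one helper, sketched below. The function name is made up, and the sketch assumes that your klist accepts -s as a silent validity test; check 'man klist' on your machine before relying on it.

```shell
# Renew the Kerberos ticket only when needed, then refresh the AFS token.
# Assumption: "klist -s" exits with status 0 when a valid ticket exists.
refresh_credentials() {
    if klist -s 2>/dev/null; then
        echo "Kerberos ticket is still valid"
    else
        # kinit asks for your password and fetches a new ticket.
        kinit || return 1
    fi
    # Turn the ticket into an AFS token, using whichever tool is installed.
    if command -v aklog >/dev/null 2>&1; then
        aklog
    else
        afslog
    fi
}

# Usage:
#   refresh_credentials && klist
```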

NOBACKUP

Run your computations in a dedicated directory located under /NOBACKUP.
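
One way to keep runs apart is one directory per user and per run, as in the sketch below. The layout <base>/<user>/<run name> and the helper name are assumptions made for illustration; the notes only ask for a dedicated directory under /NOBACKUP.

```shell
# Create a working directory <base>/<user>/<run name> and print its path.
# The layout is only a suggestion, not a site convention.
make_run_dir() {
    base=$1
    name=$2
    user=${USER:-$(id -un)}
    dir="$base/$user/$name"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Usage on the cluster:
#   cd "$(make_run_dir /NOBACKUP mesh-test)"
```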

Module system

Use the program 'module' to manage the software environment.

First, set up the path to the modules:

larcher@descartes:~$ export MODULEPATH=/afs/nada.kth.se/dept/na/ctl/pkg/@sys/modulefiles

You can then check the list of modules:

larcher@descartes:~$ module avail
--------------------------------- /afs/nada.kth.se/dept/na/ctl/pkg/@sys/modulefiles ----------------------------------
dolfin-hpc/0.8.0    ffc/0.5.1           paraview/3.14.1     unicorn-hpc/0.1.0   unicorn-hpc/current
dolfin-hpc/0.8.1    fiat/0.3.4          parmetis/3.1.1      unicorn-hpc/0.1.1   valgrind/3.7.0
dolfin-hpc/0.8.2    instant/0.9.5       petsc/3.0.0         unicorn-hpc/0.1.2
dolfin-hpc/current  paraview/3.12.0     ufc/1.1             unicorn-hpc/0.1.3

To load a module:

larcher@descartes:~$ module add unicorn-hpc

To list the loaded modules:

larcher@descartes:~$ module list
Currently Loaded Modulefiles:
1) unicorn-hpc/current
2) dolfin-hpc/current

By default, the system picks the most recent version of a module.
If several versions are available, you can choose a specific one with 'module swap'.
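
To avoid retyping the export at every login, the path can be set from a shell startup file. The sketch below assumes a Bourne-style shell such as bash, and it appends to MODULEPATH instead of overwriting it as the command above does; the variable name CTL_MODULES is made up.

```shell
# Path to the module files, copied from the export command above.
CTL_MODULES='/afs/nada.kth.se/dept/na/ctl/pkg/@sys/modulefiles'

# Append the path to MODULEPATH unless it is already listed.
case ":${MODULEPATH:-}:" in
    *":$CTL_MODULES:"*) ;;
    *) MODULEPATH="${MODULEPATH:+$MODULEPATH:}$CTL_MODULES" ;;
esac
export MODULEPATH

# With the path set, a loaded module can be replaced by a specific version:
#   module swap dolfin-hpc dolfin-hpc/0.8.1
#   module list
```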

DOLFIN-HPC/UNICORN Tutorial

First check out the latest version of the tutorial with Bazaar:

larcher@descartes:~$ bzr branch /afs/nada.kth.se/dept/na/ctl/repo/bzr/hpc-tutorial
Branched 8 revision(s).

TODO: Describe the files.

   larcher@descartes:~/hpc-tutorial$ ls -dl *
   -rw-r--r-- 1 larcher dip    1505 2012-07-03 13:59 chkp.cpp
   -rwxr-xr-x 1 larcher dip     164 2012-07-03 13:59 daisy.csh
   -rw-r--r-- 1 larcher dip  141130 2012-07-03 13:59 hpc-tutorial-2012.pdf
   -rw-r--r-- 1 larcher dip  158196 2012-07-03 13:59 hpc-tutorial.pdf
   -rw-r--r-- 1 larcher dip     491 2012-07-03 13:59 Makefile
   -rw-r--r-- 1 larcher dip    3582 2012-07-03 13:59 mesh.cpp
   -rw-r--r-- 1 larcher dip    1225 2012-07-03 13:59 minimal.cpp
   -rw-r--r-- 1 larcher dip      52 2012-07-03 13:59 parameters
   -rw-r--r-- 1 larcher dip      52 2012-07-03 13:59 parameters_restart
   -rw-r--r-- 1 larcher dip     182 2012-07-03 13:59 submitfile
   -rw-r--r-- 1 larcher dip     179 2012-07-03 13:59 submitfile_chkp
   -rw-r--r-- 1 larcher dip     183 2012-07-03 13:59 submitfile_restart
   -rw-r--r-- 1 larcher dip 2958009 2012-07-03 13:59 usquare.xml

Make sure that 'dolfin-hpc' and 'unicorn-hpc' are loaded before running the test:

  larcher@descartes:~/hpc-tutorial$ module list
  Currently Loaded Modulefiles:
     1) unicorn-hpc/current
     2) dolfin-hpc/current

Three C++ files are provided with the tutorial as an introduction to DOLFIN:

   larcher@descartes:~/hpc-tutorial$ ls -l *.cpp
   -rw-r--r-- 1 larcher dip 1505 2012-07-03 13:59 chkp.cpp
   -rw-r--r-- 1 larcher dip 3582 2012-07-03 13:59 mesh.cpp
   -rw-r--r-- 1 larcher dip 1225 2012-07-03 13:59 minimal.cpp

Example: mesh.cpp

  • Create a mesh object by loading the unit square mesh described in the XML file 'usquare.xml':
      Mesh mesh("usquare.xml");
  • The best way to print from DOLFIN is to use the function dolfin::message.
  • To save meshes, functions, vectors and matrices you should use the "File" class:
      File f_mesh("mesh.pvd");
      f_mesh << mesh;

To build the examples:

   larcher@descartes:~/hpc-tutorial$ make
   `pkg-config --variable=compiler dolfin` `pkg-config --cflags unicorn` -I./ -I../ mesh.cpp   `pkg-config --libs unicorn`  -o mesh
   `pkg-config --variable=compiler dolfin` `pkg-config --cflags unicorn` -I./ -I../ minimal.cpp   `pkg-config --libs unicorn`  -o minimal
   `pkg-config --variable=compiler dolfin` `pkg-config --cflags unicorn` -I./ -I../ chkp.cpp   `pkg-config --libs unicorn`  -o chkp
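
If you only want to rebuild one example, the command printed by make can be issued by hand. The sketch below wraps it in a helper; the name build_example is made up, and the flags are exactly those shown in the make output above.

```shell
# Build a single example with the same command line the Makefile uses.
# build_example is a hypothetical helper, not part of the tutorial files.
build_example() {
    src=$1                  # for example mesh.cpp
    out=${src%.cpp}         # for example mesh
    cxx=$(pkg-config --variable=compiler dolfin) || return 1
    # The pkg-config output is left unquoted on purpose,
    # so that it is split into separate compiler flags.
    $cxx $(pkg-config --cflags unicorn) -I./ -I../ "$src" \
        $(pkg-config --libs unicorn) -o "$out"
}

# Usage, with dolfin-hpc and unicorn-hpc loaded:
#   build_example mesh.cpp
```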

Then you can run an example on two processors:

   larcher@descartes:~/hpc-tutorial$ mpirun -np 2 ./mesh
   Initializing DOLFIN version 0.8.2-hpc.
   Initializing DOLFIN version 0.8.2-hpc.
   *** Warning: Reading DOLFIN xml meshes in parallel is depricated. For better I/O performance, consider converting to flat binary
   *** Warning: Reading DOLFIN xml meshes in parallel is depricated. For better I/O performance, consider converting to flat binary
   Mesh loaded with 16641 vertices
   Rank: 0 has 8294 vertices
   Rank: 0 has 83 ghosted and 137 shared vertices
   Rank: 0 vertex 10 has global number 10
   Rank: 1 has 8484 vertices
   Rank: 1 has 54 ghosted and 137 shared vertices
   Rank: 1 vertex 10 has global number 8221

   ...
   ...

In this case the program loads a mesh, distributes it over 2 CPUs, calls the refinement, and balances the newly refined mesh across the 2 CPUs.

Example: minimal.cpp

This file gives an example of a basic solver.

The 'unicorn_init' function needs command-line arguments to run a computation; without them the program only prints its usage:

larcher@descartes:~/hpc-tutorial$ ./minimal
Initializing DOLFIN version 0.8.2-hpc.
Initializing Unicorn version 0.1.3-hpc.
Usage: -p <parameters> [-m <mesh> -c <checkpoint>] [-i iteration] [-l <wall clock limit>] [-o <petsc arguments>] [-s <structure mesh>]

You need to provide a 'parameters' file and a mesh:

larcher@descartes:~/hpc-tutorial$ ./minimal -p parameters -m usquare.xml
Initializing DOLFIN version 0.8.2-hpc.
Initializing Unicorn version 0.1.3-hpc.
Running on 1 node
Global number of vertices: 16641
Global number of cells: 32768
Running iteration 0 of 5
Pre
Solver
Post
Running iteration 1 of 5
Pre
Solver
Post
Running iteration 2 of 5
Pre
Solver
Post
Running iteration 3 of 5
Pre
Solver
Post
Running iteration 4 of 5
Pre
Solver
Post
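
The solver can also be started through mpirun, in the same way as the mesh example. The sketch below adds a small helper that keeps a copy of the output in a log file; the helper name and the log file name are made up, and the output of a parallel run is not shown here.

```shell
# Run a command, show its output and keep a copy in a log file.
# run_logged is a hypothetical helper; stderr is merged into the log.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Usage:
#   run_logged minimal.log mpirun -np 2 ./minimal -p parameters -m usquare.xml
```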

Queue system on Hydra

If you log in to hydra, make sure you set the MODULEPATH environment variable again and that the 'torque' module is loaded:

larcher@hydra:~$ module add torque

Show the queue:

larcher@hydra:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6401.hydra                wccm2012         spuhler         110:30:2 R batch

To add a job to the queue you need a submit file, such as 'submitfile':

   #PBS -N unicorn
   #PBS -l walltime=00:10:00,nodes=1:ppn=4
   #PBS -m abe
   #PBS -v KRB5CCNAME
   cd $PBS_O_WORKDIR
   afslog
   mpirun --hostfile $PBS_NODEFILE minimal -p parameters -m usquare.xml

To add the job:

qsub submitfile
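
When you want to vary the wall-clock limit or the number of processes per node, the submit file can be generated instead of edited by hand. The sketch below reproduces 'submitfile' with two values made configurable; the helper name, its arguments and the file name 'submit_2' are made up.

```shell
# Write a submit file modelled on 'submitfile' from the tutorial.
# Arguments: output file, walltime (hh:mm:ss), processes per node.
write_submitfile() {
    file=$1
    walltime=$2
    ppn=$3
    # \$ keeps the PBS variables literal so they are expanded at run time.
    cat > "$file" <<EOF
#PBS -N unicorn
#PBS -l walltime=$walltime,nodes=1:ppn=$ppn
#PBS -m abe
#PBS -v KRB5CCNAME
cd \$PBS_O_WORKDIR
afslog
mpirun --hostfile \$PBS_NODEFILE minimal -p parameters -m usquare.xml
EOF
}

# Usage on hydra:
#   write_submitfile submit_2 00:30:00 2
#   qsub submit_2
#   qstat -u "$USER"      # show only your own jobs
```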