Feature #137

Add support for all the supercomputer architectures CTL/KTH and CT/BCAM have access to

Added by Johan Jansson over 3 years ago. Updated over 3 years ago.

Status: In Progress
Start date: 03/14/2014
Priority: Normal
Due date: 04/18/2014
Assignee: Eneko Perez
% Done: 0%
Category: -
Target version: -

Description

We now have access to 5 supercomputer architectures:

Lindgren/KTH (Cray XE6)
Povel/KTH (standard Infiniband cluster)
MareNostrum (standard Infiniband cluster)
Hermit/HLRS (Cray XE6)
SuperMUC (standard Infiniband cluster)

To use all of these efficiently in our FEniCS-HPC development and simulations, ctl-ports should support each of these architectures.

History

#1 Updated by Eneko Perez over 3 years ago

  • Assignee set to Eneko Perez

#2 Updated by Eneko Perez over 3 years ago

  • Status changed from New to In Progress

About SuperMUC:

I made this progress:

  1. PETSc Makefile

I changed --with-mpi-dir and --with-blas-lapack-dir, and added the following parameters:

--with-batch --with-fc=0 --known-mpi-shared=1

However, the configure step fails while checking BLAS/LAPACK:

  • Configure
    ===============================================================================
    Configuring PETSc to compile on your system
    ===============================================================================
    ===============================================================================
    WARNING! Compiling PETSc with no debugging, this should only be done for timing and production runs. All development should be done when configured using --with-debugging=1
    ===============================================================================
    TESTING: checkLib from config.packages.BlasLapack(config/BuildSystem/config/packages/BlasLapack.py:98)
    ***********************************************************************
    UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
    -------------------------------------------------------------------------------
    You set a value for --with-blas-lapack-dir=<dir>, but /lrz/sys/intel/mkl110u3/lib/intel64 cannot be used
    ***********************************************************************

make: *** [config] Error 1

#3 Updated by Niclas Jansson over 3 years ago

LRZ already provides PETSc, so there is no need to compile it ourselves.

The approach of recompiling the entire software stack is doomed to fail on more advanced platforms (e.g. BG or Cray, where configure scripts need to be submitted to the batch queue).

Bottom line: use vendor/center-provided packages as much as possible.

#4 Updated by Eneko Perez over 3 years ago

Well, it wasn't as easy as it sounded:

While building dolfin-hpc I get this:

checking for PETSc... no
configure: error: Linear Algebra backend not found
make: *** [config] Error 1

This is the adapted makefile to use the provided petsc:

PORTNAME=dolfin
PORTVERSION=0.8.3-hpc
PORTDIR=$(PORTNAME)-$(PORTVERSION)
PORTSRC=$(PORTNAME)-$(PORTVERSION).tar.gz
UNICORNVERSION=0.1.3-hpc
PORTURL=http://www.csc.kth.se/~larcher/archive/$(PORTSRC)
#PORTURL=https://launchpad.net/unicorn/hpc/$(UNICORNVERSION)/+download/${PORTSRC}
PETSCDIR=/lrz/sys/libraries/petsc/3.3-p2

CONFIGURE=sh regen.sh && ./configure --prefix=${CTLOPTROOT} \
        --libdir=${CTLOPTROOT}/lib/${ARCH} \
        --enable-shared --with-pic \
        --enable-function-cache --enable-optimize-p1 \
        --disable-boost-tr1 \
        --with-petsc --with-petsc-dir=${PETSCDIR} \
        --with-parmetis=${CTLOPTROOT} \
        --with-parmetis-libdir=${CTLOPTROOT}/lib/${ARCH} \
        --enable-mpi --enable-mpi-io \
        --with-gts \
        --disable-progress-bar
BUILD=make -j 5
INSTALL=make -j 5 install

include ../Makefile.inc

The two lines that are relevant:

PETSCDIR=/lrz/sys/libraries/petsc/3.3-p2
...
--with-petsc --with-petsc-dir=${PETSCDIR} \

#5 Updated by Eneko Perez over 3 years ago

When doing ls on the petsc directory I see:

ls /lrz/sys/libraries/petsc/3.3-p2/
complex_mpi.ibm_121_debug complex_mpi.intel_121_debug complex_mpi.mpt_121_debug real_mpi.ibm_121_debug real_mpi.intel_121_debug real_mpi.mpt_121_debug
complex_mpi.ibm_121_opt complex_mpi.intel_121_opt complex_mpi.mpt_121_opt real_mpi.ibm_121_opt real_mpi.intel_121_opt real_mpi.mpt_121_opt

I tried with both complex_mpi.intel_121_opt and real_mpi.intel_121_opt; however, dolfin-hpc keeps failing. Any ideas?
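
The listing above suggests that /lrz/sys/libraries/petsc/3.3-p2 is only a container for several builds, so the PETSc directory handed to configure probably has to name one of the subdirectories. A minimal sketch for checking which subdirectories look like usable installs, assuming (not verified on SuperMUC) that each build is a prefix-style install carrying include/petsc.h. The helper name list_petsc_installs is hypothetical and not part of ctl-ports:

```shell
# Hypothetical helper, not part of ctl-ports: print every subdirectory of
# the given root that contains include/petsc.h.
# Assumption: each center-provided build is a prefix-style PETSc install.
list_petsc_installs() {
    root=$1
    for dir in "$root"/*/; do
        dir=${dir%/}    # strip the trailing slash left by the glob
        if [ -f "$dir/include/petsc.h" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Running `list_petsc_installs /lrz/sys/libraries/petsc/3.3-p2` on the login node would then show which entries are candidates for PETSCDIR. An empty result would hint that the installs use a different layout.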

#6 Updated by Niclas Jansson over 3 years ago

At build or runtime? It's impossible to say what's wrong without any more information.

#7 Updated by Eneko Perez over 3 years ago

At build. I'm running 'make install' on dolfin-hpc alone to see what fails during the build.

#8 Updated by Niclas Jansson over 3 years ago

Well, without the error message it's pretty hard to guess...

#9 Updated by Eneko Perez over 3 years ago

There you go:

di68wob@login03:~/ctl-ports-supermuc/dolfin-hpc> make install
  • Configure
    Updating configuration...
    Running libtoolize
    libtoolize: serial numbers `2006.12.25.00' or `2011.01.19.21; # UTC' contain non-digit chars
    libtoolize: `./ltmain.sh' is newer: use `--force' to overwrite
    libtoolize: `m4/libtool.m4' is newer: use `--force' to overwrite
    libtoolize: `m4/ltoptions.m4' is newer: use `--force' to overwrite
    libtoolize: `m4/ltversion.m4' is newer: use `--force' to overwrite
    libtoolize: `m4/lt~obsolete.m4' is newer: use `--force' to overwrite
    Running aclocal
    /usr/share/aclocal/dotconf.m4:5: warning: underquoted definition of AM_PATH_DOTCONF
    /usr/share/aclocal/dotconf.m4:5: run info '(automake)Extending aclocal'
    /usr/share/aclocal/dotconf.m4:5: or see http://sources.redhat.com/automake/automake.html#Extending-aclocal
    Running autoconf
    Running automake
    Deleting autom4te.cache directory
    configure: WARNING: you should use --build, --host, --target
    configure: WARNING: invalid host type: --disable-boost-tr1
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking whether to enable maintainer-specific portions of Makefiles... no
    checking build system type... configure: error: /bin/sh ./config.sub --disable-boost-tr1 failed
    configure: WARNING: cache variable ac_cv_build contains a newline
    make: *** [config] Error 1

#10 Updated by Niclas Jansson over 3 years ago

First of all, --disable-boost-tr1 was removed from dolfin-hpc a long time ago. But that might not be the issue here. Which arguments are passed to configure? Somehow ctl-ports is making autotools believe that you are cross-compiling (not an issue by itself, dolfin can handle that).
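
One way to answer the question about the arguments: an Autoconf-generated configure records its own invocation near the top of config.log, on a line of the form "  $ ./configure <args>". A small sketch for pulling that line out, assuming config.log sits in the dolfin-hpc build directory. The function name show_configure_invocation is made up for illustration:

```shell
# Print the command line that configure recorded in config.log.
# Assumption: the log follows the usual Autoconf header, where the
# invocation appears as a line starting with two spaces, "$", and a space.
show_configure_invocation() {
    logfile=${1:-config.log}
    # Print the first matching line, then stop reading.
    sed -n '/^  \$ /{p;q;}' "$logfile"
}
```

Calling `show_configure_invocation` in the build directory should show exactly what ctl-ports passed, including any stray argument that configure might be interpreting as a host type.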

Also, why is sh used to regenerate the build environment? It's suggested to use the login shell (LRZ might be doing something during login).

However, there is NO need to run regen.sh when using a release tarball (given that Larcher has put the ones from dryad in the archive). A release is created using make dist, which copies all the necessary autotools files into the archive.
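
As a rough sketch of that advice, the port could skip regeneration whenever the unpacked tarball already ships a configure script. The wrapper below is hypothetical (maybe_regen is not an existing ctl-ports function) and assumes regen.sh lives at the top of the source tree, as in the CONFIGURE line of the port Makefile. It falls back to the user's shell from $SHELL rather than hard-coding sh:

```shell
# Hypothetical wrapper: regenerate the autotools files only when the source
# tree does not already contain a configure script (a `make dist` tarball
# normally does).
maybe_regen() {
    srcdir=$1
    if [ -f "$srcdir/configure" ]; then
        echo "configure present, skipping regen.sh"
    else
        echo "configure missing, running regen.sh"
        # Use the user's shell if known, otherwise plain sh.
        (cd "$srcdir" && "${SHELL:-sh}" ./regen.sh)
    fi
}
```

With this in place the configure step would become something like `maybe_regen . && ./configure ...`, leaving the release tarball's generated files untouched.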
