Java CoG Kit MPI Guide

Gregor von Laszewski and Kaizar Amin

Version: 4.1.3

Last update: September 3, 2005

Purpose: This document includes basic information about running MPI programs through the Java CoG Kit.

Introduction

The Message Passing Interface (MPI) provides a powerful programming paradigm for high-performance computing based on the exchange of messages between processes. Optimized implementations of this standardized interface are available on almost all high-performance computing platforms.

In this guide we focus on the submission of MPI jobs to a supercomputer or distributed cluster, using the TeraGrid [1] as our example. However, the examples discussed in this guide are generic enough for you to adapt them to your own server environment. We demonstrate how to submit MPI programs through the Java CoG Kit with the help of the mpirun and qsub commands. Although we use the GT2 provider for our submission examples, other providers such as the SSH provider can also be used.

We assume that you have acquired a TeraGrid account and that you have a Grid certificate that allows you to submit jobs on the TeraGrid. If you do not know how to do this, please consult the TeraGrid Web site at http://www.teragrid.org.

Example MPI Program

We have chosen an extremely simple test program that prints on each processor the rank and the number of requested processors.


 #include <stdio.h>
 #include "mpi.h"
 
 int main( int argc, char **argv )
 {
     int rank, size;
     MPI_Init( &argc, &argv );                /* initialize the MPI environment */
     MPI_Comm_size( MPI_COMM_WORLD, &size );  /* number of processes */
     MPI_Comm_rank( MPI_COMM_WORLD, &rank );  /* rank of this process */
     printf( "Hello world from process %d of %d\n", rank, size );
     MPI_Finalize();
     return 0;
 }

For the rest of this guide, we assume that you have this program on the TeraGrid at the following location: $HOME/mpitest/helloworld.c
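
If you edit the program on your local machine, one way to stage it there (a sketch, assuming SSH access to the login node and the username placeholder introduced in the next section) is:

 > ssh laszewski@tg-login.uc.teragrid.org "mkdir -p mpitest"
 > scp helloworld.c laszewski@tg-login.uc.teragrid.org:mpitest/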

Testing the MPI Program on TeraGrid

It is important to make sure that you can compile the program on a TeraGrid frontend node, if only to verify that your TeraGrid account is properly configured. As an example configuration, we will build the program on the TeraGrid frontend node and run it on four backend nodes.

To test and build an executable for the TeraGrid, please log in to the frontend node first. In this document we use the name “laszewski” as a placeholder for your username; please adjust the examples accordingly.

 > ssh laszewski@tg-login.uc.teragrid.org

Place the MPI helloworld.c program in the directory mpitest. This program can now be compiled with the mpicc compiler.

 > mpicc -o helloworld helloworld.c

We recommend that you verify that you can run this program on the TeraGrid by simply using the TeraGrid mpirun program. In our example we execute it on four processors.

 > mpirun -np 4 helloworld

You should get output similar to the following (the order of the lines varies from run to run, since the processes print independently):

 Hello world from process 0 of 4
 Hello world from process 3 of 4
 Hello world from process 1 of 4
 Hello world from process 2 of 4

If for some reason this simple test program does not work, your account may not be properly configured and you should consult a TeraGrid systems administrator. If your program works, you can proceed to the next examples. These examples do not require you to be logged into a TeraGrid node; instead, you can execute them from any client on which you have installed the Java CoG Kit.
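
Before continuing, you may want to confirm that the Java CoG Kit command-line tools are reachable on your client, for example as follows (assuming the CoG Kit bin directory is on your PATH; otherwise invoke the tools with their full path, as in the examples below):

 > which cog-job-submit cog-proxy-init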

MPI Job on TeraGrid using mpirun

We assume that you will submit the job to the host tg-grid1.uc.teragrid.org using the Globus Toolkit version 2 (GRAM) submission service. We will use the mpirun command that is available in a directory under /soft on the TeraGrid. The output from this program will be placed in your home directory on the TeraGrid under the name cog-mpirun.out. We reuse the helloworld program that you compiled on the TeraGrid in the previous section.

Before you can submit a job, you must first authenticate yourself. To do so, you create a proxy certificate on the client machine as part of the initialization step.

 > cog-proxy-init
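
You can check that a valid proxy exists, and how long it remains valid, with the standard Globus grid-proxy-info command (assuming the Globus client tools are available on your client machine):

 > grid-proxy-info -timeleft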

The command for submitting the job is

 > ./cog-job-submit -s tg-grid1.uc.teragrid.org \
                    -p gt2 \
                    -e /soft/mpich-gm-1.2.5..10-intel-r2a/bin/mpirun \
                    -args "-np 4 helloworld" \
                    -stdout cog-mpirun.out

The meaning of the flags is as follows:

-s: The service location of the Globus gatekeeper installed on the TeraGrid. For the ANL/UC site this is tg-grid1.uc.teragrid.org.
-p: The provider used for this execution. In our example we use the GT2 provider.
-e: The executable that is run on the TeraGrid machine. For our MPI program it is mpirun. Note that we give the complete absolute path of the mpirun program; for the ANL/UC site it is /soft/mpich-gm-1.2.5..10-intel-r2a/bin/mpirun.
-args: The arguments supplied to the executable. We want mpirun to use four backend nodes and execute the helloworld program, hence “-np 4 helloworld”.
-stdout: The remote file to which standard output is redirected. In our example it is /home/laszewski/cog-mpirun.out on the TeraGrid login node (tg-login.uc.teragrid.org).

The output of this command looks similar to the following; the contents may be slightly different, as we have reformatted the output for this guide.

 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Task Identity: urn:cog-1099697597908
 DEBUG [org.globus.cog.core.impl.common.CoreFactory]
   - Instantiating org.globus.cog.core.impl.execution.gt2.TaskHandlerImpl
     for provider gt2
 DEBUG [org.globus.cog.core.impl.execution.gt2.JobSubmissionTaskHandler]
   - RSL:
         &(executable=/soft/mpich-gm-1.2.5..10-intel-r2a/bin/mpirun)
          (arguments=-np 4 helloworld)
          (stdout=cog-mpirun.out)
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Status changed to Submitted
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Status changed to Active
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Status changed to Completed
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Job completed

On the TeraGrid, the file cog-mpirun.out will be created with the following contents.

 tg-login1:~> cat cog-mpirun.out
 Hello world from process 2 of 4
 Hello world from process 3 of 4
 Hello world from process 0 of 4
 Hello world from process 1 of 4
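
If you prefer to inspect the result without logging in, you can also copy the output file back to your client, for example with scp (assuming SSH access to the login node):

 > scp laszewski@tg-login.uc.teragrid.org:cog-mpirun.out .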

If this program does not work, log in to the TeraGrid once more and verify with

 > which mpirun

that the location of mpirun is resolved properly.

MPI Job on TeraGrid using qsub

Previously we showed how to submit an MPI program with the help of mpirun. In certain circumstances you may need more control over the queue parameters, or you may want to submit a non-MPI program. In these cases it is useful to use the job submission command of the batch processing system installed on the machine. At the time of writing this guide, the UC TeraGrid node used the qsub program from Torque. To submit a job through qsub, you need to create a script that is passed along as part of the submission process. We assume that you have placed the script (qsub-script) in your home directory on the TeraGrid (/home/laszewski/qsub-script).
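
If you wrote the script on your client machine, you can stage it to the TeraGrid with scp, for example (reusing the username placeholder from earlier):

 > scp qsub-script laszewski@tg-login.uc.teragrid.org:~/

The job is then submitted as follows: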

 > ./cog-job-submit -s tg-grid1.uc.teragrid.org \
                    -p gt2 \
                    -e /soft/torque-1.1.0p0-r1a/bin/qsub \
                    -args "qsub-script"

The meaning of the flags was explained in the previous section; they merely carry different values here.

The output of this command looks similar to the following; the contents may be slightly different, as we have reformatted the output for this guide.

 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Task Identity: urn:cog-1099698205741
 DEBUG [org.globus.cog.core.impl.common.CoreFactory]
   - Instantiating org.globus.cog.core.impl.execution.gt2.TaskHandlerImpl
     for provider gt2
 DEBUG [org.globus.cog.core.impl.execution.gt2.JobSubmissionTaskHandler]
   - RSL: &(executable=/soft/torque-1.1.0p0-r1a/bin/qsub)(arguments=qsub-script)
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Status changed to Submitted
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Status changed to Completed
 DEBUG [org.globus.cog.core.examples.execution.JobSubmission]
   - Job completed

This example assumes that you have the script “qsub-script” on the remote machine. The contents of the file are as follows.

 > cat qsub-script
 #!/bin/sh
 #
 #PBS -q dque
 #PBS -N example
 #PBS -l nodes=4:ia64-compute:ppn=2
 #PBS -l walltime=0:05:00
 #PBS -A TG-ABC
 #PBS -o helloworld-pbs.out
 #PBS -e helloworld-pbs.err
 #
 ## Export all my environment variables to the job
 #PBS -V
 #
 ## Change to my working directory
 cd $HOME/mpi/
 #
 ## Run my parallel job (the PBS shell knows PBS_NODEFILE)
 mpirun -machinefile $PBS_NODEFILE  -np 4 ./helloworld

However, in order to make this script work, you must adapt the account information appropriately (we use TG-ABC here as a placeholder).
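
Once the batch system has accepted the job, you can monitor its progress from a TeraGrid login node with the usual Torque commands, for example:

 > qstat -u laszewski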

Once this program has run, the standard output and standard error are written to the files “helloworld-pbs.out” and “helloworld-pbs.err”. The contents of helloworld-pbs.out will look similar to the listing below.

 > cat helloworld-pbs.out
 ----------------------------------------
 Begin PBS Prologue Fri Nov  5 17:45:00 CST 2004
 Job ID:         146641.tg-master.uc.teragrid.org
 Username:       laszewski
 Group:          allocate
 Nodes:          tg-c047 tg-c048 tg-c051 tg-c052
 End PBS Prologue Fri Nov  5 17:45:07 CST 2004
 ----------------------------------------
 Hello world from process 2 of 4
 Hello world from process 0 of 4
 Hello world from process 3 of 4
 Hello world from process 1 of 4
 ----------------------------------------
 Begin PBS Epilogue Fri Nov  5 17:45:29 CST 2004
 Job ID:         146641.tg-master.uc.teragrid.org
 Username:       laszewski
 Group:          allocate
 Job Name:       example
 Session:        12505
 Limits:         nodes=4:ia64-compute:ppn=2,walltime=00:05:00
 Resources:      cput=00:00:04,mem=1312kb,vmem=3808kb,walltime=00:00:14
 Queue:          dque
 Account:        TG-ABC
 Nodes:          tg-c047 tg-c048 tg-c051 tg-c052
 
 Killing leftovers...
 
 End PBS Epilogue Fri Nov  5 17:45:38 CST 2004
 ----------------------------------------

For more details on PBS scripts, please consult the TeraGrid help pages at http://www.teragrid.org/userinfo/guide_jobs_pbs.html.

MPI Job on TeraGrid using MPICH-G2 (Single Site)

In this section we discuss MPI job submission on the TeraGrid from the Java CoG Kit using the Globus Toolkit’s MPICH-G2 libraries for a single-site job. This mechanism is useful for testing your programs. We assume that you are familiar with formulating MPICH-G2 tasks as well as with the TeraGrid MPICH-G2 environment. A good introduction to basic MPICH-G2 concepts is available at http://www3.niu.edu/mpi/. Information on establishing the MPICH-G2 environment on the TeraGrid is available at http://www.teragrid.org/userinfo/guide_jobs_mpich_g2.html.

For this section we assume the TeraGrid UC/ANL site and, further, the following “soft” environment on the ANL machines:

 > cat ~/.soft
 @remove +globus
 @remove +mpich-gm-intel
 @remove +mpich-gm-gcc
 +globus-2.4.3-intel-r5
 +mpich-g2-intel
 @teragrid
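
After editing ~/.soft, the new environment normally takes effect at your next login; on SoftEnv-managed systems such as the TeraGrid you can usually apply it to the current shell immediately with the resoft command (a sketch; consult your site documentation):

 > resoft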

For our example, we will use the same helloworld.c program shown in the section Example MPI Program above. Place the MPI helloworld.c program in the directory mpichg2. This program can now be compiled with the MPICH-G2 based mpicc compiler.

 > which mpicc
 /soft/globus-2.4.3-intel-r5/mpich-g2-1.2.6/bin/mpicc
 
 > mpicc -o helloworld helloworld.c

Once the compilation succeeds, the server-side environment setup is complete. Next, we submit an MPICH-G2 task from the client machine on which the Java CoG Kit is installed. For our example we formulate the following RSL:

 +(
   &(resourceManagerContact=tg-grid1.uc.teragrid.org/jobmanager-pbs_gcc)
    (count=4)
    (hostcount=4:ia64-compute)
    (project=<your project number>)
    (jobtype=mpi)
    (label=subjob 0)
    (environment=
      (GLOBUS_DUROC_SUBJOB_INDEX 0)
      (LD_LIBRARY_PATH
        /soft/globus-2.4.3-intel-r5/lib/:/soft/intel-c-8.0.066-f-8.0.046/lib/))
    (directory=/home/laszewski/mpichg2)
    (executable=/home/laszewski/mpichg2/helloworld)
    (stdout=/home/laszewski/mpichg2/helloworld.out)
    (stderr=/home/laszewski/mpichg2/helloworld.err)
 )

The above RSL is formulated based on the information available at http://www.teragrid.org/userinfo/guide_jobs_mpich_g2.html and on our TeraGrid environment. Adjust the various parameters appropriately to reflect your TeraGrid server environment. In particular, replace “(project=<your project number>)” with your actual TeraGrid project number.

Please note that, at the time of writing this guide, it was necessary to include the directory /soft/intel-c-8.0.066-f-8.0.046/lib/ in LD_LIBRARY_PATH for this example to work. This was due to an MPICH-G2 installation problem at the ANL/UC site, which may have been fixed by now.

Next, we create a proxy certificate on the client machine.

 > ./cog-proxy-init

The command for submitting the job using the Java CoG Kit is as follows:

 > ./cog-job-submit -specification
 '+(
    &(resourceManagerContact=tg-grid1.uc.teragrid.org/jobmanager-pbs_gcc)
     (count=4)
     (hostcount=4:ia64-compute)
     (project=<your project number>)
     (jobtype=mpi)
     (label=subjob 0)
     (environment=
       (GLOBUS_DUROC_SUBJOB_INDEX 0)
       (LD_LIBRARY_PATH
        /soft/globus-2.4.3-intel-r5/lib/:/soft/intel-c-8.0.066-f-8.0.046/lib/))
     (directory=/home/laszewski/mpichg2)
     (executable=/home/laszewski/mpichg2/helloworld)
     (stdout=/home/laszewski/mpichg2/helloworld.out)
     (stderr=/home/laszewski/mpichg2/helloworld.err)
 )'
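
Typing the full RSL on the command line is error-prone. One convenient alternative (a sketch, assuming a POSIX shell; the file name rsl-single.txt is our own choice) is to keep the RSL in a file and let the shell substitute its contents:

 > ./cog-job-submit -specification "$(cat rsl-single.txt)"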

The output of this command looks similar to the following; the contents may be slightly different, as we have reformatted the output for this guide.


 DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  -
 Task Identity: urn:cog-1123748613059
 DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  -
 Status changed to Submitted
 DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  -
 Status changed to Active
 DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  -
 Status changed to Completed
 Job completed

On the TeraGrid, the file helloworld.out will be created with the following contents.

 > cat helloworld.out
 ----------------------------------------
 Begin PBS Prologue Thu Aug 11 01:45:01 CDT 2005
 Job ID:         196562.tg-master.uc.teragrid.org
 Username:       laszewski
 Group:          allocate
 Nodes:          tg-c059 tg-c060 tg-c061 tg-c062
 End PBS Prologue Thu Aug 11 01:45:03 CDT 2005
 ----------------------------------------
 Hello world from process 0 of 4
 Hello world from process 1 of 4
 Hello world from process 3 of 4
 Hello world from process 2 of 4
 ----------------------------------------
 Begin PBS Epilogue Thu Aug 11 01:45:26 CDT 2005
 Job ID:         196562.tg-master.uc.teragrid.org
 Username:       laszewski
 Group:          allocate
 Job Name:       STDIN
 Session:        13536
 Limits:         nodes=4:ia64-compute,walltime=00:01:00
 Resources:      cput=00:00:04,mem=1312kb,vmem=3808kb,walltime=00:00:10
 Queue:          dque
 Account:        <your project number>
 Nodes:          tg-c059 tg-c060 tg-c061 tg-c062
 
 Killing leftovers...
 
 End PBS Epilogue Thu Aug 11 01:45:40 CDT 2005
 ----------------------------------------

MPI Job on TeraGrid using MPICH-G2 (Multiple Site)

In this section we discuss how to execute an MPICH-G2 job across multiple sites. Multi-site MPICH-G2 execution requires co-allocation and synchronization support from the Globus Toolkit DUROC libraries, which the Java CoG Kit no longer supports. Therefore, multi-site MPICH-G2 execution is only possible via a three-layered execution pattern: the frontend layer comprises the Java CoG Kit and the client machine; the middle layer comprises the Globus Toolkit services and client-side tools; and the backend layer comprises the cluster/server that supports MPICH-G2 execution.

The user runs the Java CoG Kit on the frontend layer and submits the MPICH-G2 job to the C-based globusrun client (middle layer) using full delegation. The C-based globusrun client then (with the help of the DUROC libraries) carries out the multi-site MPICH-G2 execution.
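
Conceptually, the middle layer then does what you would otherwise do by hand with the C globusrun client, roughly as follows (a sketch; the exact flags depend on your Globus version, and rsl.txt is the file passed via the rsl_file attribute below):

 > globusrun -f rsl.txt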

Let us assume that the Globus Toolkit is installed on a machine named hot.anl.gov with its execution service on the default port, and that the absolute path to the Globus installation is /software/globus-4.0.0. The client can then submit the MPICH-G2 task in the following manner:

 bash-2.05b$ ./cog-job-submit -s hot.anl.gov \
                     -p mpichg2 \
                     -a "remote_globus_location=/software/globus-4.0.0, rsl_file=rsl.txt" \
                     -r

The meaning of the flags is as follows:

-s: The remote Globus execution service (middle layer).
-p: The mpichg2 provider, which facilitates execution via full delegation.
-a: The mpichg2 provider-specific attributes, specifying the remote Globus location and the MPICH-G2 RSL file.
-r: Redirects the output of the Globus execution service to the local machine. Note that this redirects the output of the middle layer, not of the backend layer.

Let us assume that rsl.txt has the following contents. (Note: for a genuine multi-site run, this single-site request has to be replaced with a multi-site MPICH-G2 RSL.)


 bash-2.05b$ cat rsl.txt
 +(
   &(resourceManagerContact=tg-grid1.uc.teragrid.org/jobmanager-pbs_gcc)
    (count=4)
    (hostcount=4:ia64-compute)
    (project=<your project number>)
    (jobtype=mpi)
    (label=subjob 0)
    (environment=
      (GLOBUS_DUROC_SUBJOB_INDEX 0)
      (LD_LIBRARY_PATH
       /soft/globus-2.4.3-intel-r5/lib/:/soft/intel-c-8.0.066-f-8.0.046/lib/))
    (directory=/home/laszewski/mpichg2)
    (executable=/home/laszewski/mpichg2/helloworld)
    (stdout=/home/laszewski/mpichg2/helloworld.out)
    (stderr=/home/laszewski/mpichg2/helloworld.err)
 )

The output on the client machine will look as follows:

 DEBUG  - Execution server: hot.anl.gov
 DEBUG  - Submitted job with Globus ID: https://hot.anl.gov:50001/26306/1128005889/
 DEBUG  - Status changed to Submitted
 DEBUG  - Status changed to Active
 DEBUG  - Status changed to Completed
 Job completed
 making globus_duroc request: ...
 duroc request status: 0
 duroc job contact: "1"
 duroc subjob status:
    Submission of subjob (label = "<no label>") succeeded
 releasing barrier in automatic mode...
 waiting for job termination

Cancelling MPI jobs on the TeraGrid using the MPICH-G2 provider

Due to a bug in c-globusrun, we currently cannot cancel MPI jobs executed through the MPICH-G2 provider. We are monitoring the issue and will post a solution as soon as it is resolved.
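
As a possible workaround (an untested sketch for the Torque-based sites described above), a job that has already reached the backend batch system can be removed directly from a TeraGrid login node with Torque's qdel command, using the job identifier shown by qstat or in the PBS prologue:

 > qdel 196562.tg-master.uc.teragrid.org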

References

  1. “TeraGrid,” Web Page, 2001. http://www.teragrid.org/
  2. G. von Laszewski, I. Foster, J. Gawor, and P. Lane, “A Java Commodity Grid Kit,” Concurrency and Computation: Practice and Experience, vol. 13, no. 8-9, 2001.
  3. “Java CoG Kit Registration,” http://www.cogkit.org/register

Additional publications about the Java CoG Kit can be found as part of the vita of Gregor von Laszewski http://www.mcs.anl.gov/~gregor.

If you need to cite the Java CoG Kit, please use [2].

Notes

  • Note A: The cog-job-submit -specification command does not work across multiple machines. When multiple subjobs are specified, one gets the error Specification Exception: Cannot parse the given RSL.
  • Note B: An example of how to do this across multiple machines is currently under development.