Java CoG Kit Long Running Jobs

From Java CoG Kit


Gregor von Laszewski and Kaizar Amin and ...

Corresponding Author: gregor@mcs.anl.gov


Abstract

TBD once the other parts are finished.

Before making the modifications, please review http://www.cogkit.org/w/index.php/Talk:Java_CoG_Kit_Workflow_Examples#Proposed_Enhancement:_Graphs

I would like to have an integrated solution.

The above was designed before I received Kaizar's proposal. It was clearly based on the original abstractions, but if we can get a higher-level API for cog-task and cog-set, we may simply use that to interface with Karajan.

Introduction

This section explains how to run jobs of very long duration with the help of the Java CoG Kit. This feature is implemented in two special providers:

  • gt2ft and
  • gt4ft (NOT YET IMPLEMENTED)

While a long job is running, the client is assumed to come online and go offline at arbitrary times. The user must check the job status with a polling command. The advantages of this use case are that

  • (a) network connectivity does not need to be maintained for the entire time the job is running.
  • (b) the JVM in which the CoG Kit runs can be restarted. Hence it is possible to shut down your computer and continue working at a later time.

The disadvantage is that querying the state is more costly, so this mode assumes that the submitted jobs run for a long period (WE NEED A PERFORMANCE EXPERIMENT THAT CONTRASTS THIS.)

State model

Detailed description

TBD

  • Specified:
  • Submitted:
  • Running:
  • Completed:
  • Pending:
  • Failed:
  • Suspended: not implemented
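The state list above does not yet say which states are final. As an illustration only, the following sketch models the listed states as a Java enum and assumes that Completed and Failed are terminal, so a client polling with cog-task -status could stop once it sees one of them. The enum name LongJobState and the isTerminal method are hypothetical and not part of the Java CoG Kit API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the states listed above; not a Java CoG Kit class.
enum LongJobState {
    SPECIFIED,
    SUBMITTED,
    RUNNING,
    COMPLETED,
    PENDING,
    FAILED,
    SUSPENDED; // listed in the state model but not implemented

    // Assumption: jobs in these states never change again, so a client
    // polling for status can stop querying the remote resource.
    private static final Set<LongJobState> TERMINAL = EnumSet.of(COMPLETED, FAILED);

    boolean isTerminal() {
        return TERMINAL.contains(this);
    }
}
```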

Functionality

The gt2ft provider has the following features:

  • It extends the GT2 provider, so all functionality of the gt2 provider is also available in the gt2ft provider.
  • If the submitted task has the status "unsubmitted", the provider performs a fresh submission.
  • If the status of the task is "submitted", "active", or "suspended", the task was previously submitted and we simply want to reconnect to it. Thus the provider does a BIND to the existing task and continues with status notification as usual.
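The reconnect logic described in the list above can be sketched as a simple dispatch on the checkpointed status. The class Gt2ftDispatchSketch and its Action values are hypothetical names for illustration; only the status strings come from the text. What the provider does for other states (for example a completed task) is not described here, so the sketch assumes it does nothing.

```java
import java.util.Locale;

// Hypothetical sketch of the gt2ft submit-or-bind decision; not the
// provider's actual code.
class Gt2ftDispatchSketch {

    enum Action { SUBMIT, BIND, NONE }

    static Action decide(String status) {
        switch (status.toLowerCase(Locale.ROOT)) {
            case "unsubmitted":
                // Never sent to the resource: perform a fresh submission.
                return Action.SUBMIT;
            case "submitted":
            case "active":
            case "suspended":
                // Previously submitted: bind to the existing remote job and
                // continue with status notification as usual.
                return Action.BIND;
            default:
                // Assumption: other states (e.g. completed, failed) need no action.
                return Action.NONE;
        }
    }
}
```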

Managing a single task

Submitting a long-running task

The enhanced fault-tolerant feature can be accessed easily from the cog-job-submit command found in the bin directory of the Java CoG Kit. We have also added a cog-get-status command (USED TO BE cog-checkpoint-submit).

The cog-task -submit launcher supports the option "-c". When this option is specified, the submitted task is checkpointed, allowing the user to reconnect to it at a later time, for example after a failure.

Assuming /home/user/runLong is a script that runs for a very long time (say, two months), we checkpoint it in checkpoint.xml:

cog-task -submit -p gt2ft \
                 -s hot.mcs.anl.gov \
                 -e /home/user/runLong \
                 -args "-i 20 -s 2" \
                 -c checkpoint.xml

At submission time, we get output such as:

PUT OUTPUT WITHOUT DEBUGGING HERE. I ASSUME IT WILL BE
Job submitted to: hot.mcs.anl.gov
Checkpointfile:   checkpoint.xml
Command:          /home/user/runLong -i 20 -s 2
Environment Vars: none     


With debugging switched on, we get the following output:

DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  
   - Task Identity: urn:cog-1116756045868
DEBUG [org.globus.cog.abstraction.impl.common.AbstractionFactory]  
   - Instantiating org.globus.cog.abstraction.impl.execution.gt2.GlobusSecurityContextImpl for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.common.AbstractionClassLoader]  
   - Using system class loader for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.common.AbstractionFactory]  
   - Instantiating org.globus.cog.abstraction.impl.execution.gt2ft.TaskHandlerImpl for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.execution.gt2ft.JobSubmissionTaskHandler]  
   - RSL: &(executable=/home/user/runLong)(arguments=-i 20 -s 2)
DEBUG [org.globus.cog.abstraction.examples.execution.JobSubmission]  
   - Status changed to Submitted
Task checkpointed to file: checkpoint.xml

THIS DEBUG OUTPUT DOES NOT SHOW THE RESOURCE ON WHICH WE SUBMITTED.

Checking the status of a long-running job

Assume we lost the connection, rebooted our machine, or simply that some time has passed. To check the status of the job, we can use the following command:

./cog-task -status -c checkpoint.xml

This command contacts the appropriate remote system and checks the status. It prints

 PUT THE OUTPUT HERE

With debugging switched on, you see:

DEBUG [org.globus.cog.abstraction.impl.common.AbstractionFactory]
   - Instantiating org.globus.cog.abstraction.impl.execution.gt2.GlobusSecurityContextImpl for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.common.AbstractionClassLoader]
   - Using system class loader for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.common.AbstractionFactory]
   - Instantiating org.globus.cog.abstraction.impl.execution.gt2ft.TaskHandlerImpl for provider gt2ft
DEBUG [org.globus.cog.abstraction.impl.execution.gt2ft.JobSubmissionTaskHandler]
   - Task binding successful
DEBUG [org.globus.cog.abstraction.impl.execution.gt2ft.JobSubmissionTaskHandler]
   - Task identity:urn:cog-1116756045868
DEBUG [org.globus.cog.abstraction.impl.execution.gt2ft.JobSubmissionTaskHandler]
   - Previous status = Submitted
DEBUG [org.globus.cog.abstraction.examples.xml.XML2Task]
   - Status changed to Completed
DEBUG [org.globus.cog.abstraction.examples.xml.XML2Task]
   - Output = null

Obtaining information about a Task

Running

cog-task -info checkpoint.xml

displays the checkpointed task as follows:

identity:                 1116756507318
name:                     myTask
type:                     Job Submission
service.identity:         1116756507319
service.provider:         gt2ft
service.type:             Job Submission
service.Contact:          hot.mcs.anl.gov
specification.type:       JobSpecification
specification.executable: /home/user/runLong
specification.arguments:  -i 20 -s 2
specification.batchjob:   false
specification.redirected: false
specification.localexec:  false
attribute.name.globusid:  https://hot.mcs.anl.gov:50001/28882/1116756633/
status.state:             Submitted
status.time.submitted:    2005-05-22T05:08:32.496
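For scripting around the plain-text output, a minimal sketch could split each line at its first colon into a key/value map. It assumes, as in the example above, that keys never contain a colon while values (URLs, timestamps) may; TaskInfoParser is a hypothetical helper, not a CoG Kit class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse "key: value" lines from "cog-task -info".
class TaskInfoParser {

    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            // Split at the FIRST colon only: values such as the globusid
            // URL or the submission time contain further colons.
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }
}
```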


Running

cog-task -info checkpoint.xml -format xml

displays the checkpointed task in XML format:

<?xml version="1.0"?>
<task>
   <identity>1116756507318</identity>
   <name>myTask</name>
   <type>Job Submission</type>
   <serviceList>
       <service>
           <identity>1116756507319</identity>
           <provider>gt2ft</provider>
           <type>Job Submission</type>
           <serviceContact>hot.mcs.anl.gov</serviceContact>
       </service>
   </serviceList>
   <specification>
       <JobSpecification>
           <executable>/home/user/runLong</executable>
           <arguments>-i 20 -s 2</arguments>
           <batchJob>false</batchJob>
           <redirected>false</redirected>
           <localExecutable>false</localExecutable>
       </JobSpecification>
   </specification>
   <attributeList>
       <attribute name="globusid" 
                  value="https://hot.mcs.anl.gov:50001/28882/1116756633/"/>
   </attributeList>
   <status>Submitted</status>
   <submittedTime>2005-05-22T05:08:32.496</submittedTime>
</task>
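Because the XML form is ordinary XML, it can be read with the JDK's built-in DOM parser without any CoG Kit classes. The sketch below assumes the element and attribute names shown above (status, identity, and attribute with name/value); CheckpointXmlReader is a hypothetical helper. Since getElementsByTagName returns matches in document order, text("identity") yields the task identity rather than the service identity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for the checkpoint XML format shown above.
class CheckpointXmlReader {

    private final Document doc;

    CheckpointXmlReader(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable checkpoint document", e);
        }
    }

    // Text of the first element with the given tag name, or null if absent.
    String text(String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Value of the <attribute name="..."> entry in <attributeList>, or null.
    String attribute(String name) {
        NodeList nodes = doc.getElementsByTagName("attribute");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (name.equals(e.getAttribute("name"))) {
                return e.getAttribute("value");
            }
        }
        return null;
    }
}
```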

Managing a set of tasks

As you will likely need to manage multiple long-running jobs at a time, the Java CoG Kit provides convenient ways to simplify this. We allow checkpointing multiple jobs into a directory or into a database (NOTE THAT THE DATABASE IS NOT YET IMPLEMENTED).

Specifying the checkpoint system

The system used for checkpointing (directory or database) is specified with the command

cog-set -checkpoint -type directory -location <path to the directory>

for a directory in which subsequent checkpoint files are written, or

cog-set -checkpoint -type database \
                   -location mysql://<path to the database> \
                   -password password

for a database. If the password is not specified, a GUI appears and asks you for it.

Labels

To make this simple for the user, we have augmented the job submission command with a label option that lets users refer to a specific job by a user-defined label. Grid middleware assigns each job a unique ID that in most cases is hard for the user to remember. The label feature lets users choose identifiers that are more convenient for them. If a label has already been defined, it must first be removed with

cog-set -delete label
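The label rules above (a label maps to the middleware-assigned job ID and must be deleted before it can be reused) can be illustrated with a small in-memory model. LabelRegistry and its methods are hypothetical and only mirror the behaviour described in this section, not the actual cog-set implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of user-defined job labels.
class LabelRegistry {

    private final Map<String, String> jobIdByLabel = new HashMap<>();

    // Attach a label to a middleware job ID; an existing label is rejected.
    void define(String label, String jobId) {
        if (jobIdByLabel.putIfAbsent(label, jobId) != null) {
            throw new IllegalStateException(
                    "label already defined: " + label + " (delete it first)");
        }
    }

    // Mirrors "cog-set -delete label"; returns true if the label existed.
    boolean delete(String label) {
        return jobIdByLabel.remove(label) != null;
    }

    // Resolve a label to its job ID; an unknown label is an error,
    // matching the behaviour described for "cog-set -info -label".
    String resolve(String label) {
        String id = jobIdByLabel.get(label);
        if (id == null) {
            throw new IllegalArgumentException("no job with label: " + label);
        }
        return id;
    }
}
```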

To list the information attached to jobs, we have defined the following options:

cog-set -info 

lists the jobs in the set in a convenient ASCII table. To change the format to XML, you can use

cog-set -info -format xml

To obtain the information for a single job only, use the label option:

cog-set -info -label myjob 

returns the information for the job myjob. If the job is not available, an error is returned.

To list the jobs in a particular state, use the state option. Hence the command

cog-set -info -state failed

will list all jobs that have failed.
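The filtering and table output of cog-set -info can be sketched as follows. The column layout and the case-insensitive state match are assumptions (the command above uses "failed" while the status output shows "Failed"); SetInfoSketch is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of "cog-set -info [-state <state>]".
class SetInfoSketch {

    static final class Job {
        final String label;
        final String state;

        Job(String label, String state) {
            this.label = label;
            this.state = state;
        }
    }

    // A null state lists every job, as plain "cog-set -info" does;
    // otherwise match the state case-insensitively.
    static List<Job> filter(List<Job> jobs, String state) {
        return jobs.stream()
                .filter(j -> state == null || j.state.equalsIgnoreCase(state))
                .collect(Collectors.toList());
    }

    // Render the selected jobs as a simple two-column ASCII table.
    static String table(List<Job> jobs) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-12s %-10s%n", "LABEL", "STATE"));
        for (Job j : jobs) {
            sb.append(String.format("%-12s %-10s%n", j.label, j.state));
        }
        return sb.toString();
    }
}
```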

Adding tasks

To add a task to the set you can use the command

cog-set -label label -add ....

Please note that adding a task does not submit it to the backend system; it only adds a placeholder to the set for submission at a later time. This feature is useful for generating a number of jobs before submitting them, but only if a sophisticated scheduler is used in conjunction with the set. At this time we recommend using cog-set -submit instead of the add function.

Submitting tasks

A task can be submitted with the command

cog-set -label label -submit ...

This adds the job to the set and submits it to the Grid. The usual job options are specified after the submit option. However, if the job has already been added, the label alone suffices to conduct the submission:

cog-set -label label -submit
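The difference between add and submit can be illustrated with a minimal model in which add only records a placeholder, and submit either submits an existing placeholder or adds and submits in one step. TaskSetSketch is hypothetical, a real submission would go through a provider such as gt2ft, and rejecting a second submit of the same label is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the cog-set add/submit semantics.
class TaskSetSketch {

    enum State { UNSUBMITTED, SUBMITTED }

    private final Map<String, State> tasks = new LinkedHashMap<>();

    // "cog-set -label L -add ...": placeholder only, nothing reaches the backend.
    void add(String label) {
        if (tasks.putIfAbsent(label, State.UNSUBMITTED) != null) {
            throw new IllegalStateException("label already defined: " + label);
        }
    }

    // "cog-set -label L -submit [...]": submits a previously added task,
    // or adds and submits it in one step if it is not in the set yet.
    void submit(String label) {
        State current = tasks.get(label);
        if (current == null) {
            add(label);
        } else if (current == State.SUBMITTED) {
            throw new IllegalStateException("already submitted: " + label);
        }
        tasks.put(label, State.SUBMITTED); // stands in for the real submission
    }

    State state(String label) {
        return tasks.get(label);
    }
}
```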

Order

In the future we will add the ability to define an order for sets submitted to the Grid. This can be achieved in one of two ways. First, through a file that contains the labels of the jobs, to support parameter studies:

cog-set -order filename

Second, through the definition of explicit dependencies between the jobs:

cog-set -dependency label1 label2

The order feature is naturally most useful together with the add feature, to generate a schedule of the tasks. However, it is also possible to add tasks at runtime. In this scenario we assume a particular set of jobs is already running; additional jobs can then be added at runtime through the add and submit commands. Avoiding deadlocks is the responsibility of the user.
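Since avoiding deadlocks is left to the user, one way to check a set of dependencies before adding jobs at runtime is a topological sort: if no order exists, the dependencies contain a cycle and the set would deadlock. The sketch below uses Kahn's algorithm. DependencyOrder is a hypothetical helper, and it assumes that cog-set -dependency label1 label2 means label1 must finish before label2, which this section does not state explicitly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dependency checker using Kahn's topological sort.
class DependencyOrder {

    private final Map<String, List<String>> children = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new HashMap<>();

    void addJob(String label) {
        children.putIfAbsent(label, new ArrayList<>());
        inDegree.putIfAbsent(label, 0);
    }

    // Assumption: "parent must finish before child".
    void addDependency(String parent, String child) {
        addJob(parent);
        addJob(child);
        children.get(parent).add(child);
        inDegree.merge(child, 1, Integer::sum);
    }

    // Returns a valid submission order, or throws if there is a cycle.
    List<String> order() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (String label : children.keySet()) {
            if (remaining.get(label) == 0) {
                ready.add(label); // no unfinished parents
            }
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String label = ready.poll();
            result.add(label);
            for (String child : children.get(label)) {
                if (remaining.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child); // last parent done: child can run
                }
            }
        }
        if (result.size() != children.size()) {
            throw new IllegalStateException("dependency cycle: jobs would deadlock");
        }
        return result;
    }
}
```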

Displaying a set with dependencies

To display a set with its dependencies, use the info option together with some additional options:

cog-set -info -dependencies 

lists the jobs and adds a column listing the labels of each job's parents.

cog-set -info -dependencies -resolved no

lists only the parents that have not yet been resolved, that is, the parents that still prevent the job from running. Adding the label to any of these commands restricts the output to that label.

cog-set -info -dependencies -label label

returns the information for the task with that label.
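The -resolved no view amounts to filtering a job's parents by their state. The sketch below assumes a parent counts as resolved once it is Completed and treats a parent with unknown state as unresolved; UnresolvedParents is a hypothetical name, not part of cog-set.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of "cog-set -info -dependencies -resolved no" for one job.
class UnresolvedParents {

    static List<String> unresolved(String label,
                                   Map<String, List<String>> parentsByLabel,
                                   Map<String, String> stateByLabel) {
        List<String> parents = parentsByLabel.getOrDefault(label, Collections.emptyList());
        return parents.stream()
                // Assumption: only a Completed parent is resolved; a missing
                // state (null) also counts as unresolved.
                .filter(p -> !"Completed".equalsIgnoreCase(stateByLabel.get(p)))
                .collect(Collectors.toList());
    }
}
```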

To obtain a graphical representation of the job dependencies and their states, use the command

cog-set -info -graph filename.png -format dot 

or

cog-set -info -graph filename.png -format karajan

to produce a PNG image rendered internally either by the dot engine (if it is installed on your system) or by the Karajan engine that comes with the Java CoG Kit.
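For the dot format, the input handed to Graphviz could look like the output of the following sketch: one node per job labelled with its state, and one edge per dependency. DotExport is hypothetical and assumes labels contain no double quotes; such a file can be rendered with the Graphviz command dot -Tpng jobs.dot -o filename.png.

```java
import java.util.List;
import java.util.Map;

// Hypothetical export of a job set to the Graphviz DOT language.
class DotExport {

    static String toDot(Map<String, String> stateByLabel,
                        Map<String, List<String>> childrenByLabel) {
        StringBuilder sb = new StringBuilder("digraph jobs {\n");
        // One node per job; "\\n" emits DOT's line-break escape inside the label.
        for (Map.Entry<String, String> e : stateByLabel.entrySet()) {
            sb.append(String.format("  \"%s\" [label=\"%s\\n%s\"];%n",
                    e.getKey(), e.getKey(), e.getValue()));
        }
        // One edge per dependency, drawn from parent to child.
        for (Map.Entry<String, List<String>> e : childrenByLabel.entrySet()) {
            for (String child : e.getValue()) {
                sb.append(String.format("  \"%s\" -> \"%s\";%n", e.getKey(), child));
            }
        }
        return sb.append("}\n").toString();
    }
}
```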

Manual Pages

The above changes cog-job-submit to

cog-task and cog-set

We need to discuss whether we keep cog-submit or use cog-task instead.

Here we put in the reference manuals for all of the commands.

cog-task

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

SEE ALSO

BUGS

EXAMPLES

  • pointers to CVSVIEW

cog-set

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

SEE ALSO

BUGS

EXAMPLES

  • pointers to CVSVIEW

cog-job-submit (deprecated)

Deprecated in favour of cog-task and cog-set.

Integration into Karajan

This is described in more detail in

http://www.cogkit.org/w/index.php/Talk:Java_CoG_Kit_Workflow_Examples#Proposed_Enhancement:_Graphs

Specific Implementation Issues

Globus Toolkit 2

See the classical model described under Globus Toolkit 3.

Globus Toolkit 3

These features will not be supported in the OGSI-based services. For GT3 we recommend using the GT2 classical services; hence our provider is called gt2ft.

A prototype is available but does not yet conform to the commands presented in this guide.

Globus Toolkit 4

A system will be implemented.
