V:4.1.5/Karajan:Task Library
From Java CoG Kit
Files: task.k, task.xml
The task library interfaces with the Java CoG Kit abstraction classes, allowing the use of services for job submission and file operations. The tasks in this library can function in two modes: scheduled or unscheduled. When scheduled, remote tasks are not executed directly. Instead, they are passed to a scheduler, which can handle issues such as throttling, resource allocation, and task-to-resource mapping.
Task Elements
task:scheduler
task:scheduler(type, resources, handlers, properties, taskTransformers)
Defines a scheduler to be used. A scheduler in Karajan has the role of managing resources and assigning abstract tasks (such as the ones produced by execute and transfer) to concrete resources. More details about the role of Karajan schedulers can be found in Karajan:The role of schedulers.
The type describes the particular type of scheduler that is desired. The resources that can be used by the scheduler are passed in the resources argument and can be defined using resources. Each scheduler will also require a set of task handlers, specified with the help of the handler element. Each type of scheduler may support an optional set of properties. The properties channel and the property element can be used to specify any number of properties.
The taskTransformers channel and the taskTransformer element can be used to specify task transformers. A task transformer allows changes to be made to a task after it has been bound to a physical resource by the scheduler. For example, it could be used to set the initial directory of a job to a value that depends on the host on which the job is executed.
For a list of currently available schedulers and their supported properties, see Available Schedulers.
The scope of scheduler is similar to a deeply accessible variable.
The following example shows a typical scheduler definition:
    import("sys.k")
    import("task.k")

    scheduler("default"
        resources(
            host("host1", cpus = 256
                service("execution", provider = "gt2", jobManager = "PBS", uri = "host1.example.net:2119")
                service("file", provider = "gsiftp", uri = "host1.example.net:2911")
            )
            host("host2", cpus = 2
                service("execution", provider = "gt2", uri = "host2.example.net:2119")
                service("file", provider = "gsiftp", uri = "host2.example.net:2911")
            )
        )
        handler("execution", "gt2")
        handler("execution", "gt4")
        handler("file", "gsiftp")
        handler("file-transfer", "ssh")
        property("jobsPerCpu", "1")
    )
Or in XML:
    <import file="sys.xml"/>
    <import file="task.xml"/>

    <scheduler type="default">
        <resources>
            <host name="host1" cpus="256">
                <service type="execution" provider="gt2" jobManager="PBS" uri="host1.example.net:2119"/>
                <service type="file" provider="gsiftp" uri="host1.example.net:2911"/>
            </host>
            <host name="host2" cpus="2">
                <service type="execution" provider="gt2" uri="host2.example.net:2119"/>
                <service type="file" provider="gsiftp" uri="host2.example.net:2911"/>
            </host>
        </resources>
        <handler type="execution" provider="gt2"/>
        <handler type="execution" provider="gt4"/>
        <handler type="file" provider="gsiftp"/>
        <handler type="file-transfer" provider="ssh"/>
        <property name="jobsPerCpu" value="1"/>
    </scheduler>
task:handler
task:handler(type, provider)
A handler specifies a Java CoG Kit abstraction handler, which is used to submit tasks. The type argument indicates the type of handler. It is a string and can have one of the following values: "execution", "file", and "file-transfer".
Execution handlers are used for submitting jobs. File handlers are used for file operations (such as renaming, deleting, and listing of files). File transfer handlers are used only for transferring files. It is possible to transfer files using file handlers, but it is not possible to delete a file using a file transfer handler.
The provider argument indicates the provider to be used for the handler. For a list of currently supported providers please see the abstractions guide.
task:taskTransformer
task:taskTransformer(classname)
Returns a task transformer implementation on the taskTransformers channel. Currently task transformers can only be defined as Java classes implementing the org.globus.cog.karajan.scheduler.TaskTransformer interface.
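As a sketch of how a transformer might be registered, the fragment below places a taskTransformer element inside a scheduler definition so that it is returned on the taskTransformers channel. The class name org.example.InitialDirTransformer is hypothetical and stands for any class implementing org.globus.cog.karajan.scheduler.TaskTransformer; the host name and handler values are placeholders.

```karajan
import("sys.k")
import("task.k")

scheduler("default"
    resources(
        host("host1"
            service("execution", provider = "gt2", uri = "host1.example.net:2119")
        )
    )
    handler("execution", "gt2")
    // Hypothetical class name; the class must implement
    // org.globus.cog.karajan.scheduler.TaskTransformer
    taskTransformer("org.example.InitialDirTransformer")
)
```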
task:resources
task:resources(...)
Encapsulates a set of hosts which can be specified using host.
task:host
task:host(name, *cpus, ...)
Returns a host definition. The name argument indicates the host name or IP address. The number of CPUs of the host can be specified using the *cpus argument. A set of services can also be specified on the default channel.
task:service
task:service(type, provider, *uri, *project, *jobManager, *securityContext)
Returns a service definition. The type of the service can be one of "execution", "file", or "file-transfer". The provider argument indicates the Java CoG Kit abstraction provider for the service. For a list of currently supported providers please see the abstractions guide.
The *uri argument can be used to specify a URI for the service. If it is missing, the host name of the host containing the service is used.
The *project argument can be used to automatically bind a queuing system project to the service in order to alleviate the need to do it with the execute element.
The *jobManager argument can be used to specify a job manager different from the default. Examples of job managers include Fork, PBS, and Condor.
A non-default security context can be specified using the *securityContext argument.
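The optional service arguments might be combined as in the following sketch. The host name and the project name "MyProject" are placeholders, and the choice of providers is only an assumption made for illustration.

```karajan
host("cluster.example.net", cpus = 64
    // Execution service that uses PBS instead of the default job manager
    // and binds a (placeholder) queuing system project to the service
    service("execution", provider = "gt2", jobManager = "PBS", project = "MyProject")
    // No uri is given here, so the name of the enclosing host is used
    service("file", provider = "gsiftp")
)
```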
task:securityContext
task:securityContext(provider, credentials)
Returns a Java CoG Kit abstraction security context. The returned context will be instantiated for the specified provider. The credentials argument can be used to pass a specific set of credentials to the security context.
task:allocateHost
task:allocateHost(name)
Allows tasks to be grouped on one host. By default, the scheduler assigns a different host to each task. AllocateHost can be used to reserve a host from the scheduler until it completes. The name indicates the name of the variable to be set with the allocated host, and is automatically quoted.
    // Define a scheduler
    scheduler( ... )

    allocateHost(host1
        execute("/bin/date", stdout="date", host=host1)
        transfer(srcfile="date", srchost=host1, desthost="localhost")
    )
Or, in XML:
    <scheduler> ... </scheduler>

    <allocateHost name="host1">
        <execute executable="/bin/date" stdout="date" host="{host1}"/>
        <transfer srcfile="date" srchost="{host1}" desthost="localhost"/>
    </allocateHost>
The default scheduler uses a late binding mechanism with allocateHost. It generates a virtual host that is only bound to an actual host when the first task using it is submitted to the scheduler. This removes the limitation on the number of allocateHost elements that can run in parallel, and allows contained jobs to be submitted to the scheduler, which will later handle the throttling issues.
Multiple allocateHost elements can be nested, allowing the grouping of tasks on multiple dependent hosts.
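Nesting might look like the sketch below, which runs a job on one allocated host and copies its output to a second one. The variable names source and target are arbitrary, and the sketch assumes that a scheduler with suitable execution and file services has already been defined.

```karajan
// Assumes a scheduler has been defined beforehand
allocateHost(source
    allocateHost(target
        // Produce a file on the first allocated host
        execute("/bin/date", stdout = "date", host = source)
        // Copy the file to the second allocated host
        transfer(srcfile = "date", srchost = source, desthost = target)
    )
)
```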
task:host:hasService
task:host:hasService(host, type, provider)
Checks whether a host, specified with the host element, contains a service of the specified type and with the specified provider. Returns true if such a service exists, and false otherwise.
task:execute
task:execute(executable, arguments, *directory, *stdout, *stderr, *stdin, *redirect, *provider, *host, *count, *jobtype, *maxtime, *maxwalltime, *maxcputime, *environment, environment, *queue, *project, *minmemory, *maxmemory, *nativespec, *delegation)
Executes a remote job. Executable indicates the executable to be run. Arguments can be passed to the executable using arguments. If present, the *directory argument specifies the remote directory in which the job will be executed. *Stdout and *stderr allow the redirection of the output and error streams to a remote file. *Stdin allows the redirection of the standard input from a remote file. If *redirect is set to true the standard output and standard error of the remote job is redirected to the local console. The *host argument allows the job to be executed on a specific host, and the *provider argument allows the job to be executed using a specific provider.
The *delegation argument can be used to enable credential delegation with providers which support it. Credential delegation is disabled by default.
A set of environment variables can be passed in two ways:
- Through the *environment argument, in which case the value should be either a map or a string with the following format: [<name>=<value>[, <name>=<value>[, ...]]]
- Through the environment channel, in which case envVar can be used
The rest of the arguments are passed to the underlying provider.
A native specification (such as a classic GRAM RSL, or WS-GRAM RSL) can be passed to the provider using the *nativespec argument.
If the exit code of the job is not 0, the job is considered failed and execute will in turn fail. However, some providers (notably GT2) provide no means of retrieving the job exit code. With such providers, execute will always succeed regardless of the exit code of the job.
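The two ways of passing environment variables could be written roughly as follows. The host name, provider, file names, and variable names are placeholders chosen for illustration; the first call uses the string form of the *environment argument and the second returns envVar values on the environment channel.

```karajan
// Environment passed as a string through the *environment argument
execute("/bin/env", stdout = "env1.out",
    host = "host1.example.net", provider = "gt2",
    environment = "GREETING=hello, TARGET=world"
)

// Environment passed through the environment channel using envVar
execute("/bin/env", stdout = "env2.out",
    host = "host1.example.net", provider = "gt2"
    envVar("GREETING", "hello")
    envVar("TARGET", "world")
)
```

Because the example uses the GT2 provider, a non-zero exit code of the job would not cause execute to fail.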
task:transfer
task:transfer(*srcfile, *srcdir, *srchost, *destfile, *destdir, *desthost, *provider, *srcprovider, *destprovider, *thirdparty, *srcOffset, *length, *destOffset, *tcpBufferSize)
Transfers a file. The file can be transferred between the local machine and a remote machine, or between two remote machines. The name of the source file is specified by the *srcfile argument. If present, *destfile specifies the name of the target file; otherwise the source file name is used.
The *srcdir argument indicates the directory on the source machine where the source file can be found. If the *srcdir argument is missing, the default directory will be assumed (provider dependent).
The *destdir argument indicates the directory on the target machine where the file will be copied. If the *destdir argument is missing, the default directory will be assumed (provider dependent).
*Srchost and *desthost indicate the source and the target machines respectively. The *provider argument can be used to force the scheduler to use a specific provider, or to select a provider when no scheduler is used. If the source and the destination use distinct providers, the *srcprovider and *destprovider arguments can be used.
The *thirdparty argument can be used to request a direct transfer between two machines, neither of which is the local host. At this time, only GridFTP supports third party transfers. By default, the Java CoG Kit Abstractions will use simulated third party transfers (routed through the local host) even if both the source and destination are different from the local host.
Partial transfers can be achieved using *srcOffset, *length, and *destOffset. Currently these are only supported with GridFTP 3rd party transfers.
The *tcpBufferSize argument can be used to set the buffer size used by TCP/IP connections initiated by the transfers for providers that support it. Currently this feature only works with 3rd party GridFTP transfers.
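A direct third party transfer between two GridFTP servers might be requested as in the following sketch. Host names, directories, and the file name are placeholders.

```karajan
transfer(
    srcfile = "data.bin", srcdir = "/data",
    srchost = "gridftp1.example.net",
    destdir = "/scratch",
    desthost = "gridftp2.example.net",
    provider = "gsiftp",
    // Request a direct transfer instead of routing through the local host
    thirdparty = true
)
```

Since *destfile is omitted, the target file keeps the source file name.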
task:transferParams
task:transferParams(srchost, desthost, provider)
Can be used to load TCP buffer size information from a file and pass the parameters to transfer. The file containing the buffer size information is located in the etc directory and is named bdp.conf. It has the same format as the identically named file used by TGCP.
Example:
    transfer(
        srcfile="file.txt"
        srcdir="/home/me"
        transferParams(
            srchost="tg-gridftp.uc.teragrid.org"
            desthost="tg-gridftp.sdsc.teragrid.org"
            provider="gsiftp"
        )
    )
task:file:list
task:file:list(dir, *host, *provider)
Returns a list of files in a directory specified by dir, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:remove
task:file:remove(name, *host, *provider)
Removes a file specified by name, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:exists
task:file:exists(name, *host, *provider)
Returns true if the file specified by name exists on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:dir:make
task:dir:make(name, *host, *provider)
Creates a directory specified by name, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:dir:remove
task:dir:remove(name, *host, *provider)
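Removes the empty directory specified by name, on the *host machine. The *provider argument can be used to select a specific provider for the operation.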
task:file:isDirectory
task:file:isDirectory(name, *host, *provider)
Returns true if the file specified by name exists on the *host machine and it is a directory. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:chmod
task:file:chmod(name, mode, *host, *provider)
Changes the permissions on the file specified with the name argument to the mode string indicated by the mode argument. If *host and *provider are present, the operation is done remotely using the respective provider.
task:file:rename
task:file:rename(from, to, *host, *provider)
Renames a file. The source and target name are specified using the from and to arguments. If *host and *provider are present, the operation is done remotely.
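The file and directory elements above could be combined in sequence as in this sketch. The host name, provider, and paths are placeholders, and the fully qualified element names are used only to make the origin of each element explicit.

```karajan
// Create a directory on the remote host
task:dir:make("results", host = "host1.example.net", provider = "gsiftp")

// Rename a file; the first argument is "from" and the second is "to"
task:file:rename("out.tmp", "results/out.dat",
    host = "host1.example.net", provider = "gsiftp")

// Remove a file that is no longer needed
task:file:remove("scratch.tmp", host = "host1.example.net", provider = "gsiftp")

// Without *host and *provider the local provider is used
task:file:exists("local.txt")
```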
task:SSHSecurityContext
task:SSHSecurityContext(credentials)
Instantiates an SSH security context. This is simply a convenience function for securityContext("ssh", credentials).
task:InteractiveSSHSecurityContext
task:InteractiveSSHSecurityContext(*username, *privateKey, *nogui)
Instantiates an SSH security context which will lazily display a dialog window allowing the user to input a user-name/password pair or a user-name/private key/passphrase set. The dialog will only be displayed once per instance of an interactive SSH security context.
If *username and/or *privateKey are specified, the values are used to pre-fill the corresponding dialog fields.
The InteractiveSSHSecurityContext makes use of a class present in the SSH provider of the Java CoG Kit. This class will try to determine whether a GUI can be displayed or not (by checking GraphicsEnvironment.isHeadless()). If a Swing dialog cannot be displayed, a text-mode interface is used instead. The *nogui argument can be used to force the use of the text-mode interface (by setting it to true).
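One plausible use is to attach an interactive context to a service through its *securityContext argument, as sketched below. The host name and user name are placeholders, and passing the context this way is an assumption based on the service signature rather than a documented recipe.

```karajan
host("login.example.net"
    service("execution", provider = "ssh",
        // Pre-fill the user name and force the text-mode interface
        securityContext = InteractiveSSHSecurityContext(username = "alice", nogui = true)
    )
)
```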
task:passwordAuthentication
task:passwordAuthentication(username, password)
Returns a username/password pair suitable to be used as a credential for a securityContext.
task:publicKeyAuthentication
task:publicKeyAuthentication(username, privatekey, passphrase)
Returns a username/privatekey/passphrase set which can be used as credentials for securityContext. The privatekey argument must point to a file containing the private key.
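The credential elements might be combined with securityContext or SSHSecurityContext roughly as follows. The user name, password, key path, and passphrase are placeholders, and attaching the result through the *securityContext argument of service is an assumption based on the signatures above.

```karajan
// Public key credentials wrapped in an SSH security context
service("file", provider = "ssh",
    securityContext = securityContext("ssh",
        publicKeyAuthentication("alice", "/home/alice/.ssh/id_rsa", "passphrase")
    )
)

// Password credentials using the SSHSecurityContext convenience element
service("execution", provider = "ssh",
    securityContext = SSHSecurityContext(
        passwordAuthentication("alice", "password")
    )
)
```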
task:envVar
task:envVar(name, value)
Returns the pair of name and value on the environment channel, to be used by execute.
