V:4.1.5/Java CoG Kit Karajan Workflow Reference Manual - Single Page
Mike Hategan and Gregor von Laszewski
This document also exists as multiple smaller documents: http://wiki.cogkit.org/index.php/V:4.1.5/Java_CoG_Kit_Karajan_Workflow_Reference_Manual
The Karajan Language
Karajan supports two syntax modes: a native syntax and XML. There is no difference on the semantic level between the two forms.
The Karajan Syntax
The following conventions are used:
(xy) is used to group x and y
[x] indicates that x is optional
x+ denotes at least one occurrence of x
x* denotes zero or more occurrences of x
x|y means either x or y
'x' is to be interpreted as the literal x
ε represents the empty production
Elements
The Karajan semantics revolve around the notion of elements. An element is relatively similar to a function in that it has a name, can accept arguments, and may return values. The general syntax for an element is:
element ::= identifier '(' [arguments] ')'
Example:
false()
Identifiers
An identifier can consist of alpha-numeric characters and certain symbols, but no whitespace. Symbols that cannot be used in an identifier are symbols that have other syntactic functions, such as brackets (all of them), commas, double quotes, and operators (’+’, ’-’, ’*’, ’/’, ’%’, ’^’, ’=’, ’<’, ’>’, ’&’, ’|’). Identifiers are case insensitive.
identifier ::= (Letter | Digit | '!' | '@' | '#' | '$' | '%' |
               '_' | ':' | ';' | ''' | '.' | '?' | '\' | '`' | '~')+
Example:
var, i, v123, @, a$, big_list, grid:task, file.list
Identifiers cannot begin with a digit.
Arguments
The arguments can either be other elements or values, such as numeric values or strings. Arguments can be separated by commas or the new line character (or both):
arguments ::= (argument [separator arguments]) | ε
separator ::= ',' | Newline
Example:
list(true(), false())
list(true()
false()
)
list(true(),
false())
Arguments come in two flavors, named arguments and unnamed arguments:
argument ::= named_argument | unnamed_argument
Named Arguments
Named arguments provide a way of explicitly binding arguments to formal parameters:
named_argument ::= identifier '=' unnamed_argument
Example:
true(), nl =false())
Unnamed Arguments
Unnamed arguments can be either immediate values, elements or expressions. Immediate values can be numeric literals, string literals, variables, or quoted lists:
unnamed_argument ::= numeric_literal | string_literal | variable | quoted_list |
                     element | expression
numeric_literal ::= ['+'|'-'] digit+ ['.' digit+]
digit ::= '0'... '9'
string_literal ::= '"' any_characters_but_double_quotes '"'
variable ::= identifier
Example:
list(1, 2.3, -4.56, +7.890, "A string",list("Another string value in a nested list", "*2"))
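The numeric_literal production above can be checked mechanically. The following is a minimal sketch in Python (not Karajan, and not the Karajan parser) that transcribes the production into a regular expression; the names NUMERIC_LITERAL and is_numeric_literal are hypothetical and exist only for this illustration.

```python
import re

# numeric_literal ::= ['+'|'-'] digit+ ['.' digit+]
# Transcribed directly from the grammar; this is not the Karajan parser.
NUMERIC_LITERAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def is_numeric_literal(text: str) -> bool:
    """Return True if the whole string matches the numeric_literal production."""
    return NUMERIC_LITERAL.fullmatch(text) is not None


# The literals from the example above are accepted. Forms the production
# does not describe (no leading digit, no trailing digit, exponent) are not.
accepted = [s for s in ["1", "2.3", "-4.56", "+7.890"] if is_numeric_literal(s)]
rejected = [s for s in [".5", "1.", "1e3"] if not is_numeric_literal(s)]
```

Read this way, the production requires at least one digit on each side of the decimal point.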
Expressions
Expressions consist of unnamed arguments to which operators are applied. Parentheses can be used to override the default precedence of operators in expressions.
expression ::= unnamed_argument operator unnamed_argument |
               '(' expression ')'
Example:
1+2*3-4
Operators
The native Karajan syntax supports basic arithmetic, logic operators, and an assignment operator:
operator ::= '*' | '/' | '%' | '+' | '-' | '<=' | '>=' | '<' | '>' |
             '==' | '!=' | '&' | '|' | ':='
The following list enumerates the operators in order of precedence, starting with the highest precedence. While the XML syntax does not support the use of operators, each operator has an equivalent element which can be used in both syntaxes (shown in parentheses).
- Additive operators
  - + (sum) Addition
  - - (subtraction) Subtraction
- High priority relational operators
  - <= (lessOrEqual) Less or equal
  - >= (greaterOrEqual) Greater or equal
  - < (lessThan) Strictly less
  - > (greaterThan) Strictly greater
- Low priority relational operators
- Multiplicative logic operators
  - & (and) Logical AND
- Additive logic operators
  - | (or) Logical OR
- Assignment operator
  - := (set) Variable assignment
Quoted Lists
A quoted list is a special element that produces a list of identifiers. What is specific about a quoted list is that if its arguments are variables, the variables will not be evaluated. Instead their identifiers will be added to the list. Quoted lists are convenience syntax for expressing a list of formal arguments:
quoted_list ::= '[' arguments ']'
However, quoted lists are not limited to expressing lists of arguments. They can also be used to express lists of values. The only thing to remember is that variable evaluation will not take place for immediate arguments of a quoted list.
Example:
list("A quoted list follows", [a, b, c])
Programs
A Karajan program is a list of arguments:
program ::= arguments
There exists an implicit root element that sits at the top of the element tree, and implements certain system functions.
Comments
And finally, Karajan uses C-style comments. Single-line comments begin with two forward slashes and end at the following new line character, while multi-line comments are delimited by ’/*’ and ’*/’:
//This is a comment
print("This is not a comment")
/*This is
also a
comment
*/
The XML Syntax
Karajan also supports XML as its syntax. In the XML syntax, each XML element corresponds to a Karajan element. Arguments can be expressed either through XML attributes or nested elements.
Particularities of Using XML
One of the particular aspects of using XML with Karajan is that when using XML attributes for arguments, it is impossible to make a syntactic distinction between a numeric value and its string representation. In general, Karajan will try to use the context to figure out which one is desired, but there are instances when it is impossible to do so. Therefore, when using the XML syntax, the number and string elements can be used to differentiate between numeric and string values:
<list> <number>1</number> <string>1</string> </list>
The equivalent Karajan construct would be:
list(1, "1")
Karajan can load and interpret arbitrary XML files, provided that definitions exist for the XML elements present in the file, but XML mixed content is not handled properly. The unfortunate aspect is that it is impossible to handle XML mixed content in a generic way. For example, it cannot be known whether whitespace between two XML elements is to be interpreted as content or not, without knowledge of the implementation of an element. Since Karajan is a dynamic language, the implementation of an element is not known statically, at the time the parsing takes place. Therefore, the following rule was adopted: an element will treat textual content as content if and only if no nested elements exist. If nested elements exist, textual content will be ignored.
If processed, textual content will be mapped as a string argument. A consequence of the above rule is that textual content and multiple arguments are mutually exclusive.
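The rule for textual content can be illustrated outside Karajan. The sketch below uses Python's standard xml.etree.ElementTree module, not Karajan's own XML loader, and the function name text_argument is hypothetical.

```python
import xml.etree.ElementTree as ET


def text_argument(element: ET.Element):
    """Return the textual content as a string argument, or None if it is ignored.

    Mirrors the rule stated above: text counts as content if and only if
    the element has no nested elements.
    """
    if len(element) > 0:
        return None  # nested elements exist, so textual content is ignored
    return element.text  # None for an empty element such as <false/>


only_text = ET.fromstring("<string>1</string>")
mixed = ET.fromstring("<list> stray text <number>1</number> </list>")

string_arg = text_argument(only_text)   # the text becomes a string argument
ignored = text_argument(mixed)          # the stray text is dropped
```

Because the text is dropped as soon as one nested element appears, an element can receive either one textual argument or several nested arguments, never both.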
Lastly, a well-formed XML document must always have a root element. While in the native Karajan syntax, the root element is implicit, in XML the project or karajan elements can be used as root elements.
Parameters and Return Values
An element can accept any number of arguments and can generate any number of return values, and that includes an infinite number of arguments and/or return values (at least in theory).
Parameters and Arguments
Arguments are divided into two major types: single value arguments and channels. As their name implies, single value arguments can have only one value. By contrast, channels can be used for any number of values.
Single Value Arguments
Single value arguments can be specified using the named argument form. For example, the print element has a message argument. Thus passing a string as the message argument to the print element can be done in the following way:
print(message = "Some string")
Or in XML:
<print> <argument name="message" value="Some string"/> </print>
Single value arguments can be further divided into mandatory and optional arguments.
Channels
Channels can be used to pass multiple arguments to an element. Each channel has a name, except for the default channel. The default channel is similar to the notion of variable arguments in C. Passing arguments on the default channel is done implicitly when arguments are not passed as single value arguments:
list(1, "value")
<list> <number>1</number> <string>value</string> </list>
In the above case, 1 and "value" are both passed to the list element on the default channel.
Elements define whether they do receive arguments on a specific channel or not. One possibly interesting aspect is that an element that does not process arguments on a channel, will automatically return all values received on that channel. It is therefore possible to use named channels to return values to elements other than the immediate parent. Assuming that foo is an element that does not take any arguments on any channels, the following will produce the same result:
list(1, 2, 3)
list(foo(1, 2, 3))
<list> <number>1</number> <number>2</number> <number>3</number> </list>
<list> <foo> <number>1</number> <number>2</number> <number>3</number> </foo> </list>
Since foo does not process any arguments, all the arguments it receives on the default channel will be returned to the parent element.
Argument Mapping
It is not always convenient to use the named argument form to pass arguments to an element. Elements in Karajan will automatically map arguments received on the default channel to single value arguments. The mapping is done dynamically, in the order arguments are received. Suppose there is an element foo that takes three arguments, namely one, two and three. The following would then be equivalent:
foo(one = 1, two = 2, three = 3)
foo(one = 1, two = 2, 3)
foo(one = 1, 2, 3)
foo(1, 2, 3)
foo(1, 2, three = 3)
...
<foo one="1" two="2" three="3"/>
<foo one="1" two="2"> <number>3</number> </foo>
<foo one="1"> <number>2</number> <number>3</number> </foo>
<foo> <number>1</number> <number>2</number> <number>3</number> </foo>
<foo> <number>1</number> <number>2</number> <argument name="three" value="3"/> </foo>
...
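One way to picture the mapping is as filling the still-unbound formal parameters, in declaration order, with the values arriving on the default channel. The Python sketch below models only that reading. It is an assumption about the behavior rather than Karajan's implementation, which performs the mapping dynamically as arguments arrive, and map_arguments is a hypothetical helper.

```python
def map_arguments(parameters, named, default_channel):
    """Bind default-channel values to the parameters not already bound by name.

    parameters: formal parameter names, in declaration order
    named: dict of arguments passed in the named form
    default_channel: values passed without a name, in arrival order
    Returns (bindings, leftover), where leftover holds values that found
    no parameter to bind to.
    """
    bindings = dict(named)
    unbound = [p for p in parameters if p not in bindings]
    for name, value in zip(unbound, default_channel):
        bindings[name] = value
    leftover = list(default_channel[len(unbound):])
    return bindings, leftover


params = ["one", "two", "three"]
forms = [
    map_arguments(params, {"one": 1, "two": 2, "three": 3}, []),
    map_arguments(params, {"one": 1, "two": 2}, [3]),
    map_arguments(params, {"one": 1}, [2, 3]),
    map_arguments(params, {}, [1, 2, 3]),
    map_arguments(params, {"three": 3}, [1, 2]),
]
```

Under this model all five call forms listed above produce the same bindings.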
Optional Arguments
Automatic mapping of default channel arguments to single value arguments has an unfortunate side effect: an element that accepts both single value arguments and arguments on the default channel (variable arguments) cannot prevent variable arguments from being mapped to its single value arguments. Optional arguments introduce different semantics to avoid this. Optional arguments do not need to be specified. However, if specified, the named form must always be used. An example is the print element, which has an optional argument named nl. It can be set to false to indicate that no new-line character should be appended at the end of the message argument:
false())
or
false())
The following however, is not valid:
false())
Return values
Return values are a mirror image of the arguments concept. Whatever can be accepted as an argument by an element can also be returned by another. Thus, it is possible to define a single element that returns all arguments to any given element. The following example defines an element that returns both a message and the named form of the nl argument, suitable for the print element:
//The following defines an element foo() which takes no
//arguments and returns "Message" and nl = false()
element(foo, []
  "Message", nl = false()
)
<element name="foo" arguments=""> <string>Message</string> <argument name="nl"> <false/> </argument> </element>
Argument Evaluation Order
There is no imposed order for evaluating arguments. The order is controlled by each element. Most elements, by default, evaluate their arguments in sequential order. However, it is very easy to override the default order by using elements that use a different execution order. For example, the parallel element evaluates all of its arguments in parallel, returning all the resulting values. Evaluating the arguments to an element in parallel then becomes as easy as surrounding them with a parallel element:
list(
  parallel(
    "Value1"
    "Value2"
  )
)
<list> <parallel> <string>Value1</string> <string>Value2</string> </parallel> </list>
Furthermore, the way in which an element processes the arguments is also left to each element. For example, an element can choose to start executing after the evaluation of all arguments has been completed, or process arguments as they arrive. In other words, and in the most general case, arguments are both generated and processed asynchronously.
A concrete example is the print element, which simply returns the message argument on the stdout channel. When Karajan starts execution, an implicit root element is created that receives arguments on the stdout channel, and prints them to the console. Since the processing is done asynchronously, the appearance of print doing the actual work when executed is achieved. The advantage of such a mechanism is that, provided that an element does not produce any side-effects, its execution becomes equivalent to the totality of values returned (both single values, and channels).
Variables and Scope
We will not insult the reader’s intelligence by explaining what variables are. There is no explicit declaration of variables in Karajan. A variable is defined when it is assigned the first time.
The scope of a variable extends to the element that it was defined in, and is pseudo-lexical. By pseudo-lexical it is meant that internally, the scoping is dynamic, but provisions are made to make it impossible to access variables outside the lexical scope. Therefore, Karajan does not support closures.
On a lexical level, it is possible to read the value of a variable defined in a parent element, but setting the value of the same variable will create a new scope. In other words, Karajan uses deep access and shallow binding. The following example should make things clearer:
...
v := 1      //v is 1 on stack frame n
list(       //A new stack frame is created: n+1
  v         //v refers to the variable on frame n
  v := 2    //A new binding is made for v on frame n+1
            //the new binding shadows the one from frame n
  v         //v now refers to the binding on frame n+1
)           //The returned list contains the values 1 and 2
...
<set name="v" value="1"/>
<list>
  <variable>v</variable>
  <set name="v" value="2"/>
  <variable>v</variable>
</list>
<print> <variable>v</variable> </print>
...
In the above example, it would be impossible for the definition of list to access variable v, since the body of the definition of list does not fall within the lexical scope of the definition of v.
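The combination of deep access and shallow binding can be modeled with a chain of frames: reads walk up to parent frames, while writes always bind in the current frame. The following Python sketch is only an illustration of that idea. The Frame class is hypothetical and does not reflect how the Karajan interpreter stores variables.

```python
class Frame:
    """A stack frame holding variable bindings, linked to its parent frame."""

    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

    def get(self, name):
        # Deep access: look in this frame first, then in each parent.
        frame = self
        while frame is not None:
            if name in frame.bindings:
                return frame.bindings[name]
            frame = frame.parent
        raise NameError(name)

    def set(self, name, value):
        # Shallow binding: always bind in the current frame,
        # shadowing (not modifying) any binding in a parent frame.
        self.bindings[name] = value


frame_n = Frame()
frame_n.set("v", 1)                  # v := 1 on frame n

frame_n1 = Frame(parent=frame_n)     # list( ... ) creates frame n+1
returned = [frame_n1.get("v")]       # v resolves to the binding on frame n
frame_n1.set("v", 2)                 # new binding on frame n+1
returned.append(frame_n1.get("v"))   # v now resolves to frame n+1

after = frame_n.get("v")             # frame n is unaffected
```

After the inner frame is discarded, the outer binding still holds its original value, which is why the outcome of a later read in the outer scope does not depend on what the child element did.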
The reason for this kind of scoping is to reduce the ambiguity that could be introduced by not knowing the order in which child elements are executed. Please note that such ambiguity is not completely eliminated. The order in which the arguments to list are evaluated does matter, and can change the resulting list. However, the scope of the ambiguity is the same as the scope of the ambiguity in the order of evaluation of the elements. If set, list, and print are evaluated in sequence, no change in the way list evaluates its arguments can change the outcome of the execution of print(v).
Global Variables
Global variables are provided for conveniently defining settings that have a global scope. Please note that in the future, global variables will be single-assignment, thus coming closer to the notion of constants.
global(foo, "Foo")
element(boo, []
Variable Expansion
Karajan offers convenient variable expansion constructs. All pairs of curly brackets inside strings are replaced by the value of the variable with the name of the identifier inside the brackets. If no such variable exists, the element trying to access the string will fail. If the ’{’ literal is needed inside a string, it must be used twice. There is no need to escape the closing curly bracket, since it cannot be part of an identifier. If a closing bracket is part of a variable expansion expression, it will mark its end. If not, it will be interpreted as the closing curly bracket literal:
a := 1
<set name="a" value="1"/>
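The expansion rules described above (a pair of curly brackets names a variable, a doubled opening bracket is a literal, a lone closing bracket is a literal, and a missing variable is an error) can be written down as a small scanner. The Python sketch below is an illustration under those stated rules, not Karajan's implementation. The function name expand is hypothetical, and formatting values with str() is an assumption.

```python
def expand(text, variables):
    """Expand {name} references in text using the variables mapping."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "{":
            out.append(ch)      # ordinary character, including a lone '}'
            i += 1
        elif text.startswith("{{", i):
            out.append("{")     # doubled bracket is the '{' literal
            i += 2
        else:
            end = text.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated variable expansion")
            name = text[i + 1:end]
            if name not in variables:
                # Mirrors "the element trying to access the string will fail".
                raise KeyError(name)
            out.append(str(variables[name]))
            i = end + 1
    return "".join(out)


simple = expand("a is {a}", {"a": 1})
literal = expand("{{a} stays, and so does }", {"a": 1})
```

The first closing bracket after an opening one always ends the expansion, which is why the closing bracket never needs escaping.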
Futures
Futures are a mechanism for binding a variable to the results of a future computation. Until the value of the computation to which the future is bound becomes available, the future exists in an unbound state. Any attempt to use the value of an unbound future will cause the thread that tried to access the future to block until the future becomes bound. Errors occurring before a future is bound, in the thread evaluating the future, will be reported when the future is accessed. Errors occurring in the thread evaluating the future after the future is bound will not be visible.
In Karajan there are two types of futures:
- single value futures: are used to hold a single value of a future computation. They are defined using the future element.
- future iterators: can be used to hold multiple values. However, not all the values need to be generated before the future iterator can be used. Iterating over a future iterator will cause the iteration to use as many values as are available, then block waiting for more values to be added to the future iterator. Future iterators are defined using futureIterator.
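Readers familiar with other languages may find an analogy useful. The Python sketch below uses the standard concurrent.futures, queue, and threading modules to show the two behaviors described above: a single value future that blocks on access and reports an earlier error at access time, and an iterator that consumes values as they become available. It is an analogy only. None of these names are part of Karajan, and the _CLOSED sentinel is a hypothetical device for ending the iteration.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def compute():
    return 42


def broken():
    raise RuntimeError("computation failed")


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(compute)
    bad = pool.submit(broken)
    value = ok.result()        # blocks until the future is bound
    try:
        bad.result()           # the error surfaces when the future is accessed
        error = None
    except RuntimeError as exc:
        error = str(exc)

_CLOSED = object()             # hypothetical end-of-values marker


def iterate(channel):
    """Yield values as they arrive, blocking while none are available."""
    while True:
        item = channel.get()   # blocks until the producer adds a value
        if item is _CLOSED:
            return
        yield item


channel = queue.Queue()


def producer():
    for n in (1, 2, 3):
        channel.put(n)
    channel.put(_CLOSED)


worker = threading.Thread(target=producer)
worker.start()
values = list(iterate(channel))
worker.join()
```

In both cases the consumer never polls: it simply asks for the value and is suspended until one exists.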
Source Files
As mentioned in The Karajan Language, Karajan understands two syntaxes: the native Karajan syntax and the XML syntax. The distinction between them is made using the file extension. A file with the “.k” extension will be parsed using the native parser, while a file with the “.xml” extension will be parsed using an XML parser. [1]
Libraries
Libraries are collections of elements grouped by the functionality they provide. A library is defined in a source file. Its functionality can be reused in other source files by using the import (equivalent to include) element:
import("sys.k")
It is possible to include XML libraries from native Karajan files. It is also possible to include native Karajan libraries from XML Karajan files. Consequently, the following are valid:
import("task.xml")
<import file="task.k"/>
Namespaces
Namespaces provide a way of distinguishing between elements with conflicting names in different libraries. Suppose a library “a” defines an element named foo, and a library “b” also defines an element named foo. Also, suppose that both libraries are included in a certain file. Namespaces make it possible to access both instances of the foo definition, without ambiguity, by prefixing the name with the namespace prefix in which the element was defined: a:foo and b:foo. Any reference to foo without a prefix will result in an error. Nonetheless, if only one of the libraries is used, the use of foo without a prefix will be allowed. Namespaces are defined using namespace.
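The lookup rule for prefixed and unprefixed names can be summarized in a few lines. The Python sketch below models only the rule stated above (a prefixed name resolves in its own namespace, an unprefixed name resolves only if exactly one imported library defines it). The resolve function and the libraries mapping are hypothetical and do not describe Karajan's internal tables.

```python
def resolve(name, libraries):
    """Resolve an element name against the imported libraries.

    libraries maps a namespace prefix to the set of element names it defines.
    Returns the (prefix, local_name) pair that the reference denotes.
    """
    if ":" in name:
        prefix, local = name.split(":", 1)
        if local in libraries.get(prefix, ()):
            return (prefix, local)
        raise LookupError(name + " is not defined")
    owners = sorted(p for p, names in libraries.items() if name in names)
    if len(owners) == 1:
        return (owners[0], name)   # unambiguous, so no prefix is needed
    if not owners:
        raise LookupError(name + " is not defined")
    raise LookupError(name + " is ambiguous between " + ", ".join(owners))


both = {"a": {"foo"}, "b": {"foo", "bar"}}
only_a = {"a": {"foo"}}

prefixed = resolve("a:foo", both)
unprefixed = resolve("foo", only_a)
try:
    resolve("foo", both)
    ambiguous = False
except LookupError:
    ambiguous = True
```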
Notes
1. The current implementation will first translate the native syntax to XML, then parse the resulting XML file.
Library Reference
This section lists the current libraries available in Karajan.
Notation Conventions
The following conventions are used:
Normal arguments are listed in italics: argument
Optional arguments are listed in italics and with an asterisk in front: *optional
The default channel is symbolized by an ellipsis: ...
For channels the bold-italics font variant is used: channel
Libraries
- Kernel Library
- System Library
- Task Library (Grid elements)
- Java Library (Access to Java classes)
- HTML Library (Generate HTML source)
- Forms Library (Generate GUI forms)
- Restart Log Library (Condor DAGMan style fault tolerance)
Kernel Library
The Karajan kernel contains a minimum set of elements that are required in order to get the rest of the system running. All kernel elements are automatically available in any program.
Kernel Constants
kernel:true
true
Represents the true boolean value
kernel:false
false
Represents the false boolean value
kernel:cmdline:arguments
cmdline:arguments
Holds a list with the command line arguments (if any)
kernel:user.home
user.home
Contains the path to the user home directory
kernel:user.name
user.name
Contains the current user's name
Kernel Elements
kernel:project
kernel:project(stdout)
karajan
The root element of a Karajan program. Accepts arguments on the stdout channel and prints them immediately on the console.
kernel:import
kernel:import(file, ...)
include
Executes the file specified by the file argument. All arguments received on the default channel are considered to be element definitions (created with export), which import binds to the parent environment, such that after import completes execution, the definitions will be available for use.
foo(
import("file.k")
) //scope of foo() ends
Import uses the library search path specified in etc/karajan.properties, which defaults to searching the current directory first, then the Java class path. If no file with the given name is found in the library search path, import will fail.
kernel:export
kernel:export(name, value)
Returns the pair (name, value). The value argument should be a lambda, which can be bound by import.
export(foo,element([x],
It is also possible for export to be used without arguments, but with a set of immediately enclosed definitions. In this case it would take all the definitions and export them. This makes it easier to have old code somewhat cleanly converted:
export(element(x, [],element(y, [],
kernel:define
kernel:define(name, value)
Binds a lambda, specified by the value argument to the given name.
kernel:namespace
kernel:namespace(prefix)
Allows the specification of a namespace prefix. Any elements defined in the scope of namespace will automatically have the prefix indicated by the prefix argument, unless another namespace is nested.
kernel:elementdef
kernel:elementdef(type, classname)
Used by the current implementation to map element names to Java implementation classes.
kernel:named
kernel:named(value, *name)
Used internally by the named argument form, but can be used by the user equally well. The following are equivalent:
false())false())
However, the fact that the *name argument is optional makes it impossible to completely avoid the named form.
If the *name argument is not present, it simply returns the value of the value argument on the default channel.
kernel:number
kernel:number(value)
Can be used with the XML syntax to represent a number. There is no distinction between integral numbers and floating point numbers in Karajan.
kernel:string
kernel:string(value)
Can be used with the XML syntax to represent a string value.
kernel:variable
kernel:variable(name)
Used to represent the value of a variable.
Example:
<set name="n" value="5"/> <list> <number>10</number> <string>10</string> <variable>n</variable> </list>
will return the list [10, "10", 5].
kernel:quotedlist
kernel:quotedlist(...)
Used internally to represent a quoted list. The following two lines of code will produce the same result:
[a, b, c]
quotedlist(a, b, c)
kernel:cache
kernel:cache()
Caches the evaluation of the arguments. Upon subsequent executions of cache, the arguments will not be re-evaluated. Instead, the cached values will be returned. Cache does not make any checks for the invariance of the evaluation of the arguments.
System Library
Files: sys.k, sys.xml
The system library contains general purpose elements that help implement the most common tasks in Karajan.
System Constants
N/A
System Elements
Flow Control Elements
sys:sequential
sys:sequential()
Executes all arguments in sequence.
sys:parallel
sys:parallel()
Executes all arguments in parallel.
sys:unsynchronized
sys:unsynchronized()
Asynchronously executes all arguments in sequence. Does not return any value. If return values from asynchronous computations are required, use either future or futureIterator.
sys:choice
sys:choice()
Executes arguments in succession. Terminates when one of the arguments completes successfully. If an argument fails, the next argument will have access to the following variables:
- element: Contains a reference to the element that caused the initial failure.
- error: A textual message detailing the error that occurred.
- trace: A textual representation of the Karajan stack trace.
- exception: Available if a Java exception caused the failure.
If no argument completes successfully, choice will fail with the last failure encountered.
Choice exhibits a transactional behavior when it comes to return values. All single values and all channels are buffered until one argument completes successfully. If that happens then choice returns the buffered values. If an argument fails, all the buffered values produced by that argument are discarded.
See also: catch
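The transactional treatment of return values can be sketched as a buffer that is kept only when an alternative succeeds. The Python model below is an illustration of the behavior described above, not Karajan code. The choice function and the emit callback are hypothetical, and the error, trace, and exception variables are not modeled.

```python
def choice(*alternatives):
    """Try each alternative in turn and return the values of the first success.

    Each alternative is a callable taking an emit(value) function. Values
    emitted by an alternative that later fails are discarded.
    """
    last_error = None
    for alternative in alternatives:
        buffered = []
        try:
            alternative(buffered.append)
        except Exception as exc:
            last_error = exc     # drop this alternative's buffered values
            continue
        return buffered          # first success: release the buffer
    if last_error is None:
        raise ValueError("choice needs at least one alternative")
    raise last_error             # fail with the last failure encountered


def flaky(emit):
    emit("partial result")
    raise IOError("File not found")


def fallback(emit):
    emit("recovered")


result = choice(flaky, fallback)
```

The value emitted by the failing alternative never reaches the caller, so a consumer of choice only sees output from the alternative that completed.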
sys:catch
sys:catch(match)
Catch will match the error variable against the regular expression in match. If successful, it will execute the rest of its arguments, otherwise it will fail with the last failure encountered. Catch can be used with choice to selectively handle specific failures:
choice( ... catch(".*File not found.*"
sys:guard
sys:guard()
Guard expects two sub-elements. It will execute the first, then the second, even if the first one fails. In other words, the second sub-element will always be executed. If the first element failed, after executing the second element, guard will also fail. If the second element fails, guard will fail with the same error, regardless of whether the first element failed or not. Guard can be used to implement clean-up actions, in a fashion similar to try/finally in Java or Python:
set(myhost, "sunny.mcs.anl.gov")
transfer(srcfile="a", desthost=myhost)
guard(
  sequential(
    execute(executable="/usr/bin/hammer", arguments="a", host=myhost, provider="gt2")
  )
  sequential(
    //always clean up
    file:remove(name="a", host=myhost, provider="gridftp")
  )
)
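For comparison, the same control flow expressed with try/finally looks as follows in Python. This is an analogy for the behavior described above, not a translation of the Karajan example, and the work and cleanup functions are hypothetical stand-ins for the two sub-elements.

```python
def guard(first, second):
    """Run first, then always run second, propagating any failure."""
    try:
        first()
    finally:
        second()   # runs whether or not first() raised


log = []


def work():
    log.append("work")
    raise RuntimeError("work failed")


def cleanup():
    log.append("cleanup")


try:
    guard(work, cleanup)
    outcome = "completed"
except RuntimeError as exc:
    outcome = str(exc)
```

As with guard, a failure in the clean-up step itself propagates to the caller even if the first step also failed.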
sys:race
sys:race()
parallelChoice
Can be used to race a number of arguments. Race will execute all its arguments in parallel, buffer their return values, and wait for the first one that completes. It will then return all the values that the winner generated. The remaining threads are quietly aborted. If any argument fails before any other argument completes, then race will fail. That is, failures occurring after the winner completes are silently ignored.
Race is similar in behavior to the discriminator workflow pattern.
sys:for
sys:for(name, in)
Can be used to iterate sequentially across a range of values. The name argument is an identifier that indicates the name of the variable that will be set to the successive values of the in argument, which Karajan will try to convert to an iterator before beginning the iteration process.
After evaluating name and in, for will proceed and evaluate the rest of the arguments repeatedly, while setting the variable indicated by the name argument to each value produced by the iterator (in).
Example:
equals(
  list(
    for(i, range(1, 5), i)
  )
  list(1, 2, 3, 4, 5)
)
will return true.
sys:parallelFor
sys:parallelFor(name, in)
Will behave in a similar way to for with the exception that iterations will occur in parallel. Each iteration will occur in a separate scope. Therefore variables set in one of the iterations will not be visible in the others (which is also the case with for). Each scope will have the variable indicated by the name argument set to one of the values obtained from the in argument.
sys:while
sys:while(condition)
Will repeatedly execute its arguments in sequence until a value of false is received on the condition channel. A convenience element that returns a boolean argument on the condition channel is condition (or ?). While will check for a condition every time an argument completes. It is therefore possible to exit the loop after the termination of any of the arguments.
Example:
list(
  while(
    1, 2, 3, ?(false)
  )
)
list(
  while(
    1, ?(false), 2, 3
  )
)
list(
  while(
    ?(false), 1, 2, 3
  )
)
list(
  while(
    sequential(
      ?(false)
      0 /* zero will make it to the list
         * because while will only do the check
         * after sequential() completes */
    )
    1, 2, 3
  )
)
will return the following lists:
[1, 2, 3]
[1]
[]
[0]
Or, in XML:
<list> <while> <number>1</number> <number>2</number> <number>3</number> <condition> <false/> </condition> </while> </list>
<list> <while> <number>1</number> <condition> <false/> </condition> <number>2</number> <number>3</number> </while> </list>
<list> <while> <condition> <false/> </condition> <number>1</number> <number>2</number> <number>3</number> </while> </list>
<list> <while> <sequential> <condition> <false/> </condition> <number>0</number> </sequential> <number>1</number> <number>2</number> <number>3</number> </while> </list>
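The point at which while checks the condition channel is easier to see in a small model. In the Python sketch below every argument is a callable that returns (channel, value) pairs, and the loop inspects the condition channel only after an argument has completed. This is a simplified model of the behavior described above, not Karajan's implementation. All names are hypothetical, and max_iterations is a safety limit added for the sketch.

```python
def value(v):
    """An argument that returns v on the default channel."""
    return lambda: [("default", v)]


def condition(flag):
    """An argument that returns flag on the condition channel."""
    return lambda: [("condition", flag)]


def sequential(*arguments):
    """An argument that runs its children in order and returns all their output."""
    def run():
        produced = []
        for argument in arguments:
            produced.extend(argument())
        return produced
    return run


def while_loop(*arguments, max_iterations=1000):
    returned = []
    for _ in range(max_iterations):
        for argument in arguments:
            keep_going = True
            for channel, item in argument():
                if channel == "condition":
                    keep_going = keep_going and item
                else:
                    returned.append(item)
            if not keep_going:   # checked only after the argument completes
                return returned
    raise RuntimeError("no false condition received")


first = while_loop(value(1), value(2), value(3), condition(False))
second = while_loop(value(1), condition(False), value(2), value(3))
third = while_loop(condition(False), value(1), value(2), value(3))
fourth = while_loop(sequential(condition(False), value(0)),
                    value(1), value(2), value(3))
```

The four calls correspond to the four examples above and yield the same four lists, including the 0 that escapes from sequential before the check takes place.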
sys:condition
sys:condition(value)
?
Evaluates the value argument and returns its value on the condition channel.
sys:break
sys:break()
Can be used to break out of a while loop. By contrast with using the condition channel, break will immediately exit the loop, no matter how deep the nesting level. It does that by generating a failure, which is intercepted by the enclosing while.
sys:continue
sys:continue()
Can be used to skip the evaluation of the remaining arguments in an iteration in a while loop and jump to the next iteration. Similar to break, continue achieves its purpose by generating a failure which is intercepted by while.
sys:if
sys:if()
Executes its elements using the following scheme:
- start with k = 0
- evaluate element 2 * k
- if element 2 * k is the last element, then return its return values and complete
- if no value is returned by element 2 * k, fail
- if the value returned by element 2 * k is true, then evaluate element 2 * k + 1, return its return values, and complete
- if the value returned by element 2 * k is false, then continue with k = k + 1
Informally:
if( <condition> <then> [<condition2> <then2> [<condition3> <then3> ...]] [<else>] )
<condition>, <then>, and <else> can be any elements. However, each <condition>, <then>, and <else> must be only one element (possibly with multiple child elements). For convenience and clarity then and else can be used.
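The evaluation scheme can be followed step by step in a short model. In the Python sketch below each element is a callable returning a list of values, and the function walks the condition/then pairs exactly as the scheme prescribes. It is an illustration, not Karajan's implementation: karajan_if is a hypothetical name, any value other than true is treated as false, and returning nothing when no branch is taken is an assumption the scheme does not cover.

```python
def karajan_if(*elements):
    """Evaluate condition/then pairs, with an optional trailing else element."""
    k = 0
    while 2 * k < len(elements):
        values = elements[2 * k]()
        if 2 * k == len(elements) - 1:
            return values                  # last element: its values are returned
        if not values:
            raise RuntimeError("condition returned no value")
        if values[0] is True:
            return elements[2 * k + 1]()   # matching branch
        k += 1                             # condition was false: next pair
    return []                              # assumption: no branch taken


second_branch = karajan_if(
    lambda: [False], lambda: ["a"],
    lambda: [True], lambda: ["b"],
    lambda: ["c"],
)
else_branch = karajan_if(lambda: [False], lambda: ["a"], lambda: ["else"])
```

Note that an element in the last position is never tested as a condition, which is what lets a trailing element act as the else branch.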
sys:then
sys:then()
else
Then and else are the same as sequential, but can be used to make constructs using if more intuitive:
if( a==1 then(==2 then(
<if> <equals> <number>1</number> <variable>a</variable> </equals> <then> <number>2</number> <variable>a</variable> </equals> <then> <
sys:exclusive
sys:exclusive()
Defines a mutual exclusion block. It guarantees that at any given time, within a given execution, only one instance of this (lexically) exclusive element is executing.
Elements Dealing with Variables and Arguments
sys:set
sys:set(names, ...)
Sets one or more variables to one or more values. Set tries to interpret the first argument as an identifier or a list of identifiers. If the first argument is an identifier, it is treated as being a list with one identifier. Set expects the number of arguments on the default channel to be the same as the number of identifiers. It is important that a quoted list be used to specify the list of identifiers in order to avoid evaluating the variables that the identifiers represent:
set(a, 1)
set([a], 1)
set([a, b, c], 1, 2, 3)
Differences in XML:
The XML variant of the set element uses a slightly different set of arguments. If a single variable is assigned, the name argument can be used. If multiple variables are assigned, the names argument must be used. As opposed to the native syntax, the names argument can be a string of comma separated identifiers which will be tokenized by set:
<set name="a" value="1"/>
<set names="a, b, c"> <number>1</number> <number>2</number> <number>3</number> </set>
sys:default
sys:default(name, value)
Assigns a value to a variable if no binding of that variable can be accessed within the current scope. If an accessible variable with the indicated name is already defined, then the assignment does not take place:
default(a, 1)
//a is assigned the value 1
set(b, 2)
default(b, 3)
//b is not assigned the value of 3 because
//b is already accessible within the current scope
sys:maybe
sys:maybe()
Evaluates its arguments. If the evaluation completes successfully, it returns all the arguments. If the evaluation fails at any point, maybe completes without returning anything. In particular, maybe can be used in extending existing elements with optional arguments:
element(one, [a, b, optional(c, d)]
element(two, [a, b, optional(c, d)]
  one(a = a, b = b, maybe(c = c), maybe(d = d))
)
sys:global
sys:global(name, value)
Sets a global variable. A global variable is a variable that has a global scope, and thus can be accessed from anywhere within the program. While the use of global variables is discouraged, it can prove useful to create constant-like definitions.
Differences in XML:
The XML variant of global uses the same arguments as the XML variant of the set element.
sys:...
sys:...()
vargs
Deprecated. Use each instead.
Can be used inside an element definition to return all arguments received on the default channel.
Differences in XML:
In the XML syntax the vargs alias must be used, because “...” is not a valid XML element name.
channel:to
channel:to(name, ...)
Returns all arguments received on the default channel on the specified channel.
channel:from
channel:from(name, <varies>)
Returns all arguments received on the specified channel on the default channel.
channel:close
channel:close(name)
Closes a channel. All iterations that are active on that channel and waiting for values will complete.
channel:fork
channel:fork(name, count)
Splits a channel into a number of identical channels and returns these channels. All values written to the initial channel will be available in all the forked channels. Reading from the original channel will cause inconsistencies and should not be done. When the original channel is closed, all the forked channels will also be closed.
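Python's itertools.tee behaves in a comparable way and may help build intuition for forked channels. This is an analogy only, not a description of how Karajan implements channels; the variable names are illustrative.

```python
from itertools import tee

source = iter([1, 2, 3])      # stands in for the original channel
left, right = tee(source, 2)  # comparable to channel:fork(name, 2)

# Every value written to the source is visible in both copies.
left_values = list(left)
right_values = list(right)
```

As with fork, reading directly from source after the split would make the copies miss values, which is why only the forked channels should be read.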
sys:isDefined
sys:isDefined(name)
Determines whether a variable is accessible within the current scope. Returns true if it is; false otherwise.
sys:quoted
sys:quoted(name)
Returns an identifier without evaluating the variable it might point to.
sys:discard
sys:discard(...)
Sometimes only the side-effect of an element is needed, while ignoring the return values of the element. Discard will evaluate its arguments, but avoid returning anything on the default channel.
sys:future
sys:future(...)
Evaluates the arguments asynchronously. Returns a future representing the first return value generated by the arguments. All other arguments received on the default channel are ignored.
sys:futureIterator
sys:futureIterator(...)
Evaluates its arguments asynchronously. Returns a future iterator representing all values received on the default channel. The particular aspect of a future iterator is that it can only be iterated over once. Every time a value is used from a future iterator, that value is removed. If access to the iterator values is needed more than once, a list can safely be created with the values. However, the process of creating the list will force synchronization with the thread that produced the values since the iterator is only closed when that thread completes.
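The single-use behaviour resembles that of a Python generator. The sketch below is an analogy under that assumption; produce is a hypothetical stand-in for the asynchronously evaluated arguments and no threads are involved.

```python
def produce():
    # Stands in for the asynchronously evaluated arguments.
    yield from (10, 20, 30)


values = produce()
first_pass = list(values)   # consumes every value
second_pass = list(values)  # nothing is left to iterate over

# Materialising the values once into a list allows repeated access.
cached = list(produce())
total = sum(cached) + max(cached)
```

Building the list has to wait for the producer to finish, which corresponds to the synchronization described above.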
sys:each
sys:each(items)
Returns all elements in items as separate values. It is roughly equivalent to the following:
for(i, items, i)
Element Definition Elements
sys:element
sys:element(name, arguments)
Allows the definition of an element. Evaluates the name and arguments arguments. It expects the name argument to be an identifier, and arguments to be a list of identifiers. The scope of the definition is the same as the scope of a variable that could be defined instead of the element. The arguments are a list of mandatory arguments. Optional arguments can also be specified using the optional element. If the element accepts arguments on the default channel, the ... identifier can be used in the argument list. Other channels can be specified using the channel element. The rest of the arguments are not evaluated when the definition takes place, but will be evaluated whenever the element is invoked.
The following example defines an element foo, which takes no arguments, and prints ’foo’ on the console:
element(foo, []
print("foo")
)
foo()
In the following example, foo takes two arguments and prints them both on the screen:
element(foo, [one, two]
Arguments on the default channel can be accessed using the ... identifier:
element(foo, [one, ...]for(i, ...
Other channels can be used in a similar way:
element(foo, [one, ...,channel(channelOne)]for(i, ...,for(i, channelOne,to(channelOne, 5, 6, 7, 8))
Optional arguments can be assigned a default value using default:
element(foo, [one,optional(two)]default(two, 2)
Element can also be used to define anonymous elements. If the first argument evaluates to a list of identifiers instead of an identifier, element considers that an anonymous element was instead desired, defines the element, and returns the definition, which can later be used through executeElement:
set(foo, element([]executeElement(foo)
Each definition of an element keeps a reference to the environment that was used at the time of the definition. When evaluated, elements in the body of the definition will be resolved by first searching in the local scope (possibly for elements defined by the execution of the body of this element) and, if not found, in the environment that was used at the time of the definition. In the following example, the string "a" is printed:
element(foo, [] element(a, [],set(b, foo()) element(a, [],executeElement(b)
This behaviour is particularly important when import and export are used.
Differences in XML:
The arguments list is a string of comma separated identifiers.
The XML version of element uses a different set of arguments. If an element accepts arguments on the default channel, the vargs="true" attribute must be used. The arguments received on the default channel will then be available in the body of the definition through the vargs identifier.
Optional arguments are indicated using the optargs attribute. The value must be a comma separated list of identifiers.
In a similar way, the channels are specified using a comma separated list of identifiers and the channels attribute.
<element name = "foo" arguments = "one" vargs = "true" channels = "channelOne"> <forname = "i" in = "{vargs}"> <forname = "i" in = "{channelOne}"> <number>1</number> <number>2</number> <number>3</number> <number>4</number> <toname="channelOne"> <number>5</number> <number>6</number> <number>7</number> <number>8</number> </to> </foo> <element name ="foo" arguments = "one" optargs = "two"> <defaultname = "two" value = "2"/> <
sys:parallelElement
sys:parallelElement(name, arguments)
Like element, parallelElement also defines an element. However, elements defined using parallelElement will evaluate their arguments in parallel with their bodies. All single value arguments will automatically be futures, and all channels will automatically be future iterators. ParallelElement can be used to define elements that process their arguments asynchronously.
parallelElement(consumer, [...]
for(i, ..., print("Received {i}"))
)
element(producer, []
for(i, range(0, 100)
i
print("Sent {i}")
wait(delay = 100)
)
)
consumer(producer())
Differences in XML:
The differences for the XML syntax from element also apply to parallelElement.
<parallelElement name = "consumer" vargs = "true"> <forname = "i" in = "{vargs}"> <elementname = "producer"> <forname = "i"> <rangefrom = "0" to = "100"/> <variable>i</variable> <waitdelay = "100"/> </for> </element> <consumer> <producer/> </consumer>
sys:channel
sys:channel(...)
Used to specify channel arguments for an element. Quotes all arguments, such that identifiers are not evaluated. Returns values that can be interpreted by element and parallelElement as representing channels.
sys:optional
sys:optional(...)
Allows the specification of optional arguments for element definitions. Quotes all arguments and returns values that can be interpreted by element and parallelElement as representing optional arguments.
sys:self
sys:self()
Self can be used to build recursive anonymous elements:
set(f
  element([x]
    if(x == 0 1 x*self(x - 1))
  )
)
executeElement(f, 6)
Service Interaction Elements
sys:remote
sys:remote(host)
Uses the Karajan:Service that runs on the specified host to evaluate the rest of the arguments. The element supports forwarding of the variable environment to the remote service [1], and returning values from the service. Therefore the following code will work as expected:
set(a, 1)
set(b
  remote("https://somehost:1984", a+1)
)
A consequence is that if print is used remotely, the actual output is going to be automatically and transparently forwarded (since it is merely a return value on a particular channel) to the execution instance that initiated the remote execution.
It is also possible to use this feature recursively, such that a remote code snippet in turn delegates part of the execution to other services.
Please note that the code inside remote might reference entities that are sensitive to the actual location of the execution. In particular, the meaning of “localhost” if used with execute or transfer will not refer to the machine where the execution originated.
Element resolution is performed separately when a script is submitted to a service. In other words, import-ed files on the client side will be re-imported on the service side, if and only if they are system libraries. If imports refer to user-defined libraries not part of the system libraries, the code defining the libraries will be forwarded to the service, thus enabling the use of client-side user-defined elements on the remote side. Similarly, in-line user defined elements are also available for use on the remote side:
element(foo, [],
List Manipulation Elements
list:list
list:list(*items, ...)
Constructs a list from values received on the default channel.
Alternatively, the *items argument can be used to specify a string containing a comma-separated list of items. This may be particularly convenient with the XML syntax.
list:append
list:append(list, *items, ...)
Appends all values received on the default channel to the list indicated by the list argument. Does not return anything.
Alternatively, the *items argument could be used with a string of comma separated items.
list:prepend
list:prepend(list, ...)
Works like append, except that values are added to the beginning of the list. The order of the values in the list will be the reverse of the order in which they are received by prepend:
set(l, list(4, 5, 6))
prepend(l, 1, 2, 3)
print(l)
will print [3, 2, 1, 4, 5, 6]
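The reversed order follows naturally if each value is inserted at the front as it arrives. The Python sketch below models that behaviour; the prepend function here is a hypothetical stand-in, written under the assumption that values are processed one at a time.

```python
def prepend(target, *values):
    # Each value is inserted at the front as it arrives, so later values
    # end up ahead of earlier ones.
    for value in values:
        target.insert(0, value)


l = [4, 5, 6]
prepend(l, 1, 2, 3)
```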
list:join
list:join(...)
Concatenates all lists received on the default channel and returns the resulting list.
list:size
list:size(list)
Returns the size of the list indicated by the list argument.
list:first
list:first(list)
Returns the first element in a list.
list:last
list:last(list)
Returns the last element in a list.
list:get
list:get(list, index)
Returns the index-th element in the list. The first element has an index of 1.
list:butFirst
list:butFirst(list)
Returns a list composed of all but the first element in the specified list.
list:butLast
list:butLast(list)
Returns a list containing all elements but the last from the specified list.
list:isEmpty
list:isEmpty(list)
Tests whether a list is empty. Returns true if the list is empty, and false otherwise.
Map Elements
map:map
map:map(...)
Returns a map with the entries received on the default channel. See entry.
map:entry
map:entry(key, value)
Allows the specification of an entry that can be used with map to construct a map.
map:put
map:put(map, ...)
Adds the entries received on the default channel to the specified map. Existing entries with the same key are replaced. Does not return any value.
map:delete
map:delete(map, key)
Deletes the entry with the specified key from a map. Does not return a value.
map:get
map:get(map, key)
Returns the value corresponding to the specified key from a map.
map:size
map:size(map)
Returns the size of the map (the number of entries in the map).
map:contains
map:contains(map, key)
Tests whether the map contains an entry with the specified key.
Logic Elements
Logic elements do not currently use short-circuit evaluation. They always evaluate all their arguments.
sys:and
sys:and(...)
&
Returns the boolean and value of the arguments received on the default channel.
sys:or
sys:or(...)
|
Returns the boolean or value of the arguments received on the default channel.
sys:not
sys:not(value)
Returns the boolean negation of the value in the value argument.
sys:equals
sys:equals(value1, value2)
==
Tests for the equality of two values. Makes a deep comparison of the arguments.
sys:true
sys:true()
Returns true.
sys:false
sys:false()
Returns false.
Numeric Elements
math:sum
math:sum(...)
+
Returns the sum of all the values received on the default channel.
math:product
math:product(...)
*
Returns the product of all the values received on the default channel.
math:subtraction
math:subtraction(from, value)
-
Returns the difference between the value specified by the from argument and the value indicated by the value argument.
math:quotient
math:quotient(divisor, dividend)
/
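Divides divisor by dividend and returns the resulting value; that is, the first argument is divided by the second. Note that these argument names are the reverse of the usual arithmetic convention, in which the dividend is divided by the divisor.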
math:remainder
math:remainder(divisor, dividend)
%
Returns the remainder of the division of divisor by dividend.
math:square
math:square(value)
Returns the square of a number.
math:sqrt
math:sqrt(value)
Returns the square root of a number.
math:equalsNumeric
math:equalsNumeric(value1, value2)
Makes a numeric comparison of the values specified by the two arguments. A numeric comparison will try to convert string values to numbers before the comparison is performed. Like equals, equalsNumeric performs a deep comparison.
equalsNumeric(1, "1") //true
equalsNumeric("2", "2.0") //true
equals("2", 2) //false
equalsNumeric([1, 2, "3"], ["1", "2", 3]) //true
<equalsNumeric value1 = "1">
  <number>1</number>
</equalsNumeric>
<equalsNumeric value1 = "2" value2 = "2.0"/>
<equals value1 = "2">
  <number>2</number>
</equals>
<equalsNumeric>
  <list items = "1, 2, 3"/>
  <list>
    <number>1</number>
    <number>2</number>
    <string>3</string>
  </list>
</equalsNumeric>
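The combination of numeric conversion and deep comparison can be sketched in Python as follows. This is a model under stated assumptions, not Karajan's implementation: to_number and equals_numeric are hypothetical names, and the rule that any string parsable as a number is converted is an assumption.

```python
def to_number(value):
    # Assumed conversion rule: strings that parse as numbers are converted.
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


def equals_numeric(a, b):
    # Lists are compared element by element (the "deep" part).
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            equals_numeric(x, y) for x, y in zip(a, b)
        )
    return to_number(a) == to_number(b)


same_scalar = equals_numeric("2", "2.0")
same_list = equals_numeric([1, 2, "3"], ["1", "2", 3])
plain_equals = "2" == 2  # a plain comparison does no conversion
```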
math:greaterThan
math:greaterThan(value1, value2)
>
Returns true if value1 is strictly larger than value2.
math:lessThan
math:lessThan(value1, value2)
<
Returns true if value1 is strictly less than value2.
math:greaterOrEqual
math:greaterOrEqual(value1, value2)
>=
Returns true if value1 is larger than or equal to value2.
math:lessOrEqual
math:lessOrEqual(value1, value2)
<=
Returns true if value1 is less than or equal to value2.
math:min
math:min(...)
Returns the minimum of all numeric values on the default channel.
math:max
math:max(...)
Returns the maximum of all numeric values on the default channel.
math:int
math:int(value)
Returns the integer part of the argument, where “integer part” is to be understood in the mathematical sense (also equivalent to the floor function).
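The distinction between the floor function and truncation only matters for negative numbers. The following Python lines illustrate the difference in general terms; they do not call Karajan.

```python
import math

floor_value = math.floor(-2.5)  # mathematical integer part (floor)
truncated = int(-2.5)           # truncation toward zero, shown for contrast
positive = math.floor(2.5)      # for positive numbers both approaches agree
```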
math:ln
math:ln(value)
Returns the natural logarithm of the value argument.
math:exp
math:exp(value)
Returns e raised to the power of value.
math:random
math:random()
Returns a pseudo-random number in the interval [0,1) with a uniform distribution [2].
Error Handling Elements
sys:ignoreErrors
sys:ignoreErrors(*match)
Executes its arguments returning any values as they are received. If any of the arguments fails, the failure will be matched against the regular expression in *match if present (otherwise it will be treated as .*). If the match is successful, then the error will be ignored and the next argument will be executed. If the match fails, the error will be propagated.
sys:restartOnError
sys:restartOnError(match, times)
Evaluates its arguments in sequence. If a failure matching the regular expression in match occurs, all arguments will be re-evaluated, up to the number of times indicated by the times argument. Match can either be a string or a list of strings.
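A retry loop of this kind can be sketched in Python. The helper below is hypothetical and rests on two assumptions that the text above does not settle: the pattern is searched for anywhere in the error message, and times counts re-evaluations rather than total attempts.

```python
import re


def restart_on_error(match, times, body):
    """Hypothetical helper: run body, re-running it on matching failures."""
    patterns = [match] if isinstance(match, str) else list(match)
    restarts = 0
    while True:
        try:
            return body()
        except Exception as error:
            matched = any(re.search(p, str(error)) for p in patterns)
            if not matched or restarts >= times:
                raise  # non-matching error, or restart budget exhausted
            restarts += 1


attempts = []


def flaky():
    # Fails twice with a matching message, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("connection timeout")
    return "done"


result = restart_on_error("timeout", 5, flaky)
```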
sys:generateError
sys:generateError(error, *exception)
Allows the generation of an error. The message of the error is taken from the error argument. The *exception argument is used in the current implementation to pass a Java exception to be attached to the error.
sys:onError
sys:onError(match)
OnError allows the definition of a custom error handler. Handlers defined with onError are valid for anything executed within the scope of the parent of onError. Multiple handlers can be defined within the same scope.
Whenever an error occurs within the scope of an error handler, the error will be matched against error handlers starting with the inner-most handlers and ending with the outer-most handlers. If a handler matches, it is invoked. When a handler is invoked it will evaluate all its arguments except for match in sequence. The execution takes place in the context of the failing element. The following variables are defined automatically to be used by the body of the error handler:
- element
- The element that caused the error. If the error is corrected by the handler, the execution of the element can be restarted using executeElement.
- error
- The message of the error that occurred.
- trace
- A textual representation of the Karajan stack trace.
- exception
- In the current implementation exception can either contain a Java exception or the message “No exception available”.
Error handlers are not re-entrant. If an unhandled error occurs within the body of the handler, the handler will fail immediately.
String Elements
str:concat
str:concat(...)
Concatenates all arguments received on ... and returns the resulting string value.
str:split
str:split(string, separator)
Returns a list obtained by splitting string into tokens separated by separator. The separator will not be part of the tokens.
str:strip
str:strip(value)
Strips all leading and trailing whitespace characters from value.
str:matches
str:matches(string, regexp)
Returns true if string matches the regular expression specified by regexp, and false otherwise.
str:nl
str:nl()
Returns the new-line separator.
str:chr
str:chr(code)
Returns the character whose code is represented by code.
str:substring
str:substring(string, from, *to)
Returns the substring of string that starts at from (inclusive) and ends at *to (exclusive). If *to is missing, the entire remainder of string is returned.
str:quote
str:quote(string)
Returns the specified string surrounded by double quotes.
Example:
print("abcd")
print(quote("abcd"))
Will print:
abcd
"abcd"
Miscellaneous Elements
sys:print
sys:print(message, *nl)
Returns the value in the message argument on the stdout channel. If the *nl argument is not present or set to true, it appends a new line character to the message argument before returning it. The project (root element) automatically prints all arguments received on the stdout channel on the console.
sys:echo
sys:echo(message, *nl, *stream)
Immediately prints the value in the message argument to the console. If the *nl argument is not present or set to true, it also prints a new line character. If the *stream argument is present, echo instead tries to print the message on the specified output stream.
The difference between print and echo is that print does not rely on a side-effect to print values. Therefore the evaluation of print cannot be distinguished from returning the value generated by print.
It is recommended that print be used instead of echo if possible.
sys:checkpoint
sys:checkpoint(*file, *automatic, *interval, *timestamped, *now)
Allows the configuration of checkpointing. If the *now argument is present and set to true, creates a checkpoint of the state at the time of the evaluation of checkpoint, and writes it to the file indicated by the *file argument.
The *automatic argument, if set to true, indicates that automatic checkpoints should be created at the interval specified by the *interval argument (in seconds). If the *timestamped argument is also present and set to true, the file names in which the checkpoint is saved will have a date and time appended in the YYYYMMDDhhmm format.
sys:wait
sys:wait(*delay, *until)
When evaluated, waits for the number of milliseconds specified by the *delay argument, or until the date in the *until argument, before completing.
sys:time
sys:time()
Evaluates all arguments and returns the total time elapsed, in milliseconds.
set(t
  time(
    execute(executable="/bin/wait", arguments="10", ...)
  )
)
sys:file:execute
sys:file:execute(file)
executeFile
Parses and executes a file. The difference between import and file:execute is that file:execute always parses and executes the file when evaluated, unlike import which only parses the file once. File:execute can therefore be used to execute files which change over time, or to execute different files based on a certain context.
sys:executeElement
sys:executeElement(element, args, ...)
Executes an element optionally passing the single value arguments indicated by the args argument, and the ... on the default channel. Args must be a map in which the keys are argument names and the values are argument values. It is also possible to pass named arguments directly using the named form. However, this does not allow passing of arguments named element or args.
If args is not present, ... will be mapped to named arguments according to the rules in Argument Mapping.
sys:elementList
sys:elementList()
Returns a list containing the arguments to elementList but in non-evaluated form. The elements in the resulting list can then be evaluated using executeElement.
sys:cacheOn
sys:cacheOn(value)
Caches all return values of the rest of its arguments based on the value of the value argument. Subsequent evaluations of this element in which the value argument will have the same value will not re-evaluate the rest of the arguments, but return the cached values instead. The cache is bound to this static instance of the cacheOn element. In other words, if another cacheOn element exists, it will not use the values cached by this element, irrespective of the value of the value argument.
Caching is not guaranteed. It is a mechanism that could help improve performance, but it should not be relied on to guarantee that certain elements are only evaluated once (see once). Also, elements that rely on side-effects to perform their function will not be able to perform those functions if their cached value is used. Echo will, for example, not do anything if cached. However, print will, because it does not rely on a side-effect to print values to the console.
sys:once
sys:once(value)
Guarantees that all its arguments besides value are only executed once for a given value of the value argument. Subsequent invocations of once will return all values generated by the first execution.
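The evaluate-once behaviour can be modelled as a lookup table keyed on value. The Python sketch below is an illustration only: once, expensive, and the module-level table are hypothetical, and concurrency is not modelled.

```python
_results = {}


def once(value, body):
    # Run body only the first time a given value is seen; replay afterwards.
    if value not in _results:
        _results[value] = body()
    return _results[value]


calls = []


def expensive():
    calls.append("ran")
    return 42


first = once("key", expensive)
second = once("key", expensive)       # replayed, expensive is not called again
other = once("other-key", expensive)  # a new value triggers a new execution
```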
sys:numberFormat
sys:numberFormat(pattern, value)
Allows the formatting of a decimal number. The pattern argument indicates the pattern to be used for formatting (as used by the java.text.DecimalFormat class). The value argument holds the decimal value that is to be formatted.
In short, the following characters can be used in patterns:
- #
- Digit; zero not shown
- 0
- Digit; zero is shown
- .
- Decimal separator
- ,
- Grouping separator
- E
- Scientific notation separator
See also: http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html
sys:file:contains
sys:file:contains(file, value)
contains
File:contains determines whether a file contains a specific sequence of characters. The file argument points to the file to be checked, while the value argument specifies the value to be searched.
sys:uid
sys:uid(*prefix, *suffix)
Returns a string with a unique ID. The *prefix and *suffix arguments can be used to specify a prefix and a suffix respectively. In the current implementation, the uniqueness of the returned string is relative to the instance of the interpreter.
sys:file:read
sys:file:read(name)
readFile
File:read reads the contents of a file, pointed to by the name argument. This is intended for short text files that may possibly hold things like error messages or exit codes. The file is completely read into memory; therefore this element would not be suitable for manipulation of large files.
sys:file:write
sys:file:write(name, *append, ...)
File:write writes all arguments received on the default channel to the file with the given name. The file is truncated first, unless the *append argument is true. When the element terminates, either successfully or not, the file is closed.
sys:outputStream
sys:outputStream(type, *file)
Returns an output stream which can be used for writing values to. The type argument can be one of “stdout”, “stderr”, or “file”. If the type is “file”, the *file argument must be present and indicate a valid file name.
sys:closeStream
sys:closeStream(stream)
Closes a stream opened with outputStream.
sys:sort
sys:sort(*descending, ...)
Returns all arguments in sorted order. The values are sorted in ascending order, unless the *descending argument is set to true.
sys:dot
sys:dot(...)
Returns the "dot product" of all the arguments, which are expected to be some form of vectors (lists or channels). Dot returns the result asynchronously if any of its arguments is a channel that is not closed. The returned values are lists with a value extracted from each of the vectors.
sys:cross
sys:cross(...)
Returns the "cross product" of its vector arguments. Each value returned is a list with a value from each vector. Cross does not work asynchronously.
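The two products can be compared using Python's zip and itertools.product. This is a sketch under assumptions: it takes dot to pair values by position, and the ordering shown for the cross product is Python's, which Karajan may or may not share. The input lists are illustrative.

```python
from itertools import product

names = ["a", "b"]
sizes = [1, 2]

# "Dot product": one list per position, taking a value from each vector.
dot = [list(pair) for pair in zip(names, sizes)]

# "Cross product": one list per combination of values.
cross = [list(pair) for pair in product(names, sizes)]
```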
sys:stats
sys:stats(*asmap)
If *asmap is false then it returns a string summarizing information such as the amount of used and free memory, the number of CPUs, and the current number of active threads. If the *asmap argument is set to true, it returns the information as a map with the following keys:
- memused
- heapsize
- heapmax
- cpucount
- threadcount
sys:filter
sys:filter(*regexp, *invert, ...)
Filters arguments based on a regular expression, specified by *regexp. If *invert is true, matches will be inverted (values that do not match are returned).
If only one argument is received on the default channel, and that argument is a list, a list is returned with the values of the argument filtered.
If more than one argument is received on the default channel, each argument will be matched against *regexp.
sys:info
sys:info(*prefix, *name)
Prints information (such as name and arguments) about elements. If the *prefix argument is present, it will print information about all elements currently defined within that namespace prefix. If the *name argument is present, it will only print information about the element with the given name. If neither argument is present, info will print information about all currently defined elements (within the scope of info).
sys:property
sys:property(name, value)
Returns a property on the properties channel. It can be used with elements that accept arguments on the properties channel, such as scheduler.
sys:range
sys:range(from, to, *step)
Returns an iterator over a range starting at from and ending at to (inclusive) with an optional *step.
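Because the end of the range is inclusive, it differs from ranges in languages such as Python, where the end is excluded. The helper below is a hypothetical Python equivalent and only considers positive steps.

```python
def karajan_range(start, stop, step=1):
    # Python's range excludes its end, so extend it by one to include stop.
    # Only positive steps are considered in this sketch.
    return range(start, stop + 1, step)


values = list(karajan_range(1, 3))
stepped = list(karajan_range(0, 10, 5))
```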
sys:unique
sys:unique(...)
Returns arguments received on ... in order while ensuring that any given value will only be returned once.
Example:
unique(1, 2, 3, 4, 2, 5, 3, 6)
will return 1, 2, 3, 4, 5, 6
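An order-preserving filter of this kind can be written in a few lines of Python. The function below is a model of the described behaviour, not Karajan's code, and it assumes the values are hashable.

```python
def unique(*values):
    seen = set()
    result = []
    for value in values:
        if value not in seen:  # keep only the first occurrence
            seen.add(value)
            result.append(value)
    return result


filtered = unique(1, 2, 3, 4, 2, 5, 3, 6)
```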
Notes
1. At the time of this writing, futures are not properly forwarded if unbound at the time of the remote invocation, possibly causing errors or the remote execution to hang indefinitely.
2. In the current implementation, the uniform distribution relies on the properties of the Java Math.random() function.
Task Library
Files: task.k, task.xml
The task library interfaces with the Java CoG Kit abstraction classes, allowing the use of services for job submission and file operations. The tasks in this library can function in two modes: scheduled or unscheduled. When scheduled, remote tasks are not executed directly. They are rather passed to a scheduler which can handle issues such as throttling, resource allocation, and task-to-resource mapping.
Task Elements
task:scheduler
task:scheduler(type, resources, handlers, properties, taskTransformers)
Defines a scheduler to be used. A scheduler in Karajan has the role of managing resources and assigning abstract tasks (such as the ones produced by execute and transfer) to concrete resources. More details about the role of Karajan schedulers can be found in Karajan:The role of schedulers.
The type describes the particular type of scheduler that is desired. The resources that can be used by the scheduler are passed in the resources argument and can be defined using resources. Each scheduler will also require a set of task handlers, specified with the help of the handler element. Each type of scheduler may support an optional set of properties. The properties channel and the property element can be used to specify any number of properties.
The taskTransformers channel and the taskTransformer element can be used to specify task transformers. A task transformer allows changes to be done to a task after it has been bound to a physical resource by the scheduler. For example, it could be used to set the initial directory of a job to a value that depends on the host on which the job is executed.
For a list of currently available schedulers and their supported properties, see Available Schedulers.
The scope of scheduler is similar to a deeply accessible variable.
The following example shows a typical scheduler definition:
import("sys.k")
import("task.k")

scheduler("default"
  resources(
    host("host1", cpus = 256
      service("execution", provider = "gt2", jobManager = "PBS", uri = "host1.example.net:2119")
      service("file", provider = "gsiftp", uri = "host1.example.net:2911")
    )
    host("host2", cpus = 2
      service("execution", provider = "gt2", uri = "host2.example.net:2119")
      service("file", provider = "gsiftp", uri = "host2.example.net:2911")
    )
  )
  handler("execution", "gt2")
  handler("execution", "gt4")
  handler("file", "gsiftp")
  handler("file-transfer", "ssh")
  property("jobsPerCpu", "1")
)
Or in XML:
<import file="sys.xml"/>
<import file="task.xml"/>
<scheduler type = "default">
  <resources>
    <host name="host1" cpus="256">
      <service type="execution" provider="gt2" jobManager="PBS" uri = "host1.example.net:2119"/>
      <service type="file" provider="gsiftp" uri="host1.example.net:2911"/>
    </host>
    <host name="host2" cpus="2">
      <service type="execution" provider="gt2" uri="host2.example.net:2119"/>
      <service type="file" provider="gsiftp" uri="host2.example.net:2911"/>
    </host>
  </resources>
  <handler type="execution" provider="gt2"/>
  <handler type="execution" provider="gt4"/>
  <handler type="file" provider="gsiftp"/>
  <handler type="file-transfer" provider="ssh"/>
  <property name="jobsPerCpu" value="1"/>
</scheduler>
task:handler
task:handler(type, provider)
A handler specifies a Java CoG Kit Abstraction handler. A handler is used to submit tasks. Type indicates the type of handler. The type is a string and can have one of the following values: “execution”, “file”, and “file-transfer”.
Execution handlers are used for submitting jobs. File handlers are used for file operations (such as renaming, deleting, and listing of files). File transfer handlers are used only for transferring files. It is possible to transfer files using file handlers, but it is not possible to delete a file using a file transfer handler.
The provider argument indicates the provider to be used for the handler. For a list of currently supported providers please see the abstractions guide.
task:taskTransformer
task:taskTransformer(classname)
Returns a task transformer implementation on the taskTransformers channel. Currently task transformers can only be defined as Java classes implementing the org.globus.cog.karajan.scheduler.TaskTransformer interface.
task:resources
task:resources(...)
Encapsulates a set of hosts which can be specified using host.
task:host
task:host(name , *cpus , ...)
Returns a host definition. The name argument indicates the host name or IP address. The number of CPUs of the host can be specified using the *cpus argument. A set of services can also be specified on the default channel.
task:service
task:service(type, provider, *uri, *project, *jobManager, *securityContext)
Returns a service definition. The type of the service can be one of “execution”, “file”, or “file-transfer”. Provider indicates the Java CoG Kit abstraction provider for the service. For a list of currently supported providers please see the abstractions guide.
The *uri argument can be used to specify a URI for the service. If missing the host name of the host containing the service will be used.
The *project argument can be used to automatically bind a queuing system project to the service in order to alleviate the need to do it with the execute element.
The *jobManager argument can be used to specify a job manager different from the default. Examples of job managers include Fork, PBS, and Condor.
A non-default security context can be specified using the *securityContext argument.
task:securityContext
task:securityContext(provider, credentials)
Returns a Java CoG Kit abstraction security context. The returned context will be instantiated for the specified provider. The credentials argument can be used to pass a specific set of credentials to the security context.
task:allocateHost
task:allocateHost(name)
Allows tasks to be grouped on one host. By default, the scheduler assigns a different host to each task. AllocateHost can be used to reserve a host from the scheduler until it completes. The name indicates the name of the variable to be set with the allocated host, and is automatically quoted.
//Define a scheduler
scheduler(
  ...
)
allocateHost(host1
  execute("/bin/date", stdout="date", host=host1)
  transfer(srcfile="date", srchost=host1, desthost="localhost")
)
Or, in XML:
<scheduler>
  ...
</scheduler>
<allocateHost name="host1">
  <execute executable="/bin/date" stdout="date" host="{host1}"/>
  <transfer srcfile="date" srchost="{host1}" desthost="localhost"/>
</allocateHost>
The default scheduler uses a late binding mechanism with allocateHost. It generates a virtual host that is only bound to an actual host when the first task using it is submitted to the scheduler. This removes the limitation on the number of parallel allocateHost that can be running, and allows contained jobs to be submitted to the scheduler, which will later handle the throttling issues.
Multiple allocateHost elements can be nested, allowing the grouping of tasks on multiple dependent hosts.
task:host:hasService
task:host:hasService(host, type, provider)
Checks if a host, specified with the host element, contains a service of the specified type and with the specified provider. Returns true if such a service exists, and false otherwise.
task:execute
task:execute(executable, arguments, *directory, *stdout, *stderr, *stdin, *redirect, *provider, *host, *count, *jobtype, *maxtime, *maxwalltime, *maxcputime, *environment, environment, *queue, *project, *minmemory, *maxmemory, *nativespec, *delegation)
Executes a remote job. Executable indicates the executable to be run. Arguments can be passed to the executable using arguments. If present, the *directory argument specifies the remote directory in which the job will be executed. *Stdout and *stderr allow the redirection of the output and error streams to a remote file. *Stdin allows the redirection of the standard input from a remote file. If *redirect is set to true, the standard output and standard error of the remote job are redirected to the local console. The *host argument allows the job to be executed on a specific host, and the *provider argument allows the job to be executed using a specific provider.
The *delegation argument can be used to enable credential delegation with providers which support it. Credential delegation is disabled by default.
A set of environment variables can be passed in two ways:
- Through the *environment argument, in which case the value should be either a map or a string with the following format: [<name>=<value>[, <name>=<value>[, ...]]]
- Through the environment channel, in which case envVar can be used
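The string format accepted by *environment can be illustrated with a short Java sketch. This is not the Java CoG Kit implementation: the class name EnvironmentSpec and its parse method are hypothetical, and the sketch assumes that values never contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Java CoG Kit. */
final class EnvironmentSpec {

    /** Parses "NAME=value, OTHER=value" into an ordered map; a blank string yields an empty map. */
    static Map<String, String> parse(String spec) {
        Map<String, String> env = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return env;
        }
        // Assumption: pairs are separated by commas and values contain no commas.
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Expected <name>=<value> but got: " + pair.trim());
            }
            // Everything after the first '=' belongs to the value.
            env.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return env;
    }

    public static void main(String[] args) {
        System.out.println(parse("JAVA_HOME=/usr/java, TMPDIR=/tmp"));
    }
}
```

Splitting on the first equals sign only means that a value may itself contain further equals signs.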
The rest of the arguments are passed to the underlying provider.
A native specification (such as a classic GRAM RSL, or WS-GRAM RSL) can be passed to the provider using the *nativespec argument.
If the exit code of the job is not 0, the job is considered failed and execute will in turn fail. However, some providers (notably GT2) provide no means of retrieving the job exit code. With such providers, execute will always succeed no matter what the exit code of the job is.
task:transfer
task:transfer(*srcfile, *srcdir, *srchost, *destfile, *destdir, *desthost, *provider, *srcprovider, *destprovider, *thirdparty, *srcOffset, *length, *destOffset, *tcpBufferSize)
Transfers a file. The file can be transferred between the local machine and a remote machine, or between two remote machines. The name of the source file is specified by the *srcfile argument. If present, *destfile specifies the name of the target file; otherwise the source file name is used.
The *srcdir argument indicates the directory on the source machine where the source file can be found. If the *srcdir argument is missing, the default directory will be assumed (provider dependent).
The *destdir argument indicates the directory on the target machine where the file will be copied. If the *destdir argument is missing, the default directory will be assumed (provider dependent).
*Srchost and *desthost indicate the source and the target machines respectively, while the *provider argument can be used to force the scheduler to use a specific provider, or in the event a scheduler is not used. If the source and the destination use distinct providers, the *srcprovider and *destprovider arguments can be used.
The *thirdparty argument can be used to indicate that a direct transfer between two machines, neither of which is the local host, is requested. At this time, only GridFTP supports third party transfers. By default, the Java CoG Kit Abstractions will use simulated third party transfers (routed through the local host) even if both the source and destination are different from the local host.
Partial transfers can be achieved using *srcOffset, *length, and *destOffset. Currently these are only supported with GridFTP 3rd party transfers.
The *tcpBufferSize argument can be used to set the buffer size used by TCP/IP connections initiated by the transfers for providers that support it. Currently this feature only works with 3rd party GridFTP transfers.
task:transferParams
task:transferParams(srchost, desthost, provider)
Can be used to load TCP buffer size information from a file and pass the parameters to transfer. The file containing the buffer size information is located in the etc directory and is named bdp.conf. It has the same format as the identically named file used by TGCP.
Example:
transfer(
  srcfile="file.txt"
  srcdir="/home/me"
  transferParams(
    srchost="tg-gridftp.uc.teragrid.org"
    desthost="tg-gridftp.sdsc.teragrid.org"
    provider="gsiftp"
  )
)
task:file:list
task:file:list(dir, *host, *provider)
Returns a list of files in a directory specified by dir, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:remove
task:file:remove(name, *host, *provider)
Removes a file specified by name, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:exists
task:file:exists(name, *host, *provider)
Returns true if the file specified by name exists on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:dir:make
task:dir:make(name, *host, *provider)
Creates a directory specified by name, on the *host machine. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:dir:remove
task:dir:remove(name, *host, *provider)
Removes an empty directory.
task:file:isDirectory
task:file:isDirectory(name, *host, *provider)
Returns true if the file specified by name exists on the *host machine and it is a directory. The *provider argument can be used to select a specific provider for the operation. *Provider defaults to the local provider.
task:file:chmod
task:file:chmod(name, mode, *host, *provider)
Changes the permissions on the file specified with the name argument to the mode string indicated by the mode argument. If *host and *provider are present, the operation is done remotely using the respective provider.
task:file:rename
task:file:rename(from, to, *host, *provider)
Renames a file. The source and target name are specified using the from and to arguments. If *host and *provider are present, the operation is done remotely.
task:SSHSecurityContext
task:SSHSecurityContext(credentials)
Instantiates an SSH security context. This is simply a convenience function for securityContext("ssh", credentials).
task:InteractiveSSHSecurityContext
task:InteractiveSSHSecurityContext(*username, *privateKey, *nogui)
Instantiates an SSH security context which will lazily display a dialog window allowing the user to input a user-name/password pair or a user-name/private key/passphrase set. The dialog will only be displayed once per instance of an interactive SSH security context.
If *username and/or *privateKey are specified, the values are used to pre-fill the corresponding dialog fields.
The InteractiveSSHSecurityContext makes use of a class present in the SSH provider of the Java CoG Kit. This class will try to determine whether a GUI can be displayed or not (by checking GraphicsEnvironment.isHeadless()). If a Swing dialog cannot be displayed, a text-mode interface is used instead. The *nogui argument can be used to force the use of the text-mode interface (by setting it to true).
task:passwordAuthentication
task:passwordAuthentication(username, password)
Returns a username/password pair suitable to be used as a credential for a securityContext.
task:publicKeyAuthentication
task:publicKeyAuthentication(username, privatekey, passphrase)
Returns a username/privatekey/passphrase set which can be used as credentials for securityContext. The privatekey argument must point to a file containing the private key.
task:envVar
task:envVar(name, value)
Returns the pair of name and value on the environment channel, to be used by execute.
Java Library
Files: java.k, java.xml
The Java library allows limited interfacing with Java classes and objects from Karajan.
Java Elements
java:new
java:new(classname, *types, ...)
Instantiates a new Java object and returns it. Classname represents the fully qualified name of the class. The *types argument is a list of fully qualified class names used to search for a constructor. The arguments on the default channel are passed to the constructor after performing a conversion based on the *types argument. The *types argument is not always necessary, but should be used if Karajan cannot determine the types of the arguments that need to be passed to the constructor.
set(x
  new("java.lang.Double", "1.0")
  /* The types argument is not necessary since
   * the string argument is automatically mapped
   * to java.lang.String */
)
set(j
  new("java.lang.Integer", types = ["int"], 1)
  /* The types argument is required since the numeric
   * type used by Karajan cannot be mapped automatically
   * to a specific Java numeric type. */
)
Primitive Java types are represented by their corresponding keywords: boolean, byte, char, int, long, float, and double.
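The role of the *types list can be illustrated with plain Java reflection. The following is only a sketch of the general technique (resolving type names, including primitive keywords, and selecting a constructor by signature); the class ConstructorLookup and its methods are hypothetical and are not taken from the Karajan sources.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Karajan implementation of java:new. */
final class ConstructorLookup {

    // Primitive keywords mapped to their Class objects.
    private static final Map<String, Class<?>> PRIMITIVES = new HashMap<>();
    static {
        PRIMITIVES.put("boolean", boolean.class);
        PRIMITIVES.put("byte", byte.class);
        PRIMITIVES.put("char", char.class);
        PRIMITIVES.put("int", int.class);
        PRIMITIVES.put("long", long.class);
        PRIMITIVES.put("float", float.class);
        PRIMITIVES.put("double", double.class);
    }

    /** Resolves a primitive keyword or a fully qualified class name. */
    static Class<?> resolve(String typeName) {
        Class<?> primitive = PRIMITIVES.get(typeName);
        if (primitive != null) {
            return primitive;
        }
        try {
            return Class.forName(typeName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown type: " + typeName, e);
        }
    }

    /** Finds the constructor matching the given type names and invokes it. */
    static Object newInstance(String className, String[] types, Object... args) {
        Class<?>[] signature = new Class<?>[types.length];
        for (int i = 0; i < types.length; i++) {
            signature[i] = resolve(types[i]);
        }
        try {
            // Boxed arguments are unboxed automatically for primitive parameters.
            return resolve(className).getConstructor(signature).newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newInstance("java.lang.Double", new String[] {"java.lang.String"}, "1.0"));
        System.out.println(newInstance("java.lang.Integer", new String[] {"int"}, 1));
    }
}
```

Without an explicit signature, a caller holding only a generic numeric value cannot tell whether the Integer(int) or the Integer(String) constructor is meant, which is the ambiguity the *types argument resolves.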
java:invokeMethod
java:invokeMethod(method, *static, *classname, *object, *types, ...)
Invokes a method on a Java object or a static method on a Java class. The invocation is static if *static is set to true or *classname is present. Otherwise the value of *object is taken to be a Java object and the invocation is done on a virtual method of the object. Method indicates the name of the method. Unless the *types argument is present, Karajan will try to determine the method signature from the types of the arguments on the default channel (...). A method is searched for in the inheritance hierarchy of the object. If one is found, the method is invoked and, if its return type is not void, the returned value is returned on the default channel. The format of the *types argument is identical to the one for new.
java:executeMain
java:executeMain(class, ...)
Invokes static void main(String[] args) on a class. The qualified class name should be passed in the class argument. The arguments on the default channel are converted to strings and passed as the args to the main method.
java:getField
java:getField(field, *object, *classname)
Returns the value of a field from an object or class. The field name is passed in field. If *object is present, the value of the instance field of the object is retrieved. If *classname is present, the class field (static field) of the specified class is retrieved. Arguments *object and *classname are mutually exclusive.
java:waitForEvent
java:waitForEvent(...)
WaitForEvent is a rather obscure and incomplete element. It waits for a Java event such as a button being pressed, or a window being closed. Each argument is a list composed of three elements:
- A return value which will be returned when the associated event occurs
- The type of the event to wait for. Currently the following types are supported:
  - java.awt.events.ActionEvent: can be used to wait for a button being pressed (or for any other component that has an addActionListener(java.awt.events.ActionListener) method).
  - java.awt.events.WindowEvent: can be used to listen for the windowClosing notification on a window.
- The source object to be used
When one of the events occurs, waitForEvent completes and returns the return value given as the first item of the list that describes that event.
java:classOf
java:classOf(object)
Returns the Java class name of the specified object.
java:null
java:null()
Karajan has no explicit null value. However, it may be necessary that a null value be used when invoking Java methods. Null provides that.
HTML Library
Files: html.k, html.xml
The HTML library can be used to produce HTML output from Karajan scripts. It is incomplete, but able to generate simple HTML pages. It is listed here in the hope that it will be useful. However, not many guarantees are made about it. The library elements utilize the html channel.
HTML Elements
html:write
html:write(file, html)
Writes all arguments received on the html channel to the specified file. It is roughly equivalent to the following:
file:write(name=file
  from(html
    ...
  )
)
The following code produces a simple HTML file:
import("html.k")

write(file="out.html"
  html(
    head(
      title("Title")
    )
    body(
      h1("Heading 1")
      text("Some text")
    )
  )
)
The XML syntax may appear more convenient for using the HTML library:
<import name="html.xml"/>
<write file="out.html">
  <html>
    <head>
      <title>Title</title>
    </head>
    <body>
      <h1>Heading 1</h1>
      <text>Some text</text>
    </body>
  </html>
</write>
html:quickstart
html:quickstart(title, html)
Produces an HTML skeleton output with the given title, and the arguments on the html channel substituted inside the <body> tag.
html:html
html:html(html)
Generates the <html></html> tag pairs with the html arguments substituted inside.
html:head
html:head(html)
html:title
html:title(title)
html:body
html:body(*bgcolor, html)
html:table
html:table(*width, *height, *border, *cellpadding, *rules, *bgcolor, *class, *style, html)
html:tr
html:tr(*width, *height, html)
html:td
html:td(text, *width, *height, *colspan, *bgcolor, *align, *valign, html)
Note that the text argument of td is mandatory: it must always be present, even if a nested HTML element is used. The following example shows the correct usage for a table cell containing just an image:
...
table(
  ...
  tr(
    td("", img(src="image.jpg"))
  )
  ...
)
...
or
...
<table>
  ...
  <tr>
    <td text="">
      <img src="image.jpg"/>
    </td>
  </tr>
  ...
</table>
...
html:th
html:th(text, *width, *height, *colspan, html)
See also: td
html:h1
html:h1(text, html)
html:h2
html:h2(text, html)
html:h3
html:h3(text, html)
html:h4
html:h4(text, html)
html:h5
html:h5(text, html)
html:h6
html:h6(text, html)
html:ul
html:ul(html)
html:pre
html:pre(text, html)
html:br
html:br()
html:li
html:li()
html:a
html:a(text, *href, *style, html)
html:anchor
html:anchor(name)
Anchor is used to define an HTML anchor, since the a element does not support this functionality. It effectively returns <a name="{name}"/> on the html channel.
html:img
html:img(src, *border)
html:text
html:text(text)
Returns the value of the text argument on the html channel.
Forms Library
Concepts
Files: forms.k, forms.xml
The forms library can be used to build and gather information from GUI forms. The current implementation uses the Java Swing library.
Component ID
Each component whose state is user-modifiable (such as button, textField, etc.) must have an ID, which can be used to retrieve the state of the component when using the form element.
Component Layout
The layout of components is done using hbox and vbox.
Component Alignment
Components support alignment using the *halign and *valign arguments. *Halign and *valign have values between 0.0 and 1.0, with 0.0 representing left/top alignment, and 1.0 representing right/bottom alignment. By default, both *halign and *valign are set to 0.5 (centered).
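One way to read the alignment values is as the fraction of the free space placed before the component. The Java sketch below illustrates that interpretation only; the class Alignment and its offset method are hypothetical, and the actual Swing-based layout used by the forms library may compute positions differently.

```java
/** Hypothetical illustration of fractional alignment; not taken from the forms library. */
final class Alignment {

    /**
     * Returns the offset (in pixels) of a component inside a cell.
     * An align value of 0.0 means left/top, 0.5 centered, and 1.0 right/bottom.
     */
    static int offset(int cellSize, int componentSize, double align) {
        if (align < 0.0 || align > 1.0) {
            throw new IllegalArgumentException("Alignment must be between 0.0 and 1.0: " + align);
        }
        // The free space is split so that the given fraction lies before the component.
        return (int) Math.round((cellSize - componentSize) * align);
    }

    public static void main(String[] args) {
        System.out.println(offset(100, 20, 0.0));
        System.out.println(offset(100, 20, 0.5));
        System.out.println(offset(100, 20, 1.0));
    }
}
```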
Form Elements
form:form
form:form(id, title, waiton)
This element represents the main form element. It creates a window based on the specification received on the default channel (...) and displays it. It then waits on the control whose id matches the waiton value and terminates, returning a map that pairs control ids with their values. The layout of the controls in the window is performed by recursive nesting of vertical and horizontal containers (vbox and hbox respectively).
form:hbox
form:hbox(*homogenous, ...)
Represents a horizontal container. All controls specified by ... are laid out one after another horizontally. By default, all controls are assigned an area of equal height, but the width varies according to the natural dimensions of the control (the total width is proportionally divided between the components according to their size). The *homogenous argument can be used to force all cells to have an equal width. The various possibilities are illustrated in Figure 1, Figure 2, and Figure 3.
Figure 1: A non-homogenous hbox
Figure 2: A scaled non-homogenous hbox
Figure 3: A homogenous hbox
form:vbox
form:vbox(*homogenous, ...)
A vbox is similar to an hbox but with the sub-components laid out vertically.
form:label
form:label(text, *halign, *valign)
Represents a text label. The text content is described by the text argument.
See also: Component Alignment
form:button
form:button(id, caption, *halign, *valign)
Represents a push-button. The text used for the label is specified with the caption argument. The ID is specified with the id argument.
See also: Component Alignment
form:checkBox
form:checkBox(id, *caption, *checked, *halign, *valign)
Constructs a check-box, whose initial state can be set using the *checked argument. By default, the check-box is not checked. The label of the check-box is specified using the *caption argument.
See also: Component Alignment
form:radioBox
form:radioBox(id, caption, ...)
Allows the construction of a radio-button group. The group will have a titled border with the text defined by the caption argument. RadioBox expects radio button definitions on the default channel (...).
form:radioButton
form:radioButton(id, caption, *selected, *halign, *valign)
Defines a radio button, with the text set by the caption argument. By default, the first button in a radioBox is selected, unless the *selected argument is used on a different radio button.
See also: Component Alignment
form:textField
form:textField(id, *columns, *halign, *valign)
Describes a text field, used for text input. A text field has only one line of text. The number of columns of the text field is indicated using the *columns argument.
See also: Component Alignment
form:passwordField
form:passwordField(id, *columns, *halign, *valign)
A passwordField is similar to a textField, with the difference that the characters entered are visually represented by asterisks (*).
See also: textField
form:comboBox
form:comboBox(id, *halign, *valign, ...)
Constructs a combo-box, which can be used to select from multiple items in a drop-down list. The items are specified as ..., using comboItem.
See also: Component Alignment
form:comboItem
form:comboItem(*text, *selected)
Specifies a comboBox item. The text of the item is indicated by the text argument. By default, the first item in a comboBox is selected, unless the *selected argument is used on a different item.
form:HSeparator
form:HSeparator()
Constructs a horizontal separator component.
form:VSeparator
form:VSeparator()
Constructs a vertical separator component.
form:filler
form:filler(*width, *height)
Defines a filler component. A filler component serves no functional purpose, but can be used to influence the layout of a form by providing an invisible component of the specified *width and *height (in pixels).
form:messageDialog
form:messageDialog(message, title)
Displays a popup window showing the specified message and having the specified title. The element completes when the popup window is closed by the user, either by using the window controls or the OK button.
Restart Log Library
The Restart Log Library provides fault tolerance in a style similar to that of Condor DAGMan. The execution of certain operations is recorded into a log file on the disk. In case of a failure the execution can be resumed using the information saved in the log file. The operations that previously completed successfully will not be re-executed. The library defines two elements: restartLog and logged.
This mechanism offers no guarantees of semantic consistency after a restart if the control flow is influenced by factors that change between the original execution and the resumed execution. It is generally safe to use this mechanism with for and parallelFor if the iteration values are fixed. Additionally, return values of logged elements are not recorded. Consequently a resumed logged element will not return anything.
Usage example:
import("sys.k")
import("task.k")
import("rlog.k")

logged(
  transfer(srcfile="a.txt", desthost="host.example.org", provider="gt2")
)
parallel(
  logged(
    execute(executable="/bin/cat", arguments="a.txt", stdout="b.txt", host="host.example.org", provider="gt2")
  )
  logged(
    execute(executable="/bin/cat", arguments="a.txt", stdout="c.txt", host="host.example.org", provider="gt2")
  )
)
parallelFor(out, list("b.txt", "c.txt")
  logged(
    transfer(srcfile=out, srchost="host.example.org", provider="gt2")
  )
)
Resuming:
cog-workflow workflow.k -rlog:resume=workflow.0.rlog
rlog:restartLog
rlog:restartLog(*resume, *name, restartlog)
RestartLog performs the following functions:
- Opens a log file. The prefix of the log file name is taken from the *name argument or, if the *name argument is missing, from the file name of the current script being executed. The actual file name is obtained by appending a dot character ("."), a unique numeric identifier, and the ".rlog" extension. RestartLog will attempt to successively use increasing numeric identifiers, starting from 0 (zero). If a log file with that identifier already exists or if an exclusive lock on the file cannot be obtained, the next number is used. An exclusive lock is acquired on the log file such that other processes will not attempt to use the same file.
- Accepts arguments on the restartlog channel and writes them to the log file. After writing each value, the file buffers are immediately flushed to the disk using the FileDescriptor.sync() method.
- In the case of a restart it also parses a previous log and builds the necessary data in a way that the logged elements can use. If during a restart restartLog cannot acquire an exclusive lock on the log file, it will not attempt to use another log file, but fail instead.
- Upon successful completion, it closes and deletes the log file.
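The file handling described in this list (numbered file names, an exclusive lock, and a sync after every write) can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the Karajan implementation: the class RestartLogSketch and all of its members are hypothetical, and the one-entry-per-line format is an assumption.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; names are not taken from the Karajan sources. */
final class RestartLogSketch {
    private final File file;
    private final FileOutputStream out;
    private final FileLock lock;

    private RestartLogSketch(File file, FileOutputStream out, FileLock lock) {
        this.file = file;
        this.out = out;
        this.lock = lock;
    }

    /** Tries prefix.0.rlog, prefix.1.rlog, ... until a new file can be created and locked. */
    static RestartLogSketch open(File dir, String prefix) {
        try {
            for (int id = 0; ; id++) {
                File candidate = new File(dir, prefix + "." + id + ".rlog");
                if (!candidate.createNewFile()) {
                    continue; // a log with this identifier already exists
                }
                FileOutputStream out = new FileOutputStream(candidate);
                FileLock lock = out.getChannel().tryLock();
                if (lock != null) {
                    return new RestartLogSketch(candidate, out, lock);
                }
                out.close(); // locked by another process: try the next identifier
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one entry (assumed format: one entry per line) and forces it to disk. */
    void append(String entry) {
        try {
            out.write((entry + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            out.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    File file() {
        return file;
    }

    /** Releases the lock and closes the file; deletes it when the run completed successfully. */
    void close(boolean completedSuccessfully) {
        try {
            lock.release();
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (completedSuccessfully) {
            file.delete();
        }
    }
}
```

A caller would open the log once, call append for each value received, and call close(true) only if the whole run succeeded, so that a failed run leaves its log behind for a later resume.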
Restarts can be triggered in two ways:
- Using the *resume argument with the full file name of a log file
- By specifying the -rlog:resume=<logname> command line argument to the script (not to the interpreter).
rlog:logged
rlog:logged()
Logged elements execute their child elements and, upon termination, return on the restartlog channel an identifier that uniquely identifies the logged element within a given execution, together with a thread id, which is used to differentiate between concurrent runs of the same element. The same (element id; thread id) pair can occur multiple times in a log file, but it then only reflects successive executions of an element within the same thread.
If a restart is in effect, logged elements will analyze data parsed from the log, and if there exists an entry with the same (element id; thread id) pair which has a count larger than 0, the count will be decreased and logged will complete without executing its arguments/child elements. It will consequently not return any values whatsoever, not even values that were returned by its child elements in a previous run. It is therefore recommended that logged only wrap elements that do not return values.
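The counting scheme described above can be modelled in a few lines of Java. The sketch is a simplified model, not the Karajan code: the class ReplayCounts and its methods are hypothetical, and element and thread ids are represented as plain strings for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the replay bookkeeping; names are illustrative only. */
final class ReplayCounts {
    // Number of logged completions still to be skipped, per (element id; thread id) pair.
    private final Map<String, Integer> remaining = new HashMap<>();

    /** Called once for every (element id; thread id) entry parsed from the previous log. */
    void record(String elementId, String threadId) {
        remaining.merge(key(elementId, threadId), 1, Integer::sum);
    }

    /** Returns true if this execution completed in the previous run and should be skipped. */
    boolean shouldSkip(String elementId, String threadId) {
        String k = key(elementId, threadId);
        int count = remaining.getOrDefault(k, 0);
        if (count > 0) {
            remaining.put(k, count - 1); // consume one logged completion
            return true;
        }
        return false; // nothing left in the log: execute normally
    }

    private static String key(String elementId, String threadId) {
        return elementId + ";" + threadId;
    }
}
```

If the same pair was logged twice, the first two resumed executions are skipped and the third one runs normally, which matches the successive-execution reading of repeated entries.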
Service Reference
Karajan features a service which can be used to execute parts of a script remotely. Some of the features of the service are listed below:
- Security
- The service and client use GSI security enabling strong authentication and encryption of data
- Configurable communication channels
- The Karajan networking infrastructure consists of configurable communication sub-systems, which separate the high level messaging protocols from the low level implementation details. The current implementation uses SSL sockets over TCP/IP, and can be configured to use persistent connections, or callbacks, or polling (or a combination of them), and the configurations can be different for each host or for a specific domain. This provides firewall transparency when needed, yet can minimize resource consumption and maximize performance when firewalls are not involved.
- Semantic transparency
- Remote execution can be seamlessly integrated with local execution, while preserving most of the system semantics, such as return values, definitions, etc.
Using the Service
The Karajan service can be accessed from Karajan scripts using the remote element.
Shared or Personal
The Karajan service can be used in two modes: personal or shared. In personal mode, the service runs under a user’s credential, loaded from a proxy certificate. It is therefore necessary that a valid proxy exists before the service is started in personal mode. In this mode, all libraries are available for use without restrictions, but the connections are limited to clients using the same credential as the one used for starting the service.
In shared mode, the service requires a host credential. In this mode, connections can be initiated by multiple users, and authorization is performed based on a grid-map file. Access to certain libraries or functions that could be used to access files, resources, or other information belonging to other users using the service are restricted. Any attempt to execute such functions will result in the execution failing. The following list enumerates libraries and elements whose use is restricted in shared mode:
- The Java Library
- Elements not in the Task Library which can be used to access local files: file:contains, file:read, outputStream, closeStream, checkpoint, file:execute, and file:write.
In addition, the Task Library requires a special local provider, which can use operating system services to execute local tasks under non-shared privilege, achieved through grid-mapfile mapping. The service will refuse to run in shared mode if the normal local provider is detected. Details about building and configuring the secure local provider are available here.
When running in shared mode, the service should be started under a non-privileged account. Please do not run the service from the root account or any other administrative/privileged account!
Limiting Access to Resources in Shared Mode
Using the service in shared mode has a number of security requirements:
- Access to resources should be restricted based on user identities.
- A user must not be able to access any other user’s data/resources. Consequently, this requirement is divided into:
- Restricting access to the shared environment, such that privilege escalation cannot be done in order to override any of the security measures in place
- Delegating certain required privileged operations (such as execution and file access) to local security domains (running privileged operations under specific user accounts)
Access restriction is achieved using an "all-or-nothing" access control list: the grid-mapfile. A given identity (materialized in a GSI/X509 certificate) is only able to use the service if there exists an entry for that identity in the grid-mapfile.
Shared environment access restriction is done in two ways: [1]
- Execution of Karajan elements that could be used for escalating privileges or accessing other users' data/resources is prohibited. Attempts to use these will result in an error instead. For example, allowing arbitrary Java method invocations inside the shared interpreter environment (the service JVM) is prohibited. Similarly, elements that can be used to access files belonging to the account under which the shared interpreter is running are also prohibited, and their use will result in an error.
- Instantiation of arbitrary Java objects, regardless of the means, is restricted to known “safe” object types. A subset of the problem is presented below:
The Karajan service allows passing various arguments from the client to the service by means of serialization and de-serialization. However, this process does not guarantee the invariance of all objects when copied to a different resource. It is possible that a certain object exists within the libraries of both the client and the service, which could, through the simple process of de-serialization, be initialized with privileged data, which can later be retrieved through other means. For example, let us consider the following Java class:
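import java.io.File;
import java.io.IOException;
import java.io.Serializable;
import java.nio.file.Files;

public class Foo implements Serializable {
    private File file;
    private transient String contents;

    public Foo(File file) throws IOException {
        this.file = file;
        // Read the contents of the file
        this.contents = new String(Files.readAllBytes(file.toPath()));
    }

    public String toString() {
        return contents;
    }
}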
The transfer of an instance of Foo from the client to the service, by means of serialization/de-serialization, will not preserve the local meaning of the file attribute. Instead, the mere instantiation of Foo combined with a seemingly unprivileged operation (print("foo is {foo}")) could allow access to arbitrary files in the service shared environment.
Another possible scenario is objects whose constructors implement side effects that execute privileged operations. Another Foo should illustrate this:
public class Foo implements Serializable {
    public Foo() {
        // something that amounts to "rm -rf /"
    }
}
It is clear that de-serializing an instance of the latter Foo in the service environment is undesirable, to say the least. Surely, both Foo versions are somewhat extreme, and somewhat unlikely to exist in the libraries of both the client and the service, but there is no certainty, and auditing all classes in the libraries would be very tedious if at all possible. The chosen solution to the problem is to have a set of allowed classes/packages which can be safely migrated between client and server, while excluding anything that is not explicitly in the set. A configuration file (etc/karajan-restricted-classes.properties) defines the de-serialization policies. The file allows two types of entries:
- package.allow
- Define a package which is allowed. All classes in the specified package and all its sub-packages are considered safe to be de-serialized.
- class.disallow
- Explicitly disallow a class from being de-serialized, even if it is contained in an allowed package.
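The effect of the two entry types can be sketched as a simple allow-list check in Java. This is an illustration of the policy only: the class DeserializationPolicy is hypothetical, the sets are passed in directly rather than read from etc/karajan-restricted-classes.properties (whose exact syntax is not reproduced here), and the service's real enforcement code may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of package.allow / class.disallow semantics. */
final class DeserializationPolicy {
    private final Set<String> allowedPackages;
    private final Set<String> disallowedClasses;

    DeserializationPolicy(Set<String> allowedPackages, Set<String> disallowedClasses) {
        this.allowedPackages = new HashSet<>(allowedPackages);
        this.disallowedClasses = new HashSet<>(disallowedClasses);
    }

    /** Returns true only if the class is in an allowed package (or sub-package) and not disallowed. */
    boolean isAllowed(String className) {
        if (disallowedClasses.contains(className)) {
            return false; // class.disallow overrides package.allow
        }
        for (String pkg : allowedPackages) {
            // The trailing dot prevents "java.util" from matching "java.utilities.Foo".
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false; // anything not explicitly allowed is rejected
    }

    public static void main(String[] args) {
        DeserializationPolicy policy = new DeserializationPolicy(
                new HashSet<>(Arrays.asList("java.util")),
                new HashSet<>(Arrays.asList("java.util.Timer")));
        System.out.println(policy.isAllowed("java.util.ArrayList"));
        System.out.println(policy.isAllowed("java.util.Timer"));
        System.out.println(policy.isAllowed("java.io.File"));
    }
}
```

In standard Java, a check of this kind is commonly applied by overriding ObjectInputStream.resolveClass, so that a class is rejected before any instance of it is created.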
Communication Layer Configuration
The communication layer supports three basic modes of operation, described below:
- Persistent
- In this mode, connections are kept open for as long as needed. All client→server and server→client communications will re-use the same connection, even if multiple requests are sent concurrently from the client. It is also possible to keep connections open after all current communications have terminated, in the event that future communications will be needed. This mode can be safely used if the client is behind a firewall, but it may keep resources allocated unnecessarily if no communication happens between the server and the client.
- Callback
- The callback mode can be used to instruct the service to initiate a connection to the client if the service needs to send any messages to the client. This type of configuration will most likely not work if a client is behind a firewall.
- Polling
- Polling is somewhat similar to persistent connections in that it will safely work if the client is behind a firewall. It instructs the client to disconnect idle connections, but re-establish them at specific intervals. It is therefore more conservative in terms of resource usage, but it may cause unwanted delays.
These modes of operation can be combined. For example, combining callbacks and polling ensures that the system works whether or not the client is behind a firewall, while still providing better performance in the best case scenario (no firewall).
The communication layer is conservative: before initiating a connection, it searches for any existing connection that can be used to transmit the information, irrespective of which side initiated that connection. Only if the search fails does it consider establishing a new connection.
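The reuse-before-connect behavior can be sketched in Java as follows. This is an illustration only: ConnectionRegistry, Connection, and all method names are hypothetical and do not correspond to classes in the CoG Kit. The sketch keeps a single table of open connections keyed by peer, so a connection opened by the remote side is found and reused before a new outgoing one is created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "search for an existing connection first".
class ConnectionRegistry {
    // Stand-in for a real network connection.
    static final class Connection {
        final String peer;
        final boolean initiatedLocally;

        Connection(String peer, boolean initiatedLocally) {
            this.peer = peer;
            this.initiatedLocally = initiatedLocally;
        }
    }

    private final Map<String, Connection> open = new HashMap<>();
    private int newlyOpened = 0;

    // Record a connection that the remote side opened to us.
    synchronized void registerIncoming(String peer) {
        open.putIfAbsent(peer, new Connection(peer, false));
    }

    // Return any usable connection to the peer, regardless of who initiated
    // it; open a new one only if the search fails.
    synchronized Connection connectionFor(String peer) {
        Connection existing = open.get(peer);
        if (existing != null) {
            return existing;
        }
        Connection created = new Connection(peer, true);
        open.put(peer, created);
        newlyOpened++;
        return created;
    }

    // Number of connections this side had to open itself.
    synchronized int newlyOpenedCount() {
        return newlyOpened;
    }

    public static void main(String[] args) {
        ConnectionRegistry registry = new ConnectionRegistry();
        registry.registerIncoming("client.example.org");
        Connection c = registry.connectionFor("client.example.org");
        System.out.println("reused incoming connection: " + !c.initiatedLocally);
        System.out.println("connections opened locally: " + registry.newlyOpenedCount());
    }
}
```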
The configuration can be changed by editing etc/remote.properties. The file contains a set of pairs of domain expressions and connection properties. The entries are considered in the reverse order from that in which they appear in the file. In other words, the first entry will be considered last, only if no other matching entry can be found. The domain expression is a regular expression which is matched against the domain name of the service host. The connection properties are a set of comma separated options. These options are described in the following table:
- keepalive(timeout)
- Indicates that the connection is to be kept alive. The optional timeout indicates that the connection should also be kept alive for the specified number of seconds, even if no actual data is sent through.
- reconnect
- This option instructs the system to re-initiate a connection in the event that a connection loss occurs. By default, failed connections will not be re-established.
- callback
- Instructs services to connect to the client if no existing connection can be used for sending data. The client starts a local service before it establishes the first connection to a service for which a callback configuration exists.
- poll(interval)
- Instructs the client to poll the service for updates. The service will buffer all data that it needs to send to the client, and commit the buffered data when the client initiates a polling run. The interval indicates, in seconds, the interval at which the client should initiate the polls.
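The buffering behavior described for poll(interval) can be sketched in Java. The class PollBuffer and its methods are hypothetical names used for illustration; they are not part of the CoG Kit. The service side queues outgoing messages, and a polling run from the client collects everything queued so far.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of service-side buffering for a polling client.
class PollBuffer {
    private final List<String> pending = new ArrayList<>();

    // Service side: buffer a message instead of sending it immediately.
    synchronized void enqueue(String message) {
        pending.add(message);
    }

    // Invoked when the client initiates a polling run: hand over all
    // buffered messages in order and clear the buffer.
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    public static void main(String[] args) {
        PollBuffer buffer = new PollBuffer();
        buffer.enqueue("job 1 completed");
        buffer.enqueue("job 2 failed");
        System.out.println(buffer.drain()); // both messages, in order
        System.out.println(buffer.drain()); // empty: nothing new since the last poll
    }
}
```

A message queued just after a polling run waits for up to one full interval, which is the delay the polling mode trades for lower resource usage.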
An example configuration is listed below:
#default to persistent connections
#this one is reached if no other entry matches
".*" keepalive(120), reconnect
#callbacks can safely be used within our own domain
".*mydomain.com" callback
#there's a firewall between mydomain.com and otherdomain.com, so
#we poll every 2 minutes
".*otherdomain.com" poll(120)
#for the sake of this example, a combination of callback and polling
#if callbacks don't work, polling will pick things up
".*thirddomain.org" callback, poll(120)
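The reverse-order lookup can be sketched in Java. This is a minimal illustration, not the parser used by Karajan: the class RemoteConfig and its methods are hypothetical, the line format (a quoted regular expression followed by options) is inferred from the example above, and the sketch assumes that the expression must match the whole host name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of domain-expression matching in reverse file order.
class RemoteConfig {
    static final class Entry {
        final Pattern domain;
        final String options;

        Entry(Pattern domain, String options) {
            this.domain = domain;
            this.options = options;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Parse lines of the form: "<domain regex>" option, option, ...
    // Blank lines and lines starting with '#' are skipped.
    static RemoteConfig parse(List<String> lines) {
        RemoteConfig config = new RemoteConfig();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int close = line.indexOf('"', 1);
            if (!line.startsWith("\"") || close < 0) {
                throw new IllegalArgumentException("Malformed entry: " + line);
            }
            String regex = line.substring(1, close);
            String options = line.substring(close + 1).trim();
            config.entries.add(new Entry(Pattern.compile(regex), options));
        }
        return config;
    }

    // Entries are tried from the last to the first, so the first entry in
    // the file acts as the fallback. Returns null if nothing matches.
    String optionsFor(String host) {
        for (int i = entries.size() - 1; i >= 0; i--) {
            Entry e = entries.get(i);
            if (e.domain.matcher(host).matches()) {
                return e.options;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RemoteConfig config = parse(Arrays.asList(
                "\".*\" keepalive(120), reconnect",
                "\".*mydomain.com\" callback",
                "\".*otherdomain.com\" poll(120)"));
        System.out.println(config.optionsFor("www.mydomain.com"));
        System.out.println(config.optionsFor("example.net"));
    }
}
```

With the three entries in main, a host in mydomain.com resolves to the callback entry, while an unrelated host falls through to the catch-all ".*" entry at the top of the file.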
The Secure (Grid-Mapped) Local Provider
In order to build the secure local provider, the following steps must be taken:
- If a previous build was performed, run ant distclean
- Edit the abstractions dependency file in cog/modules/abstraction/dependencies.xml
- Change the dependency on the provider-local to provider-slocal:
<ant antfile="${main.buildfile}" target="dep">
<property name="module" value="provider-slocal"/>
</ant>
- Re-compile with ant dist
At this time, the secure local provider only implements job execution (execute). File operations are not yet implemented.
The secure local provider uses a gridmap file, which describes a mapping of GSS distinguished names to local user accounts. For details about gridmap files, see Identity Mapping Information.
The actual mapping is done using sudo and a simple job wrapper. Sudo must be configured to allow the execution of the wrapper under target user accounts without a password. An example sudo configuration is shown below:
<shared_username> ALL=(<target_usernames_list>) NOPASSWD: <cog_path>/libexec/job-wrapper
Where <shared_username> is the username of the account under which the shared service is running, <target_usernames_list> is a comma separated list of user-names which the provider can map to (these must also be correctly configured in the gridmap file), and <cog_path> is the absolute path to the CoG/Karajan installation.
If the target user account is the same as the shared user account, the secure local provider can be configured to bypass sudo (which it does by default).
The provider configuration file (etc/provider-slocal.properties) supports the following properties:
- grid.mapfile
- Indicates the location of the gridmap file. The default is /etc/grid-security/grid-mapfile.
- sudo
- Points to the location of sudo.
- nosudo
- If the target user is the same as the user under which the shared service is running, then the value of this property will be used as the path to a program used to start the wrapper.
- job.wrapper
- The location of the wrapper. It defaults to <cog_path>/libexec/job-wrapper, therefore it should not be necessary to specify this property unless a non-standard configuration is used.
An example configuration file is shown below:
grid.mapfile=/etc/grid-security/grid-mapfile
sudo=/usr/bin/sudo
nosudo=/bin/bash
job.wrapper=/home/karajan/cog/libexec/job-wrapper
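A file of this shape can be read with the standard java.util.Properties class. The sketch below shows one way to apply the two documented defaults (grid.mapfile and job.wrapper). It is an illustration only: the class SlocalConfig, the load method, and the cogPath parameter are hypothetical, and the provider's own loading code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch of reading provider-slocal.properties with defaults.
class SlocalConfig {
    // Parse the configuration text; cogPath stands for <cog_path>.
    static Properties load(String text, String cogPath) throws IOException {
        Properties defaults = new Properties();
        // Defaults as documented for grid.mapfile and job.wrapper.
        defaults.setProperty("grid.mapfile", "/etc/grid-security/grid-mapfile");
        defaults.setProperty("job.wrapper", cogPath + "/libexec/job-wrapper");

        // Values present in the file take precedence over the defaults.
        Properties config = new Properties(defaults);
        config.load(new StringReader(text));
        return config;
    }

    public static void main(String[] args) throws IOException {
        String text = "sudo=/usr/bin/sudo\nnosudo=/bin/bash\n";
        Properties config = load(text, "/home/karajan/cog");
        System.out.println(config.getProperty("grid.mapfile"));
        System.out.println(config.getProperty("job.wrapper"));
        System.out.println(config.getProperty("sudo"));
    }
}
```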
Service Command Line Options
Usage
cog-workflow-service <options>
Options
- -personal
- Starts the service in personal mode. In personal mode, the service uses the user credential, and does not use a restricted execution environment. This is the default mode.
- -shared
- Starts the service in shared mode. In shared mode, the service uses the host credentials, provides a restricted execution environment, and uses the grid-map file for authorization.
- -nosec
- Starts the service without security
- -local
- Binds the socket to the local host. Connections from the outside network will not be possible.
- (-port | -p) <port-number>
- Specifies the port that the service should be started on. The default is 1984.
- -gridmap <file>
- Specifies the location of the grid map file, which is used for mapping certificate distinguished names to local user accounts. This option is only meaningful if the service is started in shared mode. The default is /etc/grid-security/grid-mapfile
- -hostcert <file>
- Indicates the location of the host certificate. This option is only used in shared mode. The default is /etc/grid-security/hostcert.pem
- -hostkey <file>
- Indicates the location of the host key. This option is only used in shared mode. The default is /etc/grid-security/hostkey.pem
- -help
- Displays usage information
Notes
1. There should be a way to allow invocations of certain “safe” methods, as defined by an administrative policy.
Interpreter Reference
Usage
cog-workflow <options> file <arguments>
Options
- (-execute | -e) <string>
- Execute the script given as argument
- -showstats
- Show various execution statistics at the end of the execution
- -debug
- Enable debugging. This enables a number of internal tests at the expense of speed. It is normally not needed, since it is useful only for catching subtle consistency issues with the interpreter.
- -debugger
- EXPERIMENTAL and BUGGY. Starts the internal graphical debugger
- -monitor
- Shows a resource monitor
- -dumpstate
- If specified, in case of a fatal error, the interpreter will dump the state in a file
- -intermediate
- Saves intermediate XML code resulting from the translation of .k files
- (-help | -h)
- Display usage information
- -cache
- EXPERIMENTAL! Enables cache persistence
- -stdoutUnordered
- Make print invocations produce results in the order they are executed. By default the order is lexical.
Arguments
Arguments to the script can be specified after the file name. They are distinct from the interpreter options, and are passed to the script as a constant, in the form of a list named "cmdline:arguments".
Embedding Karajan Into Java
This document provides details on the two modes through which Karajan scripts can be used from Java code.
The first possibility is to use the Karajan parser to build an internal representation of the code, and execute it. Example:
try {
    ElementTree tree = Loader.loadFromString("include(\"sys.k\"), print(\"Hello world!\")");
    ExecutionContext ec = new ExecutionContext(tree);
    ec.start();
    ec.waitFor();
}
catch (Exception e) {
    e.printStackTrace();
}
An alternative, seemingly easier way to do the same is:
KarajanWorkflow workflow = new KarajanWorkflow();
workflow.setSpecification("...");
workflow.start();
workflow.waitFor();
if (workflow.isFailed()) {
    System.err.println("Failed:");
    workflow.getFailure().printStackTrace();
}
The other possibility is to build an ElementTree without parsing:
try {
    ElementTree tree = new ElementTree();
    Sequential s = new Sequential();
    Print p1 = new Print();
    p1.setProperty("message", "Hello");
    p1.setProperty("nl", Boolean.FALSE);
    Print p2 = new Print();
    p2.setProperty("message", " world!");
    s.addElement(p1);
    s.addElement(p2);
    tree.setRoot(s);
    ExecutionContext ec = new ExecutionContext(tree);
    ec.start();
    ec.waitFor();
}
catch (Exception e) {
    e.printStackTrace();
}
This would be roughly equivalent to:
sequential(
    print(message = "Hello", nl = false)
    print(message = " world!")
)
Note that it is not necessary to import the System Library here: the implementations are specified directly, so there is no need to resolve element names to implementations.
More details about the mapping of element names to implementations can be found by consulting the actual library definitions found in the following directory: src/cog/modules/karajan/resources
Listed below are a few common libraries with links to their sources (from which the mapping can be inferred):



