V:4.1.5/Karajan:Language Reference
From Java CoG Kit
The Karajan Language
Karajan supports two syntax modes: a native syntax and XML. There is no difference on the semantic level between the two forms.
The Karajan Syntax
The following conventions are used:
(xy) is used to group x and y
[x] indicates that x is optional
x+ denotes at least one occurrence of x
x* denotes zero or more occurrences of x
x|y means either x or y
'x' is to be interpreted as the literal x
ε represents the empty production
Elements
The Karajan semantics revolve around the notion of elements. An element is relatively similar to a function in that it has a name, can accept arguments, and may return values. The general syntax for an element is:
element ::= identifier '(' [arguments] ')'
Example:
false()
Identifiers
An identifier can consist of alphanumeric characters and certain symbols, but no whitespace. Symbols that cannot be used in an identifier are those that have other syntactic functions, such as brackets (all of them), commas, double quotes, and operators (’+’, ’-’, ’*’, ’/’, ’%’, ’^’, ’=’, ’<’, ’>’, ’&’, ’|’). Identifiers are case-insensitive.
identifier ::= (Letter | Digit | '!' | '@' | '#' | '$' | '%' |
               '_' | ':' | ';' | ''' | '.' | '?' | '\' | '`' | '~')+
Example:
var, i, v123, @, a$, big_list, grid:task, file.list
Identifiers cannot begin with a digit.
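The identifier rules can be checked mechanically. The following Python sketch is an illustration only and is not part of Karajan: the names is_identifier and same_identifier are hypothetical, Letter is assumed to mean ASCII letters, and '%' is left out because the prose lists it among the operators even though it appears in the production.

```python
import re

# Symbols allowed by the identifier production, minus '%', which the
# prose reserves as an operator (an assumption of this sketch).
_SYMBOLS = "!@#$_:;'.?\\`~"
_FIRST = "A-Za-z" + re.escape(_SYMBOLS)
# First character: letter or symbol. Remaining characters may also be digits.
_IDENT = re.compile("[" + _FIRST + "][" + _FIRST + "0-9]*")


def is_identifier(text):
    """Return True if text is a well-formed identifier under these rules."""
    return _IDENT.fullmatch(text) is not None


def same_identifier(a, b):
    """Identifiers are case insensitive, so compare them lower-cased."""
    return a.lower() == b.lower()
```

With these definitions, every identifier in the example above is accepted, while a name such as 1abc is rejected because it begins with a digit.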
Arguments
The arguments can either be other elements or values, such as numeric values or strings. Arguments can be separated by commas or the new line character (or both):
arguments ::= (argument [separator arguments]) | ε
separator ::= ',' | Newline
Example:
list(true(), false())
list(true()
     false()
)
list(true(),
     false())
Arguments come in two flavors: named arguments and unnamed arguments.
argument ::= named_argument | unnamed_argument
Named Arguments
Named arguments provide a way of explicitly binding arguments to formal parameters:
named_argument ::= identifier '=' unnamed_argument
Example:
true(), nl =false())
Unnamed Arguments
Unnamed arguments can be either immediate values, elements or expressions. Immediate values can be numeric literals, string literals, variables, or quoted lists:
unnamed_argument ::= numeric_literal | string_literal | variable | quoted_list |
                     element | expression
numeric_literal ::= ['+'|'-'] digit+ ['.' digit+]
digit ::= '0'... '9'
string_literal ::= '"' any_characters_but_double_quotes '"'
variable ::= identifier
Example:
list(1, 2.3, -4.56, +7.890, "A string", list("Another string value in a nested list", "*2"))
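The numeric_literal production translates directly into a regular expression. The Python sketch below is only an illustration of the grammar as written; the helper name is_numeric_literal is hypothetical.

```python
import re

# numeric_literal ::= ['+'|'-'] digit+ ['.' digit+]
_NUMERIC = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def is_numeric_literal(text):
    """Return True if text matches the numeric_literal production."""
    return _NUMERIC.fullmatch(text) is not None
```

Under this reading, forms such as .5 or 1. are not numeric literals, because the production requires digits on both sides of the decimal point.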
Expressions
Expressions consist of unnamed arguments to which operators are applied. Parentheses can be used to override the default precedence of operators in expressions.
expression ::= unnamed_argument operator unnamed_argument |
               '(' expression ')'
Example:
1+2*3-4
Operators
The native Karajan syntax supports basic arithmetic, logic operators, and an assignment operator:
operator ::= '*' | '/' | '%' | '+' | '-' | '<=' | '>=' | '<' | '>' |
             '==' | '!=' | '&' | '|' | ':='
The following list enumerates the operators in order of precedence, starting with the highest precedence. While the XML syntax does not support the use of operators, each operator has an equivalent element which can be used in both syntaxes (shown in parentheses).
- Multiplicative operators
  - * Multiplication
  - / Division
  - % Remainder
- Additive operators
  - +(sum) Addition
  - -(subtraction) Subtraction
- High priority relational operators
  - <=(lessOrEqual) Less or equal
  - >=(greaterOrEqual) Greater or equal
  - <(lessThan) Strictly less
  - >(greaterThan) Strictly greater
- Low priority relational operators
  - == Equal
  - != Not equal
- Multiplicative logic operators
  - &(and) Logical AND
- Additive logic operators
  - |(or) Logical OR
- Assignment operator
  - :=(set) Variable assignment
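To see how a precedence ordering of this kind drives evaluation, the following Python sketch evaluates expressions by precedence climbing. It is not Karajan's parser. It assumes that the multiplicative operators bind tightest (as in conventional arithmetic and as the order of the operator production suggests), that all binary operators are left-associative, and that operands are non-negative integers. The assignment operator is omitted, and all names are hypothetical.

```python
import operator
import re

# Binding strength per operator; a larger number binds tighter.
PRECEDENCE = {
    "*": 6, "/": 6, "%": 6,
    "+": 5, "-": 5,
    "<=": 4, ">=": 4, "<": 4, ">": 4,
    "==": 3, "!=": 3,
    "&": 2,
    "|": 1,
}
APPLY = {
    "*": operator.mul, "/": operator.truediv, "%": operator.mod,
    "+": operator.add, "-": operator.sub,
    "<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt,
    "==": operator.eq, "!=": operator.ne,
    "&": lambda a, b: bool(a) and bool(b),
    "|": lambda a, b: bool(a) or bool(b),
}
TOKEN = re.compile(r"\s*(\d+|<=|>=|==|!=|[-+*/%<>&|()])")


def tokenize(text):
    """Split text into integer, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _operand(tokens, pos):
    """Parse an integer or a parenthesized expression starting at pos."""
    if pos >= len(tokens):
        raise ValueError("operand expected")
    token = tokens[pos]
    if token == "(":
        value, pos = _expression(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return value, pos + 1
    if not token.isdigit():
        raise ValueError("operand expected, found " + token)
    return int(token), pos + 1


def _expression(tokens, pos, min_prec):
    """Consume operators whose precedence is at least min_prec."""
    left, pos = _operand(tokens, pos)
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # The right operand may only contain tighter-binding operators,
        # which makes operators of equal precedence left-associative.
        right, pos = _expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = APPLY[op](left, right)
    return left, pos


def evaluate(text):
    """Evaluate an expression string and return its value."""
    tokens = tokenize(text)
    value, pos = _expression(tokens, 0, 1)
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return value
```

Under these assumptions, evaluate("1+2*3-4") multiplies first and yields 3, while evaluate("(1+2)*3") yields 9.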
Quoted Lists
A quoted list is a special element that produces a list of identifiers. What is specific about a quoted list is that if its arguments are variables, the variables will not be evaluated. Instead, their identifiers will be added to the list. Quoted lists are a convenience syntax for expressing a list of formal arguments:
quoted_list ::= '[' arguments ']'
However, quoted lists are not limited to expressing lists of arguments. They can also be used to express lists of values. The only thing to remember is that variable evaluation will not take place for immediate arguments of a quoted list.
Example:
list("A quoted list follows", [a, b, c])
Programs
A Karajan program is a list of arguments:
program ::= arguments
There exists an implicit root element that sits at the top of the element tree, and implements certain system functions.
Comments
And finally, Karajan uses C-style comments. Single-line comments begin with two forward slashes and end at the following new line character, while multi-line comments are delimited by ’/*’ and ’*/’:
//This is a comment
print("This is not a comment")
/*This is
also a
comment
*/
The XML Syntax
Karajan also supports XML as its syntax. In the XML syntax, each XML element corresponds to a Karajan element. Arguments can be expressed either through XML attributes or nested elements.
Particularities of Using XML
One of the particular aspects of using XML with Karajan is that when using XML attributes for arguments, it is impossible to make a syntactic distinction between a numeric value and its string representation. In general, Karajan will try to use the context to figure out which one is desired, but there are instances when it is impossible to do so. Therefore, when using the XML syntax, two elements can be used to differentiate between numeric and string values: number and string.
<list> <number>1</number> <string>1</string> </list>
The equivalent Karajan construct would be:
list(1, "1")
Karajan can load and interpret arbitrary XML files, provided that definitions exist for the XML elements present in the file, but XML mixed content is not handled properly. Unfortunately, it is impossible to handle XML mixed content in a generic way. For example, without knowledge of the implementation of an element, it cannot be known whether whitespace between two XML elements is to be interpreted as content. Since Karajan is a dynamic language, the implementation of an element is not known statically, at the time the parsing takes place. Therefore, the following rule was adopted: an element treats its text as content if and only if it has no nested elements. If nested elements exist, textual content is ignored.
If processed, textual content will be mapped as a string argument. A consequence of the above rule is that textual content and multiple arguments are mutually exclusive.
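The rule for textual content can be illustrated with Python's standard xml.etree.ElementTree module. This is a sketch of the rule as stated, not of Karajan's XML loader, and the function name content_of is hypothetical.

```python
import xml.etree.ElementTree as ET


def content_of(element):
    """Apply the rule: text counts only when there are no nested elements."""
    children = list(element)
    if children:
        # Nested elements exist, so any text around them is ignored.
        return ("elements", [child.tag for child in children])
    # No nested elements: the text becomes a single string argument.
    return ("text", element.text or "")
```

For instance, content_of(ET.fromstring("<print>Hello</print>")) reports the text Hello, whereas any stray text between the children of a list element is dropped.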
Lastly, a well-formed XML document must always have a root element. While in the native Karajan syntax, the root element is implicit, in XML the project or karajan elements can be used as root elements.
Parameters and Return Values
An element can accept any number of arguments and can generate any number of return values, and that includes an infinite number of arguments and/or return values (at least in theory).
Parameters and Arguments
Arguments are divided into two major types: single value arguments and channels. As their name implies, single value arguments can have only one value. By contrast, channels can be used for any number of values.
Single Value Arguments
Single value arguments can be specified using the named argument form. For example, the print element has a message argument. Thus, passing a string as the message argument to the print element can be done in the following way:
print(message = "Some string")
Or in XML:
<print> <argument name = "message" value = "Some string"/> </print>
Single value arguments can be further divided into mandatory and optional arguments.
Channels
Channels can be used to pass multiple arguments to an element. Each channel has a name, except for the default channel. The default channel is similar to the notion of variable arguments in C. Passing arguments on the default channel is done implicitly when arguments are not passed as single value arguments:
list(1, "value")
<list> <number>1</number> <string>value</string> </list>
In the above case, 1 and "value" are both passed to the list element on the default channel.
Elements define whether or not they receive arguments on a specific channel. An element that does not process arguments on a channel will automatically return all values received on that channel. It is therefore possible to use named channels to return values to elements other than the immediate parent. Assuming that foo is an element that does not take any arguments on any channels, the following will produce the same result:
list(1, 2, 3)
list(foo(1, 2, 3))
<list> <number>1</number> <number>2</number> <number>3</number> </list>
<list> <foo> <number>1</number> <number>2</number> <number>3</number> </foo> </list>
Since foo does not process any arguments, all the arguments it receives on the default channel will be returned to the parent element.
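The pass-through behavior can be modeled in a few lines of Python. This is a toy model of the default channel only. The functions foo, make_list, and call are hypothetical stand-ins, and each element is represented as a function that returns the list of values it sends to its parent.

```python
def foo(*default_channel):
    """An element that processes nothing: every value received is returned."""
    return list(default_channel)


def make_list(*default_channel):
    """Consume every value on the default channel and return one list value."""
    return [list(default_channel)]


def call(element, *argument_results):
    """Merge the values returned by the arguments into the element's
    default channel, then run the element."""
    flat = [value for result in argument_results for value in result]
    return element(*flat)
```

In this model, call(make_list, [1], [2], [3]) and call(make_list, call(foo, [1], [2], [3])) return the same value, mirroring the equivalence of list(1, 2, 3) and list(foo(1, 2, 3)).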
Argument Mapping
It is not always convenient to use the named argument form to pass arguments to an element. Elements in Karajan will automatically map arguments received on the default channel to single value arguments. The mapping is done dynamically, in the order arguments are received. Suppose there is an element foo that takes three arguments, namely one, two and three. The following would then be equivalent:
foo(one = 1, two = 2, three = 3)
foo(one = 1, two = 2, 3)
foo(one = 1, 2, 3)
foo(1, 2, 3)
foo(1, 2, three = 3)
...
<foo one = "1" two = "2" three = "3"/>
<foo one = "1" two = "2"> <number>3</number> </foo>
<foo one = "1"> <number>2</number> <number>3</number> </foo>
<foo> <number>1</number> <number>2</number> <number>3</number> </foo>
<foo> <number>1</number> <number>2</number> <argument name="three" value="3"/> </foo>
...
Optional Arguments
Automatic mapping of default channel arguments to single value arguments has an unfortunate side effect: an element that accepts both single value arguments and arguments on the default channel (variable arguments) cannot prevent variable arguments from being mapped to its single value arguments. Optional arguments introduce different semantics to address this. Optional arguments do not need to be specified. However, if specified, the named form must always be used. An example is the print element, which has an optional argument named nl. It can be set to false to indicate that no new-line character should be appended at the end of the message argument:
false())
or
false())
The following however, is not valid:
false())
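The interaction between automatic mapping and optional arguments can be sketched in Python. This is a model under stated assumptions, not Karajan's implementation: the helper bind is hypothetical, named arguments are assumed to be bound before default channel values are mapped, and the element is assumed to reject leftover default channel values unless accepts_varargs is set.

```python
def bind(mandatory, optional, positional, named, accepts_varargs=False):
    """Bind arguments to parameters (hypothetical helper, not Karajan code).

    Named arguments are bound first. Values from the default channel then
    fill the remaining mandatory parameters in declaration order. Optional
    parameters are never filled from the default channel.
    """
    for name in named:
        if name not in mandatory and name not in optional:
            raise TypeError("unknown argument: " + name)
    bound = dict(named)
    values = list(positional)
    for name in mandatory:
        if name not in bound:
            if not values:
                raise TypeError("missing argument: " + name)
            bound[name] = values.pop(0)
    if values and not accepts_varargs:
        raise TypeError("unexpected default channel values: %r" % values)
    return bound, values
```

In this model, bind(["message"], ["nl"], ["Message"], {"nl": False}) succeeds, while bind(["message"], ["nl"], ["Message", False], {}) raises an error, because the second value cannot be mapped to the optional nl parameter.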
Return values
Return values are a mirror image of the arguments concept. Whatever can be accepted as an argument by an element can also be returned by another. Thus, it is possible to define a single element that returns all arguments to any given element. The following example defines an element that returns both a message and the named form of the nl argument, suitable for the print element:
//The following defines an element foo() which takes no
//arguments and returns "Message" and nl = false()
element(foo, []
  "Message", nl = false()
)
<element name="foo" arguments=""> <string>Message</string> <argument name="nl"> <false/> </argument> </element>
Argument Evaluation Order
There is no imposed order for evaluating arguments. The order is controlled by each element. Most elements, by default, evaluate their arguments in sequential order. However, it is very easy to override the default order by using elements that use a different execution order. For example, the parallel element evaluates all of its arguments in parallel, returning all the resulting values. Evaluating the arguments to an element in parallel then becomes as easy as surrounding them with a parallel element:
list(
  parallel(
    "Value1"
    "Value2"
  )
)
<list> <parallel> <string>Value1</string> <string>Value2</string> </parallel> </list>
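The idea behind parallel can be approximated in Python with a thread pool. This is an analogy rather than a description of how Karajan schedules work: the function parallel below is hypothetical, arguments are modeled as zero-argument callables, and results are returned in argument order, which is an assumption of this sketch rather than something the text guarantees.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel(*thunks):
    """Evaluate zero-argument callables concurrently and return all values."""
    if not thunks:
        return []
    with ThreadPoolExecutor(max_workers=len(thunks)) as pool:
        futures = [pool.submit(thunk) for thunk in thunks]
        # Collect in submission order; each result() blocks until ready.
        return [future.result() for future in futures]
```

A call such as parallel(lambda: "Value1", lambda: "Value2") evaluates both arguments concurrently and hands both values to the caller.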
Furthermore, the way in which an element processes the arguments is also left to each element. For example, an element can choose to start executing after the evaluation of all arguments has been completed, or process arguments as they arrive. In other words, and in the most general case, arguments are both generated and processed asynchronously.
A concrete example is the print element, which simply returns the message argument on the stdout channel. When Karajan starts execution, an implicit root element is created that receives arguments on the stdout channel and prints them to the console. Since the processing is done asynchronously, print appears to do the actual work when it is executed. The advantage of such a mechanism is that, provided that an element does not produce any side effects, its execution becomes equivalent to the totality of values returned (both single values and channels).
Variables and Scope
We will not insult the reader’s intelligence by explaining what variables are. There is no explicit declaration of variables in Karajan. A variable is defined when it is assigned the first time.
The scope of a variable extends to the element that it was defined in, and is pseudo-lexical. By pseudo-lexical it is meant that internally, the scoping is dynamic, but provisions are made to make it impossible to access variables outside the lexical scope. Therefore, Karajan does not support closures.
On a lexical level, it is possible to read the value of a variable defined in a parent element, but setting the value of the same variable will create a new scope. In other words, Karajan uses deep access and shallow binding. The following example should make things clearer:
...
v := 1 //v is 1 on stack frame n
list( //A new stack frame is created: n+1
  v //v refers to the variable on frame n
  v := 2 //A new binding is made for v on frame n+1
         //the new binding shadows the one from frame n
  v //v now refers to the binding on frame n+1
) //The returned list contains the values 1 and 2
print(v)
...
...
<set name="v" value="1"/>
<list>
<variable>v</variable>
<set name="v" value="2"/>
<variable>v</variable>
</list>
<print> <variable>v</variable> </print>
...
In the above example, it would be impossible for the definition of list to access variable v, since the body of the definition of list does not fall within the lexical scope of the definition of v.
The reason for this kind of scoping is to reduce the ambiguity that could be introduced by not knowing the order in which child elements are executed. Please note that such ambiguity is not completely eliminated. The order in which the arguments to list are evaluated does matter, and can change the resulting list. However, the scope of the ambiguity is the same as the scope of the ambiguity in the order of evaluation of the elements. If set, list, and print are evaluated in sequence, no change in the way list evaluates its arguments can change the outcome of the execution of print(v).
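The read-through, write-local behavior described above resembles Python's collections.ChainMap, where lookups search the parent mappings but assignments always go to the innermost one. The sketch below replays the example with that class. It is an analogy for the scoping rule, not a description of Karajan's stack frames, and the variable names are hypothetical.

```python
from collections import ChainMap

outer = ChainMap({"v": 1})     # frame n: v := 1
inner = outer.new_child()      # list( ... ) opens frame n+1
returned = [inner["v"]]        # the read falls through to frame n
inner["v"] = 2                 # the write binds v on frame n+1 only
returned.append(inner["v"])    # the new binding shadows the one on frame n
after = outer["v"]             # frame n is untouched once the list returns
```

Here returned holds the values 1 and 2, and after is still 1, which matches the outcome of print(v) when the elements are evaluated in sequence.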
Global Variables
Global variables are provided for conveniently defining settings that have a global scope. Please note that in the future, global variables will be single-assignment, thus approaching more the notion of constants.
global(foo, "Foo")
element(boo, []
Variable Expansion
Karajan offers convenient variable expansion constructs. All pairs of curly brackets inside strings are replaced by the value of the variable with the name of the identifier inside the brackets. If no such variable exists, the element trying to access the string will fail. If the ’{’ literal is needed inside a string, it must be used twice. There is no need to escape the closing curly bracket, since it cannot be part of an identifier. If a closing bracket is part of a variable expansion expression, it will mark its end. If not, it will be interpreted as the closing curly bracket literal:
a := 1
<set name="a" value="1"/>
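The expansion rules can be made concrete with a small Python sketch. It is an illustration of the rules as stated, not Karajan's implementation. The function name expand is hypothetical, variables are passed in as a dictionary, and the handling of an opening bracket with no closing bracket (an error) is an assumption, since the text does not cover that case.

```python
def expand(template, variables):
    """Replace {name} with the value of name; '{{' stands for a literal '{'."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "{":
            # Ordinary character, including a '}' outside any expansion.
            out.append(ch)
            i += 1
        elif template.startswith("{{", i):
            out.append("{")
            i += 2
        else:
            end = template.find("}", i)
            if end == -1:
                raise ValueError("unterminated expansion")  # assumed behavior
            name = template[i + 1:end]
            if name not in variables:
                # A missing variable makes the access fail.
                raise KeyError(name)
            out.append(str(variables[name]))
            i = end + 1
    return "".join(out)
```

With a set to 1, expand("a is {a}", {"a": 1}) produces "a is 1", and expand("{{a}", {"a": 1}) produces the literal text "{a}".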
Futures
Futures are a mechanism for binding a variable to the result of a future computation. Until the computation to which the future is bound produces its value, the future exists in an unbound state. Any attempt to use the value of an unbound future will cause the thread that tried to access the future to block until the future becomes bound. Errors occurring in the thread evaluating the future before the future is bound will be reported when the future is accessed. Errors occurring in that thread after the future is bound will not be visible.
In Karajan there are two types of futures:
- single value futures are used to hold a single value of a future computation. They are defined using the future element.
- future iterators can be used to hold multiple values. However, not all the values need to be generated before the future iterator can be used. Iterating over a future iterator will cause the iteration to use as many values as are available, then block waiting for more values to be added to the future iterator. Future iterators are defined using futureIterator.
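Both kinds of futures have close analogues in Python's standard library, which makes the blocking and error-reporting behavior easy to try out. The sketch below is an analogy, not Karajan code: single value futures are modeled with concurrent.futures, and the future_iterator helper, built on a queue, is a hypothetical construction for this example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of a future iterator


def future_iterator(producer):
    """Yield the values of producer() as a background thread generates them."""
    channel = queue.Queue()

    def run():
        try:
            for item in producer():
                channel.put(item)
        finally:
            channel.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = channel.get()  # blocks until another value is available
        if item is _DONE:
            return
        yield item


def fail():
    raise RuntimeError("boom")


def demo():
    """Return the value of one future and the error reported by another."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        answer = pool.submit(lambda: 6 * 7)  # single value future
        broken = pool.submit(fail)
        value = answer.result()              # blocks until the future is bound
        try:
            broken.result()                  # the error surfaces on access
            reported = None
        except RuntimeError as error:
            reported = str(error)
    return value, reported
```

Calling demo() shows an error raised during evaluation being reported only when the future is accessed, and iterating over future_iterator(...) consumes values as they become available.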
Source Files
As mentioned earlier, Karajan understands two syntaxes: the native Karajan syntax and the XML syntax. The distinction between them is made using the file extension. A file with the “.k” extension will be parsed using the native parser, while a file with the “.xml” extension will be parsed using an XML parser. [1]
Libraries
Libraries are collections of elements grouped by the functionality they provide. A library is defined in a source file. Its functionality can be reused in other source files by using the import (equivalent to include) element:
import("sys.k")
It is possible to include XML libraries from native Karajan files. It is also possible to include native Karajan libraries from XML Karajan files. Consequently, the following are valid:
import("task.xml")
<import file="task.k"/>
Namespaces
Namespaces provide a way of distinguishing between elements with conflicting names in different libraries. Suppose a library “a” defines an element named foo, and a library “b” also defines an element named foo. Also, suppose that both libraries are included in a certain file. Namespaces make it possible to access both instances of the foo definition, without ambiguity, by prefixing the name with the namespace prefix in which the element was defined: a:foo and b:foo. Any reference to foo without a prefix will result in an error. Nonetheless, if only one of the libraries is used, the use of foo without a prefix will be allowed. Namespaces are defined using namespace.
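The resolution rule for prefixed and unprefixed names can be sketched in Python. This is a model of the rule as described, not Karajan's lookup code. The function resolve is hypothetical, and libraries are represented as a mapping from a namespace prefix to the set of element names defined under it.

```python
def resolve(name, libraries):
    """Return (prefix, element) for name, or raise LookupError."""
    if ":" in name:
        prefix, local = name.split(":", 1)
        if local in libraries.get(prefix, ()):
            return (prefix, local)
        raise LookupError("undefined element: " + name)
    owners = [prefix for prefix, names in libraries.items() if name in names]
    if len(owners) == 1:
        # Only one library defines the name, so no prefix is needed.
        return (owners[0], name)
    if not owners:
        raise LookupError("undefined element: " + name)
    raise LookupError("ambiguous element: " + name)
```

With libraries a and b both defining foo, resolve("a:foo", ...) and resolve("b:foo", ...) succeed, while resolve("foo", ...) fails as ambiguous.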
