Java CoG Kit


Keywords: Abstractions, Workflow, jglobus, Grid, SSH, XML, Web Services

Fig 1: Integrated Approach

Grid users and developers desire an integrated but modular system that allows them

  • (a) to program the Grid in familiar higher level frameworks that permit rapid prototyping,
  • (b) to have a framework in which workflows on the Grid can easily be expressed,
  • (c) to have a framework that supports monitoring the state through visual components, and
  • (d) to have a system that is easy to maintain and deploy.

As depicted in Fig 1, the Java CoG Kit integrates a variety of concepts to address these requirements. End users can access the Grid through standalone applications, a desktop, or a portal. Command line tools allow users to define workflow scripts easily. Programming is achieved through services, abstractions, APIs, and workflows. Additionally, we integrate commodity tools, protocols, approaches, and methodologies, while accessing the Grid through commodity technologies and Grid toolkits. Through this integrated approach the Java CoG Kit provides significant enhancements to the Globus Toolkit. Today, the Java CoG Kit is distributed in part with the Globus Toolkit version 3 and version 4. However, many of the features reported here are only available through the software found on our specialized download pages.

Abstractions and Providers

Fig 2: Layered Architecture

The architecture of the Java CoG Kit is based on a layered module concept that eases maintenance and bridges the gap between applications and the Grid middleware (Fig 2). It allows the easy integration of enhancements developed by the community. One of the strengths of the toolkit is its abstraction and provider model.

We have identified a number of useful basic and advanced abstractions that help in the development of Grid applications. These abstractions include job executions, file transfers, workflow abstractions, and job queues and can be used by higher level abstractions for rapid prototyping. As the Java CoG Kit is extensible, users can include their own abstractions and enhance the functionality of the Java CoG Kit.

We introduced the concept of Grid providers, which allows different Grid middleware to be integrated into the Java CoG Kit. With the abstractions, the developer can choose at runtime which Grid middleware services receive tasks related to job submission and file transfer. This capability is enabled through customized dynamic class loading, which facilitates late binding against an existing production Grid.
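The provider idea described above can be illustrated with a minimal Java sketch. It is not the Java CoG Kit API: the interface `JobSubmissionProvider`, the classes `Gt4Provider`, `SshProvider`, and `ProviderRegistry`, and the provider names "gt4" and "ssh" are all hypothetical and exist only for this example. The sketch shows the general technique of late binding, where a provider class is resolved by name and instantiated only when it is requested at runtime.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction; the names are illustrative, not the Java CoG Kit API.
interface JobSubmissionProvider {
    String submit(String executable);
}

// Two stand-in providers representing different Grid middleware back ends.
class Gt4Provider implements JobSubmissionProvider {
    public String submit(String executable) {
        return "gt4:" + executable;
    }
}

class SshProvider implements JobSubmissionProvider {
    public String submit(String executable) {
        return "ssh:" + executable;
    }
}

class ProviderRegistry {
    // Maps a provider name chosen at runtime to an implementation class name.
    private final Map<String, String> classNames = new HashMap<>();

    void register(String name, String className) {
        classNames.put(name, className);
    }

    // Late binding: the class is looked up and instantiated only on request.
    JobSubmissionProvider load(String name) throws ReflectiveOperationException {
        String className = classNames.get(name);
        if (className == null) {
            throw new ClassNotFoundException("No provider registered for " + name);
        }
        Class<?> cls = Class.forName(className);
        return (JobSubmissionProvider) cls.getDeclaredConstructor().newInstance();
    }
}

class ProviderDemo {
    public static void main(String[] args) throws Exception {
        ProviderRegistry registry = new ProviderRegistry();
        // In practice the class names would come from a configuration file;
        // getName() is used here only to keep the sketch self-contained.
        registry.register("gt4", Gt4Provider.class.getName());
        registry.register("ssh", SshProvider.class.getName());

        // The middleware is selected at runtime, here from the command line.
        String choice = args.length > 0 ? args[0] : "ssh";
        JobSubmissionProvider provider = registry.load(choice);
        System.out.println(provider.submit("/bin/date"));
    }
}
```

The application code depends only on the interface, so adding another middleware in this sketch means registering one more class name rather than changing the caller.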

Java CoG Kit jGlobus Library

The Java CoG Kit jGlobus module provides the basic APIs to the Grid, allowing access to GridFTP servers and the classic GRAM services, and includes a complete implementation of GSI. It also includes the MyProxy client libraries. The jGlobus jar file, called cog-jglobus.jar, is distributed with GT3 and GT4. Both toolkits rely heavily on the classes and methods implemented in jGlobus.

Java CoG Kit Workflow

Workflows are an important part of enabling scientists to deal with their complex tasks. The Java CoG Kit has provided workflow concepts since its beginnings. At present we provide three possibilities for Grid users to define workflows. Which one you should use depends on your requirements. We will assist you in your decision; please contact gregor@mcs.anl.gov.

We provide

  • an XML language and engine for workflows, known under the code name Karajan
  • an abstraction for workflows as part of our API
  • a gridshell that allows users to define simple workflows in a shell-like language (prototype)

Karajan Workflow Framework

[Figures: Usepattern.png, Pattern.png, Karajan-simple.png, Hierarchy-graph.png]

Karajan is a workflow language and workflow engine. It aims to provide the scientific community with an easy-to-use tool to define complex jobs on computational Grids, while remaining scalable and offering advanced features such as failure handling, checkpointing, dynamic workflows, and distributed workflows. It features declarative concurrency, integration with Java CoG Kit 4 (providing access to all CoG supported services), an integrated scheduler, and extensibility.

Workflows in Karajan are defined in a structured language that is based on XML and extensible through Java. The building block of the language is the element, which loosely translates into an XML element/container. Various elements are included, such as elements for parallel processing, parallel iterators, and Grid elements (i.e., job submission and file transfer). Common tasks can be grouped by using templates and reused from multiple locations.

The execution engine in Karajan is based on an event model, which allows effective separation between the workflow specification and the runtime state. Elements react to events received from other elements and generate their own events. These events provide notification of status changes within the execution or can be used to control the execution of elements. The complete runtime state is contained within the events, which allows the elements themselves to exist on different resources. This mechanism also allows an external controller, which has access to these events, to completely control the execution of the workflow. It also allows a certain level of modification to the elements to be performed, at runtime, without affecting the execution of other elements.
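The event model described above can be illustrated with a small Java sketch. It is not Karajan's implementation: `WorkflowEvent`, `Element`, and `EventEngine`, along with the element names "compute", "transfer", and "controller", are hypothetical names chosen for this example. The sketch assumes a single-threaded queue for simplicity and shows the two properties the text emphasizes: elements keep no state of their own, and every event carries a complete snapshot of the runtime state, so an external observer that sees the events sees the whole execution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical event type: all runtime state travels inside the event.
final class WorkflowEvent {
    final String target;             // name of the element that should react
    final String type;               // for example "START" or "COMPLETED"
    final Map<String, String> state; // immutable snapshot of the runtime state

    WorkflowEvent(String target, String type, Map<String, String> state) {
        this.target = target;
        this.type = type;
        this.state = Collections.unmodifiableMap(new HashMap<>(state));
    }
}

// An element is stateless: it maps an incoming event to outgoing events.
interface Element {
    List<WorkflowEvent> react(WorkflowEvent event);
}

class EventEngine {
    private final Map<String, Element> elements = new HashMap<>();

    void add(String name, Element element) {
        elements.put(name, element);
    }

    // Delivers events until none are left and returns every event seen,
    // which is what an external controller would observe.
    List<WorkflowEvent> run(WorkflowEvent first) {
        List<WorkflowEvent> log = new ArrayList<>();
        Queue<WorkflowEvent> queue = new ArrayDeque<>();
        queue.add(first);
        while (!queue.isEmpty()) {
            WorkflowEvent event = queue.remove();
            log.add(event);
            Element element = elements.get(event.target);
            if (element != null) {
                queue.addAll(element.react(event));
            }
        }
        return log;
    }
}

class EventDemo {
    static EventEngine build() {
        EventEngine engine = new EventEngine();
        // The compute element adds its output to the state and starts the transfer.
        engine.add("compute", e -> {
            Map<String, String> s = new HashMap<>(e.state);
            s.put("output", "result.dat");
            return Collections.singletonList(new WorkflowEvent("transfer", "START", s));
        });
        // The transfer element reads the output from the event, not from a field.
        engine.add("transfer", e -> {
            Map<String, String> s = new HashMap<>(e.state);
            s.put("transferred", e.state.get("output"));
            return Collections.singletonList(new WorkflowEvent("controller", "COMPLETED", s));
        });
        return engine;
    }

    public static void main(String[] args) {
        WorkflowEvent start = new WorkflowEvent("compute", "START", new HashMap<>());
        for (WorkflowEvent e : build().run(start)) {
            System.out.println(e.target + " " + e.type + " " + e.state);
        }
    }
}
```

Because the elements in this sketch hold no fields, the same element could in principle be invoked on a different resource as long as the event reaches it, which mirrors the separation between specification and runtime state described in the text.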

As an example, suppose a large job requires a transfer of the resulting data after all calculations have completed. Also suppose the specification of the transfer points to a non-existing resource as the destination for the data. The transfer will fail. A tool can be used to intercept the failure notification and present the user with a visual message. The user can then correct the faulty specification, after which the failing element can be restarted by using the state present in the failure event.
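The failure scenario above can be sketched in Java as follows. This is an illustration under stated assumptions, not Karajan code: `FailureEvent`, `TransferFailedException`, `TransferElement`, and `RestartingRunner` are hypothetical names, the host names are placeholders, and the "user" is modeled as a function that repairs the state. The point it shows is that the failure event carries enough state to restart only the failing element with a corrected specification.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical failure event: carries everything needed to retry the element.
final class FailureEvent {
    final String element;
    final String reason;
    final Map<String, String> state;

    FailureEvent(String element, String reason, Map<String, String> state) {
        this.element = element;
        this.reason = reason;
        this.state = Collections.unmodifiableMap(new HashMap<>(state));
    }
}

class TransferFailedException extends Exception {
    final FailureEvent event;

    TransferFailedException(FailureEvent event) {
        super(event.reason);
        this.event = event;
    }
}

// Stand-in for a file transfer element; it only checks the destination name.
class TransferElement {
    private final Set<String> reachableHosts;

    TransferElement(Set<String> reachableHosts) {
        this.reachableHosts = new HashSet<>(reachableHosts);
    }

    String execute(Map<String, String> state) throws TransferFailedException {
        String destination = state.get("destination");
        if (!reachableHosts.contains(destination)) {
            throw new TransferFailedException(
                new FailureEvent("transfer", "unknown destination: " + destination, state));
        }
        return state.get("file") + " -> " + destination;
    }
}

class RestartingRunner {
    // Runs the element; on failure, the repair function (standing in for the
    // user and a visual tool) fixes the state taken from the failure event,
    // and the element is restarted once with the corrected state.
    static String run(TransferElement element, Map<String, String> state,
                      UnaryOperator<Map<String, String>> repair)
            throws TransferFailedException {
        try {
            return element.execute(state);
        } catch (TransferFailedException failure) {
            Map<String, String> fixed = repair.apply(new HashMap<>(failure.event.state));
            return element.execute(fixed);
        }
    }
}

class RestartDemo {
    public static void main(String[] args) throws Exception {
        TransferElement transfer =
            new TransferElement(Collections.singleton("storage.example.org"));
        Map<String, String> state = new HashMap<>();
        state.put("file", "result.dat");
        state.put("destination", "nowhere.example.org"); // faulty specification

        String result = RestartingRunner.run(transfer, state, s -> {
            s.put("destination", "storage.example.org"); // the user's correction
            return s;
        });
        System.out.println(result);
    }
}
```

Only the transfer is retried in this sketch; the calculations that produced the data are not repeated, because their result is already part of the state in the failure event.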


Grid Portals

We are partners in the Open Grid Computing Environments Project (http://www.ogce.org), which develops tools and components to support the development of Grid portals based on the JSR168 standard. The Java CoG Kit is automatically included in that release.

Java CoG Kit Desktop (prototype)

Fig 2: CoG Desktop

Current Grid middleware toolkits expose their functionality through services, programming models, and command line tools, requiring much technical knowledge of the Grid backend and middleware systems. Although Grid portals hide many of these complexities and give users easy access to Grids, they fall short of integrating with native environments while maintaining the uniform graphical user interface that users are accustomed to from systems such as KDE, Gnome, or MS Windows. The Java CoG Kit Grid Desktop attempts to provide less technical user communities with a user-centric workspace that enhances the normal operating system desktop paradigm by interlacing Grid concepts. The user interface is based on popular desktop patterns such as drag-and-drop.

[Movie (AVI)]

Experiment Management (prototype)

Fig 3: Experiment Directory Structure

This tool introduces a framework for experiment management that simplifies the user's interaction with Grid environments. We have developed a service that allows the individual scientist to manage a large number of tasks, as typically found in experiment management. Our service can issue application state notifications: similar to the definition of standard output and standard error, we have defined standard status, which carries these notifications. We have tested our tool with a large number of long-running experiments and shown its usability in practical applications such as bioinformatics.

  • Documentation: [las05exp]
  • Source code: SVN; the old code is still available in the CVS

Java CoG Kit Ad-hoc Grid Framework (prototype)

Fig: Ad-hoc discovery

The Java CoG Kit Ad-hoc Grid Framework provides the elements required for an ad hoc, spontaneous Grid collaboration between multiple parties. The key features of this framework include on-the-fly Grid collaborations, multiple group creation and membership, service provisioning in different technologies, distributed service publishing and discovery, abstraction-based service invocation based on the Java CoG Kit, support for ClassAds-based service-task matching, support for service reservation and service agreements, and a policy-based authorization framework.


Java CoG Kit Qstat component

Fig: Qstat GUI

This component enables users to use qstat and qsub from their client machines, providing a GUI for monitoring the queue state as well as submitting jobs. At this time we support two queueing systems: Cobalt, used by ANL Blue Gene, and PBS, used by UC/ANL TeraGrid. The component also contains XML and HTML output functions that can be used to build portals that include the information returned by qstat. If you would like more information about the component or the source, please contact gregor@mcs.anl.gov.

The component can easily be installed via Java Webstart by following the install link.

Documentation

Documentation is available at Java CoG Kit Documentation. More guides are under development. Please explore them and e-mail us your suggestions for improvement. If you would like to contribute a guide yourself, please contact gregor@mcs.anl.gov.

A number of references can be found in Gregor's resume (HTML). The links in the resume lead to downloadable versions.

Selected Applications

  • Active Thermochemical Tables: The general concept of Active Tables (AT) is introduced and elaborated using Active Thermochemical Tables (ATcT) as a concrete example. ATcT are a new paradigm that catapults thermochemistry from the traditional sequential approach into the digital computing arena of the 21st century. The CoG Kit is used to manage the calculations of ATcT and to expose the ATcT as a service to the community.
  • Threat management in drinking water distribution systems involves real-time characterization of any contaminant source and plume, design of control strategies, and design of incremental data sampling schedules. This requires dynamic integration of time-varying measurements along with analytical modules that include simulation models, adaptive sampling procedures, and optimization methods. These modules are compute-intensive, requiring multi-level parallel processing via computer clusters. Since real-time responses are critical, the computational needs must also be adaptively matched with available resources. This requires a software system to facilitate this integration via a high-performance computing architecture such that the measurement system, the analytical modules, and the computing resources can mutually adapt and steer each other. The CoG Kit workflow helps coordinate the simulations necessary to calculate a response through an adaptive cyberinfrastructure system facilitated by a dynamic workflow design [las06water].


Download

The Java CoG Kit 4 can be obtained by following the download link.
