Java CoG Kit Component Repository


Gregor von Laszewski and Deepti Kodeboyina



CoG Kit Workflow Component Repository

The Workflow component repository is a service used to store, retrieve, and search for components that can be dynamically integrated into a Java CoG Kit workflow. The repository service promotes reuse of components that can be maintained either by an individual researcher or by a shared community of peers with similar interests.

Although the design of the component repository is independent of a particular storage technology such as a file system or a database, our initial implementation contains a provider that uses the IBM Cloudscape/Apache Derby database. We provide deployment examples for using it either as an embedded standalone repository or as a shared community repository.



Setting up and using your own personal workflow repository

1. Download the Java CoG Kit. Java and Ant must be installed on your local machine.

2. Run ant dist in the modules/repository directory of the CoG Kit.

3. Set the location of your repository in the repository.properties file in ./etc. The attribute that needs to be changed is derby.repository.dir.

4. Using Karajan, run the createRepository.xml script from the examples directory. This creates a local repository at the location specified in your repository.properties file.

5. The README in the repository module contains a description of the properties.

6. Change to the directory modules/repository/dist/repository-1.0/. Its ./bin directory contains the launchers for cog-workflow and cog-repository, and its ./examples directory contains examples that can be run with cog-workflow (Karajan) as well as the component files that are stored in the repository.

Note: The existing Karajan components currently have no metadata associated with them; for testing purposes, only the components for the repository have been enhanced with metadata. The component.xsd file is the schema for the components. The repository also allows metadata attributes beyond those described in the schema, but these can be added to the repository only after the internal database schema for attributes has been changed using the appropriate API. The repository Javadoc is helpful for this purpose.

7. You may run any of the examples in the examples directory (such as searchComponent.xml and retrieveMetadata) to add, view, or delete components that have been preloaded into your repository. You can also add new Karajan components this way. Workflows that use the repository.xml library in this manner may be included as part of a larger, more complex workflow. If the repository needs to be accessed programmatically, the Java API may also be used in your programs.

8. The supplied cog launchers are in modules/repository/dist/repository-1.0/bin.
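Step 3 depends on the derby.repository.dir entry in repository.properties. The following is a minimal sketch of how a Java program could read that setting with java.util.Properties. It assumes that ./etc is relative to the distribution directory (for example modules/repository/dist/repository-1.0); the class and method names are hypothetical and not part of the CoG Kit API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the CoG Kit API.
public class RepositoryConfig {

    // Property named in step 3 of the setup guide.
    static final String REPOSITORY_DIR_KEY = "derby.repository.dir";

    // Reads key=value pairs from any Reader (a file, or a string in tests).
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads etc/repository.properties below the given distribution directory (assumed layout).
    static Properties loadFromDist(Path distDir) {
        Path file = distDir.resolve("etc").resolve("repository.properties");
        try (Reader reader = Files.newBufferedReader(file)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the configured repository directory, failing fast if it is unset.
    static String repositoryDir(Properties props) {
        String dir = props.getProperty(REPOSITORY_DIR_KEY);
        if (dir == null || dir.trim().isEmpty()) {
            throw new IllegalStateException(REPOSITORY_DIR_KEY + " is not set");
        }
        return dir.trim();
    }

    public static void main(String[] args) {
        // With an argument, read the real file; otherwise use an inline sample.
        Properties props = args.length > 0
                ? loadFromDist(Paths.get(args[0]))
                : load(new StringReader("derby.repository.dir=/home/gregor/repository\n"));
        System.out.println(repositoryDir(props));
    }
}
```

Failing early when the key is missing makes a misconfigured repository.properties visible before any database connection is attempted.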

Repository Code and documentation

The component repository is contained in the module: TBD
The link to the source code is: TBD
The repository launcher description is available at: Repository Launcher

The current source code has been moved to our Subversion repository on SourceForge, but the previous version can still be found in CVS.

Although you will not need any information about the database if you use the Java CoG Kit, additional information about Cloudscape/Derby can be obtained at the IBM Web site or in an article at IBM developerWorks.

Repository Usage Details

Setting up the Repository

The repository can be set up using the following files.

repository.properties: Contains the repository location and various other database attributes that can be changed by the user. The file should contain the host, port, name, and password.

build.xml: An Ant script that contains targets to

  • install the Karajan repository provided as part of the distribution on your local machine
  • set classpaths (still to be confirmed)
  • create the default database
  • retrieve the database metadata
  • possibly start and stop a server and run other commands that will also be part of the command line tools (still to be decided)

command line tools: Tools to perform various operations with the repository, listed below:

  • cog-repository-admin
  • cog-repository

cog-repository-admin

This command performs the administrative functions listed below. cog-repository-admin has the following options.

- type <embedded or server> indicates the repository type

- install <database location> copies the database from ~/module/database to a user-specified directory and sets classpaths. This step applies only to the workflow repository.

- host <hostname or ip> required for all server related calls

- portnumber <port> required for all server related calls

- username <username> required only if user authentication is enabled

- passwd <passwd> required only if user authentication is enabled

- start starts the repository network server at the assigned port on localhost, or on the host name provided

- stop stops the repository network server

- backup <backup location> backs up the database to the given location

- recovery <backup location> reverts to the last backed-up data at the given location

- ping tests if the server is up and running

- logging <true/false> enables or disables logging [to check: whether this can be changed while the server is running]

- add-user <username, password, database-wide/system-wide> adds a user either to a specific database or to the whole system, as specified by the final option

- remove-user <username> removes the specified user

- authentication <true/false> enables or disables user authentication

- properties <properties file> location of a properties file that overrides the default properties file
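As an illustration of how a launcher might interpret the options above, here is a minimal parsing sketch. It assumes options are written as -name followed by a value, that start, stop, and ping are bare flags, and that multi-part values (as for add-user) are passed as one comma-separated argument; these conventions and the AdminOptions class are hypothetical, not taken from the actual cog-repository-admin implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical option parser; not the real cog-repository-admin code.
public class AdminOptions {

    // Options from the list above that take no value (an assumption of this sketch).
    static final Set<String> FLAGS = new HashSet<>(Arrays.asList("start", "stop", "ping"));

    // Parses "-name value" pairs and bare flags into an ordered map.
    static Map<String, String> parse(String... args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("unexpected argument: " + args[i]);
            }
            String name = args[i].substring(1);
            if (FLAGS.contains(name)) {
                options.put(name, "true");
            } else if (i + 1 < args.length) {
                options.put(name, args[++i]);
            } else {
                throw new IllegalArgumentException("missing value for -" + name);
            }
        }
        // host and portnumber are required for all server-related calls.
        if ("server".equals(options.get("type"))) {
            for (String required : new String[] {"host", "portnumber"}) {
                if (!options.containsKey(required)) {
                    throw new IllegalArgumentException("-" + required + " is required for -type server");
                }
            }
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(parse(args));
    }
}
```

Validating the server-related options at parse time reports a missing host or port before any attempt to contact the network server.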

cog-repository

General commands issued to the repository. A wrapper that opens and closes connections around these calls still needs to be written; its design will be decided once the code for the calls is available.

- type <embedded/server> repository type

- repositorylocation <hostname:port or local filesystem location> provides the location for the derby system

- username <username> required if user authentication is enabled [set read-only access mode for the user]

- password <password> required if user authentication is enabled

- getschema gets the database metadata

- add <filename> adds a component to the repository

- remove <componentname> removes a component from the repository

- list lists all the components present in the repository

- search <regex> searches and returns a list of components that match the expression

- save <componentName, fileName> saves the given component to a file

- get <componentName>
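The search command returns the components whose names match a regular expression. The sketch below shows that filtering step with java.util.regex over a list of names. It assumes a match anywhere in the name counts (Matcher.find rather than matches) and that the names have already been listed from the repository; ComponentSearch is a hypothetical helper, not the CoG Kit implementation.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper illustrating "search <regex>"; not the CoG Kit implementation.
public class ComponentSearch {

    // Returns the names that contain a match for the regular expression, sorted.
    static List<String> search(Collection<String> componentNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return componentNames.stream()
                .filter(name -> pattern.matcher(name).find())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical component names; a real tool would list them from the repository.
        List<String> names = Arrays.asList("Performance Test", "Timer Demo", "perf-parallel");
        System.out.println(search(names, "(?i)perf"));
    }
}
```

Running main prints [Performance Test, perf-parallel]; the (?i) flag makes the match case-insensitive.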

Sample XML component file and schema

Currently, the repository can store files in the following sample format. Note: the name in the database schema still needs to be changed; also check whether a space is required after ]].

Filename: test.xml
<?xml version="1.0"?>
<component xmlns="http://cogkit.org/cog-workflow/karajan/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    
xsi:schemaLocation="http://cogkit.org/cog-workflow/karajan/ component.xsd">
<metadata author="John Smith" 
          name="Performance Test" 
          version="1.0">
     <shortDescription> Measures the time for a delay function</shortDescription>
     <description>This component is used to test the repository</description>
     <dateCreated>2005-06-06</dateCreated>
     <dateModified>2005-06-06</dateModified>
     <language>xml</language>
     <signature> not yet implemented </signature>
</metadata>
<source>
<![CDATA[
<project>
     <include file="sys.xml"/>
     <include file="currentTimeMillis-def.xml"/>
   
     <timer:initialize/>
     
     <timer:start name="one"/>
     <wait delay="1000"/>
     <timer:print name="one"/>
    
     <timer:start name="two"/>
     <parallel>
          <wait delay="2000"/>
          <wait delay="1000"/>
     </parallel>	
     <timer:print name="two"/>
     <timer:print name="one"/>
</project>
]]>
</source>
</component>

All workflow components that are to be stored in the repository follow this XML schema.
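To see how a tool might read a file in this format, here is a minimal sketch using the JDK's DOM parser (javax.xml.parsers). It assumes the namespace and element names of test.xml above and does not validate against component.xsd; ComponentFileReader and its methods are hypothetical, not part of the CoG Kit API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for component files; not part of the CoG Kit API.
public class ComponentFileReader {

    // Namespace declared on <component> in the sample file.
    static final String NS = "http://cogkit.org/cog-workflow/karajan/";

    // Parses a component document with namespace awareness and no schema validation.
    static Document parse(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse component file", e);
        }
    }

    // Flattens <metadata>: its attributes plus the trimmed text of each child element.
    static Map<String, String> metadata(Document doc) {
        Element meta = (Element) doc.getElementsByTagNameNS(NS, "metadata").item(0);
        if (meta == null) {
            throw new IllegalArgumentException("no <metadata> element");
        }
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attrs = meta.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            result.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        NodeList children = meta.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.put(child.getLocalName(), child.getTextContent().trim());
            }
        }
        return result;
    }

    // Returns the workflow text held in the CDATA section of <source>.
    static String source(Document doc) {
        Node src = doc.getElementsByTagNameNS(NS, "source").item(0);
        return src == null ? "" : src.getTextContent().trim();
    }
}
```

Because the workflow sits in a CDATA section, its Karajan tags come back as plain text rather than being parsed as part of the component document.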

Storing Components in the repository

Components can be stored in the repository either by using the Java API directly or through workflows that use the repository library. Consider a component, such as test.xml above, with metadata values added to it. Note: when using an embedded database, make sure that you have disconnected from ij (use "exit;") before connecting to the repository from Java or Karajan. Attributes other than those that have been inserted into the database will not be saved in the repository, so make sure that you register any new attributes in the component_attributes table before you try to save the workflow component in the repository.
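The rule that unregistered attributes are not saved can be pictured as a filter over the component's metadata. The sketch below keeps only entries whose key appears among the registered attr_id values and reports the rest. It assumes metadata keys are already expressed as attr_id values and compares them case-insensitively; AttributeFilter is hypothetical and not how the Derby provider is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the "registered attributes only" rule.
public class AttributeFilter {

    // Keeps entries whose key is a registered attr_id; adds the other keys to dropped.
    static Map<String, String> registeredOnly(Map<String, String> metadata,
                                              Set<String> registeredIds,
                                              List<String> dropped) {
        Set<String> known = new HashSet<>();
        for (String id : registeredIds) {
            known.add(id.toLowerCase(Locale.ROOT)); // case-insensitive match is an assumption
        }
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (known.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(entry.getKey(), entry.getValue());
            } else {
                dropped.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>(Arrays.asList("author", "version", "language"));
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("author", "John Smith");
        metadata.put("license", "Apache"); // hypothetical attribute not in component_attributes
        List<String> dropped = new ArrayList<>();
        System.out.println(registeredOnly(metadata, registered, dropped) + " dropped=" + dropped);
    }
}
```

Reporting the dropped keys, rather than discarding them silently, shows which attributes still need to be registered in component_attributes.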


Java Example

The repository can be used to store components as follows. The code below sets the component values from a file and saves the component in the repository using setComponent(Component componentObject, String componentName). Alternatively, to load a component directly from a file, loadComponentsFromFile() may be used. A method that takes only a one-line connection URL is still to be added.

       // Obtain a repository backed by the Derby provider
       ComponentRepository repository = RepositoryFactory.newRepository("derby");
       repository.setLocation("local", "/home/gregor/repository");
       repository.connect();
       try {
            // The provider's connection is passed to the Derby component
            Connection connection = repository.getProvider();
            Component component = new DerbyComponent("test.xml", connection);
            // Read metadata and source from the component file
            component.set("/home/gregor/test.xml");
            // Store the component only if its metadata was read successfully
            repository.setComponent(component, component.getName());
       }
       catch (MetaDataNotFoundException exception) {
            exception.printStackTrace();
       }
       finally {
            // Always release the repository connection
            repository.disconnect();
       }

Karajan Example

The Karajan example is intended to perform the same task as the Java example.

A workflow that uses components stored in the repository must use a library file to call the corresponding Java methods. The library file (repository.xml) uses Karajan's Java library to issue these calls. A sample workflow that stores a component in the repository using the workflow library is shown below.

<karajan>
  
  <include file="cogkit.xml"/>
  <include file="repository.xml"/>   
   
  <repository:initialize />
  <repository:setProvider providertype="jdbc:local"
                          dblocation="/home/gregor/.globus/repository"/>
  <repository:connect/>
  <repository:loadComponent fileName="/home/gregor/test.xml"/>
  <repository:disconnect/>
   
</karajan>

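Here, repository:loadComponent takes the path of a component file (for example, test.xml above) in its fileName attribute and loads the component described there into the repository.


The repository library provides tags such as repository:initialize, repository:setProvider, repository:connect, repository:loadComponent, and repository:disconnect, all of which are used in the example above.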

Future

Some improvements still need to be made to the system, but it already provides a lightweight database under the Apache License that can be packaged with the CoG Kit and contain some of the workflow system components.

Appendix

Prerequisite Steps to set up your own embedded repository

If you use the Java CoG Kit, the following information is not necessary.

The instructions below describe how to set up Apache Derby on your local computer. If you are using components that are transferred from a remote server, you do not need to install Derby. Once the repository is integrated with the CoG Kit, it will be pre-packaged and available for download along with Derby. The Derby Getting Started guide is useful for setting up Apache Derby if the steps below are not sufficient for you.

1. As a first step towards setting up the repository, download Apache Derby from the Apache Derby web site.

2. Once Derby has been installed on your local system, you can use the ij tool provided by Derby, which is located at ~/IBM/Cloudscape/frameworks/embedded/bin or at ~/IBM/Cloudscape/frameworks/NetworkServer/bin.

3. Run ij.sh or ij.bat depending on your operating system. Note: Make sure that the jars derby.jar and derbytools.jar are in your classpath in order to use ij.

4. Create a repository in a location of your choice using

  connect 'jdbc:derby:/home/gregor/repository;create=true'; 

Set create=true only when connecting for the first time. For all subsequent connections, omit it and connect to the database using:

  connect 'jdbc:derby:/home/gregor/repository';

5. The database schema is defined using the following commands:

  create table component_metadata(comp_id varchar(40) NOT NULL PRIMARY KEY, shortDesc varchar(100),
  description varchar(255), author varchar(25), version varchar(20), date_created date NOT NULL,
  date_modified date, language varchar(10) NOT NULL, signature varchar(255));
  -----------------------------------------------------------------------------
  create table component_code(comp_id varchar(40) CONSTRAINT code_fk REFERENCES 
  component_metadata(comp_id), code CLOB);
  -----------------------------------------------------------------------------
  create table component_attributes(attr_id varchar(40) NOT NULL PRIMARY KEY, attr_desc varchar(40), 
  attr_type varchar(40) NOT NULL);

6. The basic attributes must be inserted into component_attributes using:

  INSERT INTO component_attributes VALUES('comp_id','Components Id','varchar(40)');
  INSERT INTO component_attributes VALUES('shortdesc','Short description','varchar(100)');
  INSERT INTO component_attributes VALUES('description','Two line description','varchar(255)');
  INSERT INTO component_attributes VALUES('author','creator of the component','varchar(25)');
  INSERT INTO component_attributes VALUES('version','version of the component','varchar(20)');
  INSERT INTO component_attributes VALUES('date_created','date of creation','date');
  INSERT INTO component_attributes VALUES('date_modified','date of modification','date');
  INSERT INTO component_attributes VALUES('language','component language','varchar(10)');
  INSERT INTO component_attributes VALUES('signature','signature for authenticating', 
  'varchar(255)');
    

The attributes above are the ones that will be recognized as part of the metadata. Using the DerbyAttributes API, more attributes can be added or some of these can be removed.

Prerequisite Steps to set up your own Derby Network Server

Setting up a Network Server for your repository enables multiple clients to connect to the repository remotely. Using the repository API, you can access a remote database when you provide the required connection URL. If you would like to allow remote users to access your database, a server must be set up on your local machine. The Derby system can be embedded in an external server framework of your choice (see the Derby documentation for details), or you can use the Network Server that is provided with Derby. The steps for setting up the Network Server are described below; the Derby documentation is more comprehensive if you have problems setting it up.
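The embedded and network cases differ mainly in the JDBC connection URL. The sketch below builds both forms as strings. The embedded form matches the ij commands used earlier; the network form assumes the Derby Network Client driver (derbyclient.jar), whereas the older IBM DB2 Universal JDBC driver used with early Cloudscape/Derby releases expected jdbc:derby:net://host:port/db instead. DerbyUrls is a hypothetical helper, not part of the repository API.

```java
// Hypothetical helper for Derby JDBC URLs; not part of the repository API.
public class DerbyUrls {

    // Default Network Server port mentioned in this guide.
    static final int DEFAULT_PORT = 1527;

    // Embedded database: jdbc:derby:<path>, optionally creating it on first connect.
    static String embedded(String path, boolean create) {
        return "jdbc:derby:" + path + (create ? ";create=true" : "");
    }

    // Network Server via the Derby Network Client: jdbc:derby://<host>:<port>/<db>.
    // The database name is resolved by the server, not by the client.
    static String network(String host, int port, String databaseName, boolean create) {
        return "jdbc:derby://" + host + ":" + port + "/" + databaseName
                + (create ? ";create=true" : "");
    }

    public static void main(String[] args) {
        System.out.println(embedded("/home/gregor/repository", true));
        System.out.println(network("localhost", DEFAULT_PORT, "repository", false));
    }
}
```

Such a string would be passed to java.sql.DriverManager.getConnection once the matching Derby driver jar is on the classpath.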

1. To start the server, make sure that the following jars are on your classpath: derbynet.jar, derby.jar, db2jcc.jar, db2jcc_license_c.jar. Scripts to set the classpaths are available in the Derby install directory under $CLOUDSCAPE_INSTALL/frameworks/NetworkServer/bin.

2. The startNetworkServer.bat (Windows) and startNetworkServer.ksh (UNIX) scripts start the server. These scripts are located in $CLOUDSCAPE_INSTALL/frameworks/NetworkServer/bin. The server can also be started using

java org.apache.derby.drda.NetworkServerControl start -h wiggum -p 1368

where -h and -p are optional and default to localhost and port 1527 if not provided. These values may also be set in the derby.properties file.

3. Additionally, you can verify that the server is up and running using the ping command. This can be done programmatically or using

 java org.apache.derby.drda.NetworkServerControl ping [-h <host>] [-p <portnumber>] 

4. The server can be shut down similarly using the shutdown scripts, stopNetworkServer.bat [-h <host>] [-p <portnumber>] for Windows and stopNetworkServer.ksh [-h <host>] [-p <portnumber>] for UNIX, or by using

java org.apache.derby.drda.NetworkServerControl shutdown [-h <host>][-p <portnumber>]

If user authentication is not enabled, shutting down the server automatically shuts down all the databases. If it is enabled, the databases have to be shut down explicitly.

repository.xml

The repository library, repository.xml, defines the repository tags, which other workflows use by including repository.xml as in the Karajan example above. For instance, repository:setProvider is defined within the library as

    <element name="repository:setProvider" arguments="providertype, dblocation">
        <java:invokeMethod object="{_repository}" method="setProvider" 
         types="java.lang.String, java.lang.String">
             <argument value="{providertype}"/>
             <argument value="{dblocation}"/>
        </java:invokeMethod>
    </element> 

In the above call, the object _repository is initialized using

  <set name="_repository">
    <java:new 
     classname="org.globus.cog.abstraction.repository.impl.jdbc.DerbyRepository">
    </java:new>
  </set>
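In plain Java, the java:invokeMethod element corresponds roughly to a reflective call: resolve the names in the types attribute to classes, look up the public method by name and parameter types, and invoke it on the object. The sketch below shows that mapping with java.lang.reflect. FakeRepository is a hypothetical stand-in for DerbyRepository, and this is not Karajan's actual implementation of java:invokeMethod.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch of a reflective call like <java:invokeMethod>; not Karajan's code.
public class InvokeMethodSketch {

    // Hypothetical stand-in for the repository object; not the CoG Kit class.
    public static class FakeRepository {
        public String providerType;
        public String dbLocation;

        public void setProvider(String providerType, String dbLocation) {
            this.providerType = providerType;
            this.dbLocation = dbLocation;
        }
    }

    // Turns a types attribute such as "java.lang.String, java.lang.String" into classes.
    static Class<?>[] parseTypes(String types) {
        String[] names = types.split(",");
        Class<?>[] classes = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            try {
                classes[i] = Class.forName(names[i].trim());
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("unknown type: " + names[i].trim(), e);
            }
        }
        return classes;
    }

    // Looks up a public method by name and parameter types, then calls it on target.
    static Object invoke(Object target, String methodName, Class<?>[] types, Object... args) {
        try {
            Method method = target.getClass().getMethod(methodName, types);
            return method.invoke(target, args);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("cannot call " + methodName, e);
        } catch (InvocationTargetException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    public static void main(String[] args) {
        FakeRepository repository = new FakeRepository();
        invoke(repository, "setProvider", parseTypes("java.lang.String, java.lang.String"),
                "jdbc:local", "/home/gregor/.globus/repository");
        System.out.println(repository.providerType + " " + repository.dbLocation);
    }
}
```

Passing the parameter types explicitly, as the types attribute does, selects the intended overload even when a class defines several methods with the same name.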