Biomedical Computation

Opal Toolkit

About Opal Toolkit

The Grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. However, in practice, Grid resources are not very easy to use for the end-users who have to learn how to generate security credentials, stage inputs and outputs, access Grid-based schedulers, and install complex client software. There is an imminent need to provide transparent access to these resources so that the end-users are shielded from the complicated details, and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, only exposing a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing the application services that access Grid resources can be quite complicated, especially if it has to be replicated for every application. Towards that end, we have implemented Opal, which is a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based Grid security and data management in an easy-to-use and configurable manner.

Why Opal?

The salient features of Opal are as follows:

  • Opal enables deployment of scientific applications as Web services without having to write a single line of Web services code
  • Opal exposes the scientific functionality through a generic Web services API (via a standard WSDL)
  • Opal hides the complexity involved in the submission of computational jobs to Grid resources
  • Opal manages user data, which includes creation of working directories, input and output data staging, and persistent storage for job information and metadata
  • Opal services can be configured with GSI-based security for authentication and authorization purposes
  • Opal services can be accessed from a multitude of languages (Java, Python, Perl, JavaScript) and plaforms (Windows, various Unix flavors)

 

So why use Opal, and not just Globus GRAM to launch remote jobs? Here are some reasons.

  • Deploying an application as an Opal service is very easy, and can be achieved under a couple of hours. It can often be done much faster than that, once the first Opal service has already been deployed. After the necessary software has been downloaded and installed, adding a new service is a matter of modifying a few configuration files and using an Ant script to deploy the service.
  • Every user doesn’t have to deploy the application. From our experience, we have learnt that deploying a scientific application can be quite complicated if it has to be done by every user. If Opal is used, the service provider deploys this application once which can then be used by any client via a SOAP API.
  • Every user would typically need an account on the cluster if they use the traditional Globus GRAM approach. In theory, multiple client DN’s could be mapped to a generic group user account – but this means that all the users have to ensure that they don’t interfere with others who may be logged on to the same account. The Opal approach is much cleaner – only authorized users are allowed to run jobs using GSI-based transport level mechanisms. However, since they are not allowed to run *any* arbitrary command, they don’t interfere with one another. Furthermore, it is easier to keep track of user requests this way because every single user can be accounted for (unlike the former where only the users are only accounted for as a single group).
  • Users don’t have to do their own data management. Using the traditional method, every user would have to stage input and output files manually. Furthermore, they would have to create new working directories for every single run (so that output files from older runs are not overwritten). On the other hand, Opal performs the data management for the user. It creates new working directories automatically for every run, and returns URLs to the user to retrieve the outputs when the execution is complete.
  • Users don’t have to be concerned with the schedulers being used at the back-end. The service is configured to use a scheduler supported by Globus (e.g. Condor, SGE) – the users are oblivious to this. Services also be configured to use other back-ends such as DRMAA. In the traditional approach, users would have to submit to a particular scheduler, and be mindful of what back-end is being used.
  • Since the applications are exposed via a SOAP API, clients can be easily written in a variety of languages, and accessed from different platforms. Clients are shielded from any changes that happen at the backend (upgrades, etc) as long as the SOAP APIs and the URLs for connecting to the services stay the same. Currently, we have Java clients used in Gridsphere-based portals, Javascript clients used in the Mozilla-based Gemstone framework, and Python clients used in the PMV toolkit. Furthermore, workflow toolkits like Kepler can be used to orchestrate complex scientific pipelines based on Opal Web services.