Service Oriented Architecture and Science

New information architectures enable new approaches to publishing and accessing valuable data and programs. So-called service-oriented architectures define standard interfaces and protocols that allow developers to encapsulate information tools as services that clients can access without knowledge of, or control over, their internal workings. Thus, tools formerly accessible only to the specialist can be made available to all; previously manual data-processing and analysis tasks can be automated by having services access services. Such service-oriented approaches to science are already being applied successfully, in some cases at substantial scales, but much more effort is required before they are applied routinely across many disciplines. Grid technologies can accelerate the development and adoption of service-oriented science by enabling a separation of concerns between discipline-specific content and domain-independent software and hardware infrastructure.

Paul Erdős claimed that a mathematician is a machine for turning coffee into theorems. The scientist is arguably a machine for turning data into insight. However, advances in information technology are changing the way in which this role is fulfilled: by automating time-consuming activities, they free the scientist to perform other tasks. In this Viewpoint, I discuss how service-oriented computing (technology that allows powerful information tools to be made available over the network, always on tap, and easy for scientists to use) may contribute to that evolution.

The practice of science has, of course, already been affected dramatically by information technology and, in particular, by the Internet. For example, the hundreds of gigabytes of genome sequence available online mean that for a growing number of biologists, "data" is something that they find on the Web, not in the lab.
Similarly, emerging "digital observatories" [already several hundred terabytes in dozens of archives (1)] allow astronomers to pose and answer in seconds questions that might previously have required years of observation. In fields such as cosmology and climate, supercomputer simulations have emerged as essential tools, themselves producing large data sets that, when published online, are of interest to many (2). An exploding number of sensors (3), the rapidly expanding computing and storage capabilities of federated Grids (4), and advances in optical networks (5) are accelerating these trends by making increasingly powerful capabilities available online.

Sometimes, however, the thrill of the Web seems to blind us to the true implications of these developments. Human access to online resources is certainly highly useful, putting a global library at our fingertips. But ultimately, it is automated access by software programs that will be truly revolutionary, simply because of the higher speeds at which programs can operate. In the time that a human user takes to locate one useful piece of information within a Web site, a program may access and integrate data from many sources and identify relationships that a human might never discover unaided. Two dramatic examples are systems that automatically integrate information from genome and protein sequence databases to infer metabolic pathways (6) and systems that search digital sky surveys to locate brown dwarfs (7).

The key to such success is uniformity of interface, so that programs can discover and access services without the need to write custom code for each specific data source, program, or sensor. Electric power–transmission standards and infrastructure enabled development of the electric power grid and spurred the development of a plethora of electric tools.
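The value of a uniform interface can be sketched in a few lines of Python. All names here (the service classes, the registry, the query parameters) are hypothetical illustrations, not any real archive's API: because every service answers the same call, one generic client can integrate results from many sources without per-source custom code.

```python
# A minimal sketch (all service names hypothetical) of uniform interfaces:
# every service answers the same query(params) call, so a single generic
# client can search all of them without custom code for each source.

class ArchiveService:
    """Stand-in for a remote data archive with a uniform query interface."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def query(self, params):
        # Return every record that mentions the requested object type.
        return [r for r in self.records if params["object"] in r]


# A simple in-memory registry stands in for service discovery.
registry = [
    ArchiveService("sky-survey-a", ["brown dwarf 1", "quasar 7"]),
    ArchiveService("sky-survey-b", ["brown dwarf 2", "galaxy 3"]),
]


def integrate(services, params):
    """Generic client: issue the same query against every discovered service."""
    return {service.name: service.query(params) for service in services}


matches = integrate(registry, {"object": "brown dwarf"})
```

The point of the sketch is that `integrate` knows nothing about any individual archive; adding a third source means registering it, not writing new client code.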
In a similar manner, service technologies enable the development of a wide range of programs that integrate across multiple existing services for purposes such as metabolic pathway reconstruction, categorization of astronomical objects, and analysis of environmental data. If such programs are themselves made accessible as services, the result can be the creation of distributed networks of services, each constructed by a different individual or group, and each providing some original content and/or value-added product.

We see this evolution occurring in the commercial Internet. As the Web has expanded in scale, so the preferred means of finding things has evolved from Yahoo's manually assembled lists to Google's automatically computed indices. Now Google is making its indices accessible, spurring development of yet other services. What makes Google's indices feasible is the existence of large quantities of data in a uniform format (HTML, HyperText Markup Language) and (two important factors that must be considered when we turn to science) smart computer scientists to develop the algorithms and software required to manage the 100,000 computers used (at last count) to analyze Web link structure, and smart businesspeople to raise
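The idea of programs that are themselves published as services, so that services access services, can be sketched as follows. All names and the toy data are hypothetical: a pipeline service composes a data service and an analysis service, each behind the same query() interface, into a new, value-added service.

```python
# A minimal sketch (hypothetical services and data) of "services accessing
# services": a pipeline service automates a formerly manual two-step task by
# calling a data service and then an analysis service through one uniform
# query() interface.

class SequenceService:
    """Hypothetical data service: returns a stored genome sequence by ID."""

    def query(self, request):
        data = {"gene42": "ATGGCCATTG"}
        return {"sequence": data.get(request["id"], "")}


class GCContentService:
    """Hypothetical analysis service: computes GC content of a sequence."""

    def query(self, request):
        seq = request["sequence"]
        gc = sum(1 for base in seq if base in "GC")
        return {"gc_fraction": gc / len(seq) if seq else 0.0}


class PipelineService:
    """Value-added service built by composing the other two services."""

    def __init__(self, data_service, analysis_service):
        self.data = data_service
        self.analysis = analysis_service

    def query(self, request):
        sequence = self.data.query({"id": request["id"]})
        return self.analysis.query(sequence)


pipeline = PipelineService(SequenceService(), GCContentService())
result = pipeline.query({"id": "gene42"})
```

Because `PipelineService` exposes the same interface as its components, it can in turn be composed into still larger services, giving the distributed networks of services described above.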
