Powerful global computing networks are set to enable a new, collective approach to doing science. Physics will lead the way

 
© CERN
The Grid will play a major role in analysing particle collisions from the Large Hadron Collider

It is extraordinary how much the Internet has changed our lives in just a few years. We now all talk to each other, find information, shop and bank through an informal, global network of computers, using a system of software called the World Wide Web. Yet the Web began purely as a physicists’ tool – developed at CERN, the European particle physics laboratory in Geneva, to allow its research teams based in different countries to communicate and share data.

Now, physicists and computer scientists are driving the next evolutionary stage in Internet technology: broadening the capability of computer networks to provide delocalised, large-scale resources for storing, accessing and processing data, and making those resources available to groups of people with common goals. Computer services will become a utility like electricity, supplied via a ‘Grid’ that you can simply plug into without worrying about where those services actually reside.

The first beneficiaries will be large collaborations of scientists and engineers working on problems that involve huge amounts of data. Indeed, one of the first large-scale examples of the Grid in action will be in high-energy physics. The Large Hadron Collider, being built at CERN to investigate the ultimate physics of the Universe, will generate billions of megabytes – petabytes – of data that must be made available for analysis to thousands of physicists working in teams around the world. No supercomputer could manage this amount of data on its own. The LHC Grid will therefore be spread over an international network of sites in a four-tier system.
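To get a sense of the scale: a billion megabytes is a petabyte. The short sketch below shows how a tiered fan-out turns an impossible single-machine load into manageable per-site shares. The yearly volume and the site counts per tier are purely illustrative assumptions, not the actual LHC plan.

```python
# A back-of-envelope sketch of why LHC data must be spread across a tiered
# grid rather than handled by one machine. "Billions of megabytes" means
# petabytes: 1e9 MB = 1e15 bytes = 1 PB. All numbers below are assumptions
# for illustration only.

MEGABYTE = 1_000_000
PETABYTE = 1_000_000_000 * MEGABYTE          # a billion megabytes, in bytes

yearly_volume_pb = 10                        # assume ~10 PB of collision data a year
tier_sites = {0: 1, 1: 10, 2: 60, 3: 300}    # assumed site counts; tier 0 being CERN

print(f"One year of data: {yearly_volume_pb * PETABYTE:.1e} bytes")

for tier, sites in tier_sites.items():
    # Suppose, crudely, that each tier keeps a full copy of the data,
    # shared evenly across its sites.
    per_site_tb = yearly_volume_pb * 1000 / sites
    print(f"Tier {tier}: {sites:3d} site(s), ~{per_site_tb:,.0f} TB each")
```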

Other scientific areas dealing with deluges of data will also benefit from this distributed computing approach. These include astronomy, Earth observation, engineering and the biosciences (in particular, exploiting the information from the Human Genome Project). In fact, the Grid makes possible a powerful and pervasive research methodology – ‘e-science’.

Idle computers
The idea of distributed computing is quite old. Even 20 years ago, people realised that individual computers could be harnessed like a team of horses to provide more processing power. Today, the fastest supercomputers are collections of microprocessors, and many laboratories string together clusters of PCs – each more powerful than a supercomputer was 10 years ago – into computer ‘farms’.

However, there is no reason why the PCs should be in one location or even in one country. This was realised in 1985 by Miron Livny at the University of Wisconsin, who pointed out that most PCs sit idle much of the time and proposed pooling their spare processing capacity using Internet software. The first well-known application was for the SETI (search for extraterrestrial intelligence) project: home-computer users could download a screensaver, SETI@home, which sifted through the mass of radio signals received from space by the Arecibo telescope in Puerto Rico. Other ‘@homes’ soon followed, enabling various medical projects such as cancer and AIDS research.
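The mechanics behind such ‘@home’ projects can be sketched in a few lines of Python. This illustrates only the idea – split a big dataset into small, independent chunks and let volunteer machines crunch them – and none of the names below correspond to the real SETI@home client or protocol.

```python
# A minimal sketch of idle-cycle "volunteer computing": a client fetches
# chunks of data from a central server, analyses them locally, and reports
# the results back. The work units and the analysis step are hypothetical
# stand-ins, not the real SETI@home pipeline.

from queue import Queue

def analyse(work_unit: list[float]) -> float:
    # Stand-in for the real signal-processing step: here we just
    # report the strongest "signal" in the chunk.
    return max(work_unit)

def volunteer_client(server_queue: Queue, results: list[float]) -> None:
    # Keep taking work units until the server has none left.
    while not server_queue.empty():
        chunk = server_queue.get()       # download a work unit
        results.append(analyse(chunk))   # crunch it locally
        server_queue.task_done()         # report the result back

# The "server" splits a big dataset into small, independent chunks --
# the property that makes this style of distributed computing possible.
if __name__ == "__main__":
    work = Queue()
    for i in range(5):
        work.put([0.1 * i, 0.2 * i, 0.3 * i])
    results: list[float] = []
    volunteer_client(work, results)
    print(results)
```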
   
© BAE Systems
Developing new aircraft will be done by ‘virtual’ organisations supported by the Grid

This was just the beginning. In 1995 Ian Foster at Argonne National Laboratory and Carl Kesselman of the University of Southern California took the concept in a new direction. They realised that research communities, which are becoming increasingly international and multi-disciplinary, could link all their computers so as to share data, applications, instruments and other resources in a coordinated and controlled way to create a single virtual laboratory. The concept of the Grid was born.

Foster envisages that to solve particular problems, groups of people would come together to form ‘virtual organisations’ supported by the Grid infrastructure. These could be particle physicists working with a new collider, a company consortium designing a new jet, or a crisis-management team dealing with an environmental problem.

Creating a large-scale computing infrastructure, of course, requires not only the right hardware – sets of computer clusters and high-speed connections – but also standard protocols and services to knit the system together – ‘middleware’. The system has to be secure and interface flexibly with different operating platforms. It has to recognise who is allowed access, what kinds of programs or data are being requested and how to allocate the computer resources. A set of Grid technologies is slowly emerging, the first being provided by the Globus Toolkit developed by teams headed by Foster and Kesselman. However, other research groups are also working on providing middleware components.
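What that middleware has to decide can be caricatured in a short sketch: establish who is asking, check that they belong to the right virtual organisation, and match the job to a site with spare capacity. The code below is a hypothetical simplification – it is not the Globus Toolkit API, and every name in it is invented.

```python
# An illustrative sketch of the decisions Grid middleware makes when a job
# arrives: who is this, what are they allowed to do, and which resource
# should run the work? All names are hypothetical simplifications.

from dataclasses import dataclass

@dataclass
class Job:
    owner: str            # identity, e.g. from a user's certificate
    cpu_hours: float      # processing the job needs
    data_gb: float        # data the job must stage in

@dataclass
class Resource:
    name: str
    free_cpu_hours: float
    free_storage_gb: float

# A virtual organisation's membership list stands in for real
# certificate-based authentication and authorisation.
VO_MEMBERS = {"alice@cern.ch", "bob@anl.gov"}

def schedule(job: Job, resources: list[Resource]) -> str:
    if job.owner not in VO_MEMBERS:
        raise PermissionError(f"{job.owner} is not in this virtual organisation")
    # Pick the first site with enough spare capacity for the job.
    for r in resources:
        if r.free_cpu_hours >= job.cpu_hours and r.free_storage_gb >= job.data_gb:
            r.free_cpu_hours -= job.cpu_hours
            r.free_storage_gb -= job.data_gb
            return r.name
    raise RuntimeError("no site can currently run this job")

if __name__ == "__main__":
    sites = [Resource("tier1-ral", 50.0, 200.0), Resource("tier2-glasgow", 500.0, 80.0)]
    print(schedule(Job("alice@cern.ch", 120.0, 40.0), sites))  # -> tier2-glasgow
```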

Grids ‘R’ Us
Scientists, especially physicists, have been quick to recognise the potential of this new way of working. Grid projects abound. For example, in the US there are the Grid Physics Network (GriPhyN), the NASA Information Power Grid, the Department of Energy Science Grid, the Particle Physics Data Grid and the National Science Foundation’s TeraGrid, among others. Europeans have been equally active and are working on a range of initiatives including the CERN DataGrid, the EU-funded EuroGrid and the Virtual Observatory (see Box).

The UK Government has also recognised the potential of e-science, allocating £120M over three years for various Grid applications in all areas of science and engineering, for upgrading hardware and for developing generic middleware. The theoretical physicist and computer scientist Tony Hey leads the initiative, and a national network of Grid centres is being set up. Many of the projects will involve collaboration with industry as well as interfacing with international Grid projects. IT companies such as IBM, Microsoft and Sun are also gearing up to participate in creating the infrastructure.

The ultimate aim is to provide an intelligent global computing and data resource that, like the present Internet, can be exploited by science, commerce and individuals according to their needs.

The Virtual Observatory
Astronomy offers an ideal testbed for e-science. Astronomers realised that ongoing and new sky surveys, cataloguing stars and galaxies at all wavelengths, were creating vast data sets. The idea of a Virtual Observatory has emerged as a way of coordinating the data from the different archives so that they can be easily accessed. This approach will change the way astronomers work: they will be able to explore a wide range of observations – across wavelengths, for example – in a way that should lead to new astrophysical insights. Furthermore, researchers in developing countries, and students, will benefit as they gain direct access to resources otherwise unavailable to them. The sociological impact of the Grid approach to doing science will be enormous.

VISTA
The VISTA telescope surveys will generate large amounts of data
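The core idea – one query fanned out over several wavelength-specific archives, with the answers merged into a single view of the sky – can be sketched as follows. The archives, objects and numbers are invented for illustration; real virtual observatories define standard query protocols that this only gestures at.

```python
# A toy illustration of the Virtual Observatory idea: one lookup, fanned
# out across several wavelength-specific archives, with the results merged.
# The archives and all values below are invented for the example.

# Each "archive" maps a catalogued object to a measurement at one wavelength.
OPTICAL = {"M31": 3.4, "M87": 8.6}
RADIO   = {"M87": 220.0, "CygA": 1590.0}
XRAY    = {"M87": 2.9}

def federated_lookup(obj: str) -> dict[str, float]:
    """Gather everything the participating archives know about one object."""
    archives = {"optical": OPTICAL, "radio": RADIO, "x-ray": XRAY}
    return {band: data[obj] for band, data in archives.items() if obj in data}

# Cross-matching one object across wavelengths -- the kind of multi-archive
# view that should lead to new astrophysical insights.
print(federated_lookup("M87"))   # {'optical': 8.6, 'radio': 220.0, 'x-ray': 2.9}
```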

To find out more about Grid projects and technologies, start at the National e-Science Centre, which has links to a range of sites: www.nesc.ac.uk/ggrid/index.html

For more on virtual observatories, start at the Virtual Observatory Forum: www.voforum.org/

Thanks go to Tony Hey of Southampton University / EPSRC / DTI, and to Gerry Gilmore of the Institute of Astronomy, Cambridge, for their help with this paper.