Cloud Computing in Science

Many scientists would love access to large-scale computing but find that the programming demands of using a supercomputer, as well as the cost and queuing time, are too daunting. Privately owned cloud computers (large data centers filled with machines that mainly run the owning company's software) are now becoming available to outside users, including scientists and educators. Companies lease computing resources on demand from a large shared pool to individuals who run their own software on a pay-as-you-go basis. This approach is an example of cost associativity (1): 1000 computers used for 1 hour cost the same as one computer used for 1000 hours. If your problem can be computed in a way that takes advantage of parallel processing, you can now get the answer 1000 times as fast for the same amount of money.

Although companies had long operated "private clouds" that run programs such as Google Search or Microsoft Hotmail, Amazon was the first to let outside users run software on its computers. For example, Amazon's Elastic Compute Cloud (EC2), announced in late 2007, allows anyone with a credit card to use any number of computers in Amazon's data centers for 8.5 cents per computer-hour, with no minimum or maximum purchase and no contract. Such an arrangement is possible because these "warehouse-scale" data centers (~50,000 servers; see the figure) are five to seven times cheaper to build and operate than smaller facilities (~1000 servers) (2) in terms of network, storage, and administrator costs. When operational costs, which include human administrators, power, and networking, are considered, the charges for cloud computing to outside users are price-competitive with using in-house facilities.

Cost associativity enables new capabilities. For example, a researcher in our laboratory created an automated classifier to detect spam on a popular social-communication site. Training the classifier took about 270 hours on a typical desktop workstation.
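Cost associativity can be put in back-of-the-envelope terms. The sketch below is purely illustrative (the hourly rate is the EC2 price quoted above, and the job is assumed to parallelize perfectly):

```python
# Back-of-the-envelope illustration of cost associativity:
# renting many machines briefly costs the same as renting one
# machine for a long time, but the answer arrives much sooner.

HOURLY_RATE = 0.085  # dollars per computer-hour (EC2 price quoted above)

def cloud_cost(machines: int, hours: float) -> float:
    """Total rental cost for a perfectly parallelizable job."""
    return machines * hours * HOURLY_RATE

serial = cloud_cost(machines=1, hours=1000)    # one machine, 1000 hours
parallel = cloud_cost(machines=1000, hours=1)  # 1000 machines, 1 hour

print(serial, parallel)  # both $85.00: same cost, answer 1000x sooner
```

The assumption of perfect parallelism is, of course, the crux: the economics only work out when the problem really does split into independent pieces.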
The training program could be parallelized: The problem could be broken into pieces that run at the same time, only occasionally sharing results between the pieces. The same task took 3 hours using about 100 servers in Amazon's cloud (about $250 in usage fees). Many universities have also begun to use cloud computing for education, where cost associativity is a great fit for semester courses. Computing demand spikes around assignment deadlines (more than even the biggest schools could provide in-house), and between deadlines, when demand falls, no outside resources need be purchased.

Initially, cloud-computing hardware was configured primarily for its earliest adopters, Web-based applications, and early attempts to run scientific applications on the cloud gave discouraging results (3, 4). New hardware is now configured for better performance on scientific applications. For example, Amazon's recently added "cluster compute instances," priced at $1.60 per computer-hour, run scientific benchmarks 8.5 times as fast as the original cloud hardware, according to experiments at the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory (5).

Cloud computing works best when a problem can be broken down into a large number of relatively independent tasks, each running on its own computer. Software frameworks like Google's MapReduce (6) and its open-source equivalent Hadoop (7) provide a data-parallel "building block" for expressing such computations, much as a Web design framework lets you build a Web site by filling in the relevant information and functions you want. Critically, these frameworks also hide the complex software machinery that handles the inevitable transient machine failures that arise when hundreds of machines in a cloud environment work on a problem simultaneously.
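The map/reduce building block can be sketched in miniature. The following is an in-process illustration of the pattern in plain Python, not Hadoop's actual API: each input chunk is mapped independently into key-value pairs, and the reduce step collates all values sharing a key.

```python
from collections import defaultdict

# Minimal in-process sketch of the MapReduce pattern (not the Hadoop API).
# In a real cluster, the framework distributes the chunks across machines
# and transparently reruns work lost to transient machine failures.

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one independent input chunk."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Collate the counts of all pairs that share the same key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

chunks = ["the cloud the cloud", "cloud computing"]
all_pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(all_pairs))  # {'the': 2, 'cloud': 3, 'computing': 1}
```

Because each chunk is mapped independently, the map calls can run on as many machines as there are chunks, which is precisely the shape of problem the cloud rewards.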
Many of the "success stories" of science in the cloud have embraced Hadoop, and other popular tools, such as the statistical package R, now feature libraries that integrate with it. However, many problems cannot be easily expressed in terms of map and reduce tasks (the map step parcels out the work, and the reduce step collates the results). Even when they can, the programming effort required may be substantial. Most desktop software is not written to take advantage of cloud computing and requires modification before it can harness cloud resources and run faster. However, the popular packages MATLAB and Mathematica are now available in versions that can "farm out" work to a public cloud. Cloud vendors including Amazon and IBM are working with independent software vendors on cloud-friendly versions of popular scientific software.
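The "farm out" pattern can be sketched with Python's standard library. This is only an illustration of the idea (split the input, process the shards concurrently, combine the partial results), not how MATLAB or Mathematica actually dispatch work; `train_on_shard` is a hypothetical stand-in for an expensive, independent unit of work like one piece of the classifier-training job described earlier.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustration of farming out independent work units: split the input,
# run the pieces concurrently, and combine the partial results.
# train_on_shard is a hypothetical stand-in for the real per-piece work.

def train_on_shard(shard):
    """Placeholder for an expensive, independent unit of work."""
    return sum(x * x for x in shard)  # stand-in computation

def farm_out(data, n_workers=4):
    """Split data into shards, process them concurrently, combine results."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(train_on_shard, shards)
    return sum(partials)

result = farm_out(list(range(1000)))
print(result)  # same answer as the serial computation, pieces run concurrently
```

On a cloud, the same decomposition applies, except the shards are shipped to rented machines rather than local workers, which is exactly the modification most desktop software still lacks.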
