An Early Performance Analysis of Cloud Computing Services for Scientific Computing

Cloud computing is emerging today as a commercial infrastructure that eliminates the need for maintaining expensive computing hardware. Through the use of virtualization, clouds promise to serve a large user base with different needs from the same shared set of physical resources. Thus, clouds promise to be an alternative to clusters, grids, and supercomputers for scientists. However, virtualization may induce significant performance penalties for demanding scientific computing workloads. In this work we present an evaluation of the usefulness of current cloud computing services for scientific computing. We analyze the performance of the Amazon EC2 platform using micro-benchmarks, kernels, and e-Science workloads. Using long-term traces, we also compare the performance characteristics and cost models of clouds with those of other platforms accessible to scientists. While clouds are still changing, our results indicate that current cloud services need an order-of-magnitude improvement in performance to be useful to the scientific community.

Scientific computing requires an ever-increasing number of resources to deliver results for growing problem sizes in a reasonable time frame. In the last decade, while the largest research projects were able to afford expensive supercomputers, other projects were forced to opt for cheaper resources such as commodity clusters and grids. Cloud computing proposes an alternative in which resources are no longer hosted by the researcher's computational facilities, but leased from big data centers only when needed. Despite the existence of several cloud computing vendors, such as Amazon [5] and GoGrid [15], the potential of clouds remains largely unexplored. To address this issue, in this paper we present a performance analysis of cloud computing services for scientific computing.

The cloud computing paradigm holds good promise for the performance-hungry scientific community. Clouds promise to be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than grids, and a much more scalable platform than the largest commodity clusters or resource pools. Clouds also promise to "scale by credit card," that is, to scale up immediately and temporarily, limited only by cost rather than by the physical limits of adding nodes to clusters or even supercomputers, or by the financial burden of over-provisioning resources. Moreover, clouds promise good support for bags-of-tasks, currently the dominant grid application type [23]. However, clouds also raise important challenges in many areas connected to scientific computing, including performance, which is the focus of this work.
