An Early Performance Analysis of Cloud Computing Services for Scientific Computing

Scientific computing requires an ever-increasing number of resources to deliver results for growing problem sizes in a reasonable time frame. In the last decade, while the largest research projects were able to afford expensive supercomputers, other projects were forced to opt for cheaper resources such as commodity clusters and grids. Cloud computing proposes an alternative in which resources are no longer hosted by the researcher’s computational facilities, but leased from big data centers only when needed. Despite the existence of several cloud computing vendors, such as Amazon and GoGrid, the potential of clouds remains largely unexplored. To address this issue, in this paper we present a performance analysis of cloud computing services for scientific computing.

The cloud computing paradigm holds great promise for the performance-hungry scientific community. Clouds promise to be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than grids, and a much more scalable platform than the largest commodity clusters or resource pools. Clouds also promise to “scale by credit card,” that is, to scale up immediately and temporarily, limited only by financial constraints, as opposed to the physical limits of adding nodes to a cluster or even a supercomputer, or the financial burden of over-provisioning resources. Moreover, clouds promise good support for bags-of-tasks, currently the dominant grid application type. However, clouds also raise important challenges in many areas connected to scientific computing, including performance, which is the focus of this work.

There are two main differences between scientific computing workloads and the initial target workload of clouds, one in size and the other in performance demand. Top scientific computing facilities are very large systems: the top ten entries in the Top500 Supercomputers List together total about one million cores. In contrast, cloud computing services were designed to replace small-to-medium size enterprise data centers running at 10-20% utilization. Scientific computing is also traditionally a high-utilization workload, with production grids often running at over 80% utilization and parallel production infrastructures (PPIs) averaging over 60% utilization. Scientific workloads usually require top performance and HPC capabilities; in contrast, most clouds use virtualization to abstract away the actual hardware, increasing the user base but potentially lowering the attainable performance. Thus, an important research question arises: Is the performance of clouds sufficient for scientific computing?
Though early attempts to characterize clouds and other virtualized services exist, this question remains largely unexplored. Our main contribution towards answering it is threefold:
1. We evaluate the performance of the Amazon Elastic Compute Cloud (EC2), the largest commercial computing cloud in production (Section 3);
2. We compare clouds with other scientific computing alternatives using trace-based simulation and the results of our performance evaluation (Section 4);
3. We assess avenues for improving the current clouds for scientific computing; this allows us to propose two cloud-related research topics for the high performance distributed computing community (Section 5).
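To make the idea behind contribution 2 concrete, trace-based simulation replays a recorded workload trace through models of the candidate systems and compares resulting metrics. The following is a minimal sketch only: the three-job trace, the single-server FCFS model, and the fixed virtualization slowdown factor are illustrative assumptions, not the simulator or parameters used in the paper.

```python
def simulate_fcfs(trace, slowdown=1.0):
    """Replay (submit_time, runtime) jobs through one FCFS server.

    Each job's runtime is scaled by `slowdown` to model the performance
    penalty of a virtualized cloud node. Returns the makespan (time at
    which the last job finishes).
    """
    clock = 0.0
    for submit, runtime in sorted(trace):
        start = max(clock, submit)          # job waits for the server to free up
        clock = start + runtime * slowdown  # run to completion, no preemption
    return clock

# Hypothetical workload trace: (submit time, runtime on the cluster), in seconds.
trace = [(0, 100), (10, 50), (20, 200)]

cluster_makespan = simulate_fcfs(trace, slowdown=1.0)  # dedicated cluster baseline
cloud_makespan = simulate_fcfs(trace, slowdown=1.5)    # assumed virtualization overhead
```

Running both models on the same trace isolates the effect of the slowdown factor, which is the quantity a real study would calibrate from measured benchmark results rather than assume.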
