An Early Performance Analysis of Cloud Computing Services for Scientific Computing
Scientific computing requires an ever-increasing number of resources to deliver results for growing problem sizes in a reasonable time frame. In the last decade, while the largest research projects were able to afford expensive supercomputers, other projects were forced to opt for cheaper resources such as commodity clusters and grids. Cloud computing proposes an alternative in which resources are no longer hosted by the researcher’s computational facilities, but leased from big data centers only when needed. Despite the existence of several cloud computing vendors, such as Amazon and GoGrid, the potential of clouds remains largely unexplored. To address this issue, in this paper we present a performance analysis of cloud computing services for scientific computing.
The cloud computing paradigm holds good promise for the performance-hungry scientific community. Clouds promise to be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than grids, and a much more scalable platform than the largest commodity clusters or resource pools. Clouds also promise to “scale by credit card,” that is, to scale up immediately and temporarily, with limits imposed only by financial reasons, as opposed to the physical limits of adding nodes to clusters or even supercomputers, or to the financial burden of over-provisioning resources. Moreover, clouds promise good support for bags-of-tasks, currently the dominant grid application type. However, clouds also raise important challenges in many areas connected to scientific computing, including performance, which is the focus of this work.

There are two main differences between scientific computing workloads and the initial target workload of clouds, one in size and the other in performance demand. Top scientific computing facilities are very large systems, with the top ten entries in the Top500 Supercomputers List together totaling about one million cores. In contrast, cloud computing services were designed to replace small-to-medium size enterprise data centers running at 10-20% utilization. Also, scientific computing is traditionally a high-utilization workload, with production grids often running at over 80% utilization and parallel production infrastructures (PPIs) averaging over 60% utilization. Scientific workloads usually require top performance and HPC capabilities. In contrast, most clouds use virtualization to abstract away from the actual hardware, increasing the user base but potentially lowering the attainable performance. Thus, an important research question arises: Is the performance of clouds sufficient for scientific computing? Though early attempts to characterize clouds and other virtualized services exist, this question remains largely unexplored.
Our main contribution towards answering this question is threefold:
1. We evaluate the performance of the Amazon Elastic Compute Cloud (EC2), the largest commercial computing cloud in production (Section 3);
2. We compare clouds with other scientific computing alternatives using trace-based simulation and the results of our performance evaluation (Section 4);
3. We assess avenues for improving the current clouds for scientific computing; this allows us to propose two cloud-related research topics for the high-performance distributed computing community (Section 5).