Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing
Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for companies and institutes alike to maintain expensive computing facilities. Through virtualization and resource time-sharing, clouds serve a large user base with different needs from a single set of physical resources. Thus, clouds have the potential to provide their owners with the benefits of an economy of scale and, at the same time, to become an alternative for scientists to clusters, grids, and parallel production environments. However, current commercial clouds have been built to support web and small database workloads, which are very different from typical scientific computing workloads. Moreover, the use of virtualization and resource time-sharing may introduce significant performance penalties for demanding scientific computing workloads. In this work we analyze the performance of cloud computing services for scientific computing workloads. We quantify the presence in real scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we perform an empirical evaluation of the performance of four commercial cloud computing services, including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through trace-based simulation the performance characteristics and cost models of clouds and other scientific computing platforms, for general and MTC-based scientific computing workloads. Our results indicate that current clouds need an order-of-magnitude performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.
Scientific computing requires an ever-increasing number of resources to deliver results for ever-growing problem sizes in a reasonable time frame. In the last decade, while the largest research projects were able to afford (access to) expensive supercomputers, many projects were forced to opt for cheaper resources such as commodity clusters and grids. Cloud computing proposes an alternative in which resources are no longer hosted by the researchers’ computational facilities, but are leased from big data centers only when needed. Despite the existence of several cloud computing offerings by vendors such as Amazon and GoGrid, the potential of clouds for scientific computing remains largely unexplored. To address this issue, in this paper we present a performance analysis of cloud computing services for many-task scientific computing.
The cloud computing paradigm holds great promise for the performance-hungry scientific computing community: clouds can be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than grids, and a much more scalable platform than the largest of commodity clusters. Clouds also promise to “scale by credit card,” that is, to scale up instantly and temporarily within the limitations imposed only by the available financial resources, as opposed to the physical limitations of adding nodes to clusters or even supercomputers and to the administrative burden of over-provisioning resources. Moreover, clouds promise good support for bags-of-tasks, which currently constitute the dominant grid application type. However, clouds also raise important challenges in many aspects of scientific computing, including performance, which is the focus of this work.