The Cloud's Cloudy Moment: A Systematic Survey of Public Cloud Service Outage

Abstract

Inadequate service availability is the top concern when adopting Cloud computing. It has been recognized that zero downtime is impossible for large-scale Internet services. Nevertheless, by learning from their own and others' past mistakes, Cloud vendors can minimize the risk of future downtime or at least keep the downtime short. To facilitate summarizing lessons for Cloud providers, we performed a systematic survey of public Cloud service outage events. This paper reports the results of this survey. In addition to a set of findings, our work generated a lessons framework by classifying the outage root causes. The framework can in turn be used to organize outage lessons for reference by Cloud providers. In our future work, this lessons framework will be smoothly expanded by including potentially new root causes.
