Cloud computing providers are under great pressure to reduce operational costs through improved energy utilization while providing dependable service to customers. It is therefore essential to understand and quantify the explicit impact of failures within a system in terms of energy costs. This paper presents the first comprehensive analysis of the impact of failures on energy consumption in a real-world, large-scale cloud system (comprising over 12,500 servers), including a study of failure and energy trends across the system's spatial and temporal environmental characteristics.
Our results show that 88% of task failure events occur in lower-priority tasks, producing 13% of total energy waste, and that 1% of failure events occur in higher-priority tasks due to server failures, producing 8% of total energy waste. These results highlight a counterintuitive but significant impact of failures on energy consumption and provide a strong foundation for research into dependable, energy-aware cloud computing.
Authors: Peter Garraghan, Ismael Solis Moreno, Paul Townend, Jie Xu