Amazon Traces Cloud Outage To Faulty Breaker
Improperly configured breaker opened and brought down portion of cloud [Data Center Dynamics]
Amazon Web Services has released details about the root cause of an outage in one of its public cloud’s availability zones, which started on the evening of 14 June and lasted until the next morning, US Pacific time.
In a note posted on the cloud’s status dashboard, the company said the outage was caused by a cable fault in the power distribution system of the electric utility that served the data center hosting the US-East-1 region of the cloud in northern Virginia.
The entire facility was switched over to back-up generator power, but one of the generators overheated and shut down because of a defective cooling fan. The virtual-machine instances and virtual-storage volumes powered by this generator were transferred to a secondary back-up power system, fed by a separate power-distribution circuit with its own back-up generator capacity.
But one of the breakers on this back-up circuit was configured incorrectly: it was set to open at too low a power threshold, so it opened as soon as the load was transferred to the circuit.
“After this circuit breaker opened … the affected instances and volumes were left without primary, back-up, or secondary back-up power,” Amazon’s note read.