The Northeastern and Midwestern parts of the United States, along with Ontario, Canada, suffered a major power outage on Thursday afternoon, August 14, 2003, at about 4:10 p.m. Eastern Daylight Time (EDT). At the time it was the second-largest blackout in history, affecting nearly 10 million people in Ontario and approximately 45 million people in the states of New Jersey, Ohio, Connecticut, Michigan, Massachusetts, Pennsylvania, Vermont and New York.
The main cause behind this blackout was a bug in GE Energy's alarm system in the control room of FirstEnergy Corporation, a company based in Akron, Ohio. Because of this bug, the operators received no alarms. FirstEnergy's system operators were therefore unaware that the transmission lines had become overloaded and that a race condition had been triggered in the energy management system (EMS) software. As a result, they failed to take any action, and what would otherwise have been an easily manageable local blackout turned into a widespread one. The bug was buried so deeply in the code that it took weeks of analyzing millions of lines of code and data to finally fix it.
The most important cause of this software failure was that the software had no failure detection module for the alarm system.
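To make the term "race condition" concrete, the following is a generic Python sketch of the classic lost-update problem and one common remedy, a lock. It is not a reconstruction of the actual alarm software, whose code is not shown here; the names `lost_update_demo`, `SafeCounter` and `run_workers` are hypothetical and chosen only for illustration.

```python
import threading


def lost_update_demo():
    """Replay, step by step, an interleaving in which one update is lost."""
    shared = {"count": 0}
    a = shared["count"]      # writer A reads 0
    b = shared["count"]      # writer B reads 0 before A has written back
    shared["count"] = a + 1  # A writes 1
    shared["count"] = b + 1  # B also writes 1, so A's update is lost
    return shared["count"]


class SafeCounter:
    """Counter whose read-modify-write step is protected by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def increment(self):
        # Only one thread at a time may execute the update.
        with self._lock:
            self.count += 1


def run_workers(counter, n_threads=4, n_increments=1000):
    """Increment the counter from several threads and return the total."""
    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.count
```

In the replayed interleaving, two increments leave the count at 1 instead of 2. With the lock in place, `run_workers(SafeCounter())` yields the full 4000 increments regardless of how the threads are scheduled.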
The alarm software had already failed at 14:14, but the IT staff remained completely unaware of the failure for the next 40 minutes, until the second EMS server stopped functioning. Even then, the staff thought that only the server had failed, not realizing that the alarm system had stopped working about 40 minutes earlier. This happened because FirstEnergy Corporation had no provision for periodic diagnostics of the alarm processor, which would have helped detect the alarm system failure.
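One common way to implement the kind of periodic diagnostics described above is a heartbeat watchdog: the monitored process reports in at regular intervals, and a separate checker raises a flag when the reports stop. The Python sketch below is a minimal illustration under that assumption, not a description of any system FirstEnergy or GE Energy actually used. The class names `AlarmWatchdog` and `FakeClock` and the 60-second timeout are hypothetical.

```python
import time


class AlarmWatchdog:
    """Flags the alarm processor as failed when its heartbeat goes stale."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.last_beat = clock()

    def beat(self):
        """Called by the alarm processor each time it completes a cycle."""
        self.last_beat = self.clock()

    def seconds_silent(self):
        return self.clock() - self.last_beat

    def has_failed(self):
        """True once no heartbeat has arrived within the timeout."""
        return self.seconds_silent() > self.timeout_s


class FakeClock:
    """Manually advanced clock, used here in place of real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
watchdog = AlarmWatchdog(timeout_s=60, clock=clock)  # 60 s is an assumed value

clock.advance(30)
healthy = not watchdog.has_failed()   # heartbeat is still fresh

watchdog.beat()
clock.advance(61)                     # alarm processor goes silent
stalled = watchdog.has_failed()       # silence now exceeds the timeout
```

A periodic job that calls `has_failed()` and notifies operators through a channel independent of the alarm system itself would surface a stalled alarm processor within about a minute, rather than leaving it unnoticed.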
The computer support staff at the FirstEnergy control center soon restored the server but did not fully test all of the application's functionality, so they remained unaware of the alarm system failure. In addition, the FirstEnergy operators lacked an effective alternative means of visualizing and analyzing the condition of their systems. Had such a capability existed, the operators would readily have learned of the alarm system failure and could have warned MISO and neighboring systems, putting them on alert to monitor conditions more closely.
In some parts, power was restored by 11 p.m., but most of the affected areas did not receive power for 2 to 4 days, and in some parts of Ontario it took about a week to get the power back. This contributed to an estimated cost of around $5 billion to $10 billion. Because of this blackout, Canada's gross domestic product went down by 0.7% in August, as a result of the loss of 18.9 million work hours.