Problem
The Incident Scenario
Our company, "Metarail", is one of the largest enterprises in the country. Because of historical power supply incidents at other railway companies [12,13], and because of structural needs specific to "Metarail", our head office and its data center are in the same location (adjacent buildings).
The whole company, being a massive enterprise, consumes a lot of electricity and thus has a separate power supply line from the local power distribution company called "EnerGo Ltd."
"Metarail" received a notification from "EnerGo Ltd." that maintenance would be carried out on the power line on May 10th, and that it would require a complete cut-off of the consumer attached to that line.
During the maintenance, "Metarail" needs to rely on other power sources: 70% will be supplied from a reserve power line provided by "EnerGo Ltd." The other 30% will need to be drawn from autonomous generating systems (fuel generators), which will also be supplied by the same power distribution company.
After all required preparations were finished, the switch from the main power line to the alternative power sources was carried out as planned at midnight on May 10th. The maintenance was scheduled to take 24 hours.
The alternative power sources were controlled by technicians of "EnerGo Ltd." who guaranteed the proper functioning of the infrastructure and the uninterrupted operation of "Metarail" in general.
Close to 11:00 PM on May 10th, a delay occurred, and the maintenance work continued until 10:00 AM on May 11th.
When everything was completed, specialists of "EnerGo Ltd." declared that they were ready to switch back to the main power supply line. They also expressed concern that a short power supply interruption (a few seconds) could occur during the switch, and thus recommended doing the switch after working hours.
Our management discussed the risk with the heads of the responsible departments. Considering that all critical systems are backed up by UPSs, and that the previous day had shown some deficit of electricity in the system, they decided to switch to the main power supply line immediately.
Engineers of "EnerGo Ltd." started to perform the switch to the main power line. Six minutes into the procedure, the reserve power line and the autonomous generators were disconnected from the consumer, and the UPSs started to work. At 6 minutes and 10 seconds, after the main line was connected to our system, an unprecedented voltage spike was detected in our company's electricity network and all our systems went down. The action plan for switching back to the reserve power line and the autonomous power sources was activated, but even after reconnecting, none of the systems worked.
The IT department reported that all major servers were down and would not reboot. This impacted critical business systems and processes: the payment system, the ticket distribution system, the scheduling platform, the website, and many others. Because "Metarail" is a critical company, the time available for recovery is very limited. We activated the Business Continuity Plan (BCP) and the Disaster Recovery Plan to restore the functionality of "Metarail". However, that happened only after "Metarail" had lost valuable time, which affected the company, its customers, and its reputation.
Each participant in the group must assume one of the following roles:
1. President - Make the big decisions, ensure the viability of the company
2. IT Manager - Focus on the IT systems to support the business
3. Operations Manager - Focus on the clients and employees
4. Legal/HR - Ensure compliance with applicable laws/regulations and assist with hiring/firing/overtime
5. Communications - Manage the internal and external communications during the crisis
Section I of this problem involves executing the tabletop exercise from the point of view of each of the above roles. My role is Operations Manager (focus on the clients and employees). Write from the perspective of each of these roles on how you would handle the incident. Since you all work for the same organization, the response must be coordinated.
Section II of this problem focuses on the two discussion questions. Each group must explore each question from the perspective of their organization.
Two Questions Related To The Scenario.
I. What countermeasures could "Metarail" have considered to mitigate this risk and minimize the impact of a system failure, and what first steps need to be taken in response to this incident?
II. How should "Metarail" have designed its communication strategy with its internal and external stakeholders during this incident to mitigate risks to its reputation, as well as to its customers' trust and loyalty?