Switch Crash in Manchester

We are investigating a core switch crash in Manchester, which caused a brief interruption to connectivity for our customers' VMs and colocation.

2018-07-24
16:17:00

We will be working with the switch vendor's support team to determine whether this was caused by a known defect.
2018-07-24
16:15:00

We believe the connectivity interruption was caused by a "split-brain" on our core switches: the master failed to respond to heartbeat messages, another master was elected, and then the original master started functioning normally again. The switches "solved" the split-brain by rebooting the older master switch, which caused a brief interruption to connectivity.
2018-07-24
16:05:00

The switch appears to have rebooted, coming back into full service just before 16:56:00. We are investigating why this has happened.
2018-07-24
16:04:00

A core switch became unresponsive at 16:54:16 local time today. Our engineers began looking into it as soon as alerts were raised a few seconds later.