Core router crash: aebi.m.faelix.net

Degraded Performance

published: 2020-04-17 07:54
started: 2020-04-17 07:43:06
finished: 2020-04-17 10:00
resolved: 2020-04-17 16:00

One of our core routers, aebi.m.faelix.net, has dropped all BGP sessions and restarted. We are investigating the cause. Meanwhile, traffic has re-routed to other core routers and transit links.


Timeline

2020-04-17 07:56

aebi is booting up again, and we will begin monitoring the situation as we bring it back into service.

2020-04-17 08:32

aebi started up again, but the BGP process crashed again almost immediately afterwards. We are investigating the cause.

2020-04-17 09:02

We have observed some routing instability where traffic intended for our Manchester datacentres was being routed by Cogent via the USA. This seems to have settled down now, but we are continuing to watch the situation.

2020-04-17 09:19

aebi seems to be running stably now, after an update to the operating system and routing engine software. We are continuing to monitor this.

2020-05-11 14:47

Power seems to be stable, and all our providers have been back online for the last hour. However, we are awaiting an update from Equinix about the overall long-term state of the datacentre.

2020-05-13 11:51

The latest update from Equinix indicates that they are about to begin work to repair the generator. The whole of MA1 appears to still be at risk (“N” redundancy). “Equinix IBX site staff reports that they continue to work with the vendor to resolve the issues with the Units 1-3 UPS system. The works will involve transferring the load from utility supply to generator supply. The UPS system will then be transferred into bypass mode to enable the repairs works to continue. This work is scheduled to commence at 13-MAY-2020 12:30 Site Local Time. The redundancy remains at N.”