BGP Router Crash in Telehouse North Backbone
- Event Started
- 2022-04-13 10:46
- Report Published
- 2022-04-13 11:10
- Last Updated
- 2022-04-16 15:16
- Event Finished
- Ongoing
Timeline (most recent first)
-
2022-04-16
15:15:00 We are tentatively marking this as closed.
-
2022-04-15
22:13:00 The new BGP process on earhart has remained stable throughout the day. Average CPU usage over the last 16+ hours is approximately one third of the averages previously recorded for comparable periods and traffic loads. We are continuing to monitor.
-
2022-04-15
05:57:00 The new BGP process on earhart remains stable. We are continuing to monitor.
-
2022-04-15
04:57:00 We have enabled interfaces on earhart, which is now running the newer BGP process, and are monitoring the situation.
-
2022-04-14
16:30:00 We have prepared an update for the router that includes a more recent version of the BGP routing process. We will perform some testing before deciding whether to reintroduce it into the network.
-
2022-04-14
09:32:00 The BGP process on our peering and transit router in Telehouse North, earhart.n.faelix.net, has once again crashed spontaneously, causing a period of network instability. We've removed earhart from service while we investigate this issue.
-
2022-04-13
19:36:00 During our ongoing monitoring we noticed that some prefix-lists had been corrupted in the running configuration (deviating from the saved configuration). We have removed and re-applied these, and the affected traffic flows are now going via the expected paths.
-
2022-04-13
14:23:00 The network has been completely stable for the last hour since we fixed the RIB/FIB mismatches.
We are continuing to monitor the network.
-
2022-04-13
13:24:00 We have identified and resolved some lingering RIB/FIB mismatches.
-
2022-04-13
11:42:00 The BGP process remains running.
-
2022-04-13
11:34:00 The router has booted up again.
-
2022-04-13
11:13:00 The router ran for 6 minutes before the BGP process crashed:
    vyos@earhart.n.faelix.net:~$ show ip bgp sum
    vtysh: error reading from bgpd: Connection reset by peer (104)
    Warning: closing connection to bgpd because of an I/O error!
    Warning: connecting to bgpd...failed!
    bgpd is not running
-
2022-04-13
11:00:00 The affected router is finishing its reboot.
-
2022-04-13
10:36:00 We've received alerts about a routing issue in Telehouse North that is blackholing significant amounts of traffic.