BGP Router Crash in Telehouse North Backbone

Event Started
2022-04-13 10:46
Report Published
2022-04-13 11:10
Last Updated
2022-04-16 15:16
Event Finished

Timeline (most recent first)
  • 2022-04-16

    We are tentatively marking this as closed.

  • 2022-04-15

    The new BGP process on earhart has remained stable throughout the day. We note that average CPU usage over the last 16+ hours is approximately one third of the previous averages recorded for comparable periods and traffic loads.

    We are continuing to monitor.

  • 2022-04-15

    The new BGP process on earhart remains stable. We are continuing to monitor.

  • 2022-04-15

    We have enabled interfaces on earhart, which is now running the newer BGP process, and are monitoring the situation.

  • 2022-04-14

    We have prepared an update to the router, which includes a more recent version of the BGP routing process. We are going to perform some testing, before deciding whether to reintroduce this into the network.

  • 2022-04-14

    The BGP process on our peering and transit router in Telehouse North has once again crashed spontaneously, causing a period of network instability.

    We've removed earhart from service while we investigate this issue.

  • 2022-04-13

    During our ongoing monitoring we noticed that some prefix-lists had become corrupted in the running configurations (deviating from the saved configurations). We have removed and re-applied these, and the affected traffic flows are now going via the expected paths.

  • 2022-04-13

    The network has been completely stable for the last hour, since we fixed the RIB/FIB mismatches.

    We are continuing to monitor the network.

  • 2022-04-13

    We have identified and resolved some lingering RIB/FIB mismatches.

  • 2022-04-13

    The BGP process is still running.

  • 2022-04-13

    The router has booted up again.

  • 2022-04-13

    The router ran for 6 minutes before the BGP process crashed:

      $ show ip bgp sum
      vtysh: error reading from bgpd: Connection reset by peer (104)
      Warning: closing connection to bgpd because of an I/O error!
      Warning: connecting to bgpd...failed!
      bgpd is not running

  • 2022-04-13

    The affected router is finishing its reboot.

  • 2022-04-13

    We've received alerts about a routing issue in Telehouse North, blackholing significant amounts of traffic.
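
The prefix-list corruption noted in the timeline was a deviation of the running configuration from the saved one. As a rough illustration of how such drift could be spotted, the sketch below compares the prefix-list entries in two configuration dumps. It is not the tooling used during this incident. The function names, the sample list name PEER-IN and the sample prefixes are hypothetical, and the `ip prefix-list NAME seq N permit PREFIX` line format is an assumption about the router's configuration syntax.

```python
from collections import defaultdict


def extract_prefix_lists(config_text):
    """Group prefix-list lines by (address family, list name).

    Assumes entries look like 'ip prefix-list NAME seq N permit PREFIX'
    (or 'ipv6 prefix-list ...'); every other line is ignored.
    """
    lists = defaultdict(set)
    for raw in config_text.splitlines():
        tokens = raw.split()
        if len(tokens) >= 3 and tokens[0] in ("ip", "ipv6") and tokens[1] == "prefix-list":
            # Normalise whitespace so indentation differences are not reported.
            lists[(tokens[0], tokens[2])].add(" ".join(tokens))
    return dict(lists)


def prefix_list_drift(saved_text, running_text):
    """Return the prefix-lists whose entries differ between the two configs."""
    saved = extract_prefix_lists(saved_text)
    running = extract_prefix_lists(running_text)
    drift = {}
    for key in sorted(set(saved) | set(running)):
        saved_entries = saved.get(key, set())
        running_entries = running.get(key, set())
        if saved_entries != running_entries:
            drift[key] = {
                "missing_from_running": sorted(saved_entries - running_entries),
                "unexpected_in_running": sorted(running_entries - saved_entries),
            }
    return drift


# Hypothetical sample data using documentation prefixes.
SAVED = """\
ip prefix-list PEER-IN seq 5 permit 192.0.2.0/24
ip prefix-list PEER-IN seq 10 permit 198.51.100.0/24
"""
RUNNING = """\
ip prefix-list PEER-IN seq 5 permit 192.0.2.0/24
"""

if __name__ == "__main__":
    for (family, name), diff in prefix_list_drift(SAVED, RUNNING).items():
        print(family, name, diff)
```

Entries are compared as sets because each line carries its own sequence number, so ordering differences in the dump alone are not reported as drift.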