At 15:54 we received alerts of ports flapping in London, apparently caused by a dark fibre issue. We've shut down the affected path and are investigating.
Fault with Dark Fibre Link in London Backbone
- Event Started: 2022-11-11 15:54
- Report Published: 2022-11-11 16:05
- Last Updated: 2022-11-26 03:36
- Event Finished
Timeline (most recent first)
The works are complete, and we are now monitoring the network before closing out this maintenance.
We are about to swap the THW end.
We have swapped the THN end of the link and are monitoring the situation before proceeding to do the THW end.
We are commencing works.
Engineers have arrived on site and are preparing for the works.
We've been advised by our supplier that it may be another two weeks before the necessary replacement parts arrive... assuming no further delays to shipping in the run-up to the winter holiday period.
We normally light two wavelengths between Telehouse North (THN) and Telehouse West (THW): a short, direct route, and a long diverse path passively traversing our equipment in Interxion's Brick Lane/Hanbury Street campus. Unfortunately, the optical fault affects the short, direct path. As the long path is approximately 20 km, running through various ducts and street chambers within London, we assess that it is at a much higher risk of being impacted by third parties than a connection entirely under the control of Telehouse and completely within their Docklands campus. Indeed, we experienced one such issue on 25th October, which we attribute to third-party works carried out on one of those links between Docklands and Brick Lane.
Given the significant delays to delivery of replacement optics, we have decided to schedule emergency maintenance on the THW-THN direct path. We apologise for the short notice, but we feel that this is the best option to minimise risk while we await a permanent fix.
Unfortunately, this will involve a brief interruption to the connectivity between those two points of presence, as we currently do not have sufficient spare equipment at both THW and THN to establish both the direct and long paths concurrently. We will therefore carry out this maintenance tonight between 03:00 and 04:00 UTC, the quietest traffic period on our network, to minimise any disruption it may cause.
We enabled one end of the suspect link this morning to try to determine whether the optics or the fibre itself was the underlying cause. Unfortunately, this caused our ring switch device in Telehouse West to believe it had a functioning link to Telehouse North via the direct path (even though the Telehouse North end was disabled), and so it began blackholing about 50% of the traffic between THW and THN, which caused a short period of packet loss and disruption. We have disabled the port again, and will now talk to our DWDM vendor about warranty replacement of the failed part.
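As a rough illustration of why a link that only one end believes is up can blackhole around half the traffic, the following minimal sketch assumes that flows are spread evenly across the direct and long paths by a simple per-flow hash. That assumption, and the names `pick_path` and `blackholed_fraction`, are hypothetical and for illustration only; the report does not describe how the ring switch actually distributes traffic.

```python
from __future__ import annotations


def pick_path(flow_id: int, paths: list[str]) -> str:
    """Choose a path for a flow with a simple modulo hash (an assumed policy)."""
    return paths[flow_id % len(paths)]


def blackholed_fraction(flow_count: int,
                        believed_up: list[str],
                        actually_up: set[str]) -> float:
    """Fraction of flows sent onto a path the sender believes is up but is not."""
    lost = sum(
        1
        for flow_id in range(flow_count)
        if pick_path(flow_id, believed_up) not in actually_up
    )
    return lost / flow_count


# THW believes both paths work, but only the long path really carries traffic.
one_sided = blackholed_fraction(1000, ["direct", "long"], {"long"})

# With the direct port disabled again, THW only uses the long path.
port_disabled = blackholed_fraction(1000, ["long"], {"long"})

print(one_sided, port_disabled)
```

Under this assumed even split, half of the flows are hashed onto the dead direct path, which is consistent with the "about 50%" figure above; disabling the port removes that path from selection and the loss stops.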
- down for 29 seconds
- up for 120 seconds
- down for 28 seconds
- up for 4 seconds
- down for 20 seconds
- up for 183 seconds
- down (and administratively staying down)
We've observed that the link flapped hard three times, before settling into a state where one end of the link believed the link was up while the other end believed it to be down.
For now we've removed the link from service pending more tests.
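The flap sequence listed above can be tallied with a short script. This is only an illustrative sketch: the durations are copied from the list, the final administrative "down" is left out because it has no duration, and the names `Interval` and `summarise` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    state: str    # "down" or "up"
    seconds: int  # how long the link stayed in that state


# Durations copied from the observed sequence; the last, administrative
# "down" has no duration and is therefore not included.
OBSERVED = [
    Interval("down", 29),
    Interval("up", 120),
    Interval("down", 28),
    Interval("up", 4),
    Interval("down", 20),
    Interval("up", 183),
]


def summarise(intervals):
    """Count the down periods and total the time spent in each state."""
    down = [i.seconds for i in intervals if i.state == "down"]
    up = [i.seconds for i in intervals if i.state == "up"]
    return {
        "flaps": len(down),
        "down_seconds": sum(down),
        "up_seconds": sum(up),
        "shortest_up_seconds": min(up, default=0),
    }


summary = summarise(OBSERVED)
print(summary)
```

For the listed values this gives three down periods totalling 77 seconds, with the shortest recovery lasting only 4 seconds.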