Routing instability London
Incident Report for Simwood
Postmortem

Cogent acknowledged this incident an hour or so after we mitigated it and have now provided the following RFO (reason for outage):

Your service connected at London Volta may have experienced connectivity issues for some minutes.

During the execution of a non impacting maintenance in our network, we made a mistake and applied some configuration changes on the wrong device, causing the isolation of a node router. As soon as we realized about the mistake, we reverted the config changes on the affected device.

This has been a human error. However we will review the maintenance process to see if there is any room of improvement to avoid a similar issue in the future.

Apologies for any inconvenience caused.

Thankfully, we have multiple transit providers and multiple connections to each distributed around the network, so shutting one down in this kind of situation, or even for a prolonged period, is a non-event. Further, only 10% of our traffic flows over transit - 90% is directly on-net or flows over bilateral peering.

We continue to encourage customers to connect as directly as possible - either by being on-net directly, by cross-connecting to us in a common data centre, or by having their colocation provider peer with us if they don’t already. Please speak to us if you’d like to arrange this.

We will be testing Cogent’s fix and re-enabling this session later this evening.

Posted Jul 26, 2019 - 10:31 UTC

Resolved
Between 23.21 and 23.30 (UK time), customers connecting to services in our London availability zone but reaching us over Cogent transit may have seen instability. Cogent were announcing our routes but not passing traffic. We shut the session down from our side to mitigate this.

Customers who are on-net, who reach us over peering sessions (the majority) or over other ultimate transit providers, or who reach other availability zones over Cogent, would have been unaffected.

We strongly encourage all customers to connect into us directly wherever possible.
Posted Jul 25, 2019 - 22:37 UTC
This incident affected: Availability Zones (London).