Slough BT emergency maintenance
Incident Report for Simwood
Postmortem

We have posted a post-mortem for this and related incidents on our blog.

Posted Nov 12, 2015 - 16:27 UTC

Resolved
The BT chassis has been swapped out and our internal limits lifted.

We received a total of zero support tickets during this window, so thank you for your patience; we trust our handling of it avoided any issues for you.

Unfortunately, however, this has made no difference to the two exchanges which remain out of service, so BT need to investigate further. As we mentioned on Saturday, these are very light-traffic sites, and whilst their absence reduces redundancy for incoming calls from some parts of London, it only does so to the level most competitors operate at normally.

We will issue a post-mortem on this incident in due course as, whilst we feel it validates our approach, we're hard on ourselves and there are inevitably lessons to learn.
Posted Nov 09, 2015 - 16:02 UTC
Update
BT have confirmed that there are no faults on the links and they therefore need to swap their chassis. They will do this in the next 10-15 minutes.

We have identified two customers whose traffic profiles may adversely affect others during this time; they have been notified of restrictions. Other customers will see traffic flow through our other sites and, of course (for UK traffic), the other carriers we interconnect with.

We continue to monitor.
Posted Nov 09, 2015 - 15:32 UTC
Update
This work is now scheduled to begin at 1500Z, but BT will be running diagnostics in the first instance. These will determine whether the equipment does in fact need replacing or whether the issue is off-site.
Posted Nov 09, 2015 - 14:02 UTC
Monitoring
Following Saturday's BT issues, BT continue to work on the remaining circuits that were not restored when they fixed their Ealing fault. As part of this, they wish to replace some equipment in our rack, which means we need to take Slough entirely out of route (SS7 only; IP remains in service) later this afternoon. The present estimate is 4-5pm. Customers do not need to take any action, and this should not be service-affecting.

As customers will have seen on Saturday, our network is designed so that traffic seamlessly flows to another site in the event of one being out of service. We connect to each BT exchange multiple times, so in this scenario BT simply see a drop in capacity to us, rather than us having to ask them to re-route calls. This is far more expensive and quite unusual in the marketplace, but it is the only way we will do it. On this occasion, that drop in capacity and redundancy will affect all circuits in Slough during a relatively busy period. We will therefore be monitoring internally and imposing internal channel limits to ensure traffic remains balanced, although there should be enough headroom available.
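
To illustrate, here is a minimal sketch in Python of how per-site channel limits and failover of this kind can work. It is purely hypothetical: the site names, limits, and most-headroom balancing policy are assumptions for illustration, not Simwood's actual routing logic.

    # Hypothetical sketch only: per-site channel limiting with failover.
    # Site names and limits are invented, not Simwood's real configuration.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Site:
        name: str
        channel_limit: int       # internal cap imposed during the work
        in_service: bool = True  # False while a site is out of route
        active_calls: int = 0

        def headroom(self) -> int:
            return self.channel_limit - self.active_calls

    def route_call(sites: list[Site]) -> Optional[Site]:
        """Place the call on the in-service site with the most headroom,
        keeping traffic balanced rather than piling onto one site."""
        candidates = [s for s in sites if s.in_service and s.headroom() > 0]
        if not candidates:
            return None  # every site is at its internal limit
        chosen = max(candidates, key=lambda s: s.headroom())
        chosen.active_calls += 1
        return chosen

    # Slough out of route: new calls flow to the remaining sites.
    sites = [Site("Slough", 2000, in_service=False),
             Site("SiteB", 2000), Site("SiteC", 2000)]
    print(route_call(sites).name)  # "SiteB" (first site with most headroom)

The point of the balancing policy is the one described above: when one site drops out, the remaining sites absorb its calls within their own limits, so the far end only ever sees a reduction in capacity rather than a failure.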

This issue also highlights a difference in the way we manage our physical capacity with BT that is worth mentioning. Competitors may boast about how many points of interconnect they have, but then blame their "network partner" for outages. In addition to having every point of interconnect doubled, we take the more expensive option of having BT manage mutual interconnect capacity right the way to our racks. This issue would have been far harder to diagnose, and to assign responsibility for fixing, if we'd done it the more 'normal' way. For all their faults, and our views on their IP products, BT know how to run a TDM network.

We will post an update when this work is due to begin and report on progress throughout.
Posted Nov 09, 2015 - 13:24 UTC