Whilst our own and external monitoring shows no failures of application or network here, we have isolated this problem to the Slough proxy.
According to log files, it was failing to load-balance some requests but was still responding with a 100 Trying and was otherwise available. This caused outbound calls to time out where customer equipment was not respecting SRV records or was not failing over on time-out; other customers were unaffected. Inbound calls were affected where they landed in Slough, because our own SS7 edge was not failing over to London at the SIP layer once the call had already been accepted at the SS7 layer.
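To illustrate the failover behaviour described above, here is a minimal sketch of how customer equipment might walk a set of SRV targets when one proxy answers with 100 Trying but never sends a final response. It is an illustration under stated assumptions, not a description of any particular vendor's implementation or of our own configuration: the hostnames, the `SrvRecord` type, and the `send_invite` callback are all hypothetical, and target ordering is simplified to a deterministic sort rather than the weighted random selection that RFC 2782 describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SrvRecord:
    """One SRV answer (hypothetical container for this sketch)."""
    priority: int
    weight: int
    target: str
    port: int


def order_targets(records: Iterable[SrvRecord]) -> List[SrvRecord]:
    # Lower priority values are tried first. Within one priority this
    # sketch sorts by descending weight to stay deterministic; RFC 2782
    # calls for weighted random selection instead.
    return sorted(records, key=lambda r: (r.priority, -r.weight))


def place_call(
    records: Iterable[SrvRecord],
    send_invite: Callable[[SrvRecord], Optional[int]],
) -> Optional[Tuple[str, int]]:
    """Try each SRV target in order and return (target, final status).

    `send_invite` is a hypothetical callback that sends an INVITE and
    returns the final response code, or None if no final response
    arrived within the caller's time-out. A 100 Trying is provisional,
    so a target that only ever sends 100 Trying must yield None.
    """
    for record in order_targets(records):
        status = send_invite(record)
        if status is None or status == 503:
            # Time-out or 503 Service Unavailable: move to the next target.
            continue
        return record.target, status
    return None  # every target timed out or returned 503


if __name__ == "__main__":
    # Hypothetical hostnames; the first proxy accepts the request but stalls.
    records = [
        SrvRecord(10, 0, "slough.example.net", 5060),
        SrvRecord(20, 0, "london.example.net", 5060),
    ]

    def fake_invite(record: SrvRecord) -> Optional[int]:
        return None if record.target.startswith("slough") else 200

    print(place_call(records, fake_invite))
```

The key design point is that the time-out is measured against the final response rather than the first response, so a proxy that acknowledges the request but never completes it is still treated as failed.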
We have identified a couple of configuration improvements that we intend to make at the earliest opportunity to better handle this scenario.
Whilst this appears to have been a brief issue that affected some customers but not others, we apologise for any inconvenience caused.
Posted Jun 29, 2018 - 16:10 UTC
We continue to investigate the cause of this.
Posted Jun 29, 2018 - 15:23 UTC
We've had a few reports of call failures coming into the network at around 15h30. Whilst this apparently recovered a few minutes later, we're currently investigating the cause. Our own on-net and external monitoring doesn't show any failures at the IP network or application layer.
Posted Jun 29, 2018 - 14:47 UTC
This incident affected: Availability Zones (London, Slough, Manchester).