Outbound Calls Failing
Incident Report for Simwood
Resolved
This incident was fully resolved earlier.

A bug was introduced yesterday to a module on which the micro-service that authenticates username/password calls depends. It was functional and passed tests, but the module concerned was updated on container restart, which meant the change progressively worked its way through production instances overnight. As a result, containers that restarted did not report themselves as available for service and were therefore unavailable. This is an anycast service, and the last remaining available instance restarted into this state after 8am this morning.
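The gradual loss of instances described above is the kind of condition that can be caught before the last instance goes. A minimal sketch in Python, assuming a hypothetical list of instances that each expose an "available" flag (the names and the threshold are illustrative, not a description of our actual tooling):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical record for one container of the auth micro-service.
    name: str
    available: bool  # whether the instance reports itself available for service


def availability_alert(instances: List[Instance], min_available: int = 2) -> Optional[str]:
    """Return an alert message if too few instances report available, else None."""
    up = [i for i in instances if i.available]
    if len(up) < min_available:
        return f"ALERT: only {len(up)} of {len(instances)} auth instances available"
    return None


# Illustrative state part-way through the night: two of three containers
# have restarted and are no longer reporting themselves as available.
fleet = [
    Instance("auth-1", False),
    Instance("auth-2", False),
    Instance("auth-3", True),
]
print(availability_alert(fleet))
```

Alerting on a falling count, rather than only on zero, would flag the problem while at least one instance is still serving calls.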

Auth trunks (username/password authenticated) represent less than 1% of calls, so this was not apparent in call volumes or failure rates, and was not reported by high-volume customers. As the services were passing build tests and were operational, things looked normal until we received credible reports from customers after 8am this morning and actively investigated.
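To illustrate why an aggregate failure rate hides this kind of fault, here is a rough sketch in Python that compares the overall rate with the rate per authentication type. The function name, the labels "ip" and "userpass", and the call counts are all made up for illustration:

```python
from collections import Counter


def failure_rates(calls):
    """calls: iterable of (auth_type, succeeded) pairs.

    Returns (overall_failure_rate, {auth_type: failure_rate}).
    """
    total = Counter()
    failed = Counter()
    for auth_type, succeeded in calls:
        total[auth_type] += 1
        if not succeeded:
            failed[auth_type] += 1
    all_calls = sum(total.values())
    if all_calls == 0:
        return 0.0, {}
    overall = sum(failed.values()) / all_calls
    per_type = {t: failed[t] / total[t] for t in total}
    return overall, per_type


# Illustrative numbers only: 995 IP-authenticated calls succeed,
# while all 5 username/password calls fail.
calls = [("ip", True)] * 995 + [("userpass", False)] * 5
overall, per_type = failure_rates(calls)
print(overall)               # well under 1% overall
print(per_type["userpass"])  # every call of this type failed
```

Monitoring each authentication type separately makes a total failure of a small class stand out, even when the overall figure barely moves.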

To compound the confusion, our own PBX runs on-net on a customer's hosted PBX platform and, for various reasons, relied on an auth trunk to route out-of-hours calls. Customers who were testing service by ringing our office therefore wrongly concluded this was a wider issue. Our test line is 0330 122 9999 (or indeed 999/9999 on any Simwood number range) and these were fully functional throughout. The issue exclusively affected outbound calls for the sub-1% of calls that rely on username/password authentication; inbound calls and IP address authenticated outbound calls were unaffected.

Whilst this has never happened before, it has given us a number of lessons on how we can improve the monitoring of such services. Apologies to those affected.
Posted Apr 02, 2019 - 09:56 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 02, 2019 - 08:35 UTC
Identified
We are aware of an issue affecting some outbound calls from customers using username/password auth trunks. The issue has been identified and we’re working on resolving it as a priority. Customers using IP authentication, and inbound calls, are not affected.
Posted Apr 02, 2019 - 08:27 UTC