Disaster recovery experts dig down into Azure cloud outages over past 12 months

James is editor in chief of TechForge Media, with a passion for how technologies influence business and several Mobile World Congress events under his belt. James has interviewed a variety of leading figures in his career, from former Mafia boss Michael Franzese, to Steve Wozniak, and Jean Michel Jarre. James can be found tweeting at @James_T_Bourne.


The majority of Microsoft’s service errors in the first quarter of 2014 were advisory, while there were significantly more service interruptions in the following three quarters, according to analysis carried out by CloudEndure.

The figures, taken from Azure’s Service Health Dashboard for last year, show three full service interruptions in Q1, a whopping 28 in Q2, 16 in Q3 and zero in the final quarter. Q1 had the highest number of errors (259) yet the lowest number of partial service interruptions (88), compared to 134, 129 and 127 for the other three quarters.

The analysis came about after Azure suffered two debilitating outages last year: one in August and one in November, the latter caused by storage blob front ends going into an infinite loop, a fault which went undetected during testing.

Picture credit: CloudEndure

John Dinsdale, chief analyst at Synergy Research, told CloudTech the outage was “really not good”, and Microsoft’s response was “an awful lot less than stellar.” This came after research which found Microsoft had taken a clear second place, behind AWS, naturally, in the global cloud infrastructure market.

Not surprisingly, Americas West had the most outages overall (114), followed by Americas East (98) and Europe West (91), although Europe West and Americas North Central had the most full service interruptions (5). By service, compute had by far the most errors (135), followed by SQL databases (124), virtual machines (64), websites (61) and storage (55).

Despite these figures, CloudEndure is quick to point out that statistics should be taken with a pinch of salt. “Planning the location of your app based on the historical number of errors and performance issues is probably not the best approach”, a blog post reads. “While cloud provider issues are important, you should keep in mind that the top reason for application downtime remains human error.”

It’s certainly the reason Joyent’s servers unexpectedly went down back in May – and it’s good advice to follow here.
