The Amazon Web Services (AWS) EC2 North Virginia cloud went down for approximately four hours yesterday, due to ‘elevated packet loss’.
The issue was first reported on the AWS Service Health Dashboard at 0208 PDT (0908 GMT) on October 15, with the dashboard cautiously noting “possible network connectivity issues” for EC2.
‘Possible’ became confirmed at 0219 PDT (0919 GMT), with a further update at 0320 PDT (1020 GMT) adding that it was due to elevated packet loss.
Only at 0514 PDT (1214 GMT) did Amazon confirm the problem had been fixed, adding that some APIs (application programming interfaces) had experienced increased error rates and latencies.
AWS customers affected included iSharingSoft, which tweeted that it was “experiencing network issues with #AWS”, and Cirrus Insight.
The Twitter sphere appeared, if anything, confused by the lack of information on AWS’ global status page. Giedrius Banaitis tweeted that “as always, Twitter is much better for health monitoring EC2 than their status page”.
Of course, this is not the first time the AWS EC2 North Virginia cloud has gone down. As the largest datacentre for AWS, it is the most susceptible to outages.
The EC2 cloud suffered a hefty outage at the end of June, due to heavy storms in the Virginia area. The system appeared to only be down for a couple of hours, yet updates on power restoration appeared sporadically throughout the following day.
The June outage had consequences, including dating site WhatsYourPrice.com dumping Amazon, with CEO Brandon Wade claiming that 100% uptime was “a required service level agreement for anyone providing cloud computing services”.
At Apps World, CloudTech spoke to Matthew Finnie, CTO of IaaS provider Interoute, who dismissed the idea of a 100% availability SLA. Finnie said: “Amazon’s model is Amazon’s model. It’s based on their experience of running their business. Our SLA [99.99%] is a very normal SLA for what we want to do”.
Back in July, Hewlett Packard unveiled an innovative strategy to combat uptime issues. The company promised that its Cloud Object storage facility would have a guaranteed 99.95% uptime, with users receiving service credits if it fell short.
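To put the percentages quoted above in perspective, the following is a rough sketch of how much downtime each uptime figure would permit. The measurement periods (a 365-day year and a 30-day month) and the function name are assumptions for illustration; the providers' actual SLA terms and measurement windows are not given here and may differ.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage."""
    return period_hours * 60 * (1 - uptime_percent / 100)


# Assumed measurement periods (not taken from any provider's SLA).
HOURS_PER_YEAR = 365 * 24          # 8,760 hours
HOURS_PER_30_DAY_MONTH = 30 * 24   # 720 hours

# The three uptime levels mentioned in the article.
for sla in (99.95, 99.99, 100.0):
    per_year = allowed_downtime_minutes(sla, HOURS_PER_YEAR)
    per_month = allowed_downtime_minutes(sla, HOURS_PER_30_DAY_MONTH)
    print(f"{sla}% uptime: {per_year:.1f} min/year, {per_month:.1f} min per 30-day month")
```

Under these assumptions, 99.95% allows roughly 263 minutes of downtime a year and 99.99% roughly 53 minutes, so a single outage of three to four hours would on its own exceed a 99.99% yearly allowance while staying within a 99.95% one.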
Amazon’s most recent EC2 outage puts this issue in the spotlight again, and it appears there are plenty of different opinions out there. But which one do you agree with?