outage

The strangest Oracle problem I ever encountered – can you guess the cause?

Before I joined Blue Gecko, I did independent remote DBA work and called myself ORA-600 Consulting. Drawing on my hair-raising experiences in the trenches at Amazon in the late ’90s and early 2000s, I decided to specialize in emergency DBA work for companies in the midst of crises (I know, great idea for someone who wanted to get away from the Amazon craziness, right?).

One day in 2009, a company in Florida called my cell phone at 2 AM. They described their problem as follows:

EC2 outage reactions showcase widespread ignorance regarding the cloud

Amazon EC2’s high-profile outage in the US East region has taught us a number of lessons. For many, the takeaway has been a realization that cloud-based systems (like conventionally hosted systems) can fail. Of course, we knew that, Amazon knew that, and serious companies that performed serious availability engineering before deploying to the cloud knew that. In cloud environments, as in conventionally hosted environments, you must implement high availability if you want high availability. You can’t just expect a system to magically be highly available because it is “in the cloud.” Thorough and thoughtful high-availability engineering made it possible for EC2-based Netflix to experience no service interruptions through this event.