Business Blog

Advert: Few Oracle Database Appliances at significant discount

This blog post is a little bit self-serving and I’d normally not post it, but I think it would be an awesome deal for those of you who are thinking of buying an Oracle Database Appliance now. We have just two brand-new, unopened ODAs left in our inventory that we need to move. Half of them have gone to our customers but a few are still left. We are not really interested in holding on to them while somebody else can put them to good use, so we have a very, very (did I say very?) good price. :) Limited time offer, as they say.

2001: Oracle Database Appliance by Dell — Déjà vu

Stumbled upon this Dell article from 2001, "The Oracle Database Appliance by Dell: Architecture and Features". The idea of an appliance-like platform for Oracle database is obviously not new, but the latest implementation of Oracle Database Appliance makes the most sense of all the attempts so far (comments are open below to disagree if you’d like). What [...]

Oracle ACE Directors + 1 — Gwen Shapira

Quick post congratulating Gwen Shapira on becoming an Oracle ACE Director. Gwen has been an Oracle ACE for a while now and has been very active in the community. Widely recognized in the conference circles and a frequent blogger, Gwen has recently been focusing a lot on Big Data and many of her recent articles have [...]

CanWIT Panel — CIO or CTO? The Path to Next Generation Technology Leadership

Last Thursday I was invited to a panel organized by the Ottawa Chapter of Canadian Women In Technology (CanWIT). I wanted to mention it here as CanWIT sets up very interesting events for women in IT, so if you are interested in progressing your IT career, definitely consider their events. The panel was designed to share [...]

Oracle Database Appliance — What Does It Mean for You and Your Business?

When I first heard about Oracle Database Appliance and what it does, I got really excited: I saw great potential in this product. When we got our hands dirty and started testing the appliance, I became confident that this product would be a hit. Now it’s finally the time when I can share my [...]

Oracle OpenWorld 2011 — Bloggers Meetup

Isn’t it that time of the year again? Yes, it is: it’s time for our annual Oracle Bloggers Meetup and of course Oracle is piggybacking OpenWorld with the meetup again! ;)
What: Oracle Bloggers Meetup 2011
When: Wed, 5-Oct-2011, 5:00pm
Where: Main Dining Room, Jillian’s Billiards @ Metreon, 101 Fourth Street, San Francisco, CA [...]

Oracle Exadata vs SAP HANA

Before I left on vacation (now almost a month ago – can’t remember when I had such a long vacation, if I ever had), Mark Fontecchio organized a short video conference between myself and John Appleby. The idea was to compare Oracle Exadata with SAP HANA in a short video discussion. Unfortunately, the video part didn’t [...]

Incident Notification – Pythian internal – Jul 29th 3PM EDT

We have incident reporting procedures at Pythian. This incident report was sent just recently internally at Pythian. We learned some good lessons from it so I hope it would be useful to the community as well – copying it below as is… As part of our incident management process, you will find below a summary [...]

RMOUG 2011: Pythian Raffle Results

I’m following up on a conference almost half a year later. Try to beat that! Actually, this blog post was written more than 3 months ago and was sitting in my drafts waiting for the moment I understood why I really wrote it. 3 months later… I still don’t know but I thought I should [...]

Handling Human Errors

An interesting question on human mistakes was posted on the DBA Managers Forum discussions today.

As human beings, we sometimes make mistakes. How do you make sure that your employees won’t make mistakes and cause downtime/data loss/etc. on your critical production systems?

I don’t think we can avoid this technically; working procedures are probably the solution.
I’d like to hear your thoughts.

I typed my thoughts and as I was finishing, I thought that it makes sense to post it on the blog too so here we go…

The keys to preventing mistakes are low stress levels, clear communication and established processes. This is not a complete list, but I think these are the top things that reduce the number of mistakes we make managing data infrastructure or, for that matter, working in any critical environment, be it IT administration, aviation engineering or surgery. It’s also a matter of personality fit: depending on the balance you need between tolerance for mistakes and agility, you will favor hiring one individual or another.

Regardless of how hard you try, there are still going to be human errors and you have to account for them in the infrastructure design and processes. The real disasters happen when many things align, such as several failures combined with a few human mistakes. The challenge is to find the right balance between the effort invested in making no mistakes and the effort invested in making your environment error-proof, to the point where the risk of a human mistake is acceptable to the business.

Those are the general ideas.

Just a few examples of practical ways to prevent mistakes for an Oracle DBA:

  • test production actions on a test system before applying them in production
  • have a policy that every production change is reviewed by another senior member of the team
  • a "watch over my shoulder" policy when working on production environments, i.e. a second pair of eyes all the time
  • employee training, database recovery bootcamp
  • the discipline of performing routine work under non-privileged accounts
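To make the list above a bit more concrete, here is a minimal sketch of how such policies could be checked before a change goes to production. It is only an illustration: the `ChangeRequest` record, its fields and the `blocking_issues` function are hypothetical names made up for this example, not part of any Oracle tool or of a process described in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRequest:
    """A hypothetical record describing one planned production change."""
    author: str
    description: str
    tested_on_test_system: bool = False
    reviewer: Optional[str] = None   # must be someone other than the author
    observer: Optional[str] = None   # the second pair of eyes during execution
    run_as_privileged: bool = False  # e.g. a SYSDBA or root session
    privilege_justification: str = ""


def blocking_issues(change: ChangeRequest) -> List[str]:
    """Return the reasons this change should not go to production yet."""
    issues = []
    if not change.tested_on_test_system:
        issues.append("not tested on a test system")
    if change.reviewer is None or change.reviewer == change.author:
        issues.append("no review by a second team member")
    if change.observer is None or change.observer == change.author:
        issues.append("nobody assigned to watch over the shoulder")
    if change.run_as_privileged and not change.privilege_justification.strip():
        issues.append("privileged account requested without justification")
    return issues


if __name__ == "__main__":
    change = ChangeRequest(
        author="alice",
        description="add index on ORDERS",
        tested_on_test_system=True,
        reviewer="alice",  # self-review does not count
    )
    for issue in blocking_issues(change):
        print("BLOCKED:", issue)
```

A check like this does not prevent mistakes by itself. It only makes it harder to skip the steps that do.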

Some of the items to limit impact of the mistakes:

  • multiple controlfiles for an Oracle database (in case a DBA manually does something bad to one of them – I saw this happen)
  • standby database with delayed recovery or flashback database (for Oracle)
  • no SPOF architecture
  • Oracle RAC, MySQL high availability setup (like sharding or replication), SQL Server cluster: architecture examples that limit the impact of human mistakes affecting a single hardware component
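The idea behind a standby with delayed recovery can be shown with a toy model. The sketch below is not how Oracle implements it; the `DelayedStandby` class and its methods are hypothetical and exist only to show why a delay gives you time to react: changes are shipped right away but applied only once they are older than the delay, so a mistake noticed in time can be discarded before it reaches the standby.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One logged change: (time in minutes, key, new value or None for a delete)
Change = Tuple[int, str, object]


@dataclass
class DelayedStandby:
    """Toy model of a standby that applies changes only after a delay."""
    delay_minutes: int
    data: Dict[str, object] = field(default_factory=dict)
    pending: List[Change] = field(default_factory=list)

    def receive(self, change: Change) -> None:
        """Changes are shipped immediately but not applied yet."""
        self.pending.append(change)

    def apply_up_to(self, now: int) -> None:
        """Apply only the changes that are at least delay_minutes old."""
        still_pending = []
        for ts, key, value in self.pending:
            if ts <= now - self.delay_minutes:
                if value is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = value
            else:
                still_pending.append((ts, key, value))
        self.pending = still_pending

    def discard_from(self, ts_of_mistake: int) -> None:
        """Drop the mistaken change and everything shipped after it."""
        self.pending = [c for c in self.pending if c[0] < ts_of_mistake]


if __name__ == "__main__":
    standby = DelayedStandby(delay_minutes=60)
    standby.receive((0, "orders", "1000 rows"))
    standby.receive((90, "orders", None))  # accidental delete at minute 90
    standby.apply_up_to(100)               # mistake noticed at minute 100
    print(standby.data)                    # the delete has not been applied yet
    standby.discard_from(90)               # stop the mistake from propagating
    standby.apply_up_to(200)
    print(standby.data)
```

The trade-off is that the longer the delay, the more time you have to catch a mistake, but the further behind the standby is if you need it for a regular failover.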

Both lists could go on for a very long time. An old article by Paul Vallee is very relevant to this topic: The Seven Deadly Habits of a DBA…and how to cure them.

Feel free to post your thoughts and examples. How do you approach human mistakes in managing production data infrastructure?