The 3 step program

Richard Foote updated his blog today with a description of the 3 step process for troubleshooting technical problems with business systems. Briefly, his 3 steps are:

  1.  Identify an actual problem that needs addressing, one that’s problematic to the business, not one that only exists in some statistic or in one’s imagination.
  2.  Determine what’s actually causing the problem as identified in Step 1.
  3.  Address the specific issue as identified in Step 2.

This started as a comment, but grew a bit. I suspect that most of the time the ‘difficulty’ lies in step 1: identifying a problem that is causing drag on your employer’s business. This requires at least:

  1. understanding the business in the first place.
  2. specifying to a high degree of certainty the issue.
  3. quantifying the impact.

IT staff are notoriously bad at 1) and 3), and business staff are notoriously bad at 2) and 3). For example, some colleagues of mine went to a meeting with business users of a core system that has historically suffered significant downtime. We identified and made some infrastructure changes that have reduced the downtime by approximately 40 days a year (that’s right, this system was running at circa 80% availability). The system has been running in its new configuration at over 99% availability, and helpdesk calls have all but vanished. The meeting was quite difficult, since the business users wanted to complain about the stability of the system. In particular they were upset with the 99% availability statistics because they felt that the stats did not reflect reality, which was that occasionally data was ‘lost’ or application sessions were apparently hung. As far as they were concerned, the fact that other users could continue to work did not mean that the service was available.

This illustrates my point 2 particularly well. The technologists involved had understood the problem statement "the system is often unavailable" in terms of the uptime of the application, i.e. can I log on? The business users, on the other hand, interpreted the exact same statement as meaning "we often encounter unexpected errors when using the application".
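
To make the gap between the two readings concrete, here is a rough sketch in Python of how the same system can score very differently depending on which definition of "available" you measure. The function names and the session counts are my own invention for illustration; they are not figures from the system described above.

```python
DAYS_PER_YEAR = 365


def downtime_days(availability: float) -> float:
    """Days per year the system is down, given uptime as a fraction (0.99 = 99%)."""
    return (1 - availability) * DAYS_PER_YEAR


def uptime_availability(days_down: float) -> float:
    """The technologist's view: what fraction of the year could users log on?"""
    return 1 - days_down / DAYS_PER_YEAR


def session_success_rate(sessions_ok: int, sessions_total: int) -> float:
    """The business user's view: what fraction of sessions ended without
    a hang, an unexpected error or 'lost' data?"""
    return sessions_ok / sessions_total


if __name__ == "__main__":
    # Uptime view: 99% availability still allows some downtime each year.
    print(f"downtime at 99% uptime: {downtime_days(0.99):.2f} days/year")

    # User view: hypothetical counts, assumed purely for illustration.
    rate = session_success_rate(sessions_ok=9_000, sessions_total=10_000)
    print(f"session success rate: {rate:.0%}")
```

At 99% uptime the system can still be down for about 3.65 days a year, and nothing in that number says anything about hung sessions or lost data while the system is nominally up. If the business cares about the second measure, then that is the statistic that needs collecting before step 1 can be said to be done.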