performance

My Whole System Is Slow. Now What?

At CMG'09 a couple of weeks ago, I presented "Measuring Response Times of Code on Oracle Systems." The paper for this presentation was a subset of "For Developers: Making Friends with the Oracle Database." In the presentation, I spent a few minutes talking about why to measure response times in Oracle, and then I spent a lot of minutes talking about how. As usual, I focused heavily on the importance of measuring response times of individual business tasks executed by individual end users.

Kernel NFS fights back… Oracle throughput matches Direct NFS with latest Solaris improvements

After my recent series of postings, I was made aware of David Lutz’s blog on NFS client performance with Solaris.  It turns out that you can vastly improve the performance of NFS clients using a new parameter to adjust the number of client connections.

root@saemrmb9> grep rpcmod /etc/system
set rpcmod:clnt_max_conns=8

This parameter was introduced in a patch for various flavors of Solaris.  For details on the various flavors, see David Lutz’s recent blog entry on improving NFS client performance.  Soon, it should be the default in Solaris making out-of-box client performance scream.

DSS query throughput with Kernel NFS

I re-ran the DSS query referenced in my last entry and now kNFS matches the throughput of dNFS with 10gigE.


Kernel NFS throughput with Solaris 10 Update 8 (set rpcmod:clnt_max_conns=8)

This is great news for customers not yet on Oracle 11g.  With this latest fix to Solaris, you can match the throughput of Direct NFS on older versions of Oracle.  In a future post, I will explore the CPU impact of dNFS and kNFS with OLTP style transactions.

Posted in Oracle, Storage Tagged: 11g, 7410, analytics, database, dNFS, NAS, NFS, Oracle, performance, Solaris, Sun, tuning

Direct NFS vs Kernel NFS bake-off with Oracle 11g and Solaris… and the winner is

NOTE::  Please see my next entry on Kernel NFS performance and the improvements that come with the latest Solaris.

==============

After experimenting with dNFS it was time to do a comparison with the “old” way.  I was a little surprised by the results, but I guess that really explains why Oracle decided to embed the NFS client into the database :)

bake-off with OLTP style transactions

This experiment was designed to load up a machine, a T5240, with OLTP style transactions until no more CPU available.  The dataset was big enough to push about 36,000 IOPS read and 1,500 IOPS write during peak throughput.  As you can see, dNFS performed well which allowed the system to scale until DB server CPU was fully utilized.   On the other hand, Kernel NFS throttles after 32 users and is unable to use the available CPU to scale transactional throughput.

lower cpu overhead yields better throughput

A common measure for benchmarks is to figure out how many transactions per CPU are possible.  Below, I plotted the CPU content needed for a particular transaction rate.  This chart shows the total measured CPU (user+system) to for a given TPS rate.


dNFS vs kNFS (TPS/CPU)

As expected, the transaction rate per CPU is greater when using dNFS vs kNFS.  Please do note, that this is a T5240 machine that has 128 threads or virtual CPUs.  I don’t want to go into semantics of sockets, cores, pipelines, and threads but thought it was at least worth noting.  Oracle sees a thread of a T5240 as a CPU, so that is what I used for this comparison.

silly little torture test

When doing the OLTP style tests with a normal sized SGA, I was not able to fully utilize the 10gigE interface or the Sun 7410 storage.   So, I decided to do a silly little micro benchmark with a real small SGA.  This benchmark just does simple read-only queries that essentially result in a bunch of random 8k IO.  I have included the output from the Fishworks analytics below for both kNFS and dNFS.


Random IOPS with kNFS and Sun Open Storage


Random IOPS with dNFS and Sun 7410 open storage

I was able to hit ~90K IOPS with 729MB/sec of throughput with just one 10gigE interface connected to Sun 7140 unified storage.  This is an excellent result with Oracle 11gR2 and dNFS for a random test IO test… but there is still more bandwidth available.  So, I decided to do a quick DSS style query to see if I could break the 1GB/sec barrier.

===dNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111
Elapsed: 00:00:06.36

===kNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111

Elapsed: 00:00:16.18

kNFS table scan


dNFS table scan

Excellent, with a simple scan I was able to do 1.14GB/sec with dNFS more than doubling the throughput of kNFS.

configuration notes and basic tuning

I was running on a T5240 with Solaris 10 Update 8.

$ cat /etc/release
Solaris 10 10/09 s10s_u8wos_08a SPARC
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009

This machine has the a built-in 10gigE interface which uses multiple threads to increase throughput.  Out of the box, there is very little to tuned as long as you are on Solaris 10 Update 8.  I experimented with various settings, but found that only basic tcp settings were required.

ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152

Finally, on the storage front, I was using the Sun Storage 7140 Unified storage server as the NFS server for this test.  This server was born out of the Fishworks project and is an excellent platform for deploying NFS based databases…. watch out NetApp.

what does it all mean?

dNFS wins hands down.  Standard kernel NFS only essentially allows one client per “mount” point.  So eventually, we see data queued to a mount point.  This essentially clips the throughput far too soon.   Direct NFS solves this problem by having each Oracle shadow process mount the device directly.  Also with dNFS, all the desired tuning and mount point options are not necessary.  Oracle knows what options are most efficient for transferring blocks of data and configures the connection properly.

When I began down this path of discovery, I was only using NFS attached storage because nothing else was available in our lab… and IO was not initially a huge part of the project at hand.  Being a performance guy who benchmarks systems to squeeze out the last percentage point of performance, I was skeptical about NAS devices.  Traditionally, NAS was limited by slow networks and clumsy SW stacks.   But times change.   Fast 10gigE networks and Fishworks storage combined with clever SW like Direct NFS really showed this old dog a new trick.

Posted in Oracle, Storage Tagged: 11g, 7410, analytics, dNFS, fishworks, NAS, NFS, Oracle, performance, Solaris, Sun

The Core Performance Fundamentals Of Oracle Data Warehousing – Introduction

At the 2009 Oracle OpenWorld Unconference back in October I lead a chalk and talk session entitled The Core Performance Fundamentals Of Oracle Data Warehousing. Since this was a chalk and talk I spared the audience any powerpoint slides but I had several people request that make it into a presentation so they could share [...]

Monitoring Direct NFS with Oracle 11g and Solaris… pealing back the layers of the onion.

When I start a new project, I like to check performance from as many layers as possible.  This helps to verify things are working as expected and helps me to understand how the pieces fit together.  My recent work with dNFS and Oracle 11gR2, I started down the path to monitor performance and was surprised to see that things are not always as they seem.  This post will explore the various ways to monitor and verify performance when using dNFS with Oracle 11gR2 and Sun Open StorageFishworks“.

why is iostat lying to me?

iostat(1M)” is one of the most common tools to monitor IO.  Normally, I can see activity on local devices as well as NFS mounts via iostat.  But, with dNFS, my device seems idle during the middle of a performance run.

bash-3.0$ iostat -xcn 5
cpu
us sy wt id
8  5  0 87
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0    6.2    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf
cpu
us sy wt id
7  5  0 89
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   57.9    0.0  435.8  0.0  0.0    0.0    0.5   0   3 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf

From the DB server perspective, I can’t see the IO.  I wonder what the array looks like.

what does fishworks analytics have to say about IO?

The analytics package available with fishworks is the best way to verify performance with Sun Open Storage.  This package is easy to use and indeed I was quickly able to verify activity on the array.

There are 48,987 NFSv3 operations/sec and ~403MB/sec going through the nge13 interface.  So, this array is cooking pretty good.  So, let’s take a peek at the network on the DB host.

nicstat to the rescue

nicstat is wonderful tool developed by Brendan Greg at Sun to show network performance.  Nicstat really shows you the critical data for monitoring network speeds and feeds by displaying packet size, utilization, and rates of the various interfaces.

root@saemrmb9> nicstat 5
Time          Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
15:32:11    nxge0    0.11    1.51    1.60    9.00   68.25   171.7  0.00   0.00
15:32:11    nxge1  392926 13382.1 95214.4 95161.8  4225.8   144.0  33.3   0.00

So, from the DB server point of view, we are transferring about 390MB/sec… which correlates to what we saw with the analytics from Fishworks.  Cool!

why not use DTrace?

Ok, I wouldn’t be a good Sun employee if I didn’t use DTrace once in a while.  I was curious to see the Oracle calls for dNFS so I broke out my favorite tool from the DTrace Toolkit. The “hotuser” tool shows which functions are being called the most.  For my purposes, I found an active Oracle shadow process and searched for NFS related functions.

root@saemrmb9> hotuser -p 681 |grep nfs
^C
oracle`kgnfs_getmsg                                         1   0.2%
oracle`kgnfs_complete_read                                  1   0.2%
oracle`kgnfswat                                             1   0.2%
oracle`kgnfs_getpmsg                                        1   0.2%
oracle`kgnfs_getaprocdata                                   1   0.2%
oracle`kgnfs_processmsg                                     1   0.2%
oracle`kgnfs_find_channel                                   1   0.2%
libnfsodm11.so`odm_io                                       1   0.2%
oracle`kgnfsfreemem                                         2   0.4%
oracle`kgnfs_flushmsg                                       2   0.4%
oracle`kgnfsallocmem                                        2   0.4%
oracle`skgnfs_recvmsg                                       3   0.5%
oracle`kgnfs_serializesendmsg                               3   0.5%

So, yes it seems Direct NFS is really being used by Oracle 11g.

performance geeks love V$ tables

There are a set of V$ tables that allow you to sample the performance of the performance of dNFS as seen by Oracle.  I like V$ tables because I can write SQL scripts until I run out of Mt. Dew.  The following views are available to monitor activity with dNFS.

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

With some simple scripting, I was able to create a simple script to monitor the NFS IOPS by sampling the v$dnfs_stats view.  This script simply samples the nfs_read and nfs_write operations, pauses for 5 seconds, then samples again to determine the rate.

timestmp|nfsiops
15:30:31|48162
15:30:36|48752
15:30:41|48313
15:30:46|48517.4
15:30:51|48478
15:30:56|48509
15:31:01|48123
15:31:06|48118.8

Excellent!  Oracle shows 48,000 NFS IOPS which agrees with the analytics from Fishworks.

what about the AWR?

Consulting the AWR, shows “Physical reads” in agreement as well.

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):               93.1            1,009.2       0.00       0.00
       DB CPU(s):               54.2              587.8       0.00       0.00
       Redo size:            4,340.3           47,036.8
   Logical reads:          385,809.7        4,181,152.4
   Block changes:                9.1               99.0
  Physical reads:           47,391.1          513,594.2
 Physical writes:                5.7               61.7
      User calls:           63,251.0          685,472.3
          Parses:                5.3               57.4
     Hard parses:                0.0                0.1
W/A MB processed:                0.1                1.1
          Logons:                0.1                0.7
        Executes:           45,637.8          494,593.0
       Rollbacks:                0.0                0.0
    Transactions:                0.1

so, why is iostat lying to me?

iostat(1M) monitors IO to devices and nfs mount points.  But with Oracle Direct NFS, the mount point is bypassed and each shadow process simply mounts files directly.  To monitor dNFS traffic you have to use other methods as described here.  Hopefully, this post was instructive on how to peel back the layers in-order to gain visibility into dNFS performance with Oracle and Sun Open Storage.

Posted in Oracle, Storage Tagged: 7410, analytics, dNFS, monitoring, network, NFS, Oracle, performance, Solaris

Performance Optimization with Global Entry. Or Not?

As I entered the 30-minute "U.S. Citizens" queue for immigration back into the U.S. last week, the helpful "queue manager" handed me a brochure. This is a great place to hand me something to read, because I'm captive for the next 30 minutes as I await my turn with the immigration officer at the Passport Control desk. The brochure said "Roll through Customs faster."Ok. I'm listening.Inside the brochure, the first page lays out the main benefits:

  • bypass the passport lines
  • no paper Customs declaration
  • in most major U.S. airports

Well, that's pretty cool. Especially as I'm standing only 5% deep in a queue with a couple hundred people in it. And look, there's a Global Entry kiosk right there with its own special queue, with nobody—nobody!—in it.If I had this Global Entry thing, I'd have a superpower that would enable me to zap past the couple hundred people in front of me, and get out of the Passport Control queue right now. Fantastic.So what does this thing cost? It's right there in the brochure:

  1. Apply online at www.globalentry.gov. There is a non-refundable $100 application fee. Membership is valid for five years. That's $20 a year for the queue-bypassing superpower. Not bad. Still listening.
  2. Schedule an in-person interview. Next, I have to book an appointment to meet someone at the airport for a brief interview.
  3. Complete the interview and enrollment. I give my interview, get my photo taken, have my docs verified, and that's it, I'm done.

So, all in all, it doesn't cost too much: a hundred bucks and probably a couple hours one day next month sometime.What's the benefit of the queue-bypassing superpower? Well, it's clearly going to knock a half-hour off my journey through Passport Control. I immigrate three or four times per year on average, and today's queue is one of the shorter ones I've seen, so that's at least a couple hours per year that I'd save... Wow, that would be spectacular: a couple more hours each year in my family's arms instead of waiting like a lamb at the abattoir to have my passport controlled.But getting me into my family's arms 30 minutes earlier is not really what happens. The problem is a kind of logic that people I meet get hung up in all the time. When you think about subsystem (or resource) optimization, it looks like your latency savings for the subsystem should go straight to your system's bottom line, but that's often not what happens. That's why I really don't care about subsystem optimization; I care about response time. I could say that a thousand times, but my statement is too abstract to really convey what I mean unless you already know what I mean.What really happens in the airport story is this: if I had used Global Entry on my recent arrival, it would have saved me only a minute or two. Not half an hour, not even close.It sounds crazy, doesn't it? How can a service that cuts half an hour off my Passport Control time not get me home at least a half hour earlier?You'll understand once I show you a sequence diagram of my arrival. Here it is (at right). You can click the image to embiggen it, if you need.To read this sequence diagram, start at the top. Time flows downward. This sequence diagram shows two competing scenarios. The multicolored bar on the left-hand side represents the timeline of my actual recent arrival at DFW Airport, without using the Global Entry service. The right-hand timeline is what my arrival would have looked like had I been endowed with the Global Entry superpower.You can see at the very bottom of the timeline on the right that the time I would have saved with Global Entry is minuscule: only a minute or two.The real problem is easy to see in the diagram: Queue for Baggage Claim is the great equalizer in this system. No matter whether I'm a Global Entrant or not, I'm going to get my baggage when the good people outside with the Day-Glo Orange vests send it up to me. My status in the Global Entry system has absolutely no influence over what time that will occur.Once I've gotten my baggage, the Global Entry superpower would have again swung into effect, allowing me to pass through the zero-length queue at the Global Entry kiosk instead of waiting behind two families at the Customs queue. And that's the only net benefit I would have received.Wait: there were only two families in the Customs queue? What about the hundreds of people I was standing behind in the Passport Control queue? Well, many of them were gone already (either they had hand-carry bags only, or their bags had come off earlier than mine). Many others were still awaiting their bags on the Baggage Claim carousel. Because bags trickle out of the baggage claim process, there isn't the huge all-at-once surge of demand at Customs that there is at Passport Control when a plane unloads. So the queues are shorter.At any rate, there were four queues at Customs, and none of them was longer than three or four families. So the benefit of Global Entry—in exchange for the $100 and the time spent doing the interview—for me, this day, would have been only the savings of a couple of minutes.Now, if—if, mind you—I had been able to travel with only carry-on luggage, then Global Entry would have provided me significantly more value. But when I'm returning to the U. S. from abroad, I'm almost never allowed to carry on any bag other than my briefcase. Furthermore, I don't remember ever clearing Passport Control to find my bag waiting for me at Baggage Claim. So the typical benefit to me of enrolling in Global Entry, unfortunately, appears to be only a fraction of the duration required to clear Customs, which in my case is almost always approximately zero.The problem causing the low value (to me) of the Global Entry program is that the Passport Control resource hides the latency of the Baggage Claim resource. No amount of tuning upon the Passport Control resource will affect the timing of the Baggage In Hand milestone; the time at which that milestone occurs is entirely independent of the Passport Control resource. And that milestone—as long as it occurs after I queue for Baggage Claim—is a direct determinant of when I can exit the airport. (Gantt or PERT chart optimizers would say that Queue for Baggage Claim is on the critical path.)How could a designer make the airport experience better for the customer? Here are a few ideas:

  • Let me carry on more baggage. This idea would allow me to trot right through Baggage Claim without waiting for my bag. In this environment, the value of Global Entry would be tremendous. Well, nice theory; but allowing more carry-on baggage wouldn't work too well in the aggregate. The overhead bins on my flight were already stuffed to maximum capacity, and we don't need more flight delays induced by passengers who bring more stuff onboard than the cabin can physically accommodate.
  • Improve the latency of the baggage claim process. The sequence diagram shows clearly that this is where the big win is. It's easy to complain about baggage claim, because it's nearly always noticeably slower than we want it to be, and we can't see what's going on down there. Our imaginations inform us that there's all sorts of horrible waste going on.
  • Use latency hiding to mask the pain of the baggage claim process. Put TV sets in the Baggage Claim area, and tune them to something interesting instead of infinite loops of advertising. At CPH, they have a Danish hot dog stand in the baggage claim area. They also have a currency exchange office in there. Excellent latency hiding ideas if you need a snack or some DKK walkin'-around-money.

Latency hiding is a weak substitute for improving the speed of the baggage claim process. The killer app would certainly be to make Baggage Claim faster. Note, however, that just making Baggage Claim a little bit faster wouldn't make the Global Entry program any more valuable. To make Global Entry any more valuable, you'd have to make Baggage Claim fast enough that your bag would be waiting for anyone who cleared the full Passport Control queue.So, my message today: When you optimize, you must first know your goal. So many people optimize subsystems (resources) that they think are important, but optimizing subsystems is often not a path to optimizing what you really want. At the airport, I really don't give a rip about getting out of the Passport Control queue if it just means I'm going to be dumped earlier into a room where I'll have to wait until an affixed time for my baggage.Once you know what your real optimization goal is (that's Method R step 1), then the sequence diagram is often all you need to get your breakthrough insight that either helps you either (a) solve your problem or (b) understand when there's nothing further that you can really do about it.

Why We Made Method R

Twenty years ago (well, a month or so more than that), I entered the Oracle ecosystem. I went to work as a consultant for Oracle Corporation in September 1989. Before Oracle, I had been a language designer and compiler developer. I wrote code in lex, yacc, and C for a living. My responsibilities had also included improving other people's C code: making it more reliable, more portable, easier to read, easier to prove, and easier to maintain; and it was my job to teach other people in my department how to do these things themselves. I loved all of these duties.

In 1987, I decided to leave what I loved for a little while, to earn an MBA. Fortunately, at that time, it was possible to earn an MBA in a year. After a year of very difficult work, I had my degree and a new perspective on business. I interviewed with Oracle, and about a week later I had a job with a company that a month prior I had never heard of.

By the mid-1990s, circumstances and my natural gravity had matched to create a career in which I was again a software developer, optimizer, and teacher. By 1998, I was the manager of a group of 85 performance specialists called the System Performance Group (SPG). And I was the leader of the system architecture and system management consulting service line within Oracle Consulting's Global Steering Committee.

My job in the SPG role was to respond to all the system performance-related issues in the USA for Oracle's largest accounts. My job in the Global Steering Committee was to package the success of SPG so that other practices around the world could repeat it. The theory was that if a country manager in, say, Venezuela, wanted his own SPG, then he could use the financial models, budgets, hiring plans, training plans, etc. created by my steering committee group. Just add water.

But there was a problem. My own group of 85 people consisted of two very different types of people. About ten of these 85 people were spectacularly successful optimizers whom I could send anywhere with confidence that they'd thrive at either improving performance or proving that performance improvements weren't possible. The other 75 were very smart, very hard-working people who would grow into the tip of my pyramid over the course of more years, but they weren't there yet.

The problem was, how to you convert good, smart, hard-working people in the base of the SPG pyramid into people in the tip? The practice manager in Venezuela would need to know that. The answer, of course, is supposed to be the Training Plan. Optimally, the Training Plan consists of a curriculum of a few courses, a little on-the-job training, and then, presto: tip of the pyramid. Just add water.

But unfortunately that wasn't the way things worked. What I had been getting instead, within my own elite group, was a process that took many years to convert a smart, hard-working person into a reasonably reliable performance optimizer whom you could send anywhere. Worse yet, the peculiar stresses of the job—like being away from home 80% of the time, and continually visiting angry people each week, having to work for me—caused an outflow of talent that approximately equaled the inflow of people who made it to the tip of the pyramid. The tip of my pyramid never grew beyond roughly 10 people.

The problem, by definition, was the Training Plan. It just wasn't good enough. It wasn't that the instructors of Oracle's internal "tuning" courses were doing a poor job of teaching courses. And it wasn't that the course developers had done a poor job of creating courses. On the contrary, the instructors and course developers were doing excellent work. The problem was that the courses were focusing on the wrong thing. The reason that the courses weren't getting the job done was that the very subject matter that needed teaching hadn't been invented yet.

I expect that the people who write, say, the course called "Braking System Repair for Boeing 777" to have themselves invented the braking system they write about. So, the question was, who should be responsible for inventing the subject matter on how to optimize Oracle? I decided that I wanted that person to be me. I deliberated carefully and decided that my best chance of doing that the way I wanted to do it would be outside of Oracle. So in October 1999, ten years and one week after I joined the company, I left Oracle with the vision of creating a repeatable, teachable method for optimizing Oracle systems.

Ten years later, this is still the vision for my company, Method R Corporation. We exist not to make your system faster. We exist to make you faster at making all your systems faster. Our work is far from done, but here is what we have done:

  • Written white papers and other articles that explain Method R to you at no cost.
  • Written a book called Optimizing Oracle Performance, where you can learn Method R at a low cost.
  • Created a Method R course (on which the book is based), to teach you how to diagnose and repair response time problems in Oracle-based systems.
  • Spoken at hundreds of public and private events where we help people understand performance and how to manage it.
  • Provided consulting services to make people awesome at making their systems faster and more efficient.
  • Created the first response time profiling software ever for Oracle software applications, to let you analyze hundreds of megabytes of data without drudgery.
  • Created a free instrumentation library so that you can instrument the response times of Oracle-based software that you write.
  • Created software tools to help you be awesome at extracting every drop of information that your Oracle system is willing to give you about your response times.
  • Created a software tool that enables you to record the response time of every business task that runs on your system so you can effortlessly manage end-user performance.

As I said, our work is far from done. It's work that really, really matters to us, and it's work we love doing. I expect it to be a journey that will last long into the future. I hope that our journey will intersect with yours from time to time, and that you will enjoy it when it does.

Profiling with my Boy

We have an article online called "Can you explain Method R so even my boss could understand it?" Today I'm going to raise the stakes, because yesterday I think I explained Method R so that an eleven year-old could understand it.

Yesterday I took my 11 year-old son Alex to lunch. I talked him into eating at one of my favorite restaurants, called Mercado Juarez, over in Irving, so it was a half hour in the car together, just getting over there. It was a big day for the two of us because we were very excited about the new June 17 iPhone OS 3.0 release. I told him about some of the things I've learned about it on the Internet over the past couple of weeks. One subject in particular that we were both interested in was performance. He likes not having to wait for click results just as much as I do.

According to Apple, the new iPhone OS 3.0 software has some important code paths in it that are 3× faster. Then, upgrading to the new iPhone 3G S hardware is supposed to yield yet another 3× performance improvement for some code paths. It's what Philip Schiller talks about at 1:42:00 in the WWDC 2009 keynote video. Very exciting.

Alex of course, like many of us, wants to interpret "3× faster" as "everything I do is going to be 3× faster." As in everything that took 10 seconds yesterday will take 3 seconds tomorrow. It's a nice dream. But it's not what seeing a benchmark run 3× faster means. So we talked about it.

Latency Hiding For Fun and Profit

Yep, another post with the word ‘latency’ written all over it.

I’ve talked a lot about latency, and how it is more often than not completely immutable. So, if the latency cannot be improved upon because of some pesky law of physics, what can be done to reduce that wasted time? Just three things, actually:

  1. Don’t do it.
  2. Do it less often.
  3. Be productive with the otherwise wasted time.

The first option is constantly overlooked – do you really need to be doing this task that makes you wait around? The second option is the classic ‘do things in bigger lumps between the latency’ – making less roundtrips being the classic example. This post is about the third option, which is technically referred to as latency hiding.

Everybody knows what latency hiding is, but most don’t realise it. Here’s a classic example:

I need some salad to go with the chicken I am about to roast. Do I:

(a) go to the supermarket immediately and buy the salad, then worry about cooking the chicken?

OR

(b) get the chicken in the oven right away, then go to the supermarket?

Unless the time required to buy the salad is much longer than the chicken’s cook-time, the answer is always going to be (b), right? That’s latency hiding, also known as Asynchronous Processing. Let’s look at the numbers:

Variable definitions:

Supermarket Trip=1800s

Chicken Cook-Time=4800s

Calculations:

Option (a)=1800s+4800s=6600s (oh man, nearly two hours until dinner!)

Option (b)=4800s (with 1800s supermarket time hidden within it)

Here’s another example: You have a big code compile to do, and an empty stomach to fill. In which order do you execute those tasks? Hit ‘make’, then grab a sandwich, right?

Forget I/O Bound, You’re Latency Bound, Bub

Since it’s been nearly ten years since I wrote my book, Scaling Oracle8i, I thought it was about time that I started writing again. I thought I would start with the new-fangled blogging thing, and see where it takes me. Here goes.

As some will know, I run a small consulting company called Scale Abilities, based out of the UK. We get involved in all sorts of fun projects and problems (or are they the same thing?), but one area that I seem to find myself focusing on a lot is storage. Specifically, the performance and architecture of storage in Oracle database environments. In fact I’m doing this so much that, whenever I am writing presentations for conferences these days, it always seems to be the dominant subject at the front of my mind.

One particular common thread has been the effect of latency. This isn’t just a storage issue, of course, as I endeavoured to point out in my Hotsos Symposium 2008 presentation “Latency and Skew”. Latency, as the subtitle of that particular talk said, is a silent killer. Silent, in that it often goes undetected, and the effects of it can kill performance (and still remain undetected). I’m not going to go into all the analogies about latency here, but let’s try and put a simple definition out for it:

Latency is the time taken between a request and a response.

If that’s such a simple definition, why is it so difficult to spot? Surely if a log period of time passes between a request and a response, the latency will be simple to nail? No.