Oakies Blog Aggregator

ITL Waits – Changes in Recent Releases

In recent releases Oracle has silently changed the behavior of ITL waits. The aim of this post is to describe what has changed and why. But first, let’s review some essential concepts about ITLs and ITL waits.

Interested Transaction List

The Oracle database engine locks the data modified by a transaction at the row level. To implement this feature, every data block contains a list of all transactions that are modifying it. This list is commonly called the interested transaction list (ITL). Its purpose is twofold. First, it stores information identifying a transaction as well as a reference for accessing the undo data associated with it. Second, it is referenced by every modified or locked row to indicate which transaction the row is involved with.

INITRANS

The initial number of slots composing the ITL is set through the INITRANS parameter. Even though it can be set to 1, which is also the default value, as of 9i at least 2 slots are always created. Note that the data dictionary lies to us on this matter. In fact, as shown in the following example, the data dictionary shows the value specified when the object was created, not the actual number of slots.

SQL> CREATE TABLE t (n NUMBER) INITRANS 1;

SQL> SELECT ini_trans FROM user_tables WHERE table_name = 'T';

 INI_TRANS
----------
         1

MAXTRANS

There is a maximum number of slots an ITL can contain. The actual maximum depends on the block size. For example, an 8KB block can have up to 169 slots. Up to 9i the maximum is limited by the MAXTRANS parameter as well. As of 10g, however, this parameter is deprecated and, therefore, no longer honored. As with INITRANS, the data dictionary shows the value specified when the object was created, not the actual maximum number of slots.
Also note that while creating an object the database engine checks that the MAXTRANS value is not greater than 255 and, if it is, raises ORA-02209 (invalid MAXTRANS option value).

ITL Waits

When a session requires a slot but all the available ones are in use by other active transactions, the database engine tries to dynamically create a new slot. This is of course only possible when a) the maximum number of slots has not already been allocated, and b) enough free space is available in the block itself (one slot occupies 24 bytes). If a new slot cannot be created, the session requiring it hits a so-called ITL wait. Note that the actual wait event is named “enq: TX – allocate ITL entry”.
It is essential to point out that a session does not simply wait on the first slot to become free. Instead, it probes the available slots round-robin to find one that becomes free. While doing so, it waits a few seconds on every slot it probes. If the slot becomes free during this short wait, the session uses it. Otherwise, it tries another slot.
The actual implementation for finding a free slot is what Oracle changed in recent releases. So, let’s describe the behavior release by release.

ITL Waits in 11gR1

In 11.1.0.6 and 11.1.0.7 a session waits at most one time on every slot. For all slots but one it waits up to 5 seconds. For the other one it might wait indefinitely. The following pseudo code illustrates this (consider the variable called “itl” as an array referencing all the ITL slots).

FOR i IN itl.FIRST..itl.LAST
LOOP
  EXIT WHEN itl(i) IS FREE
  IF i <> itl.LAST
  THEN WAIT ON itl(i) FOR 5 SECONDS
  ELSE WAIT ON itl(i) FOREVER
  END IF
END LOOP

The problem with this algorithm is that an “unlucky” session might wait much longer than necessary. In fact, once it enters the WAIT FOREVER state, it no longer considers the other slots.

ITL Waits in 11gR2

In 11.2.0.1 and 11.2.0.2 a session might wait several times for the same slot. Initially the wait is short. As time passes, the wait time increases exponentially based on the formula “wait time = power(2,iteration-1)”. For all slots but one there is a maximum wait time of 5 seconds, though. For the other one, and for the first 10 iterations only, the wait time is computed with that very same formula but uncapped. Then, during the 11th iteration, the session waits indefinitely. The following pseudo code illustrates this.

iteration = 0
LOOP
  iteration++
  FOR i IN itl.FIRST..itl.LAST
  LOOP
    EXIT WHEN itl(i) IS FREE
    IF i <> itl.LAST
    THEN WAIT ON itl(i) FOR min(power(2,iteration-1),5) SECONDS
    ELSIF iteration <= 10
    THEN WAIT ON itl(i) FOR power(2,iteration-1) SECONDS
    ELSE WAIT ON itl(i) FOREVER
    END IF
  END LOOP
  EXIT WHEN free_itl_found
END LOOP

The advantage of this algorithm is that a session probes all the available slots several times and, as a result, enters the WAIT FOREVER state only after about 20 minutes.
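As a quick sketch (my own illustration, not Oracle code) of the wait schedule implied by the pseudo code above, the following computes how long a session waits in total before reaching the WAIT FOREVER state, assuming the minimum of 2 ITL slots:

```python
# Sketch of the wait schedules described above (my own illustration, not
# Oracle code). "slots" is the number of ITL slots the session probes.

def waits_11gr1(slots):
    """11gR1: a single pass, waiting up to 5 seconds on every slot but the
    last, then WAIT FOREVER on the last one."""
    return 5 * (slots - 1)

def waits_11gr2(slots):
    """11gR2: repeated passes with exponential backoff.  The wait is
    min(power(2, iteration-1), 5) seconds on all slots but the last; the
    last slot is uncapped for iterations 1-10 and becomes WAIT FOREVER in
    iteration 11."""
    total = 0
    for iteration in range(1, 11):              # iterations 1..10
        backoff = 2 ** (iteration - 1)
        total += min(backoff, 5) * (slots - 1)  # all slots but the last
        total += backoff                        # the last slot, uncapped
    total += min(2 ** 10, 5) * (slots - 1)      # iteration 11, before FOREVER
    return total

print(waits_11gr1(2))  # 5 seconds before WAIT FOREVER
print(waits_11gr2(2))  # 1070 seconds, i.e. roughly 18 minutes
```

With 2 slots the 11gR2 session accumulates about 1070 seconds of timed waits before giving up, which matches the “about 20 minutes” figure; under 11gR1 an unlucky session could enter WAIT FOREVER after just 5 seconds.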

ITL Waits in 9i/10g

Up to 10.2.0.4 the behavior is similar to 11gR1. The only noticeable difference is that the wait time is not always 5 seconds. Instead, it is either 3 or 6 seconds. I was not able to spot a rule behind the choice between the two durations. So, there might be some randomness involved.
In 10.2.0.5 the behavior is similar to 11gR2. Also in this case the only noticeable difference is that the maximum wait time is not always 5 seconds. Instead, as in releases up to 10.2.0.4, it is either 3 or 6 seconds.

Capturing Large Numbers of Images over a Wireless Mesh

Set up several wireless mesh routers and shoot as much video as you like:
The unit visible at the back right is the parent (base) unit.
The slightly thinner antenna attached on the left is for 2.4GHz; the thicker one on the right is for 5GHz. Normally we use 2.4GHz, but since the setup is intended for indoor use, we also added 5GHz, which suffers less interference. With 2.4GHz alone, we would collide with signals from outside.
From the IT glossary e-words:

A radio frequency band around 2.4GHz that, in Japan, is open for license-free use as long as the output is 10mW or less. Because the band is used by industrial, scientific, and medical equipment, it is also called the "ISM band" (Industry Science Medical band).

Most radio frequency bands are allocated to communications and broadcasting and cannot be used without a license. However, because many electronic devices, such as microwave ovens, emit or make use of weak electromagnetic radiation, the 2.4GHz band has been opened up as a band that such devices can use freely.

In recent years, wireless LANs and various wireless Internet technologies using this band have been put into practical use. Communication standards using the 2.4GHz band include DSSS wireless LANs conforming to IEEE 802.11b, Bluetooth, HomeRF, and some FWA systems.

In other countries there is an active movement to implement wireless LANs in the 5GHz band, but in Japan the 5GHz band is already used by weather radar and similar systems, so the portion available license-free is far more restricted than elsewhere. For this reason, attention has focused on making the 2.4GHz band faster.

There are also antennas that allow flexible adjustment.
There are also unidirectional antennas that can carry a signal up to 300m.

We prepared several Panasonic cameras:

Controlling a large number of cameras requires a wireless mesh.

Where to buy: http://www.thinktube.com/

Day One of Collaborate 11 - My Session

Thank you to everyone who came to my session, which got off to a rocky start due to a recording issue in the facilities. I was also the manager of the Manageability Bootcamp, which ran the whole day. My sincere thanks to the speakers who spoke on the track. We had some of the best speakers and best topics.

If you want to get a copy of my session - Wait Events in RAC - you can download it here. Three more sessions and two expert panels to go.

My next sessions

(1) RAC Customer Panel - 4/12 9:15 - 10:15
(2) HA Bootcamp Panel - 4/12 11:45 - 12:15
(3) RAC for Beginners - 4/13 8:00 - 9:00
(4) Secure your Database in a Single Day - 4/13 2:15-3:15
(5) Designing Applications for RAC - 4/14 8:30 - 9:30

I hope to see you all there.

HOWTO: Partition Binary XML based on a Virtual Column

This one is long overdue. There is a partitioning example of binary XML on this website based on range, hash, and list partitioning but, seen from an XML point of view, it is an incorrect way to do it: the partition column is a regular column and not a virtual column as described in the Oracle XMLDB Developer's Guide for 11.2. The examples given in that post partition the table on the ID column, but do not partition the XML based on a value/element IN the XML document. The following will.

So here is a small example of how it can be done based on a virtual column. Be aware that you should support these virtual columns with at least an index or XMLIndex structure for performance reasons.

CREATE TABLE binary_part_xml OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML
  VIRTUAL COLUMNS
  (
    LISTING_TYPE AS (XMLCast(XMLQuery('/LISTING/@TYPE'
         PASSING OBJECT_VALUE RETURNING CONTENT) AS VARCHAR2(100)))
  )
  PARTITION BY LIST (LISTING_TYPE)
  (
    PARTITION health VALUES ('Health'),
    PARTITION law_firms VALUES ('Law Firm')
  );

M.

Julie Marie (Hall) Lloyd : 1962-2011

While I was at OOW2010 my sister Julie was diagnosed with skin cancer. Just after I got back to the UK some scans revealed it had spread to a number of places including her brain and stomach. She started a course of treatment and we were all hopeful, but we knew the odds were stacked against her. On Saturday 9th April she died at the age of 48.

When Julie was 11 a boy at school was annoying her. She took a swing at him, missed and punched the guy standing next to him. The guy she accidentally punched was called Mark. He became her boyfriend, then later her husband and now, after 37 years of being together, her widower.

Cheers

Tim…




On the Topic of Technology… 2

April 11, 2011 (Back to the Previous Post in the Series) (Forward to the Next Post in the Series) Roughly a year ago I wrote a blog article that described the process that I went through, wandering around a big chain electronics store in pursuit of an iPad.  I first wandered past the netbooks, and if that had not happened, I [...]

Cherry Blossoms in Full Bloom

Last weekend I went to Hakone.


From the left:
Anjo Kolk: stayed in Japan in 1994. It was only a year, but he is well known among Japanese customers. My mentor.
Mogens Nørgaard: CEO, Miracle A/S. The Miracle group's motto is:
We promise everything and demand nothing, and we are experts in the art of buying things expensively and selling them cheaply. He has helped me out many times on troubleshooting engagements.
Mario Broodbakker: about 10 years ago he joined us as a helping hand while we were building the PISO prototype. He is now a SQL Server specialist.

I would love to hold an OakTable symposium in Japan someday....

A Busy Few Weeks

It's been a busy time over the past few weeks. I've given 3 different presentations at 3 conferences in as many weeks.

First was the UKOUG Back to Basics event, which was the least effort because it was in Oracle's City Office, around the corner from the offices I'm working in at the moment. It was an old presentation, too, from UKOUG a couple of years ago, covering a few very high-level thoughts to focus people's initial performance analysis efforts. There's a limit to what you can achieve in 45 mins, but hopefully I at least pointed people in the right directions with a few book recommendations. I also had the opportunity to hear Martin Widlake's very entertaining presentation on the mistakes we all make initially. That was about it, though, as I'd only taken half a day out of the office.

A week later, it was the short flight to Dublin for the OUG Ireland Conference at the Aviva Stadium (otherwise known as Lansdowne Road). I had forgotten that the Irish football team had a friendly against Uruguay that night, but fortunately I made it to my hotel just before the stadium was due to empty, otherwise I'm guessing the traffic would have been a pain.

I had really good fun at this event, from arriving at a football stadium and noting all the usual queueing signs and bars but seeing them in the very different context of a geek conference, to catching up with Jonathan Lewis and Mogens Nørgaard (a concerned transporter of eggs, complete with loaned waitress blouse after the airline lost his luggage), to a keynote that wasn't as awful as most keynotes are. Jonathan's upgrade presentation was as interesting as always, with the added note of personal interest that he discussed a customer problem he'd helped me with around improvements to partitioned index costing in 10.2.0.5.

It was also the first time I'd met Paul Logan from Pythian face-to-face, although I'd been unaware he had attended a remote webinar I'd done a few weeks earlier.

Despite a couple of big last-minute scares before my presentation, when my laptop crashed and I lost some late slide updates, I think it went OK in the end and I had some good feedback later. I'll put that down to SQL monitoring. It's easy to be passionate and slightly inspiring about such an excellent feature.

I definitely saved the best until last, though. When Truls Bergersen first asked me if I was interested in speaking at the OUGN Spring Conference on a boat from Oslo to Kiel, I leapt at the chance after my very positive experience last time I was on a boat for the Swedish/Finnish user group event. (Note to conference organisers - it's always very nice to be asked. A yes isn't guaranteed of course ;-))

When a conference is on a boat, it means

- Everyone sticks together for the duration, instead of maybe going home in the evenings or back to distant hotels
- Much better scenery
- Surprisingly excellent conference facilities
- Food I can (mostly) eat

Add the natural hospitality of the Scandinavian countries (based on my so far limited experience); great presentations from a great line-up of speakers (including several of Oracle's big guns); very passionate and interested attendees and it was my highlight conference of the year so far. Great speaker gift, too ...

Although I had to leave this guy dancing to Singing in the Rain because of the simple physical facts of Comfy Seat over-crowding.

I spent my last evening fishing for invitations for future years.

For a more considered view, see Magnus' review here. Very flattering, too ;-)

Which leaves no major conferences between now and Openworld, although I will probably do a couple of customer presentations based on the recent slides. I could use a break so I can focus on customer work.

Having said that, I am registered for today's UKOUG Exadata event so I might be able to make it to that, depending on how work is looking. Oh, and there's this too! Not to be missed ;-)

Running SELECT … INTO :bind_variable from SQL

I just wasted a few minutes troubleshooting one of my scripts – and realized that while SELECT … INTO :bind_variable does not error out when executed as top-level SQL, it doesn’t seem to populate the resulting bind variable:

SQL> VAR blah VARCHAR2(10)
SQL> SELECT dummy INTO :blah FROM dual;

D
-
X

SQL> print blah

BLAH
--------------------------------

 
See, the bind variable is empty…
Now, let’s run the same statement via PL/SQL engine:
 

SQL> EXEC SELECT dummy INTO :blah FROM dual;

PL/SQL procedure successfully completed.

SQL> print blah

BLAH
--------------------------------
X

 

…and it works…

New Dtrace book $9.99

UPDATE: the book sale lasted only a couple of days and is over now; the price was back to regular last I checked.

———————————————————-

New Dtrace book $9.99
http://www.informit.com/deals/

As a fan of Active Session History (ASH) I’ve always wanted a version of ASH for operating systems. Operating systems, especially UNIX/Linux and their ilk, have generally had ways of generating basic information such as which process is active in a given time slice, but getting more detailed information was less than obvious, if it was possible at all. Now dtrace changes all that.

Let’s take an example. What if I wanted to see all the communication and timings between an NFS client and an NFS server? What happens when I do:

dd if=/dev/zero of=foo bs=8k count=1

On an NFS mounted file system?
The first thing we could do is get the timings, sizes, and the file being accessed:

#!/usr/sbin/dtrace -s
#pragma D option quiet
dtrace:::BEGIN
{
      printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)",
          "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME");
}

nfsv3:::op-read-start
{
      rtime[args[1]->noi_xid] = timestamp;
      roffset[args[1]->noi_xid] =  args[2]->offset / 1024;
      rsize[args[1]->noi_xid] =  args[2]->count;
      rname[args[1]->noi_xid] =  args[1]->noi_curpath;
}

nfsv3:::op-read-done
/   rtime[args[1]->noi_xid] /
{
      printf("%-16d %-18s %2s %-8d %6d %s\n", (timestamp - rtime[args[1]->noi_xid])/ 1000,
          args[0]->ci_remote, "R",
          roffset[args[1]->noi_xid] ,
          rsize[args[1]->noi_xid] ,
          args[1]->noi_curpath);
}

nfsv3:::op-write-start
{
      wtime[args[1]->noi_xid] = timestamp;
      woffset[args[1]->noi_xid] =  args[2]->offset / 1024;
      wsize[args[1]->noi_xid] =  args[2]->count;
      wname[args[1]->noi_xid] =  args[1]->noi_curpath;
}

nfsv3:::op-write-done
/   wtime[args[1]->noi_xid] /
{
      printf("%-16d %-18s %2s %-8d %6d %s\n", (timestamp - wtime[args[1]->noi_xid]) / 1000,
          args[0]->ci_remote, "W",
          woffset[args[1]->noi_xid] ,
          wsize[args[1]->noi_xid] ,
          args[1]->noi_curpath);
}

Now if I run the above dtrace, and then kick off the dd, I get some dtrace output like

TIME(us)         CLIENT             OP OFFSET    BYTES PATHNAME
1079             172.16.100.194      W 0          8192 /domain0/group0/vdb1/datafile/u01/oracle/oradata/orcl/foo

The above data says that it took about 1ms (1079us) to finish the NFS write operation. The “dd” command itself reports 5.8ms (5861us):

[oracle@kylelinux orcl]$ dd if=/dev/zero of=foo bs=8k count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.005861 seconds, 140 kB/s

Where did all the extra time come from? One area is the TCP/IP layer that NFS runs on top of. We can trace that layer as well. Here is a detailed trace of the “dd” at the TCP/IP level:

    delta      send     recd       seq/ack  seq/ack flags
                    <-- 120          2436     4870 ACK|PUSH|
      198       124 -->              4870     2556 ACK|PUSH|
      514           <-- 0            2556     4994 ACK|
       33           <-- 120          2556     4994 ACK|PUSH|
       37       124 -->              4994     2676 ACK|PUSH|
      280           <-- 116          2676     5118 ACK|PUSH|
       39       116 -->              5118     2792 ACK|PUSH|
      208           <-- 116          2792     5234 ACK|PUSH|
       42       116 -->              5234     2908 ACK|PUSH|
      264           <-- 120          2908     5350 ACK|PUSH|
       59       124 -->              5350     3028 ACK|PUSH|
      224           <-- 152          3028     5474 ACK|PUSH|
      771       148 -->              5474     3180 ACK|PUSH|
      430           <-- 1448         3180     5622 ACK|
      111           <-- 1448         4628     5622 ACK|
      138           <-- 1448         6076     5622 ACK|
      332           <-- 1448         7524     5622 ACK|
      105           <-- 1448         8972     5622 ACK|
       94           <-- 1088          420     5622 ACK|PUSH|
      615       164 -->              5622     1508 ACK|PUSH|
      225           <-- 116          1508     5786 ACK|PUSH|
       57       116 -->              5786     1624 ACK|PUSH|
    39863           <-- 0            1624     5902 ACK|

(notice that the last ACK seems to me to be a lazy ack as it happens well after dd says it is finished)
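As a quick sanity check (my own throwaway script, not part of the original dtrace output), summing the delta column of the trace above, excluding that final lazy ACK, accounts for most of the 5861us that dd reported:

```python
# Sum the delta column (microseconds) of the TCP trace above.  The final
# 39863us entry is the lazy ACK that arrives well after dd has already
# returned, so it is excluded from the total.
deltas_us = [198, 514, 33, 37, 280, 39, 208, 42, 264, 59, 224, 771,
             430, 111, 138, 332, 105, 94, 615, 225, 57]
total_us = sum(deltas_us)
print(total_us)  # 4776us, i.e. about 4.8ms of the 5861us dd reported
```

So the TCP-level round trips account for roughly 4.8ms of the elapsed time, against the 1.1ms spent in the NFS write operation itself.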
How would I get that info? It gets a little more complex, but here is the Dtrace script for the TCP level:

#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
#pragma D option dynvarsize=64m

inline int TICKS=$1;
inline string ADDR=$$2;

dtrace:::BEGIN
{
       TIMER = ( TICKS != NULL ) ?  TICKS : 1 ;
       ticks = TIMER;
       TITLE = 10;
       title = 0;
       walltime=timestamp;
       printf("starting up ...\n");
}

tcp:::send,
tcp:::receive
/   title == 0
/
{
      printf("%9s %8s %6s %8s %8s %8s \n",
        "delta"    ,
        "cid"    ,
        "pid"    ,
        "send" ,
        "recd"  ,
        "flags"
      );
        title=TITLE;
}

tcp:::send
/     ( args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
/
{
    nfs[args[1]->cs_cid]=1; /* this is an NFS thread */
    delta= timestamp-walltime;
    walltime=timestamp;
    flags="";
    flags= strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""),flags);
    flags= strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""),flags);

    printf("%9d %8d %6d %8d --> %8s %8d %8d %s \n",
        delta/1000,
        args[1]->cs_cid  % 100000,
        args[1]->cs_pid ,
        args[2]->ip_plength - args[4]->tcp_offset,
        "",
        args[4]->tcp_seq % 10000,
        args[4]->tcp_ack % 10000,
        flags
      );
    flag=0;
    title--;
}

tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && nfs[args[1]->cs_cid]
/
{
      delta=timestamp-walltime;
      walltime=timestamp;
      flags="";
      flags= strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""),flags);
      flags= strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""),flags);
      printf("%9d %8d %6d %8s <-- %-8d %8d %8d %s \n",
        delta/1000,
        args[1]->cs_cid  % 100000,
        args[1]->cs_pid ,
        "",
        args[2]->ip_plength - args[4]->tcp_offset,
        args[4]->tcp_seq % 10000,
        args[4]->tcp_ack % 10000,
        flags
      );
    flag=0;
    title--;
}

Now I can run

sudo nfs_details.d 1 172.16.100.194

To get the above output.
The above output can get pretty overwhelming, so what about outputting the summaries every second?

#!/usr/sbin/dtrace -s
#pragma D option defaultargs
#pragma D option dynvarsize=64m
#pragma D option quiet
inline int TICKS=$1;
inline string ADDR=$$2;

dtrace:::BEGIN
{
       TIMER = ( TICKS != NULL ) ?  TICKS : 1 ;
       ticks = TIMER;
       TITLE = 10;
       title = 0;
       start = 1;
}

/* ***** begin initialization */
/*
    curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    or could have done
    execname == "nfsd"
*/
tcp:::send
/   (args[2]->ip_daddr == ADDR || ADDR == NULL )&&
    curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
/
{
    printf("execname %s\n",execname);
    nfs[args[1]->cs_cid]=1; /* this is an NFS thread */
    @send_tot_sz[args[2]->ip_daddr] =  sum( args[2]->ip_plength - args[4]->tcp_offset);
    @send_tot_ct[args[2]->ip_daddr] =  count();
}

tcp:::receive, tcp:::send
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && nfs[args[1]->cs_cid]
    && start == 1
/
{
       start =0;
       recd_small_tm_beg[args[1]->cs_cid] = 0;
       recd_small_tm_end[args[1]->cs_cid] = 0;
       recd_small_ct[args[1]->cs_cid]=0;
       recd_small_sz[args[1]->cs_cid]=0;

       recd_big_tm_beg[args[1]->cs_cid] = 0;
       recd_big_tm_end[args[1]->cs_cid] = 0;
       recd_big_ct[args[1]->cs_cid]=0;
       recd_big_sz[args[1]->cs_cid]=0;

       send_big_tm_beg[args[1]->cs_cid] = 0;
       send_big_tm_end[args[1]->cs_cid] = 0;
       send_big_ct[args[1]->cs_cid]=0;
       send_big_sz[args[1]->cs_cid]=0;

       send_small_tm_beg[args[1]->cs_cid] = 0;
       send_small_tm_end[args[1]->cs_cid] = 0;
       send_small_ct[args[1]->cs_cid]=0;
       send_small_sz[args[1]->cs_cid]=0;
}
/* ***** end initialization */

/* ***************** BEG _OUT_  big RTT  TIME ****************** */
tcp:::send
/     (args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    && (args[2]->ip_plength - args[4]->tcp_offset) > 300
/
{
    send_big_unacked[args[1]->cs_cid] = 1;
    send_big_sq[args[1]->cs_cid] = args[4]->tcp_seq;

    send_big_tm_beg[args[1]->cs_cid] =
      send_big_tm_beg[args[1]->cs_cid] == 0
    ? timestamp : send_big_tm_beg[args[1]->cs_cid] ;

    send_big_tm_end[args[1]->cs_cid] = timestamp;

    send_big_sz[args[1]->cs_cid] += (args[2]->ip_plength - args[4]->tcp_offset);
    send_big_ct[args[1]->cs_cid] ++;
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && send_big_unacked[args[1]->cs_cid] == 1
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset)  >= 0
/
{
/* kyle
   just take out sequence check
   response seems to be lazy on last ACK , ie slower
   process that asked for data
    && args[4]->tcp_ack > send_big_sq[args[1]->cs_cid]

   if we want to use a beginning and ending sequence :
    recd_sq_beg[args[1]->cs_cid] =
       recd_sq_beg[args[1]->cs_cid] == 0
      ? args[4]->tcp_seq : recd_sq_beg[args[1]->cs_cid] ;

        printf("receive > 0 %d size %d  end %d\n",
              timestamp - send_big_tm_end[args[1]->cs_cid] ,
             (args[2]->ip_plength - args[4]->tcp_offset) ,
              timestamp - send_big_tm_beg[args[1]->cs_cid]
              );

*/
        @rtt_big_tm_beg[args[2]->ip_saddr] =  avg(timestamp - send_big_tm_beg[args[1]->cs_cid]);
        @rtt_big_tm_end[args[2]->ip_saddr] =  avg(timestamp - send_big_tm_end[args[1]->cs_cid]);
        @rtt_big_sz[args[2]->ip_saddr] =  avg(send_big_sz[args[1]->cs_cid]);
        @rtt_big_ct[args[2]->ip_saddr] =  avg(send_big_ct[args[1]->cs_cid]);
        @rtt_big_tot_ct[args[2]->ip_saddr] =  count();

        send_big_sq[args[1]->cs_cid] = 0;
        send_big_unacked[args[1]->cs_cid] = 0;
        send_big_tm_end[args[1]->cs_cid] = 0;
        send_big_tm_beg[args[1]->cs_cid] = 0;
        send_big_sz[args[1]->cs_cid] = 0;
        send_big_ct[args[1]->cs_cid] = 0;
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && send_big_unacked[args[1]->cs_cid] == 1
    && args[4]->tcp_ack > send_big_sq[args[1]->cs_cid]
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset)  < 0
/
{
        send_big_sq[args[1]->cs_cid] = 0;
        send_big_unacked[args[1]->cs_cid] = 0;
        send_big_tm_end[args[1]->cs_cid] = 0;
        send_big_tm_beg[args[1]->cs_cid] = 0;
        send_big_sz[args[1]->cs_cid] = 0;
        send_big_ct[args[1]->cs_cid] = 0;
}
 /* ***************** END  _OUT_ big RTT  TIME ****************** */

 /* ***************** BEGIN _OUT_  small RTT  TIME ****************** */
tcp:::send
/     (args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    && (args[2]->ip_plength - args[4]->tcp_offset) > 0
    && (args[2]->ip_plength - args[4]->tcp_offset) <= 300
/
{
    send_small_unacked[args[1]->cs_cid] = 1;
    send_small_sq[args[1]->cs_cid] = args[4]->tcp_seq;

    send_small_tm_beg[args[1]->cs_cid] =
      send_small_tm_beg[args[1]->cs_cid] == 0
    ? timestamp : send_small_tm_beg[args[1]->cs_cid] ;

    send_small_tm_end[args[1]->cs_cid] = timestamp;

    send_small_sz[args[1]->cs_cid] += args[2]->ip_plength - args[4]->tcp_offset;
    send_small_ct[args[1]->cs_cid] ++;
}

/*
    && args[4]->tcp_ack > send_small_sq[args[1]->cs_cid]
*/
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && send_small_unacked[args[1]->cs_cid] == 1
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset) >   0
/
{
        @rtt_small_tm_beg[args[2]->ip_saddr] =  avg(timestamp - send_small_tm_beg[args[1]->cs_cid]);
        @rtt_small_tm_end[args[2]->ip_saddr] =  avg(timestamp - send_small_tm_end[args[1]->cs_cid]);
        @rtt_small_sz[args[2]->ip_saddr] =  avg(send_small_sz[args[1]->cs_cid]);
        @rtt_small_ct[args[2]->ip_saddr] =  avg(send_small_ct[args[1]->cs_cid]);

        send_small_sq[args[1]->cs_cid] = 0;
        send_small_unacked[args[1]->cs_cid] = 0;
        send_small_tm_end[args[1]->cs_cid] = 0;
        send_small_tm_beg[args[1]->cs_cid] = 0;
        send_small_sz[args[1]->cs_cid] =0;
        send_small_ct[args[1]->cs_cid] =0;
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && send_small_unacked[args[1]->cs_cid] == 1
    && args[4]->tcp_ack > send_small_sq[args[1]->cs_cid]
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset) ==   0
/
{
        send_small_sq[args[1]->cs_cid] = 0;
        send_small_unacked[args[1]->cs_cid] = 0;
        send_small_tm_end[args[1]->cs_cid] = 0;
        send_small_tm_beg[args[1]->cs_cid] = 0;
        send_small_sz[args[1]->cs_cid] =0;
        send_small_ct[args[1]->cs_cid] =0;
}
 /* ***************** END  _OUT_ small RTT  TIME ****************** */

/* ***************** START _IN_ SMALL RESPONSE TIME ****************** */
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset) >   0
    && (args[2]->ip_plength - args[4]->tcp_offset) <= 300
/
{
       recd_small_unacked[args[1]->cs_cid] = 1;

       recd_small_tm_beg[args[1]->cs_cid] =
          recd_small_tm_beg[args[1]->cs_cid] == 0
        ? timestamp : recd_small_tm_beg[args[1]->cs_cid] ;

       recd_small_sq_end[args[1]->cs_cid] = args[4]->tcp_seq;
       recd_small_tm_end[args[1]->cs_cid] = timestamp;

       @recd_tot_sz[args[2]->ip_saddr] = sum(args[2]->ip_plength - args[4]->tcp_offset);
       @recd_tot_ct[args[2]->ip_saddr] = count();

       recd_small_sz[args[1]->cs_cid] += args[2]->ip_plength - args[4]->tcp_offset;
       recd_small_ct[args[1]->cs_cid] ++;

}
/*
    && args[4]->tcp_ack > recd_small_sq_end[args[1]->cs_cid]
*/
tcp:::send
/      (args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    && recd_small_unacked[args[1]->cs_cid] == 1
    && (args[2]->ip_plength - args[4]->tcp_offset) >   0
/
{
/*
        printf("send > 0 beg %d size %d end %d \n",
                            timestamp - recd_small_tm_beg[args[1]->cs_cid],
                           (args[2]->ip_plength - args[4]->tcp_offset) ,
                            timestamp - recd_small_tm_end[args[1]->cs_cid]
                            );
*/
        @resp_small_tm_beg[args[2]->ip_daddr] =  avg(timestamp - recd_small_tm_beg[args[1]->cs_cid]);
        @resp_small_tm_end[args[2]->ip_daddr] =  avg(timestamp - recd_small_tm_end[args[1]->cs_cid]);
        @resp_small_sz[args[2]->ip_daddr] =  avg(recd_small_sz[args[1]->cs_cid]);
        @resp_small_ct[args[2]->ip_daddr] =  avg(recd_small_ct[args[1]->cs_cid]);
             recd_small_tm_beg[args[1]->cs_cid] = 0;
             recd_small_tm_end[args[1]->cs_cid] = 0;
             recd_small_sq_end[args[1]->cs_cid] = 0;
             recd_small_ct[args[1]->cs_cid]  =0;
             recd_small_sz[args[1]->cs_cid]  =0;
             recd_small_unacked[args[1]->cs_cid] = 0;
}
/*
    && args[4]->tcp_ack > recd_small_sq_end[args[1]->cs_cid]
*/
tcp:::send
/      (args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    && recd_small_unacked[args[1]->cs_cid] == 1
    && (args[2]->ip_plength - args[4]->tcp_offset) ==   0
/
{
             recd_small_tm_beg[args[1]->cs_cid] = 0;
             recd_small_tm_end[args[1]->cs_cid] = 0;
             recd_small_sq_end[args[1]->cs_cid] = 0;
             recd_small_ct[args[1]->cs_cid]  =0;
             recd_small_sz[args[1]->cs_cid]  =0;
             recd_small_unacked[args[1]->cs_cid] = 0;
}
/* ***************** end _IN_ SMALL RESPONSE TIME ****************** */
/* ****** BIG _IN_ RESP ******************* */
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL )
    && nfs[args[1]->cs_cid]
    && (args[2]->ip_plength - args[4]->tcp_offset) >  300
/
{
       recd_big_unacked[args[1]->cs_cid] = 1;

       recd_big_tm_beg[args[1]->cs_cid] =
          recd_big_tm_beg[args[1]->cs_cid] == 0
        ? timestamp : recd_big_tm_beg[args[1]->cs_cid] ;

       recd_big_sq_end[args[1]->cs_cid] = args[4]->tcp_seq;
       recd_big_tm_end[args[1]->cs_cid] = timestamp;

       @recd_tot_sz[args[2]->ip_saddr] = sum(args[2]->ip_plength - args[4]->tcp_offset);
       @recd_tot_ct[args[2]->ip_saddr] = count();

       recd_big_sz[args[1]->cs_cid] += args[2]->ip_plength - args[4]->tcp_offset;
       recd_big_ct[args[1]->cs_cid] ++;

}
/*
    && args[4]->tcp_ack > recd_big_sq_end[args[1]->cs_cid]
*/
tcp:::send
/      (args[2]->ip_daddr == ADDR || ADDR == NULL )
    && curpsinfo->pr_psargs == "/usr/lib/nfs/nfsd"
    && recd_big_unacked[args[1]->cs_cid] == 1
/
{
        @resp_big_tm_beg[args[2]->ip_daddr] =  avg(timestamp - recd_big_tm_beg[args[1]->cs_cid]);
        @resp_big_tm_end[args[2]->ip_daddr] =  avg(timestamp - recd_big_tm_end[args[1]->cs_cid]);
        @resp_big_sz[args[2]->ip_daddr] =  avg(recd_big_sz[args[1]->cs_cid]);
        @resp_big_ct[args[2]->ip_daddr] =  avg(recd_big_ct[args[1]->cs_cid]);
        @resp_big_tot_ct[args[2]->ip_daddr] =  count();
        recd_big_tm_beg[args[1]->cs_cid] = 0;
        recd_big_tm_end[args[1]->cs_cid] = 0;
        recd_big_sq_end[args[1]->cs_cid] = 0;
        recd_big_ct[args[1]->cs_cid] = 0;
        recd_big_sz[args[1]->cs_cid] = 0;
        recd_big_unacked[args[1]->cs_cid] = 0;
}
/* ***************** END _IN_ RESPONSE TIME ****************** */

profile:::tick-1sec
/ ticks > 0 /
{ ticks--; }

profile:::tick-1sec
/ title == 0 /
{
        printf("%-15s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s  \n",
                                                   "client",
                                                   "out_rtt",
                                                   "in_resp",
                                                   "in_bytes",
                                                   "in_cnt",
                                                   "i_avg_sz",
                                                   "i_avg_tm",
                                                   "o_bytes_",
                                                   "out_cnt",
                                                   "o_avg_sz",
                                                   "o_avg_tm"
                                            );
        title=TITLE;
}

profile:::tick-1sec
/ ticks == 0 /
{
        normalize(@recd_tot_ct, TIMER);
        normalize(@send_tot_ct, TIMER);
        normalize(@recd_tot_sz, TIMER);
        normalize(@send_tot_sz, TIMER);
        normalize(@resp_small_tm_beg, 1000);
        normalize(@resp_small_tm_end, 1000);
        normalize(@rtt_small_tm_beg, 1000);
        normalize(@rtt_small_tm_end, 1000);
        normalize(@resp_big_tm_beg, 1000);
        normalize(@rtt_big_tm_beg, 1000);
        normalize(@resp_big_tm_end, 1000);
        normalize(@rtt_big_tm_end, 1000);
  /*                    1   2    3    4    5     6   7    8    9   10    */
        printa("%-15s %@8d %@8d %@8d %@8d %@8d %@8d %@8d %@8d %@8d %@8d  \n",
                                                   @rtt_small_tm_end,   /* 1  out_rtt  */
                                                   @resp_small_tm_end,  /* 2  in_resp  */
                                                   @recd_tot_sz,        /* 3  in_bytes */
                                                   @recd_tot_ct,        /* 4  in_cnt   */
                                                   @resp_big_sz,        /* 5  i_avg_sz */
                                                   @resp_big_tm_beg,    /* 6  i_avg_tm */
                                                   @send_tot_sz,        /* 7  o_bytes_ */
                                                   @send_tot_ct,        /* 8  out_cnt  */
                                                   @rtt_big_sz,         /* 9  o_avg_sz */
                                                   @rtt_big_tm_beg      /* 10 o_avg_tm */
                                                   );

                                                   trunc(@rtt_big_tot_ct);
                                                   trunc(@resp_big_tot_ct);

                                                   trunc(@send_tot_sz);
                                                   trunc(@recd_tot_sz);
                                                   trunc(@send_tot_ct);
                                                   trunc(@recd_tot_ct);

                                                   trunc(@resp_small_tm_beg);
                                                   trunc(@resp_small_tm_end);
                                                   trunc(@resp_small_sz);
                                                   trunc(@resp_small_ct);
                                                   trunc(@resp_big_tm_beg);
                                                   trunc(@resp_big_tm_end);
                                                   trunc(@resp_big_sz);
                                                   trunc(@resp_big_ct);

                                                   trunc(@rtt_small_tm_beg);
                                                   trunc(@rtt_small_tm_end);
                                                   trunc(@rtt_small_sz);
                                                   trunc(@rtt_small_ct);
                                                   trunc(@rtt_big_tm_beg);
                                                   trunc(@rtt_big_tm_end);
                                                   trunc(@rtt_big_sz);
                                                   trunc(@rtt_big_ct);
        title--;
        ticks = TIMER;
}

Now if I run the above DTrace script and re-run my dd test, I get the following output:

client           out_rtt  in_resp in_bytes   in_cnt i_avg_sz i_avg_tm o_bytes_  out_cnt o_avg_sz o_avg_tm
172.16.100.194       764     1968     9188       13     8328     4873     1032        0        8        0

This is telling me that:

round trips took 764us
the server's response to incoming requests took 1968us
there were 9188 incoming bytes
there were 13 incoming packets
the average size of packets above 300 bytes was 8328 bytes
the average time to respond to packets above 300 bytes was 4873us

My dd command reported 5.8ms (5800us) in this case, so the TCP response time for the 8K packet accounts for most of the dd-reported latency.
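As a quick sanity check (a sketch, not part of the script), the comparison in that last step can be computed directly; the two values are copied from the output row above:

```python
# Hypothetical sanity check: how much of the latency dd reported is
# explained by the server's TCP response time for the large (8K) packets?
resp_big_us = 4873     # i_avg_tm: avg response time for packets > 300 bytes (us)
dd_latency_us = 5800   # dd reported 5.8 ms per 8K write

fraction = resp_big_us / dd_latency_us
print(f"TCP response time covers ~{fraction:.0%} of the dd latency")
```

This prints roughly 84%, i.e. the bulk of what dd sees is the server-side TCP response time, with the remaining ~1ms left for the client-side NFS and TCP stack.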