
NFS Performance Issues at TCP level

What happens with I/O requests over NFS, and more specifically with Oracle? How does NFS affect performance, and what can be done to improve it?

What happens at the TCP layer when I request an 8K chunk of data with dd off an NFS-mounted file system?

Here is one example. I run

dd if=/dev/zero of=foo bs=8k count=1

where the output file foo is on an NFS mount, and I see the TCP sends and receives from NFS server to client as:

(the code is in DTrace and runs on the server side; see tcp.d at https://sites.google.com/site/oraclemonitor/tcp-d for the code)

#2970a6;" href="http://dboptimizer.com/wp-content/uploads/2011/04/nfs3.png">

OSP #2c: Build a Standard Platform from the Bottom-Up

This is the fourth of twelve articles in a series called Operationally Scalable Practices. The first article gives an introduction and the second article contains a general overview. In short, this series suggests a comprehensive and cogent blueprint to best position organizations and DBAs for growth.

TCP Trace Analysis for NFS

How do we know where latency comes from when there is a disparity between the I/O latency reported by the I/O subsystem and the latency reported on the client box requesting the I/O?

For example, if I have an Oracle database requesting I/O, and Oracle says an 8 KB request takes 50 ms, yet the storage subsystem says 8 KB I/Os are taking 1 ms on average, then where do the extra 49 ms come from?

When the I/O subsystem is connected to Oracle via NFS, there are a lot of layers that could be adding the extra latency.

Where does the difference in latency come from between what the NFS server measures and Oracle's timing of pread?
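One quick way to see Oracle's side of the story before digging into packet traces is the wait-time histogram for single-block reads (a sketch; it assumes sqlplus access as SYSDBA and that the 8 KB reads show up under 'db file sequential read'):

# Show how Oracle's single-block read times are distributed, to compare
# against the ~1 ms averages the storage array is reporting.
sqlplus -s "/ as sysdba" <<'EOF'
set pagesize 100 linesize 120
select wait_time_milli, wait_count
from   v$event_histogram
where  event = 'db file sequential read'
order  by wait_time_milli;
EOF

If most of the waits land in the higher buckets while the array reports 1 ms, the time is being spent somewhere between the array and the pread call, which is exactly the gap the TCP trace analysis is meant to explain.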

OakTable…

I was recently nominated and approved as a member of the OakTable Network.

Do you ever get that feeling that one day people are going to realize you don’t have a clue what you are talking about? I think that day just got a little closer. :)

Cheers

Tim…

unplumb (or unbinding) NICs on Linux

I’ve been quiet for a long time now, but this entry hopefully will shake the cobwebs off and get me back into the habit.

I recently had a need to “unplumb” (from Solaris fame) or make interfaces on Linux “disappear” from the ifconfig list. It could be that I don’t know how to completely deconfigure an interface, but I didn’t find any methods to unassign an IP address from a Linux Ethernet interface after it was assigned. You can take interfaces down (ifconfig eth3 down) and reconfigure them to assign different addresses, but not remove the address completely.
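For what it's worth, where the iproute2 ip tool is available, an address can be removed from an interface explicitly; a minimal sketch (the interface name and address are placeholders):

# Remove one specific address from eth3, or flush everything assigned to it.
ip addr del 192.168.1.10/24 dev eth3
ip addr flush dev eth3
# Taking the link down afterwards gets close to the "unplumbed" feel of Solaris.
ip link set eth3 down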

Fetch as Much as You Can

In my “Everything DBAs Need to Know About TCP/IP” presentation, I discussed the issue of fetch size – tuning the number of rows your application will get from Oracle in a single “fetch” call.

The problem is interesting when you want to get a large number of rows from the DB to the application. Cary Millsap would say that no one should want to do that, but Cary obviously never had to work with a statistical analysis application. These things demand large amounts of raw data to do their thing.

If this is too slow for your users, you should look at statistics like "round trips". If the number of round trips looks completely off the wall, like 200,000 round trips for 2M rows, someone might have left JDBC's fetch size at its default of 10.
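One way to check this from the database side is to read the round-trip counter for the sessions involved (a sketch; it assumes a SYSDBA sqlplus session):

# How many SQL*Net round trips has each connected session made so far?
sqlplus -s "/ as sysdba" <<'EOF'
set pagesize 100 linesize 120
select s.sid, s.username, st.value as roundtrips
from   v$sesstat st
join   v$statname n on n.statistic# = st.statistic#
join   v$session s  on s.sid = st.sid
where  n.name = 'SQL*Net roundtrips to/from client'
and    s.username is not null
order  by st.value desc;
EOF

On the application side, JDBC lets you raise the value per statement with Statement.setFetchSize(), so 2M rows no longer has to mean 200,000 trips.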

RMOUG Presentations

Like many other DBAs, I'll be attending the RMOUG Training Days conference on Feb 17-18 in Denver. I'll give two presentations at the conference, both on the same day; just thinking about it makes me exhausted.

The first presentation is “Everything DBAs need to know about TCP/IP Networks”. Here’s the paper and the slides. I’ll also present this at NoCOUG’s winter conference in Pleasanton, CA. Maybe you prefer to catch me there.

The second presentation is “Analyzing Database Performance using Time Series Techniques”. Here’s the paper and the slides.

I still have time to improve the presentations and papers – so comments are very welcome :)

Monitoring Direct NFS with Oracle 11g and Solaris… peeling back the layers of the onion.

When I start a new project, I like to check performance from as many layers as possible.  This helps to verify things are working as expected and helps me to understand how the pieces fit together.  In my recent work with dNFS and Oracle 11gR2, I started down the path of monitoring performance and was surprised to see that things are not always as they seem.  This post will explore the various ways to monitor and verify performance when using dNFS with Oracle 11gR2 and Sun Open Storage ("Fishworks").

why is iostat lying to me?

iostat(1M) is one of the most common tools to monitor IO.  Normally, I can see activity on local devices as well as NFS mounts via iostat.  But with dNFS, my device seems idle during the middle of a performance run.

bash-3.0$ iostat -xcn 5
cpu
us sy wt id
8  5  0 87
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0    6.2    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf
cpu
us sy wt id
7  5  0 89
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   57.9    0.0  435.8  0.0  0.0    0.0    0.5   0   3 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf

From the DB server perspective, I can’t see the IO.  I wonder what the array looks like.

what does fishworks analytics have to say about IO?

The analytics package available with Fishworks is the best way to verify performance with Sun Open Storage.  This package is easy to use, and indeed I was quickly able to verify activity on the array.

There are 48,987 NFSv3 operations/sec and ~403MB/sec going through the nge13 interface, so this array is cooking pretty good.  Let's take a peek at the network on the DB host.

nicstat to the rescue

nicstat is a wonderful tool developed by Brendan Gregg at Sun to show network performance.  Nicstat really shows you the critical data for monitoring network speeds and feeds by displaying packet size, utilization, and rates for the various interfaces.

root@saemrmb9> nicstat 5
Time          Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
15:32:11    nxge0    0.11    1.51    1.60    9.00   68.25   171.7  0.00   0.00
15:32:11    nxge1  392926 13382.1 95214.4 95161.8  4225.8   144.0  33.3   0.00

So, from the DB server point of view, we are transferring about 390MB/sec… which correlates with what we saw in the analytics from Fishworks.  Cool!

why not use DTrace?

Ok, I wouldn’t be a good Sun employee if I didn’t use DTrace once in a while.  I was curious to see the Oracle calls for dNFS so I broke out my favorite tool from the DTrace Toolkit. The “hotuser” tool shows which functions are being called the most.  For my purposes, I found an active Oracle shadow process and searched for NFS related functions.

root@saemrmb9> hotuser -p 681 |grep nfs
^C
oracle`kgnfs_getmsg                                         1   0.2%
oracle`kgnfs_complete_read                                  1   0.2%
oracle`kgnfswat                                             1   0.2%
oracle`kgnfs_getpmsg                                        1   0.2%
oracle`kgnfs_getaprocdata                                   1   0.2%
oracle`kgnfs_processmsg                                     1   0.2%
oracle`kgnfs_find_channel                                   1   0.2%
libnfsodm11.so`odm_io                                       1   0.2%
oracle`kgnfsfreemem                                         2   0.4%
oracle`kgnfs_flushmsg                                       2   0.4%
oracle`kgnfsallocmem                                        2   0.4%
oracle`skgnfs_recvmsg                                       3   0.5%
oracle`kgnfs_serializesendmsg                               3   0.5%

So, yes it seems Direct NFS is really being used by Oracle 11g.
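Another quick cross-check from outside the process is to look for the shadow process's own connections to the NFS port (a sketch; it reuses pid 681 from the hotuser run and assumes the standard NFS port 2049; note that pfiles briefly pauses the target process):

# Sockets owned by the Oracle shadow process that point at the NFS port.
pfiles 681 | grep "port: 2049"
# Or, system-wide, how many TCP connections mention the NFS port at all.
netstat -n | grep -c 2049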

performance geeks love V$ tables

There is a set of V$ views that lets you sample the performance of dNFS as seen by Oracle.  I like V$ tables because I can write SQL scripts until I run out of Mt. Dew.  The following views are available to monitor activity with dNFS.

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

With some simple scripting, I was able to create a script to monitor NFS IOPS by sampling the v$dnfs_stats view.  The script samples the nfs_read and nfs_write operation counts, pauses for 5 seconds, then samples again to determine the rate.
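The script itself isn't shown here, but a minimal sketch of that kind of sampler could look like this (SYSDBA access via sqlplus and the exact output formatting are assumptions):

#!/bin/bash
# Sample the Direct NFS read+write operation counts from v$dnfs_stats,
# wait 5 seconds, sample again, and print the operations-per-second rate.
ops() {
  sqlplus -s "/ as sysdba" <<'EOF'
set heading off feedback off pagesize 0
select sum(nfs_read + nfs_write) from v$dnfs_stats;
EOF
}
echo "timestmp|nfsiops"
while true; do
  a=$(ops); sleep 5; b=$(ops)
  echo "$(date +%H:%M:%S)|$(awk -v a="$a" -v b="$b" 'BEGIN{print (b-a)/5}')"
done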

timestmp|nfsiops
15:30:31|48162
15:30:36|48752
15:30:41|48313
15:30:46|48517.4
15:30:51|48478
15:30:56|48509
15:31:01|48123
15:31:06|48118.8

Excellent!  Oracle shows 48,000 NFS IOPS, which agrees with the analytics from Fishworks.

what about the AWR?

Consulting the AWR shows "Physical reads" in agreement as well.

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):               93.1            1,009.2       0.00       0.00
       DB CPU(s):               54.2              587.8       0.00       0.00
       Redo size:            4,340.3           47,036.8
   Logical reads:          385,809.7        4,181,152.4
   Block changes:                9.1               99.0
  Physical reads:           47,391.1          513,594.2
 Physical writes:                5.7               61.7
      User calls:           63,251.0          685,472.3
          Parses:                5.3               57.4
     Hard parses:                0.0                0.1
W/A MB processed:                0.1                1.1
          Logons:                0.1                0.7
        Executes:           45,637.8          494,593.0
       Rollbacks:                0.0                0.0
    Transactions:                0.1

so, why is iostat lying to me?

iostat(1M) monitors IO to devices and NFS mount points.  But with Oracle Direct NFS, the mount point is bypassed and each shadow process simply mounts files directly.  To monitor dNFS traffic you have to use other methods, as described above.  Hopefully, this post was instructive on how to peel back the layers in order to gain visibility into dNFS performance with Oracle and Sun Open Storage.
