Gwen Shapira has written an article about a good example of a non-trivial performance problem.
I’m not talking about anything advanced here (such as bugs or problems arising at the OS/Oracle touchpoint). The point is that sometimes the root cause of a problem (or at least the reason why you notice the problem now) is not something deeply technical, or related to some specific SQL optimizer feature or a configuration issue. Instead of focusing on the first symptom you see, it pays off to take a step back and look at how the problematic task, application or SQL is actually used by the users or client applications.
In other words, talk to the users, ask how exactly they experience the problem and then drill down from there.
I wrote a latch contention troubleshooting article for the IOUG Select journal last year (it was published earlier this year). I have uploaded it to tech.E2SN too. I recommend reading it if you want to become systematic about latch contention troubleshooting:
I’m working on getting commenting & feedback to work on the tech.E2SN site too, but for now you can comment here on this blog entry…
This is an updated version of Snapper, which works ok on Oracle 10.1 now as well (9i support is coming some time in the future :)
Thanks to Jamey Johnston for sending me the fix info (and saving me some time that way :)
So if you have some problems with Snapper on Oracle 10.1, please make sure you have the latest version v3.11, which you can get from here:
The output below is from Snapper 3.11 on Oracle 10.1.0.5, the ASH columns in the bottom part of the output are displayed correctly now:
Hi all, long time no see! =8-)
Now as I’m done with the awesome Hotsos Symposium (and the training day which I delivered) and have got some rest, I’ll start publishing some of the cool things I’ve been working on over the past half a year or so.
The first is Oracle Session Snapper version 3!
There are some major improvements in Snapper 3, like ASH style session activity sampling!
When you troubleshoot a session’s performance (or instance performance) then the main things you want to know first are very very simple:
Often this is enough for troubleshooting what’s wrong. For example, if a session is waiting for a lock, then wait interface will show you that. If a single SQL statement is taking 99% of total response time, the V$SESSION (ASH style) samples will point out the problem SQL and so on. Simple stuff.
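To illustrate the ASH-style idea, here is a minimal Python sketch. It is not Snapper's actual code. It assumes you have already polled V$SESSION a number of times and kept one (sql_id, event) pair per sample; the function name and the sample values below are made up for illustration.

```python
from collections import Counter


def activity_profile(samples):
    """Return [(key, percent_of_samples)], highest percentage first.

    Each sample is whatever you captured per poll, here a
    (sql_id, event) tuple. The share of samples approximates the
    share of response time spent in that state.
    """
    total = len(samples)
    if total == 0:
        return []
    counts = Counter(samples)
    return [(key, 100.0 * n / total) for key, n in counts.most_common()]


# Hypothetical samples from 10 polls of one session:
# (sql_id, wait event or "ON CPU")
samples = (
    [("abc123", "db file sequential read")] * 7
    + [("abc123", "ON CPU")] * 2
    + [("def456", "ON CPU")] * 1
)

for (sql_id, event), pct in activity_profile(samples):
    print(f"{pct:5.1f}%  {sql_id}  {event}")
```

With samples like these, the top line of the output immediately shows which SQL and which wait event dominate the session's time.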
However there are cases where you need to go beyond wait interface and use V$SESSTAT (and other) counters and even take a “screwdriver” and open Oracle up from outside by stack tracing :-)
When I wrote the first version of Snapper for my own use some 4-5 years ago, I wrote it mainly with the “beyond wait interface” part in mind. So I focused on V$SESSTAT and various other counters and left the basic troubleshooting to other tools. To get a rough overview of what a session was doing, I used to manually sample V$SESSION/V$SESSION_WAIT a few times in a row or use some other special-purpose scripts.
However, after Snapper got more popular and I started getting feedback about it, I saw the need for covering more with Snapper: not just the “beyond wait interface” part, but also the “wait interface” and “which SQL” parts.
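The counter-based approach boils down to taking two snapshots of a session's V$SESSTAT values and looking at what changed in between. Below is a rough Python sketch of that delta calculation, under the assumption that each snapshot has already been fetched into a dict of statistic name to value. The function name and the numbers are hypothetical and this is not how Snapper itself is implemented.

```python
def counter_deltas(before, after, interval_seconds):
    """Return {stat_name: (delta, delta_per_second)} for changed counters.

    `before` and `after` are snapshots of cumulative counters taken
    `interval_seconds` apart. Counters that did not change are skipped,
    so only the session's actual activity shows up.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    deltas = {}
    for name, end_value in after.items():
        delta = end_value - before.get(name, 0)
        if delta != 0:
            deltas[name] = (delta, delta / interval_seconds)
    return deltas


# Made-up snapshot values, taken 5 seconds apart
before = {"session logical reads": 1000, "physical reads": 50, "user commits": 3}
after = {"session logical reads": 6000, "physical reads": 50, "user commits": 4}

result = counter_deltas(before, after, 5)
for name, (delta, per_sec) in result.items():
    print(f"{name:30s} {delta:10d} {per_sec:10.1f}/s")
```

Filtering out unchanged counters keeps the output short, which matters because a session has hundreds of statistics and most of them do not move during a short interval.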
If you are in Singapore and have 24th Feb afternoon available then you can register and join a free Oracle performance troubleshooting seminar I’m doing in Singapore Management University’s (SMU) campus.
The seminar will be about:
The date is Wednesday, 24th Feb
The seminar time is from 15:30-19:00 (don’t be late)
Registration and more details are here:
I have written the first article for the troubleshooting section of my new website tech.E2SN.com:
It’s about a very valuable Oracle troubleshooting tool -> ERRORSTACK trace.
I cover 4 frequently asked questions there:
You can read it here:
By the way, if you like my new website, feel free to link to it !!! ;-)
Kyle Hailey has started putting together a much needed Oracle wait event reference.
You can access it here.
By the way, Oracle documentation also has a wait event reference section, it has more events, but it’s less detailed…
I have plans to go deep into some wait events and cover some less common ones in tech.E2SN too… in the future ;-)
After my recent series of postings, I was made aware of David Lutz’s blog on NFS client performance with Solaris. It turns out that you can vastly improve the performance of NFS clients using a new parameter to adjust the number of client connections.
root@saemrmb9> grep rpcmod /etc/system
set rpcmod:clnt_max_conns=8
This parameter was introduced in a patch for various flavors of Solaris. For details on which flavors, see David Lutz’s recent blog entry on improving NFS client performance. Soon, it should be the default in Solaris, making out-of-box client performance scream.
I re-ran the DSS query referenced in my last entry and now kNFS matches the throughput of dNFS with 10gigE.