There’s been a lot of work in the area of profiling. One of the things I have recently fallen in love with is Brendan Gregg’s flamegraphs. I work mainly on Linux, which means I use perf for generating stack traces. Luca Canali put a lot of effort in generating extended stack profiling methods, including kernel (only) stack traces and CPU state, reading the wait interface via direct SGA reading and kernel stack traces and getting userspace stack traces using libunwind and ptrace plus kernel stack and CPU state. I was inspired by the last method, but wanted more information, like process CPU state including runqueue time.
This blog post is inspired by a question from an attendee of Sigrid Keydana's DOAG 2015 conference session called "Raising the fetchsize, good or bad? Exploring memory management in Oracle JDBC 12c". Basically it was a question about what the wait event "SQL*Net more data to client" represents and what it really measures. In general you may use the following steps, if you don't know what a particular wait event means:
I am an ardent believer of “show me how it works” principle and usually, I have demos in my presentation. So, I was presenting “Tools for advanced debugging in Solaris and Linux” with demos in IOUG Collaborate 2015 in Las Vegas on April 13 and my souped-up laptop (with 32G of memory, SSD drives, and an high end video processor etc ) was not responding when I tried to access folder to open my presentation files.
Sometimes, demos do fail. At least, I managed to complete the demos with zero slides
My grandfather used to say to me: “Use the proper tool for the job”. This is important to keep in mind when faced with performance issues. When I am faced with performance problems in Oracle, I typically start at a high level with AWR reports or Enterprise Manager to get a high level understanding of the workload. To drill down further, the next step is to use Oracle “10046 event” tracing. Cary Millsap created a methodology around event tracing called “Method-R” which shows how to focus in on the source of a performance problem by analyzing the components that contribute to response time. These are all fine places to start to analyze performance problems from the “user” or “application” point of view. But what happens if the OS is in peril?