After reading my blog entry about a performance issue due to excessive HCC decompression ( Accessing HCC compressed objects using index access path, a reader asked me about the CPU profiling method I mentioned in that blog entry. I started responding to that comment, and realized that the response was too big for a comment. So, in this blog entry, I will cover basics of the CPU profiling in Linux. Other platform provides similar utilities, for example, Solaris provides an utility dtrace.
This is a small blogpost on using ‘perf’. I got an error message when I tried to run ‘perf top’ systemwide:
# perf top Too many events are opened. Try again after reducing the number of events
What actually is the case here, is actually described in the perf wiki:
Open file limits
The design of the perf_event kernel interface which is used by the perf tool, is such that it uses one file descriptor per event per-thread or per-cpu.
On a 16-way system, when you do:
perf stat -e cycles sleep 1
You are effectively creating 16 events, and thus consuming 16 file descriptors.
In my blogpost When the oracle wait interface isn’t enough I showed how a simple asynchronous direct path scan of a table was spending more than 99% of it’s time on CPU, and that perf showed me that 68% (of the total elapsed time) was spent on a spinlock unlock in the linux kernel which was called by io_submit().
This led to some very helpful comments from Tanel Poder. This blogpost is a materialisation of his comments, and tests to show the difference.
First take a look at what I gathered from ‘perf’ in the first article: