CPU profiling using the perf utility in Linux

After reading my blog entry about a performance issue caused by excessive HCC decompression ("Accessing HCC compressed objects using index access path"), a reader asked me about the CPU profiling method I mentioned in that entry. I started responding in a comment and realized the response was too long for one. So, in this blog entry, I will cover the basics of CPU profiling in Linux. Other platforms provide similar utilities; for example, Solaris provides DTrace.

Demos do fail.

I am an ardent believer in the “show me how it works” principle, and I usually include demos in my presentations. On April 13, I was presenting “Tools for advanced debugging in Solaris and Linux”, with demos, at IOUG Collaborate 2015 in Las Vegas, and my souped-up laptop (with 32GB of memory, SSD drives, a high-end video processor, etc.) stopped responding when I tried to open the folder containing my presentation files.

Sometimes, demos do fail. At least I managed to complete the demos with zero slides.

In-memory pre-population speed

While presenting at Oaktable World 2014 in San Francisco, I discussed in-memory pre-population speed and mentioned that it takes about 30 minutes to 1 hour to load ~300GB of tables. Someone asked me, “Why?”, and that was a fair question. So, I profiled the in-memory pre-population at startup.
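To put those numbers in perspective, the effective load rate implied by ~300GB in 30 to 60 minutes works out as follows, assuming decimal units (1GB = 1000MB):

```latex
\[
\frac{300\,\text{GB}}{60 \times 60\,\text{s}} \approx 83\,\text{MB/s},
\qquad
\frac{300\,\text{GB}}{30 \times 60\,\text{s}} \approx 167\,\text{MB/s}
\]
```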

Profiling methods

I profiled all in-memory worker sessions using Tanel’s Snapper script, and also profiled the OS processes using the Linux perf tool at a 99Hz sampling rate. As there was no other activity on the database server, it was okay to sample everything on the server. Snapper output shows where the time is spent; if the time is spent on CPU, the perf report output shows the function call stacks that were executing when the samples were taken. Together, the data from these two methods help us understand the root cause of the slowness.
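perf samples on-CPU call stacks at a fixed frequency (a system-wide 99Hz capture with call graphs is typically something like `perf record -F 99 -a -g`), and perf report then aggregates those samples. The Python sketch below is not perf's implementation; it only illustrates the aggregation idea, assuming each sample is already available as a list of function names ordered from the outermost caller to the leaf frame. The "self" and "inclusive" percentages are roughly analogous to the Self and Children columns perf report can show. The function name `summarize_stacks` and the sample function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def summarize_stacks(
    stacks: Iterable[Sequence[str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (self_pct, inclusive_pct) per function from sampled call stacks.

    Hypothetical helper: each stack is ordered from the outermost caller to
    the leaf frame, i.e. the function that was on CPU when sampled.
    """
    self_counts: Counter = Counter()
    incl_counts: Counter = Counter()
    total = 0
    for stack in stacks:
        if not stack:
            continue  # ignore empty samples
        total += 1
        # The leaf frame is the code actually executing: "self" time.
        self_counts[stack[-1]] += 1
        # Every function on the stack was active: "inclusive" time.
        # Use a set so recursive frames are counted once per sample.
        for func in set(stack):
            incl_counts[func] += 1
    if total == 0:
        return {}, {}
    self_pct = {f: 100.0 * c / total for f, c in self_counts.items()}
    incl_pct = {f: 100.0 * c / total for f, c in incl_counts.items()}
    return self_pct, incl_pct


if __name__ == "__main__":
    # Hypothetical samples; real ones would come from perf output.
    samples = [
        ["main", "load_table", "decompress"],
        ["main", "load_table", "decompress"],
        ["main", "load_table", "read_block"],
        ["main", "load_table"],
    ]
    self_pct, incl_pct = summarize_stacks(samples)
    for func, pct in sorted(self_pct.items(), key=lambda kv: -kv[1]):
        print(f"{func:12s} self={pct:5.1f}%  inclusive={incl_pct[func]:5.1f}%")
```

In this toy data, `decompress` is the leaf in half of the samples, which is the kind of signal that points at where on-CPU time is going, while `main` and `load_table` are 100% inclusive simply because they sit above everything else.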