Today I had an error on my test environment which surprised me…initially… While I was…
I wouldn’t believe the bad XMLType performance statement given stated in Martin Preiss’ blog post,…
I had some time to spend, killing time, and thought about something that was “on…
This is the second part of a series of blogpost on Oracle database PGA usage. See the first part here. The first part described SGA and PGA usage, their distinction (SGA being static, PGA being variable), the problem (no limitation for PGA allocations outside of sort, hash and bitmap memory), a resolution for Oracle 12 (PGA_AGGREGATE_LIMIT), and some specifics about that (it doesn’t look like a very hard limit).
But this leaves out Oracle version 11.2. In reality, the vast majority of the database that I deal with at the time of writing is at version 11.2, and my guess is that this is not just the databases I deal with, but a general tendency. This could change in the coming time with the desupport of Oracle 11.2, however I suspect the installed base of Oracle version 12 to increase gradually and smoothly instead of in a big bang.
I posted a fair amount of stuff on how Oracle is generating IOs, and especially large IOs, meaning more than one Oracle block, so > 8KB. This is typically what is happening when the Oracle database is executing a row source which does a full segment scan. Let’s start off with a quiz: what you think Oracle is the maximum IO size the Oracle engine is capable of requesting of the Operating System (so the IO size as can be seen at the SCI (system call interface) layer? If you made up your answer, remember it, and read on!
The real intention of this blogpost is to describe what is going on in the Oracle database kernel, but also what is being done in the Linux kernel. Being a performance specialised Oracle DBA means you have to understand what the operating system does. I often see that it’s of the utmost importance to understand how an IO ends up as a request at the NAS or SAN head, so you understand what a storage admin is talking about.
(the details are investigated and specific to Oracle’s database implementation on Linux x86_64)
Exadata IO: This event is not used with Exadata storage, ‘cell single block physical read’ is used instead.
Despite p3 listing the number of blocks, I haven’t seen a db file sequential read event that read more than one block ever. Of course this could change in a newer release.
Last week I’ve gotten a question on how storage indexes (SI) behave when the table for which the SI is holding data is changed. Based on logical reasoning, it can be two things: the SI is invalidated because the data it’s holding is changed, or the SI is updated to reflect the change. Think about this for yourself, and pick a choice. I would love to hear if you did choose the correct one.
First let’s do a step back and lay some groundwork first. The tests done in this blogpost are done on an actual Exadata (V2 hardware), with Oracle version 188.8.131.52.6 (meaning bundle patch 6). The Exadata “cellos” (Cell O/S) version is 184.108.40.206.1.140529.1 on both the compute nodes and the storage nodes.
I guess everybody who is working with Oracle databases and has been involved with Oracle Exadata in any way knows about smartscans. It is the smartscan who makes the magic happen of full segment scans with sometimes enormously reduced scan times. The Oracle database does smartscans which something that is referred to as ‘offloading’. This is all general known information.
But how does that work? I assume more people are like me, and are anxious to understand how that exactly works. But the information on smartscans is extremely scarce. Of course there is the Oracle public material, which looks technical, but is little/nothing more than marketing. On My Oracle Support, I can’t find anything on the inner working. Even in the ‘Expert Oracle Exadata’ book (which I still regard as the best source of Exadata related information) there is no material on the mechanics of smartscans.
strace is a linux utility to profile system calls. Using strace you can see the system calls that a process executes, in order to investigate the inner working or performance. In my presentation about multiblock reads I put the text ‘strace lies’. This is NOT correct. My current understanding is that strace does show every system call made by an executable. So…why did I make that statement? (editorial note: this article dives into the inner working of Linux AIO)
In my blogpost When the oracle wait interface isn’t enough I showed how a simple asynchronous direct path scan of a table was spending more than 99% of it’s time on CPU, and that perf showed me that 68% (of the total elapsed time) was spent on a spinlock unlock in the linux kernel which was called by io_submit().
This led to some very helpful comments from Tanel Poder. This blogpost is a materialisation of his comments, and tests to show the difference.
First take a look at what I gathered from ‘perf’ in the first article: