Search

OakieTags

Who's online

There are currently 0 users and 28 guests online.

Recent comments

Oakies Blog Aggregator

VirtualBox 4.1.14 Released…

VirtualBox 4.1.14 is ready for download. It’s a maintenance release, with fixes listed in the changelog.

Happy upgrading.

Cheers

Tim…


VirtualBox 4.1.14 Released… was first posted on April 26, 2012 at 9:33 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.




Avengers Assemble…

If you have ever enjoyed a superhero film in your life you simply must go to see Avengers Assemble. It is totally fantastic. It has great action scenes, some really neat characters and is really funny at times too. I’ve not enjoyed an action film this much since I saw the first Iron Man Film. Awesome!!!

Iron Man/Tony Stark : I want to be him. He is too cool for school. He is Kool and the Gang.

The Hulk : All The Hulk films have been a bit of a let down because the only thing I really like is the green monster running around destroying things. The great thing about this film is it doesn’t have to fill two hours with the story of Bruce Banner, so you just get to see the bit you really want to see. The Hulk is fantastic in this film and adds greatly to the comedy elements. He literally had the whole audience laughing hard on several occasions.

Thor : I know the Thor film got a bit of a slating, but I liked it. I think the Thor character works well in this film.

Loki : I think he worked well as the bad guy. It’s always good when the bad guy is a little fragile.

Captain America : I didn’t like the CA film and I don’t really like the character in this film. It’s not Chris Evans fault. I just think the character is a bit lame and I feel no connection with him. He’s just totally meh.

Black Widow and Hawkeye : Whatever… Never heard of them before this film. Have no feeling either way about their presence in the film.

I guess you can see that in my head it’s an Iron Man and Hulk roadshow and I loved it.

So now I’ve raised your expectations you are going to hate it. :)

Cheers

Tim…


Avengers Assemble… was first posted on April 26, 2012 at 3:27 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.




IOT Secondary Indexes – The Logical ROWID Guess Component Part I (Lucky)

As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) and the Logical Rowid, the PK column(s) and a “guess” to the physical block in the IOT containing the corresponding row. Let’s discuss this “guess” component in a bit more detail. When the Secondary [...]

Xeon E5-2600 OS CPU To Core / SMT Thread Mapping On Linux. It Matters.

Ages ago I blogged about the Intel topology tool and mapping Xeon 5500 (Nehalem EP) processor threads to OS CPUs on Linux. I don’t recall if I ever blogged the same about Xeon 5600 (Westmere EP) but I’ll cover that processor and Xeon E5-2600 in this short post.  Fist, Xeon 5600.

The following two screen shots are socket 0 and socket 1 from a Xeon 5600 server. Socket 0 first:

Now, socket 1:

So, based on the information above, one would have to specify OS CPUs 0,1,2,3,4,5 if they wanted thread 0 from the first 3 cores on each CPU (c0_t0). I never liked that much. That’s why I’m glad Sandy Bridge presents itself in a more logical manner. As you can see from the following two screen shots, specifying affinity for thread 0 of cores on socket 0 is as simple as 0,1,2,3,4,5,6,7. First, socket 0:

And now, socket 1:

Lest this come off as simple tomfoolery, allow me to show the 2x difference in siphoning off a fifo when the data flows socket-local versus socket-remote:

Be aware that this level of disparity will not necessarily be realized when a server is booted SUMA (nor even when BIOS NUMA is enabled but the grub boot string includes numa=off). I’d test the difference and blog that here but that would just be tomfoolery :-)

Filed under: oracle, Westmere EP, Xeon 5600 Tagged: NUMA, numactl, Sandy Bridge, SUMA, Xeon E5, Xeon E5-2600

Collaborate 2012 Sessions and Select Article

Thank you all who came to my sessions at #IOUG Collaborate 2012 #C12LV on April 22-24 in Las Vegas. I had four full sessions, two panels and one bootcamp. Quite a busy schedule, as you can see. I also worked on some urgent performance issues at work during the week.

You can download the the slides and scripts here. They are available from the IOUG site but I thought I would put them for download here as well.

My article Cache Fusion Demystified on SELECT Journal won the Editor's Choice award for 2011. This article is available for free download here.

I hope you like them. As always, feedback - good, bad and something in between - will be appreciated.

#cc0000;">Update as of April 29th, 2012

I usually add more stuff to my slides after the session is over. This may result from a direct attendee feedback that some content was not clear, inadequate, misleading or even incorrect. This time is no exception. I added some more content to make the concepts clearer. Also many thanks to Charles Hooper for pointing out an Oracle documentation bug which makes two scripts in my session incorrect. I have since corrected the slides and re-uploaded. Please redownload the slides for How Oracle Locking Works and specifically look at the slides 14, 18, 24, 25 and 26. Also please download the scripts for Beginning Oracle Performance Tuning. More specifically, the script lock1.sql has changed.

Getting to know Oracle wait events in Linux.

May 4th: some updates after discussion with Jeff Holt, Cary Millsap and Ron Christo of Method-R.

There’s all the documentation, and there all the terrific blogs about performance and Oracle wait events. But if you more or less start with this stuff, or encounter a wait event that is not (extensive enough) documented, or an event turns up and gives a response time you don’t expect, you need to understand what that wait event means. If you don’t know, it’s easy to get stuck at this point.

If you are familiar with waits, and just want to dig further, progress to “Get to know wait events”, if you want to get up speed with waits, read on.

What is a wait event?

This is the definition of the performance tuning guide in the Oracle documentation:

Statistics that are incremented by a server process/thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait events are one of the first places for investigation when performing reactive performance tuning.

This description is quite good I think. If I would have to make a better description, I would say a wait event is essentially the registration of time spend on executing a part of the Oracle database server code, mostly doing something elementary. The granularity of wait events can differ between Oracle versions. Wait events can describe interaction with the operating system (reading or writing are good examples of this), but also interaction between processes (enqueue’s and ipc messaging are good examples of that). Needless to say, both interactions require operating system calls.

Also, a wait event is not necessary a bad thing. Every Oracle database process is either running on CPU or waiting. If your SQL*Plus session is not running, it is waiting for ‘SQL*Net message from client’, which means the database tries to tell you that it waits for the next instruction. Another interesting thing to note about wait events is that the execution of certain instrumented (covered by a wait event) parts if Oracle server code can not report a wait per kernel developer decision. This means, for example, that if doing a latch get read went efficiently, you will not get a ‘latch free’ or ‘latch: latch name‘ wait event, because the wait registration would take exponentially more time than the actual latch get.

How to get the wait events during execution

(skip this section if you are familiar with enabling and disabling sql trace at level 8)

The way to get all the waits of the execution of one or more SQL statements for a given process is to enable SQL trace, alias event 10046, at level 8. This can be done at the system level (meaning all processes) too, but unless you want to reverse engineer the interaction between ALL the process, this is not very useful. There much written about all the ways you can enable SQL trace at level 8. For the sake of simplicity, I use setting event 10046 at level 8 in the session itself. The session requires ‘alter session’ rights to be able to set it. This is how it’s done:

SQL> alter session set events '10046 trace name context forever, level 8';
Session altered.

And can be turned off the following way:

SQL> alter session set events '10046 trace name context off';
Session altered.

Get to know wait events: strace

At this point we know how to enable tracing, but want to dig deeper and understand the operating system interaction inside the waits.

To see what a process (for example your Oracle foreground process) is doing from the perspective of the Operating System, there’s the ‘strace’ executable. The ‘strace’ executable can display all the system calls an operating system process is executing if it has enough privileges to access it. It resembles SQL trace a bit, but at the Operating System/Linux layer. First, you determine the process ID of the session you want to use ‘strace’ on, then use another process to execute ‘strace’. This is what it looks like:

1. Setup a sqlplus session
$ sqlplus ts/ts@v11203

SQL*Plus: Release 11.2.0.3.0 Production on Wed Apr 25 23:03:22 2012

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

TS@v11203 >

2. Attach to the sqlplus process with strace in a session on the linux host where the database is running:
$ strace -p 14128
Process 14128 attached - interrupt to quit
read(14,

3. Now execute something in the sqlplus session, for example ‘select * from dual’:
TS@v11203 > select * from dual;

D
-
X

TS@v11203 >

4. This results in the following list of system calls:

read(14, "\1/\6\21i \376\377\377\377\377\377\377\377\2\2"..., 8208) = 303
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372954
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 19996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
write(14, "\1r\6\20\27\2\374T\rD@\255\262t\t\313\242\1\247-8x"..., 370) = 370
read(14, "\25\6\3\5\"\2\17", 8208) = 21
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
write(14, "\254\6\4\1 \1\1{\5\2"..., 172) = 172
read(14, "\1\337\6\3^#)\4\4\376\377\377\377\377\377\377\3774"..., 8208) = 479
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 20996}, ru_stime={0, 7998}, ...}) = 0
times({tms_utime=2, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 430372955
write(14, "010\6\v\1\5\314\2\1p\5\207y"..., 304) = 304
read(14,

Wow! Nice! But what does it mean? Well, these are all the system calls used by the Oracle foreground process in order to execute the (simple) SQL. It’s hard to tell something about the execution of the SQL with this list. And be aware, this is only a ‘select * from dual’, so a ‘real’ SQL will probably give a list so long you do not dare to begin studying the list to get something useful out of it.

Well, how about the next chapter:

Get to know wait events: combining waits and strace

How about combining the sql trace output with the system calls? The ‘strace’ executable has quite some options, which can be very helpful:

- list the data written by the process (‘-e write=all’)
This option lists the data written by the session. The trace file is written by the server process, so it will list the contents of that.
- list a timestamp for every system call (‘-tt’)
- list the time spend doing the system call (‘-T’)
- save the output to a file (‘-o /path/to/file.txt’)

These are all helpful to get more information the system calls.

Now setup a session in the database, enable SQL trace at level 8, fetch the process ID of session and run ‘strace -ttT -e write=all -p PID’ in another session and run something. Yes, this generates a massive amount of output. But, as I use it, it’s to investigate what certain waits are doing. Here are some examples:


23:46:20.145793 pread(257, "\6\242\202\3\1\356O\6\2049\245\2\3226\264O\6"..., 1032192, 1996505088) = 1032192
23:46:20.148986 lseek(8, 0, SEEK_CUR) = 46126
23:46:20.149030 write(8, "WAIT #139830933048328: nam='db f"..., 125) = 125
| 00000 57 41 49 54 20 23 31 33 39 38 33 30 39 33 33 30 WAIT #13 98309330 |
| 00010 34 38 33 32 38 3a 20 6e 61 6d 3d 27 64 62 20 66 48328: n am='db f |
| 00020 69 6c 65 20 73 63 61 74 74 65 72 65 64 20 72 65 ile scat tered re |
| 00030 61 64 27 20 65 6c 61 3d 20 33 31 39 38 20 66 69 ad' ela= 3198 fi |
| 00040 6c 65 23 3d 35 20 62 6c 6f 63 6b 23 3d 36 36 34 le#=5 bl ock#=664 |
| 00050 33 34 20 62 6c 6f 63 6b 73 3d 31 32 36 20 6f 62 34 block s=126 ob |
| 00060 6a 23 3d 31 34 30 33 34 20 74 69 6d 3d 31 33 33 j#=14034 tim=133 |
| 00070 35 33 39 30 33 38 30 31 34 38 39 36 31 53903801 48961 |
23:46:20.149121 write(9, "J?e83~1z1\n", 10) = 10
| 00000 4a 3f 65 38 33 7e 31 7a 31 0a J?e83~1z 1. |
23:46:20.149163 write(8, "\n", 1) = 1
| 00000 0a . |
23:46:20.157813 pread(256, "\6\242\2\4\1\360O\6\2\4\217T\2\3226\264O\6"..., 1032192, 2000699392) = 1032192
23:46:20.161124 lseek(8, 0, SEEK_CUR) = 46252
23:46:20.161169 write(8, "WAIT #139830933048328: nam='db f"..., 125) = 125
| 00000 57 41 49 54 20 23 31 33 39 38 33 30 39 33 33 30 WAIT #13 98309330 |
| 00010 34 38 33 32 38 3a 20 6e 61 6d 3d 27 64 62 20 66 48328: n am='db f |
| 00020 69 6c 65 20 73 63 61 74 74 65 72 65 64 20 72 65 ile scat tered re |
| 00030 61 64 27 20 65 6c 61 3d 20 33 33 31 37 20 66 69 ad' ela= 3317 fi |
| 00040 6c 65 23 3d 35 20 62 6c 6f 63 6b 23 3d 36 36 35 le#=5 bl ock#=665 |
| 00050 36 32 20 62 6c 6f 63 6b 73 3d 31 32 36 20 6f 62 62 block s=126 ob |
| 00060 6a 23 3d 31 34 30 33 34 20 74 69 6d 3d 31 33 33 j#=14034 tim=133 |
| 00070 35 33 39 30 33 38 30 31 36 31 31 30 30 53903801 61100 |
23:46:20.161260 write(9, "J?Hv2~1z1\n", 10) = 10
| 00000 4a 3f 48 76 32 7e 31 7a 31 0a J?Hv2~1z 1. |
23:46:20.161624 lseek(8, 0, SEEK_CUR) = 46377
23:46:20.161659 write(8, "\n", 1) = 1
| 00000 0a

The pread() call is the actual IO call done by the database.
The lseek() call is the seek towards the end of the trace file, and sets the pointer for the upcoming write.
The write() call is the write in the trace file. You see the multiblock read (scattered read in the trace file) was done using a pread() call, and the time measured by strace () is line with the time measured by Oracle (ela= 3317), and the number of blocks in the trace (126) matches the number of bytes read 1032192 (126*8192).
The time spend in the system calls is between the ‘smaller’ and ‘bigger’ signs, and get removed by wordpress, so if you where searching for it in the above example, you wouldn’t find it
The write() to file descriptor 9 with the garbage in it is the write to the .trm file.
The lseek() sets the pointer in the file after the writes, and a write() is done to enter a carriage return (\n) to the tracefile.
This is a simple example, to show how a synchronous multiblock read is reported, let’s see a a-synchronous one!


00:17:27.557053 pread(256, "#\242\202\251\243O\6\1\4[\4"..., 8192, 1190150144) = 8192
00:17:27.558973 lseek(8, 0, SEEK_CUR) = 2659
00:17:27.559009 write(8, "WAIT #140244789954888: nam='db f"..., 124) = 124
| 00000 57 41 49 54 20 23 31 34 30 32 34 34 37 38 39 39 WAIT #14 02447899 |
| 00010 35 34 38 38 38 3a 20 6e 61 6d 3d 27 64 62 20 66 54888: n am='db f |
| 00020 69 6c 65 20 73 65 71 75 65 6e 74 69 61 6c 20 72 ile sequ ential r |
| 00030 65 61 64 27 20 65 6c 61 3d 20 31 39 30 33 20 66 ead' ela = 1903 f |
| 00040 69 6c 65 23 3d 35 20 62 6c 6f 63 6b 23 3d 34 33 ile#=5 b lock#=43 |
| 00050 33 39 34 20 62 6c 6f 63 6b 73 3d 31 20 6f 62 6a 394 bloc ks=1 obj |
| 00060 23 3d 31 34 30 33 33 20 74 69 6d 3d 31 33 33 35 #=14033 tim=1335 |
| 00070 33 39 32 32 34 37 35 35 38 39 34 33 39224755 8943 |
00:17:27.559088 write(9, "N?Rb~22y1\n", 10) = 10
| 00000 4e 3f 52 62 7e 32 32 79 31 0a N?Rb~22y 1. |
00:17:27.559128 write(8, "\n", 1) = 1
| 00000 0a . |
00:17:27.559334 mmap(0x7f8d48e41000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0x7f8d48e41000
00:17:27.559450 mmap(NULL, 1114112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 7, 0x1ec000) = 0x7f8d48c81000
00:17:27.559491 mmap(0x7f8d48c81000, 1114112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0x7f8d48c81000
00:17:27.559576 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 7, 0x2fc000) = 0x7f8d48a81000
00:17:27.559614 mmap(0x7f8d48a81000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0x7f8d48a81000
00:17:27.559834 io_submit(140244822339584, 1, {{0x7f8d48a861f8, 0, 0, 0, 256}}) = 1
00:17:27.560008 mmap(0x7f8d48ad1000, 1114112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0x7f8d48ad1000
00:17:27.560080 io_submit(140244822339584, 1, {{0x7f8d48a86450, 0, 0, 0, 256}}) = 1
00:17:27.560272 io_getevents(140244822339584, 1, 128, {{0x7f8d48a86450, 0x7f8d48a86450, 122880, 0}}, {600, 0}) = 1
00:17:27.560363 io_getevents(140244822339584, 1, 128, {{0x7f8d48a861f8, 0x7f8d48a861f8, 106496, 0}}, {600, 0}) = 1
00:17:27.560865 lseek(8, 0, SEEK_CUR) = 2784
00:17:27.560902 write(8, "WAIT #140244789954888: nam='dire"..., 129) = 129
| 00000 57 41 49 54 20 23 31 34 30 32 34 34 37 38 39 39 WAIT #14 02447899 |
| 00010 35 34 38 38 38 3a 20 6e 61 6d 3d 27 64 69 72 65 54888: n am='dire |
| 00020 63 74 20 70 61 74 68 20 72 65 61 64 27 20 65 6c ct path read' el |
| 00030 61 3d 20 35 38 36 20 66 69 6c 65 20 6e 75 6d 62 a= 586 f ile numb |
| 00040 65 72 3d 35 20 66 69 72 73 74 20 64 62 61 3d 34 er=5 fir st dba=4 |
| 00050 33 33 39 35 20 62 6c 6f 63 6b 20 63 6e 74 3d 31 3395 blo ck cnt=1 |
| 00060 33 20 6f 62 6a 23 3d 31 34 30 33 33 20 74 69 6d 3 obj#=1 4033 tim |
| 00070 3d 31 33 33 35 33 39 32 32 34 37 35 36 30 38 34 =1335392 24756084 |
| 00080 35 5 |

This advanced example shows how first the segment header is fetched using the ‘db file sequential read’ wait. Despite being on ASM (this is ASM!), this clearly shows the segment header is read synchronously using the pread() call.
The next system calls and waits are even more interesting!
First, memory is allocated using the mmap() call. This is the beginning of a full scan, and the stuff read from disk need to be buffered somewhere. Please mind the memory allocation is not part of any wait. That is no problem, the allocation of memory should be very fast.
Next, TWO asynchronous reads are initiated using the two io_submit() calls, mind the iocb (IO control block) addresses, 0x7f8d48a861f8 and 0x7f8d48a86450.
See the man page of the io_submit call on your linux system using ‘man io_submit’, also use the man page of io_getevents to learn more about that call
Next, the asynchronous reads are reaped using the io_getevents() calls. What is noteworthy here is, is that the two requests are reaped in REVERSE order (this is consistent for the first two asynchronous call pairs for every segment, this example is Oracle version 11.2.0.3). So first request 0x7f8d48a86450 is fetched, then 0x7f8d48a861f8.
After 0x7f8d48a861f8 is reaped, the wait is reported for the FIRST IO call (io_submit() & io_getevents() combination). The second request (0x7f8d48a86450) of the start of a full segment scan is not reported. Never. Further requests do not reverse reaping/’encapsulate’ IO requests (mostly, it can happen a few times more).

I hope you get the idea of digging deeper into waits by examining the system calls.

I am not sure if the above ‘encapsulation’ of IO calls is a bug. But think of this: if you issue multiple IO requests (like we saw above, multiple IO requests are issued!) you cannot report the waits of all these distinct IO requests, because you would get more time reported than has passed. This has some consequences for the above ‘direct path read’ waits: these do not report individual IO times (like ‘db file sequential read’ does). This is just an indication on time spend in the process it had to wait for results from its IO requests, rather than disk IO latency.

The above conclusions are the result of observation and logical reasoning from my side. If you have anything to add, please respond!

Tagged: oracle waits performance events waitevents

Linux Firewall and SELinux (RHCSA)…

I’ve put the last two articles in the RHSCA certification series live.

These took a little bit of time because I was in denial (and ill for a couple of weeks). When I put the previous batch of articles live, I mentioned these objectives were the ones I knew least about. That’s mostly because my standard operating procedure for Oracle servers is to disable SELinux and turn off the firewall. I reached out to the OakTable to see what Oracle do on their engineered solutions (Exadata and ODA) and it seems the answer depends on which part of the solution you are discussing (RAC nodes or storage cells) and the age/patch level of the kit software you have.

In the early releases it was very much SELinux and firewall disabled. Later release have SELinux in permissive mode on some components and the firewall enabled on some components.

Running SELinux in permissive mode seems a bit pointless to me, unless you are investigating what policies need to be changed in order to switch to enforcing it at a later date. I’m still not convinced about the relevance of SELinux for a database server at this point, but my opinion may change as I get more familiar with it. It is quite literally an uneducated opinion at this point. :)

Now I’ve completed the revision notes for the RHCSA exam I guess I should think about taking the exam. I’ve just checked the Red Hat website and the earliest I could sit the exam in Birmingham is July. Unfortunately I’m out of the country for much of July, so it would appear the middle of August is probably going to be the first real opportunity. Most other cities have a couple of dates a month, but not here. Once again Birmingham proves itself to be at the arse-end of British I.T. :(

Cheers

Tim…


Linux Firewall and SELinux (RHCSA)… was first posted on April 25, 2012 at 1:30 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.




Limit nmon to certain disks

I am often working on systems with large number of LUNs, going into the hundreds. Trying to keep track is difficult at best. Sometimes though you might want to limit yourself to a specific number of disks-I knew about the dskfilt option in collectl but only today learned about a similar feature in nmon. The following is a very greatly simplified example of course, but it gives you the idea.

Assume you are interested in 4 disks-sd{a,b,c,d}. Let’s further assume that your disks are used for 2 ASM disk groups, DATA and RECO. You could create a file “disks.dat” with the following contents:

DATA sda sdb
RECO sdc sdd

Then pass that file to nmon with the -g flag. When you type “g” now, you are shown IO stats for those disks.

─────────────────Hostname=test──────────────────────Refresh= 2secs ───15:13.02─┐
│ Disk-Group-I/O ──────────────────────────────────────────────────────────────│
│ Name          Disks AvgBusy Read|Write-KB/s  TotalMB/s   xfers/s BlockSizeKB │
│ DATA               2   0.0%       0.0|0.0          0.0       0.0    0.0      │
│ RECO               2   0.0%       0.0|0.0          0.0       0.0    0.0      │
│ Groups= 2 TOTALS   4   0.0%       0.0|0.0          0.0       0.0             │
│──────────────────────────────────────────────────────────────────────────────│

That is terrific, and very useful to hunt for bottlenecks. In addition, you have the “minimal mode” as well, which displays only busy disks and processors. Invoke it by pressing the “.” in interactive mode.

Grouping disks also works for the capacity planning mode. From the online help:

For Data-Collect-Mode = spreadsheet format (comma separated values)
 Note: use only one of f,F,z,x or X and make it the first argument
...
 -g  User Defined Disk Groups (see above) - see BBBG & DG lines

Nothing stopping you now from benchmarking properly and focusing on the relevant detail! Don’t forget to run the raw file through nmon analyzer to get some eye-candy even a non-techie understands (it has lots of nice color!)

Understanding Linux Load Average – Part 1

A frequently asked question in my classroom is “What is the meaning of load average and when is it too high?”. This may sound like an easy question, and I really thought it was, but recently I discovered that things aren’t always that easy as they seem. In this first of a three-part post I […]

NVL2()

There are many little bits and pieces lurking in the Oracle code set that would be very useful if only you had had time to notice them. Here’s one that seems to be virtually unknown, yet does a wonderful job of eliminating calls to decode().

The nvl2() function takes three parameters, returning the third if the first is null and returning the second if the first is not null. This is  convenient for all sorts of example where you might otherwise use an expression involving  case or decode(), but most particularly it’s a nice little option if you want to create a function-based index that indexes only those rows where a column is null.

Here’s a code fragment to demonstrate the effect:

select nvl2(1,2,3) from dual;

select nvl2(null,2,3) from dual;

select nvl2(1,null,3) from dual;

select nvl2(null,null,3) from dual;

And here’s the resulting output – conveniently the function call is also the column heading in the output:

NVL2(1,2,3)
-----------
          2

NVL2(NULL,2,3)
--------------
             3

NVL2(1,NULL,3)
--------------

NVL2(NULL,NULL,3)
-----------------
                3

Note, particularly, from the last two that a non-null input (first parameter) turns into the null second parameter, and the null input turns into the non-null third parameter. To create a function-based index on rows where columnX is null, and be able to access them by index, you need only do the following:

create index t1_f1 on t1(nvl2(columnX,null,1));

select * from t1 where nvl2(columnX,null,1) = 1;

(Don’t forget, of course, that you will need to gather stats on the hidden column underpinning the function-based index before you can expect the optimizer to use it in the correct cases.)