Search

OakieTags

Who's online

There are currently 0 users and 44 guests online.

Recent comments

Oakies Blog Aggregator

Next Public Appearance – Scottish SIG on 29th Feb

Who’s up for a beer or whiskey in Edinburgh on the evening of 28th Feb?

I’ve been promising myself I’d do the Scottish SIG for three or four years but life has always conspired to stop me. However, at last I am going to manage it this year.

The meeting is on the 29th February at the Oracle {was Sun} office in Linlithgow. You can see the schedule and details here. As ever, it is being chaired by Thomas Presslie, though I {and I suspect Mr Hasler} will be hoping he is not forcing drams of Whiskey on people before 10am in the morning, as he did at the last UKOUG conference…

I’m presenting on Index Organised Tables again, following up on the series of posts I did {and have still to finish}. As well as myself there is also Tony Hasler talking about stabilising statistics {one of the key things to stable and thus acceptable performance from a very knowledgeable man}, a presentation by Wayne Lewis on Unbreakable Enterprise Kernel 2 {which I understand is Oracle Linux with the extra bits Larry wants in there before they have gone through to the official Open Source release} and Harry Hall talking about all the new stuff on OEM 12C. If he says Cloud too often I might lob something heavy at him :-) {nothing personal Harry, I’m just tired of the C word in connection with Oracle already}. Additionally, Gordon Wilkie will also be giving an Oracle Update.

Part of the reason I want to do the Scottish SIG is that I really like Edinburgh {and Scotland in general – wonderful geography}. My original intention was to take my wife up there and make the trip into a short break – but she has to go to the US that week and I have a new client that needs my time, so it will be a dash up there the evening before and back in the afternoon.

So, is anyone around in Edinburgh from late evening on the 28th of Feb and fancies showing me one or two nice pubs?

Tuning is in the eye of the beholder… Memory is memory right?

It is human nature to draw from experiences to make sense of our surroundings.  This holds true in life and performance tuning.   A veteran systems administrator will typically tune a system different from an Oracle DBA.  This is fine, but often what is obvious to one, is not to the other.  It is sometimes necessary to take a step back to tune from another perspective.

I recently have ran across a few cases where a customer was tuning “Sorts” in the database by adding memory. Regardless of your prospective, every one knows memory is faster than disk; and the goal of any good tuner is to use as much in memory as possible.   So, when it was noticed by the systems administrator that the “TEMP” disks for Oracle were doing a tremendous amount of IO,  the answer was obvious right?

RamDisk to the rescue

To solve this problem, the savvy systems administrator added a RAM disk to the database.  Since, it was only for “TEMP” space this is seemed reasonable.

ramdiskadm -a oratmp1 1024m
/dev/ramdisk/oratmp1

Indeed user performance was improved.  There are some minor issues around recovery upon system reboot or failure that are annoying, but easily addressed with startup scripts.  So, SLA’s were met and everyone was happy.  And so things were fine for a few years.

Double the HW means double the performance… right?

Fast forward a few years in the future.  The system was upgraded to keep up with demand by doubling the amount of memory and CPU resources.  Everything should be faster right? Well not so fast.  This action increased the NUMA ratio of the machine.  After doubling memory and CPU the average user response time doubled from ~1 second to 2 seconds.  Needless to say, this was not going to fly.   Escalations were mounted and the pressure to resolve this problem reached a boiling point. The Solaris support team was contacted by the systems administrator.  Some of the best kernel engineers in the business began to dig into the problem.  Searching for ways to make the “ramdisk” respond faster in the face of an increased NUMA ratio.

A fresh set of eyes

Since I have worked with the Solaris support engineers on anything Oracle performance related for many years, they asked me to take a look.  I took a peak at the system and noticed the ramdisk in use for TEMP.  To me this seemed odd, but I continued to look at SQL performance.   Things became clear once I saw the “sort_area_size” was default.

It turns out, Oracle was attempting to do in-memory sorts, but with the default settings all users were spilling out to temp.  With 100’s of users on the system, this became a problem real fast.  I had the customer increase the sort_area_size until the sorts occurred in memory with out the extra added over head of spilling out to disk (albit fast disk).  With this slight adjustment, the average user response time was better than it had ever been.

lessons learned

  • Memory is memory, but how you use it makes all the difference.
  • It never hurts to broaden your perspective and get a second opinion

Filed under: Linux, Oracle, Solaris, Storage Tagged: Oracle, ramdisk, Solaris, sort_area_size, temp, tuning

Will Enabling a 10046 Trace Make My Query Slower (or Faster)?

February 13, 2012 (Modified February 14, 2012) It seems that I often wonder about the things that I read… is it really true?  Several years ago, one of the items that I wondered about is whether or not an execution plan for a SQL statement could change magically simply because the Oracle RDBMS knows that it is [...]

Oracle Security Training in Berlin ... and more ...

I am going to be teaching by two day Oracle security training course in Berlin on March 6th and 7th 2012 for DOAG - the German Oracle users group. You can find details of the course and also register to....[Read More]

Posted by Pete On 13/02/12 At 11:57 AM

RMOUG 2012 training days

In the next few days I’ll be at RMOUG 2012 training days! and I’ll be again presenting the topic so dear to my heart. Mining the AWR ;) I’ve updated the material with a couple of new research and findings, some of them are as follows:

  • CPU Wait (new metric in 11g Performance page)
  • Latency issues on virtualized environments

So if you are attending the RMOUG training days, stop by at my session @ room 401 Thursday 9:45 am-10:45 am

There will also be a RAC Attack at the exhibition area so that must be fun ;)

and don’t forget to follow RMOUG’s twitter @RMOUG_ORG for updates during the conference!

See yah!

 

Update: 

I had a great time at RMOUG, was able to meet old/new friends and had a good number of attendees. The organizers really did a good job :)  Here’s the updated PPT I used during the presentation.
That’s the most recent, the one on the RMOUG is not yet updated. For the scripts https://karlarao.wordpress.com/scripts-resources/ they are all at the “PerformanceAndTroubleshooting” folder

Thanks to all who attended my presentation, I know that’s a bit too much technical stuff for 1hour presentation squeezing all 101 slides and all that info. The best way is to go through my notes on the slides, check out all the links I pointed, and try to poke around on the scripts. And I assure you will learn tons of stuff & greatly appreciate the material  :)

How Many Non-Exadata RAC Licenses Do You Need To Match Exadata Performance?

This post is about exaggeration.

The Oracle Database running in the Database Grid of an Exadata Database Machine is the same as what you can run on any Linux x64 server. Depending on the workload (OLTP/ERP/DW.BI.Analytics) there is the variable of storage offload processing freeing up some cycles on the RAC grid when running Exadata. Yes, that is true.

We all know the only thing that really costs Oracle IT shops is Oracle’s licensing and Oracle’s license model is per-processor.

So the big question is whether spending a significant amount of money for Exadata storage actually reduces the Oracle Database licensing cost due to offload processing. Or, in other words, does the magic of Exadata offload processing save you money.

That’s an interesting topic but before I even blog about it I have to wonder how a company like Oracle aims to improve their bottom line by undercutting their high-margin product space (i.e., RAC licenses)  just to push in low-margin storage products (products based entirely on commodity x64 componentry)  like Exadata? Oh well, who knows? Actually, I can answer that. The investors that think Oracle is a hardware company (as a result of buying Sun Microsystems in 2010) want to see some tin hitting the shipping dock. Really? Swapping high-margin for low-margin? Perhaps it’s a buy high, sell low play where the goal is to make up for it with volume. Hah. I call that ExaMath(tm).

I have heard ridiculous claims concerning how many non-Exadata Linux x64 cores one requires to match the same number of licensed database server cores in an Exadata environment. And when I say ridiculous, I really mean absurd.  But it all comes down to how much you pay for the cores in Exadata storage and what percentage of work is offloaded from the RAC grid to the storage grid. Indeed, if, for instance, 90% of the RDBMS effort is offloaded from the RAC grid to the storage grid in Exadata you’d need 90% fewer excruciatingly expensive RAC licenses to service an application than you would without Exadata storage. That’s an interesting idea and if it helps Oracle sales folks clinch a deal or two I’m sure everyone is all the merrier. As the person cutting the purchase order for the software, aren’t you overjoyed? No? Please, read on.

Exadata Storage Servers Are Shared Nothing
That’s right. Exadata Storage Server cells don’t know anything about each other and cannot communicate with each other. So if Oracle offloads, say, 90% of the RDBMS code to Exadata Storage Servers you’d have a mostly shared-nothing MPP. Why not just get a shared-nothing MPP? Oh, pardon me, I know, that is a taboo question.

I Know a Little About Exadata
There is nowhere near as much offload processing in Exadata as Oracle would like you to believe. Exaggeration and Exadata tend to go hand in hand. So, perhaps, your first inclination is to think this is a my-word versus their-word situation. If that is the case you may also naturally be inclined to take Oracle’s word on this matter because they have no reason to mislead you when working to get you into an Exadata system. I’m correct on that matter, aren’t I?

I know more than a little about Exadata. I most certainly know more about Exadata than the sales person who is trying to sell you on the technology. Ask them. If they have a shred of honesty they’ll admit that, yes, Kevin does know more about Exadata than they do. In fact, if feeling particularly honest in their reply they’ll even admit they learned a significant percentage of what they know about Exadata from me. Go ahead, ask them. The topic will surely feel awkward for them since I no longer work for Oracle. But that is how the cookie crumbles. Or, maybe it would be fun to take a poll instead.

Just for fun sake, please participate in this poll. Your answer may reflect what your Oracle sales team is telling you or it may reflect your perception from Oracle marketing. Either way, let’s see how this goes:

Exaggeration Poll

Filed under: oracle

SQLTXPLAIN quick tips and tricks and DB Optimizer VST

Lately I’ve been busy on projects and research so I’m putting more and more stuff on my wiki/braindump… and really I need to catch up on blogging.
I have a longer draft post about SQLTXPLAIN.. but I haven’t finished it yet so I’ll just go with the quick post for now.

SQLTXPLAIN:

I’ve been using SQLTXPLAIN for quite a while, and I can say I’m a really big fan. It’s a cool tool to use for systematic SQL Tuning so I got them all neatly documented here http://karlarao.tiddlyspot.com/#SQLTXPLAIN.

BTW, Carlos Sierra, the author of the tool will present at Hotsos 2012 http://www.hotsos.com/sym12/sym_speakers_sierra.html so I may drop by the conference and say hello :)

The wiki link will probably show you all the things you need to know about SQLTXPLAIN. My personal use case favorites would be the following:

  • Mining 10053 output
    • On the execution plan view of the main HTML report whenever you hover on each step of the plan it will pop up an info about the Query Block (QB) details which you can mine on the raw 10053 text file that is also on the zip file. This is helpful when troubleshooting the specific part of the SQL where it has gone wrong.
  • Instant SQL Monitor output
    • The HTML and text versions of SQL Monitor is readily available for the particular SQL_ID you are troubleshooting ;)
  • Compare good/bad plan of SQLs
    • This is pretty handy because the output report of the SQLT-compare highlights the difference in red text. So instead of chasing some delta numbers, you can just go straight to the red text!
  • Fix bad stats
    • Let’s say you have a prod and dev that have completely different run times. And you found out that the problem is the stats, you can do a SQLT on the good plan and transfer the stats to the other environment. You will be using the sqltimp.sql for this
  • Transfer stored outline/profiles
    • This makes use of exp/imp to transfer and outline + DBMS_SQLTUNE for the SQL Profile
    • Manual SQL Profile is readily available as sqlt__sqlprof.sql or can be created using sqlt/utl/coe_xfr_sql_profile.sql
  • Reproducing an execution plan from one system to another and Creating test cases
    • I usually pull SQLT from the server I’m troubleshooting and load it on my laptop. And from there I can have the same optimizer environment then I can do anything I want with it. (well you can also do this from prod then load the SQLT to dev/QA environment)
    • I mainly use the SQLT-tc (test case builder) and that is totally different from the TCB (test case builder) which uses the DBMS_SQLDIAG.. wait, am I confusing you with the TCB terms? :)
      • Let me give you a little background

Each SQLTXPLAIN run (START sqltxtract.sql SQL_ID) packages everything in one zip file

[oracle@karl sqlt_s54491-good]$ du -sm sqlt_s54491.zip
 47 sqlt_s54491.zip

and inside that zip file includes the test case zip file that has a *_tc.zip extension (sqlt_s54491_tc.zip)
Note that this is different from the *_tcb.zip extension (sqlt_s54491_tcb.zip) which is using the DBMS_SQLDIAG TCB (test case builder) package

[oracle@karl sqlt_s54491-good]$ unzip -l sqlt_s54491.zip
 Archive: sqlt_s54491.zip
 Length Date Time Name
 --------- ---------- ----- ----
 104557695 11-27-2011 02:36 sqlt_s54491_10053_explain.trc
 2587 11-27-2011 09:18 sqlt_s54491_driver.zip
 660035 11-27-2011 02:35 sqlt_s54491_lite.html
 28711 11-27-2011 09:18 sqlt_s54491_log.zip
 34589692 11-27-2011 02:35 sqlt_s54491_main.html
 316568 11-27-2011 09:18 sqlt_s54491_opatch.zip
 40487 11-27-2011 02:35 sqlt_s54491_p73080644_sqlprof.sql
 21526 11-27-2011 02:35 sqlt_s54491_readme.html
 461504 11-27-2011 02:35 sqlt_s54491_sql_detail_active.html
 385239 11-27-2011 02:35 sqlt_s54491_sql_monitor_active.html
 424680 11-27-2011 02:35 sqlt_s54491_sql_monitor.html
 57213 11-27-2011 02:35 sqlt_s54491_sql_monitor.txt
 333117 11-27-2011 02:35 sqlt_s54491_sta_report_mem.txt
 1488 11-27-2011 02:35 sqlt_s54491_sta_script_mem.sql
 616556 11-27-2011 02:36 sqlt_s54491_tcb.zip         <-- ''DBMS_SQLDIAG TCB'' (test case builder) package zip file
 8659 11-27-2011 09:18 sqlt_s54491_tc_script.sql
 7793 11-27-2011 09:18 sqlt_s54491_tc_sql.sql
 4254816 11-27-2011 09:18 sqlt_s54491_tc.zip         <-- ''SQLT-tc test case'' zip file
 33308174 11-27-2011 09:18 sqlt_s54491_trc.zip
 --------- -------
 180076540 19 files
  • And from the wiki, you will see the related wiki topics that starts with “testcase -“. Actually, you can generate test cases using the following methods:

DB Optimizer:

Below you will see that the highlighted table name starts with the schema TC, that means it was generated from SQLT-tc and I was also able to reproduce underlying objects of the SQL_ID and the optimizer environment on my laptop.

The cool thing about DB Optimizer is that it automates a lot of tedious tasks for you. It takes away the pain of manual diagramming just like what I’m doing here http://goo.gl/VZFB that turns out just look like this http://goo.gl/ttrh

Bad Plan Patrick Star

I’m a very visual guy, and I can have a better understanding of the problem and solve things quickly (and creatively) by making use of visualization.

The highlighted table on the diagram below shows a missing join condition. This version of the SQL takes FOREVER!… here with DB Optimizer, right after it prints the VST diagram and correlating it with the “merge join cartesian” on the SQLT.. it’s pretty easy to identify the missing join (see highlighted SQL on the left) which is the root cause of the performance problem.

and a quick modification on the SQL made it run for just 10 seconds! that’s the power of SQLTXPLAIN and DB Optimizer VST  ;)

You may also want to check out this ultimate SQL tune off by Kyle Hailey and Jonathan Lewis to know more about DB Optimizer http://dboptimizer.com/2010/06/14/jonathan-lewis-kyle-hailey-webinar-replay/

And my example usage of the tool on a high load SAP OLTP environment – DB Optimizer example – 3mins to 10secs

That’s it for now, hope I’ve shared you some good stuff :)

Update: 

I attended the 2nd day of Hotsos and I was able to meet Carlos, he’s a really nice guy. Randy Johnson, Carlos, and I had a great discussion about the tools we are working on and currently developing… we were like kids on a table with a laptop showing off each others work & that includes Jamey Johnston showing his performance monitoring tool from his server at home… I was able to show Carlos how the VST of DB Optimizer works and he was impressed by the power of easily slicing and dicing on the query visually. Of course we shared a couple of SQLTXPLAIN tips and tricks and war stories where the tool was really useful. And all of these stuff over beer and good food :) that was a fun Hotsos party :)

The 10th Annual Hotsos Symposium is coming...

I remember the first Hotsos Symposium like it was yesterday.  It was a time I got to meet many people I had interacted with online for the first time - like Jonathan Lewis for example.  I cannot believe it was 10 years ago...

I've attended all but one of these events over the years (and I definitely won't be missing the 10 year anniversary!) - it is one of the few conferences where I try to attend as many sessions as I can myself.  One thing I like is there are only two tracks at most - meaning you'll miss some content you probably wanted to see - but nothing on the scale of the larger conferences out there with 10, 20 or more concurrent tracks.

The party they throw every year isn't bad either.  (the food is probably the best conference food ever ;) )

In addition to the three day conference (March 5th-March 7th) - Jonathan Lewis will be giving the extra training day on Thursday.

But wait, there's more...

Cary Millsap will be giving a one day training course on Friday March 9th in the same area.

So, if you want to overdose on Oracle for a week, I'd recommend the Symposium, followed by Jonathan's "Designing Optimal SQL" training day, and closing up with Cary and some trace data analysis.

Dbvisit Replicate…

Last months I wrote a post Dbvisit Standby and said I planned on taking a look at their replication product at some point. I had actually already started writing an article on it, but got a bit side tracked before putting it live. I went back to it today and noticed a new version of the product had been released, so I downloaded it and gave it a run through.

Once again DBVisit have produced a clean and easy to install product that does exactly what it says it will. You guys better be careful or you’re going to turn me into a fanboy! :)

If you need replication and are checking out Oracle Streams and Oracle GoldenGate, you should probably take a look at Dbvisit Replicate also. You could be pleasantly surprised.

Cheers

Tim…




What is ‘rdbms ipc message’ wait event?

Introduction

There was a question about the wait event ‘rdbms ipc message’ in Oracle-l list. Short answer is that ‘rdbms ipc message’ event means that a process is waiting for an IPC message to arrive. Usually, this wait event can be ignored, but there are few rare scenarios this wait event can’t be completely ignored.

What is ‘rdbms ipc message’ wait means?

It is typical of Oracle Database background processes to wait for more work. For example, LGWR will wait for more work until another (foreground or background ) process request LGWR to do a log flush. In UNIX platforms, wait mechanism is implemented as a sleep on a specific semaphore associated with that process. This wait time is accounted towards database wait events ‘rdbms ipc message’.

Also note that, semaphore based waits are used in other wait scenarios too, not just ‘rdbms ipc message’ waits.

Time to Trace

We will use UNIX utility TRUSS to trace system calls from LGWR; We will enable sql trace on LGWR process. Using the output of these two methods, we will explore this wait event.

-- First we will identify UNIX PID of LGWR
$ ps -ef|grep ora_lgwr_${ORACLE_SID}
  oracle  1508     1   0 13:27:33 ?           0:00 ora_lgwr_solrac1

-- Next we will map that to Oracle session sid. 
select sid,b.serial#,b.program,b.username from v$process a,v$session b
where a.addr=b.paddr
and a.spid=&proc_id
/
Enter value for proc_id: 1508
  SID    SERIAL# PROGRAM                                          USERNAME
----- ---------- ------------------------------------------------ ------------------------------
   19          1 oracle@solrac1 (LGWR)

-- Let's enable sqltrace on LGWR session

SYS@solrac1:1>EXEC DBMS_MONITOR.session_trace_enable(session_id =>19, serial_num=>1, waits=>TRUE, binds=>FALSE);

PL/SQL procedure successfully completed.

Note, Enabling sql trace on LGWR is not a grand idea in a production environment, actually, even in environments where the performance is vital, so, try this test case only at home. Next section below prints few lines from the trace file of that LGWR process.

*** 2012-02-10 13:42:31.333
WAIT #0: nam='rdbms ipc message' ela= 3000112 timeout=300 p2=0 p3=0 obj#=-1 tim=1269283430

*** 2012-02-10 13:42:34.333
WAIT #0: nam='rdbms ipc message' ela= 3000349 timeout=300 p2=0 p3=0 obj#=-1 tim=1272283936 -- line #1

*** 2012-02-10 13:42:37.334
WAIT #0: nam='rdbms ipc message' ela= 3000397 timeout=300 p2=0 p3=0 obj#=-1 tim=1275284461 -- line #2

*** 2012-02-10 13:42:40.334
WAIT #0: nam='rdbms ipc message' ela= 3000185 timeout=300 p2=0 p3=0 obj#=-1 tim=1278284833 -- line #3

*** 2012-02-10 13:42:41.157
WAIT #0: nam='rdbms ipc message' ela= 822820 timeout=300 p2=0 p3=0 obj#=-1 tim=1279107848 -- line #4
WAIT #0: nam='log file parallel write' ela= 604 files=2 blocks=2 requests=2 obj#=-1 tim=1279108889 -- line #5

I tagged the lines with comment to improve readability and refer to those line numbers while reading this paragraph. There are few important points to understand:

  1. As you may know, these trace lines are printed after the completion of waits. In Line #1, LGWR process completed a wait of 3 seconds and woken up at 2012-02-10 13:42:34.333 time. Essentially, LGWR was sleeping for 3 seconds between 13:42:31.333 and 13:42:34.333.
  2. At Line #2 and Line #3, LGWR was sleeping for full three seconds.
  3. But, in Line #4, elapsed time of ‘rdbms ipc message’ is 0.8 seconds, meaning, LGWR slept for 0.8 seconds only (not full three seconds). Line #5 shows that LGWR wrote a log block to the log file.

In a nutshell, LGWR is sleeping on ‘rdbms ipc message’ for full 3 seconds if there is no work to be done. Another process can wakeup LGWR process and trigger a log write as indicated in line #4 and line #5. I will disable SQL trace on LGWR at this point and then truss LGWR process.

LGWR truss

Let’s truss the LGWR process and match the pattern with the trace file. Truss command I am printing here will work in Solaris/hp platform. In Linux, you would use strace -ttT command to trace the UNIX process.

$ truss -d -E -p 1508
Base time stamp:  1328904448.3033  [ Fri Feb 10 14:07:28 CST 2012 ]
....
/1:     12.2573  0.0000 times(0xFFFFFD7FFFDFD790)                       = 276585
/1:     semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) (sleeping...)
/1:     15.2576  0.0000 semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) Err#11 EAGAIN
/1:     15.2578  0.0000 times(0xFFFFFD7FFFDFD790)                       = 276885
/1:     15.2578  0.0000 times(0xFFFFFD7FFFDFD790)                       = 276885
/1:     semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) (sleeping...)
/1:     18.2582  0.0000 semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) Err#11 EAGAIN
/1:     18.2586  0.0000 times(0xFFFFFD7FFFDFD790)                       = 277185
/1:     18.2587  0.0000 times(0xFFFFFD7FFFDFD790)                       = 277185
/1:     semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) (sleeping...)
/1:     21.2589  0.0000 semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) Err#11 EAGAIN
/1:     21.2590  0.0000 times(0xFFFFFD7FFFDFD790)                       = 277485
/1:     21.2591  0.0000 times(0xFFFFFD7FFFDFD790)                       = 277485
/1:     semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) (sleeping...)
/1:     23.8895  0.0000 semtimedop(7, 0xFFFFFD7FFFDFD178, 1, 0xFFFFFD7FFFDFD190) = 0
/1:     23.8898  0.0001 kaio(AIOWRITE, 261, 0x6023D000, 6144, 0xFC73DB480CADD400) = 0
/1:     23.8903  0.0000 kaio(AIOWRITE, 263, 0x6023D000, 6144, 0xFC73DDD00C7DD400) = 0
/1:     23.8908  0.0001 kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF)               = 0
/1:     23.8917  0.0001 kaio(AIOWAIT, 0xFFFFFD7FFFDFC250)               = -2748838584880
/1:     23.8919  0.0000 kaio(AIOWAIT, 0xFFFFFD7FFFDFC250)               = -2748838585528

From the output of truss command above, we can understand LGWR behavior better. Second column prints the time offset from the base timestamp.

  1. At time offset 12.2573, LGWR went in to sleep calling the semtimedop call. I will explain about semtimedop system call shortly.
  2. At time offset 15.2576, LGWR was woken up from the sleep (meaning, that semtimedop call returned after three seconds of timer expiry).
  3. LGWR went to sleep again for three seconds immediately after( possibly because no work to be done) and woken up at 18.2582 time offset. Similar 3 seconds timer expiry occured at 21.2589 time offset also.
  4. It gets interesting at 23.8895 time offset. LGWR was (rudely!) woken up at 23.8895 time offset without allowing the LGWR to sleep for full three seconds ( 2.6 seconds). You can see few kaio calls (kaio kernalized async i/o calls) later and LGWR was submitting asynchronous I/O calls to the redo log file.

semtimedop

UNIX system call semtimedop is used by Oracle processes to sleep with a timer. In this example, LGWR called semtimedop system call with a 3 second timeout. Calling semtimedop system call with 3 seconds timer will suspend the process. Kernel will schedule the UNIX process in CPU (LGWR in this example) if one of these two conditions occur (a) Timer expired as requested by the process or (b) another process modifies this specific semaphore.

In our example, if there is no work, then, the LGWR process will sleep for full 3 seconds until the expiry of semaphore timer; If there is work to be done prior to that 3 seconds timer expiry, then another process will modify the semaphore associated with LGWR and so, Kernel will wakeup LGWR and schedule the LGWR in the CPU. LGWR will perform a log flush sync subsequently.

Advantage with this approach of sleep using semaphore with a timer expiry is that process will not consume any CPU while suspended.

This semaphore based sleep is analogous to a parent sleeping on a Saturday morning. Parent will wakeup if the timer goes off at 7:30 AM or if the kids wakeup prior to that alarm :)

Evidently, you can see that when the LGWR process is sleeping on semaphore, then the time is accounted towards the wait event ‘rdbms ipc message’. You should also remember that semaphores are created as semaphore sets and semctl system call allows the code to modify a specific semaphore in a semaphore set.

semtimedop is used in numerous places

Wait event ‘rdbms ipc message’ is not the only wait event associated with semtimedop system call. There are numerous places semtimedop call is used. Let’s examine a row level locking contention with two processes.

-- In session #1, we update a row.
SYS@solrac1:1>@mypid

SPID
------------------------
3100
SYS@solrac1:1>update rs.t_one set v1=v1 where n1=100;

1 row updated.
----------------
-- In session #2, we will find our PID and then, and then try updating the same row leading to locking contention.
----------------
SYS@solrac1:1>@mypid

SPID
------------------------
3048

SYS@solrac1:1>update rs.t_one set v1=v1 where n1=100;

At this time, PID 3048 is waiting for a lock. We will TRUSS PID 3048, to explore this concept further.

-- In another UNIX window
$ truss -d -E -p 3048
...
11.4971  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
11.9974  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
12.4979  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
12.9982  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
12.9984  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613374
12.9985  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613374
13.4987  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
13.9992  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
14.4995  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
15.0008  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
15.0010  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613575
15.0010  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613575
15.5012  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
16.0016  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
16.5023  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
17.0030  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) Err#11 EAGAIN
17.0032  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613775
17.0032  0.0000 times(0xFFFFFD7FFFDF6F50)                       = 613775
17.2466  0.0000 semtimedop(7, 0xFFFFFD7FFFDF69E8, 1, 0xFFFFFD7FFFDF6A00) = 0

Under the cover, PID 3048 is waiting on a semaphore (its own semaphore) in a tight loop with a 0.5 seconds timer expiry. So, essentially, process is sleeping on the semaphore with a 0.5 seconds timer, wakesup after 0.5 seconds of timer expiry, and sleeps again in a loop. At this time, From session #1, I rolled back the changes with a rollback command (commit also will behave exactly the same). Process associated with session #1, identified next waiter in the waiting queue, woke up the waiting process by modifying the semaphore of the waiting process, at 17.2466 time offset. You can see that sleep time for the last semtimedop call is just 0.24 seconds and the semtimedop call returned with a return code of zero. If the timer expired, then EAGAIN is returned as a return code, if not, then 0 is returned. Remember that EAGAIN is not an error and just a timer expiry.

But, in the case of enqueue contention, even though semaphore based sleeps were used, time is accounted towards the wait event ‘enq: TX – row lock contention’. As you see, semaphore based sleeps can be used for waits for many wait evens.

nam='enq: TX - row lock contention' ela= 13930545 name|mode=1415053318 usn<<16 

mypid.sql:

select spid from v$process where
  addr = (select  paddr from v$session where sid=(select sid from v$mystat where rownum =1))
;

Summary

In Summary, this wait event can be ignored. In rare cases, this used to be a platform bug in 7.0 database version, semtimedop will not return even if another process modifies the semaphore. So, generally, this event is an idle event and as such, should be ignored.