Oakies Blog Aggregator

Friday Philosophy – Christmas Cheer and Business Bah-Humbug

For many, today is the last working day before Christmas and the festive season – so I sincerely wish everyone a Merry Christmas. If you don’t celebrate Christmas, well, the intent of my wishes still holds – I hope everyone, whether working or not, and whatever their religious leanings (for, against or indifferent), has an enjoyable few days during whatever end-of-year festivities you have.

I’m going to be miserable now. You might want to stop reading here and maybe go to the shops for that last spell of retail hell or some other Christmas tradition. It’s probably best if you do…

You see, despite the best wishes above, generally speaking I am not a big fan of Christmas and have not been for as long as I can remember. It is not the principle of Christmas I am not keen on {I rather like both the religious and secular aspects of the whole thing, especially the seeing-people part like Di and Bri and ringing up old friends}, it is what Business does to it. Like many people, I really object to the bombardment of advertising, selling and downright commercialist bullying we endure for what seems to be 3 months in the run-up to Christmas. I know, I know, many people make this very same point ad nauseam around this time. What ticks me off the most is that I don’t think it would be an easy thing to change, for the fundamental reason that the businesses that are so set on telling us that Christmas will not be as good as it could be if we don’t buy their food to make us fat/get expensive presents for the kids to break/buy this bottle of smelly stuff so we get more sex/buy this booze cheap, probably for the same reason as the smelly stuff {or to help ignore the lack of sex}/take out a loan to make this Christmas REALLY “special” and you can pay it off for the whole of the rest of the year and be miserable as a result, {pause to catch breath…} as I was saying, any business that sells more stuff as a result of their advertising, no matter how much it annoys other people or adds to the degrading of the whole Christmas experience, will do better than a company that does not. And so will out-compete less tacky, crass and manipulative businesses.

That’s the huge problem with Christmas and other celebratory times. We live in a commercial society and commercial selection pressure means those companies that can squeeze the most out of a situation to sell tat will win. They give not a hoot about whether we really enjoy ourselves {we are back to the smelly stuff and booze again, aren’t we?}; it’s about profit. Oh, if enjoying ourselves in some way aids them in getting more profit then they won’t object, but it is not in the company mission statement of 99% of companies – and any whose mission statement it is in are doing it for cynical, commercialist reasons.

So, all successful businesses are Evil and are ruining Christmas for us all {OK, so that’s a bit of a big leap, stay with me….} So, have your revenge!!!

Next year:

  • Don’t buy stuff people probably don’t want. No adult wants 95% of what they get so….get nothing.
  • Tell everyone “I have all the stuff I need, buy yourself something instead – treat yourself on me”. You can buy the stuff you really want from the savings from point 1.
  • Having established the principle of reciprocal meanness above, that’s all that shopping hassle ditched.
  • Get normal food you like {and that does not play merry hell with your digestive system}. Preferably stuff you can freeze or keep a while, so you don’t need to go into the supermarket after Dec 20th.
  • Turn off the TV in December {or at least record everything and skip the adverts}. There is no decent TV in December anyway, it is all being saved up for the end of the month and, heck, even that is pretty awful.
  • Don’t read the paper. Or if you do, if you must, first four pages and last four pages only and scribble over adverts with a felt-tip pen. You’ll get the gist of world events and if your team is winning or losing.
  • That company you work for, that thinks paying you a wage means it owns your soul? It’s Evil, you owe them nothing they are not getting out of you already, so have a nice break at Christmas. {Unless you work at the same place as me, then they will need you to fill in for me as I will be on holiday}.

You will now be more relaxed, less stressed, have more time and generally be a nicer person. Take people to the pub, spend more time with people who like you being around (and this will be easier due to the people who no longer like you as you did not buy them any socks or a rubbish “humorous” golf book). Do things you actually enjoy. This year it is just going to be me, my gorgeous wife and the cat over Christmas and Boxing Day. The cat is really happy about this as we both like scratching the cat’s ears.

I might invite some neighbours over. They won’t come as they have to fulfil their awful Christmas Obligations – but they will like the fact they were invited. Heck, if they do turn up I’ll be in such a fine, happy mood I will even be nice to them.

Go and walk the hills of Mid Wales with your brother and relax.

The mutating table error prevents non-deterministic behavior of your code

A short recap of workarounds 1a and 1b from our previous post.

  • In workaround 1a we ensured that the query that was selecting from the mutating table was executed from within an autonomous transaction.
  • In workaround 1b we ensured that this query was being executed via a loopback database link.
We also discussed the major difference between the two workarounds:
  • 1a: the query sees the mutating table as it existed prior to the start of the main transaction.
  • 1b: the query sees the mutating table as it is being mutated by the triggering DML statement. Essentially this workaround behaves as if Oracle did not have a mutating table error at all.
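
To make the discussion concrete, here is a minimal sketch of what workaround 1b could look like. All names are hypothetical: it assumes a pre-existing loopback database link called LOOPBACK, and the business rule check is inlined in the trigger for brevity, whereas in the previous posts it lives in a separate procedure.

-- Hypothetical sketch of workaround 1b: the row-trigger validates the rule
-- by querying EMP via a loopback database link, thereby avoiding ORA-04091.
create or replace trigger emp_manager_needs_clerk
after insert on emp
for each row
declare
  l_clerk_count number;
begin
  if :new.job = 'MANAGER' then
    -- The query goes through the db-link and therefore sees the rows
    -- inserted so far by the (still uncommitted) triggering statement.
    select count(*)
      into l_clerk_count
      from emp@loopback
     where deptno = :new.deptno
       and job    = 'CLERK';
    if l_clerk_count = 0 then
      raise_application_error(-20000,
        'A manager requires a clerk in the same department (' || :new.deptno || ')');
    end if;
  end if;
end;
/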

Please never implement these workarounds. Especially not 1b, for the reason I will demonstrate in this post. And do you really want the overhead of going through a database link every time the row-trigger fires? And potentially twice the number of database sessions in your database instance? I hope not.

Apart from this, you will have planted a booby trap in your application code: your code can now behave non-deterministically.

To discuss this we will investigate how our row-trigger behaves not in the case of a single-row insert, but of a multi-row insert. Given our business rule (no manager can be without a clerk in the same department), a multi-row insert that inserts both the manager and the necessary clerk as the first two employees of a new department in one go is a valid insert statement that our trigger-based rule-enforcement code should allow.

Let's look at the behavior of our row-trigger using workaround 1b (loopback db-link) first. Suppose we have a table called EMP_STAGE. This (staging) table holds employee rows that are to be inserted into the real EMP table. Table EMP_STAGE currently holds two employees: a manager and a clerk, both working in department 42.
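
As a hypothetical sketch of that setup (the employee numbers 110 and 111 and department 42 are taken from the discussion below; the names and exact columns are made up):

-- Staging table with the same structure as EMP, holding the two employees
-- to be inserted: a manager (110) and a clerk (111), both for department 42.
create table emp_stage as select * from emp where 1 = 0;

insert into emp_stage (empno, ename, job, deptno) values (110, 'BLAKE', 'MANAGER', 42);
insert into emp_stage (empno, ename, job, deptno) values (111, 'SMITH', 'CLERK', 42);
commit;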

Now see what our row-trigger does.
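
The original post showed the actual SQL*Plus output as a screenshot; reconstructed with the hypothetical trigger and error message from the sketch above, it looks roughly like this:

SQL> insert into emp select * from emp_stage;
insert into emp select * from emp_stage
            *
ERROR at line 1:
ORA-20000: A manager requires a clerk in the same department (42)
ORA-06512: at "SCOTT.EMP_MANAGER_NEEDS_CLERK", line 13
ORA-04088: error during execution of trigger 'SCOTT.EMP_MANAGER_NEEDS_CLERK'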

Our row-trigger does not allow this insert. Why? Well, think about it. How is the statement 'insert into EMP select * from EMP_STAGE' executed? In particular, in what order are the rows selected from EMP_STAGE and then inserted into EMP? Apparently in the case above the manager was the first one to be selected and inserted into EMP. That fired our row-trigger, which then saw the mutating table (via the loopback db-link) and did not find the required clerk in department 42. Hence our row-trigger raised the business rule error. And the multi-row insert statement never got to insert the second row (the clerk); it was cancelled by the raise_application_error and rolled back.

But what if the rows happened to be processed in a different order? We can force this by simply adding an order-by clause to the subquery in the insert statement (we re-execute the insert and have added 'order by empno desc'):
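
(Reconstructed output; the original post showed a screenshot of the actual run.)

SQL> insert into emp select * from emp_stage order by empno desc;

2 rows created.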

Our two-row insert statement now executes successfully!

So what happened in this case? Due to the order by clause we've ensured that employee 111 (the clerk) got inserted first. The row-trigger fires, sees it's not a manager and therefore nothing needs to be done. Next employee 110 got inserted. The row-trigger fires again, sees it's a manager, and executes the business rule check by calling our procedure. This procedure now queries the EMP table (via the loopback db-link) and it actually sees (!) the clerk that just got inserted by the same statement, and therefore executes successfully.

Now the order in which rows get inserted by a multi-row insert statement is (and rightfully should be) of no concern to you. Think of the subquery not as a simple 'select * from emp_stage', but as a complex multi-table join: depending on the current contents/sizes of the tables involved, statistics might differ, which in turn might cause different execution plans for the insert statement, which in turn might change the order in which rows get inserted. If your code is not immune to this order then in effect your code executes in a non-deterministic fashion. And this is the essence of why ORA-04091 is your friend. It protects you against developing software that might work today, and not work tomorrow.

So how does this all work with the autonomous transaction workaround? Here's the same two-row insert for that scenario, executed without and with the forced order (EMP again has no employees in department 42).
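
(Again a reconstruction with the hypothetical error message from above; the original post showed the actual SQL*Plus screenshots.)

SQL> insert into emp select * from emp_stage;
insert into emp select * from emp_stage
            *
ERROR at line 1:
ORA-20000: A manager requires a clerk in the same department (42)

SQL> insert into emp select * from emp_stage order by empno desc;
insert into emp select * from emp_stage order by empno desc
            *
ERROR at line 1:
ORA-20000: A manager requires a clerk in the same department (42)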

As you can see, both versions of the insert statement fail with this workaround. The explanation for this is simple. Since the query that checks for the necessary clerk is run from inside an autonomous transaction, all it sees is the EMP table as it existed prior to the start of the insert statement: in which there were no employees for department 42. So in this case the 'workaround' is not a workaround at all: yes, it prevents the mutating table error, but no, it does not allow a valid insert statement to execute successfully. Note that it wouldn't even allow two subsequent single-row inserts: first insert a clerk, then insert a manager. With this workaround you are forced to commit the clerk first, and only then insert the manager, as the sketch below illustrates.
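
A sketch of that last point, again with the hypothetical names used above:

SQL> insert into emp (empno, ename, job, deptno) values (111, 'SMITH', 'CLERK', 42);

1 row created.

SQL> insert into emp (empno, ename, job, deptno) values (110, 'BLAKE', 'MANAGER', 42);
insert into emp (empno, ename, job, deptno) values (110, 'BLAKE', 'MANAGER', 42)
            *
ERROR at line 1:
ORA-20000: A manager requires a clerk in the same department (42)

SQL> commit;

Commit complete.

SQL> insert into emp (empno, ename, job, deptno) values (110, 'BLAKE', 'MANAGER', 42);

1 row created.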

So, in summary: again, forget about these two red herrings; don't use them.

In the following posts we will start developing a correct way to work around the mutating table error. And as many of you probably know, it involves a combination of row and statement triggers.

In the meantime happy holidays!

Extended DISPLAY_CURSOR With Rowsource Statistics

Introduction

So this will be my Oracle-related Christmas present for you: a prototype implementation that extends the DBMS_XPLAN.DISPLAY_CURSOR output, hopefully making it more meaningful and easier to interpret. It is a simple standalone SQL*Plus script with the main functionality performed by a single SQL query. I also demoed this during my recent "optimizer hacking sessions".

DBMS_XPLAN.DISPLAY_CURSOR, together with the Rowsource Statistics feature (enabled via SQL_TRACE, the GATHER_PLAN_STATISTICS hint, STATISTICS_LEVEL set to ALL, or controlled via the corresponding hidden parameters "_rowsource_execution_statistics" and "_rowsource_statistics_sampfreq"), has since Oracle 10g allowed a sophisticated analysis of the work performed by a single SQL statement.
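
For reference, a minimal sketch (not from the original post) of the most common ways to enable Rowsource Statistics; T1 is just the demo table used in the examples below, and the hidden parameters mentioned above are undocumented, so I'll leave them out here:

-- Session-wide, effectively sample frequency 1 (most accurate timings, highest overhead)
alter session set statistics_level = all;

-- Via SQL_TRACE at session level
alter session set sql_trace = true;

-- Per statement, using the default (lower) sample frequency - timings can be misleading
select /*+ gather_plan_statistics */ count(*) from t1;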

Of course you'll appreciate that it doesn't go as far as the Real-Time SQL Monitoring feature added in Oracle 11g (but only available with Enterprise Edition plus Diagnostic and Tuning Pack), which is "always on", provides similar (and much more) information while a statement is executing, and doesn't require reproducing the execution with the corresponding hints / parameters set.

Without the Tuning Pack it is usually necessary to reproduce the execution, because the overhead of the Rowsource Statistics is significant and it therefore doesn't make sense to have them always enabled - unfortunately Oracle 11g gathers the same information "always on", but you're only allowed to access that information if you have the Tuning Pack license.

But for users without the corresponding licenses DBMS_XPLAN.DISPLAY_CURSOR together with Rowsource Statistics is still a very valuable tool.

However during my seminars and consulting at client sites I've realized that people quite often struggle to interpret the output provided for several reasons:

1. They have problems interpreting the execution plan in general - here I refer in particular to the flow of execution and the underlying execution mechanics

2. They have problems identifying the operations that are responsible for the majority of the work, due to the cumulative nature of the work-related figures provided, like Elapsed Time, Logical I/O, Physical Reads etc.

3. They are potentially misled when trying to identify those steps in the execution plan that are subject to cardinality mis-estimates of the optimizer - the single most common reason for inefficient execution plans - due to the way the optimizer shows the number of estimated rows for operations that are executed multiple times (for example the inner row source of a Nested Loop join).

I've tried to address all of the above points (and even more) with this prototype implementation. In fact, point 1 above has already been addressed by Adrian Billington's XPLAN wrapper utility, which adds the Parent ID and Order of Execution to the DBMS_XPLAN output; I've picked up that idea of injecting additional information into the output with this prototype, so kudos to Adrian for his great idea and implementation.

Apart from any home-grown scripts there have probably been numerous attempts to address points 2 and 3 above, the latest one I know of being Kyle Hailey's DISPLAY_CURSOR post and his "TCF query" provided in the same article. I've included his TCF-GRAPH and LIO-RATIO information, so also thanks to Kyle for posting this.

I plan to eventually turn this into a SQL statement analysis "Swiss-army knife" for non-Tuning Pack users with more sophisticated formatting options (for example specifying which columns to show and in which order) and the ability to combine the information with the ASH info available from the Diagnostic Pack license (similar to the output provided by the Real-Time SQL Monitoring text mode).

However I believe that this prototype is already quite helpful and therefore decided to publish it as it is.

Let's have a look at what the extended output has to offer by performing a couple of sample Rowsource Profiles.

Examples

The first example is deliberately kept as simple as possible to explain the basic functionality by performing a full table scan.

SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select count(*) from t1;

COUNT(*)
----------
1000000

Elapsed: 00:00:03.01
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 3 | SELECT STATEMENT | | 1 | | 1 |00:00:02.98 | 15390 | 15386 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 2 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:02.98 | 15390 | 15386 | 00:00:01.39 | 0 | 0 | @@@@@@ | | | 0 | | 1 |
| 2 | 1 | 1 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.58 | 15390 | 15386 | 00:00:01.58 | 15390 | 15386 | @@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | | 1000K|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

14 rows selected.

The first thing that becomes obvious is the fact that you need a veeery wide display setting to see all the columns provided :-)

As you can see if you call the script without any parameters it will try to pick up the last statement executed by the current session and call DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST formatting option. Further options can be found in the documentation provided with the script.

To the left you can see the "Pid" and "Ord" column that Adrian added in his original XPLAN wrapper script - these define the Parent Id as well as the Order of Execution. Note that this Order of Execution is only correct for the common cases - it doesn't cater for the various exceptions to the general rules and therefore can be misleading. You'll find below an example that demonstrates this.

Furthermore, in addition to what is provided out of the box by DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST option, you see the following columns:

A-Time Self: This is the time spent on the operation itself. For leaf operations this corresponds to the A-Time, but for all non-leaf operations this is the time that was spent on the operation itself, obtained by subtracting the time spent on all direct descendant operations from the time shown for the parent operation. In the first example above, for instance, the SORT AGGREGATE shows a cumulative A-Time of 00:00:02.98 and its only child (the full table scan) 00:00:01.58, which leaves (rounding aside) the 00:00:01.39 reported as its A-Time Self. Please note that if you use a lower rowsource sample frequency (for example as set by the GATHER_PLAN_STATISTICS hint) the A-Time information will be pretty wrong and misleading. You need to set the sample frequency to 1 to get stable time information reported - of course this means that the overhead of the rowsource sampling gets maximized

Bufs Self/Reads Self/Write Self: This is the corresponding self-operation statistic obtained in the same way as just described

Graphs: The self-operation work shown relative to the total work performed. Note that the "total" is defined by querying the MAX value found in the statistics rather than picking the top-most cumulative value. This is because for queries that are cancelled or performed using Parallel Execution the top-most value may either not be populated at all or may be different from the values accumulated by the Parallel Slaves. So there are cases where the Graphs may be wrong and misleading - treat them carefully

LIO Ratio: This is simply the ratio between the number of logical I/Os recorded for the particular operation and the number of rows generated by that row source - in other words, how many LIOs it took on average to generate a single row. As usual care should be taken when interpreting a ratio, but in general a high value here might indicate that there are more efficient ways to generate the data, like a more selective access path. This can be very misleading for aggregation operations for example - a COUNT(*) will potentially show a huge LIO ratio but doesn't indicate a problem by itself

TCF Graph: "Tuning by Cardinality Feedback" - this is a graph in a different style - it shows either plus or minus signs and each plus / minus corresponds to one order of magnitude difference between the estimated and the actual rows. Plus stands for underestimates, minus for overestimates. So two plus signs indicate that the actual number of rows was 100 times greater than the estimated number, and similarly two minus signs would indicate an overestimate of factor 100. Note that this information will be partially misleading with Parallel Execution, because an operation that is only started once with serial execution might be started several times to obtain the complete result set when executed in parallel. Also cancelling queries might show misleading information here, see the "E-Rows*Sta" column description for an explanation why.

E-Rows*Sta: This is the estimated number of rows times the Starts column. This tries to address point 3 above, where the simple comparison of E-Rows and A-Rows can be very misleading, because a difference doesn't indicate a problem at all if the operation has been started a corresponding number of times. If a query gets cancelled then this column might still show a difference to A-Rows simply because the operation wasn't run to completion. Also for Parallel Execution this information needs to be treated carefully, because an operation that gets executed only once with serial execution may be started many times when executed in parallel

Looking at the example above it becomes obvious that all of the logical and physical I/O has been caused by the full table scan, of course, but with the increased STATISTICS_LEVEL setting you can see that the SORT AGGREGATE operation also required some time - presumably CPU time due to instrumentation overhead - whereas the top-most operation didn't account for any work at all. The cardinality estimate is also spot on.

The next example shows a different shape:

SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.7338E+11

Elapsed: 00:00:41.83
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp| A-Time Self |Bufs Self |Reads Self|Write Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|Write S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | 1 | | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:00.00 | 0 | 0 | 0 | | | | | 0 | | |
| 1 | 0 | 5 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:02.19 | 0 | 0 | 0 | @ | | | | 0 | | 1 |
| 2 | 1 | 4 | VIEW | | 1 | 16 | 1466K|00:00:39.64 | 1469K| 49356 | 10578 | | | | | 00:00:04.22 | 0 | 0 | 0 | @ | | | | 0 | ++++ | 16 |
| 3 | 2 | 3 | WINDOW SORT | | 1 | 16 | 1466K|00:00:35.42 | 1469K| 49356 | 10578 | 93M| 3312K| 55M (1)| 84992 | 00:00:11.85 | 6 | 24410 | 10578 | @@@ | | @@@@@@ | @@@@@@@@@@@@| 0 | ++++ | 16 |
| 4 | 3 | 2 | TABLE ACCESS BY INDEX ROWID| T | 1 | 16 | 1466K|00:00:23.58 | 1469K| 24946 | 0 | | | | | 00:00:19.73 | 1466K| 21707 | 0 | @@@@@@ | @@@@@@@@@@@@| @@@@@ | | 1 | ++++ | 16 |
|* 5 | 4 | 1 | INDEX RANGE SCAN | I | 1 | 16 | 1466K|00:00:03.85 | 3240 | 3239 | 0 | | | | | 00:00:03.85 | 3240 | 3239 | 0 | @ | | @ | | 0 | ++++ | 16 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Have I already mentioned that you need a veery wide display setting :-) ??

Anyway here we can see a couple of interesting points:

- An example of a parent operation requiring a significant amount of time - in this case a WINDOW SORT operation that spills to disk (see the Used-Tmp and Writes columns)

- A problem with the cardinality estimates as indicated by the TCF Graph. In this case it is the reason for an inefficient index-based access path. Note that the LIO Ratio isn't indicating this problem here very clearly

- The majority of the logical I/O (and time and work) is caused by the random access to the table, again caused by the bad choice of the optimizer due to the wrong cardinality estimates

Here is another example of a more complex execution plan:

SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set star_transformation_enabled = temp_disable;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;

no rows selected

Elapsed: 00:00:21.26
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1

Plan hash value: 42027304

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 1000K|00:00:20.79 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 1000K|00:00:11.10 | 2131K| 21150 | | | | 00:00:00.41 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 1000K|00:00:06.12 | 19549 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.59 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.53 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.95 | 19221 | 17970 | 1452K| 1452K| 1002K (0)| 00:00:00.53 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.42 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.24 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.41 | 2107 | 1020 | | | | 00:00:00.11 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.30 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 863 | 400 | 1024K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.10 | 699 | 400 | | | | 00:00:00.10 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.10 | 683 | 400 | | | | 00:00:00.10 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| 1581K (0)| 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.46 | 16950 | 16950 | | | | 00:00:03.46 | 16950 | 16950 | @@ | | @@@@@ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:04.57 | 2111K| 3180 | | | | 00:00:04.57 | 2112K| 3180 | @@@ | @@@@@@@@@@@@| @ | 2 | | 1000K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 1000K| 1 | 1000K|00:00:09.25 | 29628 | 22648 | | | | 00:00:09.25 | 29628 | 22648 | @@@@@ | | @@@@@@ | 0 | | 1000K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")

Note
-----
- star transformation used for this statement

68 rows selected.

This is another case where a more efficient execution plan could be found if the cardinality estimate was in the right ballpark - you can see this pretty clearly in the "TCF Graph" column. Due to the strong underestimation several bad choices have been made: reading all rows from T by ROWID rather than simply performing a full table scan, and again an index-driven random access to T1, which drives up the logical I/O unnecessarily. This is a crafted example that minimizes the logical and physical I/O due to the good clustering of T1 in relation to the data returned by the driving row source - a more real-life, bad clustering together with larger table sizes would have turned this into a more or less infinitely running query.

It is also an example of how simply looking at E-Rows and A-Rows can be misleading: check operations 28 and 29: A-Rows is 1000K but E-Rows is 1, so should this be worrying? Not at all if you look at the "E-Rows*Sta" column, because the operation has been started 1000K times, hence the estimate is spot on.

The "LIO Ratio" for operation 23 is 16 - this means it took 16 LIOs on average to generate a single row (164 buffer gets for 10 rows) and might indicate that there are more efficient ways to generate those rows than a full table scan.

By the way, the 11g buffer pinning optimization also helped to minimize the logical I/O on the T1 table.

Here is the same query, but this time with a bad clustering of T1 - I've cancelled it after 40 seconds to give you an example showing that you can use DBMS_XPLAN.DISPLAY_CURSOR without the need to run a statement to completion.

SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;
select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
*
ERROR at line 2:
ORA-01013: user requested cancel of current operation

Elapsed: 00:00:40.71
SQL> @xplan_extended_display_cursor
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1

Plan hash value: 42027304

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 102K|00:00:40.42 | 337K| 143K| | | | 00:00:00.16 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 102K|00:00:13.08 | 235K| 40794 | | | | 00:00:00.13 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 102K|00:00:05.70 | 19420 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.40 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.30 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.73 | 19221 | 17970 | 1452K| 1452K| 1010K (0)| 00:00:00.52 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.21 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.03 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.42 | 2107 | 1020 | | | | 00:00:00.10 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.32 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 863 | 400 | 1024K| 512K| | 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.11 | 699 | 400 | | | | 00:00:00.11 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.11 | 683 | 400 | | | | 00:00:00.11 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| | 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.24 | 16950 | 16950 | | | | 00:00:03.24 | 16950 | 16950 | @ | @ | @ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 11 |00:00:00.01 | 35 | 0 | | | | 00:00:00.00 | 35 | 0 | | | | 3 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 102K| 1 | 102K|00:00:07.26 | 216K| 22824 | | | | 00:00:07.26 | 216K| 22824 | @@ | @@@@@@@@ | @@ | 2 | | 102K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 102K| 1 | 102K|00:00:27.19 | 102K| 102K| | | | 00:00:27.19 | 102K| 102K| @@@@@@@@ | @@@@ | @@@@@@@@@ | 1 | | 102K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")

Note
-----
- star transformation used for this statement

68 rows selected.

The result is similar to the previous one, but you can see that the increased number of physical reads on the T1 table segment slowed down the execution significantly.

Here is an example of Parallel Execution. Note that I strongly recommend the Real-Time SQL Monitoring feature if you have to deal with Parallel Execution a lot, because it offers much more insight and information than DBMS_XPLAN.DISPLAY_CURSOR.

SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> select /*+ parallel(t1, 4) */ count(*) from t1;

COUNT(*)
----------
1000000

Elapsed: 00:00:01.59
SQL>
SQL> @xplan_extended_display_cursor "" "" ALLSTATS
SQL> set echo off verify off termout off
SQL_ID 92661sht5tyw1, child number 0
-------------------------------------
select /*+ parallel(t1, 4) */ count(*) from t1

Plan hash value: 3110199320

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 7 | SELECT STATEMENT | | 1 | | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | 1 |
| 2 | 1 | 5 | PX COORDINATOR | | 1 | | 4 |00:00:01.53 | 7 | 2 | 00:00:01.53 | 7 | 2 | @@@ | | | 1 | | |
| 3 | 2 | 4 | PX SEND QC (RANDOM) | :TQ10000 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 00:00:00.00 | 0 | 0 | | | | | | 0 |
| 4 | 3 | 3 | SORT AGGREGATE | | 4 | 1 | 4 |00:00:05.84 | 15541 | 15385 | 00:00:00.14 | 0 | 3841 | | | @@@ | 0 | | 4 |
| 5 | 4 | 2 | PX BLOCK ITERATOR | | 4 | 1000K| 1000K|00:00:05.71 | 15541 | 11544 | 00:00:00.27 | 0 | 0 | @ | | | 0 | | 4000K|
|* 6 | 5 | 1 | TABLE ACCESS FULL| T1 | 52 | 1000K| 1000K|00:00:05.44 | 15541 | 15385 | 00:00:05.44 | 15541 | 15385 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | - | 52M|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access(:Z>=:Z AND :Z<=:Z)

23 rows selected.

The important points to consider when dealing with Parallel Execution are:

- You need to use ALLSTATS instead of ALLSTATS LAST in order to get a meaningful output. ALLSTATS LAST would only show the activity of the Query Coordinator. However, ALLSTATS means that the statistics are shown cumulatively for all executions so far, so if the query has been executed multiple times you will see more than just the statistics of the last execution. If you want to ensure you see only the statistics of the last execution, you need to create a new cursor, for example by adding a simple comment that makes the SQL statement text unique (see the small example after this list)

- You can see in the output that the "TCF Graph" and "E-Rows*Sta" columns can be misleading for Parallel Execution - the full table scan has been divided into 52 chunks executed by four parallel slaves, hence the Starts column shows 52 but the cardinality estimate of 1000K rows was spot on instead of wrong by factor 52

- The elapsed time information for the parts executed in parallel is not the wall clock time but the accumulated time spent by all parallel slaves, hence the Graphs will be partially misleading due to the MAX value found

- In this case also the "Reads Self" column seems to indicate reads by the SORT AGGREGATE operation - this looks questionable, too
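
As a small, hypothetical illustration of the first point above (forcing a new cursor by making the statement text unique):

-- Each distinct comment text results in a new cursor, so the ALLSTATS figures
-- of this run are not mixed up with those of earlier executions
select /*+ parallel(t1, 4) */ /* run 2 */ count(*) from t1;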

The last example shows that it takes just a simple scalar subquery to make the output misleading again - so be aware that there are some exceptions (like scalar / early filter subqueries, certain Parallel Execution plans etc.) to the rules of how to interpret execution plans, and any automated interpretation of such plans is therefore usually misled:

SQL> select count(id) from (select (select id from t1 t1_i where t1_i.id = t1.id) as id from t1);

COUNT(ID)
----------
1000000

Elapsed: 00:00:17.50
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID af2gry2z9g7vt, child number 0
-------------------------------------
select count(id) from (select (select id from t1 t1_i where t1_i.id =
t1.id) as id from t1)

Plan hash value: 1144741071

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 4 | SELECT STATEMENT | | 1 | | 1 |00:00:17.51 | 1551K| 17617 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 1 | 0 | 1 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:09.36 | 1536K| 2231 | 00:00:09.36 | 1536K| 2231 | @@@@@@ | @@@@@@@@@@@@| @@ | 1 | | 1000K|
| 2 | 0 | 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.51 | 1551K| 17617 | 00:00:15.76 | 1536K| 2231 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@ | 1536485 | | 1 |
| 3 | 2 | 2 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.74 | 15390 | 15386 | 00:00:01.74 | 15390 | 15386 | @ | | @@@@@@@@@@ | 0 | | 1000K|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1_I"."ID"=:B1)

The scalar subquery is shown as a child operation of the root node (or, in 10g, with the missing ID = 0 operation in V$SQL_PLAN_STATISTICS(_ALL), as an independent operation with no parent at all) and according to the usual rules would therefore be executed first (see the "Ord" column), but this is not true - the execution starts with the first leaf of the main branch of the plan (the full table scan of T1).

Note that not only is the "Ord" column wrong; the fact that the SORT AGGREGATE operation includes the work performed by the scalar subquery is also not handled correctly by the remaining logic that calculates the operations' self statistics.

The Script

Below you can find the current version of the script. If you're too lazy to copy&paste (and because I don't have a fancy "copy to clipboard" button) you can also download the script from here.

Of course I'm interested in feedback. This prototype is not tested much yet, so expect glitches and problems. If you get back to me with reproducible cases I'll try to address them and publish updated versions of the script.

A final note: This tool comes for free but with no warranties at all. Use at your own risk.

Happy rowsource profiling (and holiday season)!

set echo off verify off termout off
set doc off
doc
-- ----------------------------------------------------------------------------------------------
--
-- Script: xplan_extended_display_cursor.sql
--
-- Version: 0.9
-- December 2011
--
-- Author: Randolf Geist
-- oracle-randolf.blogspot.com
--
-- Description: A free-standing SQL wrapper over DBMS_XPLAN. Provides access to the
-- DBMS_XPLAN.DISPLAY_CURSOR pipelined function for a given SQL_ID and CHILD_NUMBER
--
-- This is a prototype for an extended analysis of the data provided by the
-- Runtime Profile (aka. Rowsource Statistics enabled via
-- SQL_TRACE = TRUE, STATISTICS_LEVEL = ALL or GATHER_PLAN_STATISTICS hint)
-- and reported via the ALLSTATS/MEMSTATS/IOSTATS formatting option of
-- DBMS_XPLAN.DISPLAY_CURSOR
--
-- Versions: This utility will work for all versions of 10g and upwards.
--
-- Required: The same access as DBMS_XPLAN.DISPLAY_CURSOR requires. See the documentation
-- of DISPLAY_CURSOR for your Oracle version for more information
--
-- The script directly queries
-- 1) V$SESSION
-- 2) V$SQL_PLAN_STATISTICS_ALL
--
-- Credits: Based on the original XPLAN implementation by Adrian Billington (http://www.oracle-developer.net/utilities.php
-- resp. http://www.oracle-developer.net/content/utilities/xplan.zip)
-- and inspired by Kyle Hailey's TCF query (http://dboptimizer.com/2011/09/20/display_cursor/)
--
-- Features: In addition to the PID (The PARENT_ID) and ORD (The order of execution, note that this doesn't account for the special cases so it might be wrong)
-- columns added by Adrian's wrapper the following additional columns over ALLSTATS are provided:
--
-- A_TIME_SELF : The time taken by the operation itself - this is the operation's cumulative time minus the direct descendants operation's cumulative time
-- LIO_SELF : The LIOs done by the operation itself - this is the operation's cumulative LIOs minus the direct descendants operation's cumulative LIOs
-- READS_SELF : The reads performed by the operation itself - this is the operation's cumulative reads minus the direct descendants operation's cumulative reads
-- WRITES_SELF : The writes performed by the operation itself - this is the operation's cumulative writes minus the direct descendants operation's cumulative writes
-- A_TIME_SELF_GRAPH : A graphical representation of A_TIME_SELF relative to the total A_TIME
-- LIO_SELF_GRAPH : A graphical representation of LIO_SELF relative to the total LIO
-- READS_SELF_GRAPH : A graphical representation of READS_SELF relative to the total READS
-- WRITES_SELF_GRAPH : A graphical representation of WRITES_SELF relative to the total WRITES
-- LIO_RATIO : Ratio of LIOs per row generated by the row source - the higher this ratio the more likely there could be a more efficient way to generate those rows (be aware of aggregation steps though)
-- TCF_GRAPH : Each "+"/"-" sign represents one order of magnitude based on ratio between E_ROWS_TIMES_START and A-ROWS. Note that this will be misleading with Parallel Execution (see E_ROWS_TIMES_START)
-- E_ROWS_TIMES_START : The E_ROWS multiplied by STARTS - this is useful for understanding the actual cardinality estimate for related combine child operations getting executed multiple times. Note that this will be misleading with Parallel Execution
--
-- More information including demos can be found online at http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-r...
--
-- Usage: @xplan_extended_display_cursor.sql [sql_id] [cursor_child_number] [format_option]
--
-- If both the SQL_ID and CHILD_NUMBER are omitted the previously executed SQL_ID and CHILD_NUMBER of the session will be used
-- If the SQL_ID is specified but the CHILD_NUMBER is omitted then CHILD_NUMBER 0 is assumed
--
-- This prototype does not support processing multiple child cursors like DISPLAY_CURSOR is capable of
-- when passing NULL as CHILD_NUMBER to DISPLAY_CURSOR. Hence a CHILD_NUMBER is mandatory, either
-- implicitly generated (see above) or explicitly passed
--
-- The default formatting option for the call to DBMS_XPLAN.DISPLAY_CURSOR is ALLSTATS LAST - extending this output is the primary purpose of this script
--
-- Note: You need a veeery wide terminal setting for this prototype, something like linesize 400 should suffice
--
-- This tool is free but comes with no warranty at all - use at your own risk
--
#

col plan_table_output format a400
set linesize 400 pagesize 0 tab off

/* ALLSTATS LAST is assumed as the default formatting option for DBMS_XPLAN.DISPLAY_CURSOR */
define default_fo = "ALLSTATS LAST"

column prev_sql_id new_value prev_sql_id
column prev_child_number new_value prev_cn

/* Get the previous command as default
if no SQL_ID / CHILD_NUMBER is passed */
select
prev_sql_id
, prev_child_number
from
v$session
where
sid = userenv('sid')
;

-- The following is a hack to use default
-- values for defines
column 1 new_value 1
column 2 new_value 2
column 3 new_value 3

select
'' as "1"
, '' as "2"
, '' as "3"
from
dual
where
rownum = 0;

column si new_value si
column cn new_value cn
column fo new_value fo

/* Use passed parameters else refer to previous SQL_ID / CHILD_NUMBER
ALLSTATS LAST is default formatting option */
select
nvl('&1', '&prev_sql_id') as si
, coalesce('&2', '&prev_cn', '0') as cn
, nvl('&3', '&default_fo') as fo
from
dual
;

column last new_value last

/* Last or all execution */
select
case
when instr('&fo', 'LAST') > 0
then 'last_'
end as last
from
dual
;

set termout on

with
-- The next three queries are based on the original XPLAN wrapper by Adrian Billington
-- to determine the PID and ORD information, only slightly modified to deal with
-- the 10g special case that V$SQL_PLAN_STATISTICS_ALL doesn't include the ID = 0 operation
-- and starts with 1 instead for Rowsource Statistics
sql_plan_data as
(
select
id
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
hierarchy_data as
(
select
id
, parent_id
from
sql_plan_data
start with
id in
(
select
id
from
sql_plan_data p1
where
not exists
(
select
null
from
sql_plan_data p2
where
p2.id = p1.parent_id
)
)
connect by
prior id = parent_id
order siblings by
id desc
),
ordered_hierarchy_data as
(
select
id
, parent_id as pid
, row_number() over (order by rownum desc) as oid
, max(id) over () as maxid
, min(id) over () as minid
from
hierarchy_data
),
-- The following query uses the MAX values
-- rather than taking the values of PLAN OPERATION_ID = 0 (or 1 for 10g V$SQL_PLAN_STATISTICS_ALL)
-- for determining the grand totals
--
-- This is because queries that get cancelled do not
-- necessarily have yet sensible values in the root plan operation
--
-- Furthermore with Parallel Execution the elapsed time accumulated
-- with the ALLSTATS option for operations performed in parallel
-- will be greater than the wallclock elapsed time shown for the Query Coordinator
--
-- Note that if you use GATHER_PLAN_STATISTICS with the default
-- row sampling frequency the (LAST_)ELAPSED_TIME will be very likely
-- wrong and hence the time-based graphs and self-statistics will be misleading
--
-- Similar things might happen when cancelling queries
--
-- For queries running with STATISTICS_LEVEL = ALL (or sample frequency set to 1)
-- the A-TIME is pretty reliable
totals as
(
select
max(&last.cu_buffer_gets + &last.cr_buffer_gets) as total_lio
, max(&last.elapsed_time) as total_elapsed
, max(&last.disk_reads) as total_reads
, max(&last.disk_writes) as total_writes
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
-- The totals for the direct descendants of an operation
-- These are required for calculating the work performed
-- by a (parent) operation itself
-- Basically this is the SUM grouped by PARENT_ID
direct_desc_totals as
(
select
sum(&last.cu_buffer_gets + &last.cr_buffer_gets) as lio
, sum(&last.elapsed_time) as elapsed
, sum(&last.disk_reads) as reads
, sum(&last.disk_writes) as writes
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
group by
parent_id
),
-- Putting the three together
-- The statistics, direct descendant totals plus totals
extended_stats as
(
select
stats.id
, stats.parent_id
, stats.&last.elapsed_time as elapsed
, (stats.&last.cu_buffer_gets + stats.&last.cr_buffer_gets) as lio
, stats.&last.starts as starts
, stats.&last.output_rows as a_rows
, stats.cardinality as e_rows
, stats.&last.disk_reads as reads
, stats.&last.disk_writes as writes
, ddt.elapsed as ddt_elapsed
, ddt.lio as ddt_lio
, ddt.reads as ddt_reads
, ddt.writes as ddt_writes
, t.total_elapsed
, t.total_lio
, t.total_reads
, t.total_writes
from
v$sql_plan_statistics_all stats
, direct_desc_totals ddt
, totals t
where
stats.sql_id='&si'
and stats.child_number = &cn
and ddt.parent_id (+) = stats.id
),
-- Further information derived from above
derived_stats as
(
select
id
, greatest(elapsed - nvl(ddt_elapsed , 0), 0) as elapsed_self
, greatest(lio - nvl(ddt_lio, 0), 0) as lio_self
, trunc((greatest(lio - nvl(ddt_lio, 0), 0)) / nullif(a_rows, 0)) as lio_ratio
, greatest(reads - nvl(ddt_reads, 0), 0) as reads_self
, greatest(writes - nvl(ddt_writes,0) ,0) as writes_self
, total_elapsed
, total_lio
, total_reads
, total_writes
, trunc(log(10, nullif(starts * e_rows / nullif(a_rows, 0), 0))) as tcf_ratio
, starts * e_rows as e_rows_times_start
from
extended_stats
),
/* Format the data as required */
formatted_data1 as
(
select
id
, lio_ratio
, total_elapsed
, total_lio
, total_reads
, total_writes
, to_char(numtodsinterval(round(elapsed_self / 10000) * 10000 / 1000000, 'SECOND')) as e_time_interval
/* Imitate the DBMS_XPLAN number formatting */
, case
when lio_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000000 then to_char(lio_self/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000 then to_char(lio_self/1000000000000000, 'FM99999') || 'P'
when lio_self >= 10000000000000 then to_char(lio_self/1000000000000, 'FM99999') || 'T'
when lio_self >= 10000000000 then to_char(lio_self/1000000000, 'FM99999') || 'G'
when lio_self >= 10000000 then to_char(lio_self/1000000, 'FM99999') || 'M'
when lio_self >= 100000 then to_char(lio_self/1000, 'FM99999') || 'K'
else to_char(lio_self, 'FM99999') || ' '
end as lio_self_format
, case
when reads_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000000 then to_char(reads_self/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000 then to_char(reads_self/1000000000000000, 'FM99999') || 'P'
when reads_self >= 10000000000000 then to_char(reads_self/1000000000000, 'FM99999') || 'T'
when reads_self >= 10000000000 then to_char(reads_self/1000000000, 'FM99999') || 'G'
when reads_self >= 10000000 then to_char(reads_self/1000000, 'FM99999') || 'M'
when reads_self >= 100000 then to_char(reads_self/1000, 'FM99999') || 'K'
else to_char(reads_self, 'FM99999') || ' '
end as reads_self_format
, case
when writes_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000000 then to_char(writes_self/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000 then to_char(writes_self/1000000000000000, 'FM99999') || 'P'
when writes_self >= 10000000000000 then to_char(writes_self/1000000000000, 'FM99999') || 'T'
when writes_self >= 10000000000 then to_char(writes_self/1000000000, 'FM99999') || 'G'
when writes_self >= 10000000 then to_char(writes_self/1000000, 'FM99999') || 'M'
when writes_self >= 100000 then to_char(writes_self/1000, 'FM99999') || 'K'
else to_char(writes_self, 'FM99999') || ' '
end as writes_self_format
, case
when e_rows_times_start >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000000 then to_char(e_rows_times_start/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000 then to_char(e_rows_times_start/1000000000000000, 'FM99999') || 'P'
when e_rows_times_start >= 10000000000000 then to_char(e_rows_times_start/1000000000000, 'FM99999') || 'T'
when e_rows_times_start >= 10000000000 then to_char(e_rows_times_start/1000000000, 'FM99999') || 'G'
when e_rows_times_start >= 10000000 then to_char(e_rows_times_start/1000000, 'FM99999') || 'M'
when e_rows_times_start >= 100000 then to_char(e_rows_times_start/1000, 'FM99999') || 'K'
else to_char(e_rows_times_start, 'FM99999') || ' '
end as e_rows_times_start_format
, rpad(' ', nvl(round(elapsed_self / nullif(total_elapsed, 0) * 12), 0) + 1, '@') as elapsed_self_graph
, rpad(' ', nvl(round(lio_self / nullif(total_lio, 0) * 12), 0) + 1, '@') as lio_self_graph
, rpad(' ', nvl(round(reads_self / nullif(total_reads, 0) * 12), 0) + 1, '@') as reads_self_graph
, rpad(' ', nvl(round(writes_self / nullif(total_writes, 0) * 12), 0) + 1, '@') as writes_self_graph
, ' ' ||
case
when tcf_ratio > 0
then rpad('-', tcf_ratio, '-')
else rpad('+', tcf_ratio * -1, '+')
end as tcf_graph
from
derived_stats
),
/* The final formatted data */
formatted_data as
(
select
/*+ Convert the INTERVAL representation to the A-TIME representation used by DBMS_XPLAN
by turning the days into hours */
to_char(to_number(substr(e_time_interval, 2, 9)) * 24 + to_number(substr(e_time_interval, 12, 2)), 'FM900') ||
substr(e_time_interval, 14, 9)
as a_time_self
, a.*
from
formatted_data1 a
),
/* Combine the information with the original DBMS_XPLAN output */
xplan_data as (
select
x.plan_table_output
, o.id
, o.pid
, o.oid
, o.maxid
, o.minid
, a.a_time_self
, a.lio_self_format
, a.reads_self_format
, a.writes_self_format
, a.elapsed_self_graph
, a.lio_self_graph
, a.reads_self_graph
, a.writes_self_graph
, a.lio_ratio
, a.tcf_graph
, a.total_elapsed
, a.total_lio
, a.total_reads
, a.total_writes
, a.e_rows_times_start_format
, x.rn
from
(
select
/*+ dynamic_sampling(dc, 2) */
/* Take advantage of 11g table function dynamic sampling
   (the hint needs to be the first comment after the SELECT keyword to be recognized) */
/* This ROWNUM determines the order of the output/processing */
rownum as rn
, plan_table_output
from
table(dbms_xplan.display_cursor('&si',&cn, '&fo')) dc
) x
, ordered_hierarchy_data o
, formatted_data a
where
o.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
and a.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
)
/* Inject the additional data into the original DBMS_XPLAN output
by using the MODEL clause */
select
plan_table_output
from
xplan_data
model
dimension by (rn as r)
measures
(
cast(plan_table_output as varchar2(4000)) as plan_table_output
, id
, maxid
, minid
, pid
, oid
, a_time_self
, lio_self_format
, reads_self_format
, writes_self_format
, e_rows_times_start_format
, elapsed_self_graph
, lio_self_graph
, reads_self_graph
, writes_self_graph
, lio_ratio
, tcf_graph
, total_elapsed
, total_lio
, total_reads
, total_writes
, greatest(max(length(maxid)) over () + 3, 6) as csize
, cast(null as varchar2(128)) as inject
, cast(null as varchar2(4000)) as inject2
)
rules sequential order
(
/* Prepare the injection of the OID / PID info */
inject[r] = case
/* MINID/MAXID are the same for all rows
so it doesn't really matter
which offset we refer to */
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-', csize[cv()]*2, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then '|' || lpad('Pid |', csize[cv()]) || lpad('Ord |', csize[cv()])
when id[cv()] is not null
then '|' || lpad(pid[cv()] || ' |', csize[cv()]) || lpad(oid[cv()] || ' |', csize[cv()])
end
/* Prepare the injection of the remaining info */
, inject2[r] = case
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-',
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_SELF */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
11 else 0 end /* READS_SELF */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
11 else 0 end /* WRITES_SELF */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
14 else 0 end /* LIO_SELF_GRAPH */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
14 else 0 end /* READS_SELF_GRAPH */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
14 else 0 end /* WRITES_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_RATIO */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* TCF_GRAPH */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* E_ROWS_TIMES_START */
, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Time Self |' , 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs Self |' , 11) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads Self|' , 11) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write Self|' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Ti S-Graph |', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs S-Graph |', 14) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads S-Graph|', 14) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write S-Graph|', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('LIO Ratio |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('TCF Graph |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('E-Rows*Sta|' , 11) end
when id[cv()] is not null
then case when total_elapsed[cv()] > 0 then
lpad(a_time_self[cv()] || ' |', 14) end ||
case when total_lio[cv()] > 0 then
lpad(lio_self_format[cv()] || '|', 11) end ||
case when total_reads[cv()] > 0 then
lpad(reads_self_format[cv()] || '|', 11) end ||
case when total_writes[cv()] > 0 then
lpad(writes_self_format[cv()] || '|', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(elapsed_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
rpad(lio_self_graph[cv()], 13) || '|' end ||
case when total_reads[cv()] > 0 then
rpad(reads_self_graph[cv()], 13) || '|' end ||
case when total_writes[cv()] > 0 then
rpad(writes_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
lpad(lio_ratio[cv()] || ' |', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(tcf_graph[cv()], 9) || ' |' end ||
case when total_elapsed[cv()] > 0 then
lpad(e_rows_times_start_format[cv()] || '|', 11) end
end
/* Putting it all together */
, plan_table_output[r] = case
when inject[cv()] like '---%'
then inject[cv()] || plan_table_output[cv()] || inject2[cv()]
when inject[cv()] is present
then regexp_replace(plan_table_output[cv()], '\|', inject[cv()], 1, 2) || inject2[cv()]
else plan_table_output[cv()]
end
)
order by
r
;

undefine default_fo
undefine prev_sql_id
undefine prev_cn
undefine last
undefine si
undefine cn
undefine fo
undefine 1
undefine 2
undefine 3

col plan_table_output clear
col prev_sql_id clear
col prev_child_number clear
col si clear
col cn clear
col fo clear
col last clear
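
A rough usage sketch for context (hedged, since the first half of the script and its defaults are defined earlier): the substitution variables suggest the script expects the SQL_ID, the child cursor number and a DBMS_XPLAN format option as three positional parameters, and the A-TIME based columns are only trustworthy when rowsource statistics are gathered at full frequency. The following hypothetical run (script name, SQL_ID and table name are made up for illustration) shows the idea:

-- gather rowsource statistics at full frequency so the A-TIME figures are reliable
alter session set statistics_level = all;

-- hypothetical statement to be analysed
select count(*) from some_table;

-- hypothetical invocation: the script is assumed to be saved as xplan_extended.sql
-- and to take SQL_ID, child cursor number and DBMS_XPLAN format option as the
-- three positional parameters suggested by the undefine 1/2/3 above
@xplan_extended.sql 1a2b3c4d5e6f7 0 ALLSTATS

Running the statement with the GATHER_PLAN_STATISTICS hint instead of STATISTICS_LEVEL = ALL works as well, subject to the sampling-frequency caveat in the script comments.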

Oracle Core: Essential Internals for DBAs and Developers book by Jonathan Lewis

In case you didn’t know, Jonathan Lewis’s new Oracle Core: Essential Internals for DBAs and Developers book is out (and has been for a few weeks already).

I was the technical reviewer for that book and I can say it’s awesome! It will likely be the best Oracle internals book out there for the coming 10 years, just like Steve Adams’s Oracle Internal Services book was in the last decade :)

Jonathan does a very good job of explaining complex things in a simple enough way – and the book is not just a dry listing of how things work inside the Oracle database, but also explains why they work the way they do and what the benefits, limitations and side effects of that behavior are.

So check out Jonathan’s book page for more details (and I think you can still order a heavily discounted ebook from Apress too):

Enjoy :)

DNS Config for RAC SCAN… Again…

Some time ago I wrote an article about configuring DNS for the SCAN address in Oracle 11gR2 RAC.

Over the last couple of days one of the Oakies has given me some feedback on the article based on his experiences with Oracle Linux 5.

For my virtual RAC installations I typically do the DNS configuration on the host machine, which is running Fedora. It seems Oracle Linux 5 is a little different. Just to make sure, I ran through the setup on Oracle Linux 5 and 6 to see what the score was.

Oracle Linux 6 is pretty much the same as Fedora as far as installation and configuration of BIND is concerned. Oracle Linux 5 (and also RHEL5 I assume) is a little funky in that it doesn’t create the default config files automatically. There are sample files you can use, but they contain a load of crap you don’t need and are missing some stuff you do. The easiest way to get it configured is to install the “system-config-bind” package and start the GUI. This notices you have no config and sets it up for you. You can then exit the GUI and do the manual config as described in the article. Very odd.

I’ve made a note about this in the article, so it shouldn’t trip anyone else up. I’ve also given some examples of the manually adjusted default named.conf files for each distro.

Cheers

Tim…




Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception and I’ll dive into how to do so with R and how I used R plots to visually present a problem and confirm a hypothesis. But first some background…

Background

Frequently DBAs use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems, and in this case the adventure started the same way. In the AWR report Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 event. This needed deeper investigation, as oftentimes the cause of longer log file sync times is related to longer log file parallel write times.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                     3,155,253       9,197      3    6.4 Commit

Drilling into this a bit deeper, the two log file events reported the following in the Foreground Wait Events and Background Wait Events sections of the AWR report:

Foreground / Background Wait Events (excerpt)
-> s  - second, ms - millisecond -    1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0

                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         659,157     0        526       1      0.2   18.2

It is generally always worth looking at the breakdown of these times, as they are averages. We really want to understand the entire histogram of these wait events. For that we can look at the Wait Event Histogram section of the AWR report as below.

Wait Event Histogram
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was < .05%.  Value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits   <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write    661.6K  84.7   9.7   4.7    .4    .1    .3    .0
log file sync              3138.K  14.0  42.8  30.4   7.9   2.3   2.6    .1

Wait Event Histogram Detail (64 msec to 2 sec)
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> Units for % of Total Waits:
   ms is milliseconds
   s is 1024 milliseconds (approximately 1 second)
-> % of Total Waits: total waits for all wait classes, including Idle
-> % of Total Waits: value of .0 indicates value was < .05%.  Value of null is truly 0
-> Ordered by Event (only non-idle events are displayed)

                                                 % of Total Waits
                                 -----------------------------------------------
                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write       52 100.0    .0    .0
log file sync               3535  99.9    .1    .0

One thing that you should notice here is there are two sections of Wait Event Histogram; the buckets less than 32ms and buckets greater than 32ms. It is also important to note that only the absence of a value means no timings fell into that bucket — so even though the report shows .0 there are still events in that bucket (read the section description as it mentions this).

We can see from the second histogram section that there were 52 times that log file parallel write was over 64ms as well as 3535 times log file sync was over 64ms. At this point a hypothesis is formed that the two events are correlated — that is, the belief is the long log file parallel write events may be causing the long log file sync. To find data that supports the hypothesis (or not) we can look at the Active Session History (ASH) data to get a more granular view of the wait events.

Diving Into The ASHes With R

If the hypothesis is correct, the ASH data should show times where we observe long log file parallel write and long log file sync waits. One could write SQL against V$ACTIVE_SESSION_HISTORY to collect all the samples that may demonstrate the hypothesis — e.g. collect a list of sample ids for each event and examine them for overlap, but a visualization is worth a thousand words.
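
For illustration, here is a minimal sketch of such a query; the columns are the documented V$ACTIVE_SESSION_HISTORY ones, and the five-minute window is just a placeholder choice:

-- pull the recent ASH samples for the two wait events so their sample ids
-- can be compared for overlap (TIME_WAITED is in microseconds)
select sample_id
     , sample_time
     , session_id
     , event
     , time_waited
from   v$active_session_history
where  event in ('log file sync', 'log file parallel write')
and    session_state = 'WAITING'
and    sample_time > systimestamp - interval '5' minute
order by sample_time, event;

Sample ids that show up for both events at (or around) the same time are the ones that would support the hypothesis.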

One way to get quick and easy visualizations from ASH data is to use R. R has become a very popular tool for those doing statistical analysis and it has some quite useful graphing and plotting packages built in. R can connect to Oracle via a JDBC package which makes importing data trivial.

Here is a plot that I put together using R for ~300 sample ids (~5 minutes) from ASH (recall that TIME_WAITED is in microseconds):

Rplot01

As you can see from the plots, nearly every time there is a long log file parallel write there are also numerous long log file sync events. This data supports the hypothesis.

Averages Suck

Most any statistician (or anyone keen on numbers and stats) will tell you averages suck. This does not mean averages are not useful, but one has to keep in mind averages can hide infrequent outliers (which may make them a tad bit evil). For instance, AWR is reporting an average of 1ms for log file parallel write but we can see that there are several that are falling in the 20ms range in the 5 minute capture. If we zoom in on the graph (set the y axis at a max of 3ms), we can understand why:
Rplot02

Most of the log file parallel write events are coming in around 0.5ms so even with some 20ms outliers it still yields an average of 1ms, thus hiding the magnitude of impact for the outliers. This is why drilling down into the ASH data was important for us to understand the scope of the issue.
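
(As a made-up illustration of that arithmetic: 990 waits at 0.5ms plus 10 waits at 50ms average out to (990 * 0.5 + 10 * 50) / 1000 = 0.995ms, roughly 1ms, so the ten slow writes all but vanish in the mean.)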

With More Data Comes More Questions

At this point, the visualization of ASH data shows a strong correlation between log file sync and log file parallel write outliers, but the root cause has not yet been identified. Perhaps there is more insight lurking in the ASH data?

Given this is an OLTP workload and log file parallel write is an IO operation, perhaps it’s worth looking at another IO operation like, say, db file sequential read – single block IO times.

In the below plot, I’ve taken a 60 second window of ASH data and plotted all the db file sequential read events.

Rplot11
Interestingly, there are several samples where the IOs are taking significantly longer than most — the majority are below 10ms but we see a few samples where there are groups in the 20ms or more range. Let’s add the log file sync events to the plot.
Rplot12
Hmmm… even more interesting. The data is showing a correlation between log file sync and db file sequential read. Any bets on what the plot looks like if log file parallel write is added?

Rplot13

Very interesting. The data is showing us that all 3 events are correlated strongly. Clearly we are on to something here…and using R to plot the ASH data was an easy way to present and understand it visually.

Summary

While this blog post did not go into specific root cause, it was a healthy journey in debugging and data driven analysis. Keep in mind that just because AWR averages look good, don’t overlook the fact that the event histograms should also be reviewed to see if there are outliers. R allows an easy way to put scatter plots of event times together to get a nice visual of what is going on.

Source Code

Here is the code I used to do the analysis. It’s pretty straightforward and well commented (I think). Enjoy!

I’ll also mention that I use the RStudio IDE for R.

HOWTO: Using the Oracle XMLDB Repository to Automatically Shred Windows Office Documents (Part 1)

People who attended the UKOUG presentation this year, where Mark Drake, Sr. Product Manager XML Technologies / XMLDB, Oracle HQ, and I demonstrated the first principles of the XDB Repository, might have been impressed with its (GEO/KML Spatial, Image EXIF info) capabilities combined with Google Earth. This post will zoom in on how to…

Read More…

VirtualBox 4.1.8 Released…

VirtualBox 4.1.8 is now available!

Happy upgrading… :)

Cheers

Tim…




Sherlock Holmes: A Game of Shadows…

Sherlock Holmes: A Game of Shadows is a worthy sequel to the previous film. I read a couple of reviews saying the plot was too complicated. This surprises me because I would have thought most people who saw the first film would understand that you are not meant to be able to fathom the clues. The whole point is that he is so amazing he can string a bunch of trivial things together and come up with some outlandish conclusion. The more ridiculous the better as far as I’m concerned.

This film is even more camp than the previous outing. There is definitely some bromance going on between Holmes and Watson. What with Holmes cross-dressing and his brother (Stephen Fry) calling him Shirley, you get some totally Mr Humphries moments. The onscreen chemistry between Jude Law and Robert Downey Jr. is so good I imagine they would be a riot if you went down the pub with them.

A special note must be made of Stephen Fry’s hair in one of the scenes. It is all curled up at the ends and is truly a sight to behold. I was both cringing and laughing all at the same time. Perfect casting if you ask me.

If you loved the first one you’ll love this one. Great holiday fun.

Cheers

Tim…




Return Of The Mack…

On my final day in Bulgaria I managed to leave my coat, hat and gloves in a restaurant about 2 hours away from Sofia. I realized when we were about 30 minutes from Sofia, but thought there was no point saying anything because we didn’t have time to go back and get them. When we arrived at the hotel someone else noticed I wasn’t wearing a coat and I explained my foolishness and thought no more of it.

A few days after I got home I got an email from Milena Gerova to say the coat had been located and she was going to post it back to me. A man in a van delivered it this morning, hence the title of the post, “Return of the mack”. Thanks very much Milena. :)

If you came here hoping to see a reference to the 1996 Mark Morrison song called “Return Of The Mack“, there it is. If you were hoping for a reference to the recent Mann: The Mack song, you have that too. :)

Cheers

Tim…