Oakies Blog Aggregator

IOT Part 6 – Inserts and Updates Slowed Down (part A)

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
…………>IOT6(B) – OLTP Inserts

A negative impact of using Index Organized Tables is that inserts and updates can be significantly slowed down. This post covers the former and the reasons why – and the need to always run tests on a suitable system. (I’m ignoring deletes for now – many systems never actually delete data and I plan to cover IOTs and deletes later.)

Using an IOT can slow down inserts by something like 100% to 1000%. If the insert of data into the table is only part of a load process, this might result in a much smaller overall impact on the load, such as 25%. I’m going to highlight a few of the important factors contributing to this wide spread of impact below.

If you think about it for a moment, you can appreciate there is a performance impact on data creation and modification with IOTs. When you create a new record in a normal table it gets inserted at the end of the table (or perhaps in a block marked as having space). There is no juggling of other data.
With an IOT, the correct point in the index has to be found and the row has to be inserted at the right point. This takes more “work”. Inserting the new record may also cause an index block to split, with the extra work that entails. Similar extra work has to be carried out if you make updates to data that cause the record to move within the IOT.
Remember, though, that an IOT is almost certainly replacing an index on the heap table which, unless you are removing indexes before loading data and recreating them after, would have to be maintained when inserting into the Heap table. So some of the “overhead” of the IOT would still occur for the heap table in maintaining the Primary Key index. Comparing inserts or updates between a heap table with no indexes and an IOT is not a fair test.
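To make the comparison concrete, here is a minimal sketch of the two structures being compared (table and column names are mine, not from the real test scripts). The point is that the heap table still carries a Primary Key index, so the IOT replaces that index rather than adding wholly new work:

-- Heap table plus its Primary Key index
create table orders_heap
( order_date  date          not null
, order_id    number        not null
, payload     varchar2(100)
, constraint orders_heap_pk primary key (order_date, order_id)
);

-- The equivalent IOT: the Primary Key index IS the table
create table orders_iot
( order_date  date          not null
, order_id    number        not null
, payload     varchar2(100)
, constraint orders_iot_pk primary key (order_date, order_id)
) organization index;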

For most database applications data is generally written once, modified occasionally and read many times – so the impact an IOT has on insert/update is often acceptable. However, to make that judgement call you need to know

  • what the update activity is on the data you are thinking of putting into an IOT
  • the magnitude of the impact on insert and update for your system
  • the ratio of read to write.

There is probably little point putting data into an IOT if you constantly update the primary key values (NB see IOT-5 as to why an IOT’s PK columns might not be part of a true Primary Key), or populate previously empty columns, or hardly ever read the data.

There is also no point in using an IOT if you cannot load the data fast enough to support the business need. I regularly encounter situations where people have tested the response of a system once populated but fail to test the performance of population.

Now to get down to the details. If you remember the previous posts in this thread (I know, it has been a while) then you will remember that I create three “tables” with the same columns. One is a normal heap table, one is an Index Organized Table and one is a partitioned Index Organized Table, partitioned into four monthly partitions. All tables have two indexes on them, the Primary Key index (which is the table in the case of the IOTs) and another, roughly similar index, pre-created on the table. I then populate the tables with one million records each.
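The real creation scripts are in IOT-5, but as a rough sketch of the shape of the test (names, columns and dates here are illustrative only), the partitioned IOT looks something like this, populated day by day via a single insert … select:

-- Partitioned IOT: the PK index is the table, range-partitioned by date
create table test_iot_p
( cre_date  date          not null
, pkey_id   number        not null
, payload   varchar2(100)
, constraint test_iot_p_pk primary key (cre_date, pkey_id)
) organization index
partition by range (cre_date)
( partition p_m1 values less than (to_date('01-02-2011','DD-MM-YYYY'))
, partition p_m2 values less than (to_date('01-03-2011','DD-MM-YYYY'))
, partition p_m3 values less than (to_date('01-04-2011','DD-MM-YYYY'))
, partition p_m4 values less than (to_date('01-05-2011','DD-MM-YYYY'))
);

-- The second, roughly similar, index pre-created on the table
create index test_iot_p_idx on test_iot_p (pkey_id, cre_date);

-- Populate with 1 million rows, roughly day by day across the four months
insert into test_iot_p
select to_date('01-01-2011','DD-MM-YYYY') + trunc((rownum-1)/8500)
     , rownum
     , rpad('x', 100)
from   dual
connect by level <= 1000000;
commit;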

These are the times, in seconds, to create 1 million records in the HEAP and IOT tables:

                  Time in Seconds
Object type         Run_Normal
------------------  ----------
Normal Heap table        171.9  
IOT table               1483.8

This is the average of three runs to ensure the times were consistent. I am using Oracle V11.1 on a machine with an Intel Core 2 Duo T7500 at 2.2GHz, 2GB memory and a standard 250GB 5000RPM disk. The SGA is 256MB and Oracle has allocated around 100MB-120MB to the buffer cache.
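If you want to check what your own instance has given to the buffer cache, a quick query against v$sga_dynamic_components shows the current allocations (a sketch; the component names vary slightly if KEEP or RECYCLE pools are in use):

select component
,      round(current_size/1024/1024) as size_mb
from   v$sga_dynamic_components
where  component like '%buffer cache%'
or     component = 'shared pool';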

We can see that inserting the 1 million rows into the IOT takes 860% of the time it does with a heap table. That is a significant impact on speed. We now know how large the impact of using an IOT is on inserts, and presumably it’s all to do with juggling the index blocks. Or do we?

This proof-of-concept (POC) on my laptop {which you can also run on your own machine at home} did not match a proof-of-concept I did for a client. That was done on V10.2.0.3 on AIX, on a machine with 2 dual-core CPUs with hyper-threading (so 8 virtual cores), a 2GB SGA and approx 1.5GB buffer cache, with enterprise-level storage somewhere in the bowels of the server room. The results on that machine to create a similar number of records were:

                  Time in Seconds
Object type         Run_Normal
------------------  ----------
Normal Heap table        152.0  
IOT table                205.9

In this case the IOT inserts required 135% of the time of the Heap table. This was consistent with other tests I did with a more complex indexing strategy in place, where the IOT overhead was around 25-35%. I can’t go into too much more detail as the information belongs to the client, but the data creation was more complex and so the actual inserts were only part of the process – this is how it normally is in real life. Even so, the difference in overhead between my local-machine POC and the client hardware POC is significant, which highlights the impact your platform can have on your testing.

So where does that leave us? What is the true usual overhead? Below are my fuller results from the laptop POC.

                        Time in Seconds
Object type         Run_Normal    Run_quiet    Run_wrong_p
------------------  ----------    ---------    -----------
Normal Heap table        171.9        81.83         188.27  
IOT table               1483.8      1055.35        1442.82
Partitioned IOT          341.1       267.83         841.22 

Note that with the partitioned IOT the creation took 341 seconds, a performance ratio to the heap table of only 198% – much better than the normal IOT. Hopefully you are wondering why!

I’m running this test on a Windows laptop and other things are going on. The timings for Run_Quiet are where I took steps to shut down all non-essential services and applications. This yielded a significant improvement for all three object types, but the biggest impact was on the already-fastest Heap table.

The final set of figures is for a “mistake”. I created the partitions wrongly, such that half the data went into one partition, the rest into another and a tiny fraction into a third, rather than being spread evenly over 4 partitions. You can see that the Heap and normal IOT times are very similar to the Run_Normal results (as you would expect, as these tests are the same) but for the partitioned IOT the time taken is half way towards the IOT figure.

We need to dig into what is going on a little further to see where the effort is being spent, and it turns out to be very interesting. During my proof-of-concept on the laptop I grabbed the information from v$sesstat for the session before and after each object creation so I could get the figures just for the loads. I then compared the stats between each object population and show some of them below {IOT_P means Partitioned IOT}.
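I have not included the snapshot scripts, but the idea is simple enough that a sketch will do. This version uses v$mystat for the current session rather than v$sesstat, and the snapshot table name is a placeholder:

-- Before the load, snapshot the session statistics
create table stat_before as
select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  ms.statistic# = sn.statistic#;

-- ... run the insert into the heap / IOT / partitioned IOT ...

-- Afterwards, report what changed during the load
select sn.name, ms.value - b.value as delta
from   v$mystat ms, v$statname sn, stat_before b
where  ms.statistic# = sn.statistic#
and    sn.name       = b.name
and    ms.value - b.value > 0
order  by delta desc;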

STAT_NAME                                  Heap           IOT        IOT_P
------------------------------------  ---------  ------------  -----------
CPU used by this session                  5,716         7,222        6,241
DB time                                  17,311       148,866       34,120
Heap Segment Array Inserts               25,538            10           10

branch node splits                           25            76           65
leaf node 90-10 splits                      752         1,463        1,466
leaf node splits                          8,127        24,870       28,841

consistent gets                          57,655       129,717      150,835
cleanout - number of ktugct calls        32,437        75,201       88,701
enqueue requests                         10,936        28,550       33,265

file io wait time                     4,652,146 1,395,970,993  225,511,491
session logical reads                 6,065,365     6,422,071    6,430,281
physical read IO requests                   123        81,458        3,068
physical read bytes                   2,097,152   668,491,776   25,133,056
user I/O wait time                          454       139,585       22,253
hot buffers moved to head of LRU         13,077       198,214       48,915
free buffer requested                    64,887       179,653      117,316

The first section shows that all three used similar amounts of CPU, the IOT and partitioned IOT being a little higher. Much of the CPU consumed was probably in generating the fake data. The DB Time of course pretty much matches the elapsed time, as the DB was doing little else.
It is interesting to see that the Heap insert uses array inserts, which of course are not available to the IOT and IOT_P as the data has to be inserted in order. {I think Oracle inserts the data into the heap table as an array and then updates the indexes for all the entries in the array – and I am only getting this array processing because I create the data as an array via an “insert into … select” type load. But don’t hold me to any of that}.

In all three cases there are two indexes being maintained, but in the case of the IOT and IOT_P the primary key index holds the whole row. This means there is more information per key, fewer keys per block and thus more blocks to hold the same data {and more branch blocks to reference them all}. So more block splits will be needed. The second section shows this increase in branch node and leaf block splits. Double the branch blocks and triple the leaf block splits. This is probably the extra work you would expect for an IOT. Why are there more leaf block splits for the partitioned IOT? The same volume of data ends up taking more blocks in the partitioned IOT – 200MB for the IOT_P in four partitions of 40-60MB as opposed to a single 170MB segment for the IOT. The larger overall size of the partitioned IOT is just due to a small overhead incurred by using partitions and also a touch of random fluctuation.

So for the IOT and IOT_P there is about three times the index-specific work being done, and a similar increase in related statistics such as enqueues – though not quite three times, as it is not just index processing that contributes to those other statistics. However, the elapsed time is much more than three times as long. Also, the IOT_P is doing more index work than the IOT but its elapsed time is less. Why?

The fourth section shows why. Look at the file io wait times. This is the total time spent waiting on IO {in millionths of a second} and it is significantly elevated for the IOT and to a lesser degree for the IOT_P. Physical IO is generally responsible for the vast majority of time in any computer system where it has not been completely avoided.
Session logical reads are only slightly elevated, almost negligibly so, but the number of physical reads to support them increases from 123 for the Heap table insert to 81,458 for the IOT and 3,068 for the IOT_P. A clue as to why comes from the “hot buffers moved to head of LRU” and “free buffer requested” statistics. There is a lot more activity in moving blocks around in the buffer cache for the IOT and IOT_P.

Basically, for the IOT, all the blocks in the primary key segment are constantly being updated but eventually they won’t all fit in the block buffer cache – remember I said the IOT is eventually 170MB and the buffer cache on my laptop is about 100MB – so they are flushed down to disk and then have to be read back when altered again. This is less of a problem for the IOT_P as only one partition is being worked on at a time (the IOT_P is partitioned on date and the data is created day by day) and so more of it (pretty much all) will stay in memory between alterations. The largest partition only grows to 60MB and so can be worked on in memory.
For the heap, the table is simply appended to and only the indexes have to be constantly updated and they are small enough to stay in the block buffer cache as they are worked on.
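A quick look at the segment sizes makes the point. Remember that an IOT’s data lives in its Primary Key index segment, so that is the segment to compare against the buffer cache (a sketch, run as the owner of the test tables):

select segment_name
,      segment_type
,      round(bytes/1024/1024) as size_mb
from   user_segments
order  by bytes desc;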

This is why when I got my partitioning “wrong” the load took so much longer. More physical IO was needed, as the larger partition would not fit into the cache as it was worked on. A quick check shows that logical reads, and in fact almost all other statistics, were very similar, but 26,000 IO requests were made (compared to 81,458 for the IOT and 3,068 for the correct IOT_P).

Of course, I set my SGA size, and thus the buffer cache, to highlight the issue on my laptop, and I have to say even I was surprised by the magnitude of the impact. On the enterprise-level system I did my client’s proof of concept on, the impact on insert was less because the buffer cache could hold the whole working set, I suspect the SAN had a considerable cache on it, there was ample CPU resource to cope with the added latching effort, and the time taken to actually create the data was a significant part of the workload, reducing the overall impact of the slowness caused by the IOT.

{Update: in this little update I increase my block buffer cache and show that physical IO plummets and the IOT insert performance improves dramatically}.

This demonstrates that a POC, especially one for what will become a real system, has to be a realistic volume on realistic hardware.
For my client’s POC, I still had to bear in mind the eventual size of the live working set and the probable size of the live block buffer cache, and make some educated guesses.

It also explains why my “run_quiet” timings showed a greater benefit for the heap table than the IOT and IOT_P. A Windows machine has lots of fairly pointless things running that take up CPU and a bit of memory, but not much IO. I reduced the CPU load and that benefits activity that is not IO-bound, so it has more impact on the heap table load. Much of the time for the IOT and IOT_P is spent hammering the disk, and that just takes time.

So, in summary:

  • Using an IOT increases the index block splitting and, in turn, enqueues and general workload. The increase is in proportion to the size of the IOT compared to the size of the replaced PK.
  • The performance degradation across the whole load process may well be less than 50%, but the only way to really find out is to test.
  • You may lose the array processing load that may benefit a heap table load if you do the load via an intermediate table.
  • With an IOT you may run into issues with physical IO if the segment (or part of the segment) you are loading into cannot fit into the buffer cache (This may be an important consideration for partitioning or ordering of the data loaded)
  • If you do a proof of concept, do it on a system that is as similar to the real one as you can
  • Just seeing the elapsed time difference between tests is sometimes not enough. You need to find out where that extra time is being spent.

I’ve thrown an awful lot at you in this one post, so I think I will stop there. I’ve not added the scripts to create the test tables here; they are in IOT-5 {lacking only the grabbing of the v$sesstat information}.

Flash Is Fast! Provisioning Flash For Oracle Database Redo Logging? EMC F.A.S.T. Is Flash And Fast But Leaves Redo Where It Belongs.

Guy Harrison has been blogging his findings regarding solid state disk testing/performance with Oracle. Guy’s tests and reports are very thorough. This is a good body of work. The links to follow are at the bottom of this post.

Before reading, however, please consider the following thoughts about solid state disk as pertaining to Oracle I/O performance and flash:

  1. Oracle Database Smart Flash Cache (DSFC) is flash storage with libC/libaio physical I/O to “augment” the SGA. When using DSFC you will sustain DBWR writes even if your application only uses the SELECT statement. Few people are aware of this fact. The real SGA buffer pool becomes “L1” cache and DBWR spills clean blocks to the L2 (DSFC), where subsequent logical I/O (cache buffers chain hits) will actually require a physical read from flash. Sessions cannot access buffered data in the DSFC. Blocks have to first be read from flash into the SGA before the session can get along with its business. I have never seen DSFC work effectively. I have, on the other hand, seen a whitepaper showing read-intensive “oltp” serviced by an SGA that is much smaller than available RAM on the system but augmented by flash. However, the paper is really just showing that flash reads are better than reads from an under-configured hard disk farm. I’ll blog about that paper soon.
  2. Database Smart Flash Cache, really? Don’t augment DRAM until you’ve maxed out DRAM. Do I honestly aim to assert that augmenting nanosecond operations with millisecond operations is non-optimal? Yes, that is my assertion.
  3. DBWR spills to DSFC aren’t charged to sessions, but such activity does take system bandwidth.
  4. If you find a real-life workload that benefits from DSFC please let me know.
  5. Redo Log physical I/O is well-suited to round, brown spinning disks. Just don’t melt the disks with DBWR random writes and expect the same disks to favor the occasional large sequential writes issued by LGWR. Watch your log file parallel write (LFPW) wait events as a component of log file sync (LFS) and you’ll usually find that LFS is a CPU problem, not a LFPW problem. Read this popular post for more on that matter.
  6. Don’t expect much performance increase, in an Exadata environment, from the new “Exadata Smart Flash Log” feature. It may indeed smooth out LFPW service times, and that is a good thing, but the main benefit Smart Flash Log delivers to Exadata customers is relief from the HDD controller starvation (not to mention the imbalance of processors to spindles) that happens in Exadata when DBWR and LGWR are simultaneously demanding IOPS from the limited number of spindles in Exadata standard configurations. Remember, Exadata’s design center is bandwidth, not write IOPS. A full rack X2 Exadata configuration can only sustain on the order of 25,000 mirrored random writes per second. The closer one pushes Exadata to that 25K IOPS figure, the more trouble LGWR has with log flushes. That is, Exadata Smart Flash Log (ESFL) is really just a way to flow LGWR traffic through different adaptors (PCI Flash). In fact, simply plugging in another LSI HDD controller with a few SATA drives dedicated to redo streaming writes would actually do quite well, if not as well as ESFL. That is a topic for a different post.
  7. EMC Fully Automated Storage Tiering (FAST) does not muddle around with redo because redo does really well with spinning disks. Hard disk drives do just fine with large sequential writes.
  8. Oracle Database 11g Release 2 added the ability to specify the block size for a redo log. If you do feel compelled to flush redo to solid state I recommend you crack open the documentation for the BLOCKSIZE syntax and add some 4K blocking-factor logs. I made this point on Guy’s blog comment section. I’m not sure if his tests were 4K block size or if they were flushing redo with 512-byte alignment (which flash really doesn’t favor).
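For reference, the 11.2 BLOCKSIZE syntax mentioned in point 8 looks something like the sketch below. The group number, file path and size are placeholders, and note that on storage reporting 512-byte sectors Oracle may reject a 4K redo block size unless you take extra steps:

ALTER DATABASE ADD LOGFILE GROUP 5
  ('/u01/oradata/ORCL/redo05a.log') SIZE 1024M BLOCKSIZE 4096;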

And now references to Guy Harrison’s posts:

http://guyharrison.squarespace.com/blog/2011/10/27/using-flash-disk-for-redo-on-exadata.html

http://guyharrison.squarespace.com/ssdguide/04-evaluating-the-options-for-exploiting-ssd.html

Filed under: oracle

W-ASH : Web Enabled ASH

I’m excited about the ease of creating rich, web-enabled user applications given the state of technology now. JavaScript and jQuery have gone from being disdained as “not very serious” languages to moving front and center into the limelight.

Here is a small example.

Download the following file:  W-ASH (web enabled ASH, file is wash.tar.gz )

Go to your apache web server root, which in my case on Red Hat Linux is

# cd /usr/local/apache2
# gzip -d wash.tar.gz
# tar xvf wash.tar
-rwxr-xr-x  21956  14:08:21 cash.sh
-rw-r--r--  30881  11:52:10 htdocs/ash.html
drwxr-xr-x      0  15:40:52 htdocs/js/
-rwxr-xr-x  10958  14:04:42 cgi-bin/json_ash.sh

(the directory htdocs/js has a number of files in it from Highcharts. I edited them out of the listing above to make the output cleaner)

There are 3 basic files

  1. cash.sh  – collects ASH-like data from Oracle into a flat file; it runs in a continual loop (a sketch of the kind of query it runs follows this list)
  2. ash.html  – basic web page using Highcharts
  3. json_ash.sh – cgi script to read the ASH-like data and give it to the web page via JSON
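To give a flavour of the kind of ASH-like sample cash.sh gathers, something along these lines polls v$session for active user sessions every few seconds and appends the rows to the day’s flat file. This is just my sketch of such a query, not the actual script:

select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') as sample_time
,      s.sid
,      s.username
,      s.sql_id
,      s.event
,      s.wait_class
from   v$session s
where  s.status = 'ACTIVE'
and    s.type   = 'USER';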

Now you are almost ready to go. You just need to start the data collection with “cash.sh”  (collect ASH)

./cash.sh
Usage: usage    [sid] [port]

The script “cash.sh” requires only that “sqlplus” be in the path. It’s probably easiest to

  • move/copy cash.sh to an ORACLE_HOME/bin
  • su oracle
  • kick it off as in:
nohup cash.sh system change_on_install 172.16.100.250 orcl &

The script “cash.sh” will create a directory /tmp/MONITOR/day_of_the_week for each day of the week, clearing out any old files, so there is a maximum of 7 days of data. (To stop the collection, run “rm /tmp/MONITOR/clean/*end”.)

To view the data, go to your web server address and add “ash.html?q=machine:sid”.
For example my web server is on 172.16.100.250
The database I am monitoring is on host 172.16.100.250 with Oracle SID “orcl”

http://172.16.100.250/ash.html?q=172.16.100.250:orcl

 


See video at : http://screencast.com/t/sZrFxZkTrmn

UltraEdit 2.3 for Mac Available…

UltraEdit 2.3 for Mac is now available. You can get the download here. The change log is here.

Previous releases have coincided with the Linux release. Let’s hope 2.3 for Linux will come soon!

Cheers

Tim…




I really liked this one...

I answer lots of questions about Oracle.  Many of them are ambiguous, unclear, not fully specified, hard to follow.  It is like unraveling a puzzle sometimes (ok, most of the times...)

That is why I found this question to be really amusing - especially after reading the comments.  Lots of disagreement as to the meaning of the question!

So, what do you think the answer is... I'll post what I think after I see some feedback.  It is a very very interesting question.

And it also points to why I think the answer to all questions is mostly "Why" or "It depends".  We usually need a lot more information to answer what might appear to be simple questions.

Apologies

.. both to my readers, and to those waiting for the latest (and late) book.

I’ve been so busy for the last three weeks that I’ve had virtually no time for any serious blogging, or even for answering existing comments. Apart from checking and returning the proofs for the book as fast as Apress sends them to me, I’ve also been busy travelling and doing “proper” work – and I now find that it’s been nearly three weeks since I last published anything.

In the interim my most interesting trip has been to Tokyo, where I gave a couple of presentations at the Insight Out conference. As Debra Lilley has already commented, the conference was very well organised, the audience attentive, and the hospitality of our hosts astounding. The English language presentations were subject to simultaneous translation into Japanese, and a fair proportion of the audience were wearing headphones so they could follow the translation. I’ve done the same type of thing at Italian and Spanish events, keeping an ear open for the whisper of the translation from the headphones of a member of the audience, but, as one of the translators explained to me, not only does it usually take more syllables to say something in Japanese than it does in English, but Japanese word order means you can’t start translating until the speaker reaches the object of the verb. This makes it particularly important to speak slowly enough to allow the interpreter to keep up … and it’s surprisingly hard work listening to yourself to make sure you don’t speed up. (My admiration for people who do simultaneous translation is unbounded – and I fully appreciate the need for them to work in pairs and switch every 10 minutes.)

Normal service will be resumed as soon as possible – starting with a catch-up on the outstanding comments; in the meantime here is a cartoon about String Theory that made me smile a little while ago.

 

Oracle ASS - one time it will save yours !

For those who don't know Oracle ASS - it's an awk script that is hidden in Oracle's LTOM (aka Lite Onboard Monitoring).
You can download it from metalink - see note ID 352363.1.
It's an awk script that formats system state dumps. I now and then use it to format massive trace files to something I can actually use.
A couple of months ago I had to investigate why a 2 node RAC on windows kept on crashing.
The dba managing it didn't really bother to check the system state dumps - instead they just opened a call with Oracle and uploaded the trace files.
As usual, the dba blamed the developers - 'it's the app - it's the app' - and I had to defuse the situation, as the developers couldn't see anything wrong and the relationship was starting to turn sour.
I wasn't allowed access to the system but I was able to have a look at the trace files - or, as the dba called them, 'a very complex logfile'.
Fair enough - the system state dump was several megabytes large, but luckily I had Oracle ASS ;-)
Within minutes I kind of figured out what the problem was. 
Just looking at some of the wait events from the output below:
SGA: allocation forcing component growth
cursor: pin S wait on X
row cache lock
This looks like an SGA resize operation, at which point library cache locks arise.
An SGA resize operation can hang the database for brief moments of time, especially when the shared pool needs to be resized.
However, in this case the instance was crashing.
With no access to the system I told the dba to have a look at v$sga_resize_ops. If there are excessive resize operations, the SGA might simply be sized too small.
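Something along these lines shows whether the instance is constantly shuffling memory between SGA components (just a sketch - v$sga_resize_ops is available from 10g onwards):

select component
,      oper_type
,      initial_size
,      final_size
,      status
,      to_char(start_time, 'DD-MON HH24:MI:SS') as started
from   v$sga_resize_ops
order  by start_time;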
I also got hold of the alert logs - as the instances were frequently crashing, I could see the startups in the alert.log, showing me the SGA settings.
And I kid you not - these 2 nodes were configured with SGA_TARGET=256M !!! I was surprised this thing even started!
Bumping up sga_target basically solved all issues.
This puppy had been running in production for a month, had lots of issues, and yet the dba hadn't even bothered to check the most basic settings.
Instead, they just blamed the app, opened a call with Oracle, got some general advice and were done with it. Amazing...
awk -f ass109.awk tracefile.trc > out
Starting Systemstate 1
..............................................................................
.................................................................
Ass.Awk Version 1.0.9 - Processing xyz_ora_4948.trc
System State 1
~~~~~~~~~~~~~~~~
1:                                      
2:  waiting for 'pmon timer'            
3:  waiting for 'DIAG idle wait'        
4:  waiting for 'rdbms ipc message'     
5:  waiting for 'rdbms ipc message'    [Latch received-location:] 
6:  waiting for 'ges remote message'    
7:  waiting for 'gcs remote message'    
8:  waiting for 'gcs remote message'    
9:  waiting for 'gcs remote message'    
10: waiting for 'gcs remote message'    
11: waiting for 'gcs remote message'    
12: waiting for 'gcs remote message'    
13: last wait for 'rdbms ipc message'   
14: waiting for 'rdbms ipc message'     
15: waiting for 'rdbms ipc message'     
16: waiting for 'rdbms ipc message'     
17: waiting for 'rdbms ipc message'     
18: waiting for 'rdbms ipc message'     
19: waiting for 'SGA: allocation forcing component growth' 
20: waiting for 'enq: DR - contention' [Enqueue DR-00000000-00000000] 
21: waiting for 'SGA: allocation forcing component growth' 
22: waiting for 'SGA: allocation forcing component growth' 
23: waiting for 'SGA: allocation forcing component growth' 
24: waiting for 'rdbms ipc message'     
25: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
26: waiting for 'ASM background timer'  
27: waiting for 'rdbms ipc message'     
28: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
29: waiting for 'rdbms ipc message'     
30: waiting for 'rdbms ipc message'     
31: waiting for 'Streams AQ: qmn coordinator idle wait' 
32: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
33: waiting for 'PX Deq: Parse Reply'   
34: waiting for 'SQL*Net message from client' 
35: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
36: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
37: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
38: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
39: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
40: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
41: waiting for 'Streams AQ: qmn slave idle wait' 
42: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
43: for 'Streams AQ: waiting for time management or cleanup tasks' 
44: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
45: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
46: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
47: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
48: waiting for 'enq: PS - contention' [Enqueue PS-00000002-00000E0C] 
     Cmd: Select
49: waiting for 'class slave wait'      
50: waiting for 'SGA: allocation forcing component growth' 
     Cmd: PL/SQL Execute
51:                                     
52: waiting for 'PX Deq: Parse Reply'   
53: waiting for 'SGA: allocation forcing component growth' 
     Cmd: PL/SQL Execute
54: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
55: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
56: waiting for 'PX Deq: Parse Reply'   
57: waiting for 'SGA: allocation forcing component growth' 
58: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
59: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
60: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
61: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
62: waiting for 'PX Deq: Parse Reply'   
63: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
64: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
65: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
66: waiting for 'PX Deq: Parse Reply'   
67: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
68: waiting for 'SGA: allocation forcing component growth' 
69: last wait for 'ksdxexeotherwait'   [Rcache object=000007FF88BFD200,] 
70: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
71: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
72: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
73: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
74: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
75: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
76: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
77: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
78: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
79: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
80: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
81: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
82: waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
83: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
84: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
85: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
86: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
87: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
88: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
89: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
90: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
91: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
92: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
93: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
94: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
95: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
96: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
97: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
98: waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
99: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
100:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
101:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
102:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
103:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
104:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
105:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
106:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
107:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
108:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
109:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
110:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
111:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
112:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
113:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
114:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
115:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
116:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
117:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
118:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
119:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
120:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
121:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
122:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
123:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
124:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
125:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
126:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
127:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
128:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
129:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
130:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
131:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
132:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
133:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
134:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
135:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
136:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
137:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
138:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
139:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
140:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
141:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
142:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
143:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
  systemstate.
                    Resource Holder State
    Latch received-location:    ??? Blocker
Enqueue DR-00000000-00000000    ??? Blocker
Rcache object=000007FF88BFD200,    57: waiting for 'SGA: allocation forcing component growth'
Enqueue PS-00000002-00000E0C    48: Self-Deadlock
Rcache object=000007FF88AC2658,    68: waiting for 'SGA: allocation forcing component growth'
Query Co-Ordinator to Query Slave Mapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
QC=  33:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0E)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  35:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0F)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0F)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  44: PS-01-0E0F   1   00000,         35:         44:  DTA,     FRE     STRE    0
QC=  42:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0D)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0E)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  46: PS-01-0E0E   1   00000,         42:         46:  DTA,     FRE     STRE    0
QC=  45:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E07)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E09)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  61: PS-01-0E09   1   00000,         45:         61:  DTA,     FRE     STRE    0
QC=  47:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0B)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0C)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  54: PS-01-0E0C   1   00000,         47:         54:  DTA,     FRE     STRE    0
QC=  48:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0C)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0D)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  52:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0A)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  55:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E08)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0A)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  58: PS-01-0E0A   1   00000,         55:         58:  DTA,     FRE     STRE    0
QC=  56:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E05)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  62:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E06)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  66:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E09)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
                         ------------------------
STATUS Key:
  DEQ = buffer has been dequeued
  EML = buffer on emergency message list
  ENQ = buffer has been enqueued
  FLST= buffer is on SGA freelist
  FRE = buffer is free (unused)
  GEB = buffer has been gotten for enqueuing
  GDB = dequeued buffer has been gotten 
  INV = buffer is invalid (non-existent)
  QUE = buffer on queue message list
  RCV = buffer has been received 
  NOFL= not on freelist (just removed)
                              ------------------------
Object Names
~~~~~~~~~~~~
Latch received-location:              last post received-location: kjmdrms
Enqueue DR-00000000-00000000                              
Rcache object=000007FF88BFD200,                              
Enqueue PS-00000002-00000E0C                              
Rcache object=000007FF88AC2658,                              
53697 Lines Processed.

Auto DOP And Direct-Path Inserts

This is just a short note about one of the potential side-effects of the new Auto Degree Of Parallelism (DOP) feature introduced in 11.2.

If you happen to have Parallel DML enabled in your session along with Auto DOP (and here I refer to the PARALLEL_DEGREE_POLICY = AUTO setting, not LIMITED) then it might take you by surprise that INSERT statements that are neither decorated with a parallel hint nor use any parallel enabled objects can be turned into direct-path inserts.
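As a quick sanity check before testing this yourself, you can confirm whether parallel DML really is enabled for your session. This is just a sketch; parallel DML is disabled by default and has to be enabled per session:

alter session enable parallel dml;

select pdml_status, pddl_status, pq_status
from   v$session
where  sid = sys_context('userenv', 'sid');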

Now don't get me wrong - I think this is reasonable and in line with the behaviour so far, because you have enabled parallel DML and Auto DOP is therefore eligible to make use of that feature. According to the documentation the default mode of parallel inserts is direct-path, so Auto DOP simply follows the documented behaviour when deciding to use parallel DML. Note that depending on the data volume to insert you can even end up with a serial direct-path insert combined with a parallel query part.

It is just that you need to be aware of the fact that a simple INSERT INTO ... SELECT FROM on serial objects might turn into a direct-path insert.

The main caveat to watch out for is that the direct-path insert won't re-use any space available in the existing blocks of the segment but will always allocate blocks above the current High Water Mark (HWM).

So if you use this feature along with some application logic that deletes rows from a segment then by enabling Auto DOP you might end up with an unreasonable segment growth that can have all kinds of nasty side effects.

Another side effect of this is more obvious: existing application logic might break because it attempts to re-access the object after the (now direct-path) insert within the same transaction, which will end up with an "ORA-12838: cannot read/modify an object after modifying it in parallel".

If you still want to make use of parallel DML but need to be able to re-use available space in existing blocks, you can explicitly specify the NOAPPEND hint, which still allows Auto DOP to be used but prevents the direct-path insert mode for both serial and parallel inserts - 11g introduced the parallel conventional insert, by the way.

Here is a small test case to demonstrate the behaviour:

set echo on linesize 200 pagesize 0 trimspool on tab off

drop table t;

purge table t;

create table t
as
select
rownum as id
, rpad('x', 100) as vc1
from
dual
connect by level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 'T')

alter session enable parallel dml;

alter session set parallel_degree_policy = manual;

insert into t select * from t where rownum <= 1000;

select * from table(dbms_xplan.display_cursor(null, null));

select count(*) from t;

commit;

alter session set parallel_degree_policy = auto;

insert into t select * from t where rownum <= 1000;

select * from table(dbms_xplan.display_cursor(null, null));

select count(*) from t;

commit;

insert /*+ noappend */ into t select * from t where rownum <= 1000;

select * from table(dbms_xplan.display_cursor(null, null));

select count(*) from t;

commit;

And here is the output I get from 11.2.0.1:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> drop table t;

Table dropped.

SQL>
SQL> purge table t;

Table purged.

SQL>
SQL> create table t
2 as
3 select
4 rownum as id
5 , rpad('x', 100) as vc1
6 from
7 dual
8 connect by level <= 1000000
9 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'T')

PL/SQL procedure successfully completed.

SQL>
SQL> alter session enable parallel dml;

Session altered.

SQL>
SQL> alter session set parallel_degree_policy = manual;

Session altered.

SQL>
SQL> insert into t select * from t where rownum <= 1000;

1000 rows created.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));
SQL_ID 95x60r5k6mhka, child number 0
-------------------------------------
insert into t select * from t where rownum <= 1000

Plan hash value: 508354683

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 4200 (100)| |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL | T | 1000K| 101M| 4200 (1)| 00:00:51 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM<=1000)

20 rows selected.

SQL>
SQL> select count(*) from t;
1001000

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> alter session set parallel_degree_policy = auto;

Session altered.

SQL>
SQL> insert into t select * from t where rownum <= 1000;

1000 rows created.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));
SQL_ID 95x60r5k6mhka, child number 2
-------------------------------------
insert into t select * from t where rownum <= 1000

Plan hash value: 482288532

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 2326 (100)| | | | |
| 1 | LOAD AS SELECT | | | | | | | | |
|* 2 | COUNT STOPKEY | | | | | | | | |
| 3 | PX COORDINATOR | | | | | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10000 | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | P->S | QC (RAND) |
|* 5 | COUNT STOPKEY | | | | | | Q1,00 | PCWC | |
| 6 | PX BLOCK ITERATOR | | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL| T | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM<=1000)
5 - filter(ROWNUM<=1000)
7 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 2

30 rows selected.

SQL>
SQL> select count(*) from t;
select count(*) from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> insert /*+ noappend */ into t select * from t where rownum <= 1000;

1000 rows created.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));
SQL_ID dv79tpggm6q4k, child number 1
-------------------------------------
insert /*+ noappend */ into t select * from t where rownum <= 1000

Plan hash value: 2717876046

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 2326 (100)| | | | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | | | | |
|* 2 | COUNT STOPKEY | | | | | | | | |
| 3 | PX COORDINATOR | | | | | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10000 | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | P->S | QC (RAND) |
|* 5 | COUNT STOPKEY | | | | | | Q1,00 | PCWC | |
| 6 | PX BLOCK ITERATOR | | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL | T | 1000K| 101M| 2326 (1)| 00:00:28 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM<=1000)
5 - filter(ROWNUM<=1000)
7 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 2

30 rows selected.

SQL>
SQL> select count(*) from t;
1003000

SQL>
SQL> commit;

Commit complete.

SQL>

Notice how the insert turns into a direct-path insert with Auto DOP and how the subsequent query fails.

As already mentioned, the automatic conversion to a direct-path insert with Auto DOP can only be seen when Parallel DML is enabled in the session.

Mystats utility

A variation on Jonathan Lewis's SNAP_MY_STATS package to report the resource consumption of a unit of work between two snapshots. Designed to work under constrained developer environments, this version has enhancements such as time model statistics and the option to report on specific statistics. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a free-standing SQL*Plus script (i.e. no installation/database objects needed). June 2007 (updated October 2011)

Runstats utility

A variation on Tom Kyte's invaluable RUNSTATS utility that compares the resource consumption of two alternative units of work. Designed to work under constrained developer environments and builds on the original with enhancements such as "pause and resume" functionality, time model statistics and the option to report on specific statistics. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a free-standing SQL*Plus script (i.e. no installation/database objects needed). January 2007 (updated October 2011)