Oakies Blog Aggregator

Catch-up

Looking back at the comments audit trail on wordpress I see that I’ve got nearly two months of comments waiting for answers – and some of those answers need serious thought that I haven’t had time to supply yet. But I’ve got a (fairly) free day on Thursday so I’ll see what I can do to bring the comments up to date.

I’ve just had a great day at the Trivadis CBO days – Maria Colgan (the optimizer lady) on the 20th anniversary of the CBO, then Joze Senegacnik on transformations, me on Strategies for Statistics in 11g, ending with Randolf Geist on Parallel Execution (including a reference to one of my older blog items which I now think could well be wrong – so I may have to spend Thursday reviewing it and looking at his analysis instead of working through the comments).

Update 13th Dec

Well, it’s taken me all day – but I seem to have caught up with the comment trail. I’ve probably missed some from the preceding weeks, though, so I’ll be trawling backwards for a little while yet.

I have had time, though, to do a little more work on the parallel query and hash join buffered blog – and it’s very interesting, and I’m definitely wrong, so I’m going to have to prioritise the job of correcting it.

AMIS Summarises Oracle OpenWorld 2012 in a Special Whitepaper

To round off the annual Oracle OpenWorld conference, AMIS is publishing a whitepaper: a handy document that bundles the complete story of Oracle OpenWorld 2012. A team from AMIS had a strong presence at the conference in October – as sponsor, attendee, networker and speaker, and as attentive listener and analyst. Seven Oracle specialists from AMIS

Read More...

Viewing Figures

It’s time for another landmark. The blog hit 3,000,000 views on 30th November – but I’ve been too busy to do my usual analysis of posts. The latest half million took 42 weeks (five weeks less than the previous half million increment) and here are the latest stats on popular views:

First, the top five across the lifetime of the blog (excluding the home page and opening menu) – the old favourites haven’t changed:

AWR / Statspack menu 57,477 Updated from time to time
NOT IN 40,952 Feb 2007
Cartesian Merge Join 33,367 Dec 2006
dbms_xplan in 10g 33,279 Dec 2006
SQL Profiles (10g) 25,219 Feb 2007

If we look at just the last year, the top five are:

AWR / Statspack menu 19,307 Updated from time to time
Oracle Core 9,996 Dec 2011
Shrink Tablespace 8,788 Feb 2010
AWR Reports 8,548 Feb 2011
NOT IN 8,026 Feb 2007

A couple of old favourites (dbms_xplan in 10g and SQL Profiles (10g)) have finally fallen off the top slots, and Cartesian Merge Join didn’t make it back after slipping last year – but it’s interesting (and not entirely surprising) to see how NOT IN is still coming up after so many years.

The other statistic of note is the number of followers the blog has, currently 1,197 (although it was 1,185 on 30th Nov – I probably picked up a few more immediately after the UKOUG annual conference).

 

Hash Join Buffered


Introduction

A few years ago Jonathan Lewis published a blog post that described one of the interesting side effects of Oracle's Parallel Execution implementation: sometimes operations that are usually non-blocking will be turned into blocking ones. Mostly these are represented by additional BUFFER SORT operations that show up in the parallel version of an execution plan from 10g on (pre-10g does the same internally but doesn't show it in the execution plan), but there is a special case: the HASH JOIN BUFFERED operation that gets used with the HASH data distribution of the join row sources.

Jonathan came to the conclusion that the HASH JOIN BUFFERED turns into a blocking operation by buffering the result set of the join. He based this on different trace files (the 10104 hash join trace and the 10046 extended trace) and on the fact that when the join produced no result (no matches between the two row sources) the obvious spill to disk no longer happened. He showed this using an example that simply joined two tables using Parallel Execution and returned the result of the join to the client.

Another interesting point of this example is that the BUFFERED operation takes place although it is not entirely obvious why, if you follow the explanation that "at most two Parallel Slave Sets can be active per Data Flow Operation" and hence sometimes both sets are busy and a third set would be required to receive the data produced by the other two. This is not the case with this simple example, however, as the data produced by the two sets simply needs to be returned to the client by sending it to the Query Coordinator.

While I prepared my "Parallel Execution" seminar I wanted to demonstrate this via a simple test case where two rather small tables are joined but produce a quite large result set. Since the HASH JOIN BUFFERED is supposed to buffer this large result set before returning it to the client, this should make this special behaviour quite obvious.

Test Case Results

However, the test case showed some interesting results that seemed to suggest that Jonathan's conclusions weren't entirely correct:

1. Although the 10104 hash join trace file reads "BUFFER(Compile) output of the join for Parallel Query" and "BUFFER(Execution) output of the join for PQ", it looks like the HASH JOIN BUFFERED operation in fact buffers the second row source rather than the result of the join operation. I'll demonstrate below why I believe this is so.

2. Although I could reproduce Jonathan's test case, in particular that no spill to disk takes place when the two row sources do not match, I believe that the point of the HASH JOIN BUFFERED is that it can take advantage of the fact that the hash table for the first row source is already built when the second row source is accessed. So in principle it looks like the operation only buffers data from the second row source that has a match in the hash table. Data that doesn't match isn't buffered - that's the special functionality of the HASH JOIN BUFFERED that makes it different from separate BUFFER SORT operations, which buffer unconditionally, and it explains why no obvious buffering takes place if the two row sources don't match.

3. When looking into the 10104 hash join trace file it becomes obvious that the spilling to disk of the odd 9th partition, as described in Jonathan's post, takes place before the actual probe phase seems to begin (kxhfSetPhase: phase=PROBE_2), which again suggests that it cannot be the result set that gets buffered, since that will only be produced once the probe phase begins.

4. The implementation restriction of Oracle's Parallel Execution that requires these additional, artificial blocking operations does not seem to be "at most two Parallel Slave Sets can be active at the same time"; more precisely it seems to be "at most one data distribution can be active at the same time". This includes the final data distribution to the Query Coordinator process in the case of queries, and it explains why the simple case of a two table join using HASH distribution results in a BUFFERED operation: the PX SEND HASH operation for the second row source would have to be active at the same time as the PX SEND QC operation returning data to the Query Coordinator, because the HASH JOIN by default is only blocking while consuming the first row source to build the hash table, but is not blocking while processing the second row source probing the hash table.

Since it doesn't seem to be supported to have two PX SEND operations active at the same time, some artificial blocking operation needs to be introduced - in this case the HASH JOIN BUFFERED, which first consumes the second row source completely before starting the actual probe phase. By doing so, the PX SEND operation used to distribute the second row source to the Parallel Slaves performing the hash join is no longer active when the actual probe phase starts, and therefore the result set can be produced and sent to the Query Coordinator using the then only active PX SEND QC operation.

The following formatted execution plan highlights the two PX SEND operations that would have to be active at the same time if there wasn't a blocking operation in between:


-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | |
| 1 | COUNT | | 1 | | | |
|* 2 | FILTER | | 1 | | | |
| 3 | PX COORDINATOR | | 1 | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | Q1,02 | P->S | QC (RAND) | <---------
| 5 | VIEW | | 2 | Q1,02 | PCWP | |
|* 6 | HASH JOIN BUFFERED | | 2 | Q1,02 | PCWP | | <--------- Artificial blocking
| 7 | PX RECEIVE | | 2 | Q1,02 | PCWP | |
| 8 | PX SEND HASH | :TQ10000 | 0 | Q1,00 | P->P | HASH |
| 9 | PX BLOCK ITERATOR | | 2 | Q1,00 | PCWC | |
|* 10 | TABLE ACCESS FULL| T2 | 26 | Q1,00 | PCWP | |
| 11 | PX RECEIVE | | 2 | Q1,02 | PCWP | |
| 12 | PX SEND HASH | :TQ10001 | 0 | Q1,01 | P->P | HASH | <---------
| 13 | PX BLOCK ITERATOR | | 2 | Q1,01 | PCWC | |
|* 14 | TABLE ACCESS FULL| T4 | 26 | Q1,01 | PCWP | |
-------------------------------------------------------------------------------------


Test Case Details

Here are the preparation steps for the simple test case I used:


create table t2
compress
as
select
rownum as id
, mod(rownum, 1000) + 1 as fk
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't2')

-- Create a copy of T2
create table t4
compress as
select * from t2;

-- Add rows to T4 that don't match T2
-- This shows that the BUFFERED operation discards those non-matching rows
-- rather than buffering them when using T4 as second row source
insert /*+ append */
into t4
select
1000000 + rownum as id
, 1000 + mod(rownum, 1000) + 1 as fk
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;

commit;

exec dbms_stats.gather_table_stats(null, 't4')

alter table t2 parallel 2;

alter table t4 parallel 2;
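As a quick sanity check of the compression effect discussed below, the actual segment sizes can be compared against the uncompressed data volume. A minimal sketch, assuming the tables live in the current schema:

-- Compare the compressed segment sizes with the approx. 100MB/200MB
-- of uncompressed row data the tables contain
select segment_name, round(bytes / 1024 / 1024) as size_mb
from user_segments
where segment_name in ('T2', 'T4');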

And this is basically the simple script that gets executed:


set echo on timing on time on

alter session set workarea_size_policy = manual;

alter session set hash_area_size = 80000000;
alter session set sort_area_size = 80000000;

alter session set statistics_level = all;

alter session set tracefile_identifier = 'HJ_trace1';

alter session set events '10104 trace name context forever, level 10';

--explain plan for
--select max(a_filler), max(b_filler) from (
select /* HJ_trace1 */ * from (
select /*+ no_merge use_hash(a b) no_cpu_costing leading(a b) pq_distribute(b, hash, hash) */
a.filler as a_filler, b.filler as b_filler
from
t2 a
, t4 b
where
-- a.id = b.id
a.fk = b.fk
)
where rownum > 1
;

alter session set events '10104 trace name context off';

set linesize 200 pagesize 0

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS ALL ADVANCED'));

I've used manual workarea sizing to make the test repeatable.

Note that I've added a comment to the query that should make the query text unique in order to generate a new parent cursor for each test run (so you would need to modify this comment for each test run). The only reason for this is the limitation of DBMS_XPLAN.DISPLAY_CURSOR with Parallel Execution as outlined in one of my previous posts; otherwise the "ALLSTATS ALL" option of DISPLAY_CURSOR would aggregate the statistics over all executions rather than only the last parallel execution.

Because I also tested some other costing related issues I disabled CPU costing for this test; however, the results should be exactly the same when enabling CPU costing.

So there are basically two sets of data: T2 is 1,000K rows and approx. 100MB in size, and T4 is twice the size and rows; however, only 100MB out of the 200MB represent data that matches T2 on either ID or FK. The result set is approx. 200 bytes per row, so for example 200MB if 1,000K rows are produced.

The tables are compressed using BASIC compression, which in this case results in a very good compression ratio, as the FILLER column is 100 bytes in size but has only one distinct value and therefore benefits a lot from the symbol replacement performed by BASIC compression. The point of this compression is that it makes it very obvious in this particular case that while you can benefit a lot from compression at the storage, I/O and Buffer Cache level in general, at SQL execution time Oracle has to process uncompressed row sources (and not compressed blocks). So all the workareas that have to be used for hash tables, sorts or simple buffering won't benefit from the compression but have to be big enough for the uncompressed row source data. Likewise, any data that gets distributed using PX SEND also represents uncompressed data volume.

There are two important variations possible to this test case query:

1. Instead of joining on ID, which produces 1,000K rows, join the two tables on the FK column, which results in a huge result set of 1,000M rows (each row matches 1,000 rows from the other row source).

2. Use this variation of the query that doesn't require artificial blocking, because a true blocking operation gets used:


select max(a_filler), max(b_filler) from (
select /*+ no_merge use_hash(a b) no_cpu_costing leading(a b) pq_distribute(b, hash, hash) */
a.filler as a_filler, b.filler as b_filler
from
t2 a
, t4 b
where
-- a.id = b.id
a.fk = b.fk
)
;

This way the impact of the BUFFERED operation can easily be separated from other potential activity, like a workarea that is too small for an optimal hash join. If the operation completes without TEMP I/O activity when using a true blocking operation, but spills to TEMP when running in BUFFERED mode, then the I/O activity very likely comes from the additional buffering of data.
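One way to observe this TEMP activity while the statement is still running is to query V$SQL_WORKAREA_ACTIVE from a second session. A minimal sketch (not part of the original test scripts; the available columns vary slightly by version):

-- Active workareas of all sessions, including the Parallel Slaves;
-- QCSID identifies the Query Coordinator a slave belongs to,
-- TEMPSEG_SIZE (in bytes) shows any spill to TEMP
select sid, qcsid, operation_type, operation_id,
round(actual_mem_used / 1024 / 1024) as mem_mb,
round(tempseg_size / 1024 / 1024) as temp_mb
from v$sql_workarea_active
order by qcsid, sid;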

Detailed Analysis


Small Result Set

Running the variation in BUFFERED mode where only 1,000K rows get produced, the following can be seen from the DBMS_XPLAN.DISPLAY_CURSOR output (assuming you don't run this cross-instance in RAC as DBMS_XPLAN can only show relevant statistics for the local instance), using a block size of 8KB and a 32bit version of Oracle:


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3053 | | | | 0 |00:00:27.10 | 10 | 0 | 0 | | | | |
| 1 | COUNT | | 1 | | | | | | | 0 |00:00:27.10 | 10 | 0 | 0 | | | | |
|* 2 | FILTER | | 1 | | | | | | | 0 |00:00:27.10 | 10 | 0 | 0 | | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | 1000K|00:00:25.45 | 10 | 0 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1001K| 99M| 3053 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 5 | VIEW | | 2 | 1001K| 99M| 3053 | Q1,02 | PCWP | | 1000K|00:00:48.29 | 0 | 13208 | 13208 | | | | |
|* 6 | HASH JOIN BUFFERED | | 2 | 1001K| 203M| 3053 | Q1,02 | PCWP | | 1000K|00:00:44.91 | 0 | 13208 | 13208 | 130M| 8379K| 2/0/0| 54272 |
| 7 | PX RECEIVE | | 2 | 1000K| 101M| 171 | Q1,02 | PCWP | | 1000K|00:00:04.07 | 0 | 0 | 0 | | | | |
| 8 | PX SEND HASH | :TQ10000 | 0 | 1000K| 101M| 171 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 9 | PX BLOCK ITERATOR | | 2 | 1000K| 101M| 171 | Q1,00 | PCWC | | 1000K|00:00:06.15 | 2066 | 2014 | 0 | | | | |
|* 10 | TABLE ACCESS FULL| T2 | 26 | 1000K| 101M| 171 | Q1,00 | PCWP | | 1000K|00:00:02.29 | 2066 | 2014 | 0 | | | | |
| 11 | PX RECEIVE | | 2 | 2000K| 204M| 352 | Q1,02 | PCWP | | 1000K|00:00:16.85 | 0 | 0 | 0 | | | | |
| 12 | PX SEND HASH | :TQ10001 | 0 | 2000K| 204M| 352 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 13 | PX BLOCK ITERATOR | | 2 | 2000K| 204M| 352 | Q1,01 | PCWC | | 2000K|00:00:12.68 | 4212 | 4158 | 0 | | | | |
|* 14 | TABLE ACCESS FULL| T4 | 26 | 2000K| 204M| 352 | Q1,01 | PCWP | | 2000K|00:00:05.09 | 4212 | 4158 | 0 | | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The statement got executed with a Parallel Degree of 2. Notice that only 2K blocks (16MB) resp. 4K blocks (32MB) were processed for reading the two row sources, but the hash join had to read and write two times 50MB (100MB of uncompressed data volume in total). If it was the result set that got buffered, I would expect it to read/write 200MB in total.

If you repeat this variation with the true blocking operation it shouldn't spill to disk at all, as indicated above by the O=2 (two optimal hash joins) in the column "O/1/M" - which is interesting in its own right, since it confirms that the HASH JOIN operated in optimal mode despite the fact that it spilled to disk due to the BUFFERED operation.
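A quick cross-check of these numbers, assuming the Reads/Writes columns are reported in blocks of 8KB:

13,208 blocks * 8KB = approx. 103MB written in total, i.e. approx. 51.5MB per slave

This matches the approx. 100MB of uncompressed matching data from the second row source, not the approx. 200MB result set.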

Large Result Set

Running the variation in BUFFERED mode that joins on FK and therefore produces a huge result set looks like this (cancelled after a short while, so not run to completion in this case):


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 58314 | | | | 0 |00:00:00.01 | 10 | 0 | 0 | | | |
| 1 | COUNT | | 1 | | | | | | | 0 |00:00:00.01 | 10 | 0 | 0 | | | |
|* 2 | FILTER | | 1 | | | | | | | 0 |00:00:00.01 | 10 | 0 | 0 | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | 3694K|00:00:35.06 | 10 | 0 | 0 | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1000M| 96G| 58314 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 5 | VIEW | | 2 | 1000M| 96G| 58314 | Q1,02 | PCWP | | 3694K|00:01:03.14 | 0 | 508 | 12827 | | | |
|* 6 | HASH JOIN BUFFERED | | 2 | 1000M| 195G| 58314 | Q1,02 | PCWP | | 3694K|00:00:49.43 | 0 | 508 | 12827 | 133M| 8387K| 2/0/0|
| 7 | PX RECEIVE | | 2 | 1000K| 100M| 171 | Q1,02 | PCWP | | 1000K|00:00:09.37 | 0 | 0 | 0 | | | |
| 8 | PX SEND HASH | :TQ10000 | 0 | 1000K| 100M| 171 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 9 | PX BLOCK ITERATOR | | 2 | 1000K| 100M| 171 | Q1,00 | PCWC | | 1000K|00:00:06.77 | 2066 | 2014 | 0 | | | |
|* 10 | TABLE ACCESS FULL| T2 | 26 | 1000K| 100M| 171 | Q1,00 | PCWP | | 1000K|00:00:02.87 | 2066 | 2014 | 0 | | | |
| 11 | PX RECEIVE | | 2 | 2000K| 200M| 352 | Q1,02 | PCWP | | 2000K|00:00:16.63 | 0 | 0 | 0 | | | |
| 12 | PX SEND HASH | :TQ10001 | 0 | 2000K| 200M| 352 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 13 | PX BLOCK ITERATOR | | 2 | 2000K| 200M| 352 | Q1,01 | PCWC | | 2000K|00:00:12.80 | 4212 | 4158 | 0 | | | |
|* 14 | TABLE ACCESS FULL| T4 | 26 | 2000K| 200M| 352 | Q1,01 | PCWP | | 2000K|00:00:05.25 | 4212 | 4158 | 0 | | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

There are a few interesting points to mention:

1. If the query is executed without the outer query that filters all data, the first rows are returned pretty quickly. If it was the result set that got buffered, this shouldn't be the case here; instead a huge TEMP space usage should be observed until finally the result set is returned to the parent operation/client.

2. The second row source is consumed completely before the join operation is completed, and a steady TEMP read activity can be observed while the data is returned to the client.

3. The data volume written to TEMP corresponds roughly to what was written to TEMP in the first example, and stays like that during the whole execution. It doesn't increase any more during the join operation. The difference in TEMP usage compared to the first example might come from the fact that I used 75000000 as workarea size for some of my initial test runs, and the output above comes from such an early run.

So this variation pretty clearly shows that it is not the result set that gets buffered. It looks like the second row source is what gets buffered, as it is already consumed completely before the join is completed.

Large Result Set, true blocking operation

Repeating the same variant with a true blocking operation that doesn't require the BUFFERED mode of the HASH JOIN looks like this (again not run to completion here):


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | O/1/M |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 58441 | | | | 0 |00:00:00.05 | 10 | 0 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 104 | | | | | 0 |00:00:00.05 | 10 | 0 | | | |
| 2 | PX COORDINATOR | | 1 | | | | | | | 0 |00:00:00.05 | 10 | 0 | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1 | 104 | | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | | | |
| 4 | SORT AGGREGATE | | 2 | 1 | 104 | | Q1,02 | PCWP | | 0 |00:00:00.01 | 0 | 0 | | | |
| 5 | VIEW | | 2 | 1000M| 96G| 58441 | Q1,02 | PCWP | | 11M|00:01:14.09 | 0 | 0 | | | |
|* 6 | HASH JOIN | | 2 | 1000M| 195G| 58441 | Q1,02 | PCWP | | 11M|00:00:37.73 | 0 | 0 | 133M| 8133K| 2/0/0|
| 7 | PX RECEIVE | | 2 | 1000K| 100M| 171 | Q1,02 | PCWP | | 1000K|00:00:09.03 | 0 | 0 | | | |
| 8 | PX SEND HASH | :TQ10000 | 0 | 1000K| 100M| 171 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | | | |
| 9 | PX BLOCK ITERATOR | | 2 | 1000K| 100M| 171 | Q1,00 | PCWC | | 1000K|00:00:06.69 | 2066 | 2014 | | | |
|* 10 | TABLE ACCESS FULL| T2 | 26 | 1000K| 100M| 171 | Q1,00 | PCWP | | 1000K|00:00:02.89 | 2066 | 2014 | | | |
| 11 | PX RECEIVE | | 2 | 2000K| 200M| 352 | Q1,02 | PCWP | | 11309 |00:00:00.05 | 0 | 0 | | | |
| 12 | PX SEND HASH | :TQ10001 | 0 | 2000K| 200M| 352 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | | | |
| 13 | PX BLOCK ITERATOR | | 2 | 2000K| 200M| 352 | Q1,01 | PCWC | | 12602 |00:00:00.13 | 29 | 46 | | | |
|* 14 | TABLE ACCESS FULL| T4 | 2 | 2000K| 200M| 352 | Q1,01 | PCWP | | 12602 |00:00:00.09 | 29 | 46 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice the difference: no TEMP activity, and the second row source gets consumed gradually as the join progresses and data is returned to the parent operations.

BUFFERED vs. BUFFER SORT

In order to demonstrate the cunning optimization the HASH JOIN BUFFERED can perform, let's use a different distribution method that results in the second row source being distributed via BROADCAST. In this case this requires an artificial, separate BUFFER SORT operation when receiving the broadcasted data (as otherwise, again, two PX SEND operations would be active at the same time):


select /* HJ_trace1 */ * from (
select /*+ no_merge use_hash(a b) no_cpu_costing leading(a b) pq_distribute(b, none, broadcast) */
a.filler as a_filler, b.filler as b_filler
from
t2 a
, t4 b
where
a.id = b.id
-- a.fk = b.fk
)
where rownum > 1

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2922 | | | | 0 |00:00:51.64 | 10 | 0 | 0 | | | | |
| 1 | COUNT | | 1 | | | | | | | 0 |00:00:51.64 | 10 | 0 | 0 | | | | |
|* 2 | FILTER | | 1 | | | | | | | 0 |00:00:51.64 | 10 | 0 | 0 | | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | 1000K|00:00:50.10 | 10 | 0 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10001 | 0 | 1001K| 99M| 2922 | Q1,01 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 5 | VIEW | | 2 | 1001K| 99M| 2922 | Q1,01 | PCWP | | 1000K|00:01:37.72 | 2070 | 58364 | 56350 | | | | |
|* 6 | HASH JOIN | | 2 | 1001K| 203M| 2922 | Q1,01 | PCWP | | 1000K|00:01:34.58 | 2070 | 58364 | 56350 | 129M| 8125K| 2/0/0| |
| 7 | PX BLOCK ITERATOR | | 2 | 1000K| 101M| 171 | Q1,01 | PCWC | | 1000K|00:00:05.48 | 2066 | 2014 | 0 | | | | |
|* 8 | TABLE ACCESS FULL | T2 | 26 | 1000K| 101M| 171 | Q1,01 | PCWP | | 1000K|00:00:02.30 | 2066 | 2014 | 0 | | | | |
| 9 | BUFFER SORT | | 2 | | | | Q1,01 | PCWC | | 4000K|00:01:02.00 | 4 | 56350 | 56350 | 495M| 7329K| 2/0/0| 221K|
| 10 | PX RECEIVE | | 2 | 2000K| 204M| 352 | Q1,01 | PCWP | | 4000K|00:00:21.14 | 0 | 0 | 0 | | | | |
| 11 | PX SEND BROADCAST | :TQ10000 | 0 | 2000K| 204M| 352 | Q1,00 | P->P | BROADCAST | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX BLOCK ITERATOR | | 2 | 2000K| 204M| 352 | Q1,00 | PCWC | | 2000K|00:00:12.81 | 4212 | 4158 | 0 | | | | |
|* 13 | TABLE ACCESS FULL| T4 | 26 | 2000K| 204M| 352 | Q1,00 | PCWP | | 2000K|00:00:04.83 | 4212 | 4158 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice how now the complete T4 row source was buffered twice (that's the side effect of broadcasting it to each Parallel Slave, in this case with degree 2), resulting in more than four times the TEMP space usage of the BUFFERED variant. So the separate BUFFER SORT operation obviously wasn't able to avoid buffering the data from T4 that doesn't match T2 (in which case it should have buffered only approx. 100MB of data twice), whereas the HASH JOIN BUFFERED simply discarded that data from T4 immediately without bothering to buffer it.

It is also interesting to note that the BUFFER SORT operation was reported as "optimal" although it obviously spilled to disk (roughly two times 221MB - the 221K is the old defect of DBMS_XPLAN.DISPLAY_CURSOR of reporting the TEMP usage in the wrong unit).
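Again a quick cross-check, assuming the Writes column is reported in 8KB blocks:

56,350 blocks * 8KB = approx. 440MB written in total, i.e. approx. 220MB per slave

This corresponds to the complete uncompressed T4 row source (approx. 200MB plus overhead) buffered once per slave, and confirms that the reported Max-Tmp of 221K really means approx. 221MB.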

Very Small Result Set

I also did a complementary test where the result set generated is much smaller than the second row source, just in case there is another cunning optimization in place that could decide to buffer either the second row source or the result set depending on which is estimated to be smaller.


select /* HJ_trace2 */ * from (
select /*+ no_merge use_hash(a b) no_cpu_costing leading(a b) pq_distribute(b, hash, hash) */
a.id as a_id, b.id as b_id
from
t2 a
, t4 b
where
a.id = b.id
and a.filler = b.filler
)
where rownum > 1

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2922 | | | | 0 |00:00:27.73 | 10 | 0 | 0 | | | | |
| 1 | COUNT | | 1 | | | | | | | 0 |00:00:27.73 | 10 | 0 | 0 | | | | |
|* 2 | FILTER | | 1 | | | | | | | 0 |00:00:27.73 | 10 | 0 | 0 | | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | 1000K|00:00:26.08 | 10 | 0 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1001K| 24M| 2922 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 5 | VIEW | | 2 | 1001K| 24M| 2922 | Q1,02 | PCWP | | 1000K|00:00:51.28 | 0 | 12319 | 12319 | | | | |
|* 6 | HASH JOIN BUFFERED | | 2 | 1001K| 203M| 2922 | Q1,02 | PCWP | | 1000K|00:00:47.69 | 0 | 12319 | 12319 | 130M| 8379K| 2/0/0| 51200 |
| 7 | PX RECEIVE | | 2 | 1000K| 101M| 171 | Q1,02 | PCWP | | 1000K|00:00:09.69 | 0 | 0 | 0 | | | | |
| 8 | PX SEND HASH | :TQ10000 | 0 | 1000K| 101M| 171 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 9 | PX BLOCK ITERATOR | | 2 | 1000K| 101M| 171 | Q1,00 | PCWC | | 1000K|00:00:06.87 | 2066 | 2014 | 0 | | | | |
|* 10 | TABLE ACCESS FULL| T2 | 26 | 1000K| 101M| 171 | Q1,00 | PCWP | | 1000K|00:00:02.91 | 2066 | 2014 | 0 | | | | |
| 11 | PX RECEIVE | | 1 | 2000K| 204M| 352 | Q1,02 | PCWP | | 2000K|00:00:16.74 | 0 | 0 | 0 | | | | |
| 12 | PX SEND HASH | :TQ10001 | 0 | 2000K| 204M| 352 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 13 | PX BLOCK ITERATOR | | 2 | 2000K| 204M| 352 | Q1,01 | PCWC | | 2000K|00:00:12.43 | 4212 | 4158 | 0 | | | | |
|* 14 | TABLE ACCESS FULL| T4 | 26 | 2000K| 204M| 352 | Q1,01 | PCWP | | 2000K|00:00:04.77 | 4212 | 4158 | 0 | | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So although the result set is estimated to be only 24MB in size in this case, the amount of data that spilled to disk is still roughly 100MB, which seems to suggest that it is always the second row source that gets buffered.

10104 Trace File Snippet

Finally, a sample extract from the 10104 hash join trace file showing that spilling to disk takes place before the PROBE_2 phase begins:


.
.
.
kxhfWrite: hash-join is spilling to disk
kxhfWrite: Writing dba=130176 slot=56 part=8
kxhfIsDone: waiting slot=56 lbs=000000001CCB7308
kxhfWrite: Writing dba=129920 slot=57 part=8
kxhfIsDone: waiting slot=57 lbs=000000001CCB7308
.
.
.
kxhfIsDone: waiting slot=61 lbs=000000001CD37308
kxhfWrite: Writing dba=118528 slot=62 part=8

*** 2012-11-15 18:27:12.706
qerhjFetch: max probe row length (mpl=0)
kxhfSetPhase: phase=PROBE_2
qerhjFetch: PQ Partition rows:500826 clusters:56 in-memory slots 9
kxhfIsDone: waiting slot=62 lbs=000000001CD37308
kxhfWrite: Writing dba=118272 slot=63 part=8
kxhfWrite: Writing dba=118144 slot=64 part=8
kxhfRead(): Reading dba=130432 into slot=62
kxhfIsDone: waiting slot=62 lbs=000000001CD37308
kxhfIsDone: waiting slot=63 lbs=000000001CD37308
kxhfRead(): Reading dba=130304 into slot=63
.
.
.


Footnote

The tests were performed on recent (11.2) releases as well as on 10.2.0.3, which is one of the versions Jonathan used for testing in his original post. All versions tested showed the same behaviour, so it doesn't look like the buffering of the second row source is a change that got introduced in recent releases.

Furthermore, please note that my OTN mini series on Parallel Execution, which I wrote a couple of months ago but which was only published recently, doesn't include the findings presented here and therefore explains the BUFFERED operation and the reasons for the blocking operations partially incorrectly.

Summary

The test case results show that the HASH JOIN BUFFERED operation seems to buffer the second row source. In principle it operates like a BUFFER SORT operation on the second row source, but takes advantage of the fact that it only needs to buffer data that matches data in the first row source.

The reason why the artificial blocking operations are introduced seems to revolve around the fact that at most a single PX SEND operation can be active concurrently.

Dune…

I’ve been a fan of the film Dune for ages. I also liked the TV mini series. The Children of Dune TV mini series was a bit too random for me. A remake of Dune is in the offing. With that in mind it is a little surprising to me that I’ve never got round to reading the book until now.

I think it’s safe to say I loved it. It’s one of the coolest things I’ve read in ages. Watching the film and TV series did not spoil it for me in the slightest, and quite unusually, reading the book has not lessened my love of the film.

The characters are brilliant. There is a lot of wisdom in the book too. In a previous post I mentioned one quote. I could pick out loads that are incredibly well observed and interesting. It ticks so many of the boxes for me that I’m just sad there are not more of the same quality. Having discussed this book with some of the Oakies, I’ve been advised that the rest of the series is pretty bad and the books written by his son are terrible. I’m not sure if I will read more, because I’m not sure I want to risk the disappointment. I’ll think on it.

I’m moving on to Cold Days now. Go Harry!

Cheers

Tim…



Update Error

When doing updates with statements that use correlated subqueries, it’s important to make sure that your brackets (parentheses) are in the right place. Here are two statements that look fairly similar but have remarkably different results – and sometimes people don’t realise how different the statements are:

update t1
set
        padding = (
                select
                        t2.padding
                from    t2
                where   t2.id = t1.id
        )
where   t1.small_vc <= lpad(20,10,'0')
;

update t1
set
        padding = (
                select
                        t2.padding
                from    t2
                where   t2.id = t1.id
                and     t1.small_vc <= lpad(20,10,'0')
        )
;

The first statement will update rows in t1 where t1.small_vc <= lpad(20,10,'0'), copying the value of padding from t2 where (if) the id columns match, but setting t1.padding to null if there is no match.

The second statement will update every single row in t1 – some of the rows will probably be updated as expected (i.e. to get the same result as the first statement) but every row where t1.small_vc is greater than lpad(20,10,'0') will have the padding column set to null.

Originally I created this little demo to respond to a question on OTN – but when I ran it with rowsource execution statistics enabled, I discovered that it also happened to demonstrate an odd bug in that feature – even in 11.2.0.3. Here are the two plans corresponding to the above two statements. The update is supposed to update 20 rows in t1; the incorrect form of the code will update all 10,000 rows in the table:


20 rows updated.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  73tz2pqv4z8rc, child number 0
-------------------------------------
update t1 set  padding = (   select    t2.padding   from t2   where
t2.id = t1.id  ) where t1.small_vc <= lpad(20,10,'0')

Plan hash value: 417405447

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |       |      1 |        |      0 |00:00:00.01 |     219 |
|   1 |  UPDATE                      | T1    |      1 |        |      0 |00:00:00.01 |     219 |
|*  2 |   TABLE ACCESS FULL          | T1    |      1 |     20 |     20 |00:00:00.01 |     173 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T2    |     20 |      1 |     20 |00:00:00.01 |      45 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |     20 |      1 |     20 |00:00:00.01 |      25 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."SMALL_VC"<='0000000020')
   4 - access("T2"."ID"=:B1)

10000 rows updated.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  ax22zpj8xdwpn, child number 0
-------------------------------------
update t1 set  padding = (   select    t2.padding   from t2   where
t2.id = t1.id   and t1.small_vc <= lpad(20,10,'0')  )

Plan hash value: 2442374960

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |      3 |        |      0 |00:00:00.27 |   21576 |
|   1 |  UPDATE                       | T1    |      3 |        |      0 |00:00:00.27 |   21576 |
|   2 |   TABLE ACCESS FULL           | T1    |      3 |  10000 |  29692 |00:00:00.03 |     514 |
|*  3 |   FILTER                      |       |  19692 |        |     40 |00:00:00.01 |      85 |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |     40 |      1 |     40 |00:00:00.01 |      85 |
|*  5 |     INDEX UNIQUE SCAN         | T2_PK |     40 |      1 |     40 |00:00:00.01 |      45 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(:B1<='0000000020')
   5 - access("T2"."ID"=:B1)

Note, in both cases, how the number of rows reported in the update operation (line 1) is zero, although we might expect it to match the A-Rows value reported in the table access full operation (line 2), which tells us how many rows were identified for update. But something strange has happened in the line reporting the table access full in the second case – clearly there’s a bug somewhere, because there were only 10,000 rows in the table, yet the tablescan has (apparently) identified 29,692 rows for update.

Not only is the tablescan line reporting silly numbers, the filter line is as well (because that should only start 10,000 times – once per row in the table); moreover the subquery lines (4 and 5) should only start 20 times each, because the filter should eliminate the need to run the subquery for all rows except the 20 where "T1"."SMALL_VC"<='0000000020'. Even stranger is the fact that when I ran this test several times in a row (recreating the tables each time), the Starts and A-Rows values for the filter and table access full lines changed (decreasing) every time!

It took me a little while to figure out what was going on – but the doubling of the starts for the subquery gave me a hint, confirmed when I took a snapshot of how the session stats changed when I ran the second update. It’s just another variant of the odd “write consistency” behaviour that can appear when running a tablescan through a large volume of data and updating it (although I have to say that I hadn’t seen it happening for a mere 10,000 rows before). Oracle starts the tablescan and update, and after a while decides that it has to rollback and restart the update. The 20 rows that were subject to the subquery were the first 20 rows of the table, which is why they were almost sure to trigger the subquery twice each; had they appeared half way through the table the effect might not have been so consistent.
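If you want to take a similar snapshot yourself, here’s a minimal sketch of the sort of query I mean – run it from the updating session before and after the update and compare the deltas (the statistic names picked out are just examples of counters that typically move on a restart):

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     (
            sn.name like '%undo%'
         or sn.name like '%rollback%'
         or sn.name = 'table scan rows gotten'
        )
;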

Footnote:

If you want to repeat my test for yourselves, here’s the code to generate the data sets:

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

create table t2
as
select * from t1
;

alter table t1 add constraint t1_pk primary key(id);
alter table t2 add constraint t2_pk primary key(id);

DBMS_XPLAN.DISPLAY_CURSOR And Parallel Execution


Introduction

DBMS_XPLAN.DISPLAY_CURSOR can be used to get more insight into the actual resource consumption on execution plan operation level when using the GATHER_PLAN_STATISTICS hint (from 10g on), or when increasing the STATISTICS_LEVEL parameter to ALL (on session level; on system level the overhead is probably prohibitive).

As soon as a SQL execution is done (either successfully, cancelled or with error), the corresponding extended data in the child cursor gets populated/updated and the additional information about the actual runtime profile can be accessed via V$SQL_PLAN_STATISTICS resp. V$SQL_PLAN_STATISTICS_ALL - this is what DISPLAY_CURSOR uses to populate the additional columns in the formatted output of the execution plan.

This works well for normal, serial execution where a single session performs the SQL execution, and allows gathering extended information about the following statistics on execution plan line level:

- Actual cardinalities vs. estimated cardinalities
- Actual time spent on each operation (only reliable when using STATISTICS_LEVEL = ALL or setting "_rowsource_sample_freq" to 1, which can have a significant impact on runtime due to overhead)
- Logical I/O
- Physical I/O
- Memory usage of workareas
- TEMP space usage

One crucial piece of information that is not available via this interface is how the actual time was spent (CPU vs. Wait Events).
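For reference, a minimal sketch of the serial workflow just described (the queried table is arbitrary):

-- Request rowsource statistics for this execution only
select /*+ gather_plan_statistics */ count(*) from t2;

-- Format the last execution of the most recent cursor of this session
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));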

Parallel Execution

However, getting information about these rowsource statistics by default doesn't work well with Parallel Execution. When using the formatting option "ALLSTATS LAST", which is usually recommended to obtain the extended statistics for the last execution of the SQL statement, you'll only see the statistics related to the work performed by the Query Coordinator, but no information related to the work performed by the Parallel Slaves.

Here is an example DBMS_XPLAN.DISPLAY_CURSOR output for a Parallel Execution using the "ALLSTATS LAST" option:


-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:02:41.57 | 10 | | | | |
| 1 | COUNT | | 1 | | 0 |00:02:41.57 | 10 | | | | |
|* 2 | FILTER | | 1 | | 0 |00:02:41.57 | 10 | | | | |
| 3 | PX COORDINATOR | | 1 | | 2000K|00:02:29.30 | 10 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
|* 5 | HASH JOIN BUFFERED | | 0 | 2000K| 0 |00:00:00.01 | 0 | 56M| 7910K| 14M (0)| 30720 |
| 6 | PX RECEIVE | | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
| 7 | PX SEND HASH | :TQ10000 | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
| 8 | PX BLOCK ITERATOR | | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
|* 9 | TABLE ACCESS FULL| T4 | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
| 10 | PX RECEIVE | | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
| 11 | PX SEND HASH | :TQ10001 | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
| 12 | PX BLOCK ITERATOR | | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
|* 13 | TABLE ACCESS FULL| T2 | 0 | 2000K| 0 |00:00:00.01 | 0 | | | | |
-------------------------------------------------------------------------------------------------------------------------------------

You can partially work around this limitation by using the format option "ALLSTATS ALL" instead. This option means that the information provided by DBMS_XPLAN.DISPLAY_CURSOR is based on a different column set of V$SQL_PLAN_STATISTICS_ALL that aggregates across all executions of the SQL statement. For Parallel Execution the statistics representing the work performed by the Parallel Slaves will be added to these columns, so using "ALLSTATS ALL" includes that information.

However, you need to be careful, since this means that you can't distinguish between the last and previous executions of the same cursor. So if you execute the same cursor multiple times in parallel, "ALLSTATS ALL" will show statistics that cover all these executions. You can work around this in a test case scenario by deliberately modifying the SQL text, for example by adding a corresponding comment, so that a separate, unique parent cursor gets created. This way it is ensured that "ALLSTATS ALL" effectively only displays information related to the last execution, since there is only a single (parallel) execution of that cursor.

Here is again the same Parallel Execution as above, this time using the "ALLSTATS ALL" option:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 831 (100)| | | | | 0 |00:02:41.57 | 10 | 2 | 0 | | | | |
| 1 | COUNT | | 1 | | | | | | | | 0 |00:02:41.57 | 10 | 2 | 0 | | | | |
|* 2 | FILTER | | 1 | | | | | | | | 0 |00:02:41.57 | 10 | 2 | 0 | | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | | 2000K|00:02:29.30 | 10 | 2 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 5 | HASH JOIN BUFFERED | | 8 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | PCWP | | 2000K|00:16:51.21 | 815 | 29239 | 29202 | 56M| 7910K| 8/0/0| 30720 |
| 6 | PX RECEIVE | | 8 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,02 | PCWP | | 2000K|00:05:23.83 | 0 | 0 | 0 | | | | |
| 7 | PX SEND HASH | :TQ10000 | 0 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 8 | PX BLOCK ITERATOR | | 8 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWC | | 2000K|00:05:44.20 | 4838 | 4735 | 0 | | | | |
|* 9 | TABLE ACCESS FULL| T4 | 103 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWP | | 2000K|00:04:50.92 | 4838 | 4735 | 0 | | | | |
| 10 | PX RECEIVE | | 8 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,02 | PCWP | | 2000K|00:07:16.22 | 0 | 0 | 0 | | | | |
| 11 | PX SEND HASH | :TQ10001 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX BLOCK ITERATOR | | 8 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWC | | 2000K|00:02:05.78 | 4748 | 4228 | 0 | | | | |
|* 13 | TABLE ACCESS FULL| T2 | 104 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWP | | 2000K|00:01:29.86 | 4748 | 4228 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice the difference - the work performed by the Parallel Slaves is (mostly) visible now. Apart from that, the "ALL" formatting option added some columns that are not shown when using the "LAST" option; you can customize this using the more granular formatting options of DBMS_XPLAN.

Reading the output is not as simple as for serial executions, in particular because there is a mixture of wall-clock time for the activities related to the Query Coordinator and aggregated DB Time for the Parallel Slaves. Furthermore, the rule that applies to serial execution plans - that the values for time / work are cumulative - is not adhered to for Parallel Execution, at least not across Parallel Slave Sets / Table Queues and the Query Coordinator.

Multiple DFOs And Cross-Instance

However, depending on the exact details of the execution plan and the actual execution of the SQL statement, V$SQL_PLAN_STATISTICS_ALL - and hence DISPLAY_CURSOR - still might miss information about the Parallel Execution even when using "ALLSTATS ALL". In particular the following two points are important to consider:

1. If the Parallel Execution is cross-instance (runs on multiple nodes of a RAC), then DBMS_XPLAN.DISPLAY_CURSOR will only show information about the work performed on the local instance, since it only gathers information from the local V$SQL_PLAN_STATISTICS_ALL dynamic performance view. DBMS_XPLAN.DISPLAY_CURSOR doesn't show the complete picture in such cases.

Here is again the same execution plan as above, this time executed cross-instance on two nodes participating in the execution:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 831 (100)| | | | | 0 |00:01:42.95 | 10 | 0 | 0 | | | | |
| 1 | COUNT | | 1 | | | | | | | | 0 |00:01:42.95 | 10 | 0 | 0 | | | | |
|* 2 | FILTER | | 1 | | | | | | | | 0 |00:01:42.95 | 10 | 0 | 0 | | | | |
| 3 | PX COORDINATOR | | 1 | | | | | | | | 2000K|00:01:29.20 | 10 | 0 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 5 | HASH JOIN BUFFERED | | 4 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | PCWP | | 1000K|00:04:30.07 | 0 | 14601 | 14601 | 56M| 7910K| 4/0/0| 30720 |
| 6 | PX RECEIVE | | 4 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,02 | PCWP | | 1000K|00:01:17.90 | 0 | 0 | 0 | | | | |
| 7 | PX SEND HASH | :TQ10000 | 0 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 8 | PX BLOCK ITERATOR | | 4 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWC | | 1038K|00:01:27.13 | 2585 | 2530 | 0 | | | | |
|* 9 | TABLE ACCESS FULL| T4 | 55 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWP | | 1038K|00:01:07.22 | 2585 | 2530 | 0 | | | | |
| 10 | PX RECEIVE | | 4 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,02 | PCWP | | 1000K|00:02:11.36 | 0 | 0 | 0 | | | | |
| 11 | PX SEND HASH | :TQ10001 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX BLOCK ITERATOR | | 4 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWC | | 1011K|00:00:46.56 | 2402 | 0 | 0 | | | | |
|* 13 | TABLE ACCESS FULL| T2 | 53 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWP | | 1011K|00:00:26.53 | 2402 | 0 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note how only "half" of the work is reported (except for the Query Coordinator work). When running DBMS_XPLAN.DISPLAY_CURSOR for the corresponding cursor on the second participating node, I get the other "half":

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 | | | 831 (100)| | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 1 | COUNT | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 2 | FILTER | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 3 | PX COORDINATOR | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 5 | HASH JOIN BUFFERED | | 4 | 2000K| 223M| 831 (2)| 00:00:01 | Q1,02 | PCWP | | 999K|00:04:30.16 | 215 | 14590 | 14570 | 56M| 7910K| 4/0/0| 30720 |
| 6 | PX RECEIVE | | 4 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,02 | PCWP | | 999K|00:01:17.35 | 0 | 0 | 0 | | | | |
| 7 | PX SEND HASH | :TQ10000 | 0 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 8 | PX BLOCK ITERATOR | | 4 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWC | | 961K|00:01:29.64 | 2253 | 2205 | 0 | | | | |
|* 9 | TABLE ACCESS FULL| T4 | 48 | 2000K| 11M| 180 (2)| 00:00:01 | Q1,00 | PCWP | | 961K|00:01:10.14 | 2253 | 2205 | 0 | | | | |
| 10 | PX RECEIVE | | 4 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,02 | PCWP | | 999K|00:01:51.26 | 0 | 0 | 0 | | | | |
| 11 | PX SEND HASH | :TQ10001 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX BLOCK ITERATOR | | 4 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWC | | 988K|00:00:51.52 | 2346 | 0 | 0 | | | | |
|* 13 | TABLE ACCESS FULL| T2 | 51 | 2000K| 211M| 647 (2)| 00:00:01 | Q1,01 | PCWP | | 988K|00:00:30.00 | 2346 | 0 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
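A hedged workaround sketch for this first limitation: the rowsource statistics can be aggregated across all instances manually by querying the global GV$ view instead (the SQL_ID and child number have to be supplied; the work columns are those of V$SQL_PLAN_STATISTICS_ALL):

-- Sum the work columns over all instances per plan line
select id, sum(starts) as starts, sum(output_rows) as a_rows,
round(sum(elapsed_time) / 1e6, 2) as a_time_secs
from gv$sql_plan_statistics_all
where sql_id = '&sql_id'
and child_number = 0
group by id
order by id;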

2. If the parallel execution plan consists of multiple so-called "Data Flow Operations" (DFOs - you can read more about those in my OTN mini series about Parallel Execution), indicated by multiple PX COORDINATOR operations, then these different DFOs will be represented by multiple child cursors at execution time. So each DFO ends up with a separate child cursor.

Since DBMS_XPLAN.DISPLAY_CURSOR cannot aggregate information across multiple child cursors, the information displayed will again be incomplete in such cases. You can run DISPLAY_CURSOR for each of the child cursors generated, but this doesn't give you the same level of information. Furthermore, depending on the version and actual circumstances, the additional child cursors might not inherit the corresponding rowsource statistics setting, so these child cursors might not even contain any additional information in V$SQL_PLAN_STATISTICS_ALL.
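As a minimal sketch (the SQL_ID is assumed to be known), the child cursors that such a multi-DFO statement generates at runtime can be listed like this:

-- One child cursor shows up per DFO of the parallel execution plan
select sql_id, child_number, plan_hash_value, executions
from v$sql
where sql_id = '&sql_id'
order by child_number;

Here is again a similar execution plan as above, this time using a parallel TEMP table transformation that automatically results in a separate DFO and therefore a separate child cursor at runtime. The execution in this case was using a degree of 2 and was single instance: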

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1635 (100)| | | | | 0 |00:02:23.91 | 229 | 1 | 127 | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | | 0 |00:02:23.91 | 229 | 1 | 127 | | | | |
| 2 | PX COORDINATOR | | 1 | | | | | | | | 2 |00:00:34.99 | 33 | 0 | 0 | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 0 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 4 | LOAD AS SELECT | | 0 | | | | | Q1,00 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | 256K| 256K| | |
| 5 | PX BLOCK ITERATOR | | 0 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | PCWC | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 6 | TABLE ACCESS FULL | T4 | 0 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 7 | COUNT | | 1 | | | | | | | | 0 |00:01:48.83 | 13 | 0 | 0 | | | | |
|* 8 | FILTER | | 1 | | | | | | | | 0 |00:01:48.83 | 13 | 0 | 0 | | | | |
| 9 | PX COORDINATOR | | 1 | | | | | | | | 2000K|00:01:35.86 | 13 | 0 | 0 | | | | |
| 10 | PX SEND QC (RANDOM) | :TQ20002 | 0 | 2000K| 236M| 914 (4)| 00:00:01 | Q2,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 11 | HASH JOIN BUFFERED | | 2 | 2000K| 236M| 914 (4)| 00:00:01 | Q2,02 | PCWP | | 2000K|00:03:01.81 | 160 | 29295 | 29295 | 55M| 7910K| 2/0/0| 123K|
| 12 | PX RECEIVE | | 2 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,02 | PCWP | | 2000K|00:01:07.27 | 0 | 0 | 0 | | | | |
| 13 | PX SEND HASH | :TQ20000 | 0 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 14 | VIEW | | 2 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,00 | PCWP | | 2000K|00:01:34.53 | 3355 | 3328 | 0 | | | | |
| 15 | PX BLOCK ITERATOR | | 2 | 2000K| 11M| 254 (4)| 00:00:01 | Q2,00 | PCWC | | 2000K|00:01:09.08 | 3355 | 3328 | 0 | | | | |
|* 16 | TABLE ACCESS FULL| SYS_TEMP_0FD9D6603_158141 | 26 | 2000K| 11M| 254 (4)| 00:00:01 | Q2,00 | PCWP | | 2000K|00:00:43.39 | 3355 | 3328 | 0 | | | | |
| 17 | PX RECEIVE | | 2 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,02 | PCWP | | 2000K|00:01:16.59 | 0 | 0 | 0 | | | | |
| 18 | PX SEND HASH | :TQ20001 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 19 | PX BLOCK ITERATOR | | 2 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | PCWC | | 2000K|00:00:47.97 | 4358 | 0 | 0 | | | | |
|* 20 | TABLE ACCESS FULL | T2 | 26 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | PCWP | | 2000K|00:00:17.82 | 4358 | 0 | 0 | | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice how the output suggests that the parallel execution part of the TEMP table transformation didn't start at all and didn't perform any work. If, however, the second child cursor belonging to the other DFO is analyzed, the following information gets reported:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 | | | 1635 (100)| | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 2 | PX COORDINATOR | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 0 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 4 | LOAD AS SELECT | | 2 | | | | | Q1,00 | PCWP | | 0 |00:01:07.92 | 8027 | 4737 | 3198 | 1058K| 1058K| 2/0/0|
| 5 | PX BLOCK ITERATOR | | 2 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | PCWC | | 2000K|00:00:33.10 | 4761 | 4735 | 0 | | | |
|* 6 | TABLE ACCESS FULL | T4 | 26 | 2000K| 11M| 721 (2)| 00:00:01 | Q1,00 | PCWP | | 2000K|00:00:11.27 | 4761 | 4735 | 0 | | | |
| 7 | COUNT | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
|* 8 | FILTER | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 9 | PX COORDINATOR | | 0 | | | | | | | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 10 | PX SEND QC (RANDOM) | :TQ20002 | 0 | 2000K| 236M| 914 (4)| 00:00:01 | Q2,02 | P->S | QC (RAND) | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
|* 11 | HASH JOIN BUFFERED | | 0 | 2000K| 236M| 914 (4)| 00:00:01 | Q2,02 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | 83M| 5864K| |
| 12 | PX RECEIVE | | 0 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,02 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 13 | PX SEND HASH | :TQ20000 | 0 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,00 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 14 | VIEW | | 0 | 2000K| 24M| 254 (4)| 00:00:01 | Q2,00 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 15 | PX BLOCK ITERATOR | | 0 | 2000K| 11M| 254 (4)| 00:00:01 | Q2,00 | PCWC | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
|* 16 | TABLE ACCESS FULL| SYS_TEMP_0FD9D6603_158141 | 0 | 2000K| 11M| 254 (4)| 00:00:01 | Q2,00 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 17 | PX RECEIVE | | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,02 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 18 | PX SEND HASH | :TQ20001 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | P->P | HASH | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
| 19 | PX BLOCK ITERATOR | | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | PCWC | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
|* 20 | TABLE ACCESS FULL | T2 | 0 | 2000K| 211M| 647 (2)| 00:00:01 | Q2,01 | PCWP | | 0 |00:00:00.01 | 0 | 0 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Here you can see the parallel execution work that was missing from the first child cursor.
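
For completeness, here is a minimal sqlplus sketch of how each child cursor can be inspected individually; the &sql_id substitution variable is just a placeholder for your own statement's SQL_ID:

-- List all child cursors that exist for the statement
select child_number
from   v$sql
where  sql_id = '&sql_id';

-- Display the rowsource statistics for one specific child cursor;
-- repeat for each child number returned by the query above
select * from table(dbms_xplan.display_cursor(
         sql_id          => '&sql_id',
         cursor_child_no => 1,
         format          => 'ALLSTATS ALL'));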

Real-Time SQL Monitoring And XPLAN_ASH

If you are already on 11g and have the corresponding Diagnostic + Tuning Pack license, the best way to get a complete picture of Parallel Execution is the "active" Real-Time SQL Monitoring report. It shows information while the statement is still executing and doesn't have the above-mentioned limitations, so it works with cross-instance executions and with execution plans using multiple DFOs (although there are bugs in the current versions related to such plans). Beyond that, it offers even more insight into the execution details than DBMS_XPLAN.DISPLAY_CURSOR / V$SQL_PLAN_STATISTICS_ALL. It is interesting to note that Real-Time SQL Monitoring doesn't show the actual time consumed on execution plan line level the way extended Rowsource Statistics does, which explains why it doesn't come with the same overhead. Since Real-Time SQL Monitoring analyzes ASH data instead, it can still come up with reasonable execution plan line level work distribution information (including the differentiation between CPU time and waits), although not as accurate as the actual timing information that can be gathered via Rowsource Statistics.

If you don't have a Tuning Pack license but at least the Diagnostic Pack, or you're still on 10g (+ Diagnostic Pack license), then you can use my XPLAN_ASH tool to gather some interesting information about Parallel Execution. Since Active Session History is available cross-instance and isn't limited to particular child cursors, it doesn't have the above limitations and can therefore provide the full picture of a SQL execution based on ASH data. In 10g, however, the ASH data has no relation to execution plan lines and misses some other information available from 11g on, so some important analysis on execution plan line level that can be done with 11g ASH data is not available in 10g.
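
As a pointer, here is a minimal sqlplus sketch for generating such an "active" report from the command line (the &sql_id substitution variable is again a placeholder, and a Tuning Pack license is required):

set long 10000000 longchunksize 10000000 pagesize 0 linesize 1000 trimspool on
spool sql_monitor_report.html

-- TYPE => 'ACTIVE' produces the interactive HTML report,
-- REPORT_LEVEL => 'ALL' includes all available sections
select dbms_sqltune.report_sql_monitor(
         sql_id       => '&sql_id',
         type         => 'ACTIVE',
         report_level => 'ALL')
from   dual;

spool off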

Footnote

If you look carefully at the above execution plans you'll notice that the HASH JOIN BUFFERED operations are reported as "Optimal" hash joins. In principle this means that the hash join operation itself could be done completely in memory. Why, then, does DBMS_XPLAN.DISPLAY_CURSOR show Read/Write/TEMP activity for the HASH JOIN BUFFERED operation? I'll cover this in detail in my next post – I believe the explanations (including my own) published so far for that type of operation are incorrect.

Summary

DBMS_XPLAN.DISPLAY_CURSOR doesn't work very well with Parallel Execution. For simple execution plans consisting of only a single DFO, and for single-instance executions, the "ALLSTATS ALL" option can be used as a workaround. If available, Real-Time SQL Monitoring is the tool of choice for Parallel Execution analysis. My XPLAN_ASH tool also offers some unique insights, in particular regarding the systematic analysis of Parallel Execution work distribution skew.

Sharing a Kindle account

I have discovered that sharing a Kindle account with family members can cause Amazon to prompt me with the oddest reading suggestions:

I suppose you never really know someone until you see what they read. It's possible that my reading choices are also involved in the determination of these recommendations. However, if Jack Reacher novels and Steal Like an Artist result in these recommendations, it just makes me wonder about the algorithm. Ha!

Friday Philosophy: It’s Not What you Know – It’s What you are Getting To Know.

This has been a good but tiring week. It started with the UKOUG TEBS conference where I saw lots of people I know, and a few who I didn't but now do, and I had good times in pubs and restaurants. One evening culminated in my singing part of "Two out of Three Ain't Bad" in the style of a munchkin with Niall Litchfield in a pub at 1am, which I am sure he would rather forget about – so if you know him, ask him about it. For me that was the indicator to go lie down in the dark and sleep. Irrespective of drunken singing, I must have talked about 20 topics with 40 people over the conference, exchanging ideas, tricks and war stories.

I also presented on the Sunday, teaching people. I went to many fine presentations through the week and learnt stuff, balancing the knowledge books. This included one talk on Wednesday where I found out I had been teaching people wrong on Sunday! *sigh* Oh dear. And it is something I have been getting wrong for years. I’ll check the fine detail before I blog about that though. I now hate SQL Maria for pointing out my error – which is unfair but that’s life. Sorry Maria, I’ll buy you a beer sometime to make up for it.

After all that, I came back to work to find we had some big issues with one system. I won’t go into the details but there were half a dozen of us working on it, coming up with ideas, knocking each other’s suggestions down or agreeing with them – and coming up with working solutions. None of us could have done it alone and I reckon most of us learnt a couple of things in the process (such as how bad some of the app design was :-) ).

So all in all, a week of learning that has worn me out but left me happy and that is what I like about my work life at present. I am working with people who know stuff and who want to know stuff. I am in a role that exposes me to a lot of issues. I am working for a boss (and team) who let me take time out from the official job to learn stuff too. I suspect it is going to end soon though as the company I am working for are taking large numbers of people to the carpark and shooting them (figuratively speaking only, I should point out). It will be a shame if it does end. But then, just as when I worked with the best IT person I ever worked with, I’ll be happy I had the experience.

Yes, I know I am lucky to have such a good working life at present {but, you know something? I did put effort into making it like this – maybe a future FF on that}.

So having said I like my working life so much, who is willing to bet I’ll be complaining like a spoilt child all next week?!?

Adding disks to VMware Workstation 8 on the fly on RHEL 6

Although this post is primarily written for users of VMware Workstation 8, it is applicable to any RedHat 6 clone when adding single-path disks on the fly. Multipathing requires additional setup in dm-multipath or the vendor's multipathing software, which I won't cover here. A quick hint though: you need to set disk.EnableUUID = "TRUE" in your VM's config file for scsi_id to return a value.
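
Just as a sketch, this is what that looks like (server1.vmx is a made-up file name; add the line while the VM is powered off):

# Add to the VM's configuration file, e.g. server1.vmx:
disk.EnableUUID = "TRUE"

# After booting, scsi_id should now return a value (RHEL 6 syntax):
/sbin/scsi_id --whitelisted --device=/dev/sda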

The situation is common: you created a virtual machine and need more storage. Hopefully you created it using LVM, which would allow you to add the new disk to an existing volume group, followed by a resize operation on the logical volume that is short on space. But before you can do this you have to add a new LUN to your setup – here is how you can do it without rebooting the VM.

First, I recommend you install lsscsi (for convenience, not strictly necessary) and the sg3_utils package:

[root@server1 ~]# yum install lsscsi.x86_64 sg3_utils.x86_64
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package lsscsi.x86_64 0:0.23-2.el6 will be installed
---> Package sg3_utils.x86_64 0:1.28-4.el6 will be installed
--> Processing Dependency: sg3_utils-libs = 1.28-4.el6 for package: sg3_utils-1.28-4.el6.x86_64
--> Processing Dependency: libsgutils2.so.2()(64bit) for package: sg3_utils-1.28-4.el6.x86_64
--> Running transaction check
---> Package sg3_utils-libs.x86_64 0:1.28-4.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================
 Package                                      Arch                                 Version                                    Repository                           Size
========================================================================================================================================================================
Installing:
 lsscsi                                       x86_64                               0.23-2.el6                                 local                                38 k
 sg3_utils                                    x86_64                               1.28-4.el6                                 local                               470 k
Installing for dependencies:
 sg3_utils-libs                               x86_64                               1.28-4.el6                                 local                                51 k

Transaction Summary
========================================================================================================================================================================
Install       3 Package(s)

Total download size: 559 k
Installed size: 1.4 M
Is this ok [y/N]: y
Downloading Packages:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                    25 MB/s | 559 kB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : sg3_utils-libs-1.28-4.el6.x86_64                                                                                                                     1/3
  Installing : sg3_utils-1.28-4.el6.x86_64                                                                                                                          2/3
  Installing : lsscsi-0.23-2.el6.x86_64                                                                                                                             3/3

Installed:
  lsscsi.x86_64 0:0.23-2.el6                                                        sg3_utils.x86_64 0:1.28-4.el6

Dependency Installed:
  sg3_utils-libs.x86_64 0:1.28-4.el6

Complete!

With this done it is easy to check the attached SCSI devices, and on my system I found these:

[root@server1 ~]# lsscsi
[0:0:0:0] disk VMware, VMware Virtual S 1.0 /dev/sda
[0:0:1:0] disk VMware, VMware Virtual S 1.0 /dev/sdb
[2:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0

This means there are two virtual disks and a virtual CD-ROM. Now it's time to add the new disk. Do so using the VMware Workstation user interface. Once completed, you need to rescan the SCSI bus:

[root@server1 ~]# rescan-scsi-bus.sh
Host adapter 0 (mptspi) found.
Host adapter 1 (ata_piix) found.
Host adapter 2 (ata_piix) found.
Scanning SCSI subsystem for new devices
Scanning host 0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 1 0 ...
OLD: Host: scsi0 Channel: 00 Id: 01 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 2 0 ...
NEW: Host: scsi0 Channel: 00 Id: 02 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 2 0 ...
OLD: Host: scsi0 Channel: 00 Id: 02 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning host 1 channels  0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning host 2 channels  0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 2 0 0 0 ...
OLD: Host: scsi2 Channel: 00 Id: 00 Lun: 00
      Vendor: NECVMWar Model: VMware IDE CDR10 Rev: 1.00
      Type:   CD-ROM                           ANSI SCSI revision: 05
0 new device(s) found.
0 device(s) removed.
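
As an aside: if sg3_utils is not available, the same rescan can be triggered directly via sysfs – a minimal sketch for the mptspi adapter (host0) from the output above, where the three dashes are wildcards for channel, target and LUN:

# Rescan host0 for new devices without sg3_utils
echo "- - -" > /sys/class/scsi_host/host0/scan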

A new disk has been found, as shown by lsscsi:

[root@server1 ~]# lsscsi
[0:0:0:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sda
[0:0:1:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdb
[0:0:2:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdc
[2:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR10 1.00  /dev/sr0
[root@server1 ~]#

Very nice – from there on it's just another disk you can format using parted or fdisk, or hand over to LVM as mentioned at the beginning. And you are done.
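
For the LVM case, the remaining steps are equally short. A minimal sketch, assuming the new device is /dev/sdc and using made-up volume group and logical volume names (vg_data, lv_data):

# Initialise the new disk for LVM and add it to the volume group
pvcreate /dev/sdc
vgextend vg_data /dev/sdc

# Grow the logical volume into the new space; -r also resizes the
# file system on top of it (requires lvm2 with --resizefs support)
lvextend -r -l +100%FREE /dev/vg_data/lv_data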