Oakies Blog Aggregator

Oracle : buffer busy wait

 

Oracle 10 and 11

Buffer busy waits on Oracle 10 and 11 usually happen because of insert contention on tables or indexes. There are a few other, rarer cases of contention on old-style RBS segments, file header blocks and free lists.
Before Oracle 10 there was one other major cause: readers waiting for readers, i.e. one user does a physical IO of a block into memory and a second user wants to read that block. The second user waits until the IO is finished by the first user. Starting in 10g this wait has been given the name "read by other session"; before Oracle 10g it was also a "buffer busy wait".
The easiest way to analyse the bottleneck and find a solution is to use ASH (Active Session History), available in Oracle 10g with the Diagnostics Pack license, or to use Simulated ASH for free, or a product like DB Optimizer.
The block class, which can be found in ASH, is the most important piece of information in analysing buffer busy waits. If we know the block class we can determine what kind of bottleneck we are dealing with:
 If CLASS=
    1. data block
      • IF OTYPE =
      • INDEX, then the insert index leaf block is probably hot; solutions are
        • Hash partition the index
        • Use a reverse key index
      • TABLE, then the insert block is hot; solutions are (see the DDL sketch after this list)
        • Use free lists
        • Put the object in an ASSM tablespace
    2. segment header - If "segment header" occurs at the same time as CLASS="data block" on the same object and the object is of OTYPE="TABLE", then this is just confirmation that the TABLE needs to use free lists or ASSM.
    3. file header block - Most likely extent allocation problems; look at the extent size on the tablespace and increase it so that there are fewer extent allocations and less contention on the file header block.
    4. free lists - Add free list groups to the object.
    5. undo header - Not enough UNDO segments; if using old-style RBS, switch to AUM.
    6. undo block - Hot spot in UNDO; an application issue.
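For illustration, the table-side fixes above can be sketched in DDL. A minimal sketch, assuming a hypothetical table EMP and an ASSM tablespace DATA2 like the one created later in this post:

-- add free lists (only relevant when the segment lives in a manual
-- segment space management tablespace)
alter table emp storage (freelists 8);

-- or move the table into an ASSM tablespace
alter table emp move tablespace data2;

(Remember that moving a table invalidates its indexes, which then need to be rebuilt.)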
How do we find the block class? With a quick query on the ASH data like:
select
       o.object_name obj,
       o.object_type otype,
       ash.SQL_ID,
       w.class
from v$active_session_history ash,
     ( select rownum class#, class from v$waitstat ) w,
      all_objects o
where event='buffer busy waits'
   and w.class#(+)=ash.p3
   and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time;
For example:

 

OBJ    OTYPE  SQL_ID        CLASS
------ ------ ------------- ------------------
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf segment header
TOTO1  TABLE  8gz51m9hg5yuf data block
If we find that CLASS="data block", then we will want more information to diagnose the problem, such as the object type (OTYPE), the object name and what kind of tablespace the object is stored in. The following query provides that information:

set linesize 120

col block_type for a20
col objn for a25
col otype for a15
col filen for 9999
col blockn for 9999999
col obj for a20
col tbs for a10
select
       bbw.cnt,
       bbw.obj,
       bbw.otype,
       bbw.sql_id,
       bbw.block_type,
       nvl(tbs.name,to_char(bbw.p1)) TBS,
       tbs_defs.assm ASSM
from (
    select
       count(*) cnt,
       nvl(object_name,CURRENT_OBJ#) obj,
       o.object_type otype,
       ash.SQL_ID sql_id,
       nvl(w.class,'usn '||to_char(ceil((ash.p3-18)/2))||' '||
                    decode(mod(ash.p3,2),
                         1,'header',
                         0,'block')) block_type,
       --nvl(w.class,to_char(ash.p3)) block_type,
       ash.p1 p1
    from v$active_session_history ash,
        ( select rownum class#, class from v$waitstat ) w,
        all_objects o
    where event='buffer busy waits'
      and w.class#(+)=ash.p3
      and o.object_id (+)= ash.CURRENT_OBJ#
      and ash.session_state='WAITING'
      and ash.sample_time > sysdate - &minutes/(60*24)
      --and w.class# > 18
   group by o.object_name, ash.current_obj#, o.object_type,
         ash.sql_id, w.class, ash.p3, ash.p1
  ) bbw,
    (select   file_id, 
       tablespace_name name
  from dba_data_files
   ) tbs,
    (select
 tablespace_name    NAME,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
     from dba_tablespaces 
   ) tbs_defs
  where tbs.file_id(+) = bbw.p1
    and tbs.name=tbs_defs.name
Order by bbw.cnt
/
And the output looks like:

  CNT OBJ     OTYPE   SQL_ID        BLOCK_TYPE       TBS        ASSM

----- ------- ------- ------------- ---------------- ---------- ------
    3 TOTO1   TABLE   8gz51m9hg5yuf segment header   NO_ASSM    MANUAL
   59 TOTO1   TABLE   8gz51m9hg5yuf data block       NO_ASSM    MANUAL
Oracle 7, 8 and 9

Before Oracle 10, buffer busy waits also happened because of one user's IO blocking another user wanting to do the same IO. On Oracle 9 and below, the main reasons for buffer busy waits are

1)       IO read contention (only Oracle 9i and below)

2)       Insert Block Contention on Tables or Indexes
3)       Rollback Segment Contention

On  7.0  - 8.1.5 see http://sites.google.com/site/embtdbo/oracle-buffer-busy-wait/oracle-buffer-busy-wait-7-8-1-5

On versions 8 and 9, the p3 value has a different meaning. Instead of indicating the block class (which is the most useful thing to know) it indicates the kind of buffer busy wait. There are only two ranges of values that matter to us:

100 range = read waits (basically just an IO wait)

Reader blocking reader, i.e. one reader is reading a block into memory and another user wants to read this block and waits on a buffer busy wait with p3=130.

200 range = write contention (same as in 10g)

Writers blocking other writers, for example while doing inserts, either because of no free lists on the table or because everyone is inserting into the same index block.

If you have set up ASH style collection with S-ASH or have a product like DB Optimizer you can run a query like:

select

       count(*) cnt,
       o.object_name obj,
       o.object_type otype,
       ash.CURRENT_OBJ#,
       ash.SQL_ID,
       decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3
from v$active_session_history ash,
      all_objects o
where event='buffer busy waits'
   and o.object_id (+)= ash.CURRENT_OBJ#
group by o.object_name, o.object_type, ash.sql_id, ash.p3,ash.CURRENT_OBJ#
order by cnt
/
And see what kind of buffer busy waits there are and what the objects are:

 CNT OBJ     OTYPE   CURRENT_OBJ#     SQL_ID P3

--- ------- ------- ------------ ---------- ------
  1                           -1 1375352856 read
  2                           -1  996767823 read
  2                           -1 2855119862 write
 17                           -1 1375352856 write
 89 TOTO1   TABLE         296030 1212617343 write
109                       296022 1212617343 write

Often the CURRENT_OBJ# is -1, so we can't figure out what the object is. There is an alternative method:

col block_type for a18

col objn for a25
col otype for a15
col event for a15
col blockn for 999999
col segment_name for a20
col partition_name for a15
col owner for a15
set timing on
/*
drop table myextents;
create table myextents as select * from dba_extents;
*/
select
       count(*),
       ext.owner,
       ext.segment_name,
       ext.partition_name,
      ext.segment_type,
        decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3
       --ash.p1,
       --ash.p2
from v$active_session_history ash,
     myextents ext
where
       event = 'buffer busy waits'
   and ( current_obj# = -1 or current_obj#=0  or current_obj# is null )
   --and sample_time > sysdate - &minutes/(60*24)
   --and session_state='WAITING'
   and  ext.file_id(+)=ash.p1 and
        ash.p2 between  ext.block_id and ext.block_id + ext.blocks
group by
       ext.owner,
       ext.segment_name,
       ext.partition_name,
       ext.segment_type,
       p3
       --ash.p1,
       --ash.p2,
       --ash.sql_id
Order by count(*)
/
Because querying DBA_EXTENTS is a slow operation, I made a copy of DBA_EXTENTS (the MYEXTENTS table above), which is faster to query.

CNT OWNER  SEGMENT_NAME   PARTITION_NAME  SEGMENT_TYPE  P3

--- ------ -------------- --------------- ------------- --------
  1 SYS    _SYSSMU2$                      TYPE2 UNDO    read
  1 SYS    _SYSSMU3$                      TYPE2 UNDO    write
This second option of getting the object from P1 and P2 (file # and block #) should probably be done only with the user's consent, because we have to create a copy of the DBA_EXTENTS view, which might take a long time if it is big.
No ASH ?

If you don’t have ASH data you will have to do some guess work.
Block Class (block type)

The first step in finding out the source of buffer busy waits is looking at

     V$WAITSTAT
This will tell us what kind of data blocks we have contention on.
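For example (V$WAITSTAT lists the cumulative wait count and time per block class):

select class, count, time
from v$waitstat
where count > 0
order by count desc;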
File with contention

You can also get an idea of what file contains the object with the buffer busy waits by looking at:

    X$KCBFWAIT
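X$KCBFWAIT has one row per file, indexed from 0, so a common sketch (it has to be run as SYS) joins it to DBA_DATA_FILES via indx+1:

select dbf.file_name, fw.count, fw.time
from x$kcbfwait fw, dba_data_files dbf
where dbf.file_id = fw.indx + 1
  and fw.count > 0
order by fw.count desc;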
Object with contention

Starting in version 9i there is the view

    v$segstat

which lists the objects with buffer busy waits.
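For example, using the related V$SEGMENT_STATISTICS view, which exposes the same data with the object names already resolved:

select owner, object_name, object_type, value
from v$segment_statistics
where statistic_name = 'buffer busy waits'
  and value > 0
order by value desc;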
If you are on version 7 or 8 good luck finding the object without setting up ASH style data collection.
Why do buffer busy waits happen?

To put it most succinctly, buffer busy waits happen because two users want to change a block at the same time. Two users can change the same block, or even the same row, "at the same time", i.e. without committing, but that's different from the actual operation of modifying the block. The modification of the block in RAM, or computer memory, can only be done by one process at a time in order to avoid memory corruption. Different users can modify different blocks at the same time, but only one user or process can modify a given block at a time.
In order to really understand what’s going on we have to take a look at how Oracle manages memory and block access and modifications.
Here is the layout:
[Figure: diagram of the Oracle architecture]
The diagram above shows some of the essential parts of Oracle with regard to performance tuning.
In the machine memory are
  •     Oracle’s SGA, or System Global Area, a memory that is shared between Oracle users
  •     LGWR – log writer process
  •     DBWR – database writer process
  •     User1,2,3 … – user processes, in this case “shadow processes”

On the machine file system are

  • Redo log files
  • Data files
The SGA is composed of (among other things)
  • Log Buffer
  • Library Cache
  • Buffer Cache
What's important for understanding buffer busy waits is how the buffer cache is managed. Here is a view of the buffer cache with more components:
[Figure: buffer cache components]
In order to access a block, a user (shadow process) has to get a latch (a cache buffers chains latch), which protects buckets, or linked lists, of buffer headers. Once the desired header is found, the latch is released. The buffer headers point to the actual data blocks in memory. Before modifying a block in memory, a user has to lock the buffer header. The buffer header is locked any time a modification is made, whether that is reading a block into memory or modifying a block that is already in memory. Usually the header is locked only for a brief amount of time, but when there is a lot of concurrent access the buffer header can become a bottleneck.
BBW when reading data – read by other session

A buffer busy wait can happen on Oracle 7, 8 and 9 when one user is reading a block into memory and a second user wants to read that block. Instead of the second user trying to read that block into memory as well, they just wait for the first user to finish. Starting in Oracle 10, this kind of wait was renamed "read by other session".

BBW on insert
If multiple concurrent users are inserting into a table that doesn't have free lists and is not in an ASSM tablespace, then all users will end up inserting into the same block, the first one on the free list, and this block will become the hot block.
By adding free lists or moving the table to an ASSM tablespace we can alleviate the bottleneck.
Multiple free lists:
The other option is ASSM, or Automatic Segment Space Management, which is set at the tablespace level.
In this case free block information is kept in Level 1 BMBs (bitmapped blocks). These Level 1 BMBs are chosen by a hash on the user's process ID, thus distributing the inserts across the table.
The inserts would look something like this (somewhat exaggerated drawing):
[Figure: inserts distributed across the table under ASSM]
The ASSM BMB blocks take up more space in the table, about one extra block for every 16 data blocks, and there is overhead in first looking in the header/Level 3 BMB block, then going to the Level 2, then the Level 1, and finally to the data block, but all in all ASSM is worth the reduced management costs versus free lists.
Identifying and creating ASSM tablespaces
Which tablespaces are ASSM or not?

select

        tablespace_name,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
from dba_tablespaces

TABLESPACE_NAME LOCAL      EXTENTS   ASSM

--------------- ---------- --------- ------
SYSTEM          LOCAL      SYSTEM    MANUAL
UNDOTBS1        LOCAL      SYSTEM    MANUAL
SYSAUX          LOCAL      SYSTEM    AUTO
TEMP            LOCAL      UNIFORM   MANUAL
USERS           LOCAL      SYSTEM    AUTO
EXAMPLE         LOCAL      SYSTEM    AUTO
DATA            LOCAL      SYSTEM    MANUAL
creating an ASSM tablespace:

create tablespace data2 

datafile '/d3/kyle/data2_01.dbf' 
size 200M
segment space management auto;

BBW on index (because of insert)

If users are inserting data with a rising key value, especially a monotonically rising value, then all the new inserts will have to update the leading-edge leaf block of the index, and with highly concurrent inserts this can cause buffer busy waits.
Solutions (sketched below)
Hash partition the index
Reverse key index
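A minimal sketch of these two solutions, again using a hypothetical table EMP with a sequence-driven ID column (the two statements are alternatives):

-- reverse key index: byte-reverses each key so that consecutive values
-- land in different leaf blocks (range scans on the key are no longer possible)
create index emp_id_rix on emp(id) reverse;

-- hash partitioned index: splits the one hot right edge into one edge per partition
create index emp_id_hix on emp(id) global partition by hash (id) partitions 16;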

BBW on old style RBS

If the block class is > 18, it's an old-style RBS segment:

Select  CURRENT_OBJ#||' '||o.object_name objn,

          o.object_type otype,
          CURRENT_FILE# filen,
          CURRENT_BLOCK# blockn,
          ash.SQL_ID,
          w.class ||' '||to_char(ash.p3) block_type
from v$active_session_history ash,
      (select rownum class#, class from v$waitstat ) w,
       all_objects o
where event='buffer busy waits'
    and w.class#(+)=ash.p3
    and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time;    

OBJN        OTYPE  FILEN  BLOCKN SQL_ID        BLOCK_TYPE

----------- ------ ------ ------ ------------- ------------
54962 TOTO1 TABLE     16   45012 8gz51m9hg5yuf data block 
54962 TOTO1 TABLE     16     161 8gz51m9hg5yuf segment header
0                     14       9 8gz51m9hg5yuf  87
0                     14       9 8gz51m9hg5yuf  87

If the block is of class > 18, then there will be no object name, so we have to look it up ourselves to be sure:

select  segment_name,
        segment_type
from dba_extents
where
     &P2 between
    block_id and block_id + blocks - 1
     and
     file_id = &P1 ;

Plug in 14 for P1 (the file #) and 9 for P2 (the block #):

SEGMENT_NAME   SEGMENT_TYPE

-------------- --------------
R2             ROLLBACK
Solution: move to AUM (Automatic Undo Management)
alter system set undo_management=auto scope=spfile;
BBW on a file header
The ASH data has two different sets of fields that indicate the file # and block # when the wait is a buffer busy wait.
For a buffer busy wait
    File # = P1  *and*  File # = CURRENT_FILE#
    Block # = P2  *and*  Block # = CURRENT_BLOCK#
If P1 != CURRENT_FILE# or P2 != CURRENT_BLOCK#, then use P1 and P2; they are more reliable.
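A sketch of a query that pulls both sets of fields side by side, along the lines of the earlier ASH queries:

select to_char(ash.sample_time,'HH24:MI') time,
       ash.p1,
       ash.p2,
       o.object_name objn,
       o.object_type otype,
       ash.current_file# fn,
       ash.current_block# blockn,
       w.class block_type
from v$active_session_history ash,
     ( select rownum class#, class from v$waitstat ) w,
     all_objects o
where event='buffer busy waits'
   and w.class#(+)=ash.p3
   and o.object_id (+)= ash.CURRENT_OBJ#
order by ash.sample_time;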
For example, the output might look like:

Time   P1  P2 OBJN OTYPE FN BLOCKN BLOCK_TYPE

----- --- --- ---- ----- -- ------ -----------------
11:44 202   2 -1          0      0 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
Notice that P1 != FN (FN is CURRENT_FILE#) and P2 != BLOCKN (BLOCKN is CURRENT_BLOCK#).
The real file # is P1 = 202 and the block # is P2, which is 2.
In my database I only had 10 files, so what is this file # 202?! (For a tempfile the reported file # is the DB_FILES parameter value, 200 by default, plus the tempfile number, so 202 corresponds to tempfile 2.)
Solution
If you are getting buffer busy waits on the file header block for a tempfile (a datafile in a temporary tablespace), then try increasing the "next extent" size of the temporary tablespace.
This wait can happen when lots of extents are being allocated in the temporary tablespace.
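One way to do that, sketched here with an example file name and sizes, is to create a replacement temporary tablespace with a larger uniform extent size:

create temporary tablespace temp2
   tempfile '/d3/kyle/temp2_01.dbf' size 1000M
   extent management local uniform size 10m;

alter database default temporary tablespace temp2;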
What Would ADDM do?
Interestingly enough, the ADDM page doesn't show the new load that has recently come onto the system, but the analysis is there. I clicked on the next-to-bottom line on the page, "Read and write contention on database blocks was consuming significant database time."
Here are the ADDM outputs for the different scenarios:
  • inserts into a table with contention
  • inserts into a table with contention on the index
  • RBS contention
  • file header contention

Health Care Crises in Application Development

if someone fraudulently uses your information for medical services or drugs, you could be held liable for the costs

The demand for healthcare application development has been exploding over the past couple of years because of

  • Obama Care – Affordable Care Act
  • Regulatory – HITECH and HIPAA Privacy Acts
  • ICD-10
  • Pro-active Health Care (versus reactive health care)
  • Mobile devices

But developing applications for health care requires the data to be masked. Why does masking data matter, and matter especially for health care? If patient information gets out it can be quite damaging. One heuristic for the importance of healthcare information is that on the black market the health care information of an individual tends to sell for 100x the credit card information of an individual. Imagine that someone needs health coverage and they swipe the health care information of someone else, giving them free treatment. The value of the "free treatment" can well exceed the maximums on a credit card. Also imagine the havoc it can cause for the original individual if someone jumps onto their health care: important information like blood type can be logged incorrectly, or the person may have HIV logged against them when they themselves are clear. It can take years to repair the damage, or it may never be repaired if the damage is fatal.

What do Britney Spears, George Clooney, Octomom (Nadya Suleman)and the late Farah Fawcett have in common? They are all victims of medical data breaches! … How much would a bookie pay to know the results of a boxer’s medical checkup before a title bout? What would a tabloid be willing to pay to be the first to report a celebrity’s cancer diagnosis? Unfortunately it doesn’t stop there and the average citizen is equally a target.  

When data gets to untrusted parties it is called leakage. To avoid leakage, companies use masking. Masking is a form of data mediation or transformation that replaces sensitive data with equally valid, fabricated data. Masking data can be more work on top of the already significant work of provisioning copies of a source database to development and QA. Development and QA can get these database copies in minutes for almost no storage overhead using Delphix (as has been explained extensively in previous blogs), but by default these copies, or virtual databases (VDBs), are not masked. Without Delphix, masking database copies in development and QA would require masking every single copy, but with Delphix one can provision a single VDB, mask that VDB, and then clone, in minutes and for almost no storage, as many masked copies of that first masked VDB as needed.

[Figure: Delphix provisioning masked virtual databases from a source database]

In the above graphic, Delphix links to a source database, and keeps a compressed version along with a rolling time window of changes from the source database. With this data Delphix can spin up a clone of the source database, anywhere in that time window. The clone can be spun up in a few minutes and takes almost no storage because it initially shares all the duplicate blocks on Delphix. This first VDB can be masked and then clones of the masked VDB can be made in minutes for almost no extra storage.

With Delphix in the architecture, making masked copies is fast, easy and efficient. The first VDB that is masked will take up some extra storage for all the changed data. This VDB can then become the basis for all other development and QA masked copies, so there is no need to worry about whether or not a development or QA database is masked. Because the source for all development and QA copies is masked, there is no way for any unmasked copies to make it into development and QA. Without the secure architecture of Delphix it becomes more complicated to verify and enforce that each copy is indeed masked. By consolidating the origins of all the downstream copies into a single set of masked shared data blocks, we can rest assured that all the downstream versions are also masked. The cloning interface in Delphix also logs all cloning activity, and chain of custody reports can be run.

How do we actually accomplish the masking? Masking can be accomplished with a number of technologies available in the industry. With Delphix these technologies can be run on a VDB in the same manner that they are currently being used with regular physical clone databases. Alternatively Delphix has hooks for the provisioning where tools can be leveraged before the VDB is fully provisioned out.

Delphix has partnered with Axis Technology to streamline and automate the masking process with virtual databases. Look for upcoming blog posts to go into more detail about Axis and Delphix.

 

Golden rules of RAC performance diagnostics

After collaborating with many performance engineers on RAC databases, I have come to realize that there are common patterns among the (mis)diagnoses. This blog post discusses those issues. I also talked about this at the Hotsos 2014 conference.

Golden rules

Here are the golden rules of RAC performance diagnostics. These rules may not apply to general RAC configuration issues, though.

  1. Beware of top event tunnel vision
  2. Eliminate infrastructure as an issue
  3. Identify problem-inducing instance
  4. Review send-side metrics also
  5. Use histograms, not just averages

This may be better read as a document, so please use the PDF files of the presentation and the paper. Presentation slide #10 gives in-depth coverage of the gc buffer busy* wait events. I will try to blog about that slide later (hopefully).

Golden rules of RAC diagnostics paper

Golden rules of rac diagnostics ppt

Scripts mentioned in the presentation can be downloaded here.

scripts

IOUG Collaborate 2014 and EM12c Odyssey

Collaborate is coming up fast and this year is going to be a great conference for all involved.  I’ll have a number of sessions, both for those attending in person and those taking advantage of the virtual content.

As a virtual speaker, I've been offered a virtual pass that I can offer to one lucky recipient. This will grant you access to the two virtual tracks, manageability and performance, offered by IOUG! I will award this virtual registration code by the end of the week to the person who can come up with the best name for my new VM I'm building. It better be good, too… :)

Tweet the Database Name submission with #C14LV and #NameMyEM  to me, @DBAKevlar

As for some great content… :) I'll be presenting three sessions this year for Oracle (v signifies a virtual as well as in-person session):

  • Mastering Enterprise Manager 12c Monitoring(v)   April 9th, 2-3pm
  • Deep Dive into ASH and AWR(v)  April 8th, 5:30-6:30pm
  • Database as a Service, (DBaaS) in a DBAs World April 9th, 4:30-5:30pm

You'll also be able to find all of my wonderful teammates presenting at the following sessions for more fantastic Oracle content here.

See YOU at Collaborate 14!!


RLS bug

RLS – row level security, aka VPD (virtual private database) or FGAC (fine grained access control) has a critical bug in 11g. The bug is unpublished, but gets mentioned in various other documents, so can be identified as: Bug: 7828323 “SYS_CONTEXTS RETURNS WRONG VALUE WITH SHARED_CONTEXT_SENSITIVE”

The title tells you nearly everything you need to know – if you’ve declared a security policy as context_sensitive or shared_context_sensitive then a change to the context ought to result in the associated predicate function being called to generate a new security predicate the next time the policy becomes relevant. Thanks to bug 7828323 this doesn’t always happen – so queries can return the wrong set of results.

There are some patches for older versions (11.1.0.7 and 11.2.0.2 as far as I’ve checked), but if you don’t have, or can’t get, a patch the “workaround” is to change any relevant policies to dynamic; unfortunately the consequence of this is that the predicate function will then be called for every execution of any statement against any objects protected by that policy.
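As a sketch of that workaround (the schema, object, policy and predicate function names below are hypothetical), re-creating a context_sensitive policy as dynamic would look something like:

begin
  dbms_rls.drop_policy(
    object_schema => 'APP',
    object_name   => 'ORDERS',
    policy_name   => 'ORDERS_POLICY');
  dbms_rls.add_policy(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'ORDERS_POLICY',
    function_schema => 'APP',
    policy_function => 'ORDERS_PREDICATE',
    statement_types => 'select',
    policy_type     => dbms_rls.dynamic);  -- was dbms_rls.context_sensitive
end;
/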

Depending on how your application has been written and how many queries are likely to invoke security policies this could easily increase your CPU usage by several percent (and if it’s a badly written application maybe a lot more).

Footnote:

It has occurred to me to wonder what happens if you use a (normal) PL/SQL function in a select list, the function executes a statement against a table, and the table is protected by a context_sensitive security policy – and you decide to use the PL/SQL result cache on the function. How long is an item supposed to stay in the result cache, and if it's longer than a single execution of a statement, will the result cache be invalidated if your context changes in a way that invalidates the current security predicate? No time to check or test at present, but I'd be very cautious about putting RLS predicate functions into the result cache until I've played around with that idea for a bit.


Automated Linux VM build on ESX

How do you automatically install a RedHat-like (Red Hat, CentOS, Oracle Enterprise Linux) Linux system on ESX or VirtualBox servers? There are at least two methods. I have seen VM cloning in a lot of places, using ESX (vSphere Center) or VirtualBox functionality. Cloning is fine, but it requires some intervention after the clone is finished (like a host rename or an IP address change). And what if we want to install different releases of systems? Then it is necessary to keep a clone of every single release, like RedHat 6.2, OEL 6.3 or OEL 6.4. That requires rework every time a new release is available on the market.

But there is another method, developed years ago, well before the virtualization era. It is based on an installation answer file and the KickStart installation method. If I add a DHCP, TFTP and NFS server to this equation I get a nice and quite configurable way to build my VMs very fast.

First of all a DHCP server has to be configured. In my case I just created an internal network inside ESX and set up an additional DHCP server for build purposes only. You can use any DHCP server, but it has to support TFTP redirection and also booting functionality. In my case I'm using the DHCP, TFTP and NFS servers provided by Ubuntu 13.04.

DHCP configuration

root@myown:~# cat /etc/dhcp/dhcpd.conf

ddns-update-style none;

# option definitions common to all supported networks...
option domain-name "priv.localdomain";

default-lease-time 600;
max-lease-time 7200;

option subnet-mask 255.255.255.0;
option broadcast-address 10.0.0.255;
option domain-name-servers 192.168.1.20;

subnet 10.0.0.0 netmask 255.255.255.0 {
range dynamic-bootp 10.0.0.1 10.0.0.100;
filename "pxelinux.0";
next-server 10.0.0.250;
}

PXELINUX.0 has to be copied from its default location into the TFTP directory so that the TFTP server can access it. You can find the "pxelinux.0" file in the Ubuntu syslinux-common package.
Install it using "apt-get install syslinux-common" and then copy it from its default location into /tftpboot:

root@myown:~# cp /usr/lib/syslinux/pxelinux.0 /tftpboot/

TFTP configuration - default port, with /tftpboot used as the file location:

pioro@myown:/etc$ cat /etc/xinetd.d/tftp
service tftp
{
protocol = udp
port = 69
bind = 10.0.0.250
socket_type = dgram
wait = yes
user = nobody
server = /usr/sbin/in.tftpd
server_args = /tftpboot
disable = no
}

TFTP directory structure
 

pioro@myown:/tftpboot$ ls -l
total 37816
-rw-r--r-- 1 root root 34551609 Dec 22 17:04 initrd.img
-rw-r--r-- 1 root root 26461 Dec 22 17:26 pxelinux.0
drwxr-xr-x 2 root root 4096 Jan 13 22:02 pxelinux.cfg
-r--r--r-- 1 root root 441 Dec 22 17:04 TRANS.TBL
-rwxr-xr-x 1 root root 4128944 Dec 22 17:04 vmlinuz

PXELINUX.CFG directory inside TFTP directory

pioro@myown:/tftpboot/pxelinux.cfg$ ls -l
total 8
-rw-r--r-- 1 root root 137 Dec 22 18:29 01-00-0c-29-41-69-15
-rw-r--r-- 1 root root 138 Jan 13 21:55 01-00-0c-29-99-7d-3d

File names are related to the NIC MAC addresses. For example, 01-00-0c-29-41-69-15 is the file for MAC address 00:0c:29:41:69:15 (the leading 01 is the ARP hardware type for Ethernet).
Now take a look at what is inside a host configuration file:

pioro@myown:/tftpboot/pxelinux.cfg$ cat 01-00-0c-29-41-69-15
default Oracle Linuxas_64
label Oracle Linuxas_64
kernel vmlinuz
append initrd=initrd.img ks=nfs:10.0.0.250:/images/ks.cfg ksdevice=eth1

These are boot properties in a GRUB style, describing the kernel and initrd images. The ks parameter is the configuration parameter for the KickStart process. In the above example the KickStart configuration file is placed on the NFS server in the /images directory and is called ks.cfg. In addition, KickStart will configure interface eth1, which is the private ESX network in my case.

My DHCP and TFTP server has an NFS server configured as well. It exports only one directory, /images, which keeps the KickStart configuration files and is also a mount point for the ISO image.

pioro@myown:/tftpboot/pxelinux.cfg$ cat /etc/exports
/images/ *(ro,subtree_check,crossmnt)

The ISO with the Linux distribution should be mounted below the /images directory using the loop option.

root@myown:~# mount -o loop /nfs/disk2/images/OEL65.iso /images/OEL65/
mount: block device /nfs/disk2/images/OEL65.iso is write-protected, mounting read-only

Now I have access to the installation files and also to the PXE boot files. In my case they are all located in the directory /images/OEL65/images/pxeboot/ and I just copied them into the TFTP /tftpboot directory:

root@myown:~# ls -l /images/OEL65/images/pxeboot/
total 37775
-rw-r--r-- 2 root root 34551609 Nov 26 05:02 initrd.img
-r--r--r-- 1 root root 441 Nov 26 05:04 TRANS.TBL
-rwxr-xr-x 2 root root 4128944 Nov 26 05:02 vmlinuz

root@myown:~# cp /images/OEL65/images/pxeboot/* /tftpboot/

Inside the NFS-exported directory I also keep the KickStart configuration files:

pioro@myown:/images$ ls -l
total 12
-rw-r--r-- 1 root root 1935 Jan 13 21:59 ks2.cfg
-rw-r--r-- 1 root root 1936 Jan 13 22:00 ks.cfg
drwxr-xr-x 2 root root 4096 Dec 22 18:27 OEL65

An example KickStart configuration file:

pioro@myown:/images$ cat dg1.cfg
#platform=x86, AMD64, or Intel EM64T
#version=DEVEL
# Firewall configuration
firewall --disabled
# Install OS instead of upgrade
install
# Use CDROM installation media
nfs --server 10.0.0.250 --dir /images/OEL65/
#cdrom
# Root password
rootpw --plaintext
# System authorization information
auth --useshadow --passalgo=sha512
# Use graphical install
graphical
firstboot --disable
# System keyboard
keyboard us
# System language
lang en_US
# SELinux configuration
selinux --disabled
# Installation logging level
logging --level=info
# Reboot after installation
reboot
# System timezone
timezone Europe/Dublin
# Network information
network --bootproto=static --device=eth0 --ip=192.168.1.51 --nameserver=192.168.1.20 --netmask=255.255.255.0 --onboot=on --hostname=dg1.localhost
network --bootproto=static --device=eth1 --ip=10.0.0.1 --netmask=255.255.255.0 --onboot=on
# network --bootproto=dhcp --device=eth2 --onboot=on
# System bootloader configuration
bootloader --location=mbr
# Clear the Master Boot Record
zerombr
# Partition clearing information
clearpart --all
# Disk partitioning information
part /boot --fstype ext4 --size=200
part pv.01 --size=1 --grow
volgroup VolGroup pv.01
logvol swap --fstype swap --vgname=VolGroup --size=1024 --name=lv_swap
logvol / --fstype ext4 --vgname=VolGroup --size=1 --grow --name=lv_root


%packages
@base
@console-internet
@core
@debugging
@directory-client
@hardware-monitoring
@large-systems
@network-file-system-client
@performance
@perl-runtime
@security-tools
@server-platform
@server-policy
@system-admin-tools
gcc

%end 

The above configuration file will partition the disk into root and swap partitions and configure two network interfaces. In addition, the package groups specified after the %packages line will be installed.

Below are screen shots from my ESX environment:

Finding a MAC address of VM - Open VM configuration and go to Network adapters

Booting process - this VM is booting from NIC 2 using private network and all services configured above.

If you are looking for step-by-step instructions you can find them on Tim Hall's website:
PXE Network installations 
KickStart

I based my environment build on Tim's website and some Google research.
Happy installing !!!

regards,
Marcin

Effective Indexing Webinar

Thanks to everyone for attending today's Effective Indexing webinar, sponsored by IOUG with Embarcadero. For attendees, IOUG will likely send out a link to the recording and the PDF of the presentation, but I also wanted to post them here.

Presentation PDF
Webinar recording

Please note that during the webinar I mentioned the new clustering factor calculation available in 12c and also available in some patchsets for 11g releases. The patch number I referred to is incorrect. Instead, please search for the Bug/Enhancement article in MOS. You can find it by searching for "Bug 13262857 - Enh: provide some control over DBMS_STATS index clustering factor computation (Doc ID 13262857.8)."
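For reference, that enhancement exposes the TABLE_CACHED_BLOCKS statistics preference; a minimal sketch of using it (MY_TABLE and MY_TABLE_IX are hypothetical names):

begin
  dbms_stats.set_table_prefs(
    ownname => user,
    tabname => 'MY_TABLE',
    pname   => 'TABLE_CACHED_BLOCKS',
    pvalue  => '16');
  -- regather the index statistics so the clustering factor is recomputed
  dbms_stats.gather_index_stats(user, 'MY_TABLE_IX');
end;
/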

Thanks again and stay tuned for additional webinars coming soon!

Parallel Execution Overhead

I've started a new mini series about "Parallel Execution Skew" at AllThingsOracle.com. In order to avoid bloating the articles too much there, I'll post some accompanying notes here on my blog.

If you follow the initial post you'll see that the sample query demonstrated there scales almost perfectly with Parallel Execution - the serial execution takes 57 seconds on my test system, whereas a Parallel Execution at a DOP of 4 takes something between 14 and 15 seconds, so almost four times faster.

Here again are the corresponding two table join queries along with their execution plans that I haven't included in the "Parallel Execution Skew" article - the plans are generated using Oracle database version 12c, which you can see from the new HYBRID HASH distribution method for parallel plans.

Serial query:


select count(t_2_filler) from (
select /*+ monitor
no_parallel
leading(t_1 t_2)
use_hash(t_2)
no_swap_join_inputs(t_2)
*/
t_1.id as t_1_id
, t_1.filler as t_1_filler
, t_2.id as t_2_id
, t_2.filler as t_2_filler
from t_1
, t_2
where
t_2.fk_id_uniform = t_1.id
and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Execution plan:


------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 213 | | 24508 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 213 | | | |
|* 2 | HASH JOIN | | 100K| 20M| 225M| 24508 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T_1 | 2000K| 202M| | 796 (2)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T_2 | 2000K| 204M| | 1289 (2)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T_2"."FK_ID_UNIFORM"="T_1"."ID")
filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','
\1',1,1,'c')>= REGEXP_REPLACE ("T_1"."FILLER",'^\s+([[:alnum:]]+)\s+$','
\1',1,1,'c'))

Parallel Query:


select count(t_2_filler) from (
select /*+ monitor
leading(t_1 t_2)
use_hash(t_2)
no_swap_join_inputs(t_2)
pq_distribute(t_2 hash hash)
*/
t_1.id as t_1_id
, t_1.filler as t_1_filler
, t_2.id as t_2_id
, t_2.filler as t_2_filler
from t_1
, t_2
where
t_2.fk_id_uniform = t_1.id
and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Execution plan:


------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 213 | | 6186 (1)| 00:00:01 | | | |
| 1 | SORT AGGREGATE | | 1 | 213 | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 1 | 213 | | | | Q1,02 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 213 | | | | Q1,02 | PCWP | |
|* 5 | HASH JOIN | | 100K| 20M| 56M| 6186 (1)| 00:00:01 | Q1,02 | PCWP | |
| 6 | PX RECEIVE | | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,02 | PCWP | |
| 7 | PX SEND HYBRID HASH | :TQ10000 | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | P->P | HYBRID HASH|
| 8 | STATISTICS COLLECTOR | | | | | | | Q1,00 | PCWC | |
| 9 | PX BLOCK ITERATOR | | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | PCWC | |
| 10 | TABLE ACCESS FULL | T_1 | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | PCWP | |
| 11 | PX RECEIVE | | 2000K| 204M| | 358 (2)| 00:00:01 | Q1,02 | PCWP | |
| 12 | PX SEND HYBRID HASH | :TQ10001 | 2000K| 204M| | 358 (2)| 00:00:01 | Q1,01 | P->P | HYBRID HASH|
| 13 | PX BLOCK ITERATOR | | 2000K| 204M| | 358 (2)| 00:00:01 | Q1,01 | PCWC | |
| 14 | TABLE ACCESS FULL | T_2 | 2000K| 204M| | 358 (2)| 00:00:01 | Q1,01 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T_2"."FK_ID_UNIFORM"="T_1"."ID")
filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')>= REGEXP_REPLACE
("T_1"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c'))

Note: If you're interested in learning more about reading Parallel Execution plans, Jonathan Lewis recently completed a series of posts about that topic.

Looking at a corresponding Real-Time SQL Monitoring report (see the post at AllThingsOracle.com) of the Parallel Execution, it can also be seen that it took approx. 58 seconds of Database Time (spread across four worker processes), so really pretty close to the duration / Database Time of the serial execution.

However, in real-life cases that have to process large amounts of data and use more complex execution plans, you'll usually see Parallel Execution not scaling perfectly, even if the work distribution is not the problem as discussed in the article series. To some degree this is because Parallel Execution comes with overhead, and therefore the database actually has to work more than with a comparable serial execution.

The purpose of this post is to demonstrate this using the set-up used in the article series, by simply extending the two table join used there to a three table join. Here is the corresponding serial query:


select count(t_2_filler) from (
select /*+ monitor
leading(t_1 t_2)
use_hash(t_2)
no_swap_join_inputs(t_2)
use_hash(a)
no_swap_join_inputs(a)
opt_estimate(join (t_1 t_2) scale_rows=20)
no_parallel
*/
t_1.id as t_1_id
, t_1.filler as t_1_filler
, t_2.id as t_2_id
, t_2.filler as t_2_filler
from t_1
, t_1 a
, t_2
where
t_2.fk_id_uniform = t_1.id
and t_2.id = a.id
and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Execution plan:


-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 225 | | 49560 (1)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 225 | | | |
|* 2 | HASH JOIN | | 2006K| 430M| 441M| 49560 (1)| 00:00:02 |
|* 3 | HASH JOIN | | 2003K| 418M| 225M| 25172 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T_1 | 2000K| 202M| | 796 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T_2 | 2000K| 215M| | 1289 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T_1 | 2000K| 11M| | 795 (2)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T_2"."ID"="A"."ID")
3 - access("T_2"."FK_ID_UNIFORM"="T_1"."ID")
filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','
\1',1,1,'c')>= REGEXP_REPLACE ("T_1"."FILLER",'^\s+([[:alnum:]]+)\s+$','
\1',1,1,'c'))

And here is the parallel query:


select count(t_2_filler) from (
select /*+ monitor
leading(t_1 t_2)
use_hash(t_2)
no_swap_join_inputs(t_2)
pq_distribute(t_2 hash hash)
use_hash(a)
no_swap_join_inputs(a)
pq_distribute(a hash hash)
opt_estimate(join (t_1 t_2) scale_rows=20)
*/
t_1.id as t_1_id
, t_1.filler as t_1_filler
, t_2.id as t_2_id
, t_2.filler as t_2_filler
from t_1
, t_1 a
, t_2
where
t_2.fk_id_uniform = t_1.id
and t_2.id = a.id
and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Execution plan:


----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 225 | | 12473 (1)| 00:00:01 | | | |
| 1 | SORT AGGREGATE | | 1 | 225 | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10004 | 1 | 225 | | | | Q1,04 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 225 | | | | Q1,04 | PCWP | |
|* 5 | HASH JOIN | | 2006K| 430M| 110M| 12473 (1)| 00:00:01 | Q1,04 | PCWP | |
| 6 | PX RECEIVE | | 2003K| 418M| | 6354 (1)| 00:00:01 | Q1,04 | PCWP | |
| 7 | PX SEND HYBRID HASH | :TQ10002 | 2003K| 418M| | 6354 (1)| 00:00:01 | Q1,02 | P->P | HYBRID HASH|
| 8 | STATISTICS COLLECTOR | | | | | | | Q1,02 | PCWC | |
|* 9 | HASH JOIN BUFFERED | | 2003K| 418M| 56M| 6354 (1)| 00:00:01 | Q1,02 | PCWP | |
| 10 | PX RECEIVE | | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,02 | PCWP | |
| 11 | PX SEND HYBRID HASH | :TQ10000 | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | P->P | HYBRID HASH|
| 12 | STATISTICS COLLECTOR | | | | | | | Q1,00 | PCWC | |
| 13 | PX BLOCK ITERATOR | | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | PCWC | |
| 14 | TABLE ACCESS FULL | T_1 | 2000K| 202M| | 221 (1)| 00:00:01 | Q1,00 | PCWP | |
| 15 | PX RECEIVE | | 2000K| 215M| | 358 (2)| 00:00:01 | Q1,02 | PCWP | |
| 16 | PX SEND HYBRID HASH | :TQ10001 | 2000K| 215M| | 358 (2)| 00:00:01 | Q1,01 | P->P | HYBRID HASH|
| 17 | PX BLOCK ITERATOR | | 2000K| 215M| | 358 (2)| 00:00:01 | Q1,01 | PCWC | |
| 18 | TABLE ACCESS FULL | T_2 | 2000K| 215M| | 358 (2)| 00:00:01 | Q1,01 | PCWP | |
| 19 | PX RECEIVE | | 2000K| 11M| | 221 (1)| 00:00:01 | Q1,04 | PCWP | |
| 20 | PX SEND HYBRID HASH | :TQ10003 | 2000K| 11M| | 221 (1)| 00:00:01 | Q1,03 | P->P | HYBRID HASH|
| 21 | PX BLOCK ITERATOR | | 2000K| 11M| | 221 (1)| 00:00:01 | Q1,03 | PCWC | |
| 22 | TABLE ACCESS FULL | T_1 | 2000K| 11M| | 221 (1)| 00:00:01 | Q1,03 | PCWP | |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T_2"."ID"="A"."ID")
9 - access("T_2"."FK_ID_UNIFORM"="T_1"."ID")
filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')>= REGEXP_REPLACE
("T_1"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c'))

Note: The query uses an undocumented OPT_ESTIMATE hint to correct the bad join cardinality estimate caused by the complex REGEXP_REPLACE expression that is part of the join condition - since the execution plan is pretty much dictated by the hints, this is not strictly necessary in this case. For curiosity you could remove the OPT_ESTIMATE and PQ_DISTRIBUTE(A HASH HASH) hints. If you do so, you'll notice that the optimizer then uses a BROADCAST distribution for the join result of T_1 and T_2 due to the incorrect cardinality estimate - so choosing the appropriate distribution method is another area where the cardinality estimates can cause a lot of trouble, in particular under-estimates that lead to large row sources being distributed via broadcast.

On my test system the serial execution still takes approx. 59 seconds, so not much longer than the serial two table join, whereas the Parallel Execution takes between 28 and 30 seconds at a DOP of 4 - clearly not a four times improvement as for the simple two table join (14-15 seconds), although the work was distributed evenly across the worker processes.

We can see this confirmed by looking at a corresponding Real-Time SQL Monitoring report (click on the picture to enlarge):

Notice how for the majority of the time four Parallel Execution Servers were active, so work distribution is not the problem here. We can get a clue that some overhead was the problem by looking at the "Time & Wait Statistics" section in the "Overview" at the top of the report: it shows a "Database Time" of 2 minutes (of which almost 100% is CPU time), so clearly the Parallel Execution had to work much more than the serial execution, which just took close to one minute.

What seems to happen here is the overhead of the HASH JOIN BUFFERED operation, which actually processes the probe row source of the hash join twice: once to buffer it (and by doing so buffering only rows that will survive the join according to the internal bitmap generated as part of the hash table build process, so false positives are possible - see the HASH JOIN BUFFERED post for more background information), and a second time to perform the actual join by processing the previously buffered data. This causes the costly regular expression that is part of the join condition to be evaluated twice as many times as the serial execution has to evaluate it.

Now this particular case is an exaggeration of the overhead due to the CPU intensive expression - nevertheless it is a good demonstration that you shouldn't always expect Parallel Execution to scale perfectly, even with almost perfect work distribution.
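As an aside, if you want to check the work distribution without access to a Real-Time SQL Monitoring report, one traditional sketch is to query V$PQ_TQSTAT from the same session immediately after the parallel query completes; it shows the rows and bytes each Parallel Execution Server produced and consumed per table queue:

select dfo_number, tq_id, server_type, process, num_rows, bytes
from v$pq_tqstat
order by dfo_number, tq_id, server_type desc, process;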

Analysing Parallel Execution Skew - Video Tutorial

Along with the new mini series "Parallel Execution Skew" at AllThingsOracle.com, which provides some information on what you can do if you happen to have a parallel SQL execution that is affected by work distribution problems, I'm publishing a series of video tutorials that explain how you can actually detect and measure whether a SQL execution is affected by skew or not.

Originally this tutorial was planned as one part (Part 5, actually) of the XPLAN_ASH video tutorial series; however, so far I've only managed to publish the initial two parts of that series, and these are already a bit outdated as I've released new versions of the XPLAN_ASH tool with significant changes and enhancements since then.

Since this tutorial covers a lot of information and I don't want to bother you watching a single multi-hour video, I'm splitting this tutorial into four parts:

  • Brief introduction
  • Explaining Data Flow Operations (DFOs)
  • Analysing Parallel Execution Skew without Diagnostics / Tuning Pack license
  • Analysing Parallel Execution Skew with Diagnostics / Tuning Pack license

So here is the initial part, a brief introduction to what the "Parallel Execution Skew" problem is about:

The links mentioned at the beginning of the video are:
Webinar recording "Analysing and Troubleshooting Parallel Execution"
OTN mini series "Understanding Parallel Execution"