I've published the final part of my video tutorial and the final part of my mini series "Parallel Execution Skew" at AllThingsOracle.com concluding what I planned to publish on the topic of Parallel Execution Skew.
Anyone regularly using Parallel Execution and/or relying on Parallel Execution for important, time critical processing should know this stuff. In my experience however almost no-one does, and therefore misses possibly huge opportunities for optimizing Parallel Execution performance.
Since all this was published over a longer period of time this post therefore is a summary with pointers to the material.
If you want to get an idea what the material is about, the following video summarizes the content:
This is the third part of the video tutorial "Analysing Parallel Execution Skew". In this part I show how to analyse a parallel SQL execution regarding Parallel Execution Skew.
If you don't have a Diagnostics / Tuning Pack license the options you have for doing that are quite limited, and the approach, as demonstrated in the tutorial, has several limitations and shortcomings.
Here is the video:
This is the second part of the video tutorial "Analysing Parallel Execution Skew". In this part I introduce the concept of "Data Flow Operations (DFOs)" and "DFO Trees", which is what a Parallel Execution plan is made of. DFOs / DFO Trees are specific to Parallel Execution and don't have any counterpart in a serial execution plan.
Understanding the implications of DFOs / DFO Trees is important as prerequisite for understanding some of the effects shown in the later parts of the video tutorial, hence I covered this as a separate topic.
Note that this tutorial also demonstrates some new 12c features regarding Parallel Execution, in particular how Oracle 12c now lifts many of the previous limitations that lead to the generation of multiple DFO Trees.
Here is the video:
Along the new mini series "Parallel Execution Skew" at "AllThingsOracle.com" that provides some information what you can do if you happen to have a parallel SQL execution that is affected by work distribution problems I'm publishing a series of video tutorials that explain how you can actually detect and measure whether a SQL execution is affected by skew or not.
Originally this tutorial was planned as one part (Part 5 actually) of the XPLAN_ASH video tutorial series, however so far I've only managed to publish just the inital two parts of that series, and these are already a bit outdated as I've released new versions of the XPLAN_ASH tool with significant changes and enhancements since then.
Using objects residing in multiple blocksizes
I've already mentioned it several times on my blog but I would like to take the chance here again to stress the point that the cost based optimizer does a bad job when it comes to calculating costs for full table scans of objects residing in non-default block sizes. It really looks like that this feature has been introduced to support transportable tablespaces but it obviously hasn't been tested very thoroughly when it comes to cost calculation.
Each of the different modes has its deficiencies when dealing with objects in non-default blocksizes. The somehow odd thing is that the traditional I/O costing does the best job, and all system statistics based calculations are utterly wrong.
Traditional I/O based costing
The traditional I/O based costing simply scales the MBRC up or down according to the non-default blocksize to come to the same I/O read request size. So if you e.g. have a MBRC of 8 and a default blocksize of 8KB and now calculate the cost for an object residing in a 2KB tablespace, the MBRC will be multiplied 4, which results in a MBRC of 32. The I/O cost will be different although due to the different adjustment used with the higher MBRC setting. The adjusted MBRC for 32 is 16.39 whereas the adjusted MBRC for 8 is 6.59, so the calculated cost for the full table scan of the object residing in the 2KB tablespace will be higher. Likewise the same happens when using an object in a 16KB non-default tablespace. The MBRC will be reduced accordingly to 4 to get the same I/O read size again. Since adjusted MBRC for MBRC = 4 is 4.17, the cost calculated will actually be less for the object residing the 16KB tablespace.
System statistics in 9i
In 10g the CPU costing mode is enabled by default and is supported by the default NOWORKLOAD system statistics.
But you can use system statistics already in 9i, although you have to enable them explicitly.
Oracle 9i is out of support, but I guess there are still a lot of systems out there that are using the 9.2 release, therefore I find it worth to mention what you can do in 9i with system statistics. This can be helpful if you consider to test your application already in the 9i environment with system statistics before upgrading to 10g.
In most descriptions about 9i and system statistics only WORKLOAD system statistics are mentioned, but 9i also supports NOWORKLOAD system statistics, although not in the same full flavour as 10g does.
You can activate CPU costing in 9i by running DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD'), but this seems to work differently than in 10g and later.
Whereas in 10g and later this actually measures the IOSEEKTIM and IOTFRSPEED values, this seems to activate in 9i something that is comparable with the default NOWORKLOAD system statistics of 10g.
The SYS.AUX_STATS$ does not show any actual values, but tests revealed that 9i (at least 220.127.116.11) in that case seems to measure the CPU speed (in 10g this is the CPUSPEEDNW value) and uses the same default values for IOSEEKTIM and IOTFRSPEED as 10g does (10ms and 4096 bytes/ms resp.).
Running some tests showed that you arrive at the same I/O cost as you do in 10g with the default NOWORKLOAD system statistics.
Before heading on to the remaining modes of system statistics, let's summarize what has been observed in part 1 regarding the default NOWORKLOAD system statistics in 10g and later. The following table shows what the test case from the previous post demonstrated:
Table 1: 8KB MSSM locally managed tablespace 10,000 blocks table segment
default NOWORKLOAD system statistics:
If you happen to have a 16KB default blocksize the results would look like the following. Note that the table is now only 5,000 blocks in size, and the SREADTIM is now a bit longer (10+16384/4096=14ms instead of 10+8192/4096=12ms) therefore the 16KB blocksize calculation makes the full table scan look a bit cheaper to the optimizer when using the default NOWORKLOAD system statistics.
Table 2: 16KB MSSM locally managed tablespace 5,000 blocks table segment
default NOWORKLOAD system statistics:
This is the first part of a series of posts that cover one of the fundamentals of the cost based optimizer in 9i and later. Understanding how the different system statistics modes work is crucial in making the most out of the cost based optimizer, therefore I'll attempt to provide some detailed explanations and samples about the formulas and arithmetics used. Finally I'll show (again) that using multiple block sizes for "tuning" purposes is a bad idea in general, along with detailed examples why I think this is so.
One of the deficiencies of the traditional I/O based costing was that it simply counted the number of I/O requests making no differentation between single-block I/O and multi-block I/O.
System statistics were introduced in Oracle 9i to allow the cost based optimizer to take into account that single-block I/Os and multi-block I/Os should be treated differently in terms of costing and to include a CPU component in the cost calculation.
The system statistics tell the cost based optimizer (CBO) among other things the time it takes to perform a single block read request and a multi-block read request. Given this information the optimizer ought to be able to come to estimates that better fit the particular environment where the database is running on and additionally use an appropriate costing for multi-block read requests that usually take longer than single block read requests. Given the information about the time it takes to perform the read requests the cost calculated can be turned into a time estimate.
The cost calculated with system statistics is still expressed in the same units as with traditional I/O based costing, which is in units of single-block read requests.