Using objects residing in multiple blocksizes
I've already mentioned it several times on my blog but I would like to take the chance here again to stress the point that the cost based optimizer does a bad job when it comes to calculating costs for full table scans of objects residing in non-default block sizes. It really looks like that this feature has been introduced to support transportable tablespaces but it obviously hasn't been tested very thoroughly when it comes to cost calculation.
Each of the different modes has its deficiencies when dealing with objects in non-default blocksizes. The somehow odd thing is that the traditional I/O costing does the best job, and all system statistics based calculations are utterly wrong.
Traditional I/O based costing
The traditional I/O based costing simply scales the MBRC up or down according to the non-default blocksize to come to the same I/O read request size. So if you e.g. have a MBRC of 8 and a default blocksize of 8KB and now calculate the cost for an object residing in a 2KB tablespace, the MBRC will be multiplied 4, which results in a MBRC of 32. The I/O cost will be different although due to the different adjustment used with the higher MBRC setting. The adjusted MBRC for 32 is 16.39 whereas the adjusted MBRC for 8 is 6.59, so the calculated cost for the full table scan of the object residing in the 2KB tablespace will be higher. Likewise the same happens when using an object in a 16KB non-default tablespace. The MBRC will be reduced accordingly to 4 to get the same I/O read size again. Since adjusted MBRC for MBRC = 4 is 4.17, the cost calculated will actually be less for the object residing the 16KB tablespace.
Given this simple script (Run this on any 10.2.x (including XE) or 18.104.22.168/7):
create table test2
select * from dba_objects;
create index i_test1_1 on test1(object_name);
create index i_test1_2 on test1(object_id);
create index i_test2_1 on test2(object_name);
create index i_test2_2 on test2(object_id);
exec dbms_stats.gather_table_stats(null, 'TEST1', estimate_percent=>null, method_opt=>'for all columns size 1');
exec dbms_stats.gather_table_stats(null, 'TEST2', estimate_percent=>null, method_opt=>'for all columns size 1');
And this simple query:
with this plan:
Plan 1 (the obvious one):
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2 | 212 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 2 | 212 | 6 (0)| 00:00:01 |
System statistics in 9i
In 10g the CPU costing mode is enabled by default and is supported by the default NOWORKLOAD system statistics.
But you can use system statistics already in 9i, although you have to enable them explicitly.
Oracle 9i is out of support, but I guess there are still a lot of systems out there that are using the 9.2 release, therefore I find it worth to mention what you can do in 9i with system statistics. This can be helpful if you consider to test your application already in the 9i environment with system statistics before upgrading to 10g.
In most descriptions about 9i and system statistics only WORKLOAD system statistics are mentioned, but 9i also supports NOWORKLOAD system statistics, although not in the same full flavour as 10g does.
You can activate CPU costing in 9i by running DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD'), but this seems to work differently than in 10g and later.
Whereas in 10g and later this actually measures the IOSEEKTIM and IOTFRSPEED values, this seems to activate in 9i something that is comparable with the default NOWORKLOAD system statistics of 10g.
The SYS.AUX_STATS$ does not show any actual values, but tests revealed that 9i (at least 22.214.171.124) in that case seems to measure the CPU speed (in 10g this is the CPUSPEEDNW value) and uses the same default values for IOSEEKTIM and IOTFRSPEED as 10g does (10ms and 4096 bytes/ms resp.).
Running some tests showed that you arrive at the same I/O cost as you do in 10g with the default NOWORKLOAD system statistics.
Before heading on to the remaining modes of system statistics, let's summarize what has been observed in part 1 regarding the default NOWORKLOAD system statistics in 10g and later. The following table shows what the test case from the previous post demonstrated:
Table 1: 8KB MSSM locally managed tablespace 10,000 blocks table segment
default NOWORKLOAD system statistics:
If you happen to have a 16KB default blocksize the results would look like the following. Note that the table is now only 5,000 blocks in size, and the SREADTIM is now a bit longer (10+16384/4096=14ms instead of 10+8192/4096=12ms) therefore the 16KB blocksize calculation makes the full table scan look a bit cheaper to the optimizer when using the default NOWORKLOAD system statistics.
Table 2: 16KB MSSM locally managed tablespace 5,000 blocks table segment
default NOWORKLOAD system statistics:
This is the first part of a series of posts that cover one of the fundamentals of the cost based optimizer in 9i and later. Understanding how the different system statistics modes work is crucial in making the most out of the cost based optimizer, therefore I'll attempt to provide some detailed explanations and samples about the formulas and arithmetics used. Finally I'll show (again) that using multiple block sizes for "tuning" purposes is a bad idea in general, along with detailed examples why I think this is so.
One of the deficiencies of the traditional I/O based costing was that it simply counted the number of I/O requests making no differentation between single-block I/O and multi-block I/O.
System statistics were introduced in Oracle 9i to allow the cost based optimizer to take into account that single-block I/Os and multi-block I/Os should be treated differently in terms of costing and to include a CPU component in the cost calculation.
The system statistics tell the cost based optimizer (CBO) among other things the time it takes to perform a single block read request and a multi-block read request. Given this information the optimizer ought to be able to come to estimates that better fit the particular environment where the database is running on and additionally use an appropriate costing for multi-block read requests that usually take longer than single block read requests. Given the information about the time it takes to perform the read requests the cost calculated can be turned into a time estimate.
The cost calculated with system statistics is still expressed in the same units as with traditional I/O based costing, which is in units of single-block read requests.
Some time ago on the OTN forum the following table layout was part of a discussion regarding performance issues and it revealed an interesting anomaly regarding list partition pruning:
If you're using list partitioning with partitions that use multiple values that map to a single list partition then the optimizer obviously uses a questionable approach when you're using multiple values on the partition key to prune to a single partition.
Consider the following table layout:
I'm now going to populate that table using this sample data: