Search

Top 60 Oracle Blogs

Recent comments

CBO

When your projection is not cost-free

First of all I have to admit that the title of this post is a bit misleading - the projection is almost never "cost-free". Indirectly cost will be generated by accessing rows from row sources to obtain any columns selected, and with the introduction of System Statistics, Oracle even assigns a direct CPU cost to accessing a particular column of a row - as it has been pointed out by Joze Senegacnik and is demonstrated e.g. in Christian Antognini's "Troubleshooting Oracle Performance" book on page 117.

However, Oracle obviously treats work that has to be performed as part of the projection differently than work that has to be performed as part of the selection part.

In particular user-defined functions or scalar subqueries will not be accounted for costing when calculating the overall query cost.

This holds even true for user-defined functions that have a cost assigned via the Extensible Optimizer framework. For more information regarding the "Extensible Optimizer" framework, read Adrian Billington's article on oracle-developer.net or refer e.g. to our latest "Expert Oracle Practices" OakTable book where Joze Senegacnik dedicated a whole chapter to this topic.

Of course the best way to deal with this situation is to avoid using user-defined functions or scalar subqueries and replace them with appropriate joins/subqueries. In most cases this yields the best performance, provided that the optimizer comes up with a reasonable execution plan, and also solves the issue of non-contributed work, because regular join constructs will be considered in the cost calculation.

However, what to do of you're in the situation that you can't simply change the query (e.g. third party vendor application)?

If a user-defined function or scalar subquery is used as part of the selection clause the cost-based optimizer will make use of any cost associated with the function, or evaluated for the scalar subquery, whereas the same construct used as part of the projection will not be taken into account for the cost.

In particular in the case of user-defined functions that perform costly operations, for example recursive SQL, and are not declared as deterministic (or are really of undeterministic nature) this can make a significant difference.

The problem with functions that are not declared as deterministic is that the built-in caching feature of Oracle that can help with scalar subqueries or functions declared as deterministic (since Oracle 10.2) can not be used to alleviate the potentially resource-intensive numerous calls to the function.

In case of more complex queries that make use of views, the view merging transformations applied to the query therefore can lead to quite different work performed by a query.

Consider the following example setup:

drop function generate_lio;

drop table t1;

create table t1 (
run_id integer not null, /* identify the process inserting the data */
batch_id integer not null, /* represents clustered data, could also be a (arriving) date */
a_value number null, /* represents sequence based data */
a_random number null, /* represents randomly scattered data */
a_date timestamp default systimestamp not null, /* represents the insert timestamp */
filler char(1) default 'x' not null /* can be used to size the row as required */
);

create index t1_idx1 on
t1 (
a_random
);

insert into t1 (
run_id
, batch_id
, a_value
, a_random
)
select
1 as run_id
, trunc(id - 1 / 400) + 1 as batch_id
, id as a_value
, trunc(dbms_random.value(1, 40000.999999999)) as a_random
from
(
select
level as id
from
dual
connect by
level <= 40000
);

commit;

exec dbms_stats.gather_table_stats(null, 't1')

create or replace function generate_lio(in_lio in number default 1)
return number
--deterministic
as
n_val number;
begin
select /*+ first_rows(1) */
run_id
into
n_val
from
(
select
*
from
(
select
rownum as rn
, run_id
, substrb(rowid, 1, 15) as block
, a_random
from
t1
where
a_random is not null
order by
a_random
)
where rownum <= in_lio
)
where
rn = in_lio;
return trunc(dbms_random.value(0, 1000)) * n_val;
end generate_lio;
/

associate statistics with functions generate_lio default cost (100000, 3, 0) default selectivity 100;

The code creates a table holding 40,000 rows and a simple index with a bad clustering factor. A user-defined function that allows to generate logical I/O based on that index is created and associated a default cost (and selectivity) using ASSOCIATE STATISTICS. The function deliberately uses the DBMS_RANDOM package to simulate a non-deterministic behaviour. Each BATCH_ID in the table covers 400 rows, with 100 batch_ids in total.

Now consider the following query:

/* Hint is required from 11g on to prevent GROUP BY placement */
select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
/* See the difference in the runtime statistics
between merging this view and not */
select /* no_merge */
generate_lio(1) as val
, batch_id
from
t1
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
group by
t1.filler;

This query will generate 160,000 rows by combining 400 rows from each row source. Here is the execution plan (all tested on 11.1.0.7) when merging the view T2 by default:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 146 (28)| 00:00:02 |
| 1 | HASH GROUP BY | | 1 | 8 | 146 (28)| 00:00:02 |
|* 2 | HASH JOIN | | 160K| 1250K| 115 (8)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| T1 | 400 | 1200 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("GENERATE_LIO"(1))[22]
2 - (#keys=1) "T1"."FILLER"[CHARACTER,1]
3 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
4 - "BATCH_ID"[NUMBER,22]

Notice in particular where the evaluation of the function takes place according to the "Projection" information ("+PROJECTION" option of DBMS_XPLAN.DISPLAY).

And here is the execution plan when explicitly requesting to not merge the view T2:

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 146 (28)| 00:00:02 |
| 1 | HASH GROUP BY | | 1 | 21 | 146 (28)| 00:00:02 |
|* 2 | HASH JOIN | | 160K| 3281K| 115 (8)| 00:00:02 |
|* 3 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
| 4 | VIEW | | 400 | 6400 | 55 (4)| 00:00:01 |
|* 5 | TABLE ACCESS FULL| T1 | 400 | 2400 | 55 (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
5 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("VAL")[22]
2 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "VAL"[NUMBER,22]
3 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
4 - "VAL"[NUMBER,22], "T2"."BATCH_ID"[NUMBER,22]
5 - "BATCH_ID"[NUMBER,22]

It is not that obvious from the "Projection" information, but in this case the function (the "VAL" of the "Projection" in operation id 4) is evaluated before the join takes place.

Notice that the (still undocumented) "NO_PLACE_GROUP_BY" hint is required from 11g on to prevent the optimizer from getting too clever with this kind of statement. The GROUP BY is used in this case to simplify the result set processing aggregating it into a single row, but in 11g by default the new GROUP BY placement pushes the GROUP BY into the view, effectively solving the issue of excessive function calls by simply reducing the row source sizes that subsequently get joined. Since this is not supposed to be point of this demonstration, the clever trick of pushing the group by into the view is prevented. However it is interesting to note how new features of the optimizer can help to solve problems by side-effects.

The problem described here can still be seen in 11g without any hints when not using a GROUP BY clause to aggregate the result set.

If you run this with statistics_level set to ALL and check the runtime statistics (DBMS_XPLAN.DISPLAY_CURSOR), you'll notice a significant difference between merging the view T2 or not. By default Oracle will merge the view, and obviously perform the "projection" as part of the HASH GROUP BY operation after joining the data, leading to 160,000 calls to the function, each generating three logical I/Os (when using 1 as parameter to the function).

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:14.58 | 480K| | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:14.58 | 480K| 805K| 805K| 602K (0)|
|* 2 | HASH JOIN | | 1 | 160K| 160K|00:00:00.01 | 390 | 988K| 988K| 317K (0)|
|* 3 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
|* 4 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42)

Compare the difference in runtime and the number of logical I/Os performed ("Buffers") to this one, when not merging the view:

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.34 | 1590 | | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.34 | 1590 | 805K| 805K| 369K (0)|
|* 2 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 1590 | 988K| 988K| 354K (0)|
|* 3 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
| 4 | VIEW | | 1 | 400 | 400 |00:00:00.04 | 1395 | | | |
|* 5 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
5 - filter("BATCH_ID"=42)

So obviously it is a good idea in this particular case to not merge the view, but although a I/O cost has been explicitly assigned to the function, you can see that both execution plans have exactly the same cost and Oracle happily merges the view.

The costing looks different when using the function as part of the projection clause:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select
1 as val
, batch_id
, a_date
, run_id
from
t1
)t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
and generate_lio(t2.run_id) >= 0
group by
t1.filler;

Note that I had to change the function parameter to something that refers to a column expression, otherwise the optimizer treats "generate_lio(1)" as "independent" and adds a FILTER operation that evaluates "generate_lio" exactly once.

It is now obvious from the execution plan that the cost of executing the function is considered:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 1356 (4)| 00:00:17 |
| 1 | HASH GROUP BY | | 1 | 11 | 1356 (4)| 00:00:17 |
|* 2 | HASH JOIN | | 160K| 1718K| 1325 (2)| 00:00:16 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| T1 | 400 | 2400 | 1265 (1)| 00:00:16 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42 AND "GENERATE_LIO"("RUN_ID")>=0)

Notice the increased cost of the operation id 4 - calling the function 400 times with an associated I/O cost of 3 and adding on top the associated CPU cost.

A similar behaviour can be seen when using scalar subqueries as part of the selection or projection.

Consider this query:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select /* no_merge */
(
select
min(run_id)
from
t1 a
where
a.a_random = b.a_random
) as val
, batch_id
from
t1 b
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
group by
t1.filler;

Checking the execution plan you'll notice again the same cost as above when using the function in the projection:

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 146 (28)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
| 4 | HASH GROUP BY | | 1 | 13 | 146 (28)| 00:00:02 |
|* 5 | HASH JOIN | | 160K| 2031K| 115 (8)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | T1 | 400 | 3200 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
7 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) MIN("RUN_ID")[22]
2 - "RUN_ID"[NUMBER,22]
3 - "A".ROWID[ROWID,10]
4 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM( (SELECT MIN("RUN_ID") FROM
"T1" "A" WHERE "A"."A_RANDOM"=:B1))[22]
5 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "B"."A_RANDOM"[NUMBER,22]
6 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
7 - "BATCH_ID"[NUMBER,22], "B"."A_RANDOM"[NUMBER,22]

Notice again the subtle difference in the projection when preventing the view merge:

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 146 (28)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
| 4 | HASH GROUP BY | | 1 | 21 | 146 (28)| 00:00:02 |
|* 5 | HASH JOIN | | 160K| 3281K| 115 (8)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
| 7 | VIEW | | 400 | 6400 | 55 (4)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | T1 | 400 | 4400 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
8 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) MIN("RUN_ID")[22]
2 - "RUN_ID"[NUMBER,22]
3 - "A".ROWID[ROWID,10]
4 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("VAL")[22]
5 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "VAL"[NUMBER,22]
6 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
7 - "VAL"[NUMBER,22], "T2"."BATCH_ID"[NUMBER,22]
8 - "BATCH_ID"[NUMBER,22], "B"."A_RANDOM"[NUMBER,22]

It is also interesting to note that although the subquery is executed as part of the VIEW projection step in operation id 7, the scalar subquery is still shown at top level of the query starting with operation id 1. It would be more accurate to show it as child of operation id 7 in this particular case, but this is probably not supported by EXPLAIN PLAN at present.

At runtime however, the outcome is different from the function case, mainly due to the filter optimization / subquery caching feature, which also makes the subquery implicitly deterministic - it will only get executed as many times as there are distinct number of input values, which is the A_RANDOM column in this case.

In both cases the subquery will be executed only approx. 400 times, because there are only 400 distinct values in the generated row source.

View merged:

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.32 | 1418 | | | |
| 1 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.01 | 1028 | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.01 | 1028 | | | |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.01 | 402 | | | |
| 4 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.32 | 1418 | 805K| 805K| 570K (0)|
|* 5 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 390 | 988K| 988K| 390K (0)|
|* 6 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
|* 7 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
7 - filter("BATCH_ID"=42)

View not merged:

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.24 | 1418 | | | |
| 1 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.02 | 1028 | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.02 | 1028 | | | |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.02 | 402 | | | |
| 4 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.24 | 1418 | 805K| 805K| 362K (0)|
|* 5 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 1418 | 988K| 988K| 377K (0)|
|* 6 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
| 7 | VIEW | | 1 | 400 | 400 |00:00:00.02 | 1223 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------------------

It is obvious from the "Buffers" column that the scalar subquery has been executed at different steps of the execution plan.

However in cases where you're not that lucky and the filter optimization doesn't work that efficiently, there still might be a significant difference between the merged and unmerged view variant of the query.

Again, when moving the subquery to the selection, the cost calculation looks quite different:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select
run_id as val
, batch_id
from
t1
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
and t1.run_id >= (
select /*+ no_unnest */
min(run_id)
from
t1 a
where
a.a_random = t1.a_random
)
group by
t1.filler;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 349K (1)| 01:10:00 |
| 1 | HASH GROUP BY | | 1 | 19 | 349K (1)| 01:10:00 |
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN | | 160K| 2968K| 116 (9)| 00:00:02 |
|* 4 | TABLE ACCESS FULL | T1 | 400 | 2400 | 55 (4)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | T1 | 400 | 5200 | 56 (6)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 8 | | |
| 7 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."RUN_ID">= (SELECT /*+ NO_UNNEST */ MIN("RUN_ID") FROM "T1"
"A" WHERE "A"."A_RANDOM"=:B1))
3 - access("T1"."BATCH_ID"="BATCH_ID")
4 - filter("BATCH_ID"=42)
5 - filter("T1"."BATCH_ID"=42)
8 - access("A"."A_RANDOM"=:B1)

It is interesting that the FILTER operation is executed "late" (and therefore potentially more often) - since it is only depending on the T1.RUN_ID and T1.A_RANDOM column it could be executed "earlier" while processing the T1 row source, and in fact this can be achieved by adding the PUSH_SUBQ hint to the subquery. I haven't investigated this further, but may be the optimizer doesn't cost the different subquery pushing options when explicitly requesting to not unnest it (the NO_UNNEST hint) - without the NO_UNNEST hint the subquery is transformed into a join in this particular case.

It can be seen that in this case the optimizer used a "worst case" approach estimating that the scalar subquery gets executed many, many times. Very likely the cost increase can be explained by 160,000 times the cost of the scalar subquery (which might be less than 3 and gets rounded up in the EXPLAIN PLAN output) due to the "late" execution.

At runtime the filter optimization again works very efficiently:

-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.38 | 1418 | | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.38 | 1418 | 805K| 805K| 362K (0)|
|* 2 | FILTER | | 1 | | 160K|00:00:00.16 | 1418 | | | |
|* 3 | HASH JOIN | | 1 | 160K| 160K|00:00:00.01 | 390 | 968K| 968K| 377K (0)|
|* 4 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
|* 5 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
| 6 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.04 | 1028 | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.02 | 1028 | | | |
|* 8 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.02 | 402 | | | |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."RUN_ID">=)
3 - access("T1"."BATCH_ID"="BATCH_ID")
4 - filter("BATCH_ID"=42)
5 - filter("T1"."BATCH_ID"=42)
8 - access("A"."A_RANDOM"=:B1)

So it is not obvious to me why the optimizer treats these two cases differently - in case of the selection it uses a "worst-case" approach, but why this is not used in case of the projection is not clear to me.

In summary, you need to be careful in particular when using functions as part of the projection clause and complex queries - the order of evaluation might make a significant difference to the overall query performance.

As already mentioned, the best way to deal with such constructs is to avoid them. If you can not, you first should evaluate if the function can be declared as deterministic (will be cached from 10.2 on), or if you can use the workaround of wrapping the function call into a scalar subquery (select f(x) from dual) to take advantage of the subquery caching feature, which makes the function also implicitly deterministic.

CLUSTERING_FACTOR What-If Analysis

In Oracle the clustering factor of an index is a single number that is supposed to represent the correlation between the order of the index and the order of the corresponding table.

If an execution plan contains an index range scan including an access to a table by ROWID, the clustering factor tells the cost-based optimizer how "clustered" or "scattered" the data in the table is with respect to the index - that is on average how likely contiguous rows from an index range scan will point to the same table block. Widely scattered data will require to read a different block for every row returned from the index range scan, in contrast in the case of clustered data many of the contiguous rows from the index will point to the same block, making a significant difference to the cost calculated by the cost-based optimizer (and of course making also a significant difference at actual execution time, in particular when many table blocks have to be read from disk).

As described e.g. in detail by Jonathan Lewis in his Cost-Based Fundamentals book, Chapter 5 (Clustering Factor), there are various scenarios that have significant influence on the clustering factor, and there are scenarios where Oracle actually doesn't get the clustering factor right - in particular when dealing with data that is inserted simultaneously in conjunction with some segment space management method that attempts to spread the inserted data across different blocks to avoid contention. This can happen for instance when using multiple freelists/freelist groups with manual segment space management (MSSM) or automatic segment space management (ASSM) for that matter.

Usually the clustering factor in case of an index range scan with table access involved represents the largest fraction of the cost associated with the operation, therefore indexes with high clustering factors (meaning that the table data is scattered in relation to the index order) tend to be ignored by the cost-based optimizer and different access paths might be favored instead, like full table scans or the usage of different available indexes.

Since this clustering factor is therefore often such a crucial information I have derived a simple query from what DBMS_STATS.GATHER_INDEX_STATS uses (and Jonathan mentions and explains in his book) and that allows to perform a what-if analysis regarding the clustering factor.

It can be used to:

- Validate/correct the clustering factor of an existing index determined by DBMS_STATS
- In particular check if concurrent inserts together with some segment space management lead to an non-representative clustering factor (this is something DBMS_STATS does not support at present and can only be corrected manually using DBMS_STATS.SET_INDEX_STATS)
- Perform various what-if scenarios without the need to actually create/re-create the index, e.g.
- The clustering factor of a new index to be added
- The effect of adding a column to an existing index on the clustering factor
- The effect of changing the column order of an existing index
- The effect of changing an index to an reverse index

Similar information could also be obtained by simply creating the corresponding index, but there are some points to consider here:

Creating an index up to release 10g might lead to unwanted changes in the execution plans if performed on a live system. In 11g the option to create an index as invisible can be used to avoid this, however you still incur the overhead of writing the index structure and allocating physical space, which could be significant depending on the amount of data.

Furthermore the query allows for certain kinds of analysis that is simply not possible by creating an index - more on this later.

The query has this general form:

select
sys_op_countchg(substrb(row_id,1,15), )
[* (100 / %perc)] as clustering_factor
, count(*)
[* (100 / %perc)] as cnt
, count(distinct substrb(row_id,1,15))
[* (100 / %perc)] as blocks
from
(
select /*+ no_merge no_eliminate_oby */
/* optionally use parallelism e.g. parallel(t, 2) */
rowid as row_id
[, t.*]
[, dbms_rowid.rowid_relative_fno(rowid) as file_number]
[, dbms_rowid.rowid_block_number(rowid) as block_number]
[, dbms_rowid.rowid_row_number(rowid) as row_number]
from
[sample [block] (%perc)] t
where
is not null
[or is not null]
order by
[asc|desc]
[, [asc|desc]]
, rowid
);

and a first version of a ready-to-use script could look like this:

-- Simulate / calculate clustering factor
set verify off

set termout off

column ora10 new_value if_v10 noprint
column oralower10 new_value if_lower_v10 noprint

-- Determine version for regular expression support
select
decode(substr(banner, instr(banner, 'Release ') + 8, 1), '1', '', '--') as ora10
from
v$version
where
rownum = 1;

select
decode('&if_v10', '--', '', '--') as oralower10
from
dual;

set termout on

accept table_name prompt 'Enter table name: '

define sample_pct = 100

accept sample_pct number default &sample_pct prompt 'Enter sample percent (default &sample_pct): '

define p_degree = DEFAULT

accept p_degree default &p_degree prompt 'Enter parallel degree (default &p_degree): '

define history = 1

accept history number default &history prompt 'Enter number of blocks to remember (default &history): '

-- ideally comma separated without spaces surrounding the comma
accept col_list prompt 'Enter comma separated index column list: '

set termout off

column define_sample_block new_value sample_block noprint

select
case
when &sample_pct < 100
then 'sample block (&sample_pct)'
else ''
end as define_sample_block
from
dual;

column col_list_where_expr new_value col_list_where noprint

-- Try to get clever with the WHERE clause derived from the index expression
select
replace(
&if_v10 regexp_replace('&col_list', '( asc| desc)?( nulls (last|first))?', '')
&if_lower_v10 replace(replace(replace(replace(replace('&col_list', ' asc', ''), ' desc', ''), ' nulls', ''), ' last', ''), ' first', '')
, ','
, ' is not null or '
) || ' is not null' as col_list_where_expr
from
dual;

column p_degree_hint_expr new_value p_degree_hint noprint

select
case
when '&p_degree' != 'DEFAULT'
then 'parallel (t &p_degree)'
else ''
end as p_degree_hint_expr
from
dual;

-- set echo on verify on
set termout on

select
sys_op_countchg(substrb(row_id,1,15), &history)
* (100 / &sample_pct) as clustering_factor
, count(*)
* (100 / &sample_pct) as cnt
, count(distinct substrb(row_id,1,15))
* (100 / &sample_pct) as blocks
from
(
select /*+ no_merge no_eliminate_oby &p_degree_hint */
rowid as row_id
--, t.*
from
&table_name &sample_block t
where
&col_list_where
order by
&col_list
, rowid
);

It is based on the undocumented aggregate function SYS_OP_COUNTCHG that is apparently used by DBMS_STATS to calculate the clustering factor.

The second parameter to this function (I called it "history") is very interesting, since it represents the number of blocks the function "remembers" to determine if the block has "changed" or not. DBMS_STATS uses 1 as value and therefore if we have data that is still clustered but unluckily scattered across a few blocks it will lead to a likely non-representative clustering factor since walking the index may jump forth and back between these few blocks but the SYS_OP_COUNTCHG function will increase the clustering factor with each different block, although it stays within the same few blocks and therefore these blocks very likely will be held in the cache.

For example in case of concurrent inserts and ASSM or freelist / freelist groups with MSSM choosing an appropriate number of blocks to retain could be the number of concurrent processes that insert the data - more on this in the demonstration part later.

The remaining placeholders are straightforward - the table name obviously, and if it is a large table you can use the sample clause to avoid reading the whole table, but then the values returned need to be adjusted accordingly - you could also try to run a potentially required full table scan in parallel (something that the original DBMS_STATS query doesn't) - note however that the aggregate function will/needs to be performed by the query coordinator (due to the dependency of the clustering factor evaluation on the data order) which might represent the bottleneck in case of parallel execution.

If you want to get a feeling on how the data will be sorted according to the index definition you can use the [t.*] and [DBMS_ROWID...] clauses and execute only the inner query without the aggregate function - in this case a potentially required sort operation is going to be more costly due to the increased data volume to sort.

The expressions in the WHERE and ORDER BY clause are supposed to represent the columns and/or expressions (in case of function-based indexes) used in the index definition.

The WHERE clause will ensure that only data will be considered that leads to non-null expressions in the index (a b*tree index only covers non-null data), and the ORDER BY clause will order the data the way the index will be ordered.

Note that non-unique indexes will get the ROWID added to make the index expression unique - for unique indexes this is not required, but doesn't harm, since the expression by itself is already "unique". You can omit the ROWID in this case, but it won't change the outcome.

This query also allows some interesting considerations. For example if you know that you'll mostly access only a particular "hot" part of the table which is well clustered, but the remaining "cold(er)" part of the table is rather scattered (for example in case of batch inserts of newly arrived data into a partially deleted and shrunk table via the new SHRINK option introduced in 10g), the overall average clustering factor determined might be bad but probably not representative for a typical query accessing only the "hot"/"latest" data. You could then add the corresponding selection criteria to the query to restrict the data analysed accordingly and use the obtained clustering factor to correct the index statistics using DBMS_STATS.SET_INDEX_STATS.

Here is an demonstration of some of the common scenarios regarding the clustering factor. It allows to reproduce issues with concurrent inserts, extra columns and changed column order. It is a modification of some code I've recently used to reproduce similar issues that one of my clients had.

prompt Tablespace
accept tblspace
rem define tblspace = test_8k

drop table t1 purge;

drop sequence seq_t1_run_id;

drop sequence seq_t1_seq_id;

create table t1 (
run_id integer not null, /* identify the process inserting the data */
batch_id integer not null, /* represents clustered data, could also be a (arriving) date */
a_value number null, /* represents sequence based data */
a_random number null, /* represents randomly scattered data */
a_date timestamp default systimestamp not null, /* represents the insert timestamp */
filler char(1) default 'x' not null /* can be used to size the row as required */
)
tablespace &tblspace;

/* a sample index */
create index t1_idx1 on
t1 (
batch_id,
a_value
)
tablespace &tblspace;

create sequence seq_t1_run_id;

create sequence seq_t1_seq_id;

create or replace procedure populate_t1(i_run_id in integer, i_iter in integer) as
begin
dbms_output.put_line(
dbms_lock.request(
1
, dbms_lock.s_mode
, release_on_commit => true
)
);
commit;
for i in 1..i_iter loop
for j in 1..100 loop
insert into t1 (
run_id
, batch_id
, a_value
, a_random
)
values (
i_run_id
, i
, seq_t1_seq_id.nextval
, trunc(dbms_random.value(1, 1000))
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/

If you want to test the effect of concurrent inserts with ASSM for instance, choose an appropriate tablespace (or modify the script to use freelists / freelist groups with MSSM) and run the following code.

In a main session run this:

-- run this as main session
-- afterwards start as many of the below code snippets
-- and press return in the main session
-- to let them all start at the same time

truncate table t1;

begin
dbms_output.put_line(
dbms_lock.request(
1
, dbms_lock.x_mode
, release_on_commit=>true
)
);
end;
/

prompt Press return to continue when sessions have been started

accept x

commit;

prompt Press return to continue when sessions have completed

accept x

exec dbms_stats.gather_table_stats(null, 't1', estimate_percent=>null, cascade=>true)

select
index_name
, clustering_factor
from
user_indexes
where
table_name = 'T1';

Then start this in as many sessions as you want to run concurrently:

set timing on echo on

variable n_run_id number

exec select seq_t1_run_id.nextval into :n_run_id from dual;

exec populate_t1(:n_run_id, 100)

commit;

After the sessions have started they're going to wait on the lock of the main session. Press ENTER to get the sessions started and press ENTER again after the session have completed to gather statistics and get initial information about the CLUSTERING_FACTOR of the sample index determined by DBMS_STATS.

The number of iterations (set to 100 in the code snippet above) determines how long this code will run - every insert is delayed by 1/100th of a second - the minimum delay supported by DBMS_LOCK.SLEEP - so the block above will insert 10,000 rows lasting approx. 100 seconds.

The code uses the simple synchronisation method also used by Jonathan in his sample scripts (based on DBMS_LOCK.REQUEST) - the main session allocates a user lock in exclusive mode, all other sessions attempt to request this in shared mode. Therefore all sessions will wait until the main session commits to release the lock. Note that this uses a hard coded lock handle - in a non-test system it is advisable to use DBMS_LOCK.ALLOCATE_UNIQUE to generate a unique lock handle.

Depending on what you've chosen as concurrency and segment space managment, the clustering factor of the index on (batch_id, a_value) might be close to the number of blocks or rows in the table as determined by the final DBMS_STATS call.

You can use now the query to perform some analysis regarding the clustering factor. You could run e.g. the following query:

select
sys_op_countchg(substrb(row_id,1,15), 1) as clustering_factor
, count(*) as cnt
, count(distinct substrb(row_id,1,15)) as blocks
from
(
select /*+ no_merge no_eliminate_oby */
rowid as row_id
, t1.*
from
t1
where
batch_id is not null
or a_value is not null
order by
batch_id
, a_value
, rowid
);

This should give you exactly the clustering factor that has been determined by the DBMS_STATS call used above.

In my case when using MSSM I got a clustering factor of 190 with the table having 186 blocks for 40,000 rows (four concurrent processes each inserting 10,000 rows).

When using ASSM for the same setup (four processes) I got a clustering factor of 28,483 (!) for the same index. Note that the results might vary significantly, depending on how the processes were assigned to the different freelists (MSSM) or block groups (ASSM).

If you've used ASSM or freelist / freelist groups with MSSM then replace the "history" parameter with your number of concurrent processes (or number of freelists, if you had more processes than freelists), e.g. in case of four concurrent processes:

select
sys_op_countchg(substrb(row_id,1,15), 4) as clustering_factor
...

and you should notice a significant drop in the clustering factor, caused by the fact that multiple concurrent inserts used different blocks and therefore the data is not in a single block, but clustered in a few blocks and Oracle has in this constructed case to "jump" forth and back between these few blocks to obtain the data (actually caused by the A_VALUE column which is an increasing value, but written concurrently by the different processes, so with ASSM/freelist (groups) each increasing value is potentially stored in a different block).

In my particular case the simulated clustering factor for ASSM dropped from 28,483 to 188.

Some variations of the query allow to reproduce some other scenarios, e.g. use the following columns to see the impact of adding a badly scattered column to an index:

batch_id

(in my case clustering factor 188 for MSSM with default freelists)

vs.

batch_id
, a_random

(in my case clustering factor 22,191 for MSSM with default freelists, selecting a history size of 4 showed a clustering factor of 190)

or this one to see the impact of changing the column order:

batch_id
, a_value
, a_random
, rowid

(in my case clustering factor 190 for MSSM with default freelists)

vs.

a_random
, batch_id
, a_value
, rowid

(in my case clustering factor 35,904 for MSSM with default freelists, and here increasing the history size to 4 doesn't make a significant difference)

You could also use variations of the following query to get a feeling how the data arrived in the table:

select
rowid
, run_id
, batch_id
, a_value
, to_char(a_date, 'DD.MM.YYYY HH24:MI:SS.FF') as insert_timestamp
, dbms_rowid.rowid_relative_fno(rowid) as file_number
, dbms_rowid.rowid_block_number(rowid) as block_number
, dbms_rowid.rowid_row_number(rowid) as row_number
from
t1
order by
a_date;

By using different ORDER BYs (or no ORDER BY) for the above query you can get some other interesting insights how the data is stored in the table - in particular the difference when using multiple freelists or ASSM with concurrent inserts.

Happy simulating!

The CPU Costing Model: A Few Thoughts Part V (Reality)

There’s plenty more I could talk about regarding the CBO CPU costing model and system statistics but I’ll make this my final little comment on this subject for now.   As previously discussed, the CPU costing model basically takes the time it takes to perform the all necessary I/O related activities and all the time it takes to [...]

The CPU Costing Model: A Few Thoughts Part IV (Map of the Problematique)

It’s called the CPU Costing model because among other things, it includes the time associated with performing CPU operations.   The CPU Costing model formula once again:   (sum of all the single block I/Os x average wait time for a single block I/O +  sum of all the multiblock I/Os x average wait time for [...]

The CPU Costing Model: A Few Thoughts Part III (Bang Bang)

One of the advantages of system statistics and the CPU costing model is in how the CBO deals with multiblock reads and the impact of the db_file_multiblock_read_count parameter. When performing a full table (or fast full index) scan, the CBO needs to determine just how many multiblock read operations are likely to be performed so the associated operation can [...]

The CPU Costing Model – A Few Thoughts Part II

As previously discussed, the formula used by the CBO using the CPU costing model is basically:
 
(sum of all the single block I/Os x average wait time for a single block I/O +
 sum of all the multiblock I/Os x average wait time for a multiblock I/O +
 sum of all the required CPU cycles / CPU cycles per second)
/
average [...]

The CPU Costing Model: A Few Thoughts Part I

In the coming days, I’ll post a series of little entries highlighting a specific point in relation to the use of system statistics and the CPU cost model. In my previous post, we looked at how the cost of a FTS is calculated using the CPU costing model and how this generally results in an increase in the associated FTS [...]

Multi-column joins

Consider the following scenario with four tables. Two of them represent master data, the third one uses a concatenated primary key consisting of foreign keys to the first two, and the fourth one has a foreign key to the third one.

drop table t1 cascade constraints purge;

drop table t2 cascade constraints purge;

drop table t3 cascade constraints purge;

drop table t4 cascade constraints purge;

create table t1 (
t1_id integer not null constraint pk_t1 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t2 (
t2_id integer not null constraint pk_t2 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

create table t4 (
t4_id integer not null constraint pk_t4 primary key,
t1_id integer,
t2_id integer,
filler1 varchar2(40),
filler2 varchar2(40),
constraint t4_fk_1 foreign key (t1_id, t2_id) references t3 (t1_id, t2_id)
);

Notice that the primary key of "t3" is using a non-unique index, which is supported and can be used e.g. for deferrable constraints or when loading data into tables that might be non-unique so that the constraint can be disabled without dropping the (unique) index. This allows to simply re-enable the constraint after cleaning up the non-unique rows instead of re-creating an unique index (and the risk of losing the index if anything goes wrong).

Now when using an uncorrelated data set for the concatenated keys, Oracle's default (join) selectivity formulas apply and the estimated cardinalities are correct. Table "t1" has 10,000 rows, "t2" 3 rows, table "t3" holds 30,000 rows combining "t1" and "t2" data. "t4" has 300,000 rows.

-- non-correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>3, numblks=>1, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>3, clstfct=>3, indlevel=>1, numlblks=>1, numrows=>3)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>30000)

Joining t4 to t3 results in a correct estimate of 300K rows:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1456 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1456 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Things look however different if we have the awkward situation of correlated column values for the concatenated keys:

-- correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>20000, numblks=>200, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>20000, clstfct=>20000, indlevel=>2, numlblks=>20, numrows=>20000)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>30000)

Here we simulate 20,000 distinct values in one column, 10,000 distinct values in the second one, but only 30,000 distinct values for the combination of both columns. In this case Oracle's default selectivity formula underestimates the cardinality since it is assuming uncorrelated values:

select /*+ opt_param('_optimizer_join_sel_sanity_check', 'false') */
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 45 | 2340 | 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Oracle simply multiplies the selectivity of the two columns and arrives at a join cardinality of 45 rows (1/20,000*1/10,000*300,000*30,000).

You'll notice that I had to use a undocumented optimizer parameter to arrive at that default selectivity. If you run an EXPLAIN PLAN for the same statement without the hint, you'll get the following estimate:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 30000 | 1523K| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

It can be seen from a 10053 optimizer trace file that Oracle uses a "Multi-column cardinality sanity check" by default in cases where the calculated multi-column selectivity falls below a certain limit, obviously using the smaller selectivity available from the different 1/num_rows of the tables/row sources involved in the join, arriving at an estimate 30,000 rows in this particular case.

Changing the non-unique index used for the primary key on "t3" to a unique index will bring another sanity check into the picture: The "concatenated index" sanity check that uses the number of distinct values of an unique index that corresponds exactly to the join columns used.

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create unique index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

With this unique index in place Oracle uses the number of distinct keys from this index to calculate the selectivity of the join and therefore arrives at the correct cardinality again:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

So this is another case where the uniqueness of an index makes a significant difference for optimizer calculations.

Note that from 11g on there more options to help the optimizer to come up with a better estimate even with the non-unique index on (t3.t1_id, t3.t2_id). Obviously 11g introduced extended statistics on column groups, so we can do the following:

variable ext_name varchar2(30)

exec :ext_name := dbms_stats.create_extended_stats(null, 't3', '(t1_id, t2_id)')

exec dbms_stats.set_column_stats(null, 't3', :ext_name, distcnt=>30000, nullcnt=>0)

This allows to derive the correct selectivity for these correlated column values using the extended statistics set.

Another option in 11g is adding an index on (t4.t1_id, t4.t2_id), like that:

create index ix_t4 on t4 (t1_id, t2_id);

exec dbms_stats.set_index_stats(null, 'ix_t4', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>300000)

Having now two non-unique indexes Oracle 11g comes up again with the correct join cardinality of 300K. Notice that this doesn't work in pre-11g. Pre-11g versions require the index on t3 to be unique to take advantage of the "concatenated index" sanity check.

Having demonstrated all these sanity checks available for multi-column joins (the general multi-column and the concatenated index sanity check), let's see what happens when joining three tables:

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1468 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 300K| 18M| | 1468 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

All I've done is to add "t1", in this case joining to "t4" on "t1_id". Thanks to the concatenated index sanity check (or the extended column group statistics in 11g) the calculated join cardinality is still 300K.

Now what happens if one decides to join "t3" to "t1" on "t1_id" instead of "t4.t1_id"? From a logical point of view this should lead to exactly the same result, since we can deduce that if "t4.t1_id" = "t1.t1_id" and "t3.t1_id = t1.t1_id" then "t3.t1_id = t4.t1_id".

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t1.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1475 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 45 | 2925 | | 1475 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 450K| 22M| 1120K| 1459 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

The result is astonishing. By making this simple change we have effectively disabled all available sanity checks and arrive at the result based on the the default, uncorrelated selectivity.

So whenever you perform multi-column joins and the column data is correlated, be very careful how you join the tables - it might make a significant difference to the calculations of the optimizer.

CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV)

As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has been backported in 10.2.0.4 also.
In the previous post we discussed how NewDensity influences the CBO [...]

Is plan_hash_value a final say?

I was reviewing a performance issue with a client recently. Problem is that increased global cache waits causing application slowdown affecting few critical business functions. Using one of my script gc_traffic.sql > and graphing the results with Excel spreadsheet, it is established that there is a marked increase in GC traffic today compared to week earlier. Similar jobs runs every day and so comparing two week days is sufficient to show the GC traffic increase. Graph is between total blocks and AWR snap time in 30 minutes interval. [Click the picture below to review the graph clearly.]

gc comparison

Identifying the object creating this increased GC traffic is essential to identify root cause. We were able to quickly determine that this increase in GC traffic was localized around few SQL statements using ADDM and AWR reports. We decided to focus on one SQL with an obvious increase in elapsed time compared to prior week. So, first question asked, is there a change in the plan? plan_hash_value was reviewed and quickly determined that there is no change in the plan_hash_value.

Little bit of history, there were few changes performed by the vendor over the weekend as a part of few bug fixes. Vendor’s argument was that since there is no change to the plan_hash_value, SQL access plan did not change and so, this can’t be due to vendor changes. My Client’s argument was that there were no changes to the environment and problem started after the application changes.

There are many different things that can go wrong without changes to the execution plan. We can ignore those conditions for now since (a) there has been no changes to the environment (b) no visible changes to the data (c) no error message and average CR recv time is consistent with prior weeks. Well, Let’s cut to the chase. It boiled down to a question “Can SQL plan change without a change in plan_hash_value?”. What do you think? Please answer in the poll below.

plan_hash_value and hash_value

Hash_value of a SQL statement is generated from the text of an SQL statement and plan_hash_value is generated from the execution plan of that SQL statement[ More accurately, from that child cursors' execution plan and exactly what is involved in generating plan_hash_value is not published]. It is a general belief that plan_hash_value will change even if there is a slightest change in the execution plan. But, that is not always the case!

Test case

We will use a small table to explore this issue.

prompt Test case #1: Initial test case with index on columns (n1, n2).
prompt ==========
create table t1 (n1 number, n2 number, v1 varchar2(100));
create index t1_n1 on t1(n1,n2);

explain plan for select * from t1 where n1=:b1 and n2=:b2;
select * from table (dbms_xplan.display);
prompt plan hash value in this case is 626762252

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=TO_NUMBER(:B1) AND "N2"=TO_NUMBER(:B2))

So, we got the plan_hash_value as highlighted above. In the following test case we will recreate the index reordering the columns as (n2, n1).

prompt Test case #2: Index with re-ordered columns
prompt ==========
drop index t1_n1;
create index t1_n1 on t1(n2,n1);

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  5298 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N2"=TO_NUMBER(:B2) AND "N1"=TO_NUMBER(:B1))

Notice that in the test result above filter predicate changed reflecting index column reordering. But the plan_hash_value did not change. Point is that execution plan can change without a change in plan_hash_value (due to change in the underlying tables).

Let’s modify that index dropping a column from the index in the next test case.

prompt Test case #3: dropping a column from index t1_n1.

drop index t1_n1;
create index t1_n1 on t1(n1);

explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |    41   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |    41   (3)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

We see that predicates changed but the plan_hash_value did not change. In the test case below, we will modify index to be a function based index and test this SQL statement. There are also few more self-evident test cases below.

prompt Test case #3: Index is a function based index. Still, no change in the plan_hash_value.

create index t1_n1 on t1(to_char(n1));
explain plan for select * from t1 where to_char(n1)=:b1 and n2=:b2;

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |   127   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |   127   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access(TO_CHAR("N1")=:B1)

prompt Test case #4: Different schema and same SQL. plan_hash_value did not change.

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    78 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    78 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     3 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

plan_hash_value is also case sensitive for index names. In the test cases below, we will create case sensitive indices.

prompt Test case #5: Lower case index_name.. plan_hash_value changed.
create index "t1_n1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 2252949961
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | t1_n1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #6: Upper case index_name.. plan_hash_value did not change.
create index "T1_N1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #7: Space in the index name changes plan_hash_value though.
create index "T1_N1 " on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 1377541522
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1  |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Summary

In summary, plan_hash_value is a very good indicator to see if the plan changed or not. Don’t get me wrong, I also use plan_hash_value, but in addition to comparing CPU time and elapsed time. Execution plan can change even when there is no change to the plan_hash_value. Salient points from these test cases are:

  1. Plan_hash_value is dependent upon partial execution plan not on complete execution plan.
  2. If a predicate is moved from filter_predicate to access_predicate or vice-versa, it doesn’t affect plan_hash_value.
  3. Changes in the parallelism of the queries does not affect plan_hash_value. For example, if a query used 4 parallel slaves today and 16 parallel slaves yesterday, that change is not visible through plan_hash_value. This behavior is expected as parallel_adaptive_multiuser can allocate slaves depending upon the state of that instance.
  4. Plan_hash_value is case-sensitive to index/table names. Also sensitive to white-space characters.
  5. Plan_hash_value is not sensitive to index types. For example, if the index type is a function based index as in our test case #4, as long as, index name did not change plan_hash_value will remain the same.
  6. Plan_hash_value is not sensitive to schema either. SQL statement accessing different schemas can have same plan_hash_value too.

Back to our problem. One of the index was recreated removing a column and caused optimizer to apply filter predicates at table level increasing number of accesses to table block tremendously, leading to more logical reads, more Global cache waits etc. This problem was amplified since this SQL was executed very frequently and concurrently from all RAC instances.

In my client’s defense, this application change was tested thoroughly. But, alas, test data chosen for this performance test was not probing this specific issue. This performance issue did not show up in development. Essentially, chosen data in the performance benchmark suite was not an excellent choice.

As they say in Gaelic language “Go raimh maith agat” to my client for allowing me to post this blog.
This can be read in traditional format in
plan_hash_value_and_gc
Update 1: Added the document and also corrected a typo.