Search

Top 60 Oracle Blogs

Recent comments

Lower Cost Ignored

This is an update on a post I wrote nearly 10 years ago describing how the optimizer could choose to ignore a lower cost indexed access path and use a higher cost index if the lower cost were based on guesswork. The original article article used (select {constant} from dual) as a way of supplying an “unpeekable bind” in a predicate, but that mechanism stopped working 11gR2, hence the update.

The upate also goes into a little more detail about event 38036 which can be used to modify this behaviour by defining a “cut-off” percentage where Oracle will switch back to using the lower cost path.

We start with the code to generate the data – including, in this case – a table that I can query to supply “hidden constants” to the optimizer:

em
rem     Script:         index_range_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                rownum <= 10000 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        9 + mod(rownum-1,3)     scattered,
        trunc(rownum / 500)     clustered,
        lpad(mod(rownum,2),10)  ind_pad,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 40000 -- > comment to avoid wordpress format issue
;

create index t1_equi  on t1(scattered, ind_pad, clustered);
create index t1_range on t1(clustered, scattered);

create table driver(name varchar2(20), value number(4,0));
create index drv_i1 on driver(name);

insert into driver values('Forty',40);
insert into driver values('FortyOne',41);
commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'DRIVER',
                method_opt       => 'for all columns size 1'
        );
end;
/


The driver table models a pattern that I see fairly frequently of a table holding a set of “constants” that are subject to infrequent changes. (An alternative strategy is to have a PL/SQL package of constants and a function to return a value when supplied with a name.)

The key columns in the table are

  • scattered: 3 distinct values evenly but randomly scattered across the table – not a column you would index separately
  • clustered: 500 consecutive rows each for 80 distinct values, so very well clustered data

There are two indexes on the main data table

  • t1_range (clustered scattered) — roughly (80 * 3 =) 240 distinct keys
  • t1_equi (scattered, ind_pad, clustered) — roughly (3 * 2 * 80) = 480 distinct keys

If I execute the query: “select * from t1 where clustered between 40 and 41 and scattered = 10” Oracle uses the t1_range index to return an estimated 491 rows at a cost of 36. The estimates aren’t too far out since in my case the query returned 334 rows in 26 buffer visits (when my SQL*Plus arraysize was 500).

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |   496 |  8432 |    36   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   496 |  8432 |    36   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_RANGE |   496 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CLUSTERED">=40 AND "SCATTERED"=10 AND "CLUSTERED".le.41)
       filter("SCATTERED"=10)

But what happens if we decide to keep the start and end values for clustered in the driver table:


set autotrace traceonly explain

select 
        /*  index(t1 t1_range) */
        small_vc
from
        t1
where
        scattered = 10
and     clustered between (select value from driver where name = 'Forty')
                  and     (select value from driver where name = 'FortyOne')
;

set autotrace off



You’ll notice the “hint that isn’t a hint” I’ve tested two versions of the query, one without a hint and one where I inserted the necessary “+” to have a hint instead of a comment. Here are the execution plans (with “<=” edited to “.le.” in the Predicate Information):


=======
Default
=======
-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |    33 |   561 |    67   (3)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1      |    33 |   561 |    63   (4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                    | T1_EQUI |    60 |       |    56   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER  |     1 |    11 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | DRV_I1  |     1 |       |     1   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER  |     1 |    11 |     2   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | DRV_I1  |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SCATTERED"=10 AND "CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER"
              WHERE "NAME"='Forty') AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE
              "NAME"='FortyOne'))
       filter("CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='Forty')
              AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='FortyOne'))
   4 - access("NAME"='Forty')
   6 - access("NAME"='FortyOne')

=============
Hint t1_range
=============
--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |    33 |   561 |    10   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1       |    33 |   561 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                    | T1_RANGE |    60 |       |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER   |     1 |    11 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | DRV_I1   |     1 |       |     1   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER   |     1 |    11 |     2   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | DRV_I1   |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='Forty')
              AND "SCATTERED"=10 AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE
              "NAME"='FortyOne'))
       filter("SCATTERED"=10)
   4 - access("NAME"='Forty')
   6 - access("NAME"='FortyOne')

The big difference occurs at operation 2. You should notice two things – first, the estimate of rows is wrong by a factor of 10, but that’s not the point I want to chase. Secondly by default the optimizer has  selected the plan using the (far) more expensive index. This is a deliberately coded choice and if you have access to MOS then you should check Document ID 4112254.8: “CBO may not choose best index for single table access with unpeeked bind/function”.

Rerun the test (unhinted) after issuing.

alter session set events '38036 trace name context forever, level 10'

You will find that the optimizer automatically takes the path using the lower cost index. For values of 10 or higher Oracle will pick the lower cost index, for values of 9 or less the optimizer will stick with the higher cost index.

The question, of course, is what’s special about the value 10. Given the clue in the MOS document, and the costs from the plans above, and allowing for some fiddling with the arithmetic we can note the following:

  • Cost of using the expensive index for the table access is 63 (From the 10053 trace it’s actually 62.525996)
  • Cost of using the cheaper index for the table access if 6 (Again, from the 1003 trace, 6.037011)
  • Express 6.037011 / 62.525996 as a percentage and you get 9.655 which rounds to 10.

Probably not a coincidence! (Though it might take a few more test to decide whether it’s round() or ceiling())

So if the optimizer is picking the wrong index, and the arithmetic for the one it should be using is dictated by guessing, then calculate the ratio of the two costs for the table access, express as a percentage and round up (and add some if you want to) then set the event.

I’m not really sure how much use you could, or should, make of this event. Possibly it’s one of those things you use temporarily to allow you to generate an SQL Baseline (the event setting doesn’t get captured as an “opt_param()” hint unlike other fiddles of this type that you might use). Maybe there are a few special cases in batch jobs were the optimizer is a little inconsistent about which index to pick and the event is relevant and could be set through an ‘alter session’ command. It’s nice to know it’s there – but may be hard to use in practice.

Footnote

If you were wondering about the cardinality estimates of 60 rowids from the index and 33 rows from the table. This is the effect of the standard  “guesses for unpeeked binds” the optimizer uses. For bounded (between) range on an index the selectivity is 0.0045, for a table it’s 0.0025; in this case we then have to include a factor of 1/3 in the arithmetic to allow for the “scattered=10” predicate:

40,000 * 0.0045 * 1/3 = 60

40,000 * 0.0025 * 1/3 = 33.33….