Between

Reading Richard Foote’s latest blog note about automatic indexing and “non-equality” predicates I was struck by a whimsical thought about how the optimizer handles “between” predicates. (And at the same time I had to worry about the whimsical way that WordPress treats “greater than” and “less than” symbols.)

It’s probably common knowledge that if your SQL has lines like this:

columnA between {constant1} and {constant2}

the optimizer will transform them into lines like these:

    columnA >= {constant1}
and columnA <= {constant2}

The question that crossed my mind – and it was about one of those little details that you might never look at until someone points it out – was this: “does the optimizer get clever about which constant to use first?”

The answer is yes (in the versions I tested). Here’s a little demonstration:

rem
rem     Script:         between.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2021
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
select
        rownum  rn,
        ao.*
from
        all_objects ao
where
        rownum <= 50000
;

set autotrace traceonly explain

select  object_name
from    t1
where
        rn between 45 and 55
;


select  object_name
from    t1
where
        rn between 49945 and 49955
;


select  object_name
from    t1
where
        rn between 24945 and 24955
;

select  object_name
from    t1
where
        rn between 25045 and 25055
;

set autotrace off

All I’ve done is create a table with 50,000 rows and a column that is basically a unique sequence number between 1 and 50,000. Then I’ve checked the execution plans for a simple query for 11 rows based on the sequence value – but for different ranges of values.

Two of the ranges are close to the low and high values for the sequence; two of the ranges are close to, but either side of, the mid-point value (25,000) of the sequence. The big question is: “does the execution plan change with choice of range?”. The answer is Yes, and No.

No … because the only possible execution path is a full tablescan.

Yes … because when you examine the plan properly you’ll notice a change in the Predicate Information. Here are the first two execution plans, as reported by autotrace (which calls dbms_xplan.display() behind the scenes):

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   528 |   140   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    12 |   528 |   140   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"<=55 AND "RN">=45)

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   528 |   140   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    12 |   528 |   140   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN">=49945 AND "RN"<=49955)

Notice how the order of the filter predicates has changed as we move from one end of the range to the other. The optimizer has decided to do the test that is more likely to fail first and the test that is more likely to succeed second, which means there won’t be many rows where it has to run both tests, and that will make a small difference to the CPU usage.
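To put some rough numbers on that (a back-of-the-envelope check, not part of the original script): for the range 45 to 55 only 55 of the 50,000 rows pass the test "RN"<=55, so running it first leaves just 55 rows that need the second test, while running "RN">=45 first would leave 49,956 rows needing the second test. A quick query confirms the counts:

select
        count(case when rn <= 55 then 1 end)    pass_le_55,
        count(case when rn >= 45 then 1 end)    pass_ge_45
from    t1
;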

Picking out just the filter predicate line from the output for this script (host grep filter between.lst) you can see the same pattern appear when the values supplied are very close to the mid-point (25,000).

SQL> host grep filter between.lst
   1 - filter("RN"<=55 AND "RN">=45)
   1 - filter("RN">=49945 AND "RN"<=49955)
   1 - filter("RN"<=24955 AND "RN">=24945)
   1 - filter("RN">=25045 AND "RN"<=25055)

My code has used literal values to demonstrate an effect. It’s worth checking whether we would still see the same effect if we were using bind variables (and bind variable peeking were enabled). So here’s a little more of the script:

set serveroutput off

variable b1 number
variable b2 number

exec :b1 := 45
exec :b2 := 55

select
        /* low_test */
        object_name
from    t1
where
        rn between :b1 and :b2
/

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate'));

exec :b1 := 49945
exec :b2 := 49955

select
        /* high_test */
        object_name
from    t1
where
        rn between :b1 and :b2
/

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate'));
set serveroutput on

Since autotrace simply calls “explain plan” and doesn’t know anything about bind variables (treating them as unpeekable character strings) I’ve used code that executes the statements and pulls the plans from memory. Here are the results (with some of the script’s output deleted):

EXPLAINED SQL STATEMENT:
------------------------
select  /* low_test */  object_name from t1 where  rn between :b1 and :b2

Plan hash value: 3332582666

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  FILTER            |      |
|*  2 |   TABLE ACCESS FULL| T1   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:B2>=:B1)
   2 - filter(("RN"<=:B2 AND "RN">=:B1))


EXPLAINED SQL STATEMENT:
------------------------
select  /* high_test */  object_name from t1 where  rn between :b1 and :b2

Plan hash value: 3332582666

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  FILTER            |      |
|*  2 |   TABLE ACCESS FULL| T1   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:B2>=:B1)
   2 - filter(("RN">=:B1 AND "RN"<=:B2))

As you can see, when we query the low range the first comparison is made against :b2; when we query the high range the first comparison is made against :b1.

It is actually worth knowing that this can happen. How many times have you heard the question: “the plan’s the same, why is the performance different?” Maybe the body of the plan looks the same and has the same plan_hash_value, but today the first person to execute the query supplied bind values that made the optimizer choose to apply the filters in the opposite order to usual. This probably won’t make much difference to CPU usage in most cases, but there are bound to be a few cases where it matters.
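If you want to find out which order an existing cursor is actually using, one option is to pull the predicate text straight from v$sql_plan. This is just a sketch – you have to supply the sql_id of the suspect statement yourself:

select
        child_number, id, filter_predicates
from    v$sql_plan
where
        sql_id = '&m_sql_id'            -- substitute the sql_id of the suspect cursor
and     filter_predicates is not null
order by
        child_number, id
;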

You’ll notice, by the way, that the plan with bind variables includes a FILTER operation that doesn’t appear in the plans with literal values. This is an example of “conditional SQL”: if you check the predicate information for operation 1 you’ll see that it’s checking that :b2 is greater than or equal to :b1. If this test doesn’t evaluate to true then operation 1 will not make a call to operation 2, i.e. the tablescan is in the plan but won’t happen at run-time.
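It’s easy to see that short-circuit in action (a quick sketch, reusing the bind variables declared above): reverse the two values and check the execution statistics – the consistent gets should be (close to) zero because the tablescan never actually runs.

exec :b1 := 55
exec :b2 := 45

set autotrace traceonly statistics

select
        /* reversed_test */
        object_name
from    t1
where
        rn between :b1 and :b2
/

set autotrace off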

(I believe that there may be some RDBMS which will treat (e.g.) “X between 20 and 10” as being identical to “X between 10 and 20” – Oracle doesn’t.)
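If you want a one-line check of Oracle’s behaviour:

select  count(*)
from    t1
where
        rn between 55 and 45    -- becomes rn >= 55 and rn <= 45, so returns 0
;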

Left as an exercise

The test data was created as a completely evenly spaced (by value) and evenly distributed (by count) set of values. How would things change if the data were sufficiently skewed that the optimizer would default to creating a histogram when gathering stats?
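Here’s a possible starting point for the exercise – a sketch only, with an arbitrary choice of skew, and remember that with method_opt left at “size auto” Oracle will only create a histogram on a column after the column has appeared in the predicate of some previously parsed statement:

create table t2
as
select
        trunc(sqrt(rownum))     rn,     -- value N appears roughly 2N+1 times
        ao.*
from
        all_objects ao
where
        rownum <= 50000
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T2',
                method_opt      => 'for all columns size auto'
        );
end;
/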

Left as another exercise**

There are lots of little bits of arithmetic that go into the CPU_COST component of an execution plan – including a tiny factor to allow for the number of columns that Oracle has to “step over” (by counting bytes) as it projects the columns needed by the query. So if you had a second “between” predicate on another column in the table, could you manage to get all 24 possible orders of the 4 transformed predicates by adjusting the ranges of the between clauses and/or moving the two columns to different positions in the row?
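A query shaped like the following would be the obvious starting point – a sketch, using object_id simply because it’s a convenient second numeric column in t1 – followed by another check of the Predicate Information to see which of the 24 orders appears:

select  object_name
from    t1
where
        rn        between 45 and 55
and     object_id between 1000 and 2000
;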

** For those in lockdown who need something to do to fill the time.