Index Usage – 2

I’ve been a little slow in the follow-up to my previous posting on possibly redundant indexes. Before going into the slightly more complex stuff, there’s another peripheral point (but a very important one) that’s worth raising about how clever the optimizer can be. Here’s some code for 11.2.0.4 to demonstrate the point:

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(sysdate,'MM') + (rownum-1)/1440	date_time,
	rpad('x',100)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;


alter table t1 
add (
        date_only
	generated always as (trunc(date_time)) virtual 
)
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_i1 on t1(date_only) nologging;

So, in a two-step process, I’ve got an indexed virtual column that holds the value of the date_time column truncated to just the date. Would you expect the optimizer to use the index to execute the following query efficiently:


select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
;

Note that the query references the real date_time column not the virtual column date_only, and it’s not using the expression that defines the index – yet the plan reads as follows:


-----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |     1 |    21 |    86   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE                       |       |     1 |    21 |            |          |
|*  2 |   FILTER                              |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |  1442 | 30282 |    86   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
   4 - access("T1"."DATE_ONLY">=TRUNC(SYSDATE@!-1) AND
              "T1"."DATE_ONLY"<=TRUNC(SYSDATE@!))

It’s a little odd that even though the optimizer in the newer versions of Oracle treats many simple expressions on sysdate as constants it still checks (operation 2) that “sysdate >= sysdate – 1” but perhaps that’s just a case of a piece of generic code that isn’t worth the risk or effort of changing.

The key point, of course, is that Oracle has managed to generate some extra predicates that allow it to use the “wrong” index to get a first approximation of the result set fairly efficiently, and then used the original predicate to reduce the approximation down to the correct result set.

If you want a quick sanity check on the access predicates used for operation 4:

  • If date_time >= sysdate-1, then trunc(date_time) >= trunc(sysdate-1)
  • If date_time <= sysdate, then trunc(date_time) <= trunc(sysdate)

This style of predicate manipulation also works numeric data types, but I think its greatest benefit (or convenience) is likely to come from date data types where the data has been created with a time component but there are frequent “date-only” queries. The days of creating two indexes as a workaround for handling generated code that wants to deal with both date_time and trunc(date_time) predicates should be numbered.

Footnote:

This enhancement probably appeared in 11.2.0.2, and I first saw it described in October 2013 in this blog note by Mohamed Houri; but 12c offers a delightful little enhancement – here’s what my table looks like in the 12c version of the code:


SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 DATE_TIME                              DATE
 PADDING                                VARCHAR2(100)

SQL> 

Where’s the virtual column ? The 12c version of my code had a slightly different definition for it:


alter table t1  
add (
        date_only
        invisible
        generated always as (trunc(date_time)) virtual
)
;

The transformation still works even when the virtual column is invisible. So (subject to searching for anomalies, boundary conditions and bugs) it looks as if you can change the table definition, and get the benefits of two indexes for the price of one without the application realising that anything has changed.