Search

Top 60 Oracle Blogs

Recent comments

Interval Oddity

Interval partitioning is a popular strategy for partitioning date-based data. It’s an enhanced variant of range partitioning that allows you to define a starting partition and an interval that should be used to derive the high values for all subsequent partitions – and Oracle doesn’t even have to create intervening partitions if you insert data that goes far beyond the current partition, it automatically creates exactly the right partition (with the correct high_value and correctly inferred lower boundary) for the incoming data and behaves as if the intervening partitions will become available when they’re needed at some later point in time. So no need for DBAs to work out actual partition high_values, no need to ensure that all the partitions you need out into the future have been pre-created, no more batch processes crashing with Oracle error ORA-14400: inserted partition key does not map to any partition.

But there’s a surprising difference between traditional range partitioning and the newer interval partitioning that will increase CPU usage in some cases and may (though I haven’t yet investigated this in sufficient detail to create an example) lead to variations in execution plans.

To demonstrate the difference I’m going to create two tables with the same structure and content then run the same query against them and show you the resulting execution plans. Here’s the code to create and populate the tables:

rem
rem     Script:         interval_or_range.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem


create table t_interval(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
interval (numtoyminterval(1,'MONTH'))
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
        )
;

create table t_range(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy')),
                partition start_p2 values less than (to_date('01-Feb-2020','dd-mon-yyyy')),
                partition start_p3 values less than (to_date('01-Mar-2020','dd-mon-yyyy')),
                partition start_p4 values less than (to_date('01-Apr-2020','dd-mon-yyyy')),
                partition start_p5 values less than (to_date('01-May-2020','dd-mon-yyyy')),
                partition start_p6 values less than (to_date('01-Jun-2020','dd-mon-yyyy')),
                partition start_p7 values less than (to_date('01-Jul-2020','dd-mon-yyyy')),
                partition start_p8 values less than (to_date('01-Aug-2020','dd-mon-yyyy')),
                partition start_p9 values less than (to_date('01-Sep-2020','dd-mon-yyyy'))
        )
;

insert into t_range select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue  
;

insert into t_interval select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_RANGE',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_INTERVAL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve used composite partitioned tables in this example but the same anomaly appears with simple partitioning – which you can test by commenting out the “subpartion by ..” lines. The two tables have partitions defined to hold a month’s data. I’ve inserted a few rows into most of the partitions, and ensured that I haven’t accidentally attempted to insert data that falls outside the legal range of the table with the predefined partitions .

To show that the tables are nominally identical here’s the list of partitions with their high values and number of rows:

column table_name       format a15
column partition_name   format a15
column high_value       format a80

break on table_name skip 1

select
        table_name, partition_name, num_rows, high_value
from
        user_tab_partitions
where
        table_name in ('T_INTERVAL','T_RANGE')
order by
        table_name, partition_name
;


ABLE_NAME      PARTITION_NAME    NUM_ROWS HIGH_VALUE
--------------- --------------- ---------- --------------------------------------------------------------------------------
T_INTERVAL      START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10722              30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10723              29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10724              31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10725              30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10726              31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10727              30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10728              31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10729              28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

T_RANGE         START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P2                30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P3                29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P4                31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P5                30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P6                31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P7                30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P8                31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P9                28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA


18 rows selected.

With this setup we can now run a simple query against the two tables using a where clause that is an exact match for the partition definition for the July data:

set serveroutput off

select 
        count(*) 
from 
        t_range 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

select 
        count(*) 
from 
        t_interval 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

I’ve left in two versions of the date predicates – the “ANSI” style, and a strictly formatted “to_char()” style that is still the approached used most commonly in Oracle systems. The effect on the execution plans is the same for both variations of the date declaration, and here are the two execution plans – reported from an instance of 19.3.0.0:

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |         |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|   3 |    TABLE ACCESS FULL    | T_RANGE |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
---------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|            |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|*  3 |    TABLE ACCESS FULL    | T_INTERVAL |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("ORDER_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DATE"

It’s quite possible that you won’t notice the difference these plans, even though I’ve put them directly one below the other; and it’s very likely that most people wouldn’t think about the difference if they didn’t have the two plans available at the same time to compare.

If you can’t spot the difference it’s because I’ve deliberately arranged them in a way that might fool you into not looking carefully enough.

If you’ve spotted the difference it’s probably because you’ve examined the Predicate Information section. There isn’t one for the the t_range example but there is one for the t_interval example – and it was a little sneaky of me to make it easy for you to assume that the one set of predicates I showed was common to the two plans.

Since the plans have been pulled from memory (v$sql_plan) it seems likely that they are truthful and the query based on the interval partitioning is actually checking every row in the partition against two date values. In my example that’s not going to make much difference to performance, but in a datawarehouse with millions of rows per partition there’s scope for a noticeable increase in CPU between the two queries – especially since the block reads are likely to be direct path, bypassing much of the usual (CPU -intensive) buffer cache activity.

Interestingly when I checked the 10053 trace files for the two queries the CPU cost for the pair was identical. Normally you expect to see at least a little CPU cost (even if it’s only tens of units out of millions) for checking a predicate. This raises the question – does the optimizer allow a cost for the t_range table for an event that isn’t going to happen, or is it reporting an event that isn’t going to happen for the t_interval table. [Edit: easy enough to check by doing the test on a simple heap table that clones the data from that one partition, and checking the calculated cost with and without the predicate]

There’s a further ramification to this anomaly, relating to the question on the Oracle-L list server that prompted the investigation. What happens on Exadata where the predicates can be offloaded to storage?

The original question wasn’t actually about the appearance (or not) of the predicates, it was about an unexpected cardinality estimate for a query involving two equi-partitioned tables, so this note has gone completely off-topic from the question; but another strange detail about the predicates showed up when I suggested the creation of a column group on the join columns. A (redundant) predicate that had been offloaded to storage stopped being offloaded; here, from the original posting with the original tables, are the two different Predicate Information sections that appeared – the first without the column group, the second after the column group had been created and its stats collected:

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Predicate Information (identified by operation id):
---------------------------------------------------
   4 -
access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"

Notice how the storage() predicate that appears at operation 8 in the first set has disappeared from the second even though operation 6 manages to retain its storage() predicate throughout.

I’ve modelled a further example of odd behaviour using two pairs of tables – one pair using range/hash partitioning the other interval/hash partitioning. I won’t reproduce it here but the problem of redundant predicates appearing and then disappearing gets even stranger.

I haven’t yet produced an example where the unexpected predicate behaviour has affected the costs or cardinalities – but I’ve only spent a couple of hours playing around with well-formed examples: it’s possible that in badly formed examples (e.g. with statistical inconsistencies) the side effect could mean that two notionally identical queries produce different numbers and different plans because they end up with different predicates in the final transformed query.

Footnote

Following an email from David Kurtz, it occurred to me that I should have made it clear that the disappearance of predicates on the partition key is expected behaviour when the predicates are clearly synchronised with the partition boundaries. The behaviour for the interval partitioning is the oddity, the behaviour for the “normal” range partitioning is the standard.