Search

Top 60 Oracle Blogs

Recent comments

Subquery with OR

I’ve written a couple of notes in the past about the problems of optimising queries with predicates of the form “or exists {subquery}”. A recent question on the Oracle Developer Community forum brought to my attention an improvement in this area in (very precisely) 12.2, as well as giving me a cute example of how the first cut of a new feature doesn’t always cover every detail, and creating a nice example of how the new technology enhances the old technology.

We start with some data and a simple query running under 12.2.0.1:

rem
rem     Script:         exists_with_or_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2  -- feature not implemented
rem

create table cat_contact(
        contact_method_id       varchar2(1) not null,
        contact_id              number(8,0) not null,
        small_vc                varchar2(10),
        padding                 varchar2(100)
);

alter table cat_contact add constraint cc_pk primary key(contact_id);
create index cc_i1 on cat_contact(contact_method_id);

insert into cat_contact
select
        chr(64 + case when rownum <= 10 then rownum else 26 end),
        rownum,
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_contact where contact_method_id in ('A','B','C');

create table cat_item(
        contact_id      number(8,0) not null,
        item_category   varchar2(1) not null,
        small_vc        varchar2(10),
        padding         varchar2(100),
        constraint ci_ref_cc foreign key(contact_id) references cat_contact
)
;

alter table cat_item add constraint ci_pk primary key(contact_id, item_category);
create index ci_i1 on cat_item(item_category);

insert into cat_item 
select
        rownum,
        chr(88 + case when rownum <= 10 then mod(rownum,2) else 2 end),
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_item where item_category in ('X','Y');

execute dbms_stats.gather_table_stats(user,'cat_contact')
execute dbms_stats.gather_table_stats(user,'cat_item')

I’ve created and populated two tables (the table and column names come from the ODC thread). There’s a foreign key relationship defined between cat_item and cat_contact, both tables have primary keys declared, with a couple of extra columns declared not null.

I’ve populated the two tables with a small amount of data and each table has one column rigged with very skewed data:

  • cat_contact.contact_method_id is mostly ‘Z’ with one row each of ‘A’ to ‘J’ ,
  • cat_item.item_category (the second column in the primary key) is mostly ‘Z’ with 5 rows each of ‘X’ and ‘Y’

After populating each table I’ve queried it in a way which means the subsequent stats gathering will create frequency histograms on these two columns and the optimizer will be able to take advantage of the skew in its arithmetic, which means it may choose to use the indexes I’ve created on those skewed columns if the right values appear in the queries.

So here’s the query we’re interested in:

SELECT  /*+ 
                qb_name(main) 
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

Here’s the default execution plan (in 12.2.0.1 with my settings for system stats and various other optimizer-related figures that MIGHT make a difference) pulled from memory after executing the query to return 10 rows.


-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |    34 (100)|          |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | CAT_CONTACT | 10000 |  1152K|    34   (6)| 00:00:01 |
|   3 |   INLIST ITERATOR   |             |       |       |            |          |
|*  4 |    INDEX UNIQUE SCAN| CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("C"."CONTACT_METHOD_ID") OR  IS NOT NULL))
   4 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR
              "I"."ITEM_CATEGORY"='Y')))

For every row in the cat_contact table Oracle has checked whether or not the contact_method is an ‘A’, ‘B’, or ‘C’ and passed any such rows up to its parent, for all other rows it’s then executed the subquery to see if the row with the matching contact_id in contact_item has an ‘X’ or ‘Y’ as the item_category. It’s had to run the subquery 9,997 times (there were only three rows matching ‘A’,’B’,’C’) and the INLIST ITERATOR at operation 3 means that it’s probed the index nearly 20,000 timtes. This does not look efficient.

I’ve said in previous articles that when you need to optimize queries of this shape you need to rewrite them as UNION ALL queries to separate the two parts of the complex OR predicate and then make sure that you don’t report any items twice – which you do by making use of the lnnvl() function. So let’s do this – but let’s do it the lazy “new technology” way by upgrading to 19c and executing the query there; here’s the plan I got in 19.3.0.0:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

The optimizer has used the new “cost-based OR-expansion” transformation to rewrite the query as a UNION ALL query. We can see an efficient access into cat_contact to identify the ‘A’,’B’,’C’ rows, and then we can see that the second branch of the union all handles the existence subquery but the optimizer has unnested the subquery to select the 10 rows from cat_item where the item_category is ‘X’ or ‘Y’ and used those rows in a nested loop to drive into the cat_contact table using the primary key. We can also see the use of the lnnvl() function in operation 13 that ensures we don’t accidentally report the ‘A’,’B’,’C’ rows again.

So let’s go back to 12.2.0.1 and see what happens if we just add the /*+ or_expand(@main) */ hint to the query. Here’s the resulting execution plan:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

We get exactly the plan we want – with the same cost as the 19c cost, which happens to be less than half the cost of the default plan that we got from 12.2.0.1. So it looks like there may be case where you will need to hint OR-expansion because is might not appear by default.

Other Observations 1 – ordering

You may have noticed that my query has, unusually for me, put the existence subquery first and the simple filter predicate second in the where clause. I don’t like this pattern as (over time, and with different developers modifying queries) it’s too easy in more complex cases to “lose” the simple predicate; a one-liner can easily drift, change indents, get bracketed with another predicate that it shouldn’t be connected with and so on. I’ve actually seen production systems producing wrong results because little editing accidents like this (counting brackets is the classic error) have occured – so I’m going to rerun the test on 12.2.0.1 with the predicates in the order I would normally write them.

Here’s the “corrected” query with its execution plan:


SELECT  /*+ 
                qb_name(main) 
                or_expand(@main)
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')  
        OR
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        )
;  


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    16 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    16   (7)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   4 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   5 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   6 |       INLIST ITERATOR                     |                 |       |       |            |          |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|*  9 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|  10 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                 |                 |       |       |            |          |
|  12 |     INLIST ITERATOR                       |                 |       |       |            |          |
|  13 |      TABLE ACCESS BY INDEX ROWID BATCHED  | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN                    | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|  15 |     INLIST ITERATOR                       |                 |       |       |            |          |
|* 16 |      INDEX UNIQUE SCAN                    | CI_PK           |     1 |     6 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
   9 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  11 - filter(LNNVL( IS NOT NULL))
  14 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  16 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')))

The execution plan has jumped from 14 lines to 17 lines, the cost has gone up from 14 to 16, and both branches of the plan now report access to cat_contact and cat_item (though only through its primary key index in the second branch). What’s happened?

Oracle 12.2.0.1 has rewritten the query as a UNION ALL working from the bottom up – so in this case the first branch of the rewrite handles the original filter subquery, unnesting it to drive efficient from cat_item to cat_contact. This means the second branch of the rewrite has to find the ‘A’,’B’,’C’ rows in cat_contact and then check that the filter subquery hadn’t previously reported them – so the optimizer has applied the lnnvl() function to the filter subquery – which you can nearly see in the Predicate Information for operation 11.

To make it clearer, here’s what you get as the predicate information for that operation after calling explain plan and dbms_xplan.display()

  11 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

In 12.2 the order of predicates in your query seems to be important – unless told otherwise the optimizer is working from the bottom-up (then rewriting top-down). But there is hope (though not documented hope). I added the /*+ or_expand(@main) */ hint to the query to force OR-expansion. Checking the Outline Information of the plan I could see that this had been expanded to /*+ or_expand(@main (1) (2)) */. Taking a wild guess as the significance of the numbers and changing the hint to /*+ or_expand(@main (2) (1) */ I re-ran the test and back to the more efficient plan – with the filter subquery branch appearing second in the UNION ALL view and the lnnvl() applied to the simpler predicate.

So the OR-expansion code is not fully cost-based in 12.2.0.1, but you can modify the behaviour through hinting. First to force it to appear (which may not happen even if it seems to be the lower cost option), and secondly to control the ordering of the components of the UNION ALL. As with all things relating to hints, though, act with extreme caution: we do not have sufficient documentation explaining exactly how they work, and with some of them we don’t even know whether the code path is even complete yet.

Other Observations 2 – 12cR1

The or_expand() hint and cost-based OR-expansion appeared specifically in 12.2.0.1; prior to that we had a similar option in the use_concat() hint and concatenation – which also attempts to rewrite your query to produce a union all of disjoint data sets. But there are restrictions on what concatentation can do. I rarely remember what all the restrictions are, but there are two critical restrictions:

  • first, it will only appear by default if there is an indexed access path available to drive every branch of the rewrite
  • secondly, it will not apply further transformations to the separate branches that it produces

If we try adding the or_expand() hint to our query in 12.1.0.2 it will have no effect, so let’s add a suitable use_concat() hint and see what happens:

explain plan for
SELECT  /*+ 
                qb_name(main) 
                use_concat(@main 8 or_predicates(1))
--              use_concat(@main   or_predicates(1))
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             | 10000 |  1152K|    40   (3)| 00:00:01 |
|   1 |  CONCATENATION                        |             |       |       |            |          |
|   2 |   INLIST ITERATOR                     |             |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |
|*  5 |   FILTER                              |             |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | CAT_CONTACT |  9997 |  1151K|    35   (6)| 00:00:01 |
|   7 |    INLIST ITERATOR                    |             |       |       |            |          |
|*  8 |     INDEX UNIQUE SCAN                 | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C')
   5 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))
   6 - filter(LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C'))
   8 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

26 rows selected.

As you can see by forcing concatentation I’ve got my “union all” view with lnnvl() applied in the second branch. But the second branch was the “select where exists()” branch and the optimizer has not been able (allowed?) to do the unnesting that would let it drive efficiently from the cat_item table to the cat_contact table. The effect of this is that the plan still ends up with a full tablescan of cat_contact running a filter subquery on virtually every row- so concatenation doesn’t save us anything.

The significance of the “8” in the hint, by the way is (I believe) that it tells the optimizer to use inlist iterators when possible. If I had omitted the “8” the plan would have had 4 branches – one each for ‘A’, ‘B’, and ‘C’ and the fourth for the filter subquery. I could also have added a hint /*+ use_concat(@subq or_predicates(1)) */ to replace operations 7 and 8 with a single index range scan with a filter predicate for the ‘X’/’Y’ check (and that might, in any case, be slightly more efficient than the iteration approach).

Footnote(s)

The “legacy” OR-expansion (“concatenation” a.k.a. LORE in the optimizer trace file) can be controlled through the hints use_concat(), and no_expand().

The new cost-based OR-expansion (now ORE in the optimizer trace file) can be controlled through the hints or_expand() and no_or_expand().

The new cost-based OR-expansion has some restrictions, for example it is explicitly blocked in a MERGE statement, even in 19c, as reported in this blog note by Nenad Noveljic. As the blog note shows, concatenation is still possible but you (may) have to disable cost based OR-expansion.

I scanned the executable for the phrase “ORE: bypassed” to see if there were any messages that would suggest other reasons why cost-based OR-expansion would not be used; unfortunately the only relevant string was “ORE: bypassed – %s” [update (see comment 5 below): after ignoring case there was a second option: “ORE: Bypassed for disjunct chain: %s.”] – in other words all the possible bypass messages would be filled in on demand. I found a list of messages that might be relevant; I’d be a little cautious about trusting it but if you don’t see the feature appearing when you’re expecting it then it might be worth checking whether one of these could apply.

  • Old OR expansion hints present
  • Semi join hint present
  • QB has FALSE predicate
  • QB marked for NO Execution
  • Full Outer join QB
  • Rownum found in disjunction
  • Anti/semi/outer join in disjunction
  • Opposite Range in disjunction
  • No Index or Partition driver found
  • Predicate chain has all constant predicates
  • Negated predicate found
  • Long bitmap inlist in OR predicate
  • PRIOR expression in OR predicate
  • All 1-row tables found
  • No valid predicate for OR expansion
  • Disjunctive subquery unnesting possible
  • Subquery unnesting possible
  • Subquery coalesced query block
  • Merge view query block

Finally – here’s another reference blog note comparing LORE with ORE from Mohamed Houri.