Search

Top 60 Oracle Blogs

Recent comments

Subquery Factoring (9)

Several years go (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:


create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum <= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


============================
Unhinted - won't materialize
============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved. If I add the predicate “where object_id > 0” (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above. My tentative conclusion is that the transformation is a heuristic one that follows the rule “two or more appearances of the subquery and some indication of row selection in the subquery rowsource”. (In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table.