# Star Transformation And Cardinality Estimates

If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.

Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:

drop table d;

purge table d;

drop table t;

purge table t;

create table t
as
select
rownum as id
, mod(rownum, 100) + 1 as fk1
, 1000 + mod(rownum, 10) + 1 as fk2
, 2000 + mod(rownum, 100) + 1 as fk3
from
dual
connect by
level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't')

create bitmap index t_fk1 on t (fk1);

create bitmap index t_fk2 on t (fk2);

create bitmap index t_fk3 on t (fk3);

create table d
as
select
rownum as id
, case when rownum between 1 and 100 then 'Y' else 'N' end as is_flag_d1
, case when rownum between 1001 and 1010 then 'Y' else 'N' end as is_flag_d2
, case when rownum between 2001 and 2100 then 'Y' else 'N' end as is_flag_d3
from
dual
connect by
level <= 10000
;

exec dbms_stats.gather_table_stats(null, 'd', method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 IS_FLAG_D1, IS_FLAG_D2, IS_FLAG_D3');

This is a simplified example of a model where multiple, potentially small, dimensions are stored in a single physical table and the separate dimensions are represented by views that filter the corresponding dimension data from the base table.

So we have a fact table with one million rows and a "collection" dimension table that holds three dimensions, among others.

In order to enable the star transformation bitmap indexes on the foreign keys of the fact table are created.

The dimension table has histograms on the flag columns to tell the optimizer about the non-uniform distribution of the column data.

Now imagine a query where we query the fact table (and possibly do some filtering on the fact table by other means like other dimensions or direct filtering on the fact table) but need to join these three dimensions just for displaying purpose - the dimensions itself are not filtered so the join will not filter out any data.

Let's first have a look at an execution plan of such a simply query with star transformation disabled:

select /*+ no_star_transformation */
count(*)
from
t f
, (select * from d where is_flag_d1 = 'Y') d1
, (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3
where
f.fk1 = d1.id
and f.fk2 = d2.id
and f.fk3 = d3.id
;

SQL> explain plan for
2 select /*+ no_star_transformation */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 77569906

----------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | HASH JOIN | | 940K|
|* 3 | TABLE ACCESS FULL | D | 100 |
|* 4 | HASH JOIN | | 945K|
|* 5 | TABLE ACCESS FULL | D | 100 |
|* 6 | HASH JOIN | | 950K|
|* 7 | TABLE ACCESS FULL| D | 10 |
| 8 | TABLE ACCESS FULL| T | 1000K|
----------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - filter("IS_FLAG_D3"='Y')
4 - access("F"."FK1"="D"."ID")
5 - filter("IS_FLAG_D1"='Y')
6 - access("F"."FK2"="D"."ID")
7 - filter("IS_FLAG_D2"='Y')

So clearly the optimizer got it quite right - the join to the dimensions is not going to filter out significantly - the slight reduction in rows comes from the calculations based on the histograms generated.

But now try the same again with star transformation enabled:

SQL> explain plan for
2 select /*+ star_transformation opt_param('star_transformation_enabled', 'temp_disable') */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 459231705

----------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | HASH JOIN | | 9 |
|* 3 | HASH JOIN | | 9 |
|* 4 | HASH JOIN | | 10 |
|* 5 | TABLE ACCESS FULL | D | 10 |
| 6 | TABLE ACCESS BY INDEX ROWID | T | 10 |
| 7 | BITMAP CONVERSION TO ROWIDS| | |
| 8 | BITMAP AND | | |
| 9 | BITMAP MERGE | | |
| 10 | BITMAP KEY ITERATION | | |
|* 11 | TABLE ACCESS FULL | D | 100 |
|* 12 | BITMAP INDEX RANGE SCAN| T_FK1 | |
| 13 | BITMAP MERGE | | |
| 14 | BITMAP KEY ITERATION | | |
|* 15 | TABLE ACCESS FULL | D | 100 |
|* 16 | BITMAP INDEX RANGE SCAN| T_FK3 | |
| 17 | BITMAP MERGE | | |
| 18 | BITMAP KEY ITERATION | | |
|* 19 | TABLE ACCESS FULL | D | 10 |
|* 20 | BITMAP INDEX RANGE SCAN| T_FK2 | |
|* 21 | TABLE ACCESS FULL | D | 100 |
|* 22 | TABLE ACCESS FULL | D | 100 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - access("F"."FK1"="D"."ID")
4 - access("F"."FK2"="D"."ID")
5 - filter("IS_FLAG_D2"='Y')
11 - filter("IS_FLAG_D1"='Y')
12 - access("F"."FK1"="D"."ID")
15 - filter("IS_FLAG_D3"='Y')
16 - access("F"."FK3"="D"."ID")
19 - filter("IS_FLAG_D2"='Y')
20 - access("F"."FK2"="D"."ID")
21 - filter("IS_FLAG_D1"='Y')
22 - filter("IS_FLAG_D3"='Y')

What an astonishing result: Not only Oracle will try now to access all rows of the fact table by single-block random I/O, which by itself can be a disaster for larger real-life fact tables, in particular when dealing with Exadata features like Smart Scans which are only possible with multi-block direct-path reads, but furthermore if this was part of a more complex execution plan look at the cardinality estimates: They are off by five orders of magnitude - very likely a receipt for disaster for any step following afterwards.

The point here is simple: The Star Transformation calculation model obviously doesn't cope with the "collection" of dimensions in a single table very well, but assumes a dimensional model where each dimension is stored in separate table(s). If you don't adhere to that model the calculation will be badly wrong and the results possibly disastrous.

Here the Star Transformation assumes a filtering on dimension tables that are effectively no filter but this is something the current calculation model is not aware of. If you put the three dimensions in separate tables no "artificial" filter is required and hence the calculation won't be mislead.

Of course one could argue that the star transformation optimization seems to do a poor job since the normal optimization based on the same input data produces a much better estimate, but at least for the time being that's the way this transformation works and the model chosen better reflects this.