Search

Top 60 Oracle Blogs

Recent comments

Data Warehouse Design: Snowflake Dimensions and Lost Skew Trap

This post is part of a series that discusses some common issues in data warehouses. Originally written in 2018, but I never got round to publishing it.
While I was experimenting with the previous query I noticed that the cost of the execution plans didn't change as I changed the COUNTRY_ISO_CODE, yet the data volumes for different countries are very different.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">select c.country_name
, u.cust_state_province
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
from sales s
, customers u
, products p
, times t
, countries c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND u.country_id = c.country_id
AND c.country_iso_code = '&&iso_country_code'
AND p.prod_category_id = 205
and t.fiscal_year = 1999
GROUP BY c.country_name, u.cust_state_province
ORDER BY 1,2
/
#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 70%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 3095970037
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1473 (100)| | | | 45 |00:00:01.77 | 101K| | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.77 | 101K| | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | | | | | | | 0 |00:00:00.13 | 1889 | 1024 | 1024 | |
| * 3 | HASH JOIN | | 1 | 2413 | 94107 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.10 | 1888 | 1185K| 1185K| 639K (0)|
| * 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
| 6 | SORT GROUP BY | | 1 | 2359 | 101K| 1055 (1)| 00:00:01 | | | 45 |00:00:01.65 | 99111 | 6144 | 6144 | 6144 (0)|
| * 7 | HASH JOIN | | 1 | 3597 | 154K| 1054 (1)| 00:00:01 | | | 64818 |00:00:01.58 | 99111 | 2391K| 1595K| 2025K (0)|
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | 2413 | 62738 | 5 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
| 9 | VIEW | VW_ST_C525CEF3 | 1 | 3597 | 64746 | 1048 (1)| 00:00:01 | | | 64818 |00:00:01.44 | 99111 | | | |

Note:

  • There are 55500 rows on CUSTOMERS
  • There are 23 rows on COUNTRIES
  • Oracle expects 2413 rows on joining those tables
    • 55500÷23= 2413.04, so Oracle assumes the data is evenly distributed between countries, although there are histograms on COUNTRY_ISO_CODE and COUNTRY_ID. 
    • This is sometimes called 'lost skew'. The skew of a dimension does not pass into the cardinality calculation on the fact table.

If I replace the predicate on COUNTRY_ISO_CODE with a predicate on COUNTRY_ID then the estimate of the number of rows from customers is correctly 18520 rows. The cost of the star transformation has gone up from 1473 to 6922.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 70%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 1339390240

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 6922 (100)| | | | 45 |00:00:01.50 | 97998 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.50 | 97998 | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | | | | | | | 0 |00:00:00.06 | 1524 | 1024 | 1024 | |
| 3 | NESTED LOOPS | | 1 | 18520 | 651K| 417 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1523 | | | |
| 4 | TABLE ACCESS BY INDEX ROWID | COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 5 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | | | |
| 6 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.03 | 1521 | | | |
| 7 | SORT GROUP BY | | 1 | 2359 | 101K| 6505 (1)| 00:00:01 | | | 45 |00:00:01.43 | 96473 | 6144 | 6144 | 6144 (0)|
| 8 | HASH JOIN | | 1 | 82724 | 3554K| 6499 (1)| 00:00:01 | | | 64818 |00:00:01.37 | 96473 | 2391K| 1595K| 2002K (0)|
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | 18520 | 470K| 25 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |

In fact, I only get the star transformation if I force the issue with a STAR_TRANSFORMATION hint. Otherwise, I get the full scan plan which is much cheaper, but again the cardinality calculation on CUSTOMERS is correct.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 70%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 3784979335
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1595 (100)| | | | 45 |00:00:00.54 | 2065 | 472 | | | |
| 1 | SORT GROUP BY | | 1 | 45 | 3510 | 1595 (3)| 00:00:01 | | | 45 |00:00:00.54 | 2065 | 472 | 6144 | 6144 | 6144 (0)|
| 2 | HASH JOIN | | 1 | 81133 | 6180K| 1589 (3)| 00:00:01 | | | 64818 |00:00:00.43 | 2065 | 472 | 2337K| 2200K| 2221K (0)|
| 3 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.02 | 1521 | 0 | | | |
| 4 | HASH JOIN | | 1 | 81133 | 4516K| 1172 (3)| 00:00:01 | | | 110K|00:00:00.35 | 544 | 472 | 2546K| 2546K| 1610K (0)|
| 5 | TABLE ACCESS FULL | PRODUCTS | 1 | 26 | 208 | 3 (0)| 00:00:01 | | | 26 |00:00:00.01 | 4 | 0 | | | |
| 6 | HASH JOIN | | 1 | 229K| 10M| 1167 (3)| 00:00:01 | | | 246K|00:00:00.30 | 539 | 472 | 1133K| 1133K| 1698K (0)|
| 7 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
| 8 | NESTED LOOPS | | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
| 9 | TABLE ACCESS BY INDEX ROWID| COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
| 10 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | 0 | | | |
| 11 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
| 12 | PARTITION RANGE JOIN-FILTER | | 1 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.21 | 482 | 472 | | | |
| 13 | TABLE ACCESS FULL | SALES | 5 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.20 | 482 | 472 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------