Here’s an important thought if you’ve got any large tables which are purely hash partitioned. As a general guideline you should * not *need partition level stats on those tables. The principle of hash partitioned tables is that the rows are distributed uniformly and randomly based on the hash key so, with the assumption that the number of different hash keys is “large” compared to the number of partitions, any one partition should look the same as any other partition.

Consider, as a thought experiment *(and as a warning)*, a table of * product_deliveries* which is hash partitioned by

Consider a query for *“deliveries to Basingstoke”* – in the absence of a histogram on the delivery location the optimizer will produce a cardinality estimate that is:

- total rows in table / number of distinct delivery locations in table

Now consider a query for: *“deliveries of product X to Basingstoke”* – again in the absence of histograms. The optimizer could have two ways of calculating this cardinality:

- total rows in table / (number of distinct products in table * number of distinct delivery locations in table)
- total rows in relevant partition / (number of distinct products in relevant partition * number of distinct delivery locations in relevant partition)

But given the * intent *of hash partitioning to distribute data evenly we can make three further observations:

- the number of rows in any one partition should be very similar to the number of rows in the table divided by the number of partitions
- the number of distinct products in any one partition should be very similar to the number of products in the table divided by the number of partitions
- the number of distinct locations in any one partition should be very similar to the number of distinct locations in the
.**whole table**

The second condition holds because product is the partition key, the third holds because location is * not *the partition key.

So we can rewrite the second, partition-oriented, formula as:

- (total rows in table / number of partitions) / ((number of distinct products in table / number of partitions) * number of distinct locations in table)

which, re-arranging parentheses and cancelling common factors, reduces to:

- total rows in table / (number of distinct products in table * number of distinct locations in table)

which matches the first formula. (Q.E.D.) In the absence of any statistics on hash partitions the optimizer can (ought to be able to) produce reasonable cardinality estimates based purely on table-level stats.

In fact if you look back into the history of partitioning this observation is implicit in the early days of composite partitioning when the only option was for * range/hash* composite partitions – the optimizer never used sub-partition stats to calculate costs or cardinality it used only partition-level statistics. (And it was several years before the optimizer caught up to the fact that (e.g.) range/list composite partitioning might actually need to do arithmetic based on subpartition stats.)

I did say that the example was also a warning. Hash partitioning is “expected” to have a large number of distinct key values compared to the number of partitions. (If you don’t meet this requirement then possibly you should be using * list* partitioning). There’s also a “uniformity” assumption built into the arithmetic (both the basic arithmetic and the hand-waving discussion I produced above). Just imagine that your company supplies a handful of products that for some strange reason are incredibly popular in Basingstoke. If this is the case then the assumption that

A pattern of this type has two generic effects on the optimizer, of course. First is the simple skew in the data – to have a significant impact the number of rows for the problem products would have to be * much larger* than average, which suggests the need for a suitably crafted histogram; secondly there’s an implied correlation between a few products and Basingstoke, so you might even end up creating a column group and manually coding a histogram on it to capture the correlation.

- February 2020 (57)
- January 2020 (88)
- December 2019 (46)
- November 2019 (45)
- October 2019 (64)
- September 2019 (38)
- August 2019 (40)
- July 2019 (45)
- June 2019 (37)
- May 2019 (43)
- April 2019 (49)
- March 2019 (61)
- February 2019 (25)
- January 2019 (35)
- December 2018 (39)
- November 2018 (55)
- October 2018 (75)
- September 2018 (36)
- August 2018 (89)
- July 2018 (58)
- June 2018 (61)
- May 2018 (66)
- April 2018 (40)
- March 2018 (61)
- February 2018 (67)
- January 2018 (61)
- December 2017 (37)
- November 2017 (47)
- October 2017 (57)
- September 2017 (46)

## Recent comments

2 years 4 weeks ago

2 years 17 weeks ago

2 years 21 weeks ago

2 years 22 weeks ago

2 years 26 weeks ago

2 years 47 weeks ago

3 years 16 weeks ago

3 years 45 weeks ago

4 years 30 weeks ago

4 years 30 weeks ago