Search

Top 60 Oracle Blogs

Recent comments

DBMS_STATS

Frequency histogram – where did that estimate come from?

Frequency histograms in any version of Oracle are pretty cool things, so whenever you have a distribution of data that potentially skewed and the number of distinct values fall under the limit of how many histogram buckets you can have, then a frequency histogram is pretty much a no-brainer. In particular, if you don’t have a large number of distinct values, the nice thing is that you can nominate the largest bucket size possible, and we’ll only create the buckets required to contain the frequency information.

For example, I’ll create table with only 3 distinct values (1,3 and 5) and the distribution of the data is skewed. Then I’ll ask for a 254-bucket histogram, but you can see by querying USER_HISTOGRAMS that only 3 buckets were required to hold the histogram.

12.2.0.1 And Later Support (Limited) Extended Stats On Virtual columns / Column Groups of Expressions

I do have a demo as part of my optimizer related workshops that shows the restriction / limitation of DBMS_STATS not supporting extended statistics on virtual columns / group of expressions, so for example the combination of both expressions and column groups, like ((TRUNC(COL1)), (TRUNC(COL2))).

Surprisingly, when following a certain sequence of operation, this starts working (to some degree) from 12.2.0.1 on.

Single Value Column Frequency Histogram Oracle 12c and later

It is hopefully in the meantime well known that Oracle has introduced in version 11g a new algorithm to gather statistics on a table that no longer requires sorting for determining the critical Number Of Distinct Values (NDV) figure - it instead uses a clever "approximate NDV" algorithm which always reads 100% of the table data and therefore in principle generates very accurate statistics. This new algorithm gets used only when the ESTIMATE_PERCENT parameter to the DBMS_STATS.GATHER*STATS calls is left at default or explicitly passed as "DBMS_STATS.AUTO_SAMPLE_SIZE". This new algorithm is also required in case other new features like "Incremental Statistics" should be used.

In 12c Oracle improved this algorithm allowing the generation of Frequency and the new Top Frequency histogram types in a single pass. The new Hybrid histogram type still requires a separate pass.

No more stale statistics in 19c

There is an odd contradiction that we all encounter for most databases, especially if they are predominantly used during the business day. Here is how that contradiction comes to be – it is in the way that we obtain and use optimizer  statistics on those databases. The contradiction runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!

We can demonstrate this easily with the following script run in 18c.

Statistics on Load

One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features etc, but also, one of the simplest but coolest optimizer improvements when it comes to loading data. When you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action

Statistics on Object tables

Way back in Oracle 8.0 we introduced the “Object-Relational” database, which was “the next big thing” in the database community back then. Every vendor was scrambling to show just how cool their database technology was with the object-oriented programming paradigm.

Don’t get me wrong – using the Oracle database object types and features associated with them has made my programming life a lot easier over the years. But for me, it’s always been pretty much limited to that, ie, programming, not actually using the object types in a database design as such. Nevertheless, using objects as columns, or even creating tables of objects is supported by the database. For example, I can create a object type of MY_OBJECT (which could itself be made up of objects) and then have a table, not with that object as a column, but actually a table of that object.

EXPORT not GATHER with DBMS_STATS

Just a short post today on something that came in as a question for the upcoming Office Hours session which I thought could be covered quickly in a blog post without needing a lot of additional discussion for which Office Hours is more suited to.

The question was:

“When I gather statistics using DBMS_STATS, can I just create a statistic table and pass that as a parameter to get the results of the gather”

And the answer simply is “No” Smile but let me clear up the confusion.

How Not to Collect Optimizer Statistics in an Application Engine Program

I was told about a PeopleSoft customer experiencing an Oracle error when collecting statistics during an Application Engine.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ORA-06533: Subscript beyond count 
ORA-06512: at "SYS.DBMS_STATS"…

It is possibly a manifestation of a database bug. The workaround was not to use AUTO_SAMPLE_SIZE, so instead, this customer initially coded an explicit call to DBMS_STATS.GATHER_TABLE_STATS with a specific sample size.
This blog is not about the bug, but how to manage the workaround.

Managing Cost-Based Optimizer Statistics for PeopleSoft

I gave this presentation to UKOUG PeopleSoft Roadshow 2018

PeopleSoft presents some special challenges when it comes to collecting and maintaining the object statistics used by the cost-based optimizer.

Queue-based Concurrent Stats Prototype Implementation

This is just a prototype of a queue-based concurrent statistics implementation - using the same basic implementation I've used a a couple of years ago to create indexes concurrently.There are reasons why such an implementation might be useful - in 11.2.0.x the built-in Concurrent Stats feature might turn out to be not really that efficient by creating lots of jobs that potentially attempt to gather statistics for different sub-objects of the same table at the same time - which can lead to massive contention on Library Cache level due to the exclusive Library Cache locks required by DDL / DBMS_STATS calls.In 12.1 the Concurrent Stats feature obviously got a major re-write by using some more intelligent processing what and how should be processed concurrently - some of the details are exposed via the new view DBA_OPTSTAT_OPERATION_TASKS, but again I've seen it running lots of very