Search

OakieTags

Who's online

There are currently 0 users and 39 guests online.

Recent comments

Affiliations

statistics

Oracle SQL's MEDIAN Function

Article #3 in my ongoing series covering SQL statistic functions in Oracle Database is now up. The topic is the median:
 
MEDIAN: For When You Don't Really Mean It
 
Median is useful in typifying a data set when the data might be skewed, or in the presence of extreme outliers. For example, the U.S. Census Bureau reports median household income for states and counties so as paint a picture unskewed by the presence of, say, Bill Gates or Warren Buffet living just down the street. To learn more, hit the link.

Standard Deviation and the Mean

I've just put up the second in an ongoing series (I hope!) of articles on Oracle SQL's build-in statistical functions. The topic is standard deviation. The previous one, my first, is on the mean. Here are links to the two:
 
2. STDDEV: Standing Sentinel on Your Data
 
1. AVG: What Does it Mean?

Fixed stats

Some time ago I had two questions about fixed objects statistics for which I couldn’t quickly find the answers. Questions are:

At the time of asking these question I’ve read CBO development team blog post on the topic, but still I was unable to answer them distinctly. Well, it appears the questions are simple and could probably be deduced after careful reading (first two questions for sure). The core phrase is:

The DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure gathers the same statistics as DBMS_STATS.GATHER_TABLE_STATS except for the number of blocks

Stale Stats

The question of how to list objects in need of stats collection appeared on the OTN database forum today from a DBA who had a system collecting stats at the weekend, but wanted to check whether there were any objects with stale stats during the week. It’s actually very easy to do this check because the schema and database calls to gather stats have an option to “list stale”; they also allow you to “list empty”, and “list auto” – the latter being the objects that would be subject to collection if you change the option to (the default) “gather”. “List auto” seems to be the concatenation of “list stale” and “list empty”, by the way.

So, after reading up the manual pages on dbms_stats ($ORACLE_HOME/rdbms/admin/dbmsstat.sql, or the PL/SQL Packages reference) all it takes is a little wrapping to do the job. The code below is a little overkill because it covers 6 different options in one go:

Maintaining Optimizer Statistics on PeopleSoft on Oracle 11g

I have been considering how to collect optimizer statistics for a PeopleSoft system running on an Oracle 11g database.  Despite 11g being several years old, most of my current customers are still using 10g, though some are looking at the upgrade to 11g.  I believe a slightly different approach is required.

In 2009, I wrote a series of blog postings on the subject of collecting statistics.  However these were all based on Oracle 10g.  I proposed a PL/SQL package that would use meta-data in a database table to determine how to collect statistics on a table, or deliberate supress collection of statistics.

Maintaining Optimizer Statistics on PeopleSoft on Oracle 11g

I have been considering how to collect optimizer statistics for a PeopleSoft system running on an Oracle 11g database.  Despite 11g being several years old, most of my current customers are still using 10g, though some are looking at the upgrade to 11g.  I believe a slightly different approach is required.

In 2009, I wrote a series of blog postings on the subject of collecting statistics.  However these were all based on Oracle 10g.  I proposed a PL/SQL package that would use meta-data in a database table to determine how to collect statistics on a table, or deliberate supress collection of statistics.

Minimum stats

Occasionally I come across complaints that dbms_stats is not obeying the estimate_percent when sampling data and is therefore taking more time than it “should” when gathering stats. The complaint, when I have seen it, always seems to be about the sample size Oracle chose for indexes.

There is a simple but (I believe) undocumented reason for this: because indexes are designed to collate similar data values they are capable of accentuating any skew in the data distribution, which means a sample taken from a small number of leaf blocks can be highly misleading as a guide to the whole index – so Oracle aims for a minimum sample size for gathering index stats.

Histogram Generation

This really could be published in the Errata and Addenda of “Cost Based Oracle – Fundamentals”, but it’s more convenient to publish the whole thing here and just add a link to the Errata pages.

In chapter 7, on page 156, I gave an example of the type of SQL that Oracle runs (in the dbms_stats package) to generate a histogram. A sample of the code, and the plan from the 9.2.0.8 tkprof output, is listed below:

Exchange Partition, Virtual Columns And Column Statistics

Here is an odd bug that can lead to some nasty side effects when using the EXCHANGE PARTITION technique. It is probably there for a very long time, simply because it depends on the usage of virtual columns, and the basic technique of virtual columns was introduced way back in the Oracle 8i times with the introduction of Function Based Indexes.

The problem isn't the exchange partition operation itself, but the accompanying swap of object statistics information, in particular the column statistics.

Look the following sequence of DDL and DML commands and pay then special attention to the output for the column statistics before and after the EXCHANGE PARTITION operation:

Unique Fail

As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan, for example (running on 10.2.0.3 – but the same type of thing happens on newer versions):