Search

Top 60 Oracle Blogs

Recent comments

histograms

Extended Histograms – 2

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Extended Histograms

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

exp catch

No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’re on an older (pre-12c) version of Oracle and still using exp/imp to do things like moving tables with their production statistics over to test systems then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).

exp/imp will mess up some of your histograms if you’re still using them to move tables/statistics in 12c.

Hybrid histograms

Just a quick post here so I could consolidate some information about histograms in 12c.

Histogram Threat

Have you ever seen a result like this:


SQL> select sql_id, count(*) from V$sql group by sql_id having count(*) > 1000;

SQL_ID		COUNT(*)
------------- ----------
1dbzmt8gpg8x7	   30516

A client of mine who had recently upgraded to 12.2.0.1 RAC, using DRCP (database resident connection pooling) for an application using PHP was seeing exactly this type of behaviour for a small number of very simple SQL statements and wanted to find out what was going on because they were also seeing an undesirable level of contention in the library cache when the system load increased.

In this note I just want to highlight a particular detail of their problem – with an example – showing how easily histograms can introduce problems if you don’t keep an eye out for the dangers.

One of their queries really was as simple as this:

Histograms

I had a sudden urge to go a bit meta – so here’s a relative frequency histogram of my observations of the general use of histograms in Oracle:

 

Column Stats

I’ve made several comments in the past about the need for being selective when gathering objects statistics with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) a Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.

Histogram Hassle

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious).  The client had a query with a predicate like the follwing:

Dynamic Sampling vs. Extended Statistics

On datawarehouse databases, I frequently recommend increasing the level of dynamic sampling because:

Histogram Upgrade – 2

While reading a blog post by Maria Colgan a couple of weeks ago I came across an observation about histograms that I’d not noticed before; worse still, it was a feature that seemed to make some “damage-limitation” advice I’d been giving for years a really bad idea! The threat appeared in these paragraphs:

Setting SIZE REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, repeat ensures a histogram will be created for a column that already has one on the global level.

What’s the down side to doing this?

The current number of buckets used in each histogram becomes the limit on the maximum number of buckets used for any histogram created in the future.