Here’s an odd little detail about the statistics of column groups. At first glance it’s counter-intuitive but it’s actually an “obvious” (once you’ve thought about it for a bit) consequence of the approximate_ndv() algorithm for gathering stats.
I’ll present it as a question:
I have a table with two columns: flag and v1. Although the column are not declared as non-null neither holds any nulls. If there are 26 distinct values for flag, and 1,000,000 distinct values for v1, what’s the smallest number of distinct values I should see if I create the column group (flag, v1) ?
This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.
A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.
This note, then, is just a summary of the five minute that I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan wass telling us.
A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:
This note is a variant of a note that I wrote a few months ago about the impact of nulls on column groups. The effect showed up recently on a client site with a little camouflage that confused the issue for a little while, so I thought it would be worth a repeat. We’ll start with a script to generate some test data:
Sometimes a good thing becomes at bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:
I seem to have written a number of aricles about column groups – the rather special, and most useful, variant on extended stats. To make it as easy as possible to find the right article I’ve decided to produce a little catalogue (catalog) of all the relevant articles, with a little note about the topic each article covers. Some of the articles will link to others in the list, and there are a few items in the list from other blogs. There are also a few items which are the titles of drafts which have been hanging around for the last few years.
A little while ago I added a postscript about gathering stats on a virtual column to a note I’d written five years ago and then updated with a reference to a problem on the Oracle database forum that complained that stats collection had taken much longer after the addition of a function-based index. The problem related to the fact that the function-based index was supported by a virtual column that used an instr() function on a CLOB (XML) column – and gathering stats on the virtual column meant applying the function to every CLOB in the table.
So my post-script, added about a month ago, suggested adding a preference (dbms_stats.set_table_prefs) to avoid gathering stats on that column. There’s a problem with this suggestion – it doesn’t work
Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.
We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).
Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:
select c2, c3, count(*) from t1 group by c2, c3 ; C C COUNT(*) - - ---------- N Y 1994 Y N 71482 2 rows selected.
The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:
where c2 = 'N' and c3 = 'N'
Here are 4 tests you might try:
There’s a question on the ODC database forum about column groups that throws up an interesting side point. The OP is looking at a query like the following and asking about which column groups might help the optimizer get the best plan:
Recent comments
2 years 51 weeks ago
3 years 11 weeks ago
3 years 15 weeks ago
3 years 16 weeks ago
3 years 21 weeks ago
3 years 42 weeks ago
4 years 10 weeks ago
4 years 40 weeks ago
5 years 24 weeks ago
5 years 24 weeks ago