A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.
We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 22.214.171.124).
Here’s a simple update statement that identifies a few rows in a table then updates a column where a matching value can be derived from another table – it’s an example of an update by correlated subquery:
update t1 set small_vc = ( select max(small_vc) from t2 where t2.id = t1.id ) where mod(id,100) = 0 and exists ( select null from t2 where t2.id = t1.id ) ;
Last year I wrote a few articles for Simpletalk, a web service created by Redgate for users of SQL Server. This year, Redgate is setting up a similar service called “All things Oracle” (I’ve added a link in my blogroll) for Oracle users, and I’ve volunteered to write articles for them occasionally.
Some of the stuff they publish will be complete articles on their website, some will be short introductions with links to the authors’ own websites. My first article for them has just been posted – it’s an article that captures a couple of key points from the optimizer presentation I did at the UKOUG conference a couple of weeks ago.
In our application we extensively use a function-based index on an important table. Couple of days ago I’ve seen an interesting issue associated with this FBI, view and a GROUP BY query. I have to say I don’t have an explanation what exactly it is and how I should call it properly, hence just “trouble” in the subject line.
Here are a few thoughts on dbms_stats – in particular the procedure gather_index_stats.
The procedure counts the number of used leaf blocks and the number of distinct keys using a count distinct operation, which means you get an expensive aggregation operation when you gather stats on a large index. It would be nice efficiency feature if Oracle changed the code to use the new Approximate NDV mechanism for these counts.
If you consider the usage of Table Functions then you should be aware of some limitations to the optimizer calculations, in particular when considering a join between a Table Function and other row sources.
As outlined in one of my previous posts you can and should help the optimizer to arrive at a reasonable cardinality estimate when dealing with table functions, however doing so doesn't provide all necessary inputs to the join cardinality calculation that are useful and available from the statistics when dealing with regular tables.
Therefore even when following the recommended practice regarding the cardinality estimates it is possible to end up with some inaccuracies. This post will explain why.
Join Cardinality Basics
And you should remember that. Here is a nice example how Cost Based Optimizer can miss an obvious option (which is available to human eye and Oracle run-time with a hint) while searching for the best plan. CBO simply doesn’t consider Index Skip Scan with constant ‘in list’ predicates in the query, although it costs skip scan for a join. Such bits are always popping up here and there, so you just can’t say “The Cost Based Optimizer examines all of the possible plans for a SQL statement …”, even if Optimizer Team tells you CBO should do so. There will always be places where CBO will do less than possible to come to the best plan and will need a help from your side, such as re-written SQL or a hint.
If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.
Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:
purge table d;
drop table t;
purge table t;
create table t
rownum as id
, mod(rownum, 100) + 1 as fk1
, 1000 + mod(rownum, 10) + 1 as fk2
, 2000 + mod(rownum, 100) + 1 as fk3
, rpad('x', 100) as filler
level <= 1000000
exec dbms_stats.gather_table_stats(null, 't')
create bitmap index t_fk1 on t (fk1);
As many have identified, the first thing to point out is that the two queries are not exactly equivalent. The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <’ predicate. The additional equal conditions at each end is significant. The selectivity of the original query is basically costed [...]
I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column. OK, I now select 1 day’s worth of data: [...]