Well, it surprised me!
I’ve said for a very long time that in principle, ignoring bugs and restrictions, the optimizer will always choose the lowest cost option during its search for an execution path. It turns out that this isn’t true. In a comment attached to a note I had written about a possible bug relating to function-based indexes I was told that there are cases where the optimizer follows a rule that allows it to ignore the lowest cost path if it is derived from a range-based predicate involving unpeekable bind variables.
The trouble is, any statement made about “bind variables” may also apply in any circumstances where the optimizer sees: “unknown value”. Here’s a simplified example that I find a little worrying (running on 11.1):
I think anyone who has read Wolfgang Breitling’s material about the optimizer will be familiar with the concept of Cardinality Feedback and one particular detail that when Oracle gets a cardinality estimate of one for a “driving” table then there’s a good chance that the execution plan will go wrong. (That’s not rule, by the way, just a fairly common observation after things have gone wrong.)
A recent note on OTN reminded me of a particular scenario where this specific problem can occur. It’s not particularly common, but it may hit people who are building data warehouses from multiple different sources. We start with an unlikely looking data set and very simple query:
drop table t1; create table t1 as select rownum id1, rownum id2 from all_objects where rownum <= 10000 ; execute dbms_stats.gather_table_stats(user,'t1'); set autotrace traceonly select count(*) from t1 where id1 = id2 ;
What do you think Oracle estimated cardinality will be for this predciate ? We know, because we saw the data being built, that we’re going to identify 10,000 rows. But the optimizer doesn’t see it that way – check line 2 of the execution plan. The optimizer thinks it will find just one row:
Here’s an odd little bug (I think) in the optimizer that showed itself in a thread on the OTN database forum some time ago. The optimizer can choose an index which is NOT the cheapest index for satisfying a query against a single table. Here’s the demo – which I’ve run against 11.1.0.6 using an 8KB block size, ASSM and system allocated extent sizes:
In the second note on my thesis that “all joins are nested loop joins with different startup costs” I want to look at hash joins, and I’ll start by going back to the execution plan I posted on “Joins – NLJ”. --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| [...]![]()
From time to time I’ve warned people that subquery factoring should be used with a little care if all you’re trying to do is make a query more readable by extracting parts of the SQL into “factored subqueries” (or Common Table Expressions – CTEs – if you want to use the ANSI term for them). [...]
A basic little post to conclude this discussion. The issues regarding whether to go for single column indexes vs. concatenated indexes are similar for Bitmap indexes as they are for B-Tree indexes. It’s generally more efficient to access a concatenated index as it’s only the one index with less processing and less throwaway rowids/rows [...]![]()
Query transformation is a set of techniques used by the optimizer to rewrite a query and optimizer it better. Few optimization paths open up to the optimizer after query transformation. Some query transformations must be costed to be chosen and some do not need to be costed. For example, if a table can be eliminated completely from the join, then that transformation is applied and need to cost that transformation is minimal.
Test case
We will use the following test case to illustrate the concepts behind Query transformation. Some of the optimizations that we see here works from version 11gR1 onwards and so, these test cases might not work in the versions 10g and below.
create table backup.t1 (n1 number not null primary key, n2 number not null, n3 varchar2(256) ); insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100); create table backup.t2 (n1 number not null primary key , n2 number not null, n3 varchar2(256) ); alter table backup.t1 add constrainT t1_fk foreign key (n2) references backup.t2(n1) insert into backup.t2 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100); insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);
Join elimination (JE)
I’ve been trying to find a way to post an executable for several weeks because I’ve been sent a simple viewer for 10053 trace files written by Hans-Peter Sloot of Atos Origin and Robert van der Ende. They wrote this viewer because trace files from event 10053 can be enormous, and scrolling back and fore [...]
Oracle introduced with the 10.2.0.4 patch set a significant change how non-existing values are treated when there is a frequency histogram on a column / expression. See Jonathan Lewis' blog post which is probably the most concise description of the issue. In a nutshell the change is about the following (quoted from Jonathan's post): "If the value you supply does not appear in the histogram, but is inside the low/high range of the histogram then the cardinality will be half the cardinality of the least frequently occurring value that is in the histogram".
I'm still a bit puzzled why Oracle introduced such a significant change to the optimizer with a patch set, but one of the most obvious reasons might be that the change allows to generate frequency histograms using a rather low sample size, because there is no longer a similar threat as before when the frequency histogram misses one of the existing values (which would then return an estimate of 1 if they didn't appear in the histogram).
In fact when using the default DBMS_STATS.AUTO_SAMPLE_SIZE Oracle exactly does this: It uses by default a quite low sample size to perform the additional runs required for each histogram - probably an attempt to minimize the additional work that needs to be done for histogram generation.
In it is however quite interesting to see how exactly this behaviour together with the new treatment of non-existing values can turn into a threat as a recent thread on OTN demonstrated.
Consider the following scenario: You have a column with a highly skewed data distribution; there is a single, very popular value, and a few other, very unpopular values.
As discussed in my previous post, it’s a silly myth to suggest a bitmap index should only be used for so-called “low cardinality” columns else the resultant index would be “huge”. I thought it might be worth a quick refresh to see a simple little demonstration why such claims are a nonsense. There is in fact no limit to [...]
Recent comments
1 week 12 hours ago
1 week 18 hours ago
4 weeks 48 min ago
11 weeks 6 days ago
14 weeks 6 days ago
19 weeks 3 days ago
19 weeks 3 days ago
19 weeks 5 days ago