Search

Top 60 Oracle Blogs

Recent comments

CBOddity

Warning – make sure you read to the end of this post.

Someone sent me an email this morning asking how Oracle calculates the index cardinality of an index range scan. Now, as I’ve often said, I don’t do private email about Oracle – unless it just happens to catch my attention and looks like something that is sufficiently generic to be worth publishing.

Today’s emailer was a lucky winner – he’d sent me a very short email that took about 30 seconds to read, contained a significant error, and (at first sight) probably had the right information in it for me to work with. Here’s the problem, as a cut-n-paste from an SQL*Plus session:

SQL> explain plan for select * from admlxa.QRT_BENCH where QRT_BENCH_DATE < :a3;
Explained. 

Elapsed: 00:00:00.01 

SQL> select * from table(dbms_xplan.display); 

PLAN_TABLE_OUTPUT
-----------------
Plan hash value: 2896103184 

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |   424K|    81M|  3170   (4)| 00:00:05 |
|   1 |  TABLE ACCESS BY INDEX ROWID| QRT_BENCH       |   424K|    81M|  3170   (4)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN          | IDX_QRT_BENCH_1 | 78876 |       |   303   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("QRT_BENCH_DATE"<:A3) 

Look closely at the “Rows” column – there’s clearly a logic error appearing here. If you select 78,876 rowids from an index you can’t possibly acquire 424,000 rows from the table – so where have those two numbers come from ?

The supporting information looked like this:

num_rows for QRT_BENCH table = 8480798
num_distinct for QRT_BENCH_DATE column = 458 

num_rows for IDX_QRT_BENCH_1 = 8763975
distinct_keys for IDX_QRT_BENCH_1 = 537

Of course, I really needed to know whether this was a single-column or multi-column index if I wanted to model the problem correctly and do further checks on when the error appeared, but this was good enough to get started. We note, first, that the 424K for the table cardinality is just the standard “5% due to range predicate with unknown value”: round(0.05 * 8480798) = 424040.

    Step 1: since I don’t know where the number 78876 comes from, let’s try to work backwards – use it in a bit if arithmetic and see what drops out. Let’s try dividing it into the table cardinality: 424040 / 78876 = 5.3760

    Step 2: Does 5.376 look familiar — it may be a coincidence, but isn’t that really close to 1% of the number of distinct keys in the index ?

At this point I don’t have any time to investigate in detail, but a scratch hypothesis is that Oracle is calculating something like: 5 * (number of rows in table / (number of distinct keys in index)); and maybe that magic five appears through a piece of code that takes 5%, but for some reason then divides by the 1% associated with the selectivity normally associated with function(col).

If I had the time (and the data set) I’d start playing with dbms_stats.set_index_stats() et.al. to see the effect of changing num_distinct and num_rows to see if my initial guess was somewhere in the right ballpark. As it is I’ve just emailed this note back to the source.

Erratum:

Before I published this note I got a reply from the original correspondent, with the following comment:

Sure enough – five years ago I had published some details about exactly this “feature” – and that 537 vs. 5.376 really was just a coincidence.

I decided to publish the note anyway for three reasons –

    one: to make sure you realise that I do make mistakes
    two: to show you that simple games with numbers may give you a working hypothesis
    three: to remind you that once you’ve got a working hypothesis it’s often easy to think of ways to demonstrate that your hypothesis is wrong. (A couple of hacks of the statistics would have shown me a constant 0.009 appearing, rather than anything like 5% divided by 1%.)