Search

Top 60 Oracle Blogs

Recent comments

Clustering_Factor

Originally drafted July 2018

“How do you find out what the clustering_factor of an index would be without first creating the index ?”

I’m not sure this is really a question worth asking or answering[1], but since someone asked it (and given the draft date I have no idea who, where, when or why), here’s an answer for simple heap tables in the good old days before Oracle made public the table_cached_blocks preference. It works by sorting the columns you want in the index together with the table rowid, and then comparing the file/block component of the rowid (cutting the relevant characters from the string representation of the rowid) with the previous one to see if the current row is in the same block as the previous row.  If the row is in a different block we count one, otherwise zero.  Finally we sum the ones.

In the demo I’ve copied a little data from view all_objects, with the intention of creating an index on object_name. After running my counting query I’ve then created the index and checked its clustering_factor to show that I’ve got a match.


rem
rem     Script: clustering_factor_est.sql
rem     Author: J.P.Lewis
rem     Dated:  July 2018
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 10000
;

prompt  ====================
prompt  Tablescan with lag()
prompt  ====================

select
        sum(count_chg)
from    (
        select
                case
                        when substrb(rowid,1,15) <> lag(substrb(rowid,1,15),1,'000000000') over(order by object_name, rowid)
                                then 1
                                else 0
                end     count_chg
        from
                t1
        where
                object_name is not null
        )
;

prompt  ======================
prompt  create index and check
prompt  ======================

create index t1_i1 on t1(object_name);

select  clustering_factor
from    user_indexes 
where   table_name = 'T1'
and     index_name = 'T1_I1'
;

Pasting the output from running the above:


Table created.

====================
Tablescan with lag()
====================

SUM(COUNT_CHG)
--------------
          3901

1 row selected.

======================
create index and check
======================

Index created.


CLUSTERING_FACTOR
-----------------
             3901

1 row selected.


This will work for a global index on a partitioned table, but will give meaningless answers for globally partitioned indexes and local indexes. Furthermore it’s not a mechanism that lends itself to calculating the clustering_factor if you’ve set the table_cached_blocks preference to a value other than 1.

[1] Given the amount of work it would take to run the query to check the clustering_factor you might as well create the index (invisible, perhaps and nologging if that doesn’t cause problems) – which also gives you the option for setting the table_cached_blocks and gathering_stats to see how the clustering_factor varies.

Update (shortly after posting)

It has occurred to me that if you have a “nosegment” index that you’ve been using to test whether or not the optimizer would use it IF you created it, then the one piece of information that is missing from the data dicionary for that index is its clustering_factor (For a multicolumn index you can get a valid value for distinct_keys by creating a column group on the set of columns – which would otherwise be faked when you create the nosegment index.) This code might allow you to write a clustering_factor to the index definition that would give you a better idea of whether the optimizer would use it if you created it.  (To be tested.)

Update (a few days later)

I’ve modified the principle query in three ways:

  • changed the substr() to substrb()
  • changed the extract from the rowid to be (1,15) instead of (7,9) – so that I have the (data_object_id, file_id, block_id) not just (file_id, block_id)
  • added a (redundant) predicate  “object_name is not null” to the inline query