Search

Top 60 Oracle Blogs

Recent comments

FBI or Virtual

This note has has been sitting with the other 800 drafts since some time in May 2019, and started with a comment about following on from “a recent talk on how to engineer indexes properly”. Unfortunately I don’t remember when I wrote it, or why it came about.I mention this only because the note shows you how you can run into irritating limitations when you’re trying to do things properly.

First, a little script to generate some highly skewed data:

rem
rem     Script:         fbi_or_virtual_desc.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem
rem     Tested on:
rem       19.2.0.0
rem       18.3.0.0
rem       12.2.0.1

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id1,
        rownum                          id2,
        chr(65+trunc(ln(rownum)))       flag,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

If you check the way I’ve defined the flag column you’ll see that it generates uppercase letters of the alphabet (chr(65) = ‘A’), and the use of the ln() (natural logarithm_ function with rownum means the earlier letters in the alphabet have far fewer rows than later letters. Here’s a quick query to show the skew:


select
        flag, count(*)
from
        t1
group by
        flag
order by
        count(*)
/

FLAG   COUNT(*)
---- ----------
A             2
B             5
C            13
D            34
E            94
F           255
G           693
H          1884
I          5123
J         13923
K         37848
L        102880
M        279659
N        557587

14 rows selected.

Imagine I have a requirement to access the rows where flag = ‘D’ very efficiently, and I’d like to do that in descending order of id1. I could create an index on (flag, id1) to help, of course – but that’s going to produce a very large index that (with help from a histogram and literal parameter usage, combined with the excellent clustering pattern) the optimizer would probably use only for the values from ‘A’ to ‘H’.

Clearly the solution is to take advantage of a function-based index here to create index entries for just the rows we are interested in and “hide” the rest. Better still we can create a virtual column and index that because we would then have a simple column name to use in our SQL and we could even make it invisible (from 12c) so that it won’t cause problems with sloppy code that does a “select *” or “insert values()” without a list of columns.

Which is the better choice? Let’s add the virtual column and then create indexes. You’ll notice I created id1 and id2 with identical values – this allows me to do test both strategies in a single pass of the script. If I tried to create the virtual column and the function-based index using id1 in both cases I’d get an error message that I was using the same expression twice – though the actual error would depend on the order of creation.

If I created the virtual column first the error on attempting to create a function-based index would be: “ORA-54018: A virtual column exists for this expression”, and if I created the function-based index first the error on trying to create the virtual column would be “ORA-54015: Duplicate column expression was specified”.

So I’m using id1 to test the virtual column approach and id2 for to test the function-based index approach.


alter table t1 add (
        id1_c generated always as
                (case when flag = 'D' then id1 end)
                virtual
        )
;

create index t1_i0 on t1(id1);

create index t1_i1  on t1(id1_c);
create index t1_i1d on t1(id1_c desc);

create index t1_i2  on t1((case when flag = 'D' then id2 end));
create index t1_i2d on t1((case when flag = 'D' then id2 end) desc);

And here’s is what we see if we “set echo on” during the index creation script:


SQL>
SQL> create index t1_i0 on t1(id1);

Index created.

SQL>
SQL> create index t1_i1  on t1(id1_c);

Index created.

SQL> create index t1_i1d on t1(id1_c desc);
create index t1_i1d on t1(id1_c desc)
                          *
ERROR at line 1:
ORA-54034: virtual columns not allowed in functional index expressions

SQL>
SQL> create index t1_i2  on t1((case when flag = 'D' then id2 end));

Index created.

SQL> create index t1_i2d on t1((case when flag = 'D' then id2 end) desc);

Index created.

We can’t create a descending index on the virtual column – even though we can create a descending function-based index on exactly the expression that we wanted to use for the virtual column – so it seems we can get what we want, provided we are prepared to forgo the niceness of naming (and perhaps making invisible) a virtual column.

So are we happy?

No, we’re not – and here’s why:

select
        index_name, num_rows, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME             NUM_ROWS LEAF_BLOCKS
-------------------- ---------- -----------
T1_I0                   1000000        2226
T1_I1                        34           1
T1_I2                        34           1
T1_I2D                  1000000        1812

To make a column in an index “descending” Oracle takes the “one’s complement” of each byte in turn then appends 0xFF to the result. The effect of this is to turn a NULL into 0xFF which, as a non-null value, has to be stored in the index with a rowid pointing to the table. So an index holding 34 rows and avoiding the waste of 999,966 redundant entries turns into an index useing (wasting) space on those 999,966 entries when you try to create a descending version of the index. The index we end up with is nearly as big as the index on (flag, id1) would have been.

Conclusion

We can’t create the descending index we want by first creating a virtual column. On the other hand, if we create it as a simple function-based index it’s going to be just as big as the index we were trying to avoid creating.

Footnote 1

The ‘0xFF’ is why you can’t create a unique, descending index unless at least one of the columns is declared not null – a two-column unique index would allow multiple rows to hold (null, null), but try to recreate it descending and those rows now hold duplicated (0xFF, 0xFF) and index creation fails. (Worse still, you may have no problem data when you create the index, and then discover that processes break because they introduce problem data.)

Footnote 2

There’s no such thing as a “descending index” – there are only indexes where some of the columns are stored descending. If you create an index with every single column marked as descending then all you’ve done is increase the size of the index, while confusing the optimizer and disabling some of the clever tricks it can do. The optimizer is perfectly able to read a normal index in descending order. (Which means that this example is a simplified demonstration of the problem – if I wanted the ‘D’ rows in descending order of id1 then I’d have an “order by id1 desc” in my query and the optimizer would produce a plan to walk the ascending (t1_i1) index in descending order.