Search

Top 60 Oracle Blogs

Recent comments

Column Stats

A little while ago I added a postscript about gathering stats on a virtual column to a note I’d written five years ago and then updated with a reference to a problem on the Oracle database forum that complained that stats collection had taken much longer after the addition of a function-based index. The problem related to the fact that the function-based index was supported by a virtual column that used an instr() function on a CLOB (XML) column – and gathering stats on the virtual column meant applying the function to every CLOB in the table.

So my post-script, added about a month ago, suggested adding a preference (dbms_stats.set_table_prefs) to avoid gathering stats on that column. There’s a problem with this suggestion – it doesn’t work

Oracle doesn’t play nicely when you try to limit the stats collection to a few columns – even in version 18.3. Here’s a demonstration of the effect. First we create a table that includes a column group (extended stats), a virtual column, and a function-based index – i.e. the three different ways of generating user-related virtual columns.


rem
rem     Script:         stats_struggle_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2
from
        generator       v1
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

execute dbms_stats.delete_table_stats(user,'t1')

begin
        dbms_output.put_line(
                dbms_stats.create_extended_stats(
                        ownname         => user,
                        tabname         => 'T1',
                        extension       => '(v1, v2)'
                )
        );
end;
/

alter table t1 add id_12 
        generated always as (mod(id,12)) virtual
;

create index t1_id on t1(mod(id,10));


Since I’ve run this on 12c and 18c I’ve included a call to delete table stats after creating the table. So the next step is to enable SQL trace and see what Oracle does under the covers when we try to gather stats on just a couple of columns in the table:


alter session set events '10046 trace name context forever';

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 't1',
                method_opt  => 'for columns size 1 id v1',
                cascade     => false
        );
end;
/

alter session set events '10046 trace name context off';

column column_name  format a32
column data_default format a32

select 
        column_name, data_default,
        num_nulls, num_distinct, to_char(last_analyzed,'hh24:mi:ss') gathered
from    user_tab_cols 
where   table_name = 'T1' 
order by 
        internal_column_id
;

COLUMN_NAME                      DATA_DEFAULT                      NUM_NULLS NUM_DISTINCT GATHERED
-------------------------------- -------------------------------- ---------- ------------ --------
ID                                                                         0        10000 16:13:12
V1                                                                         0        10000 16:13:12
V2
SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7   SYS_OP_COMBINED_HASH("V1","V2")
ID_12                            MOD("ID",12)
SYS_NC00006$                     MOD("ID",10)

According to the output of the last query we’ve gathered stats only on the two columns specified. But have we really avoided the work ? Here, with some cosmetic tidying, is the SQL executed by the package:

select 
        /*+
                full(t) no_parallel(t) no_parallel_index(t) dbms_stats
                cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
                xmlindex_sel_idx_tbl no_substrb_pad 
         */
        to_char(count(ID)),
        substrb(dump(min(ID),16,0,64),1,240),
        substrb(dump(max(ID),16,0,64),1,240),
        to_char(count(V1)),
        substrb(dump(min(V1),16,0,64),1,240),
        substrb(dump(max(V1),16,0,64),1,240),
        to_char(count(V2)),
        to_char(count(SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7)),
        to_char(count(ID_12)),
        to_char(count(SYS_NC00006$))
from
        TEST_USER.T1 t  /* NDV,NIL,NIL,NDV,NIL,NIL,ACL,ACL,ACL,ACL*/

We can see that Oracle has done a count(), min() and max() on id and v1, and the “comment” at the end of the text tells us that it’s applied the approximate_ndv mechanism to the first two columns queried but not the rest. However it has count()ed all the other columns – which means it’s evaluated their underlying expressions. So if you were hoping that limiting the columns gathered would avoid a really expensive function call, bad luck.

Threat / Bug alert

A further irritation showed up when I ran a test case that used a deterministic PL/SQL function to generate a virtual column: in 12.1.0.2 the function was called once per row (possibly because every row had a different value) whether or not it was in the list of columns for gathering stats; in 18.3 the function was called nearly twice per row when I didn’t specificy stats gathering for the column and nearly 4 times per row when I did. This looks like it might be a change (possibly accidental) to how deterministic functions can cache their inputs and outputs – possibly something as “minor” as the size of the cache. To be continued when time permits …