Oakies Blog Aggregator

Single Value Column Frequency Histogram Oracle 12c and later

It is hopefully well known by now that Oracle introduced in version 11g a new algorithm for gathering table statistics that no longer requires sorting to determine the critical Number Of Distinct Values (NDV) figure. Instead it uses a clever "approximate NDV" algorithm that always reads 100% of the table data and therefore in principle generates very accurate statistics. This new algorithm is only used when the ESTIMATE_PERCENT parameter of the DBMS_STATS.GATHER*STATS calls is left at its default or explicitly passed as DBMS_STATS.AUTO_SAMPLE_SIZE. The new algorithm is also a prerequisite for other features like "Incremental Statistics".
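As a minimal illustration (table name made up), the difference between the two code paths is only in how ESTIMATE_PERCENT is passed:

```sql
-- New "approximate NDV" code path: ESTIMATE_PERCENT left at default,
-- or passed explicitly as DBMS_STATS.AUTO_SAMPLE_SIZE
exec dbms_stats.gather_table_stats(null, 'some_table')
exec dbms_stats.gather_table_stats(null, 'some_table', estimate_percent => dbms_stats.auto_sample_size)

-- Old code path: any explicit percentage disables the approximate NDV algorithm
exec dbms_stats.gather_table_stats(null, 'some_table', estimate_percent => 10)
```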

In 12c Oracle improved this algorithm, allowing the generation of Frequency histograms and the new Top Frequency histogram type in a single pass. The new Hybrid histogram type still requires a separate pass.

11g always required a separate pass per histogram to be created, no matter what type of histogram (in 11g there were only Frequency and Height-Balanced). To minimize the time and resource usage of those separate passes, Oracle used quite aggressive sampling for them, typically just 5,500 rows, sized up only in case there were many NULL values - a very small sample size for larger tables. Note that this aggressive sampling only applies to the new "approximate NDV" code path: if you specify an explicit ESTIMATE_PERCENT, Oracle uses the old code (which requires sorting to determine the NDV figure), and the separate passes required to generate histograms are then based on the same sampling percentage as the basic table and column statistics. In that case Oracle can actually create a Global Temporary Table in a separate pass, covering the required data from several columns, to avoid scanning the table again and again.

I recently came across an edge case at a client which shows that the new code has a flaw in the special case of columns that have only a single value at table or (sub)partition level.

First of all, in my opinion a (Frequency) histogram doesn't add any value in this special case of a column having only a single value - everything required can be determined from the basic column statistics anyway: low and high value are the same, the NDV is 1, and the number of NULLs should be all that is needed for proper cardinality estimates on such columns.
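Everything the optimizer needs for such a column is already visible in the basic column statistics, for example (using the column and table names from the test case further below):

```sql
-- Low/high value, NDV and NULL count from the basic column statistics -
-- for a single-value column this should be all the optimizer needs
select column_name, num_distinct, num_nulls, low_value, high_value, histogram
from   user_tab_col_statistics
where  table_name = 'TEST_STATS'
and    column_name = 'ALMOST_NULL';
```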

Now the new code path seems quite happy to generate histograms on all kinds of columns with a low number of distinct values, useful or not. Since starting with 12c these histograms should all be covered by the main pass - they can be represented as Frequency histograms when using the default bucket size of 254 - there arguably isn't much overhead in creating them, so why bother.

However, there is a flaw in the code: when the column has just a single value, the code for some (unknown) reason decides that it requires a separate pass to generate the histogram, and doesn't make use of the information already gathered as part of the main pass - which should hold everything needed.

So Oracle runs a separate pass to gather the information for this histogram. Usually this doesn't make much difference, because this separate pass no longer uses 100% of the data but resorts to the aggressive sampling outlined above - where applicable. So it might typically just take those 5,500 rows to create a Frequency histogram on such a single-value column.

But in the edge case of such a single-value column that is NULL for the majority of rows, the code recognizes this and no longer uses the aggressive sampling. Instead - probably depending on the number of NULLs - it needs to read a larger proportion of the table to find some non-NULL data.

In the case of my client this was a very large table with numerous such special edge case columns (single-valued, NULL for most of the rows), which resulted in dozens of non-sampled full table scans of this very large table, taking several days to complete.
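One possible mitigation - just a sketch of the general technique, not necessarily what was done in the client case - is to exclude such known single-value columns from histogram creation via the METHOD_OPT table preference, so only basic column statistics are gathered for them:

```sql
-- Keep the default histogram behaviour for all other columns,
-- but force SIZE 1 (no histogram) for the known single-value column
exec dbms_stats.set_table_prefs(null, 'test_stats', 'METHOD_OPT', -
  'for all columns size auto for columns size 1 almost_null')
```

This avoids the separate histogram pass entirely for those columns, at the price of having to maintain the column list in the preference.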

The behaviour can be reproduced on the latest available versions (19.3 in my case here) by enabling the DBMS_STATS specific tracing - I've used the following test case to exercise four different scenarios and see how the code behaves:

set echo on serveroutput on size unlimited lines 800 long 800

select * from v$version;

exec dbms_stats.set_global_prefs('TRACE', 1048575)

-- Testcase 1: Single distinct value, almost all values are NULL except three rows
-- This scenario triggers a separate query to generate a histogram using no (!!) sampling
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum <= 3 then 1 else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 2: Single distinct value, but only three rows are NULL
-- This scenario triggers a separate query to generate a histogram but using the usual aggressive sampling (5500 rows)
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum > 3 then 1 else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 3: 11 distinct values, but only three rows are NULL
-- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum > 3 then trunc(rownum / 100000) else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 4: Three distinct values in three rows and all other rows have NULL
-- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
-- Applies to two distinct values, too
-- So a single distinct value looks like a special case that triggers a separate query
-- If this is with combination of almost all rows having NULLs this query doesn't use sampling
-- ! Big threat if the table is big !
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum <= 2 then rownum else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

And the output from 19.3 looks like this - the critical parts highlighted in red and bold:

SQL*Plus: Release 11.2.0.1.0 Production on Mon Aug 19 17:07:42 2019

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production


USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST orcl19c DELLXPS13 394 51553 19.0.0.0.0 20190819 4192 59 6232:10348 00007FFDE6AE6478 00007FFDE6573A48



SQL>
SQL> select * from v$version;

BANNER BANNER_FULL BANNER_LEGACY CON_ID
-------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production 0
Version 19.3.0.0.0


SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE', 1048575)

PL/SQL procedure successfully completed.

SQL>
SQL> -- Testcase 1: Single distinct value, almost all values are NULL except three rows
SQL> -- This scenario triggers a separate query to generate a histogram using no (!!) sampling
SQL> drop table test_stats purge;
drop table test_stats purge
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL>
SQL> create table test_stats
2 as
3 select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
4 rownum as id,
5 case when rownum <= 3 then 1 else null end as almost_null
6 from
7 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
8 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
9 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS: job name:
DBMS_STATS: |--> Operation id: 223
DBMS_STATS: gather_table_stats:
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS: job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:43,293000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix: (parent pfix: ) ownname: tabname: test_stats partname: estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran: cascade: NULL stattab: statid: statown: no_invalidate: NULL flush_colu: TRUE fxt: stattype: DATA start_time: 08-19-2019 17:07:43 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree: loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72916
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME -
DBMS_STATS: STATS_RETENTION -
DBMS_STATS: MON_MODS_ALL_UPD_TIME -
DBMS_STATS: SNAPSHOT_UPD_TIME -
DBMS_STATS: TRACE - 0
DBMS_STATS: DEBUG - 0
DBMS_STATS: SYS_FLAGS - 0
DBMS_STATS: SPD_RETENTION_WEEKS - 53
DBMS_STATS: CASCADE - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE - NULL
DBMS_STATS: METHOD_OPT - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY - AUTO
DBMS_STATS: PUBLISH - TRUE
DBMS_STATS: STALE_PERCENT - 10
DBMS_STATS: APPROXIMATE_NDV - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE - FALSE
DBMS_STATS: INCREMENTAL - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL - TRUE
DBMS_STATS: AUTOSTATS_TARGET - AUTO
DBMS_STATS: CONCURRENT - OFF
DBMS_STATS: JOB_OVERHEAD_PERC - 1
DBMS_STATS: JOB_OVERHEAD - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS - 3
DBMS_STATS: TABLE_CACHED_BLOCKS - 1
DBMS_STATS: INCREMENTAL_LEVEL - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS - GATHER
DBMS_STATS: GATHER_AUTO - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE - 0
DBMS_STATS: GATHER_SCAN_RATE - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS - 15
DBMS_STATS: ROOT_TRIGGER_PDB - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS - FALSE
DBMS_STATS: AUTO_TASK_STATUS - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME - 3600
DBMS_STATS: AUTO_TASK_INTERVAL - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72916 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix: ownname: CBO_TEST tabname: TEST_STATS pname: spname: execution phase: 1
Specified DOP=1 blocks=1520 DOP used=1
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y Y Y Y Y ID
DBMS_STATS: Y Y Y Y Y Y Y Y ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:43,386000000 +02:00
DBMS_STATS: select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:43,589000000 +02:00
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Approximate NDV Result
DBMS_STATS:

0
1000000
1000000
6
1001984
1000000
3979802


1
Typ=2 Len=2: c1,2


2
Typ=2 Len=2: c4,2


3
3
0
1
3
6
1
AAARzUAAPAAAACBAAA,3,


4
Typ=2 Len=2: c1,2



5
Typ=2 Len=2: c1,2



DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min:
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min:
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 6 3 NULL NULL NULL 1 NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: >> frequency histograms is not feasible
(dbms_lob.getlength(topn_values) <=
length(
)), skip!
DBMS_STATS: no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: no histogram: setting density to 1/ndv (1)
DBMS_STATS: Iteration: 1 numhist: 1
DBMS_STATS: Iteration 1, percentage 100 nblks: 1520
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y ID
DBMS_STATS: Y Y Y Y ALMOST_NULL
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Building Histogram for ALMOST_NULL
DBMS_STATS: bktnum=254, nnv=3, snnv=3, sndv=1, est_ndv=1, mnb=254
DBMS_STATS: Trying frequency histogram
DBMS_STATS: Starting query at 19.08.19 17:07:43,652000000 +02:00
DBMS_STATS: select substrb(dump(val,16,0,64),1,240) ep, cnt from (select /*+ no_expand_table(t) index_rs(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */"ALMOST_NULL" val,count(*) cnt from "CBO_TEST"."TEST_STATS" t where "ALMOST_NULL" is not null group by "ALMOST_NULL") order by val
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS: number of values = 1, max # of buckects = 254, pct = 100, ssize = 3
DBMS_STATS: csr.hreq: 1 Histogram gathering flags: 1031
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 4,979802 1000000 0 1000000 1000000 1000000 ,000001 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 1,000006 3 999997 3 1 1 ,16666666 1030 1 1
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: Histograms:
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: BVAL RPCNT EAVAL ENVAL EDVAL
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: 3 C102 1 Typ=2 Len=2: c1,2
DBMS_STATS: Need Actual Values (DSC_EAVS)
DBMS_STATS: Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS: Histogram Flags: 4100 Histogram Gathering Flags: 1030
DBMS_STATS: Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 6, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 3, cind = 3
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:43,761000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72916 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL procedure successfully completed.

SQL>
SQL> -- Testcase 2: Single distinct value, but only three rows are NULL
SQL> -- This scenario triggers a separate query to generate a histogram but using the usual aggressive sampling (5500 rows)
SQL> drop table test_stats purge;

Table dropped.

SQL>
SQL> create table test_stats
2 as
3 select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
4 rownum as id,
5 case when rownum > 3 then 1 else null end as almost_null
6 from
7 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
8 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
9 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS: job name:
DBMS_STATS: |--> Operation id: 224
DBMS_STATS: gather_table_stats:
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS: job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:44,511000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix: (parent pfix: ) ownname: tabname: test_stats partname: estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran: cascade: NULL stattab: statid: statown: no_invalidate: NULL flush_colu: TRUE fxt: stattype: DATA start_time: 08-19-2019 17:07:44 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree: loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72920
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME -
DBMS_STATS: STATS_RETENTION -
DBMS_STATS: MON_MODS_ALL_UPD_TIME -
DBMS_STATS: SNAPSHOT_UPD_TIME -
DBMS_STATS: TRACE - 0
DBMS_STATS: DEBUG - 0
DBMS_STATS: SYS_FLAGS - 0
DBMS_STATS: SPD_RETENTION_WEEKS - 53
DBMS_STATS: CASCADE - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE - NULL
DBMS_STATS: METHOD_OPT - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY - AUTO
DBMS_STATS: PUBLISH - TRUE
DBMS_STATS: STALE_PERCENT - 10
DBMS_STATS: APPROXIMATE_NDV - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE - FALSE
DBMS_STATS: INCREMENTAL - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL - TRUE
DBMS_STATS: AUTOSTATS_TARGET - AUTO
DBMS_STATS: CONCURRENT - OFF
DBMS_STATS: JOB_OVERHEAD_PERC - 1
DBMS_STATS: JOB_OVERHEAD - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS - 3
DBMS_STATS: TABLE_CACHED_BLOCKS - 1
DBMS_STATS: INCREMENTAL_LEVEL - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS - GATHER
DBMS_STATS: GATHER_AUTO - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE - 0
DBMS_STATS: GATHER_SCAN_RATE - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS - 15
DBMS_STATS: ROOT_TRIGGER_PDB - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS - FALSE
DBMS_STATS: AUTO_TASK_STATUS - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME - 3600
DBMS_STATS: AUTO_TASK_INTERVAL - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72920 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix: ownname: CBO_TEST tabname: TEST_STATS pname: spname: execution phase: 1
Specified DOP=1 blocks=1794 DOP used=1
Specified DOP=1 blocks=1794 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y Y Y Y Y ID
DBMS_STATS: Y Y Y Y Y Y Y Y ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:44,527000000 +02:00
DBMS_STATS: select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:44,730000000 +02:00
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Approximate NDV Result
DBMS_STATS:

0
1000000
1000000
6
1001984
1000000
3979802


1
Typ=2 Len=2: c1,2


2
Typ=2 Len=2: c4,2


3
999997
0
1
999997
1999994
1
AAARzYAAPAAAACBAAD,999997,


4
Typ=2 Len=2: c1,2



5
Typ=2 Len=2: c1,2



DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min:
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min:
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 1999994 999997 NULL NULL NULL 1 NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: >> frequency histograms is not feasible
(dbms_lob.getlength(topn_values) <=
length(
)), skip!
DBMS_STATS: no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: no histogram: setting density to 1/ndv (1)
DBMS_STATS: Iteration: 1 numhist: 1
DBMS_STATS: Iteration 1, percentage ,5500016500049500148500445501336504009512 nblks: 1794
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y ID
DBMS_STATS: Y Y Y Y ALMOST_NULL
Specified DOP=1 blocks=1794 DOP used=1
DBMS_STATS: Building Histogram for ALMOST_NULL
DBMS_STATS: bktnum=254, nnv=999997, snnv=5500, sndv=1, est_ndv=1, mnb=254
DBMS_STATS: Trying frequency histogram
DBMS_STATS: Starting query at 19.08.19 17:07:44,730000000 +02:00
DBMS_STATS: select substrb(dump(val,16,0,64),1,240) ep, cnt from (select /*+ no_expand_table(t) index_rs(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */"ALMOST_NULL" val,count(*) cnt from "CBO_TEST"."TEST_STATS" sample ( .5500016500) t where "ALMOST_NULL" is not null group by "ALMOST_NULL") order by val
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS: number of values = 1, max # of buckects = 254, pct = ,5500016500049500148500445501336504009512, ssize = 5499
DBMS_STATS: csr.hreq: 1 Histogram gathering flags: 1031
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 4,979802 1000000 0 1000000 1000000 1000000 ,000001 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 2,999994 999997 3 5499 1 1 ,00000050 1030 1 1
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,2
DBMS_STATS: Histograms:
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: BVAL RPCNT EAVAL ENVAL EDVAL
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: 5499 C102 1 Typ=2 Len=2: c1,2
DBMS_STATS: Need Actual Values (DSC_EAVS)
DBMS_STATS: Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS: Histogram Flags: 4100 Histogram Gathering Flags: 1030
DBMS_STATS: Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 8, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 3, cind = 3
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:44,761000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72920 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL procedure successfully completed.

SQL>
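The odd-looking sampling percentage in the trace above ("Iteration 1, percentage ,5500016500049500148500445501336504009512") is simply the fixed 5,500-row histogram sample target expressed as a percentage of the 999,997 non-NULL rows. A quick sketch of the arithmetic (plain Python, nothing Oracle-specific):

```python
# The separate histogram pass samples a fixed target of roughly 5,500 rows
# (see "ssize = 5499" in the trace); the percentage handed to the SAMPLE
# clause is that target scaled to the column's non-NULL row count.
target_rows = 5500
non_null_rows = 999997  # rows where ALMOST_NULL is not null

pct = 100 * target_rows / non_null_rows
print(pct)  # matches the leading digits of the percentage in the trace
```
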
SQL> -- Testcase 3: 11 distinct values, but only three rows are NULL
SQL> -- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
SQL> drop table test_stats purge;

Table dropped.

SQL>
SQL> create table test_stats
2 as
3 select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
4 rownum as id,
5 case when rownum > 3 then trunc(rownum / 100000) else null end as almost_null
6 from
7 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
8 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
9 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS: job name:
DBMS_STATS: |--> Operation id: 225
DBMS_STATS: gather_table_stats:
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS: job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:45,448000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix: (parent pfix: ) ownname: tabname: test_stats partname: estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran: cascade: NULL stattab: statid: statown: no_invalidate: NULL flush_colu: TRUE fxt: stattype: DATA start_time: 08-19-2019 17:07:45 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree: loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72921
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME -
DBMS_STATS: STATS_RETENTION -
DBMS_STATS: MON_MODS_ALL_UPD_TIME -
DBMS_STATS: SNAPSHOT_UPD_TIME -
DBMS_STATS: TRACE - 0
DBMS_STATS: DEBUG - 0
DBMS_STATS: SYS_FLAGS - 0
DBMS_STATS: SPD_RETENTION_WEEKS - 53
DBMS_STATS: CASCADE - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE - NULL
DBMS_STATS: METHOD_OPT - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY - AUTO
DBMS_STATS: PUBLISH - TRUE
DBMS_STATS: STALE_PERCENT - 10
DBMS_STATS: APPROXIMATE_NDV - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE - FALSE
DBMS_STATS: INCREMENTAL - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL - TRUE
DBMS_STATS: AUTOSTATS_TARGET - AUTO
DBMS_STATS: CONCURRENT - OFF
DBMS_STATS: JOB_OVERHEAD_PERC - 1
DBMS_STATS: JOB_OVERHEAD - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS - 3
DBMS_STATS: TABLE_CACHED_BLOCKS - 1
DBMS_STATS: INCREMENTAL_LEVEL - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS - GATHER
DBMS_STATS: GATHER_AUTO - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE - 0
DBMS_STATS: GATHER_SCAN_RATE - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS - 15
DBMS_STATS: ROOT_TRIGGER_PDB - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS - FALSE
DBMS_STATS: AUTO_TASK_STATUS - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME - 3600
DBMS_STATS: AUTO_TASK_INTERVAL - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72921 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix: ownname: CBO_TEST tabname: TEST_STATS pname: spname: execution phase: 1
Specified DOP=1 blocks=1781 DOP used=1
Specified DOP=1 blocks=1781 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y Y Y Y Y ID
DBMS_STATS: Y Y Y Y Y Y Y Y ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:45,448000000 +02:00
DBMS_STATS: select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:45,651000000 +02:00
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Approximate NDV Result
DBMS_STATS:

0
1000000
1000000
6
1001984
1000000
3979802


1
Typ=2 Len=2: c1,2


2
Typ=2 Len=2: c4,2


3
999997
0
11
999997
1899998
11

AAARzZAAPAAAAVaAFp,100000,AAARzZAAPAAAANAAAb,100000,AAARzZAAPAAAAKMAEf,100000,AAARzZAAPAAAAHYAIj,100000,AAARzZAAPAAAASnABA,100000,AAARzZAAPAAAAbBAGO,100000,AAARzZAAPAAAAYOABl,100000,AAARzZAAPAAAAElAEP,100000,AAARzZAAPAAAAPzAFE,100000,AAARzZAAPAAAACBAAD,99996,AAARzZAAPAAAAd1ACK,1,


4
Typ=2 Len=1: 80


5
Typ=2 Len=2: c1,b



DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min:
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min:
DBMS_STATS: max: Typ=2 Len=2: c1,b
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=1: 80
DBMS_STATS: max: Typ=2 Len=2: c1,b
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 1899998 999997 NULL NULL NULL 11 NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=1: 80
DBMS_STATS: max: Typ=2 Len=2: c1,b
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: Parsing topn values..
DBMS_STATS: Extracted 11 rowid-freq pairs.
DBMS_STATS: topn sql (len: 744):
DBMS_STATS: +++ select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzZAAPAAAACBAAD'),chartorowid('AAARzZAAPAAAAElAEP'),chartorowid('AAARzZAAPAAAAHYAIj'),chartorowid('AAARzZAAPAAAAKMAEf'),chartorowid('AAARzZAAPAAAANAAAb'),chartorowid('AAARzZAAPAAAAPzAFE'),chartorowid('AAARzZAAPAAAASnABA'),chartorowid('AAARzZAAPAAAAVaAFp'),chartorowid('AAARzZAAPAAAAYOABl'),chartorowid('AAARzZAAPAAAAbBAGO'),chartorowid('AAARzZAAPAAAAd1ACK')) order by "ALMOST_NULL"
DBMS_STATS: start executing top n value query..
DBMS_STATS: Starting query at 19.08.19 17:07:45,667000000 +02:00
DBMS_STATS: select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzZAAPAAAACBAAD'),chartorowid('AAARzZAAPAAAAElAEP'),chartorowid('AAARzZAAPAAAAHYAIj'),chartorowid('AAARzZAAPAAAAKMAEf'),chartorowid('AAARzZAAPAAAANAAAb'),chartorowid('AAARzZAAPAAAAPzAFE'),chartorowid('AAARzZAAPAAAASnABA'),chartorowid('AAARzZAAPAAAAVaAFp'),chartorowid('AAARzZAAPAAAAYOABl'),chartorowid('AAARzZAAPAAAAbBAGO'),chartorowid('AAARzZAAPAAAAd1ACK')) order by "ALMOST_NULL"
DBMS_STATS: removal_count: -243 total_nonnull_rows: 999997 mnb: 254
DBMS_STATS: hist_type in exec_get_topn: 1024 ndv:11 mnb:254
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS: number of values = 11, max # of buckects = 254, pct = 100, ssize = 999997
DBMS_STATS: csr.hreq: 1 Histogram gathering flags: 1031
DBMS_STATS: done_hist in process_topn: TRUE csr.ccnt: 1
DBMS_STATS: Mark column "ALMOST_NULL" as top N computed
DBMS_STATS: no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: Skip topn computed column "ALMOST_NULL" numhist: 0
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 4,979802 1000000 0 1000000 1000000 1000000 ,000001 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 2,899998 999997 3 999997 11 11 ,00000050 1286 1 1
DBMS_STATS: min: Typ=2 Len=1: 80
DBMS_STATS: max: Typ=2 Len=2: c1,b
DBMS_STATS: Histograms:
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: BVAL RPCNT EAVAL ENVAL EDVAL
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: 99996 80 0 Typ=2 Len=1: 80
DBMS_STATS: 199996 C102 1 Typ=2 Len=2: c1,2
DBMS_STATS: 299996 C103 2 Typ=2 Len=2: c1,3
DBMS_STATS: 399996 C104 3 Typ=2 Len=2: c1,4
DBMS_STATS: 499996 C105 4 Typ=2 Len=2: c1,5
DBMS_STATS: 599996 C106 5 Typ=2 Len=2: c1,6
DBMS_STATS: 699996 C107 6 Typ=2 Len=2: c1,7
DBMS_STATS: 799996 C108 7 Typ=2 Len=2: c1,8
DBMS_STATS: 899996 C109 8 Typ=2 Len=2: c1,9
DBMS_STATS: 999996 C10A 9 Typ=2 Len=2: c1,a
DBMS_STATS: 999997 C10B 10 Typ=2 Len=2: c1,b
DBMS_STATS: Need Actual Values (DSC_EAVS)
DBMS_STATS: Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS: Histogram Flags: 4100 Histogram Gathering Flags: 1286
DBMS_STATS: Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 8, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 13, cind = 13
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:45,683000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72921 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL procedure successfully completed.

SQL>
SQL> -- Testcase 4: Three distinct values in three rows and all other rows have NULL
SQL> -- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
SQL> -- Applies to two distinct values, too
SQL> -- So a single distinct value looks like a special case that triggers a separate query
SQL> -- If this is with combination of almost all rows having NULLs this query doesn't use sampling
SQL> -- ! Big threat if the table is big !
SQL> drop table test_stats purge;

Table dropped.

SQL>
SQL> create table test_stats
2 as
3 select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
4 rownum as id,
5 case when rownum <= 2 then rownum else null end as almost_null
6 from
7 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
8 (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
9 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS: job name:
DBMS_STATS: |--> Operation id: 226
DBMS_STATS: gather_table_stats:
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS: job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:46,276000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix: (parent pfix: ) ownname: tabname: test_stats partname: estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran: cascade: NULL stattab: statid: statown: no_invalidate: NULL flush_colu: TRUE fxt: stattype: DATA start_time: 08-19-2019 17:07:46 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree: loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72922
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME -
DBMS_STATS: STATS_RETENTION -
DBMS_STATS: MON_MODS_ALL_UPD_TIME -
DBMS_STATS: SNAPSHOT_UPD_TIME -
DBMS_STATS: TRACE - 0
DBMS_STATS: DEBUG - 0
DBMS_STATS: SYS_FLAGS - 0
DBMS_STATS: SPD_RETENTION_WEEKS - 53
DBMS_STATS: CASCADE - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE - NULL
DBMS_STATS: METHOD_OPT - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY - AUTO
DBMS_STATS: PUBLISH - TRUE
DBMS_STATS: STALE_PERCENT - 10
DBMS_STATS: APPROXIMATE_NDV - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE - FALSE
DBMS_STATS: INCREMENTAL - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL - TRUE
DBMS_STATS: AUTOSTATS_TARGET - AUTO
DBMS_STATS: CONCURRENT - OFF
DBMS_STATS: JOB_OVERHEAD_PERC - 1
DBMS_STATS: JOB_OVERHEAD - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS - 3
DBMS_STATS: TABLE_CACHED_BLOCKS - 1
DBMS_STATS: INCREMENTAL_LEVEL - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS - GATHER
DBMS_STATS: GATHER_AUTO - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE - 0
DBMS_STATS: GATHER_SCAN_RATE - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS - 15
DBMS_STATS: ROOT_TRIGGER_PDB - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS - FALSE
DBMS_STATS: AUTO_TASK_STATUS - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME - 3600
DBMS_STATS: AUTO_TASK_INTERVAL - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72922 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix: ownname: CBO_TEST tabname: TEST_STATS pname: spname: execution phase: 1
Specified DOP=1 blocks=1520 DOP used=1
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS: NNV NDV AVG MMX HST EP RP NNNP IND CNDV HSTN HSTR COLNAME
DBMS_STATS: Y Y Y Y Y ID
DBMS_STATS: Y Y Y Y Y Y Y Y ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:46,292000000 +02:00
DBMS_STATS: select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:46,448000000 +02:00
DBMS_STATS: STATISTIC LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized 0 0
DBMS_STATS: DML Parallelized 0 0
DBMS_STATS: DDL Parallelized 0 0
DBMS_STATS: DFO Trees 0 0
DBMS_STATS: Server Threads 0 0
DBMS_STATS: Allocation Height 0 0
DBMS_STATS: Allocation Width 0 0
DBMS_STATS: Local Msgs Sent 0 0
DBMS_STATS: Distr Msgs Sent 0 0
DBMS_STATS: Local Msgs Recv'd 0 0
DBMS_STATS: Distr Msgs Recv'd 0 0
DBMS_STATS: DOP 0 0
DBMS_STATS: Slave Sets 0 0
DBMS_STATS: Approximate NDV Result
DBMS_STATS:

0
1000000
1000000
6
1001984
1000000
3979802


1
Typ=2 Len=2: c1,2


2
Typ=2 Len=2: c4,2


3
2
0
2
2
4
2
AAARzaAAPAAAACBAAB,1,AAARzaAAPAAAACBAAA,1,


4
Typ=2 Len=2: c1,2



5
Typ=2 Len=2: c1,3



DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min:
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min:
DBMS_STATS: max: Typ=2 Len=2: c1,3
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL NULL NULL NULL NULL NULL NULL NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,3
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 3979802 1000000 NULL NULL NULL 1001984 NULL 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 4 2 NULL NULL NULL 2 NULL 7 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,3
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: Parsing topn values..
DBMS_STATS: Extracted 2 rowid-freq pairs.
DBMS_STATS: topn sql (len: 438):
DBMS_STATS: +++ select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzaAAPAAAACBAAA'),chartorowid('AAARzaAAPAAAACBAAB')) order by "ALMOST_NULL"
DBMS_STATS: start executing top n value query..
DBMS_STATS: Starting query at 19.08.19 17:07:46,464000000 +02:00
DBMS_STATS: select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzaAAPAAAACBAAA'),chartorowid('AAARzaAAPAAAACBAAB')) order by "ALMOST_NULL"
DBMS_STATS: removal_count: -252 total_nonnull_rows: 2 mnb: 254
DBMS_STATS: hist_type in exec_get_topn: 1024 ndv:2 mnb:254
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS: number of values = 2, max # of buckects = 254, pct = 100, ssize = 2
DBMS_STATS: csr.hreq: 1 Histogram gathering flags: 1031
DBMS_STATS: done_hist in process_topn: TRUE csr.ccnt: 1
DBMS_STATS: Mark column "ALMOST_NULL" as top N computed
DBMS_STATS: no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: Skip topn computed column "ALMOST_NULL" numhist: 0
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID 4,979802 1000000 0 1000000 1000000 1000000 ,000001 0 0 0
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME ACL SNNV NV SNNVDV NDV SNDV DENS HGATH HIND CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL 1,000004 2 999998 2 2 2 ,25 1286 1 1
DBMS_STATS: min: Typ=2 Len=2: c1,2
DBMS_STATS: max: Typ=2 Len=2: c1,3
DBMS_STATS: Histograms:
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: BVAL RPCNT EAVAL ENVAL EDVAL
DBMS_STATS: ---------------------------------------------------------------------------------------------------
DBMS_STATS: 1 C102 1 Typ=2 Len=2: c1,2
DBMS_STATS: 2 C103 2 Typ=2 Len=2: c1,3
DBMS_STATS: Need Actual Values (DSC_EAVS)
DBMS_STATS: Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS: Histogram Flags: 4100 Histogram Gathering Flags: 1286
DBMS_STATS: Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 6, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 4, cind = 4
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:46,464000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72922 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL procedure successfully completed.

SQL>

So these test cases show that when a column has more than a single distinct value, the code happily extracts the required information from the ROWIDs collected during the main pass and doesn't require a separate pass (Test cases 3 and 4). In the special case of a single-valued column, however, it bails out and runs a separate query that typically uses sampling, except for the edge case of a single-valued column having NULLs in most of the rows (Test cases 1 and 2).
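As an aside, the BVAL column of the frequency histogram dumped above for test case 3 holds cumulative endpoint counts; decoding them back into per-value frequencies is just a difference of adjacent endpoints. A small sketch using the numbers printed in the trace (plain arithmetic, not an Oracle API):

```python
# Cumulative endpoint counts (BVAL) from the test case 3 histogram dump.
cumulative = [99996, 199996, 299996, 399996, 499996, 599996,
              699996, 799996, 899996, 999996, 999997]
sample_size = 999997  # total non-NULL rows reported in the trace

# Each value's frequency is the difference to the previous endpoint.
freqs = [c - p for p, c in zip([0] + cumulative, cumulative)]
assert sum(freqs) == sample_size

# Selectivity estimate for the second bucket (value 1): frequency / sample size
sel = freqs[1] / sample_size
print(freqs, sel)
```

This also makes the popularity skew visible: ten values cover roughly 100,000 rows each, while the eleventh covers a single row.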

I've discussed this issue with Nigel Bayliss, the optimizer product manager at Oracle, and he agreed that it looks like a bug and filed one accordingly (Bug 30205756 - FREQUENCY HISTOGRAM NOT CREATED USING FULL ROW SAMPLE, which is probably not marked as public), just in case you come across this issue and want to provide this reference to Oracle Support.

The curious case of enq: TX – row lock contention

So we had the doubtful pleasure of migrating a few databases from Oracle Exadata X3-2 to Oracle Exadata X7-2 Cloud at Customer.
Why doubtful? Well, this is material for a whole different story with a lot of beer – let me just say that CC gen 1 was a bit rough around the edges.

Oracle State Objects and Reading System State Dumps Hacking Session Video

I have uploaded the 3 hour hacking session video about Oracle instance state objects and how to read Oracle systemstate dumps to my YouTube channel:
Note that as I just uploaded the video, you may want to wait for a few hours until YouTube publishes HD version (currently it's available only at 360p, but that should change soon).
The slides are in Slideshare:
https://www.slideshare.net/tanelp/oracle-state-objects-and-system-state-... Enjoy! :-)

Sharing my screen to the web (from Oracle free tier compute instance) using tmux and gotty

For a long time I have used tmux for my live demos, because I can multiplex my screen between my laptop and the beamer, I can show several panes, and I can script my commands using send-keys. I also use tmux to keep a stateful work environment, which I run on my Oracle Cloud always free tier Compute Instance. It is 100% free, 100% available, and 100% accessible:

  • 100% free because, even though you need to create a trial account with your credit card, it is never charged and the free tier remains after the 30-day trial
  • 100% available because you never have to shut it down and it will never be terminated as long as it has been used in the last 3 months (there was a bug recently, but it was quickly solved even though the free tier has no support)
  • 100% accessible because you can ssh from the internet, and your private key gives you access to the opc user that can sudo to root.

Here I’ll describe what I did to go further in sharing my screen. tmux can share a session by attaching additional clients, which means opening a new tty, ssh-ing to the host, and running “tmux attach”. But in case there is any problem with the beamer, or if the screen is too large, I want a simple solution for attendees to see my screen on their laptop or tablet, in a web browser. I need two things for that: open a port for HTTP access, and use gotty to serve a read-only tmux session.

Open a TCP port to the internet

I’ll share my screen on port 12345, and I first need to open it on the host:

sudo iptables -I INPUT 5 -i ens3 -p tcp --dport 12345 -m state --state NEW,ESTABLISHED -j ACCEPT

Then, on my network, I add this port in addition to the SSH access from the public network:

I follow this path: Compute Instance -> VCN Details -> Security List -> Default Security List -> Add Ingress Rule -> Source CIDR = 0.0.0.0/0 and destination port = 12345


Here it is: the whole World Wide Web has access to my network on port 12345, and now I have to listen on it. For that, I’ll use gotty (https://github.com/yudai/gotty).

I download the latest version to my opc home dir:

wget -O- https://github.com/yudai/gotty/releases/download/v1.0.1/gotty_linux_amd64.tar.gz | tar -C ~ -zxvf -

I define a basic configuration file:

cat > ~/.gotty <<-CAT
preferences {
font_size = 14
background_color = "rgb(42, 42, 42)"
}
CAT

I generate a certificate if I want to enable TLS later:

openssl req -x509 -nodes -days 9999 -newkey rsa:2048 -keyout ~/.gotty.key -out ~/.gotty.crt <<stdin
CH
Vaud
Lausanne
pachot.net
franck
$(hostname)
hello@pachot.net
stdin

And I can now run something to test it:

./gotty --port 12345 top -c 

which logs the following

I connect to my host:port (the IP address from the public network is the same used to ssh — not the one displayed by gotty here) from my browser and can see a terminal with “top -c” running:

That’s perfect. gotty has logged my connection:


tmux

As I use this to share my tmux screen, I add the following to .tmux.conf to start, through gotty, a tmux attach to the current session, bound here to C-t:

grep "gotty" ~/.tmux.conf || cat >> ~/.tmux.conf <<'CAT'
# gotty https://github.com/yudai/gotty
bind-key C-t new-window -P -F "screen shared on: http://12345.pachot.net" "TMUX= ~/gotty --port 12345 --title-format '@FranckPachot' --width $(( 0 + `tmux display -p '#{window_width}'` )) --height $(( 4 + `tmux display -p '#{window_height}'` )) --reconnect tmux attach -r -t `tmux display -p '#S'`"
CAT
  • --port defines the port to listen to (opened in iptables and in VCN)
  • I use the tmux variables #{window_width} and #{window_height} to get the same number of columns and lines (width and height), because tmux resizes the window in all attached clients to fit the smallest one.
  • --reconnect tries to reconnect every 10 seconds if connection is lost
  • tmux attach -r attaches read-only (additional security, although gotty starts in read-only mode by default). I unset the TMUX variable because this is not a nested session.
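The width/height arithmetic in the bind-key command can be sketched in plain shell (120 and 40 are hypothetical stand-ins for what `tmux display -p '#{window_width}'` and `'#{window_height}'` would print):

```shell
# Hypothetical window size as tmux would report it:
window_width=120
window_height=40

# Same arithmetic as in the bind-key command:
gotty_width=$(( 0 + window_width ))    # same number of columns
gotty_height=$(( 4 + window_height ))  # a few extra lines for the gotty window chrome

echo "--width $gotty_width --height $gotty_height"   # prints: --width 120 --height 44
```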

Now, when I am in tmux, I can press C-b C-t to open a window that runs gotty attached to my current session. Then I just have to share the URL and people can see my screen on theirs.

Here is my browser and my tty:

Midlands Microsoft 365 and Azure User Group – January 2020


Last night I went to the Midlands Microsoft 365 and Azure User Group. It was co-organised by Urfaan Azhar and Lee Thatcher from Pure Technology Group, and Adrian Newton from my company.

First up was Matt Fooks speaking about “Microsoft Cloud App Security”. Matt covered a number of use cases, including shadow IT detection, log collection, checking compliance of applications and using policies to protect systems. He demoed a few of these. The flexibility of the cloud is great, but it also allows you to create a security nightmare as your cloud estate grows. MCAS gives you visibility and control over that. I guess the value of this will depend on how far down the cloud journey you are. If you've got a bit of IaaS that's being managed centrally, this isn't going to sound too interesting. Once you open the gates and let other people/teams get involved in provisioning services, you are going to need something like this to keep some level of control over the sprawl.

I heard one of the attendees mention Snowflake, so I collared him during the break to discuss it. I'm not so interested in the headline stuff. I care more about the boring day-to-day stuff, as people tend not to talk about it. It was really interesting. Networking is great.

Next up was Richard Harrison with “The Journey from being a DBA Guy for 20 years to becoming an Azure Guy”. Richard was a fellow Oracle ACE before he moved in a new direction, so it was good to see him again. We spent some time chatting before his session, and I kept him for ages after the session chatting about a bunch of cloud related stuff. As the name of the session suggests, this was about his transition. What made him decide to make the move. His first opening into the world of the cloud. Some of the steps along the way and the amount of work involved. Richard is a smart guy, so when he says it’s hard work to keep on top of things due to the rate of change in the cloud, that should be a warning sign for people who are focused on the status quo.

There were some pieces that related nicely to the first session. For example, he discussed the control/governance aspect. To paraphrase: some services, like security, budget management and databases, are kept under central control because of their critical nature, but people are given pretty much free rein with platforms within their resource group. Why? Because there are loads of services with loads of features, and trying to manage them centrally is practically impossible. I think of this as a move to a high-trust culture. Of course, you have tools to monitor what's going on to stop people doing crazy stuff, but ultimately you have to start releasing control (where it's appropriate) or people will look elsewhere.

Getting the most out of in-memory – part 2

In the previous post, I described the importance of checking V$IM_SEGMENTS to ensure that the in-memory store is populated to have confidence that you are indeed getting the benefit of the feature. But even if the store has been populated, when it comes to virtual columns, additional care is needed to make sure you can take advantage of the feature. Here is an example of how you might not be getting the in-memory benefits when you were expecting to (and how to solve it).

I’ll start by creating a table T as 20 copies of DBA_OBJECTS, and I’ll add a virtual column called OBJECT_DETAILS. The definition for the virtual column is somewhat nonsensical, being many nested SQRT calls, but this also makes it easy to check for in-memory usage (as I’ll explain shortly).


SQL> set termout on
SQL> create table t (
  2         owner varchar2(30),
  3         object_type varchar2(80),
  4         object_id number,
  5         object_details generated always as ( sqrt(sqrt(sqrt(sqrt(object_id))))));

Table created.

SQL> insert /*+ APPEND */ into t (owner,object_type,object_id)
  2      select owner,object_type,object_id
  3      from dba_objects, ( select 1 from dual connect by level <= 20 );

1656460 rows created.

SQL> commit;

Commit complete.

Before I start to tinker with in-memory, I’ll do a full scan of T and force it to evaluate the virtual column OBJECT_DETAILS. Because of the many nested square root calls, this is a very expensive query and takes over 5 seconds even though the table is quite small.


SQL> set timing on
SQL> select max(object_details)
  2  from t;

MAX(OBJECT_DETAILS)
-------------------
         2.12899839

1 row selected.

Elapsed: 00:00:05.73

This elapsed time can hence become our measuring stick for the detection of in-memory usage. If the OBJECT_DETAILS column is pre-calculated and housed in the in-memory store, then the elapsed time should be far lower. To achieve that, I’ll now put the table into the in-memory store


SQL> alter table t inmemory;

Table altered.

And to kickstart the in-memory population process, I’ll scan the table.


SQL> select count(*) from t;

  COUNT(*)
----------
   1656460

1 row selected.

After waiting a short while, I can check V$IM_SEGMENTS to make sure the table has been populated.


SQL> select owner,segment_name,bytes,bytes_not_populated,populate_status from v$im_segments;

OWNER      SEGMENT_NA      BYTES BYTES_NOT_POPULATED POPULATE_STAT
---------- ---------- ---------- ------------------- -------------
MCDONAC    T            46047232                   0 COMPLETED

As I said in my previous post, the population of V$IM_SEGMENTS should mean that all is in readiness for me to bask in the performance goodness of in-memory. So now I’ll re-run my query.


SQL> select max(object_details)
  2  from t;

MAX(OBJECT_DETAILS)
-------------------
         2.12899839

1 row selected.

Elapsed: 00:00:05.41

Well, that was disappointing. The query performance was unchanged. Maybe the optimizer opted not to use the in-memory store for this query. I can check the execution plan to see if that was the case.


SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  ay7fsgvavb6r4, child number 0
-------------------------------------
select max(object_details) from t

Plan hash value: 2966233522

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |    73 (100)|          |
|   1 |  SORT AGGREGATE             |      |     1 |    22 |            |          |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |  1656K|    34M|    73  (20)| 00:00:01 |
------------------------------------------------------------------------------------

Even though the in-memory store has been populated and the optimizer is reporting that an in-memory scan is being performed, additional care is required when it comes to the in-memory option and virtual columns. There is an initialization parameter that controls how the in-memory option interacts with virtual columns:


SQL> show parameter inmemory_virtual_columns

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
inmemory_virtual_columns             string      MANUAL

The default setting of MANUAL means that it is the responsibility of the database administrator to explicitly nominate which virtual columns need to be placed into the in-memory store. So I’ll go ahead and do that.
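As an alternative to nominating virtual columns one by one, the behaviour can be changed globally (a hedged sketch; I believe the documented values are ENABLE, MANUAL and DISABLE, but check the documentation for your release):

```sql
-- ENABLE stores all virtual columns of INMEMORY tables without explicit nomination;
-- DISABLE excludes them entirely; MANUAL (the default) requires nomination per column.
ALTER SYSTEM SET inmemory_virtual_columns = ENABLE SCOPE = BOTH;
```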


SQL> alter table t inmemory(object_details);

Table altered.

I’ll do a rescan of the table to prime the in-memory store and try again.


SQL> select count(*) from t;

  COUNT(*)
----------
   1656460

1 row selected.

SQL> select owner,segment_name,bytes,bytes_not_populated,populate_status from v$im_segments;

OWNER      SEGMENT_NA      BYTES BYTES_NOT_POPULATED POPULATE_STAT
---------- ---------- ---------- ------------------- -------------
MCDONAC    T            46047232                   0 COMPLETED

1 row selected.

SQL> select max(object_details)
  2  from t;

MAX(OBJECT_DETAILS)
-------------------
         2.12899839

1 row selected.

Elapsed: 00:00:05.30

Well, that was disappointing. The query performance was still unchanged. If you refer back to the previous post, merely changing the metadata for columns is not the catalyst for a (re)population of the in-memory store. Hence I will toggle the entire table to reset its in-memory status.


SQL> alter table t no inmemory;

Table altered.

SQL> pause

SQL> alter table t inmemory;

Table altered.

SQL> pause

SQL> alter table t inmemory(object_details);

Table altered.

Now I will rescan the table to prime the in-memory store and try one more time.


SQL> select count(*) from t;

  COUNT(*)
----------
   1656460

1 row selected.

SQL> select owner,segment_name,bytes,bytes_not_populated,populate_status from V$IM_SEGMENTS;

OWNER      SEGMENT_NA      BYTES BYTES_NOT_POPULATED POPULATE_STAT
---------- ---------- ---------- ------------------- -------------
MCDONAC    T            46047232                   0 COMPLETED

1 row selected.

SQL> select max(object_details)
  2  from t
  3
SQL> pause

SQL> /

MAX(OBJECT_DETAILS)
-------------------
         2.12899839

1 row selected.

Elapsed: 00:00:00.00

And there we go! This time the query on the expensive virtual column completes almost instantaneously, because the in-memory store has been populated with the pre-computed virtual column values.

Detecting the problem

You might be thinking: “Yes, but my virtual column does not have 20 nested SQRT calls, so how do I know if I have got things set up correctly?”

I’ll reset the table to not having the OBJECT_DETAILS column in the in-memory store


SQL> alter table t no inmemory;

Table altered.

SQL> alter table t inmemory;

Table altered.

SQL> select count(*) from t;

  COUNT(*)
----------
   1656460

1 row selected.

SQL> select owner,segment_name,bytes,bytes_not_populated,populate_status from v$im_segments;

OWNER      SEGMENT_NA      BYTES BYTES_NOT_POPULATED POPULATE_STAT
---------- ---------- ---------- ------------------- -------------
MCDONAC    T            46047232                   0 COMPLETED

At this point, the standard table columns are now populated into the in-memory store but the OBJECT_DETAILS column is not. I can query V$IM_COLUMN_LEVEL to check the in-memory metadata for each column.


SQL> select table_name, column_name, inmemory_compression
  2  from V$IM_COLUMN_LEVEL
  3  where table_name = 'T';

TABLE_NAME                     COLUMN_NAME                    INMEMORY_COMPRESSION
------------------------------ ------------------------------ --------------------
T                              OBJECT_DETAILS                 UNSPECIFIED
T                              OWNER                          DEFAULT
T                              OBJECT_TYPE                    DEFAULT
T                              OBJECT_ID                      DEFAULT

But as we saw above, just changing the metadata for the OBJECT_DETAILS is not sufficient to force it into the in-memory store. I’ll nominate OBJECT_DETAILS and you can see that V$IM_COLUMN_LEVEL immediately reflects that change, but that does not force a repopulation of the in-memory store.


SQL> alter table t inmemory(object_details);

Table altered.


SQL> select table_name, column_name, inmemory_compression
  2  from v$im_column_level
  3  where table_name = 'T';

TABLE_NAME                     COLUMN_NAME                    INMEMORY_COMPRESSION
------------------------------ ------------------------------ --------------------
T                              OBJECT_DETAILS                 DEFAULT
T                              OWNER                          DEFAULT
T                              OBJECT_TYPE                    DEFAULT
T                              OBJECT_ID                      DEFAULT

Since V$IM_COLUMN_LEVEL does not tell us if the in-memory store for OBJECT_DETAILS has been populated, we need to look elsewhere. A virtual column is an expression, so we need to look in the part of the in-memory store dedicated to holding expressions.


SQL> select count(*) from v$imeu_header;

  COUNT(*)
----------
         0

1 row selected.

You can see that this area is currently empty, which is the confirmation that OBJECT_DETAILS has not been populated into the in-memory store. I’ll perform the steps listed earlier in the post to make sure that OBJECT_DETAILS is populated, and then we can see that V$IMEU_HEADER now contains some content.


SQL> alter table t no inmemory;

Table altered.

SQL> alter table t inmemory;

Table altered.

SQL> alter table t inmemory(object_details);

Table altered.

SQL> select count(*) from t;

  COUNT(*)
----------
   1656460

1 row selected.

SQL> select owner,segment_name,bytes,bytes_not_populated,populate_status from v$im_segments;

OWNER      SEGMENT_NA      BYTES BYTES_NOT_POPULATED POPULATE_STAT
---------- ---------- ---------- ------------------- -------------
MCDONAC    T            46047232                   0 COMPLETED

SQL> select count(*) from V$IMEU_HEADER;

  COUNT(*)
----------
         3
         

Since virtual columns are almost certainly going to be some sort of expression, evaluating them is always going to consume significant CPU if you are processing millions (or billions) of rows. The in-memory store can be a very useful means of reducing the cost of evaluating those expressions, but make sure you run your DDL commands in the correct sequence to guarantee that the virtual columns are being stored in the in-memory area.
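To recap, the sequence that worked in this post can be sketched as follows (using the demo's object names):

```sql
-- Recap of the working sequence (demo object names):
alter table t no inmemory;               -- evict first, so the next population starts clean
alter table t inmemory;                  -- mark the table for the in-memory store
alter table t inmemory(object_details);  -- explicitly nominate the virtual column
                                         -- (required while inmemory_virtual_columns = MANUAL)

select count(*) from t;                  -- prime the population with a scan

-- Verify that both the segment and the expression area are populated:
select populate_status from v$im_segments where segment_name = 'T';
select count(*) from v$imeu_header;      -- non-zero once virtual columns are in-memory
```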

You can watch the video version of this demo here

Oracle and the Future

We’ll start with a disclaimer here: this is my experience and my opinion, NO ONE ELSE’S, just as any blog linked in this post is its author’s. Take it or leave it; I really don’t care, never have, never will. I’ve never been one to follow any drummer but my own anyway.

Sparse Indexing

This is the first of two blog posts that discuss sparse and partial indexing.

Problem Statement

It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values.  So, the distribution of values is skewed.  A typical example is a status column where an application processes newer rows that are a relatively small proportion of the table because over time the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows; the index is therefore larger and requires more maintenance.  Could we not index just the rare values that we actually want the index to find?

  • Oracle does not index null values. If we could engineer the application so that the common value was null, the index would contain only the rare values.  This is sometimes called sparse indexing and is discussed in this blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in the next blog.

As usual, this is not a new subject; other people have written extensively about these topics, and I will provide links.  However, I want to draw some of the issues together.

Sparse Indexing

The ideas discussed in this section are based on the principle that Oracle indexes do not include rows in which all of the key values are null.

Store Null Values in the Database?

One option is to engineer the application to use null as the common status value.  However, this means that the column in question has to be nullable, and you may require different logic, because a comparison to null is never true.

CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1)
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
);

INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN NULL /*common status*/
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;

CREATE INDEX t_status ON t (status);
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

I have created a test table with 1000000 rows. 10 rows have status R, and 32 rows have status A. The rest have status NULL. I have indexed the status column, and also created a histogram on it when I collected statistics.

SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10
A 32

I can see from the statistics that I have 1000000 rows in the primary key index, but only 42 rows in the status index because it only contains the not null values. Therefore, it is much smaller, having only a single leaf block, whereas the primary key index has 1875 leaf blocks.

SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T';

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1

There are some problems with this approach.

Not All Index Columns are Null 
If any of the index columns are not null, then there is an entry in the index for the row, and there is no saving of space. It is not uncommon to add additional columns to such an index, either for additional filtering, or to avoid accessing the table by satisfying the query from the index.

CREATE INDEX t_status2 ON t (status,other);
SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1
T_STATUS2 1000000 9081

Null Logic
If, for example, I want to find the rows that do not have status A, then a simple inequality does not find the null statuses, because a comparison to null is never true.

SELECT status, COUNT(*)
FROM t
WHERE status != 'A'
GROUP BY status
/

S COUNT(*)
- ----------
R 10

Instead, I would have to explicitly code for the null values.

SELECT status, COUNT(*)
FROM t
WHERE status != 'A' OR status IS NULL
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10

This additional complexity is certainly one reason why developers shy away from this approach in custom applications. It is almost impossible to retrofit it into an existing or packaged application. 
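As a hedged aside (not from the original post): the null-inclusive predicate can also be written with LNNVL, which returns TRUE when the enclosed condition is FALSE or unknown, though this does not make the query any more index-friendly.

```sql
-- LNNVL(status = 'A') is TRUE for rows where status != 'A' OR status IS NULL.
SELECT status, COUNT(*)
FROM t
WHERE LNNVL(status = 'A')
GROUP BY status;
```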

Function-Based Indexes 

It is possible to build an index on a function, such that the function evaluates to null for the common values. This time my test table again has 1,000,000 rows, but the status column is now not nullable.

CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN 'C'
WHEN rownum<=1e6-10 THEN 'A'
ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

10 rows have status R, and 32 rows have status A. The rest have status C.

SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
C 999958
A 32

I will build a simple index on status, and a second index on a function of status that decodes the common status C back to NULL.

CREATE INDEX t_status ON t (status);
CREATE INDEX t_status_fn ON t (DECODE(status,'C',NULL,status));

As before, with the null column, the function-based index has only a single leaf block, the other indexes are much larger because they contain all 1 million rows.

SELECT index_name, index_type, num_rows, leaf_blocks 
from user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
------------ --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_FN FUNCTION-BASED NORMAL 42 1

If I query the table for the common status, Oracle quite reasonably full scans the table.

SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999958

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2446 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2446 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')

If I query for the rare status value, it will use the normal index to look that up.

SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 1997248105
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS | 10 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("STATUS"='R')

Now I will make that index invisible, and the optimizer can only choose to full scan the table. It cannot use the function-based index because the query does not match the function.

ALTER INDEX t_status INVISIBLE;
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 550 | 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='R')

Instead, I must change the query to reference the function in the function-based index, and then the optimizer chooses the function-based index, even if I make the normal index visible again. Note that the function is shown in the access operation in the predicate section.

ALTER INDEX t_status VISIBLE;
SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 2511618215
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_FN | 21 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(DECODE("STATUS",'C',NULL,"STATUS")='R')

Invisible Virtual Columns 

Function-based indexes are implemented using an invisible virtual column. You can even reference that virtual column in a query. However, the name of the column is system generated, so you may not want to include it in your application.

SELECT * FROM user_stat_extensions WHERE table_name = 'T';

TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO
---------- --------------- ---------------------------------------- ------ ---
T SYS_NC00004$ (DECODE("STATUS",'C',NULL,"STATUS")) SYSTEM NO

SELECT SYS_NC00004$, COUNT(*) FROM t group by SYS_NC00004$;

S COUNT(*)
- ----------
999958
R 10
A 32

Instead, you could create a virtual column and then index it. The resulting index is still function-based because it references the function inside the virtual column. From Oracle 12c, it is also possible to make a column invisible. I would recommend doing so if you have any insert statements without explicit column lists; with a visible virtual column those statements would fail with ORA-00947: not enough values.

ALTER TABLE t ADD virtual_status VARCHAR2(1) INVISIBLE 
GENERATED ALWAYS AS (DECODE(status,'C',null,status));
CREATE INDEX t_status_virtual ON t (virtual_status);

SELECT index_name, index_type, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
---------------- --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_VIRTUAL FUNCTION-BASED NORMAL 42 1
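As a quick sketch (using a hypothetical key value beyond the existing data), the benefit of making the virtual column invisible is that an insert without an explicit column list still needs to supply only the three real columns:

```sql
-- VIRTUAL_STATUS is invisible, so this insert still works:
INSERT INTO t VALUES (1000001, 'C', 'Test row');
ROLLBACK;

-- Had the virtual column been created visible, the same statement would fail
-- with ORA-00947: not enough values (and a virtual column cannot be supplied
-- a value in any case).
```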

The only difference between this and the previous function-based index example is that now you can control the name of the virtual column, and you can easily reference it in the application. 
If you have only ever referenced the virtual column in the application, and never the function, then it is also easy to change the function, although you would have to rebuild the index.
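For example, the DECODE expression could be swapped for the equivalent NULLIF roughly as follows. This is only a sketch: as I understand it, Oracle will not modify a virtual column expression while an index depends on it, so the index is dropped first and recreated afterwards.

```sql
DROP INDEX t_status_virtual;

-- NULLIF(status,'C') returns NULL when status='C', otherwise status,
-- so it is equivalent to DECODE(status,'C',NULL,status)
ALTER TABLE t MODIFY (virtual_status GENERATED ALWAYS AS (NULLIF(status,'C')));

CREATE INDEX t_status_virtual ON t (virtual_status);
```

Application SQL that references only VIRTUAL_STATUS is unaffected by the change of expression.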

SELECT COUNT(other) FROM t WHERE virtual_status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("VIRTUAL_STATUS"='R')

If you have already created function-based indexes and referenced the function in the application, you can replace them with an index on a named virtual column, and the index will still be used.

SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."VIRTUAL_STATUS"='R')

Conclusion 

A function-based index, preferably on an explicitly named and created virtual column, will permit you to build an index on just the rare values in a column. Making the virtual column invisible will prevent errors during insert statements without explicit column lists. However, you will still need to alter the application SQL to reference either the virtual column or the function that generates it.

Partial Indexing

This is the second of two blog posts that discuss sparse and partial indexing.

Problem Statement

(This is the same problem statement as for sparse indexing.)
It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values.  So, the distribution of values is skewed.  A typical example is a status column where an application processes newer rows that are a relatively small proportion of the table because over time the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not index just the rare values that we want the index to find?

  • Oracle does not index null values. If we could engineer that the common value was null, then the index would only contain the rare values.  This is sometimes called sparse indexing and was discussed in the previous blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in this blog.

As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Partition Table and Locally Partitioned Partial Index 

I could partition the table on the status column. Here, I have used list partitioning: the common status sorts between the two rare statuses, so range partitioning would have needed three partitions, whereas list partitioning needs only two. From Oracle 12.1, I can specify INDEXING ON or OFF on the table and on individual partitions, so that later I can build partial local indexes on only some partitions.

CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
) INDEXING OFF
PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A') INDEXING ON
,PARTITION t_status_common VALUES (DEFAULT)
) ENABLE ROW MOVEMENT
/
INSERT /*+APPEND*/ INTO t --(key, status)
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
WHEN rownum<=1e6-10 THEN 'A'
ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

Here Oracle eliminated the common status partition and only scanned the rare status partition (partition 1). Note that I don't even have an index at this point.  So simply partitioning the table can be effective.

SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 10 | 580 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 10 | 580 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='R')


However, now when the application updates the status from R (rare) to C (common), the row must be moved between partitions. It is necessary to enable row movement on the table; otherwise, an error will be raised. There is also additional overhead in moving the row: it is effectively deleted from one partition and inserted into the other.
In this test, I have increased the frequency of one of the rare statuses. Otherwise, the optimizer determines that it is cheaper just to scan the table partition than use the index!
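The row movement cost can be illustrated with a hypothetical status update; the ENABLE ROW MOVEMENT clause on the table above is what permits it.

```sql
-- Relocates the row from partition T_STATUS_RARE to T_STATUS_COMMON;
-- internally this is a delete from one segment and an insert into the other.
UPDATE t SET status = 'C' WHERE key = 1000000;
ROLLBACK;

-- Without ENABLE ROW MOVEMENT on the table, the same update would fail with:
-- ORA-14402: updating partition key column would cause a partition change
```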

SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
A 990
C 999000

Note that I have already specified INDEXING OFF on the table and INDEXING ON on the rare statuses partition. Now I can just build a locally partitioned partial index.

CREATE INDEX t_status ON t(status) LOCAL INDEXING PARTIAL;

Note that only partition T_STATUS_RARE is physically built, and it only contains a single extent. Partition T_STATUS_COMMON exists, is unusable and the segment has not been physically built. It contains no rows and no leaf blocks.

SELECT partition_name, status, num_rows, leaf_blocks
from user_ind_partitions where index_name = 'T_STATUS';

PARTITION_NAME STATUS NUM_ROWS LEAF_BLOCKS
-------------------- -------- ---------- -----------
T_STATUS_COMMON UNUSABLE 0 0
T_STATUS_RARE USABLE 1000 2

SELECT segment_name, partition_name, blocks
FROM user_segments WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME BLOCKS
------------ -------------------- ----------
T_STATUS T_STATUS_RARE 8

SELECT segment_name, partition_name, segment_type, extent_id, blocks
FROM user_extents WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE EXTENT_ID BLOCKS
------------ -------------------- ------------------ ---------- ----------
T_STATUS T_STATUS_RARE INDEX PARTITION 0 8

Scans for the common status value can only full scan the table partition because there is no index to use.

SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2444 (1)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION LIST SINGLE| | 998K| 52M| 2444 (1)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 998K| 52M| 2444 (1)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='C')

To query the rare value Oracle does use the index on the rare values partition.

SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3051124889
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE | | 10 | 580 | 2 (0)| 00:00:01 | KEY | KEY |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T | 10 | 580 | 2 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 1 | 1 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')

However, it is not worth using the index for the slightly more common status A.  Here, Oracle full scans the table partition.

SELECT COUNT(other) FROM t WHERE status='A';

COUNT(OTHER)
------------
990

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 990 | 57420 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 990 | 57420 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='A')

Note that to use partial indexing I also had to partition the table.

Globally Partitioned Index with Zero-Sized Unusable Partitions 

Since Oracle 11.2.0.4, it has been possible to achieve the same effect without partitioning the table, thus avoiding the overhead of row movement.

This feature also worked in earlier versions, but Oracle built a single extent for each unusable partition.

Here, I will recreate a non-partitioned table.

CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
WHEN rownum<=1e6-10 THEN 'A'
ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;

exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
R 10
C 999000
A 990

It is not possible to create a globally list-partitioned index. Oracle simply does not support it.

CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A')
,PARTITION t_status_common VALUES (DEFAULT)
);

GLOBAL PARTITION BY LIST (status)
*
ERROR at line 2:
ORA-14151: invalid table partitioning method

You can create a globally range- or hash-partitioned index. However, it is unlikely that the hash values of the column will break down conveniently into particular partitions. In this example, I would still have needed to create 4 hash partitions and build 2 of them.

WITH x as (
SELECT status, COUNT(*) freq
FROM t
GROUP BY status
) SELECT x.*
, dbms_utility.get_hash_value(status,0,2)
, dbms_utility.get_hash_value(status,0,4)
FROM x
/

S FREQ DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,2) DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,4)
- ---------- --------------------------------------- ---------------------------------------
R         10                                       1                                       1
C     999000                                       0                                       0
A        990                                       0                                       2

It is easier to create a globally range-partitioned index. In my example, the common status lies between the two rare statuses, so I need to create three partitions. I will create the index unusable and then rebuild the two rare-status partitions.

CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY RANGE (status)
(PARTITION t_status_rare1 VALUES LESS THAN ('C')
,PARTITION t_status_common VALUES LESS THAN ('D')
,PARTITION t_status_rare2 VALUES LESS THAN (MAXVALUE)
) UNUSABLE;
ALTER INDEX t_status REBUILD PARTITION t_status_rare1;
ALTER INDEX t_status REBUILD PARTITION t_status_rare2;

The index partition for the common status is unusable so Oracle can only full scan the table.

SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')

However, for the rare statuses, Oracle scans the index and looks up each of the table rows.

SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2558590380
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION RANGE SINGLE | | 10 | 550 | 2 (0)| 00:00:01 | 3 | 3 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 2 (0)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 3 | 3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')

Conclusion 

The advantage of this global partitioning approach is that it does not require any change to application code, and it does not involve partitioning the table. However, you will have to remember not to rebuild the unusable partitions; otherwise, they will have to be maintained as the table changes until you make them unusable again, and they will consume space that you will only get back by recreating the entire index.
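If a blanket maintenance job does rebuild the common-status partition, it can be returned to its previous state by marking it unusable again; a sketch, using the partition name from the globally partitioned example above:

```sql
-- Stops Oracle maintaining this partition as the table changes:
ALTER INDEX t_status MODIFY PARTITION t_status_common UNUSABLE;
```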
NB: Partitioning is a separately licensed option that is only available on the Enterprise Edition of the database.