Search

OakieTags

Who's online

There are currently 0 users and 37 guests online.

Recent comments

DBMS_STATS

Queue-based Concurrent Stats Prototype Implementation

This is just a prototype of a queue-based concurrent statistics implementation - using the same basic implementation I've used a a couple of years ago to create indexes concurrently.There are reasons why such an implementation might be useful - in 11.2.0.x the built-in Concurrent Stats feature might turn out to be not really that efficient by creating lots of jobs that potentially attempt to gather statistics for different sub-objects of the same table at the same time - which can lead to massive contention on Library Cache level due to the exclusive Library Cache locks required by DDL / DBMS_STATS calls.In 12.1 the Concurrent Stats feature obviously got a major re-write by using some more intelligent processing what and how should be processed concurrently - some of the details are exposed via the new view DBA_OPTSTAT_OPERATION_TASKS, but again I've seen it running lots of very

DBMS_STATS - Gather statistics on tables with many columns - 12c update

This is just a short 12c update on my post about gathering statistics on tables with many columns from some time ago.I'm currently investigating the "Incremental Statistics" feature in 12.1.0.2 for a client, which probably will be worth one or more other posts, but since we're still in the process of evaluating and installing various patches it's too early to write about that.As part of the investigation I've noticed a significant change in behaviour in 12.1.0.2 compared to previous versions when it comes to gathering statistics on tables with many columns, hence this post here.The key message of the original post was that DBMS_STATS needs potentially several passes when gathering statistics on tables with many columns, which meant a significant increase in overall work and resource consumption, exaggerated by the fact that tables with that many colu

Gathering statistics in 10.2.0.5+ and 11.2.0.2+ encounters strange waits

It appears that, somewhere in the 10.2.0.5 and 11.2.0.2 patchsets, Oracle introduced some additional unwanted functionality to the “GATHER_*_STATS” procedures in the DBMS_STATS package.

I have been working on a customer’s 11.2.0.3.0 database supporting a data-mart application where the data loading programs call DBMS_STATS.GATHER_TABLE_STATS as a concluding part of load processing, which means that the procedure gets called a *lot*.

We noticed that some calls to the same procedure were waiting excessively on the event “enq: TX – allocate ITL entry” and being blocked by sessions calling similar DBMS_STATS procedures, themselves in turn waiting excessively on the event “row cache lock” on data dictionary tables like SYS.CON$ (i.e. underlying the DBA_CONSTRAINTS view) and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$.

Maintaining Optimizer Statistics on PeopleSoft on Oracle 11g

I have been considering how to collect optimizer statistics for a PeopleSoft system running on an Oracle 11g database.  Despite 11g being several years old, most of my current customers are still using 10g, though some are looking at the upgrade to 11g.  I believe a slightly different approach is required.

In 2009, I wrote a series of blog postings on the subject of collecting statistics.  However these were all based on Oracle 10g.  I proposed a PL/SQL package that would use meta-data in a database table to determine how to collect statistics on a table, or deliberate supress collection of statistics.

Maintaining Optimizer Statistics on PeopleSoft on Oracle 11g

I have been considering how to collect optimizer statistics for a PeopleSoft system running on an Oracle 11g database.  Despite 11g being several years old, most of my current customers are still using 10g, though some are looking at the upgrade to 11g.  I believe a slightly different approach is required.

In 2009, I wrote a series of blog postings on the subject of collecting statistics.  However these were all based on Oracle 10g.  I proposed a PL/SQL package that would use meta-data in a database table to determine how to collect statistics on a table, or deliberate supress collection of statistics.

Incremental Partition Statistics Review

Introduction

Here is a summary of the findings while evaluating Incremental Partition Statistics that have been introduced in Oracle 11g.

The most important point to understand is that Incremental Partition Statistics are not "cost-free", so anyone who is telling you that you can gather statistics on the lowest level (partition or sub-partition in case of composite partitioning) without any noticeable overhead in comparison to non-incremental statistics (on the lowest level) is not telling you the truth.

Although this might be obvious I've already personally heard someone making such claims so it's probably worth to mention.

In principle you need to test on your individual system whether the overhead that is added to each statistics update on the lowest level outweighs the overhead of actually gathering statistics on higher levels, of course in particular on global level.

Distinctly Odd (Update)

Greg Rahn points out that the improved NDV estimation arrives in 11.1 not 11.2 (which my article distinctly odd implies) see http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/ for more on this.

DBMS_STATS - Gather table statistics with many columns

Here is another good reason why you probably don't want to use tables with too many columns.

The first good reason is that Oracle stores rows of conventional heap tables with more than 255 columns (where at least one column value after the 255th is non-null) in multiple row pieces even when the entire row may fit into a single block (and up to 11.2 doesn't support basic and OLTP heap table compression on tables with more than 255 columns). This leads to something that is sometimes called "intra-row chaining" which means that Oracle needs to follow the row piece pointer to access columns after the 255th one leading to multiple logical I/Os per row, up to four for a row approaching the hard limit of 1,000 columns.

Of course such a multiple-piece row easily can result in actual row chaining where the row pieces are scattered across different blocks now potentially leading to additional random single block I/O, but I digress...

This simple example allows to see this (and the other, yet to describe) effect in action - it creates by default a table with 1,000 columns with 10,000 rows where each row resides in a separate block (and therefore only intra-row chaining should occur). I used a 8K block size tablespace with Manual Segment Space Management (MSSM) in order to avoid the ASSM overhead in this case:

----------------------------------------------------------------------------------------------------------
-- MANY_X_COLS.SQL
-- Usage: @MANY_X_COLS [NUM_COLS] [PCTFREE] [PCTUSED] [TABLESPACE] [COL1] [COL2] [COL3] [COL4] [NUM_ROWS]
-- Defaults: [1000 ] [99 ] [1 ] [TEST_8K ] [001 ] [256 ] [512 ] [768 ] [10000 ]
--
-- Create a table MANY_X_COLS with a configurable number of columns
-- and populate four columns of it using skewed data (normal distribution)
----------------------------------------------------------------------------------------------------------

set termout off verify off
REM Save current settings
store set .settings replace

REM Command line handling

REM Preparation for default value handling
column 1 new_value 1
column 2 new_value 2
column 3 new_value 3
column 4 new_value 4
column 5 new_value 5
column 6 new_value 6
column 7 new_value 7
column 8 new_value 8
column 9 new_value 9

select
'' as "1"
, '' as "2"
, '' as "3"
, '' as "4"
, '' as "5"
, '' as "6"
, '' as "7"
, '' as "8"
, '' as "9"
from
dual
where
rownum = 0;

REM Default values
select
nvl('&&1', '1000') as "1"
, nvl('&&2', '99') as "2"
, nvl('&&3', '1') as "3"
, nvl('&&4', 'TEST_8K') as "4"
, nvl('&&5', '001') as "5"
, nvl('&&6', '256') as "6"
, nvl('&&7', '512') as "7"
, nvl('&&8', '768') as "8"
, nvl('&&9', '10000') as "9"
from
dual;

define n_iter = &1
define n_pctfree = &2
define n_pctused = &3
define s_tblspace = &4
define s_col1 = &5
define s_col2 = &6
define s_col3 = &7
define s_col4 = &8
define n_num_rows = &9

set termout on echo on verify on

declare
s_sql1 varchar2(32767);
s_sql varchar2(32767);

e_table_or_view_not_exists exception;
pragma exception_init(e_table_or_view_not_exists, -942);
begin
for i in 1..&n_iter loop
s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ' varchar2(10),';
end loop;

s_sql1 := rtrim(s_sql1, ',');

s_sql := 'drop table many_x_cols purge';

begin
execute immediate s_sql;
exception
when e_table_or_view_not_exists then
null;
end;

s_sql := 'create table many_x_cols (' || s_sql1 || ') pctfree &n_pctfree pctused &n_pctused tablespace &s_tblspace';

execute immediate s_sql;

dbms_random.seed(0);

s_sql := 'insert /*+ append */ into many_x_cols (
col&s_col1
, col&s_col2
, col&s_col3
, col&s_col4
)
select
trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col1
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col2
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col3
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col4
from
dual
connect by
level <= &n_num_rows
';

execute immediate s_sql;

commit;
end;
/

set termout off
REM Restore and cleanup
undefine 1 2 3 4 5 6 7 8 9
column 1 clear
column 2 clear
column 3 clear
column 4 clear
column 5 clear
column 6 clear
column 7 clear
column 8 clear
column 9 clear
@.settings
set termout on

If you have created the table with the defaults and now run some simple queries on it, for example:

SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(*) FROM MANY_X_COLS;

you should see approximately 10,000 consistent gets after some initial runs to avoid the overhead of hard parsing. Your output should look similar to the following:

SQL> SELECT COUNT(*) FROM MANY_X_COLS;

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10004 consistent gets
10000 physical reads
0 redo size
422 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

However if you start to access columns after the 255th one (and you've used the defaults or suitable values to populate at least one these columns), then you'll notice that the number of consistent gets increases whereas the number of physical reads stays more or less the same:

SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(COL256) FROM MANY_X_COLS;

SQL> SELECT COUNT(COL256) FROM MANY_X_COLS;

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
30004 consistent gets
10000 physical reads
0 redo size
427 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

It is interesting to note that in 11.1.0.7 and 11.2.0.1 it happened sometimes that the consistent gets jumped directly from 10,000 to 30,000 and increased then to 40,000 and 50,000 respectively when accessing later columns whereas in 10.2.0.4 the expected result of 20,000, 30,000 and 40,000 was seen.

It is also interesting to note that the statistic "table fetch continued row" is only increasing once per row which could be another interesting variation of the theme "How does Oracle record this statistic".

OK, back to the main topic for today: DBMS_STATS and such tables with many columns.

The other good reason is that DBMS_STATS might switch to a multi-pass mode if a table exceeds a certain number of columns. This means even if you request only basic column statistics (METHOD_OPT=>"FOR ... COLUMNS SIZE 1...") without any additional histograms that require additional passes anyway DBMS_STATS.GATHER_TABLE_STATS will perform multiple passes (full segment scans usually) on the same segment. This is even true if basic column statistics are only requested for a limited set of columns (for example "FOR COLUMNS SIZE 1 COL1, COL2, COL3") since Oracle gathers some column statistics always no matter what options are passed to DBMS_STATS (very likely in order to calculate for example the average row length for the table statistics).

The threshold when DBMS_STATS switches to multiple passes depends the options passed to DBMS_STATS and differs between Oracle versions 11g (11.1 and 11.2) and versions prior to 11g - and I'm not talking about the multiple passes that are used in versions prior to 11g to determine the optimal auto sample size (DBMS_STATS.AUTO_SAMPLE_SIZE, see below for more information).

In Oracle 10.2.0.4 DBMS_STATS is capable of handling a minimum of approximately 160 columns per pass if basic columns statistics are requested and a maximum of approximately 440 columns if no column statistics are requested, which means that it requires between three and seven passes to gather statistics for a 1,000 columns table. Below is what such a typical query looks like - and you'll find up to seven of them in a SQL trace file of a DBMS_STATS call with a different column list for a 1,000 columns table.

Notice that in 10.2.0.4 DBMS_STATS gathers five different values per column for basic column statistics:

- Number of non-null values
- Number of distinct values
- Column size
- Minimum value
- Maximum value

If no column statistics are requested then still the "Number of non-null values" and "Column size" per column are gathered which explains why DBMS_STATS is capable of handling more columns per pass since simply less information per column is requested.

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*),count("COL257"),count(distinct "COL257"),
sum(sys_op_opnsize("COL257")),substrb(dump(min("COL257"),16,0,32),1,120),
substrb(dump(max("COL257"),16,0,32),1,120),count("COL258"),count(distinct
"COL258"),sum(sys_op_opnsize("COL258")),substrb(dump(min("COL258"),16,0,32),
1,120),substrb(dump(max("COL258"),16,0,32),1,120),count("COL259"),
count(distinct "COL259"),sum(sys_op_opnsize("COL259")),
substrb(dump(min("COL259"),16,0,32),1,120),substrb(dump(max("COL259"),16,0,
32),1,120),...
.
.
.
from
"CBO_TEST"."MANY_X_COLS" t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.20 0.19 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.21 3.07 9713 40010 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.41 3.26 9713 40010 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT GROUP BY (cr=40010 pr=9713 pw=0 time=3070537 us)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=40010 pr=9713 pw=0 time=1820245 us)

Tracing the DBMS_STATS.GATHER_TABLE_STATS call furthermore reveals that Oracle performs at least one additional query when using DBMS_STATS.AUTO_SAMPLE_SIZE (the default) which is probably used for determining the optimal sample size because Oracle uses an iterative approach prior to 11g to determine the auto sample size. The sample size is increased in steps until the number of distinct values per column passes some statistical checks. The interesting thing however seems to be that these additional queries are repeated per pass:

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*)
from
"CBO_TEST"."MANY_X_COLS" sample block ( 1.0000000000,1) t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 7 0.01 0.00 0 0 0 0
Execute 7 0.00 0.00 0 0 0 0
Fetch 7 0.12 3.67 1971 764 0 7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21 0.13 3.67 1971 764 0 7

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=119 pr=332 pw=0 time=530678 us)
108 TABLE ACCESS SAMPLE MANY_X_COLS (cr=119 pr=332 pw=0 time=646431 us)

********************************************************************************

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*)
from
"CBO_TEST"."MANY_X_COLS" t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 7 0.00 0.00 0 0 0 0
Execute 7 0.00 0.00 0 0 0 0
Fetch 7 3.41 15.85 65858 70070 0 7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21 3.41 15.86 65858 70070 0 7

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=10010 pr=9445 pw=0 time=2228265 us)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=10010 pr=9445 pw=0 time=816529 us)

This means that in total between six (no column statistics) and at least 14 queries (basic column statistics for all columns) are used to gather the statistics for a 1,000 columns table in 10.2.0.4 with DBMS_STATS.AUTO_SAMPLE_SIZE.

Using a fixed sample size (or compute) it takes between three and seven queries to gather the statistics for a 1,000 columns table in 10.2.0.4. When computing the basic statistics for all columns this means a minimum of 220,000 consistent gets to gather the statistics on a 10,000 blocks table with 1,000 columns in comparison to approximately 20,000 consistent gets to perform the same on a 255 columns table - not taking into account the CPU overhead of performing the aggregates (count distinct, min, max, etc.) on all those columns.
Note that these consistent gets exclude the work required to maintain the dictionary and statistics history information.

The additional queries executed when using DBMS_STATS.AUTO_SAMPLE_SIZE in 10.2.0.4 increase the number of consistent gets to 300,000 for the 1,000 columns / 10,000 blocks table (in this case the automatism ends up in a compute) and 40,000 consistent gets for the 255 columns / 10,000 blocks table.

Of course the intention of the AUTO_SAMPLE_SIZE is to find an optimal sample size and not to actually increase the amount of work compared to a compute gather statistics, but in this particular case this was true.

This significant difference in consistent gets between the 1,000 columns and 255 columns table both having 10,000 blocks has three reasons:

1. It requires more consistent gets per pass due to the multiple row pieces (up to four times more consistent gets)
2. DBMS_STATS performs many more passes on the 1,000 columns table (when gathering basic column statistics for all columns at least seven on the 1,000 columns table compared to at least two on the 255 columns table)
3. Obviously DBMS_STATS doesn't make a systematic attempt to minimize the number of consistent gets per pass by trying to access only leading columns per pass (so to require only 10,000 / 20,000 / 30,000 / 40,000 gets per pass). It looks more like the columns accessed per pass are randomly chosen often causing the maximum number of consistent gets per pass

The same applies to version 11.1 and 11.2 as long as fixed sample sizes are used. However when using DBMS_STATS.AUTO_SAMPLE_SIZE the whole gather statistics logic has been revised, in particular due to the introduction of the improvements regarding number of distinct column values approximation and incremental partition statistics, which is described in great detail by Amit Poddar here.

Due to these improvements to DBMS_STATS when using DBMS_STATS.AUTO_SAMPLE_SIZE Oracle 11.1.0.7 (and 11.2.0.1) are capable of handling up to 1,000 columns in a single pass if no columns statistics are requested and a minimum of approximately 260 columns per pass if basic column statistics are requested. This means it takes four passes to gather basic column statistics on all columns for a 1,000 columns table. Furthermore due to the revised logic the additional queries used in 10.2.0.4 for the DBMS_STATS.AUTO_SAMPLE_SIZE are no longer used.

Tracing the DBMS_STATS.GATHER_TABLE_STATS call in 11.1.0.7 and 11.2.0.1 also reveals that there is a special operation activated that is not represented in SQL. Accompanied by a comment outlining what the special NDV operation is supposed to do DBMS_STATS seems to activate a "APPROXIMATE NDV AGGREGATE" operation that is very likely used for the aforementioned improvement regarding the NDV algorithm.

Such a query looks then like the following:

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
no_substrb_pad */to_char(count("COL174")),to_char(substrb(dump(min("COL174")
,16,0,32),1,120)),to_char(substrb(dump(max("COL174"),16,0,32),1,120)),
to_char(count("COL175")),to_char(substrb(dump(min("COL175"),16,0,32),1,120))
,to_char(substrb(dump(max("COL175"),16,0,32),1,120)),...
.
.
.
from
"CBO_TEST"."MANY_X_COLS" t /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
.
.
.
NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.11 0.10 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.01 2.24 9845 30010 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.12 2.35 9845 30010 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=30010 pr=9845 pw=0 time=0 us)
8755 APPROXIMATE NDV AGGREGATE (cr=30010 pr=9845 pw=0 time=2336556 us cost=2 size=121114 card=82)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=30010 pr=9845 pw=0 time=300223 us cost=2 size=121114 card=82)

The following differences to 10g become obvious:

1. The SQL part has been reduced to

- Number of non-null values
- Minimum value
- Maximum value

for basic column statistics. The number of distinct values is obviously performed as part of the special operation "APPROXIMATE NDV AGGREGATE" as it can be seen in the "Row Source Operation" (for further details refer to Amit Poddar's document). Note that in case of no column statistics this is reduced to a single expression per column which is the "Number of non-null values".

2. The special operation "APPROXIMATE NDV AGGREGATE" appears in the execution plan

3. The trailing comment section of the statement describes what the "APPROXIMATE NDV AGGREGATE" is supposed to perform, for more details refer to the aforementioned document by Amit Poddar

4. All expressions are converted to strings using TO_CHAR

5. A new hint NO_SUBSTRB_PAD has been introduced (example taken from 11.2.0.1)

Probably the fact that the number of expressions in the SQL part has been reduced allows Oracle 11g to process more columns per pass.

This improved variant requires approximately 150,000 consistent gets (four passes) for basic column statistics for all columns of a 10,000 blocks / 1,000 columns table with DBMS_STATS.AUTO_SAMPLE_SIZE in comparison to 10,000 consistent gets for table with 255 columns (single pass).

So even with the improvements of 11g approximate NDV algorithm it will still require multiple passes when requesting basic column statistics (which generally speaking is recommended) for tables with more than 260 columns. If you think about the consequence of that in case of a table that not only has many columns but also many rows (that are then at least intra-row chained) the potential overhead becomes quite obvious.

So in summary the best way to deal with such tables is to avoid them in first place - in Oracle 10g attempt to keep the number of columns below 160 respectively 260 in 11g when using DBMS_STATS.AUTO_SAMPLE_SIZE to ensure a single pass operation covering all columns.

If you really need to use that many columns it might be worth the effort to write a custom gather statistics procedure that explicitly states the columns to gather statistics for in the METHOD_OPT parameter in order to minimize the number of passes required - of course keeping in mind the potential side effects on the cost based optimizer when column statistics are missing and these become relevant for optimization. In 11g with DBMS_STATS.AUTO_SAMPLE_SIZE this potentially allows to reduce the number of passes down to a single one.

And as a side note - using the deprecated ANALYZE ... STATISTICS command requires exactly a single pass to gather basic column statistics for all 1,000 columns of the 10,000 rows table - it even records only 10,000 consistent gets (and no "table fetch continued row") which is quite interesting since it is unclear how the different row pieces are accessed then. It might however simply be an instrumentation issue. Of course in most cases it doesn't make sense to revert to ANALYZE ... STATISTICS due to its shortcomings (just to name a few: No parallelism, no statistics history, no global statistics on partitioned tables and no approximate NDV improvements in 11g) and differences to DBMS_STATS that might lead to plan changes - see also the other notes linked to in that note by Jonathan Lewis.

Statistics on Partitioned Tables - Part 6b - COPY_TABLE_STATS

Sigh ... these posts have become a bit of a mess.

There are so many different bits and pieces I want to illustrate and I've been trying to squeeze them in around normal work. Worse still, because I keep leaving them then coming back to them and re-running tests it's easy to lose track of where I was, despite using more or less the same test scripts each time (any new scripts tend to be sections of the main test script). I suspect my decision to only pull out the more interesting parts of the output has contributed to the difficulties too, but with around 18.5 thousand lines of output, I decided that was more or less essential.

It has got so bad that I noticed the other day that there were a couple of significant errors in the last post which are easy to miss when you're looking at detailed output and must be even less obvious if you're looking at it for the first time.

The fact no-one said much about these errors reinforces my argument with several bloggers that less people read and truly absorb the more technical stuff than they think. They just pick up the messages they need and take more on trust than you might imagine!

So what were the errors? Possibly more important, why did they appear? The mistakes are often as instructive as the successes.

Error 1

This is the tail-end of the subpartition stats at the end of part 5

Statistics Management for Partitioned Objects in PeopleSoft

I have implemented partitioned objects in a number of PeopleSoft systems on Oracle. Recently, I was working on a system where a table was partitioned into weekly range partitions, and I encountered a performance problem when Oracle's automatic maintenance window job to collect statistics did not run between populating the new partition for the first time, and running a batch process that referenced that partition. Oracle, understandably produced a execution plan for a statement that assumed the partition was empty, but as the partition actually had quite a lot of data, the statement ran for a long time.

The solution was to tell Oracle the truth by gathering statistics for that partition. However, I didn't want to refresh the statistics for the whole table. There were many partitions with historical data that has not changed, so I don't need to refresh those partitions. I only need to refresh just the stale partitions, and here is the problem. Unfortunately, dbms_stats package will let you gather stale and missing statistics for all tables in a given schema, or the whole database, but not for a named table. It is not completely unreasonable, if you are targeting a single table then you ought to know what needs to be refreshed.