Largest Tables Including Indexes and LOBs

Just a quick code snippet. I do a lot of Data Pump jobs to move schemas between different databases; for example, taking a copy of a schema to an internal database to try to reproduce a problem. Some of these schemas have some very large tables, and the large tables aren't always needed to research a particular problem.

Here's a quick bit of SQL to list the 20 largest tables by total size - including space used by indexes and LOBs. A quick search on Google didn't reveal anything similar, so I just wrote something up myself. I'm pretty sure this is reasonably efficient; if there's a better way to do it then let me know! I'm posting it here so I can reference it in the future. :)
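Something along these lines does the job - a sketch that maps every index and LOB segment back to its owning table via DBA_INDEXES and DBA_LOBS and then sums per table (the schema name is a placeholder, and edge cases such as partitioned LOBs are ignored):

select *
from  (
        select
                nvl(l.table_name, nvl(i.table_name, s.segment_name)) as table_name
              , round(sum(s.bytes) / 1024 / 1024 / 1024, 1)          as total_gb
        from
                dba_segments s
                -- LOB segments and LOB indexes map back via DBA_LOBS
                left join dba_lobs l
                       on l.owner = s.owner
                      and s.segment_name in (l.segment_name, l.index_name)
                -- ordinary index segments map back via DBA_INDEXES
                left join dba_indexes i
                       on i.owner = s.owner
                      and i.index_name = s.segment_name
        where
                s.owner = 'MY_SCHEMA'
        group by
                nvl(l.table_name, nvl(i.table_name, s.segment_name))
        order by
                sum(s.bytes) desc
      )
where
        rownum <= 20;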

Dynamic Sampling And Indexes

There is a lot more to say about Dynamic Sampling and indexes, and I'll try to cover these basics in my Dynamic Sampling series on AllThingsOracle.com, but two recent discussions on the OTN forums and on Charles Hooper's blog prompted me to publish this blog post.

These discussions revolved around the following issues with Dynamic Sampling and indexes:

1. CREATE INDEX On Empty Table

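One ingredient in that scenario, sketched here under the assumption that the key point is the implicit computation of index statistics during CREATE INDEX (the default behaviour since 10g): an index created on an empty table records dictionary statistics describing an empty index, and since dynamic sampling never writes to the dictionary, those stored index statistics remain in place after the table is populated. How the optimizer combines the two is exactly the kind of issue discussed in those threads.

create table t (id number, pad varchar2(100));

-- CREATE INDEX implicitly computes index statistics,
-- so the dictionary now describes an empty index
create index t_idx on t (id);

insert into t
select rownum, rpad('x', 100)
from dual
connect by level <= 100000;

commit;

-- no table statistics exist, so dynamic sampling estimates the table
-- cardinality at parse time - but NUM_ROWS for the index still says 0
select num_rows, leaf_blocks from user_indexes where index_name = 'T_IDX';

explain plan for select * from t where id = 42;

select * from table(dbms_xplan.display);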

Index Rebuilds – A new Myth arrives

Just over 9 years ago now, before usenet died, I posted a list of common myths regarding the Oracle RDBMS. You can find that list and the ensuing discussion here. Number 2 on my list was

Free Space in an index is never reused.

At the time it was commonly thought that indexes needed regular rebuilding to avoid fragmentation. The best work I am aware of that describes why this is not the case is Richard Foote's Index Internals: Rebuilding the Truth paper. Over time more and more people have come to quote this work, and it is often cited in discussions on the forums at OTN and elsewhere. Recently, however, I have noticed a tendency for people to produce answers like this one, namely:

In Enterprise Manager, the section named "Segment Advisor" can help you

This, it seems to me, is a fundamental misunderstanding of what the Segment Advisor is showing you, and sadly it is my second myth rising anew from the dead. The official documentation on the Segment Advisor describes it as follows:

Over time, updates and deletes on objects within a tablespace can create pockets of empty space that individually are not large enough to be reused for new data. This type of empty space is referred to as fragmented free space.

Objects with fragmented free space can result in much wasted space, and can impact database performance. The preferred way to defragment and reclaim this space is to perform an online segment shrink. This process consolidates fragmented free space below the high water mark and compacts the segment. After compaction, the high water mark is moved, resulting in new free space above the high water mark. That space above the high water mark is then deallocated. The segment remains available for queries and DML during most of the operation, and no extra disk space need be allocated.

You use the Segment Advisor to identify segments that would benefit from online segment shrink. Only segments in locally managed tablespaces with automatic segment space management (ASSM) are eligible. Other restrictions on segment type exist. For more information, see “Shrinking Database Segments Online”.

If a table with reclaimable space is not eligible for online segment shrink, or if you want to make changes to logical or physical attributes of the table while reclaiming space, then you can use online table redefinition as an alternative to segment shrink. Online redefinition is also referred to as reorganization. Unlike online segment shrink, it requires extra disk space to be allocated. See “Redefining Tables Online” for more information.

In other words the Segment Advisor is all about reclaiming space. But a B*-Tree index will always tend to have free space within it – and will nearly always be able to reuse that space. Fragmentation as described above, that is unusable free space, does not apply to B*-Tree indexes in general in Oracle (the point of my previous post was to show one situation where it can be an issue). Running the Segment Advisor on an index is therefore fundamentally misguided. As the next section describes, the advisor is really intended for objects that actually store data (most commonly heap tables) - a sketch of invoking it manually against a table follows the list below.

The Segment Advisor generates the following types of advice:

  • If the Segment Advisor determines that an object has a significant amount of free space, it recommends online segment shrink. If the object is a table that is not eligible for shrinking, as in the case of a table in a tablespace without automatic segment space management, the Segment Advisor recommends online table redefinition.
  • If the Segment Advisor determines that a table could benefit from compression with the OLTP compression method, it makes a recommendation to that effect. (Automatic Segment Advisor only. See “Automatic Segment Advisor”.)
  • If the Segment Advisor encounters a table with row chaining above a certain threshold, it records the fact that the table has an excess of chained rows.
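Running the advisor manually against a table (rather than an index) follows the documented DBMS_ADVISOR pattern - a sketch, with task and object names made up:

declare
        l_task_id   number;
        l_task_name varchar2(30) := 'SEG_ADV_DEMO';
        l_object_id number;
begin
        dbms_advisor.create_task(
                advisor_name => 'Segment Advisor'
              , task_id      => l_task_id
              , task_name    => l_task_name
        );
        dbms_advisor.create_object(
                task_name   => l_task_name
              , object_type => 'TABLE'
              , attr1       => user   -- schema
              , attr2       => 'T1'   -- table name (made up)
              , attr3       => null   -- optional (sub)partition name
              , attr4       => null   -- unused, must be null
              , object_id   => l_object_id
        );
        dbms_advisor.set_task_parameter(l_task_name, 'RECOMMEND_ALL', 'TRUE');
        dbms_advisor.execute_task(l_task_name);
end;
/

select message from user_advisor_findings where task_name = 'SEG_ADV_DEMO';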

Please let's not reinvent the old myths – there's plenty of room for new ones out there.

Monotonically Increasing Indexes

Richard Foote has put up another excellent rebuttal of the myth that you should rebuild indexes based on the value of del_lf_rows/lf_rows from an appropriately validated index structure. Greg asked a question similar to my comment on the OTN forums in this thread, namely:

Is Oracle able to reuse index space when the indexed column is something like the current date (monotonically increasing)? So if we delete some old rows (past dates) and the newly inserted rows have higher (more recent) dates, does Oracle reuse the space from the deleted rows?

The short answer to Greg is that the Radiohead example Richard uses shows that index space is reused for monotonically increasing indexes where the old data is completely deleted. What happens is that the block gets put back on the freelist and is eligible for reuse as a new leaf block. This pattern of insert, process, delete old is pretty common. Sometimes, however, not *all* the old data gets deleted. In this case Oracle won't reuse the free space. I have modified Richard's example so that I delete all the non-prime values below a threshold. Due to the pattern of distribution of the primes this should leave a relatively large number of almost empty index blocks. In this case we see that the old space is not reused until the laggards have finally been deleted. NB if you are tempted to use my prime number test, feel free, but be aware that it degrades as the value of n grows. In other words I only warrant accuracy for positive numbers, not performance. In my case a significant percentage of the available space is left held by these 'old' entries. It's worth remembering however that

  1. the ratio can’t tell you if you have a pattern that matches this usage and
  2. in most cases you’d want to investigate why these values weren’t being deleted anyway.
NIALL @ NIALL1 >@index_deletes
NIALL @ NIALL1 >drop table rush purge;

Table dropped.

Elapsed: 00:00:03.83
NIALL @ NIALL1 >
NIALL @ NIALL1 >create table rush(
  2  	     id number
  3  ,	     code number
  4  ,	     name  varchar2(30)
  5  );

Table created.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >
NIALL @ NIALL1 >insert into rush
  2  select rownum,mod(rownum,100),'MOVING PICTURES'
  3  FROM dual
  4  connect by level <= 1000000; 

1000000 rows created.
Elapsed: 00:00:03.18 

NIALL @ NIALL1 >commit;

Commit complete.

Elapsed: 00:00:00.03
NIALL @ NIALL1 >
NIALL @ NIALL1 >create unique index pk_rush on rush(id);

Index created.

Elapsed: 00:00:05.64
NIALL @ NIALL1 >
NIALL @ NIALL1 >alter table rush
  2  add constraint pk_rush
  3  primary key (id) using index;

Table altered.

Elapsed: 00:00:00.14
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >create or replace function is_prime(p in number) return number
  2  is
  3  	     l_limit number;
  4  	     i	     number := 3;
  5  	     retval  number := 1;  -- returns 0 if false, 1 if true
  6  begin
  7  	     l_limit := sqrt(p);
  8  	     if p = 1 then
  9  		     retval:=1;
 10  	     end if;
 11  	     if p = 2 then
 12  		     retval :=1;
 13  	     end if;
 14
 15  	     if mod(p,2) = 0 then
 16  		     retval := 0;
 17  	     end if;
 18
 19  	     while (i <= l_limit) loop
 20  		if mod(p,i) = 0 then
 21  			     retval := 0;
 22  		end if;
 23  		i := i + 2; -- (no need to test even numbers)
 24  	     end loop;
 25  	     return retval;
 26  end;
 27  /
Function created.
Elapsed: 00:00:00.00
NIALL @ NIALL1 >
NIALL @ NIALL1 >show errors
No errors.
NIALL @ NIALL1 >
NIALL @ NIALL1 >delete from rush
  2  where id < 400000
  3  and is_prime(id) = 0;

 366139 rows deleted.
Elapsed: 00:01:15.54
NIALL @ NIALL1 >
NIALL @ NIALL1 >commit;

Commit complete.

Elapsed: 00:00:00.01
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.97
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   1000000      366139        .37

1 row selected.

Elapsed: 00:00:00.03
NIALL @ NIALL1 >
NIALL @ NIALL1 >begin
  2  for i in 1000001..1400000 loop
  3  insert into rush
  4  values(i,mod(i,100),'GRACE UNDER PRESSURE');
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:47.54
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.49
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   1400000      366139        .26

1 row selected.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >begin
  2  for i in 1400001..2000000 loop
  3  insert into rush
  4  values(i,mod(i,100),'2112');
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:08.89
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.63
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   2000000      366139        .18

1 row selected.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >spoo off

Cleanup of partitioned objects residing in offlined tablespaces

Suppose a large database containing historical data - typically a data warehouse with partitioned tables whose partitions reside in different tablespaces depending on their age - is supposed to be duplicated, but the target environment (typically a lower environment, for example a UAT copy of Production) doesn't have sufficient space to hold the complete database, and in particular the whole set of historic partitions. Then you somehow need to deal with that partial clone.

Of course, in an ideal world the lower environment would have sufficient space to hold the complete database, but we don't live in an ideal world. Sometimes, depending on the storage technology, you might be able to do some kind of temporary "split mirror" operation that allows you to start up a complete clone of the database and drop all the affected objects / tablespaces before putting the cleaned-up, partial database on the final target system, but this is not always applicable, for instance when using ASM-based storage.

The issue in that particular case is that you can do for instance a partial RMAN clone that omits the historical tablespaces, but the resulting database then obviously has some tablespaces that are unavailable and can't be onlined since the datafiles are simply missing.

In the case of objects residing completely in the offlined tablespaces the solution is simple and straightforward: the objects can be dropped, or the affected tablespaces can simply be dropped with the "including contents" clause.

Things get more complicated however in case of partitioned objects if some of the partitions reside in the offlined tablespaces.

In that case you can't simply drop the offlined tablespaces - you end up with an Oracle error message such as "ORA-14404: partitioned table contains partitions in a different tablespace", which tells you that objects exist that have some of their partitions residing outside the offlined tablespace, and Oracle obviously won't do the cleanup job for you in that case.

Oracle allows you to drop partitions of a table that reside in offlined tablespaces, but there is an important limitation: you can only do so if no index has been defined on the table. Otherwise you get the error message "ORA-14117/ORA-14630: [sub]partition resides in offlined tablespace" when attempting to drop the partition. It's interesting to note that the mere existence of indexes is sufficient - it doesn't matter whether the index resides in an offlined or onlined tablespace, or whether it is unusable or not - as soon as at least one index is defined on the table, the drop partition operation errors out. It looks like a hard-coded "if index exists then error" code path. I'm not sure why exactly this restriction is in place, but it has serious consequences if you need to do this sort of cleanup for a large database with hundreds of thousands of partitions.
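A minimal sketch reproducing the restriction (object and tablespace names made up; assumes locally managed tablespaces):

create table t_hist (id number, pad varchar2(100))
partition by range (id) (
        partition p_old     values less than (1000)     tablespace old_data_ts
      , partition p_current values less than (maxvalue) tablespace new_data_ts
);

-- the index resides entirely in an online tablespace
create index t_hist_idx on t_hist (id) local tablespace new_data_ts;

alter tablespace old_data_ts offline;

-- fails merely because an index is defined on the table:
-- ORA-14117: partition resides in offlined tablespace
alter table t_hist drop partition p_old;

drop index t_hist_idx;

-- with no index defined on the table the drop succeeds
alter table t_hist drop partition p_old;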

By the way, it is interesting to know that Oracle allows you to drop segments from offlined tablespaces that are locally managed - think about the consequence: the local bitmaps that represent the allocated and free space in the tablespace cannot be updated at the time of the drop operation. This is one of the few disadvantages compared to the old-fashioned dictionary-managed tablespaces, where an operation like that could potentially have been limited to a pure dictionary operation.

The approach Oracle chooses is straightforward: the dropped segments are converted into "temporary" objects (a kind of special "recyclebin", if you want to call it that). If the offlined tablespace gets onlined again, the local bitmaps are updated according to these temporary objects and the temporary objects are finally removed from the dictionary. Otherwise, if the offlined tablespace gets dropped, the corresponding temporary objects can consequently also be removed.

Tanel Poder recently also blogged about this in the context of read-only tablespaces.

So you could now drop all indexes from the affected tables and recreate them afterwards, but this is not a good idea for several reasons:
* If the remaining partitions still represent multiple terabytes it will take quite a long time to rebuild all indexes
* Things might go wrong and you could end up with missing indexes

Since these are partitioned objects, another Oracle best practice might come to the rescue: exchange the affected partitions with an exchange table EXCLUDING INDEXES - this allows you to drop the exchanged partition residing in an offlined tablespace without any error message. Note that this means that after you've exchanged the partition you need to drop the unpartitioned object (which doesn't have indexes) in order to get rid of the affected segment.

Now if only a few objects are affected, or only a single partition needs to be cleaned up per table, then you're basically done. But if you need to clean up multiple partitions and the number of partitions is high, this approach is not feasible, because dropping / creating a table for each partition that needs to be cleaned up is quite an overhead - in particular when talking about hundreds of thousands of partitions.

The approach of creating a table to exchange partitions with adds another complexity to the problem: many large databases make use of Oracle's option to set columns unused instead of actually dropping them, since the former is a simple metadata operation in the dictionary (it renames the column and marks it as hidden) and is independent of the data volume and therefore almost instant, whereas the latter requires Oracle to process the whole segment, which can take quite some time in the case of huge segments (and is not a very efficient operation by the way, but that is something for a different post).

Now creating a table that can be exchanged with such an evolved table containing unused columns cannot be accomplished by simply doing a handy CTAS operation - the unused columns will be missing from the copy and therefore the exchange operation will fail with "column mismatch" errors.
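A small sketch of the problem (names made up):

create table t_part (id number, obsolete_col number, pad varchar2(100))
partition by range (id) (partition p1 values less than (maxvalue));

-- a metadata-only operation: the column is renamed and hidden, not removed
alter table t_part set unused column obsolete_col;

-- the CTAS copy only gets the two visible columns
create table t_exchange as select * from t_part where 1 = 0;

-- fails (typically with ORA-14097: column type or size mismatch),
-- because internally T_PART still has three columns
alter table t_part exchange partition p1 with table t_exchange;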

So you need to deal with these unused columns somehow whenever you have to create such a table dynamically - and you have to do that if you need to drop more than a single partition per table.

The only other viable option that I'm aware of is to permanently maintain this exchange table as a partitioned table itself - which means creating two exact copies (in terms of dictionary metadata) of every potentially affected table and making sure to apply the same DDL in the same order, which will ensure that the exchange operation with those tables will succeed.

You will need two copies, one unpartitioned and one partitioned, if you want to avoid the drop and dynamic create table operation, since Oracle allows partitions to be exchanged only with unpartitioned tables. So it takes two exchange operations per affected partition - the first with the unpartitioned table, and the second with the partitioned table. Both exchange operations have to be done EXCLUDING INDEXES, and at least the final partitioned table has to be created excluding indexes - note that Oracle allows the exchange operation even if one table is missing a potentially created primary key constraint, provided you use the "EXCLUDING INDEXES" clause.

Then you can drop the partition that has been exchanged into the second table since the table doesn't have any indexes defined on it.

This allows you to accomplish two objectives:

1. Avoid a costly (and, in the case of unused columns, potentially complex) drop / create table operation per partition
2. Manage the drop / create cycle by simply dropping and adding partitions in the second table, which is far less overhead than a drop / create table operation - important when performance matters

However, maintaining two copies of every table that potentially needs to be cleaned up sounds like a huge overhead, and even in Oracle 11.2 with deferred segment creation at least the partitioned table copy will allocate space, since the deferred segment creation option is not (yet) applicable to partitioned objects.

So here are a few ideas on how to accomplish such a cleanup operation:

1. For each table to process, create the required table copies dynamically once, on the fly. If you don't have to deal with unused columns then a simple CTAS operation can be used to create those tables. If you need to handle unused columns, a more complex approach is required: you will have to use custom queries against the data dictionary to extract the required information, since the official DBMS_METADATA interface as far as I know doesn't expose information about unused columns.

The following query could be used as a starting point to extract the column definition of a table:

-- Get the column list including hidden columns, but ignoring virtual columns added by function-based indexes
-- TODO: What about virtual columns (11g or later)?
select
        decode(t.hidden_column,
               'YES', 'SYS$' || internal_column_id || '$UNUSED',
               t.column_name
        ) ||
        ' ' ||
        lower(data_type) ||
        decode(substr(data_type, 1, 9),
               'TIMESTAMP', null,
               'INTERVAL ', null,
               decode(t.data_type,
                      'NUMBER', decode(t.data_precision || t.data_scale,
                                       null, null,
                                       '(' ||
                                       decode(t.data_precision,
                                              null, '*',
                                              t.data_precision
                                       ) ||
                                       decode(t.data_scale,
                                              null, null,
                                              ',' || t.data_scale
                                       ) ||
                                       ')'
                      ),
                      'FLOAT', '(' || t.data_precision || ')',
                      'LONG', null,
                      'LONG RAW', null,
                      'BLOB', null,
                      'CLOB', null,
                      'NCLOB', null,
                      'BFILE', null,
                      'CFILE', null,
                      'BINARY_FLOAT', null,
                      'BINARY_DOUBLE', null,
                      'MLSLABEL', null,
                      'ROWID', null,
                      'UROWID', null,
                      'DATE', null,
                      '(' ||
                      nvl(
                        nvl(
                          decode(t.char_length,
                                 0, to_number(null),
                                 t.char_length
                          ),
                          t.char_col_decl_length
                        ),
                        t.data_length
                      ) ||
                      decode(t.character_set_name,
                             'CHAR_CS', decode(t.char_used,
                                               'C', ' char',
                                               'B', ' byte'
                                        )
                      ) ||
                      ')'
               )
        ) ||
        ' ' ||
        decode(nullable,
               'N', 'not null',
               null
        ) ||
        ',' as column_def
from
        all_tab_cols t
where
        t.owner = <owner>
and     t.table_name = <table_name>
and     t.segment_column_id is not null
order by
        t.internal_column_id;

This will generate SYS$<internal_column_id>$UNUSED placeholder columns for all columns set to unused in the source table; these can then subsequently be set to unused in the newly created table to give it the same layout as the source table.
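To illustrate (names made up): the generated DDL creates a placeholder column - with the original data type - in the position of each unused column, and the placeholder is then set unused as well so that the internal column layout matches the source table:

-- assuming column 2 of the source table was a NUMBER that was set unused
create table t_exchange (
        id           number not null
      , sys$2$unused number
      , pad          varchar2(100)
);

-- align the internal column layout with the source table
alter table t_exchange set unused column sys$2$unused;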

Note that it hasn't been tested with virtual columns yet, but it should deal with most of the available data types and also handle columns using char semantics correctly (important for NCHAR / NVARCHAR based columns and databases running with multi-byte character sets, in particular AL32UTF8). Virtual columns added via function-based indexes are handled by excluding them (they have no SEGMENT_COLUMN_ID).

Of course the extraction process potentially needs to handle much more than shown here, for instance table properties like index-organized tables, partition keys etc.

2. Process each affected table partition using the following approach:

* Add a new "clean" partition to the partitioned copy table in a tablespace that is online. The partition key is a "dummy" key that can be the same for each iteration of this process. This is the segment that is going to end up in the affected partitioned table and will finally be dropped, since it then resides in an available tablespace. (Remember, we want to get rid of those partitions residing in unavailable tablespaces, since we don't have sufficient space in the clone of the database to make them available.)

* Exchange the affected partition with the unpartitioned copy table EXCLUDING INDEXES (the unpartitioned table copy has also been created in an available tablespace initially)

* Exchange the unpartitioned copy table (which now holds the defect partition) with the newly created "clean" partition of the partitioned copy table EXCLUDING INDEXES

* Now the defect partition resides in the partitioned copy table and can simply be dropped, since we don't have any indexes defined on this table

* Finally we can now drop the partition from the affected table (that is, the segment that was added to the partitioned copy table in the previous iteration of the process) since the partition now resides in an available tablespace

The same process can be applied to a subpartitioned table. There are of course some more details to consider - for example, you can't drop the last remaining subpartition or partition from a partition / table - but since we assume that our affected table in general still has some partitions left that reside in current and available tablespaces, it should be possible to handle these scenarios.

Here is a code snippet / template that applies this process:

/**
 * Drop a (sub-)partition residing in an offline tablespace
 * @param p_object                   The original object to clean up
 * @param p_temp_exc_table_name      The name of the working table
 * @param p_clean_tablespace_name    The tablespace where to create the cleanup partition
 * @param p_partition_name           The name of the (sub-)partition to drop
 * @param p_segment_type             The type of the segment (TABLE[ ][SUB][PARTITION])
 * @param p_last_partition_indicator For subpartitions indicate that this is
 *                                   the last subpartition - this will drop the whole partition
 **/
procedure drop_partition(
  p_object                   in     r_oracle_object
, p_temp_exc_table_name      in out oracle_object
, p_clean_tablespace_name    in     oracle_object
, p_partition_name           in     oracle_object
, p_segment_type             in     oracle_object
, p_last_partition_indicator in     varchar2
)
as
  s_sql            large_sql;
  s_partition_type varchar2(255);
  s_exchange_table oracle_object;
  e_partition_in_offline_ts exception;
  pragma exception_init(e_partition_in_offline_ts, -14117);
  -- We do this in an autonomous transaction to keep it separate
  -- from any outside transactions like the "queue" transaction
  -- used by the parallel clean up threads
  pragma autonomous_transaction;
begin
  -- Determine if this is a partition or subpartition
  s_partition_type := substr(p_segment_type, 7, 255);

  -- Step 1: Add a clean partition to the temporary cleanup table
  s_sql := 'alter table ' || p_temp_exc_table_name ||
           ' add partition p_cleanup values (42) tablespace ' || p_clean_tablespace_name;

  execute(p_object.owner, s_sql);

  -- Get the name of the unpartitioned table copy
  s_exchange_table := get_exchange_table(p_object);

  -- Step 2: Exchange the defect (sub-)partition with the unpartitioned copy table EXCLUDING indexes
  s_sql := '
alter table ' || p_object.segment_name || '
exchange ' || s_partition_type || ' ' || p_partition_name || '
with table ' || s_exchange_table || '
excluding indexes without validation';

  execute(p_object.owner, s_sql);

  -- Step 3: Exchange the unpartitioned copy table with the partitioned copy table partition again EXCLUDING indexes
  s_sql := '
alter table ' || p_temp_exc_table_name || '
exchange partition p_cleanup
with table ' || s_exchange_table || '
excluding indexes without validation';

  execute(p_object.owner, s_sql);

  -- Step 4: Drop the defect partition now residing in the partitioned "waste-bin" table
  -- Since we don't have any indexes this is possible now
  s_sql := 'alter table ' || p_temp_exc_table_name || ' drop partition p_cleanup';

  begin
    execute(p_object.owner, s_sql);
  exception
    -- Any tables with LOBs will cause an exception since they will implicitly have a LOB INDEX added
    -- Therefore the DROP PARTITION will fail with ORA-14117
    -- The simplest solution is to drop and recreate the entire working table - note the serious side effects
    -- of the underlying ORA-01110 error in 11.1.0.7. See below comments about the potential impact
    -- of the health check monitoring
    -- A smoother way of handling this to avoid these potential side-effects is to check if the
    -- table has any indexes defined on it and then do not attempt to drop the partition but
    -- immediately drop the entire table
    when e_partition_in_offline_ts then
      drop_temporary_exc_table(p_object, p_temp_exc_table_name);
      p_temp_exc_table_name := create_temporary_exc_table(p_object, p_clean_tablespace_name, p_segment_type);
  end;

  -- Step 5: Drop the partition / subpartition which completes the cleanup
  -- TODO: Are really all subpartitions of a partition affected?
  -- If not, this logic needs to be revised
  if (s_partition_type = 'SUBPARTITION' and p_last_partition_indicator = 'Y') then
    s_sql := 'alter table ' || p_object.segment_name ||
             ' drop partition ' || get_parent_partition(p_object, p_partition_name);

    execute(p_object.owner, s_sql);
  elsif s_partition_type = 'PARTITION' then
    s_sql := 'alter table ' || p_object.segment_name ||
             ' drop partition ' || p_partition_name;

    execute(p_object.owner, s_sql);
  end if;
end drop_partition;

Note that tables with LOBs need special treatment, since the logic used for tables without LOBs fails - with an error stating that the partition resides in an offline tablespace - when the "defect" partition gets dropped from the temporary cleanup table.

This is the error that is typically raised when attempting to drop a partition from a table that has indexes defined on it. Very likely this error is caused by the fact that a LOB always has a corresponding LOB index (which is always exchanged when doing an EXCHANGE PARTITION, even if EXCLUDING INDEXES is specified).

Since this index cannot be dropped from the temporary cleanup table, we actually do not gain anything by performing the exchange exercise. The simplest solution to the problem is to drop and recreate the cleanup table, which is possible without the ORA-14117 error message being raised.

Note that attempting to drop the partition and raising the ORA-14117 error has a serious side effect in 11.1.0.7 and 11.2.0.1: if the "Health Monitoring" detects a "flooding" ORA-1110 error (something is wrong with a datafile), any operation on these datafiles from then on seems to have tracing turned on. This leads to the odd situation that onlining such a tablespace again (which we can't do in our actual case since the datafiles are not there, but which I've done for testing purposes) takes minutes and generates trace files between 20 and 500 MB (!) in size.

I haven't investigated this much, but in my tests it couldn't be avoided by setting the undocumented parameter "_disable_health_checks=TRUE".

The MMON_SLAVE process will dump something similar to this:

========= Dump for error ORA 1110 (no incident) ========
----- DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----

Also for the sessions that cause the error something like this will be written to trace files:

DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
DDE Action 'DB_STRUCTURE_INTEGRITY_CHECK' was flood controlled
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (FLOOD CONTROLLED, 17 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----

A smoother way of handling this, avoiding these potential side effects, is to check whether the table has any indexes defined on it, and if so not to attempt to drop the partition but to immediately drop the entire table.

Since each iteration of this process takes some time (in a database with hundreds of thousands of partitions it can take a few seconds per partition due to the underlying recursive data dictionary operations) and is, according to some traces, mostly CPU-bound, you can speed up the processing by using multiple processes if you have multiple CPUs available.

Using an approach with Advanced Queueing and multiple consumers (for instance, as many consumers as CPUs) we were able to clean up more than 200,000 partitions (table plus index (sub)partitions) in less than one hour, which was within the available timeframe for that operation.
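For illustration, a stripped-down sketch of such a queue-based work distribution (type, queue and tablespace names made up; error handling reduced to a minimum):

create type t_cleanup_msg as object (
        owner          varchar2(30)
      , table_name     varchar2(30)
      , partition_name varchar2(30)
);
/

begin
        dbms_aqadm.create_queue_table(
                queue_table        => 'cleanup_qt'
              , queue_payload_type => 'T_CLEANUP_MSG'
        );
        dbms_aqadm.create_queue('cleanup_q', 'cleanup_qt');
        dbms_aqadm.start_queue('cleanup_q');
end;
/

-- producer: enqueue one message per partition to clean up
declare
        enq_opts  dbms_aq.enqueue_options_t;
        msg_props dbms_aq.message_properties_t;
        msg_id    raw(16);
begin
        for rec in (select table_owner, table_name, partition_name
                    from dba_tab_partitions
                    where tablespace_name = 'OFFLINED_TS') loop
                dbms_aq.enqueue(
                        queue_name         => 'cleanup_q'
                      , enqueue_options    => enq_opts
                      , message_properties => msg_props
                      , payload            => t_cleanup_msg(rec.table_owner, rec.table_name, rec.partition_name)
                      , msgid              => msg_id
                );
        end loop;
        commit;
end;
/

-- each worker session runs this loop; every message is consumed by exactly one worker
declare
        deq_opts  dbms_aq.dequeue_options_t;
        msg_props dbms_aq.message_properties_t;
        msg       t_cleanup_msg;
        msg_id    raw(16);
        e_queue_empty exception;
        pragma exception_init(e_queue_empty, -25228);
begin
        deq_opts.wait := dbms_aq.no_wait;
        loop
                dbms_aq.dequeue('cleanup_q', deq_opts, msg_props, msg, msg_id);
                -- call drop_partition(...) here for the dequeued (sub-)partition
                commit;
        end loop;
exception
        when e_queue_empty then
                null; -- queue drained, worker done
end;
/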

CLUSTERING_FACTOR What-If Analysis

In Oracle the clustering factor of an index is a single number that is supposed to represent the correlation between the order of the index and the order of the corresponding table.

If an execution plan contains an index range scan including an access to the table by ROWID, the clustering factor tells the cost-based optimizer how "clustered" or "scattered" the data in the table is with respect to the index - that is, on average how likely contiguous rows from an index range scan are to point to the same table block. Widely scattered data requires reading a different block for every row returned from the index range scan; in contrast, with clustered data many of the contiguous rows from the index will point to the same block. This makes a significant difference to the cost calculated by the cost-based optimizer (and of course also a significant difference at actual execution time, in particular when many table blocks have to be read from disk).

As described in detail by Jonathan Lewis in his Cost-Based Oracle Fundamentals book (Chapter 5, Clustering Factor), there are various scenarios that have a significant influence on the clustering factor, and there are scenarios where Oracle actually doesn't get the clustering factor right - in particular when dealing with data that is inserted concurrently in conjunction with some segment space management method that attempts to spread the inserted data across different blocks to avoid contention. This can happen for instance when using multiple freelists / freelist groups with manual segment space management (MSSM), or automatic segment space management (ASSM) for that matter.

In the case of an index range scan with table access involved, the clustering factor usually represents the largest fraction of the cost associated with the operation. Indexes with high clustering factors (meaning that the table data is scattered in relation to the index order) therefore tend to be ignored by the cost-based optimizer, and different access paths might be favored instead, like full table scans or the usage of different available indexes.
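For reference, this is visible in the classic costing formula for an index range scan with table access (as described in Jonathan Lewis's book), where the clustering factor directly scales the table access portion:

cost = blevel
     + ceil(effective index selectivity * leaf_blocks)
     + ceil(effective table selectivity * clustering_factor)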

Since the clustering factor is often such a crucial piece of information, I have derived a simple query from what DBMS_STATS.GATHER_INDEX_STATS uses (and Jonathan mentions and explains in his book) that allows you to perform a what-if analysis regarding the clustering factor.

It can be used to:

- Validate/correct the clustering factor of an existing index determined by DBMS_STATS
- In particular, check whether concurrent inserts together with the segment space management in use lead to a non-representative clustering factor (this is something DBMS_STATS does not support at present and can only be corrected manually using DBMS_STATS.SET_INDEX_STATS - see the sketch after this list)
- Perform various what-if scenarios without the need to actually create/re-create the index, e.g.:
  - The clustering factor of a new index to be added
  - The effect of adding a column to an existing index on the clustering factor
  - The effect of changing the column order of an existing index
  - The effect of changing an index to a reverse index
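Such a manual correction could look like this - a sketch with made-up index name and value:

begin
        dbms_stats.set_index_stats(
                ownname       => user
              , indname       => 'T1_IDX1'
              , clstfct       => 188     -- representative value determined by the what-if query
              , no_invalidate => false   -- invalidate dependent cursors immediately
        );
end;
/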

Similar information could also be obtained by simply creating the corresponding index, but there are some points to consider here:

Up to release 10g, creating an index might lead to unwanted changes in execution plans if performed on a live system. In 11g the option to create an index as invisible can be used to avoid this; however, you still incur the overhead of writing the index structure and allocating physical space, which can be significant depending on the amount of data.

Furthermore, the query allows for certain kinds of analysis that are simply not possible by creating an index - more on this later.

The query has this general form:

select
        sys_op_countchg(substrb(row_id,1,15), <history>)
        [* (100 / %perc)] as clustering_factor
      , count(*)
        [* (100 / %perc)] as cnt
      , count(distinct substrb(row_id,1,15))
        [* (100 / %perc)] as blocks
from
        (
          select /*+ no_merge no_eliminate_oby */
                 /* optionally use parallelism e.g. parallel(t, 2) */
                  rowid as row_id
                [, t.*]
                [, dbms_rowid.rowid_relative_fno(rowid) as file_number]
                [, dbms_rowid.rowid_block_number(rowid) as block_number]
                [, dbms_rowid.rowid_row_number(rowid) as row_number]
          from
                  <table_name> [sample [block] (%perc)] t
          where
                  <index_expression_1> is not null
                [ or <index_expression_2> is not null ]
          order by
                  <index_expression_1> [asc|desc]
                [, <index_expression_2> [asc|desc]]
                , rowid
        );

and a first version of a ready-to-use script could look like this:

-- Simulate / calculate clustering factor
set verify off

set termout off

column ora10 new_value if_v10 noprint
column oralower10 new_value if_lower_v10 noprint

-- Determine version for regular expression support
select
        decode(substr(banner, instr(banner, 'Release ') + 8, 1), '1', '', '--') as ora10
from
        v$version
where
        rownum = 1;

select
        decode('&if_v10', '--', '', '--') as oralower10
from
        dual;

set termout on

accept table_name prompt 'Enter table name: '

define sample_pct = 100

accept sample_pct number default &sample_pct prompt 'Enter sample percent (default &sample_pct): '

define p_degree = DEFAULT

accept p_degree default &p_degree prompt 'Enter parallel degree (default &p_degree): '

define history = 1

accept history number default &history prompt 'Enter number of blocks to remember (default &history): '

-- ideally comma separated without spaces surrounding the comma
accept col_list prompt 'Enter comma separated index column list: '

set termout off

column define_sample_block new_value sample_block noprint

select
        case
        when &sample_pct < 100
        then 'sample block (&sample_pct)'
        else ''
        end as define_sample_block
from
        dual;

column col_list_where_expr new_value col_list_where noprint

-- Try to get clever with the WHERE clause derived from the index expression
select
        replace(
        &if_v10 regexp_replace('&col_list', '( asc| desc)?( nulls (last|first))?', '')
        &if_lower_v10 replace(replace(replace(replace(replace('&col_list', ' asc', ''), ' desc', ''), ' nulls', ''), ' last', ''), ' first', '')
        , ','
        , ' is not null or '
        ) || ' is not null' as col_list_where_expr
from
        dual;

column p_degree_hint_expr new_value p_degree_hint noprint

select
        case
        when '&p_degree' != 'DEFAULT'
        then 'parallel (t &p_degree)'
        else ''
        end as p_degree_hint_expr
from
        dual;

-- set echo on verify on
set termout on

select
        sys_op_countchg(substrb(row_id,1,15), &history)
        * (100 / &sample_pct) as clustering_factor
      , count(*)
        * (100 / &sample_pct) as cnt
      , count(distinct substrb(row_id,1,15))
        * (100 / &sample_pct) as blocks
from
        (
          select /*+ no_merge no_eliminate_oby &p_degree_hint */
                  rowid as row_id
                --, t.*
          from
                  &table_name &sample_block t
          where
                  &col_list_where
          order by
                  &col_list
                , rowid
        );

It is based on the undocumented aggregate function SYS_OP_COUNTCHG that is apparently used by DBMS_STATS to calculate the clustering factor.

The second parameter to this function (I called it "history") is very interesting, since it represents the number of blocks the function "remembers" in order to determine whether the block has "changed" or not. DBMS_STATS uses a value of 1, so if we have data that is still clustered but unluckily scattered across a few blocks, this leads to a likely non-representative clustering factor: walking the index may jump back and forth between these few blocks, and the SYS_OP_COUNTCHG function will increase the clustering factor with each block change, although the walk stays within the same few blocks, which therefore very likely will be held in the cache.

For example, in the case of concurrent inserts and ASSM, or freelists / freelist groups with MSSM, an appropriate number of blocks to remember could be the number of concurrent processes that insert the data - more on this in the demonstration part later.

The remaining placeholders are straightforward - the table name, obviously, and if it is a large table you can use the sample clause to avoid reading the whole table, but then the values returned need to be adjusted accordingly. You could also try to run a potentially required full table scan in parallel (something the original DBMS_STATS query doesn't do) - note however that the aggregate function will need to be performed by the query coordinator (due to the dependency of the clustering factor evaluation on the data order), which might represent the bottleneck in the case of parallel execution.

If you want to get a feeling for how the data will be sorted according to the index definition, you can use the [t.*] and [DBMS_ROWID...] clauses and execute only the inner query without the aggregate function - in this case a potentially required sort operation is going to be more costly due to the increased data volume to sort.

The expressions in the WHERE and ORDER BY clause are supposed to represent the columns and/or expressions (in case of function-based indexes) used in the index definition.

The WHERE clause will ensure that only data will be considered that leads to non-null expressions in the index (a b*tree index only covers non-null data), and the ORDER BY clause will order the data the way the index will be ordered.

Note that non-unique indexes get the ROWID added to make the index expression unique - for unique indexes this is not required, but it does no harm, since the expression by itself is already unique. You can omit the ROWID in this case, but it won't change the outcome.

This query also allows some interesting considerations. For example, if you know that you'll mostly access only a particular "hot" part of the table which is well clustered, while the remaining "cold(er)" part of the table is rather scattered (for example in the case of batch inserts of newly arrived data into a partially deleted table shrunk via the SHRINK option introduced in 10g), the overall average clustering factor determined might be bad, but probably not representative of a typical query accessing only the "hot"/"latest" data. You could then add the corresponding selection criteria to the query to restrict the data analysed accordingly, and use the obtained clustering factor to correct the index statistics using DBMS_STATS.SET_INDEX_STATS.

Here is a demonstration of some of the common scenarios regarding the clustering factor. It allows you to reproduce issues with concurrent inserts, extra columns and changed column order. It is a modification of some code I recently used to reproduce similar issues that one of my clients had.

prompt Tablespace
accept tblspace
rem define tblspace = test_8k

drop table t1 purge;

drop sequence seq_t1_run_id;

drop sequence seq_t1_seq_id;

create table t1 (
run_id integer not null, /* identify the process inserting the data */
batch_id integer not null, /* represents clustered data, could also be a (arriving) date */
a_value number null, /* represents sequence based data */
a_random number null, /* represents randomly scattered data */
a_date timestamp default systimestamp not null, /* represents the insert timestamp */
filler char(1) default 'x' not null /* can be used to size the row as required */
)
tablespace &tblspace;

/* a sample index */
create index t1_idx1 on
t1 (
batch_id,
a_value
)
tablespace &tblspace;

create sequence seq_t1_run_id;

create sequence seq_t1_seq_id;

create or replace procedure populate_t1(i_run_id in integer, i_iter in integer) as
begin
  dbms_output.put_line(
    dbms_lock.request(
      1
    , dbms_lock.s_mode
    , release_on_commit => true
    )
  );
  commit;
  for i in 1..i_iter loop
    for j in 1..100 loop
      insert into t1 (
        run_id
      , batch_id
      , a_value
      , a_random
      )
      values (
        i_run_id
      , i
      , seq_t1_seq_id.nextval
      , trunc(dbms_random.value(1, 1000))
      );
      commit;
      dbms_lock.sleep(0.01);
    end loop;
  end loop;
end;
/

If you want to test the effect of concurrent inserts with ASSM for instance, choose an appropriate tablespace (or modify the script to use freelists / freelist groups with MSSM) and run the following code.

In a main session run this:

-- run this as main session
-- afterwards start as many of the below code snippets
-- and press return in the main session
-- to let them all start at the same time

truncate table t1;

begin
  dbms_output.put_line(
    dbms_lock.request(
      1
    , dbms_lock.x_mode
    , release_on_commit => true
    )
  );
end;
/

prompt Press return to continue when sessions have been started

accept x

commit;

prompt Press return to continue when sessions have completed

accept x

exec dbms_stats.gather_table_stats(null, 't1', estimate_percent=>null, cascade=>true)

select
index_name
, clustering_factor
from
user_indexes
where
table_name = 'T1';

Then start this in as many sessions as you want to run concurrently:

set timing on echo on

variable n_run_id number

exec select seq_t1_run_id.nextval into :n_run_id from dual;

exec populate_t1(:n_run_id, 100)

commit;

After the sessions have started they're going to wait on the lock held by the main session. Press ENTER to get the sessions started, and press ENTER again after the sessions have completed to gather statistics and get initial information about the CLUSTERING_FACTOR of the sample index as determined by DBMS_STATS.

The number of iterations (set to 100 in the code snippet above) determines how long this code will run - every insert is delayed by 1/100th of a second, the minimum delay supported by DBMS_LOCK.SLEEP - so the block above will insert 10,000 rows, lasting approx. 100 seconds.

The code uses the simple synchronisation method also used by Jonathan in his sample scripts (based on DBMS_LOCK.REQUEST) - the main session allocates a user lock in exclusive mode, and all other sessions request it in shared mode. Therefore all sessions will wait until the main session commits, which releases the lock. Note that this uses a hard-coded lock id - on a non-test system it is advisable to use DBMS_LOCK.ALLOCATE_UNIQUE to generate a unique lock handle, as sketched below.
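That variant could look like this (the lock name is made up) - ALLOCATE_UNIQUE turns a lock name into a handle that the corresponding DBMS_LOCK.REQUEST overload accepts:

declare
        l_lockhandle varchar2(128);
begin
        -- turn a user-chosen lock name into a lock handle
        dbms_lock.allocate_unique('SYNC_T1_INSERTS', l_lockhandle);

        -- request the lock in shared mode, released on commit,
        -- just like the hard-coded lock id 1 in populate_t1
        dbms_output.put_line(
                dbms_lock.request(
                        lockhandle        => l_lockhandle
                      , lockmode          => dbms_lock.s_mode
                      , release_on_commit => true
                )
        );
end;
/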

Depending on what you've chosen as concurrency and segment space management, the clustering factor of the index on (batch_id, a_value) might be close to the number of blocks or to the number of rows in the table as determined by the final DBMS_STATS call.

You can now use the query to perform some analysis regarding the clustering factor. You could run e.g. the following query:

select
        sys_op_countchg(substrb(row_id,1,15), 1) as clustering_factor
      , count(*) as cnt
      , count(distinct substrb(row_id,1,15)) as blocks
from
        (
          select /*+ no_merge no_eliminate_oby */
                  rowid as row_id
                , t1.*
          from
                  t1
          where
                  batch_id is not null
                  or a_value is not null
          order by
                  batch_id
                , a_value
                , rowid
        );

This should give you exactly the clustering factor that has been determined by the DBMS_STATS call used above.

In my case when using MSSM I got a clustering factor of 190 with the table having 186 blocks for 40,000 rows (four concurrent processes each inserting 10,000 rows).

When using ASSM for the same setup (four processes) I got a clustering factor of 28,483 (!) for the same index. Note that the results might vary significantly, depending on how the processes were assigned to the different freelists (MSSM) or block groups (ASSM).

If you've used ASSM, or freelists / freelist groups with MSSM, then replace the "history" parameter with your number of concurrent processes (or the number of freelists, if you had more processes than freelists), e.g. in the case of four concurrent processes:

select
sys_op_countchg(substrb(row_id,1,15), 4) as clustering_factor
...

and you should notice a significant drop in the clustering factor. It is caused by the fact that the concurrent inserts used different blocks, so the data is not clustered in a single block but in a few blocks, and in this constructed case Oracle has to "jump" back and forth between these few blocks to obtain the data in index order (actually caused by the A_VALUE column, which is an increasing value written concurrently by the different processes, so with ASSM / freelist (groups) each successive value is potentially stored in a different block).

In my particular case the simulated clustering factor for ASSM dropped from 28,483 to 188.

Some variations of the query allow you to reproduce other scenarios, e.g. use the following columns to see the impact of adding a badly scattered column to an index:

batch_id

(in my case clustering factor 188 for MSSM with default freelists)

vs.

batch_id
, a_random

(in my case clustering factor 22,191 for MSSM with default freelists, selecting a history size of 4 showed a clustering factor of 190)

or this one to see the impact of changing the column order:

batch_id
, a_value
, a_random
, rowid

(in my case clustering factor 190 for MSSM with default freelists)

vs.

a_random
, batch_id
, a_value
, rowid

(in my case clustering factor 35,904 for MSSM with default freelists, and here increasing the history size to 4 doesn't make a significant difference)

You could also use variations of the following query to get a feeling for how the data arrived in the table:

select
        rowid
      , run_id
      , batch_id
      , a_value
      , to_char(a_date, 'DD.MM.YYYY HH24:MI:SS.FF') as insert_timestamp
      , dbms_rowid.rowid_relative_fno(rowid) as file_number
      , dbms_rowid.rowid_block_number(rowid) as block_number
      , dbms_rowid.rowid_row_number(rowid) as row_number
from
        t1
order by
        a_date;

By using different ORDER BYs (or no ORDER BY at all) in the above query you can get some other interesting insights into how the data is stored in the table - in particular the difference when using multiple freelists or ASSM with concurrent inserts.

Happy simulating!