Checkpoint presentation presented at the RAC SIG

Yesterday I presented my "What's the Point of Oracle Checkpoints" presentation at the Oracle RAC SIG, and although I was a bit nervous it went quite well. However, my assigned time slot was somehow shortened without notice, and instead of having 90 minutes for my presentation I had to cut it short after just 60 minutes. [...]

Book Adverts

I'm very pleased to announce that two Oracle-related books I have co-authored are to be published shortly by Apress.

Database Links and PeopleSoft

I have seen Oracle database links used in conjunction with PeopleSoft at a number of customer sites over the years. In some scenarios it is reasonable to use a database link; in others it is not.
In Oracle RDBMS, a database link (see also the Oracle Concepts Manual) specifies how a database server can create a session on another database in order to perform a remote operation. The remote database may be another Oracle database, or another type of database. The remote operation may be a query, or it may modify remote data (DML operation), in which case a two-phase commit mechanism ensures transaction integrity across the databases.
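For illustration, here is a minimal sketch of how a link is created and used - the link name, credentials, TNS alias and table are made-up placeholders, not anything PeopleSoft-specific:

create database link remote_link
connect to app_user identified by app_password
using 'REMOTEDB';

-- a remote query: the @link suffix directs the table reference
-- to the remote database
select count(*) from some_table@remote_link;

-- remote DML runs as a distributed transaction, protected by
-- the two-phase commit mechanism
update some_table@remote_link
set status = 'CLOSED'
where id = 42;

commit;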

Multi-column joins

Consider the following scenario with four tables. Two of them represent master data, the third one uses a concatenated primary key consisting of foreign keys to the first two, and the fourth one has a foreign key to the third one.

drop table t1 cascade constraints purge;
drop table t2 cascade constraints purge;
drop table t3 cascade constraints purge;
drop table t4 cascade constraints purge;

create table t1 (
t1_id integer not null constraint pk_t1 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t2 (
t2_id integer not null constraint pk_t2 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

create table t4 (
t4_id integer not null constraint pk_t4 primary key,
t1_id integer,
t2_id integer,
filler1 varchar2(40),
filler2 varchar2(40),
constraint t4_fk_1 foreign key (t1_id, t2_id) references t3 (t1_id, t2_id)
);

Notice that the primary key of "t3" is using a non-unique index. This is supported and is useful, for example, for deferrable constraints, or when loading data that might contain duplicate keys: the constraint can be disabled without the index being dropped. This allows you to simply re-enable the constraint after cleaning up the duplicate rows, instead of having to re-create a unique index (with the risk of being left without the index if anything goes wrong).
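As a rough sketch of that workflow (my own illustration rather than part of the test case - note that the dependent foreign key on "t4" has to be disabled along with the primary key):

alter table t3 disable constraint pk_t3 cascade;

-- ... bulk load into t3, possibly introducing duplicate keys ...

-- remove the duplicates, keeping one row per (t1_id, t2_id)
delete from t3
where rowid not in (
select min(rowid)
from t3
group by t1_id, t2_id
);

-- the explicitly created non-unique index pk_t3 survived the disable,
-- so re-enabling the constraint does not have to build a new index
alter table t3 enable constraint pk_t3;
alter table t4 enable constraint t4_fk_1;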

Now, when using an uncorrelated data set for the concatenated keys, Oracle's default (join) selectivity formulas apply and the estimated cardinalities are correct. Table "t1" has 10,000 rows, "t2" has 3 rows, and table "t3", which combines "t1" and "t2" data, holds 30,000 rows. "t4" has 300,000 rows.

-- non-correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't2', numrows=>3, numblks=>1, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)
exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>3, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>3, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>3, nullcnt=>0)
exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)
exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>3, clstfct=>3, indlevel=>1, numlblks=>1, numrows=>3)
exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)
exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>300000)

Joining t4 to t3 results in a correct estimate of 300K rows:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1456 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1456 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Things look different, however, if we have the awkward situation of correlated column values for the concatenated keys:

-- correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't2', numrows=>20000, numblks=>200, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)
exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)
exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>20000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>20000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)
exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>20000, nullcnt=>0)
exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)
exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>20000, clstfct=>20000, indlevel=>2, numlblks=>20, numrows=>20000)
exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)
exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>300000)

Here we simulate 20,000 distinct values in one column and 10,000 distinct values in the other, but only 30,000 distinct value combinations for both columns together. In this case Oracle's default selectivity formula underestimates the join cardinality, since it assumes uncorrelated values:

select /*+ opt_param('_optimizer_join_sel_sanity_check', 'false') */
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 45 | 2340 | 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Oracle simply multiplies the selectivity of the two columns and arrives at a join cardinality of 45 rows (1/20,000*1/10,000*300,000*30,000).
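To check the arithmetic: each join column contributes a selectivity of 1/greatest(num_distinct) of the two sides, and the per-column selectivities are simply multiplied. A quick sanity check from dual:

select round(300000 * 30000 * (1 / 10000) * (1 / 20000)) as join_card
from dual;

-- JOIN_CARD
-- ----------
--         45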

You'll notice that I had to use an undocumented optimizer parameter to arrive at that default selectivity. If you run an EXPLAIN PLAN for the same statement without the hint, you'll get the following estimate:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 30000 | 1523K| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

A 10053 optimizer trace file shows that Oracle by default applies a "Multi-column cardinality sanity check" in cases where the calculated multi-column selectivity falls below a certain limit. It then uses the smaller selectivity available from the 1/num_rows values of the tables/row sources involved in the join, arriving at an estimate of 30,000 rows in this particular case.
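If you want to verify this yourself, one way to generate a 10053 trace looks like this (the classic event syntax, valid in 10g and 11g; the trace file is written to user_dump_dest):

alter session set events '10053 trace name context forever, level 1';

explain plan for
select count(*)
from t4
, t3
where t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

alter session set events '10053 trace name context off';

Search the join order section of the trace for the "Multi-column cardinality sanity check" lines.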

Changing the non-unique index used for the primary key on "t3" to a unique index brings another sanity check into the picture: the "concatenated index" sanity check, which uses the number of distinct keys of a unique index that corresponds exactly to the join columns used.

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create unique index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

With this unique index in place Oracle uses the number of distinct keys from this index to calculate the selectivity of the join and therefore arrives at the correct cardinality again:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

So this is another case where the uniqueness of an index makes a significant difference for optimizer calculations.

Note that from 11g on there are more options to help the optimizer come up with a better estimate, even with the non-unique index on (t3.t1_id, t3.t2_id). 11g introduced extended statistics on column groups, so we can do the following:

variable ext_name varchar2(30)
exec :ext_name := dbms_stats.create_extended_stats(null, 't3', '(t1_id, t2_id)')
exec dbms_stats.set_column_stats(null, 't3', :ext_name, distcnt=>30000, nullcnt=>0)

This allows the optimizer to derive the correct selectivity for these correlated column values from the extended statistics set.
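If you want to double-check what create_extended_stats generated, the column group is exposed in the data dictionary (from 11g on), and it can be removed again if needed:

select extension_name, extension
from user_stat_extensions
where table_name = 'T3';

exec dbms_stats.drop_extended_stats(null, 't3', '(t1_id, t2_id)')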

Another option in 11g is adding an index on (t4.t1_id, t4.t2_id), like this:

create index ix_t4 on t4 (t1_id, t2_id);
exec dbms_stats.set_index_stats(null, 'ix_t4', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>300000)

With these two non-unique indexes in place, Oracle 11g again comes up with the correct join cardinality of 300K. Notice that this doesn't work in pre-11g versions: they require the index on t3 to be unique to take advantage of the "concatenated index" sanity check.

Having demonstrated the sanity checks available for multi-column joins (the general multi-column sanity check and the concatenated index sanity check), let's see what happens when joining three tables:

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1468 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 300K| 18M| | 1468 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

All I've done is add "t1", in this case joined to "t4" on "t1_id". Thanks to the concatenated index sanity check (or the extended column group statistics in 11g), the calculated join cardinality is still 300K.

Now what happens if one decides to join "t3" to "t1" on "t1_id", instead of joining it to "t4"? From a logical point of view this should lead to exactly the same result, since we can deduce that if "t4.t1_id = t1.t1_id" and "t3.t1_id = t1.t1_id", then "t3.t1_id = t4.t1_id".

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t1.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1475 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 45 | 2925 | | 1475 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 450K| 22M| 1120K| 1459 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

The result is astonishing. By making this simple change we have effectively disabled all available sanity checks, and we arrive at the result based on the default, uncorrelated selectivity.

So whenever you perform multi-column joins on correlated column data, be very careful how you join the tables - it can make a significant difference to the optimizer's calculations.
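If in doubt, it is cheap to compare the estimates of the two formulations up front with a plain EXPLAIN PLAN, for example:

explain plan for
select count(*)
from t1
, t3
, t4
where t4.t1_id = t1.t1_id
and t3.t1_id = t4.t1_id -- joined to t4, not to t1, so the sanity checks still apply
and t3.t2_id = t4.t2_id;

select * from table(dbms_xplan.display);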

UKOUG 2009

I'll be giving two one-hour presentations at the upcoming UKOUG Conference 2009 in Birmingham (30th November - 2nd December):

1. Monday, 30th November, 16:00-17:00: "CBO fundamentals: Understanding the different modes of System Statistics"

2. Wednesday, 2nd December, 11:05-12:05: "Everything you always wanted to know about FIRST_ROWS_N but were afraid to ask"

Both presentations cover fundamental functionality of the Cost-Based Optimizer and will help you to better understand why and how the optimizer comes up with certain execution plans, and in particular how the underlying cost calculation works. So if you haven't found an explanation yet, I'll show you what the calculated "cost" effectively means and how to use the different optimization modes (ALL_ROWS, FIRST_ROWS, FIRST_ROWS_n) appropriately.

So if you happen to attend the conference, I'm looking forward to meeting you there.

By the way, I recommend visiting this link and saving your personalised agenda to help with the room planning.

CBO: NewDensity for Frequency Histograms, 11g-10.2.0.4 (densities part IV)

As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has also been backported to 10.2.0.4.
In the previous post we discussed how NewDensity influences the CBO [...]

Use NULL for unknown data...

Hah, it goes back much further than I thought... Snopes.com pointed out this morning that the use of a 'bad default value' dates back to at least 1979...

See http://tkyte.blogspot.com/2009/10/back-from-oracle-openworld.html for the original reference.

Back from Oracle OpenWorld

I am back and have been taking a set of new questions on asktom. Last week was a busy one out in California, and I'm finally getting caught up on emails and questions (100% on the former, still working on the latter).

Anyway, I saw an article and it made me laugh - and sort of cry at the same time. It has to do with the use of default values. A lot of developers/DBAs have a certain fear (that is the best word I can think of to describe their attitude) of NULL - the 'unknown' value. So, instead of using NULL for an effective_end_date field (for records whose end date we simply don't know, because they don't have one yet), or NULL for other values they do not know, they use some 'fake' value - a value that is assumed to be one that could never possibly be used.

But....

Things change over time.

I'm sure that when the developers implemented this system, using XXXXXXX for an unknown license plate value seemed 'reasonable'. I mean - who would ever ask for a vanity plate with seven X's on it?

Talk to the guy with almost $20k in fines that aren't his, and see if he might know someone who might want a vanity plate with XXXXXXX on it :)

Do not fear NULL.

Understand it, but don't fear it.
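To make the point concrete, a tiny made-up sketch (the table and values are mine, not from the article):

-- NULL simply means "no end date known/applicable (yet)"
create table assignments (
emp_id integer not null,
effective_end_date date -- NULL = still effective
);

-- the current assignments, expressed directly and unambiguously
select count(*)
from assignments
where effective_end_date is null;

-- with a fake default such as date '4712-12-31', the same question turns
-- into a magic-value comparison - and sooner or later someone's real data
-- collides with the magic value, just like the XXXXXXX license plate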

http://tkyte.blogspot.com/2006/01/something-about-nothing.html
http://tkyte.blogspot.com/2006/01/mull-about-null.html

Things that make you go "hmmmmm"

I was reading a blog post entitled How Good is Good?, written 8 years ago by a graphic designer named Stefan Sagmeister. The post was about creating designs that are meaningful and make an impact on the world (or on the part of the world each design is directed toward). While the post was interesting, it was one of the comments that caught my eye. Most of the commenters were inspired by the post, but one commenter, responding to the idea of prioritising the match between good causes and good design over the more mundane everyday marketing work, said:

Simple, don't go into graphic design, Choose something else. Cause if you don't do "that" work, other designers will. Do people really think that everybody enjoys their job?