Oakies Blog Aggregator

Book Adverts

I'm very pleased to announce that two Oracle-related books I have co-authored are to be published shortly by Apress.


The first book is entitled Expert Oracle Practices: Oracle Database Administration from the Oak Table and it will be officially published in December. There are 9 chapters currently available for eBook download via the Apress Alpha program. The Alpha program allows you to purchase (at a substantial discount from the full published price) an unedited, unfinished pre-release version. The full book isn't available yet, but when it becomes available, you will be able to download the full eBook at no additional cost. It's a good option if you want to get a jump on what's to come! I wrote the chapter on Managing SQL Performance. I'm really excited about the book and am very proud to be among the group of Oak Table folks who are co-authoring it.

Not far behind the Oak Table book will be Beginning Oracle SQL. The book is a revision of the classic Mastering Oracle SQL and SQL*Plus by Lex de Haan, originally published in 2004. Again, I was fortunate to be able to work with a fantastic group of co-authors to revise Lex's work to include many of Oracle's latest developments in the SQL language. I knew Lex for only a very short time before his passing in 2006, but I knew him to be a brilliant and wonderful man, and I'm very honored to be able to contribute to revising his original work.

These books represent my first time as a published author. I've written many whitepapers and articles for various Oracle user group periodicals and such in the past but never attempted the process of writing a book. After having this experience, I'm grateful that my first time was as a co-author. It's comforting to know there are other folks out there sweating the process with you!

The folks at Apress have been awesome and I hope that my first experience as an author won't be my last!

Database Links and PeopleSoft

I have seen Oracle database links used in conjunction with PeopleSoft at a number of customer sites over the years. In some scenarios it is reasonable to use a database link; in others it is not.

In Oracle RDBMS, a database link (see also the Oracle Concepts Manual) specifies how a database server can create a session on another database in order to perform a remote operation. The remote database may be another Oracle database, or another type of database. The remote operation may be a query, or it may modify remote data (a DML operation), in which case a two-phase commit mechanism ensures transaction integrity across the databases.
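
For example, a minimal sketch of creating and using a link (the link name, connection details and table names here are purely illustrative):

-- create a link from the HR database to the Financials database
create database link fin_link
connect to sysadm identified by password
using 'FSPROD';

-- a remote query via the link
select count(*) from ps_voucher@fin_link;

-- remote DML via the link is protected by two-phase commit
insert into ps_voucher@fin_link
select * from hr_voucher_stage;

commit;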

PeopleSoft does not use database links between databases because they are a database-platform-specific technology. Instead, the PeopleSoft methodology is to replicate data between databases with an application message. This works well with small data sets, and with larger data sets that change slowly. However, there are scenarios with very large data volumes where Application Messaging will struggle to keep up, and Oracle RDBMS-specific replication technologies are an appropriate alternative.

Data can be replicated between databases with Materialised Views (formerly known as Snapshots). This is a SQL-based technology. Materialised View Logs on the source database track changes in the source table. An incremental or full refresh process is run on the target, usually as a scheduled job.
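
A minimal sketch of the approach (the link name is illustrative; PS_PERSONAL_DATA is the table discussed in the examples below):

-- on the source (HR) database: log changes to the source table
create materialized view log on ps_personal_data with primary key;

-- on the target (Financials) database: an incrementally refreshable copy read via a link
create materialized view ps_personal_data
refresh fast on demand
as select * from ps_personal_data@hr_link;

-- incremental ('fast') refresh, usually run from a scheduled job
exec dbms_mview.refresh('PS_PERSONAL_DATA', method => 'F')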

Oracle introduced an alternative replication technology in Oracle 9i called ‘Streams’. This uses supplementary redo logging on the source database, which can then be applied to the target. Additional database server processes transmit, receive and apply this redo stream. This technology is aimed at large-volume replication for data warehouses. I know of one site where several hundred database tables are replicated from PeopleSoft HR to a data warehouse.
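
Streams configuration is considerably more involved, but for illustration a fragment of the capture-side setup might look something like this (a sketch only; it assumes a Streams administrator schema and queue already exist, and all names are made up):

begin
  dbms_streams_adm.add_table_rules(
    table_name   => 'SYSADM.PS_JOB',
    streams_type => 'capture',
    streams_name => 'HR_CAPTURE',
    queue_name   => 'STRMADMIN.STREAMS_QUEUE',
    include_dml  => TRUE,
    include_ddl  => FALSE);
end;
/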

I think the clearest way to explain the good and bad use of links is by example.

At two different sites with HR and Finance, I have seen similar examples of both good and bad use of database links (one was on PeopleTools 7.0, before Application Messaging was available).

  • GL data needs to be sent from Payroll in the HR database to GL in the Financials database. The PeopleSoft-delivered mechanism was to generate and then reload an interface file. This customer replaced the interface table in HR with a view that referenced a table of the same name and structure on the Financials system via a link. The payroll GL extract process then inserted directly into the table on Financials. I think this is a perfectly reasonable use of a database link. It was simple. It performed better than the interface file, and would certainly have been faster than application messaging. 
  • However, the same customer also used the employee expenses module in Financials. This required employee personal data, so they made the PS_PERSONAL_DATA table in Financials a view of the corresponding table in HR. The result was that every time somebody opened the expenses component in Financials, the application server process referenced the database link to HR. There is a limit on the maximum number of database links that a session can concurrently open (OPEN_LINKS) and that the whole database instance can concurrently open (OPEN_LINKS_PER_INSTANCE). Both default to 4, which gives an indication of how Oracle expects you to use this feature. This system ran out of database links. No Oracle errors were generated; instead, sessions waited to be allowed to make the connection to the remote database. There are specific database link wait events that report the amount of time lost (see the query sketch after this list). Eventually you reach the point where Tuxedo services time out and users receive errors. I think this is an example of the wrong way to use a database link. Instead, the data should have been replicated from HR to Financials. In a modern PeopleSoft system this should be done with an application message (on the 7.0 system a Materialised View could have been used). 
  • Not only were database links used to provide data to components, but the same views that referenced remote objects were also used in queries and reports, resulting in complex distributed queries. When multiple remote objects are referenced in a single SQL statement, I have seen Oracle decide to copy the whole of one or more of these objects to the local temporary tablespace. This does not result in good performance. Remember also that even query-only operations via a database link create a transaction on the remote database that remains open until a commit or rollback is issued.
  • I worked on another site where Materialised Views were used to incrementally replicate data from CRM and HR to EPM databases. Processes in the EPM database then referenced the Materialised Views. In this system, there was never any question of EPM processes referencing objects via the links. The data volumes were such that Application Messaging would never have coped. In this case only the Materialised View refresh process references the database links, and so the DBA can manage the usage of links. This is a system where (after upgrade to Oracle 10g) Streams could also have been used to replicate the data.
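
The following queries show the link-related limits and the instance-wide time lost on database link waits (a sketch; the dblink wait event names may vary slightly between Oracle versions):

select name, value
from v$parameter
where name in ('open_links', 'open_links_per_instance');

select event, total_waits, time_waited
from v$system_event
where event like '%dblink%'
order by time_waited desc;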

Conclusion

My view is that database links should not be used to directly support functionality in either on-line components or reporting. When this is done in reports and queries, it presents the problem of tuning distributed queries and of having to decide whether the local or the remote database should drive the query. Instead, data should be replicated to the local database, preferably by PeopleSoft messaging, but if necessary by Oracle replication. However, I think it can be reasonable to use a database link in an occasional batch process that moves data between systems.
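
Where a distributed query genuinely cannot be avoided, the DRIVING_SITE hint is one way to control which database drives it (a sketch; the table and link names are illustrative):

select /*+ driving_site(r) */ l.some_column, r.other_column
from local_table l
, remote_table@hr_link r
where r.key_id = l.key_id;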

Multi-column joins

Consider the following scenario with four tables. Two of them represent master data, the third one uses a concatenated primary key consisting of foreign keys to the first two, and the fourth one has a foreign key to the third one.

drop table t1 cascade constraints purge;

drop table t2 cascade constraints purge;

drop table t3 cascade constraints purge;

drop table t4 cascade constraints purge;

create table t1 (
  t1_id   integer not null constraint pk_t1 primary key,
  filler1 varchar2(40),
  filler2 varchar2(40)
);

create table t2 (
  t2_id   integer not null constraint pk_t2 primary key,
  filler1 varchar2(40),
  filler2 varchar2(40)
);

create table t3 (
  t1_id   integer not null,
  t2_id   integer not null,
  filler1 varchar2(40),
  filler2 varchar2(40),
  constraint pk_t3 primary key (t1_id, t2_id) using index (
    create index pk_t3 on t3 (t1_id, t2_id)
  ),
  constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
  constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id)
);

create table t4 (
  t4_id   integer not null constraint pk_t4 primary key,
  t1_id   integer,
  t2_id   integer,
  filler1 varchar2(40),
  filler2 varchar2(40),
  constraint t4_fk_1 foreign key (t1_id, t2_id) references t3 (t1_id, t2_id)
);

Notice that the primary key of "t3" is using a non-unique index. This is supported and can be used, for example, for deferrable constraints, or when loading data that might contain duplicates, so that the constraint can be disabled without dropping the index. This allows you to simply re-enable the constraint after cleaning up the non-unique rows, instead of re-creating a unique index (with the risk of losing the index if anything goes wrong).
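
For example, a sketch of that load pattern with the non-unique index in place:

-- the primary key can be disabled for the load without losing the index
alter table t3 disable constraint pk_t3;

-- ... bulk load, then remove any duplicate (t1_id, t2_id) rows ...

-- re-enabling the constraint re-uses the existing non-unique index
alter table t3 enable validate constraint pk_t3;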

Now when using an uncorrelated data set for the concatenated keys, Oracle's default (join) selectivity formulas apply and the estimated cardinalities are correct. Table "t1" has 10,000 rows, "t2" 3 rows, table "t3" holds 30,000 rows combining "t1" and "t2" data. "t4" has 300,000 rows.

-- non-correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>3, numblks=>1, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>3, clstfct=>3, indlevel=>1, numlblks=>1, numrows=>3)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>300000)

Joining t4 to t3 results in a correct estimate of 300K rows:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1456 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1456 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Things look different, however, if we have the awkward situation of correlated column values for the concatenated keys:

-- correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>20000, numblks=>200, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>20000, clstfct=>20000, indlevel=>2, numlblks=>20, numrows=>20000)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>300000)

Here we simulate 20,000 distinct values in one column, 10,000 distinct values in the second one, but only 30,000 distinct values for the combination of both columns. In this case Oracle's default selectivity formula underestimates the cardinality since it is assuming uncorrelated values:

select /*+ opt_param('_optimizer_join_sel_sanity_check', 'false') */
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 45 | 2340 | 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Oracle simply multiplies the selectivity of the two columns and arrives at a join cardinality of 45 rows (1/20,000*1/10,000*300,000*30,000).

You'll notice that I had to use an undocumented optimizer parameter to arrive at that default selectivity. If you run an EXPLAIN PLAN for the same statement without the hint, you'll get the following estimate:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 30000 | 1523K| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

It can be seen from a 10053 optimizer trace file that Oracle by default applies a "Multi-column cardinality sanity check" in cases where the calculated multi-column selectivity falls below a certain limit, obviously using the smaller selectivity available from the different 1/num_rows of the tables/row sources involved in the join, arriving at an estimate of 30,000 rows in this particular case.
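
Expressed as a rough calculation (my reading of the sanity check described above, so treat it as a sketch), the join cardinality is capped using 1/num_rows of the larger row source:

30,000 * 300,000 * 1/300,000 = 30,000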

Changing the non-unique index used for the primary key on "t3" to a unique index will bring another sanity check into the picture: the "concatenated index" sanity check, which uses the number of distinct values of a unique index that corresponds exactly to the join columns used.

create table t3 (
  t1_id   integer not null,
  t2_id   integer not null,
  filler1 varchar2(40),
  filler2 varchar2(40),
  constraint pk_t3 primary key (t1_id, t2_id) using index (
    create unique index pk_t3 on t3 (t1_id, t2_id)
  ),
  constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
  constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id)
);

With this unique index in place Oracle uses the number of distinct keys from this index to calculate the selectivity of the join and therefore arrives at the correct cardinality again:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

So this is another case where the uniqueness of an index makes a significant difference for optimizer calculations.

Note that from 11g on there are more options to help the optimizer come up with a better estimate, even with the non-unique index on (t3.t1_id, t3.t2_id). In particular, 11g introduced extended statistics on column groups, so we can do the following:

variable ext_name varchar2(30)

exec :ext_name := dbms_stats.create_extended_stats(null, 't3', '(t1_id, t2_id)')

exec dbms_stats.set_column_stats(null, 't3', :ext_name, distcnt=>30000, nullcnt=>0)

This allows the optimizer to derive the correct selectivity for these correlated column values using the extended statistics set.
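
Outside of this artificial test case the column group statistics would normally be gathered rather than set manually, for example with something like this (a sketch using the 11g METHOD_OPT syntax):

exec dbms_stats.gather_table_stats(null, 't3', method_opt => 'for columns (t1_id, t2_id) size 1')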

Another option in 11g is adding an index on (t4.t1_id, t4.t2_id), like this:

create index ix_t4 on t4 (t1_id, t2_id);

exec dbms_stats.set_index_stats(null, 'ix_t4', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>300000)

With these two non-unique indexes in place, Oracle 11g again comes up with the correct join cardinality of 300K. Notice that this doesn't work in pre-11g versions; they require the index on t3 to be unique to take advantage of the "concatenated index" sanity check.

Having demonstrated all these sanity checks available for multi-column joins (the general multi-column and the concatenated index sanity check), let's see what happens when joining three tables:

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1468 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 300K| 18M| | 1468 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

All I've done is to add "t1", in this case joining to "t4" on "t1_id". Thanks to the concatenated index sanity check (or the extended column group statistics in 11g) the calculated join cardinality is still 300K.

Now what happens if one decides to join "t3" to "t1" on "t1_id" instead of "t4.t1_id"? From a logical point of view this should lead to exactly the same result, since we can deduce that if "t4.t1_id" = "t1.t1_id" and "t3.t1_id = t1.t1_id" then "t3.t1_id = t4.t1_id".

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t1.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1475 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 45 | 2925 | | 1475 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 450K| 22M| 1120K| 1459 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

The result is astonishing. By making this simple change we have effectively disabled all the available sanity checks and arrive at the result based on the default, uncorrelated selectivity.

So whenever you perform multi-column joins and the column data is correlated, be very careful how you join the tables - it might make a significant difference to the calculations of the optimizer.

UKOUG 2009

I'll be giving two one-hour presentations at the upcoming UKOUG Conference 2009 in Birmingham (30th November - 2nd December):

1. Monday, 30th November, 16:00-17:00: "CBO fundamentals: Understanding the different modes of System Statistics"

2. Wednesday, 2nd December, 11:05-12:05: "Everything you always wanted to know about FIRST_ROWS_N but were afraid to ask"

Both presentations cover fundamental functionality of the Cost-Based Optimizer, which will help you to better understand why and how the optimizer comes up with certain execution plans, and in particular how the underlying cost calculation works. So, if you haven't got an explanation yet, I'll show you what the calculated "cost" effectively means and how to use the different optimization modes (ALL_ROWS, FIRST_ROWS, FIRST_ROWS_n) appropriately.

So if you happen to attend the conference I'm looking forward to meeting you there.

By the way, I recommend visiting this link and saving your personalised agenda to help with the room planning.

CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV)

As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has also been backported to 10.2.0.4.
In the previous post we discussed how NewDensity influences the CBO [...]

Use NULL for unknown data...

Hah, it goes back much further than I thought... Snopes.com pointed out this morning that the use of a 'bad default value' dates back to at least 1979...

See http://tkyte.blogspot.com/2009/10/back-from-oracle-openworld.html for the original reference.

Back from Oracle OpenWorld

I am back and have been taking a set of new questions on asktom. Last week was a busy one out in California and I'm finally getting caught up on emails and questions (100% on the former, still working on the latter).

Anyway, I saw an article and it made me laugh - and sort of cry at the same time. It has to do with the use of default values. A lot of developers/DBAs have a certain fear (that is the best word I can think of to describe their attitude) of NULL - the 'unknown' value. So, instead of using NULL for an effective_end_date field (for records whose end date we don't know, or that simply don't have one), or using NULL for values they do not know, they use some 'fake' value. This fake value is assumed to be a value that could never possibly be used.
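
For example (a made-up table, purely for illustration), the unknown end date is better represented by NULL than by a fake sentinel value:

create table vehicle_registration (
  plate_no             varchar2(7) not null,
  effective_start_date date not null,
  effective_end_date   date  -- NULL means "no end date known yet", not a fake date such as 31-DEC-4712
);

-- the current registrations are simply those with no end date
select plate_no
from vehicle_registration
where effective_end_date is null;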

But....

Things change over time.

I'm sure when the developers implemented this system, using XXXXXXX for a license plate value that was unknown seemed 'reasonable'. I mean - who would ever ask for a vanity plate with seven X's on it?

Talk to the guy with almost $20k in fines that aren't his to see if he might know someone that might want a vanity plate with XXXXXXX on it :)

Do not fear NULL.

Understand it, but don't fear it.

http://tkyte.blogspot.com/2006/01/something-about-nothing.html
http://tkyte.blogspot.com/2006/01/mull-about-null.html

Things that make you go "hmmmmm"

I was reading a blog post entitled How Good is Good?. It was written 8 years ago by a graphic designer by the name of Stefan Sagmeister. The post was speaking about creating designs that were meaningful and made an impact on the world (or the part of the world each design is directed toward). While the post was interesting, it was one of the comments that caught my eye. Most of the commenters were inspired by the post, but one commenter, in reference to the idea of making a match between good causes and good design a priority versus the more mundane everyday marketing stuff, said:

Simple, don't go into graphic design, Choose something else. Cause if you don't do "that" work, other designers will. Do people really think that everybody enjoys their job?

It was the commenter's question that made me thoughtful: Do people really think that everybody enjoys their job?

I've always felt very fortunate to be able to do work that I really love doing. There may be times when I get frustrated with some of the particulars, but overall I know I'm doing "the thing" I'm supposed to be doing.

My choice of career path came to me in college. I started out as an accounting major but after my first computer-related course, I switched majors and knew that working in the information technology field was for me. So, I got my degree and have worked in the field ever since.

I honestly can't imagine working in a job that I didn't like. So, it was just a bit of an "ah-ha" moment to consider the idea that not everyone enjoys their job. What would that be like? What would it be like to wake up every morning and go do something that you don't enjoy for 8 or more hours? That would seem like torture to me.

But, I suppose it may be more likely that more people don't enjoy their jobs than do. Have you ever seen the TV show "Dirty Jobs"? Some of those jobs are (in my opinion) just horrid. I can't imagine having to do some of those tasks every day. But, there are a lot of jobs out there that I can't imagine doing that many people love.

However, I think that it may be more about "doing what you gotta do" many times vs actually choosing a job you want/enjoy/love. I just find it sad, and somewhat humbling, to think about all the people who do things they don't enjoy just to make enough money to maintain a life for themselves and their families. To think of spending such a large portion of my life doing something I found no enjoyment in really makes my heart ache.

Are you doing what you really enjoy?

OpenWorld 2009 Recap

Having now had several days to recover from, and reflect on, my trip to beautiful San Francisco for Oracle OpenWorld 2009, I wanted to follow up with a blog post describing my experience. First, before anything else, I went to In-N-Out Burger where I had my usual Double-Double with fries animal style; oh, how I missed thee. Next, with the […]