Oakies Blog Aggregator

Classic MetaLink vs. My Oracle Support

If you are in any way involved with supporting Oracle products, then you know that the death knell for Classic MetaLink has sounded. MetaLink will be unplugged at the end of July 2009.

The new support site, My Oracle Support, seems to be causing some pain for quite a few people in the Oracle user community.
Some of the complaints concern limited platform support due to the Adobe Flash 9 requirement, navigation, and response times.
On the other hand, there are some cool new features such as Power View, Configuration Manager and the new Advanced Search options.
How do you feel about it?
Here's a chance to let your opinion be known: a poll has been created where you can vote on it.

At this time 637 voters have voiced their opinion about MetaLink and My Oracle Support.

Current poll results can be found in this Excel file: MetaLink vs My Oracle Support Results

Oracle Terminated Connection Timeout

I have recently come across situations on two different PeopleSoft sites where ad-hoc queries continue to run on the Oracle database server long after the Application Server process, which is the Oracle session client, has terminated. Often, queries perform poorly because they are poorly coded, but that is another story. To help guard against this situation, Oracle has a mechanism called Terminated Connection Timeout (also known as Dead Connection Detection (DCD) when it was introduced in Net8).

Oracle Support Note 615782.1 explains the mechanism. "DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file on the database server side. When the timer expires, SQL*Net on the server sends a 'probe' packet, essentially an empty SQL*Net packet, to the client. If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server will signal the operating system to release the connection's resources."

Thus, if a PeopleSoft operator initiates an ad-hoc query that runs on the PSQRYSRV server for longer than the ICQuery service time-out (default 1200 seconds), then Tuxedo will terminate and restart the busy server process. However, the query will continue to run on the database server until the current fetch operation returns. In the case of a query performing a large sort or hash operation, it might be a long time before the first row is returned. All the while, the query continues to consume resources on the database.
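
Such long-running sessions can be spotted with a query along these lines against V$SESSION (a generic sketch; the 1200-second threshold simply mirrors the default ICQuery time-out, and the predicates should be adjusted to your environment):

    SELECT sid, serial#, status, program, module, last_call_et
    FROM   v$session
    WHERE  type = 'USER'
    AND    status = 'ACTIVE'
    AND    last_call_et > 1200  -- in a call for longer than the ICQuery default
    ORDER BY last_call_et DESC;

Sessions that remain in this list after their PSQRYSRV process has been restarted are candidates for the behaviour described above.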

PeopleTools 8.44 also introduced the ability to kill, via the Query Monitor, a query that has reached a maximum run time. This is one of the functions of the PSMONITORSRV server process (see Oracle Support Note 624339.1). The maximum run time is specified in a permission list (see Security Administration PeopleBook, Permission List Query Profile), and the ability to kill queries that have timed out can then be enabled or disabled system-wide (see PeopleSoft Query PeopleBook, Query Administration). It will kill the application server process that submitted the query, but for the same reasons, the query may continue to run on the database.

Thus, setting Terminated Connection Timeout is not merely a good idea for a PeopleSoft system running on Oracle; it is effectively mandatory. Otherwise, some PeopleSoft functionality simply won't work as intended.

What is an appropriate value for SQLNET.EXPIRE_TIME?

The value for this parameter is the time between successive SQL*Net probes sent by the Oracle shadow server process to the client. Setting it is a balance between the maximum time that a query can be left consuming resources after a client process terminates and the additional overhead of the server sending a probe to every client every few minutes.

The SQL*Net documents often talk about additional network traffic generated by DCD. This was a consideration in the past for client-server applications that ran across a wide area network. However, it is rarely a consideration in relatively modern systems such as PeopleTools 8, where the database connections are made by the Application Server and Process Scheduler, which are usually physically close to the database server.

The time-out can be set independently of any of the other time-outs for the Application Server and Web Server. Documents on Metalink often suggest 5 or 10 minutes, and I don't think that is unreasonable.
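
As a minimal sketch, a 10-minute setting is a single line in the sqlnet.ora file on the database server (the value is specified in minutes):

    # sqlnet.ora on the database server
    SQLNET.EXPIRE_TIME = 10

No client-side change is needed; the probes are sent by the server shadow process.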

My thanks to Colin Kilpatrick who prompted me to look at this again.

Merge counter

A package to capture and report separate INSERT and UPDATE rowcounts from a MERGE statement. February 2004
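
The package itself is not reproduced here, but the usual technique (a sketch with hypothetical names, not necessarily those used in the actual package) is a pair of packaged counter functions that return zero, so they can be embedded harmlessly in the two branches of the MERGE:

    CREATE OR REPLACE PACKAGE merge_counter AS
      g_inserts PLS_INTEGER := 0;
      g_updates PLS_INTEGER := 0;
      FUNCTION count_insert RETURN PLS_INTEGER;  -- increments g_inserts, returns 0
      FUNCTION count_update RETURN PLS_INTEGER;  -- increments g_updates, returns 0
      PROCEDURE reset;
    END merge_counter;
    /

    CREATE OR REPLACE PACKAGE BODY merge_counter AS
      FUNCTION count_insert RETURN PLS_INTEGER IS
      BEGIN g_inserts := g_inserts + 1; RETURN 0; END;
      FUNCTION count_update RETURN PLS_INTEGER IS
      BEGIN g_updates := g_updates + 1; RETURN 0; END;
      PROCEDURE reset IS
      BEGIN g_inserts := 0; g_updates := 0; END;
    END merge_counter;
    /

    -- adding zero leaves the data unchanged but fires the counters
    MERGE INTO target_table t
    USING source_table s
    ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE
      SET t.val = s.val + merge_counter.count_update
    WHEN NOT MATCHED THEN INSERT (id, val)
      VALUES (s.id, s.val + merge_counter.count_insert);

After the MERGE completes, merge_counter.g_inserts and merge_counter.g_updates hold the separate rowcounts.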

my week in venn diagram form ...

(from thisisindexed.com) My take on this one has nothing to do with Bernie and everything to do with recent events on the big project. If I could add a small red 'x' in the middle of 'Doom' with an arrow and a 'You are Here' message, it would be perfect ...

SIOUG 2009 conference

I'll be presenting at the SIOUG 2009 conference that will take place in Portoroz, from 20th to 23rd September 2009.

The title of the presentation will be "CBO fundamentals: Understanding the different modes of System Statistics" and will be based on the series already available here on my blog.

So if you're interested in the details of the cost calculations performed by recent releases of the Cost Based Optimizer and would like to hear about this in a different style, this presentation might be something for you.

If you happen to be at the conference I'm looking forward to meeting you there.

Thanks to OakTable fellow Joze Senegacnik for inviting me to the conference.

Fundamentals of Software Performance Quick Reference Card

I just posted "Fundamentals of Software Performance Quick Reference Card" at the Method R company website:

This two-page quick reference card written by Cary Millsap sums up computer software performance the Method R way. The first page lists definitions of the terms you need to know: efficiency, knee, load, response time, and so on. The second page lists ten principles that are vital to your ability to think clearly about software performance. This document contains meaningful insight in a format that's compact enough to hang on your wall.

It's free, and there's no sign-up required. I hope you will enjoy it.

Oracle 10g Statistics History Retention in PeopleSoft

I have been working on a system where many Application Engine programs are running throughout the day, and are frequently collecting Optimizer statistics with the %UpdateStats macro on many working storage tables. Concurrent calls to dbms_stats are typical.

There are two new behaviours in the Oracle 10g RDBMS that can, in extreme cases, in combination with a system that calls dbms_stats very frequently, create a significant performance overhead.

  • From Oracle 10g, histograms may, by default, be collected automatically. That means that rows are concurrently deleted from and inserted into histgrm$ and hist_head$, leading to contention and additional consistent reads.
  • Also from Oracle 10g, every time you collect statistics on a table, the old statistics are retained in the SYS.WRI$_OPTSTAT%HISTORY tables. If histograms have previously been collected, these are also copied. DBMS_STATS has the additional overhead of writing this history. I found in excess of 10,000 versions of previous statistics for some tables, because the batch processes had updated statistics on working storage tables that many times.
  • dbms_stats also appears to be responsible for purging history older than the retention limit. The default retention period is 31 days. I have seen concurrent calls to dbms_stats blocked on row-level locks on the statistics history tables. For me, this occurred 31 days after the system went live on a significantly increased volume.

The current retention period can be queried as follows:

    SELECT dbms_stats.get_stats_history_retention FROM dual;

    GET_STATS_HISTORY_RETENTION
    ---------------------------
                             31

Statistics history was designed to work in conjunction with schema wide statistics jobs that only refreshed stale statistics. There is an option on gather_schema_stats to collect only statistics on tables where the current statistics are stale. However, there is no such option on gather_table_stats. If you have decided to call this procedure for a particular table, then it is assumed you know you need to refresh the statistics. However, by calling dbms_stats from a batch program you can end up calling it much more frequently than is really necessary.
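
For reference, the schema-wide call with the stale-only option looks like this (a sketch; the schema name is illustrative):

    BEGIN
      dbms_stats.gather_schema_stats(
        ownname => 'SYSADM',       -- illustrative PeopleSoft schema name
        options => 'GATHER STALE'  -- only refresh statistics that are stale
      );
    END;
    /

gather_table_stats offers no equivalent option, which is why frequent per-table calls from batch programs cannot be throttled this way.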

Recommendations

  • Disable statistics history by using dbms_stats.alter_stats_history_retention to set the retention period to zero. Unfortunately, this can only be set at database level. The statistics history is there in case you want to revert to a previous version of the statistics should a new set of statistics produce a problem, but it is only used rarely, and I think this is a necessary sacrifice.

    EXECUTE dbms_stats.alter_stats_history_retention(retention=>0);

  • Use Oracle Optimizer Dynamic Sampling. However, I suggest increasing the level from the default of 2 to 4 to increase the situations in which it is used (see the sketch after this list).
  • Introduce the new version of the PL/SQL wrapper package for dbms_stats so that you can specify the records for which statistics will be explicitly collected, and whether histograms are to be collected. Thus you can reduce the number of calls to dbms_stats.
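
A minimal sketch of raising the dynamic sampling level, assuming it is set session- or system-wide rather than per statement:

    ALTER SESSION SET optimizer_dynamic_sampling = 4;
    -- or, for the whole instance (assumes an spfile and the appropriate privilege):
    ALTER SYSTEM SET optimizer_dynamic_sampling = 4 SCOPE=BOTH;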

If you have allowed the statistics history to grow before you disable history retention, then you might like to read John Hallas' posting on Purging statistics from the SYSAUX tablespace.
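
A one-off manual purge is also possible with dbms_stats.purge_stats, which removes statistics history saved before the given timestamp (a minimal sketch; check the space released in SYSAUX afterwards):

    EXECUTE dbms_stats.purge_stats(SYSDATE - 1);  -- discard history older than one day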

Dynamic sampling and set current_schema anomaly

Sometimes when I'm asked to check a particular SQL statement in a production-like environment, at first I only get rather limited access, e.g. a read-only user that has the required privileges to perform the SQL statement but doesn't own the objects.

If I'm then supposed to analyse a rather complex SQL statement with dozens of table and/or view references, I like to use the ALTER SESSION SET CURRENT_SCHEMA command, which resolves any unqualified object references against the schema set via CURRENT_SCHEMA, rather than editing the SQL to qualify all the object references.

This basically works just fine, but I recently came across an interesting anomaly in 10.2.0.4: if the optimizer uses dynamic sampling as part of the parsing, it no longer does so once CURRENT_SCHEMA has been altered to a non-default schema.

The following simple test case demonstrates the issue in 10.2.0.4 Win32:

SQL>
SQL> drop user dynsamp_test cascade;
drop user dynsamp_test cascade
*
ERROR at line 1:
ORA-01918: user 'DYNSAMP_TEST' does not exist

SQL>
SQL> create user dynsamp_test identified by dynsamp_test;

User created.

SQL>
SQL> grant create session, alter session to dynsamp_test;

Grant succeeded.

SQL>
SQL> drop table dynsamp_test_table purge;
drop table dynsamp_test_table purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

SQL>
SQL> create table dynsamp_test_table
2 as
3 select
4 *
5 from
6 all_objects
7 where
8 rownum <= 1000;

Table created.

SQL>
SQL> grant select on dynsamp_test_table to dynsamp_test;

Grant succeeded.

SQL>
SQL> -- this is the plan we get as schema owner
SQL> explain plan for
2 select /*+ dynamic_sampling(t 4) */
3 *
4 from
5 dynsamp_test_table
6 where
7 owner = 'NOT_A_VALID_OWNER_SINCE_TOO_LONG';

Explained.

SQL>
SQL> -- It uses obviously dynamic sampling as instructed
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1573551846

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 128 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| DYNSAMP_TEST_TABLE | 1 | 128 | 5 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='NOT_A_VALID_OWNER_SINCE_TOO_LONG')

Note
-----
- dynamic sampling used for this statement

17 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> -- We get the same from different schema with qualified reference (or valid synonym)
SQL> connect dynsamp_test/dynsamp_test@orcl
Connected.
SQL>
SQL> alter session set nls_language = 'AMERICAN';

Session altered.

SQL>
SQL> explain plan for
2 select /*+ dynamic_sampling(t 4) */
3 *
4 from
5 cbo_test.dynsamp_test_table
6 where
7 owner = 'NOT_A_VALID_OWNER_SINCE_TOO_LONG';

Explained.

SQL>
SQL> -- Still uses dynamic sampling
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1573551846

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 128 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| DYNSAMP_TEST_TABLE | 1 | 128 | 5 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='NOT_A_VALID_OWNER_SINCE_TOO_LONG')

Note
-----
- dynamic sampling used for this statement

17 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> -- Now switch the current_schema
SQL> alter session set current_schema = cbo_test;

Session altered.

SQL>
SQL> explain plan for
2 select /*+ dynamic_sampling(t 4) */
3 *
4 from
5 dynsamp_test_table
6 where
7 owner = 'NOT_A_VALID_OWNER_SINCE_TOO_LONG';

Explained.

SQL>
SQL> -- No more dynamic sampling!
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1573551846

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1280 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| DYNSAMP_TEST_TABLE | 10 | 1280 | 5 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='NOT_A_VALID_OWNER_SINCE_TOO_LONG')

13 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> -- Check the 10053 trace file
SQL> -- It will show that the dynamic samping query fails
SQL> -- and therefore no dynamic sampling will be performed
SQL> -- presumably because the current_schema is not obeyed
SQL> -- by the recursively executed query
SQL> alter session set tracefile_identifier = 'dynamic_sampling_current_schema_10053';

Session altered.

SQL>
SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL>
SQL> explain plan for
2 select /*+ dynamic_sampling(t 4) */
3 *
4 from
5 dynsamp_test_table
6 where
7 owner = 'NOT_A_VALID_OWNER_SINCE_TOO_LONG';

Explained.

SQL>
SQL> alter session set events '10053 trace name context off';

Session altered.

SQL> -- Check the 10046 trace file
SQL> -- which confirms a ORA-00942 while parsing the dynamic sampling query
SQL> alter session set tracefile_identifier = 'dynamic_sampling_current_schema_10046';

Session altered.

SQL>
SQL> alter session set sql_trace = true;

Session altered.

SQL>
SQL> explain plan for
2 select /*+ dynamic_sampling(t 4) */
3 *
4 from
5 dynsamp_test_table
6 where
7 owner = 'NOT_A_VALID_OWNER_SINCE_TOO_LONG';

Explained.

SQL>
SQL> alter session set sql_trace = false;

Session altered.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> spool off

So, as you can see, the simple query uses dynamic sampling as instructed (and in 10.2 would use dynamic sampling by default anyway, since the table has no statistics), but when using the CURRENT_SCHEMA trick with an unqualified reference, the dynamic sampling is not performed and the estimate is based on hardcoded defaults.

Checking the 10053 optimizer trace file confirms that the dynamic sampling is attempted, but the recursive query fails, and the 10046 SQL trace shows that an "ORA-00942 table or view does not exist" error is encountered while parsing the recursive dynamic sampling query.

So be careful when using this (rather unusual, I have to admit) setup: the execution plans might differ if you're using a non-default CURRENT_SCHEMA and dynamic sampling is involved.
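
The qualified-reference test above suggests two simple workarounds: qualify the object references explicitly, or (assuming the CREATE SYNONYM privilege can be granted to the read-only user) create a synonym so that the reference resolves without relying on CURRENT_SCHEMA:

    -- assumes: GRANT CREATE SYNONYM TO dynsamp_test;
    CREATE SYNONYM dynsamp_test_table FOR cbo_test.dynsamp_test_table;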

The issue seems to be fixed in 11.1.0.7 (dynamic sampling is performed as expected when using a non-default CURRENT_SCHEMA); however, I couldn't identify a corresponding bug on MetaLink. If anyone comes across a bug number or document ID, that would be helpful.