Parallel DML - Conventional (non-direct-path) Inserts As Select

In a recent discussion I mentioned that I seemed to remember that the DML part of a conventional (non-direct-path) INSERT AS SELECT is always executed serially, even with parallel DML enabled and parallel DML execution requested. It's important to understand in this context that this is not the same as the parallel query execution of the SELECT part, which is possible independently of the parallel DML part.

After that discussion I realized that it had been quite some time since I last tested this scenario - probably back then on some 10.2 version.

So I quickly put together a small test case and ran it on 11g versions. The results were quite surprising, which motivated me to take a closer look.

Concurrent Index Creation

When I read the recent post by the optimizer group about the new concurrent statistics gathering feature added in 11.2.0.2, it reminded me that I had intended to publish something based on the same idea quite some time ago.

The Problem

It was motivated by a client's regular need, during a transition phase from non-Exadata to Exadata, to create literally thousands of indexes with potentially a multitude of (sub-)partitions as fast as possible - as part of a full datapump import job of a multi-terabyte database running 11.1.0.7 and 11.2.0.1 (Exadata V2).

There are actually two issues regarding the index creation part of a large database import:

1. The datapump import performs the index creation with only a single worker thread, even when using the PARALLEL worker thread import feature. Although each index can be created in parallel, if you have thousands of smaller index objects this single worker thread potentially does not make efficient use of the available hardware resources in high-end configurations, including and in particular Exadata.

2. There is a nasty bug 8604502, introduced with 11.1.0.7, that also affects 11.2.0.1 (fixed in 11.2.0.2; a generic one-off patch is available on My Oracle Support for 11.1.0.7 and 11.2.0.1): IMPDP creates all indexes serially, even those supposed to be created in parallel, and only after the creation ALTERs them to the defined PARALLEL degree. Note that the fix only addresses the problem at actual execution time: even with the fix installed (and in 11.2.0.2) the SQLFILE option of IMPDP still generates CREATE INDEX DDLs that always have the parallel degree set to PARALLEL 1 (see MOS document 1289032.1 and bug 10408313 - INDEXES ARE CREATED WITH PARALLEL DEGREE 1 DURING IMPORT - which has been closed as not being a bug). This "not-being-a-bug" also affects all other versions that support the datapump utility: the SQLFILE option always generates CREATE INDEX scripts with the parallel degree set to 1, no matter what the actual degree of the index is supposed to be. It's only the ALTER INDEX command following the CREATE INDEX command that sets the parallel degree correctly.

For this client, these two issues in combination meant that a full database import job took ages to complete the index creation step, after loading the vast amount of table data quite quickly in parallel.

In the case of partitioned indexes there is another complication, independent of the issues mentioned: Oracle uses only one parallel slave per partition for the creation - with large and/or few partitions this again doesn't make efficient use of the available resources.

Oracle therefore provides several means to speed up index creation and rebuild tasks - in particular the documented DBMS_PCLXUTIL package, which has been around since the Oracle 8 days, to overcome the above-mentioned limitation of partitioned index creation by spawning multiple jobs, each rebuilding an index partition in parallel.

Another, undocumented feature is the DBMS_INDEX_UTL package, which is obviously used internally as part of several maintenance operations, for example those DDLs that include the "UPDATE INDEXES" clause. According to its spec it allows rebuilding multiple indexes concurrently by spawning multiple jobs - however, since it is undocumented it might not be safe to use in production-like configurations. Furthermore it might change in future releases without notice and is therefore potentially unreliable.

A Solution

Since the client wanted a quick solution that ideally addressed all of the above issues, I came up with a simple implementation that uses Advanced Queueing and background jobs to create as many indexes concurrently as desired.

The solution is targeted towards the client's scenario, so the following is assumed:

- There is a SQL file that contains the CREATE INDEX statements. This can easily be generated via IMPDP based on the dump files using the SQLFILE option.

- To address the CREATE INDEX (not-being-a-)bug (the bugfix for bug 8604502 still generates incorrect CREATE INDEX DDLs with the SQLFILE option of IMPDP, as mentioned above), I've created a combination of "sed" and "awk" Unix scripts that take the IMPDP SQLFILE - potentially including all DDL commands - as input and create an output file that consists solely of the CREATE INDEX commands, with correct PARALLEL clauses based on the ALTER INDEX command following each CREATE INDEX in the script.

- To address the lengthy index creation process I've created a small PL/SQL package that sets up the required AQ infrastructure, takes the CREATE INDEX DDL file as input, populates a queue with the index creation commands and spawns as many worker threads as specified to take care of the actual index creation (which in turn might be a parallel index creation).

As a side note, it is interesting that Oracle actually allows several indexes to be built concurrently on the same segment (which makes total sense but probably does not happen too often in practice).

Note that in principle this code could be used as a general template to execute arbitrary DDLs concurrently (of course with corresponding modifications).

The following link allows you to download an archive that contains the following subdirectories:

- correct_parallel_clause: This directory contains the Unix scripts mentioned above that process a SQLFILE generated by IMPDP and output a DDL file consisting solely of the CREATE INDEX commands contained in the SQLFILE. The generated CREATE INDEX statements also use a correct PARALLEL clause - the degree is taken from the ALTER INDEX command following the CREATE INDEX in the SQLFILE. For further details refer to the README.txt in that directory. Note that at present the script does not handle Domain Indexes, only conventional and bitmap ones.

- source: Contains the package source for the concurrent index creation, furthermore a package that is required by the provided automated unit testing (see below for more details) and a script that prompts for the required details to initiate a concurrent index creation. The README.txt in that directory provides a quick-start guide on how to use the concurrent index creation.

- test: Contains two flavours of test harnesses for automated unit testing of the package: one based on the unit testing feature implemented in SQLDeveloper 2.1.1, and another based on "dbunit", an open-source unit testing framework based on jUnit. The README.txt files in the respective subdirectories explain how to use these unit tests.

How to use it

The usage is split into two parts: the first part deals with preparing a suitable text file that consists of the CREATE INDEX commands; the second part is about processing this text file with as many worker threads as desired.

Preparing the file is straightforward: You can use the "transform_all_sql.sh" script to generate the required CREATE INDEX script from a DDL script created via IMPDP SQLFILE.

The script has been tested primarily with bash, sed and awk under Cygwin 1.7.1 and OEL5, different Unix flavors might have different versions of the shell, awk or sed and therefore might behave differently.

Simply put all four Unix scripts from the "correct_parallel_clause" directory into the same directory, mark them as executable and run "transform_all_sql.sh" like this:

./transform_all_sql.sh < input_file > output_file

where "input_file" is the file generated via IMPDP SQLFILE option and "output_file" will be the result.
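As a simplified illustration of the kind of transformation these scripts perform, the following awk fragment takes the parallel degree from the ALTER INDEX statement and merges it into the preceding CREATE INDEX statement. This is a minimal sketch on a hypothetical single-line statement pair, not the actual "transform_all_sql.sh" - the real scripts handle far more cases (multi-line DDLs, different index types, filtering out all non-index DDL):

```shell
# Sample of what IMPDP SQLFILE output looks like because of the bug:
# the CREATE INDEX carries PARALLEL 1, the following ALTER INDEX
# carries the intended degree.
sample='CREATE INDEX "SCOTT"."T_IDX" ON "SCOTT"."T" ("ID") PARALLEL 1 ;
ALTER INDEX "SCOTT"."T_IDX" PARALLEL 8;'

# Buffer each CREATE INDEX line; when the matching ALTER INDEX arrives,
# copy its PARALLEL clause into the buffered statement and emit it.
result=$(printf '%s\n' "$sample" | awk '
  /^CREATE (UNIQUE |BITMAP )?INDEX/ { create = $0; next }
  /^ALTER INDEX/ && create != "" {
    if (match($0, /PARALLEL [0-9]+/)) {
      sub(/PARALLEL 1 /, substr($0, RSTART, RLENGTH) " ", create)
    }
    print create
    create = ""
  }
')

echo "$result"
```

Running this prints the CREATE INDEX statement with PARALLEL 8 instead of PARALLEL 1, which is exactly the correction the provided scripts apply across the whole SQLFILE.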

In order to perform the parallel index creation, you need an account with suitable privileges. Since it is assumed that the indexes will have to be created in different schemas, this account needs extended privileges. The package is implemented using invoker's rights, so granting these privileges via roles is sufficient. A quick and dirty solution could be to create a temporary account and simply grant it the DBA role (this is what I used to do to test it). Note that the account also requires:

- EXECUTE privileges on the DBMS_AQ and DBMS_AQADM packages for the AQ infrastructure
- a simple logging table where errors and progress are written to, as well as a type that is used as the payload of the queue
- the ability to create jobs - in this version of the package this is done via DBMS_SCHEDULER

At execution time the package creates a queue plus a queue table that needs to be stored in a tablespace - so you should make sure that the account (or at least the database) that executes the index creation has an appropriate default tablespace defined.

You can simply run the "pk_create_index_concurrent.sql" script (located in the "source" directory) in such a suitable account - it will deinstall/install all required objects.

The execution of the index creation is then straightforward (taken from the package specification):

/**
* The main entry point to create indexes via parallel threads / AQ
* @param p_directory_name The directory where the file resides that contains the CREATE INDEX DDLs
* @param p_file_name The file name in the directory above
* @param p_parallel_degree_set_1 The number of threads to start for worker thread set 1 which usually
*        represents the SERIAL_INDEX threads - G_AUTO_PARALLEL_DEGREE means use the CPU_COUNT and
*        CLUSTER_DATABASE_INSTANCES parameters to determine the number of threads automatically
* @param p_parallel_degree_set_2 The number of threads to start for worker thread set 2 which usually
*        represents the PARALLEL_INDEX threads - G_AUTO_PARALLEL_DEGREE means use the CPU_COUNT and
*        CLUSTER_DATABASE_INSTANCES parameters to determine the number of threads automatically,
*        however 1 is the default here since we assume that these indexes use parallel DDL
* @param p_job_submit_delay The number of seconds each job will be delayed to allow Oracle
*        proper load balancing in a cluster, default 30 seconds (commented out at present due to
*        odd locking issues on the queue table in RAC environments)
* @param p_sleep_seconds The number of seconds to wait for the threads to start up
*        before attempting to tear down the AQ infrastructure again
* @param p_optional_init Optionally a SQL statement can be passed, usually used to initialize the session,
*        for example forcing a particular parallel degree
* @param p_worker_set_id_1
*        The character identifier used to identify the indexes to process by the first worker thread set
*        Default value is "SERIAL_INDEX"
* @param p_worker_set_id_2
*        The character identifier used to identify the indexes to process by the second worker thread set
*        Default value is "PARALLEL_INDEX"
**/
procedure create_index_concurrent(
  p_directory_name in varchar2
, p_file_name in varchar2
, p_parallel_degree_set_1 in integer default G_AUTO_PARALLEL_DEGREE
, p_parallel_degree_set_2 in integer default 1
, p_job_submit_delay in integer default 30
, p_sleep_seconds in integer default 10
, p_optional_init in varchar2 default null
, p_worker_set_id_1 in varchar2 default G_WORKER_SET_ID_1
, p_worker_set_id_2 in varchar2 default G_WORKER_SET_ID_2
);

Note that the "p_job_submit_delay" parameter is currently not used: there were some odd locking issues on the AQ table in RAC environments when using that option, so I have commented out its usage at present - I haven't had a chance yet to investigate what the problem actually was.

So the only required input to the CREATE_INDEX_CONCURRENT procedure is the name of the directory object that points to the directory where the file to process resides and the name of the file itself.

You probably want to specify the number of worker threads for the two sets: The idea here is to distinguish between the creation of serial and parallel indexes. The first parameter specifies the number of worker threads used for serial indexes, the second one the number of concurrent threads for parallel indexes.

The default is CPU_COUNT * INSTANCES threads for serial indexes and a single thread for parallel indexes.

If you don't want/need this separation of serial and parallel indexes, simply use the same "worker_set_id" for both parameters "p_worker_set_id_1" and "p_worker_set_id_2", specify the desired total parallel degree in one of the degree parameters and set the other one to 0 (the 0 is required - otherwise one of the DBMS_SCHEDULER.CREATE_JOB calls will fail with a "duplicate job name/job name already exists" error).

The "p_sleep_seconds" parameter is only used to allow the jobs spawned to put a lock on the queue table - the teardown is then going to wait until all locks have been removed and therefore all queue processing has ended. The default of 10 seconds was sufficient in all cases I've encountered.

Since the package requires, as a prerequisite, a directory object pointing to the location of the file to process, I've prepared the script "create_index_concurrent.sql" that guides you through the required inputs and takes care of that step as well.

It takes the full O/S path to the file and the file name as input, creates a directory CREATE_INDEX_CONCURRENT_DIR pointing to that path, then prompts for the two degrees and the names of the two worker thread sets before calling the CREATE_INDEX_CONCURRENT stored procedure.

Caveats

Please double-check that you do not pass a non-transformed SQLFILE generated via IMPDP to the procedure - the results may be dire, since the generated SQLFILE always contains much more than the bare CREATE INDEX commands, no matter what options you use for IMPDP. Always use the provided Unix scripts to post-process the SQLFILE before initiating the index creation.

Furthermore you need to be aware of a current limitation of the package: it does not attempt to tokenize the file contents. It simply uses a semicolon as the delimiter to separate the DDL commands. This should be sufficient for most cases, but if you have a function-based index whose definition includes a string expression containing a semicolon, this will not work as expected. Also, if you plan to use this package for other DDL execution activities like CTAS statements, you might again hit this limitation if the DDL text contains semicolons.
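To illustrate the limitation (a hypothetical example - the statements and the split below are not the package's actual parsing code), a naive split on semicolons miscounts the statements as soon as a semicolon appears inside a string literal:

```shell
# Two DDL statements, but the first contains a semicolon inside a string
# literal (a function-based index on replace(col, ';', ',')).
ddl="CREATE INDEX i1 ON t (replace(col, ';', ',')); CREATE INDEX i2 ON t (col2);"

# A naive tokenizer that treats every semicolon as a statement terminator
# finds three "statements" where only two exist.
pieces=$(printf '%s' "$ddl" | awk -F';' '{ print NF - 1 }')
echo "$pieces"
```

The semicolon inside the string literal is counted as a terminator, so the first CREATE INDEX would be cut in half - which is exactly why such DDLs cannot be processed by the package as it stands.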

Note that creating indexes using this tool potentially results in different index statistics than creating them via IMPDP, since IMPDP by default also imports the index statistics, whereas the indexes created using this tool will end up with the index statistics automatically generated during index creation (from 10g onwards; the code requires at least 10.2). If you want the index statistics imported, you can run IMPDP after the index creation using the INCLUDE=INDEX_STATISTICS option. This should complete fairly quickly and will import only the index statistics.

If you have SERVEROUTPUT enabled by default then you will very likely see some errors printed by the initial attempt to tear down the AQ infrastructure. These errors are expected if the previous run completed successfully, or in the case of the initial run, and can be ignored (they will be caught/ignored by the default implementation).

Note also that all provided scripts except the Unix shell scripts use DOS file format - under OEL this isn't a problem, but it might be on your platform.

Finally the inevitable disclaimer: Although this has been tested thoroughly it comes with absolutely no warranty. Use it at your own risk and test it in your environment before attempting any runs against anything important.

Monitoring the execution

The code logs errors and progress into the table CREATE_INDEX_CONCURRENT_LOG. At present the code logs every attempt to execute DDL into the table as well as any errors that are raised during that DDL execution.

So the table can be used both for monitoring the progress and for checking for errors. The code currently continues execution when errors are encountered, using the dreaded WHEN OTHERS THEN NULL construct, but it is already prepared for more granular error handling if required - see the defined exceptions and the commented-out exception handler.

You can view the queue contents in the corresponding queue view created by the AQ setup (AQ$CREATE_INDEX_QUEUE) in order to see the data to process. Note that because the worker threads do not commit the queue transaction until they finish, you won't be able to see the progress in the queue table until all worker threads have committed. If you don't like that, you can remove the wait and the "teardown_aq" call at the end of the main procedure "create_index_concurrent" and uncomment the dequeue option "visibility=immediate" in the "create_index_thread" procedure. You would then need to call "teardown_aq" in a separate step as desired. With this modification you can monitor the progress via the queue, but the provided automated unit testing won't work with that variant, since it relies on the main call waiting for all worker threads to complete before validating the results.

However, you can also see the progress in the log table using the following sample query:

select
  to_char(log_timestamp, 'DD-MON-YYYY HH24:MI:SS.FF') as log_timestamp
, sql_statement
, message
from
  create_index_concurrent_log
order by
  log_timestamp desc;

If you want to perform more sophisticated queries on that table you might need to use casts similar to the following, because the text columns are defined as CLOBs in order to hold the complete DDLs and error messages in case of errors. The casts allow you to perform, for example, GROUP BYs.

select
  to_char(log_timestamp, 'DD-MON-YYYY HH24:MI:SS.FF') as log_timestamp
, cast(substr(sql_statement, 1, 30) as varchar2(30)) as index_name
, cast(substr(message, 1, 128) as varchar2(128)) as worker_set_id
from
  create_index_concurrent_log
order by
  log_timestamp desc;

The Unit Testing

Here we come to a completely different topic that is strictly off-topic for this post; however, in my experience so far it is a very important one, and I hope to have the time to cover it with separate posts in the future.

Generally speaking, I've seen too many shops that don't follow best practice when it comes to database deployment and development, so here, in a nutshell, is what you ideally should know/do about it:

- Treat your database like source code, which means put everything related to the database under version control. This includes not only the obvious database source code but also DDL and DML scripts for schema evolution
- Use unit testing to test database code. Automate this unit testing
- Automate the deployment of your database related changes
- Install a continuous integration environment that runs the automated deployment and unit tests regularly, for example every night
- Automate deployment everywhere - starting from the development databases up to the production environment
- Follow your guidelines strictly - for example, any hotfix-like ad-hoc change should still go through the established processes: code changes, testing, deployment etc.

I've helped several clients in the past to set up corresponding tools and processes to implement the above - if you are interested, get in touch with me.

So, as a bonus - and to get you started at least with automated unit testing if you haven't yet spent much time on the topics mentioned above - I've included two different examples of this for the small source provided: one using the built-in unit test feature of SQLDeveloper and the other using "dbunit". You can find both in the corresponding subdirectories of the "test" folder in the archive.

The unit testing is based on the "pk_create_index_concur_test.sql" package that is used to set up and tear down the environment for the unit test runs. At present it assumes the existence of a directory "C:\app\oracle\admin\orcl112\dpdump" at O/S level. It will create a directory object for that path and attempt to create/write a file used for the unit test runs. You can pass any valid O/S directory path to the "pk_create_index_concur_test.setup" procedure if you want/need to use a different one.

All provided automated tests assume that both scripts, "pk_create_index_concurrent.sql" and "pk_create_index_concur_test.sql" have been run in the schema that should be used for test runs.

You can use the SQLDeveloper Unit Test feature to run the provided unit test. You can either use the GUI to import and run the test, or you can use a command line version that uses ANT to run the UTUTIL command line tool that comes with SQLDeveloper. Follow the instructions in the "README.txt" in the test/SQLDeveloper directory on how to do so. Either way (GUI or UTUTIL command line), you'll need to set up a unit test repository initially if you want to use SQLDeveloper's unit testing feature. See the SQLDeveloper user's guide or online help on how to do that (hint: the menu item "Extras->Unit Testing" gets you started).

If you don't like the SQLDeveloper unit test approach, or you are simply too lazy to install the tool, the unit test repository etc., you can alternatively try the automated unit testing using "dbunit". Follow the instructions in the "README.txt" in the test/dbunit directory on how to run the unit tests using "dbunit".

This version of the package has successfully been tested using these unit tests on 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2 (after all it's dead easy with automated unit testing :-).

Summary

The provided tool set should represent a solid foundation for the task of concurrent index creation. In particular it has been designed with the following in mind:

- Efficient use of privileges granted via roles: The package uses invoker's rights and most operations use dynamic SQL to avoid compilation issues, therefore granting the required privileges via roles to the account used should be sufficient

- The Unix scripts should be able to deal with table-, schema- and database-level datapump formats from Oracle 10g and 11g (all these variants use slightly different texts to identify the relevant sections of the generated SQLFILE by IMPDP)

- Optional use of two separate worker thread sets: This allows the concurrent creation of a multitude of indexes, be it serial or parallel, with clear distinction between the handling of serial (possibly many worker threads) and parallel indexes (usually only a few worker threads)

- Support for arbitrarily sized SQL: The DDL commands for (sub-)partitioned indexes can become quite large due to the way the Oracle meta data API generates the SQL. Therefore these generated SQLs can easily exceed the usual 32KB limit for PL/SQL character strings. The implementation uses CLOBs for the processed SQLs (and DBMS_SQL in versions lower than 11 to handle these piecewise) to support these potentially very large SQLs

- RAC/Grid/Cluster support via DBMS_SCHEDULER: The usage of DBMS_SCHEDULER allows fine-grained control of resource consumption via the optional use of job classes (not implemented yet, but easily added - it is a simple additional parameter to the CREATE_JOB procedure), which allow specifying a resource consumer group and a specific service name for the spawned worker threads

- Automated Unit Testing support: The provided unit test harness allows for easy testing of modifications to the code

Direct NFS (DNFS) Clonedb…

A bit before Christmas I got an email from Kevin Closson asking me to take a look at a new undocumented Direct NFS (DNFS) feature in the 11.2.0.2 patch set. I think he wanted to see what a regular DBA would think of it. What with Christmas and some family issues, I didn’t get too much done. As soon as I hit the first hurdle I kinda caved in and left it for the new year.

Well, January came and went, then I finally got round to looking at it again. I like to think my constant questions and dumb mistakes have helped to prepare Oracle for the sort of thing that will happen when other idiots like me are let loose on it. Anyway, the result of that little journey is documented here.

As I’ve said in the article, things are still in a state of flux and I will no doubt have to make some alterations once the My Oracle Support (MOS) Note 1210656.1 that will properly document the feature is released.

So as a regular DBA what do I think? I think it is awesome!

Old-style cloning of databases isn’t hard, but it’s boring and can take ages depending on the size of the database and storage being used. It’s one of those tasks that always makes me sigh, before I get off my ass and start it. Clonedb turns that on its head because it is really quick and simple. There is a bit of setup, but that is really going to be a one-time thing on most servers. You are doing your backups anyway, so there is no big deal there. Now you can just run a script and bang, you have a running clone.

I think this is going to please a lot of DBAs out there!

Cheers

Tim…

Pending Statistics

This is just a quick heads-up to those that plan to use the Pending Statistics feature that has been introduced in Oracle 11.1.

It looks like in all currently available versions that support this feature, Pending Statistics have not been implemented consistently for all possible DBMS_STATS calls, so you have to be very careful which calls you use. Having enabled pending statistics for a particular table, you might start to manipulate the statistics under the impression that the modifications performed are not reflected in the actual dictionary statistics (by "dictionary statistics" in this case I don't mean the statistics of the data dictionary objects themselves, but the actual statistics of database objects stored in the data dictionary) but only in the pending statistics area, allowing you to test statistics modifications in an isolated environment using the OPTIMIZER_USE_PENDING_STATISTICS parameter at session level.

You therefore might be in for a surprise to find out that this holds true only for a limited set of DBMS_STATS calls, but not for all.

This means that particular changes to the statistics take effect immediately, even though pending statistics have been enabled.

In particular, manipulations of the statistics using the SET_*_STATS procedures of DBMS_STATS seem to ignore the pending statistics settings and update the dictionary statistics immediately, without further notice.

This is rather unfortunate, since it means that Pending Statistics cannot be used in a straightforward way to test user-defined statistics, which can be very helpful under certain circumstances but require extensive testing before being used on a live system.

But other calls, too, like gathering statistics only for a particular set of columns, show unexpected behaviour: it looks like both sets of statistics get modified - the pending statistics area, but also the dictionary statistics.

Note that setting the GLOBAL preferences (DBMS_STATS.SET_GLOBAL_PREFS) for PUBLISH to FALSE seems to fix this particular issue - in that case only the pending statistics get updated, but the dictionary statistics are left unchanged. This fix does not apply to the SET_*_STATS procedures unfortunately, those seem to always update the dictionary statistics.

The worst thing, however, is that the statistics history that has been maintained automatically since Oracle 10g does not reflect these (unintended) changes properly, so you cannot easily recover from the potentially unwanted modifications by calling DBMS_STATS.RESTORE_TABLE_STATS.

Finally, I was obviously able to activate the pending statistics using DBMS_STATS.RESTORE_TABLE_STATS - you can see this behaviour rather clearly when using the SET_GLOBAL_PREFS('PUBLISH', 'FALSE') variant of the following script.

The following is a small demonstration of the issues encountered - please note that it modifies the GLOBAL preference for the PUBLISH setting, should you intend to run this test yourself.

set echo on timing on linesize 130 tab off trimspool on

alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

drop table t purge;

create table t
as
select * from all_objects
where rownum <= 1000;

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Our baseline, no histograms, basic column statistics for all columns
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

-- Verify the result
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Enable pending statistics for table T
-- You can try these different calls
--
-- The GATHER_*_STATS procedures seem to behave correctly
-- only when setting the GLOBAL PREFS to FALSE
--
-- "Correctly" means that the results are reflected in the
-- pending area only but not in the dictionary statistics
--
-- Note that the SET_*_STATS procedures seem to ignore the setting
-- always and publish directly to the dictionary
-- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
--
-- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')
-- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')

-- Verify the current setting, statistics will not be published
select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- This is supposed to go to the pending statistics area
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

-- Yes, it worked, the dictionary statistics are not modified
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- The pending statistics area contains now the new statistics including histograms
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's gather statistics only for the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

-- Oops, why do my dictionary statistics reflect that change (note the LAST_ANALYZED column)?
-- (Unless you have set the GLOBAL preferences for PUBLISH to FALSE)
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- The statistics are now updated in both the pending statistics and the dictionary statistics
-- (Unless you have set the GLOBAL preferences for PUBLISH to FALSE)
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's recreate the histogram only on the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

-- Oops, I did it again...
-- This happens unless you set the GLOBAL preference for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's define a manually crafted NDV and DENSITY value
-- Again I expect this to go to the pending statistics area
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
begin
dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t',
colname=>'object_name',
distcnt=>distcnt*100,
nullcnt=>nullcnt,
srec=>srec,
avgclen=>avgclen,
density=>density/100
);
end;
/

-- Nope, no change here
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- But I just changed it in the dictionary statistics
-- This happens even when the GLOBAL preference is set to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

-- But which statistics have been restored now?
-- It looks like this actually restored the PENDING statistics
-- according to the LAST_ANALYZED information??
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

And this is what I get from running this on 11.1.0.7, 11.2.0.1 or 11.2.0.2:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t purge;

Table dropped.

Elapsed: 00:00:00.20
SQL>
SQL> create table t
2 as
3 select * from all_objects
4 where rownum <= 1000;

Table created.

Elapsed: 00:00:00.35
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
TRUE

Elapsed: 00:00:00.00
SQL>
SQL> -- Our baseline, no histograms, basic column statistics for all columns
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> -- Verify the result
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- Enable pending statistics for table T
SQL> -- You can try these different calls
SQL> --
SQL> -- The GATHER_*_STATS procedures seem to behave correctly
SQL> -- only when setting the GLOBAL PREFS to FALSE
SQL> --
SQL> -- "Correctly" means that the results are reflected in the
SQL> -- pending area only but not in the dictionary statistics
SQL> --
SQL> -- Note that the SET_*_STATS procedures seem to always ignore the setting
SQL> -- and publish directly to the dictionary,
SQL> -- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
SQL> --
SQL> -- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
SQL> exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL> -- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')
SQL>
SQL> -- Verify the current setting, statistics will not be published
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
FALSE

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- This is supposed to go to the pending statistics area
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.99
SQL>
SQL> -- Yes, it worked, the dictionary statistics are not modified
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- The pending statistics area contains now the new statistics including histograms
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:26

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's gather statistics only for the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.14
SQL>
SQL> -- Oops, why do my dictionary statistics reflect that change? (Note the LAST_ANALYZED column)
SQL> -- This happens unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:29 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- The statistics are now updated in both the pending statistics and the dictionary statistics
SQL> -- This happens unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .001005025 18.01.2011 18:58:29

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
0

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's recreate the histogram only on the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.34
SQL>
SQL> -- Oops, I did it again...
SQL> -- This happens unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's define a manually crafted NDV and DENSITY value
SQL> -- Again I expect this to go to the pending statistics area
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 begin
9 dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
10 dbms_stats.set_column_stats(
11 ownname=>null,
12 tabname=>'t',
13 colname=>'object_name',
14 distcnt=>distcnt*100,
15 nullcnt=>nullcnt,
16 srec=>srec,
17 avgclen=>avgclen,
18 density=>density/100
19 );
20 end;
21 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.17
SQL>
SQL> -- Nope, no change here
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> -- But I just changed it in the dictionary statistics
SQL> -- This happens even when the GLOBAL preference is set to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
99500 .0000101 18.01.2011 18:58:34 254 YES

Elapsed: 00:00:00.01
SQL>
SQL> -- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
SQL> select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

TABLE_NAME STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
T 18-JAN-11 06.58.24.391000 PM +01:00

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29
SQL>
SQL> -- But which statistics have been restored now?
SQL> -- It looks like this actually restored the PENDING statistics
SQL> -- according to the LAST_ANALYZED information??
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>

It is also interesting to note that, although Oracle has added an array of new dictionary views that allow access to the pending statistics area, these views are inconsistent with the existing statistics-related views in terms of naming conventions, columns and behaviour.

For example, the *_col_pending_stats views do not have a NUM_BUCKETS column. Furthermore, the corresponding *_tab_histgrm_pending_stats views for histogram details show 0 rows if only basic column statistics have been gathered but no histogram, whereas the original *_tab_histograms views return two rows in that case, representing the low and high value of the column.
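Since the pending views lack a NUM_BUCKETS column, the only way to get at the bucket count of a pending histogram seems to be counting the endpoint rows in *_tab_histgrm_pending_stats. Here is a sketch of such a workaround query - assuming, as the 255 endpoints / 254 buckets result above suggests, that the endpoint count equals the bucket count plus one:

```sql
-- Sketch of a workaround: derive the bucket count of a pending histogram
-- from the endpoint rows in USER_TAB_HISTGRM_PENDING_STATS
-- (assumption: endpoints = buckets + 1, as in the 255/254 case above)
select
  p.num_distinct
, p.density
, p.last_analyzed
, (select count(*) - 1
   from   user_tab_histgrm_pending_stats h
   where  h.table_name  = p.table_name
   and    h.column_name = p.column_name) as derived_num_buckets
from
  user_col_pending_stats p
where
  p.table_name  = 'T'
and p.column_name = 'OBJECT_NAME';
```

Note that for a column with pending basic statistics but no pending histogram this returns -1 rather than 0 (or the 2 rows of *_tab_histograms) - yet another inconsistency to keep in mind.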

Summary

The Pending Statistics feature clearly shows some unexpected behaviour. In particular, you had better not rely on it to prevent the dictionary statistics from being updated by DBMS_STATS calls with the PUBLISH attribute set to FALSE.

Some of this clearly looks like a bug, but I could not yet find a corresponding bug entry on My Oracle Support.

Note that this post does not cover the actual usage of Pending Statistics by the optimizer. I haven't done any extensive testing in this regard, but some quick checks suggest that it works as expected: with OPTIMIZER_USE_PENDING_STATISTICS = TRUE the optimizer picks the pending statistics for those tables that have them defined, but still uses the dictionary statistics for those that don't.
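Such a quick check might look like the following sketch (the plan estimates should reflect the pending statistics only while the session parameter is set; the literal 'X' is just an arbitrary filter value):

```sql
-- Sketch: let the optimizer use the pending statistics of T for this session
alter session set optimizer_use_pending_statistics = true;

explain plan for
select count(*) from t where object_name = 'X';

-- The cardinality estimates should now be based on the pending statistics of T;
-- tables without pending statistics are still evaluated using dictionary statistics
select * from table(dbms_xplan.display);

-- Back to the default behaviour
alter session set optimizer_use_pending_statistics = false;
```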

Hash Aggregation

Oracle 10g introduced hash aggregation as a new feature. It can be used for both GROUP BY and UNIQUE operations (HASH GROUP BY and HASH UNIQUE respectively) and is by default the preferred aggregation method unless there is a particular reason that lets the cost based optimizer prefer sort based aggregation (SORT GROUP BY and SORT UNIQUE), for example when the GROUP BY is followed by an ORDER BY on the same expression (using the SORT GROUP BY in such cases is not always beneficial, by the way - see Guy Harrison's blog for an example).

Ever since its introduction I have heard occasional complaints about performance degradations of aggregation operations based on the new hash aggregation algorithm, compared to the previously used sort based aggregation.

Now there may be many potential reasons for such performance degradations (and possibly many of them might not have anything to do with the hash aggregation) but here is a surprising revelation that might explain why some of them were indeed caused by the switch to the new algorithm: The hash aggregation operation does not work very well together with the automatic PGA management (WORKAREA_SIZE_POLICY = AUTO, default since 9i). The fundamental defect is that it is not able to dynamically resize to a larger workarea size when using automatic PGA management and therefore remains more or less at its initial expected size based on the estimates at optimization time.

This effectively means that the efficiency of the hash aggregation operation under automatic PGA management is heavily dependent on the cardinality estimates at optimization time: with estimates in the right ballpark, the memory used at execution time will correspond to the actual runtime requirements, but with bad estimates the operation potentially uses far less memory than available and unnecessarily spills to disk.

Let's start with a simple script to demonstrate the issue:

set echo on timing on

show parameter pga
show parameter processes

-- alter session set workarea_size_policy = manual sort_area_size = 40000000;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

create table t2
as
select distinct user_id from t1;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

column low_val new_value low_value
column high_val new_value high_value

select
max(user_id) + 1 as low_val
, max(user_id) + max(user_id) - min(user_id) + 1 as high_val
from
t2;

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

-- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

-- alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_hash(@inner t1)
no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner)
*/
max(cnt)
from
(
select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
group by t1.object_id, t3.user_id
)
;

-- alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

set pagesize 14

-- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
srec2 dbms_stats.statrec;
begin
dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
srec2.epc := 2;
novals := dbms_stats.numarray(
&low_value,
&high_value
);
srec2.bkvals := null;
dbms_stats.prepare_column_values(srec2,novals);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t1',
colname=>'user_id',
distcnt=>distcnt,
nullcnt=>nullcnt,
srec=>srec2,
avgclen=>avgclen,
density=>density
);
end;
/

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

-- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

-- alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_hash(@inner t1)
no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner)
*/
max(cnt)
from
(
select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
group by t1.object_id, t3.user_id
)
;

-- alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

I first show the current PGA_AGGREGATE_TARGET (P_A_T) and PROCESSES settings.

I then create two tables as sample data to be joined, and along the way collect the minimum and maximum value of the join column.

Any cursors that are dependent on table T1 will then be invalidated to make sure that a reoptimization of the next query will take place (of course running the complete script invalidates such cursors anyway due to the drop and re-create of the tables).

I then run a simple join followed by a group by operation. I've added a cartesian join by the way, but its only purpose is to ensure that the generated row source is sufficiently large to give the group by something to do.

I have used hints to ensure that I always get the same execution plan even if I later on manipulate the statistics to see the effect of incorrect cardinality estimates.

After getting the actual runtime execution plan along with execution statistics I modify the column statistics of one of the tables' join column in such a way that the optimizer thinks that there is no overlap between the join column values and therefore computes a very low join cardinality.

After making sure again that a reoptimization will take place I run the same statement again with the same data volume and the same hints in place.

And this is what I get for all major versions that are currently out there (10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2):

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;
drop table t1 purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;
drop table t2 purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

Elapsed: 00:00:00.00
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.07
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.04
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.71
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*)
as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id,
t3.user_id )

Plan hash value: 3134842094

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.72 | 139 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.72 | 139 | | | |
| 2 | VIEW | | 1 | 570K| 810K|00:00:00.43 | 139 | | | |
| 3 | HASH GROUP BY | | 1 | 570K| 810K|00:00:00.43 | 139 | 35M| 5521K| 38M (0)|
|* 4 | HASH JOIN | | 1 | 807K| 810K|00:00:00.02 | 139 | 1348K| 1348K| 1093K (0)|
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29907 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.12
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.03
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:01.37
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner)
*/ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id =
t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.37 | 139 | 2715 | 2715 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.37 | 139 | 2715 | 2715 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.62 | 139 | 2715 | 2715 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.62 | 139 | 2715 | 2715 | 35M| 5521K| 7010K (1)| 23552 |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 0 | 0 | 1348K| 1348K| 1167K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29907 | 30000 |00:00:00.01 | 55 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

As can be seen, with an estimate in the right ballpark the HASH GROUP BY operation completes in memory (very close to the maximum size of a single workarea - 20% of PGA_AGGREGATE_TARGET from 10.2 on with automatic PGA management, for PGA_AGGREGATE_TARGET < 500M; for more information see Joze Senegacnik's paper on the internals of automatic PGA management).

However, when exactly the same operation is repeated with the fudged statistics, it spills to disk and uses only 7M, although up to 40M would have been available (given that there is no concurrent workload).
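One way to verify this from the outside - a sketch based on the V$SQL_WORKAREA view, which records for each workarea operation of a cached cursor the estimated optimal size, the memory actually used, and whether the last execution was OPTIMAL (completed in memory), ONE PASS or MULTI-PASS (spilled to disk):

```sql
-- Sketch: check how the GROUP BY workareas of the cached cursors behaved
select
  sql_id
, operation_type
, estimated_optimal_size
, last_memory_used
, last_execution
, last_tempseg_size
from
  v$sql_workarea
where
  operation_type like 'GROUP BY%';
```

For the second execution above, such a query should show a LAST_MEMORY_USED far below the estimated optimal size and a non-NULL LAST_TEMPSEG_SIZE, confirming the spill to disk.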

The same does not happen when repeating this experiment with other operations that use a workarea - the most obvious one being a SORT GROUP BY, as can be seen from this output:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 15) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.03
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
15 29

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.54
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 94tqnspjuzk8x, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) no_use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */
count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by
t1.object_id, t3.user_id )

Plan hash value: 2068941295

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.53 | 103 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.53 | 103 | | | |
| 2 | VIEW | | 1 | 315K| 450K|00:00:00.37 | 103 | | | |
| 3 | SORT GROUP BY | | 1 | 315K| 450K|00:00:00.37 | 103 | 28M| 1913K| 25M (0)|
|* 4 | HASH JOIN | | 1 | 446K| 450K|00:00:00.01 | 103 | 1348K| 1348K| 1097K (0)|
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29787 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 15,
old 14: &high_value
new 14: 29

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.56
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 94tqnspjuzk8x, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) no_use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */
count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by
t1.object_id, t3.user_id )

Plan hash value: 2068941295

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.55 | 103 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.55 | 103 | | | |
| 2 | VIEW | | 1 | 1 | 450K|00:00:00.38 | 103 | | | |
| 3 | SORT GROUP BY | | 1 | 1 | 450K|00:00:00.38 | 103 | 28M| 1913K| 25M (0)|
|* 4 | HASH JOIN | | 1 | 1 | 450K|00:00:00.01 | 103 | 1348K| 1348K| 1053K (0)|
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29787 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.03
SQL>

All I've done is to reduce the data set (because the SORT GROUP BY in this case requires more memory for the same amount of data than the HASH GROUP BY, and would otherwise have spilled to disk with a 200M PGA_AGGREGATE_TARGET setting) and to change the USE_HASH_AGGREGATION hint to NO_USE_HASH_AGGREGATION.

As you can see, the operation completes in memory both times and uses the same amount of memory regardless of what the estimates look like.

I've tested the serial execution of the most common workarea-based operations, like HASH JOIN, SORT ORDER BY, WINDOW SORT and SORT UNIQUE, and all of them were able to dynamically resize the workarea in cases where the initial estimated size was too small.
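That difference can be condensed into a toy model. This is my own illustration of the observed behaviour, not Oracle internals: a resizable workarea can grow on demand up to the cap when the initial estimate turns out to be too small, whereas the defective hash aggregation keeps its undersized workarea and spills:

```python
def run_workarea(initial_kb, demand_kb, cap_kb, resizable):
    """Return (final_size_kb, spilled_to_temp) for a single workarea."""
    size_kb = initial_kb
    if resizable and size_kb < demand_kb:
        size_kb = min(demand_kb, cap_kb)  # dynamic resize, up to the cap
    return size_kb, size_kb < demand_kb

# 40M cap, ~25M real demand, badly underestimated at 1M (as in the test):
print(run_workarea(1024, 25600, 40960, resizable=True))   # grows, no spill
print(run_workarea(1024, 25600, 40960, resizable=False))  # spills to disk
```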

Note that I haven't mentioned the HASH UNIQUE operation yet; later on you'll see why.

A cunning feature of automatic PGA management comes to help, however: a kind of "feedback loop" for workareas, based on statistics maintained by the automatic PGA memory management. Indeed, when the HASH GROUP BY cursor based on the incorrect cardinality estimate gets shared (not invalidated) and re-executed, the next execution looks like this:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality_repeated_execution';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.76
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id,
t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.75 | 139 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.75 | 139 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.44 | 139 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.44 | 139 | 35M| 5521K| 38M (0)| |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 1348K| 1348K| 1412K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29836 | 30000 |00:00:00.01 | 55 | | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

So this time Oracle based the workarea memory requirements on the feedback from the previous execution and therefore allocated sufficient memory to complete the operation without spilling to disk.

Notice, however, that the feedback loop unfortunately does not work as desired when the workarea execution fails due to insufficient TEMP space. Ideally the feedback loop should allow subsequent executions to grab more memory in exactly such cases, but at present it doesn't: the failure obviously prevents an update of the statistics, and therefore subsequent executions continue to fail since they still use the same amount of memory.

You can test this by simply assigning a very small TEMP tablespace to the user executing the query, so that the second execution fails due to insufficient TEMP space. If you repeat the execution in this case, it will fail again and keep doing so.
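Why the failure repeats can be sketched in the same spirit. Again a hedged model of my own, not Oracle's actual code: the workarea statistics that feed the feedback loop are only written when an execution completes, so a TEMP-space failure leaves the undersized sizing in place for the retry:

```python
def execute(initial_kb, demand_kb, temp_limit_kb, stats):
    size_kb = stats.get("observed_kb", initial_kb)
    spill_kb = max(0, demand_kb - size_kb)        # what must go to TEMP
    if spill_kb > temp_limit_kb:
        return "failed: insufficient TEMP space"  # stats are NOT updated
    stats["observed_kb"] = demand_kb              # feedback only on success
    return "completed"

stats = {}
print(execute(1024, 25600, 2048, stats))  # fails: ~24M spill > tiny TEMP
print(execute(1024, 25600, 2048, stats))  # fails again, nothing was learned
```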

What happens if I invalidate the cursor and repeat the execution?

SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.01
SQL> set echo on timing on
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality_repeated_exec_invalid';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;
1

Elapsed: 00:00:01.29
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner)
*/ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id =
t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.28 | 139 | 2715 | 2715 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.28 | 139 | 2715 | 2715 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.61 | 139 | 2715 | 2715 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.61 | 139 | 2715 | 2715 | 35M| 5521K| 7067K (1)| 23552 |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 0 | 0 | 1348K| 1348K| 1129K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29836 | 30000 |00:00:00.01 | 55 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

Back to square one: with the invalidation the statistics are gone, too, so the bad cardinality estimate again leads to the suboptimal execution of the HASH GROUP BY.
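The whole cycle demonstrated above (a bad estimate spills, the feedback fixes the re-execution, invalidation throws the feedback away) can be condensed into a small conceptual model; this is my own illustration of the observed behaviour, not Oracle's actual algorithm:

```python
class CursorWorkarea:
    def __init__(self, estimated_kb):
        self.estimated_kb = estimated_kb
        self.observed_kb = None  # populated after a completed execution

    def initial_size_kb(self):
        # Use feedback from a previous execution when available
        if self.observed_kb is not None:
            return self.observed_kb
        return self.estimated_kb

    def execute(self, actual_demand_kb):
        spilled = self.initial_size_kb() < actual_demand_kb
        self.observed_kb = actual_demand_kb  # feedback for the next run
        return spilled

    def invalidate(self):
        self.observed_kb = None  # the statistics are lost on invalidation

c = CursorWorkarea(estimated_kb=1024)  # bad estimate: ~1M
print(c.execute(38 * 1024))            # True  - first run spills
print(c.execute(38 * 1024))            # False - feedback sized it right
c.invalidate()
print(c.execute(38 * 1024))            # True  - back to square one
```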

So you might think: why bother? Only the first execution of a cursor without workarea execution statistics is affected by the problem; subsequent executions of the same cursor benefit from the statistics of the previous executions.

The problem, however, is that this only helps applications that share cursors. Unfortunately, applications that perform heavy-duty aggregations, like data warehouses, typically do not share cursors, since they do not care about the optimization overhead and deliberately use literals to provide as much information to the optimizer as possible.

These heavy-duty aggregations also usually make use of Parallel Execution features, and as you'll see from the output of the same test case, if I run the HASH GROUP BY in parallel, simply by setting table T1 to parallel degree 2, a similar problem occurs; so Parallel Execution is affected as well.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> alter table t1 parallel 2;

Table altered.

Elapsed: 00:00:00.03
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.03
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.42
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id,
t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 464898991

--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.43 | 87 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.43 | 87 | | | |
| 2 | PX COORDINATOR | | 1 | | 2 |00:00:00.43 | 87 | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
| 4 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:00.81 | 0 | | | |
| 5 | VIEW | | 2 | 575K| 810K|00:00:00.48 | 0 | | | |
| 6 | HASH GROUP BY | | 2 | 575K| 810K|00:00:00.48 | 0 | 35M| 5521K| 2/0/0|
| 7 | PX RECEIVE | | 2 | 814K| 810K|00:00:00.01 | 0 | | | |
| 8 | PX SEND HASH | :TQ10001 | 0 | 814K| 0 |00:00:00.01 | 0 | | | |
|* 9 | HASH JOIN | | 2 | 814K| 810K|00:00:00.01 | 89 | 1348K| 1348K| 2/0/0|
| 10 | BUFFER SORT | | 2 | | 1458 |00:00:00.01 | 0 | 29696 | 29696 | 2/0/0|
| 11 | PX RECEIVE | | 2 | 729 | 1458 |00:00:00.01 | 0 | | | |
| 12 | PX SEND BROADCAST | :TQ10000 | 0 | 729 | 0 |00:00:00.01 | 0 | | | |
| 13 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | |
| 14 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | |
| 15 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | |
| 16 | PX BLOCK ITERATOR | | 2 | 30164 | 30000 |00:00:00.01 | 89 | | | |
|* 17 | TABLE ACCESS FULL | T1 | 18 | 30164 | 30000 |00:00:00.01 | 89 | | | |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("T1"."USER_ID"="T2"."USER_ID")
17 - access(:Z>=:Z AND :Z<=:Z)

38 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:01.17
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1) use_nl(@inner t3)
use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner) */ max(cnt) from ( select /*+
qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 464898991

------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 2 | PX COORDINATOR | | 1 | | 2 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 4 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:02.25 | 0 | 2460 | 2460 | | | | |
| 5 | VIEW | | 2 | 1 | 810K|00:00:01.53 | 0 | 2460 | 2460 | | | | |
| 6 | HASH GROUP BY | | 2 | 1 | 810K|00:00:01.53 | 0 | 2460 | 2460 | 35M| 5521K| | 11264 |
| 7 | PX RECEIVE | | 2 | 1 | 810K|00:00:00.01 | 0 | 0 | 0 | | | | |
| 8 | PX SEND HASH | :TQ10001 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 9 | HASH JOIN | | 2 | 1 | 810K|00:00:00.01 | 89 | 0 | 0 | 1348K| 1348K| 2/0/0| |
| 10 | BUFFER SORT | | 2 | | 1458 |00:00:00.01 | 0 | 0 | 0 | 29696 | 29696 | 2/0/0| |
| 11 | PX RECEIVE | | 2 | 729 | 1458 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX SEND BROADCAST | :TQ10000 | 0 | 729 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 13 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 14 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 15 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 16 | PX BLOCK ITERATOR | | 2 | 30164 | 30000 |00:00:00.01 | 89 | 0 | 0 | | | | |
|* 17 | TABLE ACCESS FULL | T1 | 18 | 30164 | 30000 |00:00:00.01 | 89 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("T1"."USER_ID"="T2"."USER_ID")
17 - access(:Z>=:Z AND :Z<=:Z)

37 rows selected.

Elapsed: 00:00:00.04
SQL>

So, if your database uses automatic PGA management and

- uses the hash aggregation HASH GROUP BY (or HASH UNIQUE, more on that in a moment)
- and does not share cursors

then every execution that is based on bad cardinality estimates potentially has a problem with the hash aggregation, because it might not make efficient use of the available memory.

The same applies to applications that share cursors; however, in that case only the first execution after re-optimization / invalidation is affected.

So you might want to carefully check the runtime execution statistics of your critical hash aggregations.

Mind you, things could be worse, and that is where the HASH UNIQUE operation comes into the picture.

When I realised the issue with the HASH GROUP BY operation I was quite certain that the HASH UNIQUE operation would be affected in a similar way, since internally Oracle seems to use the same mechanism for both operations (in the SMM trace files both are called HASH GROUP BY).

To my surprise I noticed that in 10g versions below 10.2.0.5 and 11g versions below 11.2.0.1 (which means in this case 10.2.0.4 and 11.1.0.7 respectively; I didn't test other versions) the HASH UNIQUE operation suffers from an even more dramatic problem: the cardinality estimate is not considered at all, and the initial workarea size is always based on minimum assumptions.

In passing: according to the SMM trace files that can be activated using the undocumented "_smm_trace" parameter, it looks like many of the sort-based workareas, like SORT ORDER BY or WINDOW SORT, suffer from the same defect. Since they are able to dynamically resize and to make use of the workarea statistics feedback, however, they effectively work as expected; they just start with a too-low workarea size estimate every time they are executed for the first time after re-optimization / invalidation.

The combination of the two issues (the inability to dynamically resize and ignoring the optimizer estimates) leads to a dire result: every first execution of a HASH UNIQUE operation in those versions uses only the minimum amount of memory. The following test case shows the problem:

set echo on timing on

show parameter pga
show parameter processes

-- alter session set workarea_size_policy = manual sort_area_size = 40000000;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

create table t2
as
select distinct user_id from t1;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

column low_val new_value low_value
column high_val new_value high_value

select
max(user_id) + 1 as low_val
, max(user_id) + max(user_id) - min(user_id) + 1 as high_val
from
t2;

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_nl(@inner t1)
use_hash_aggregation(@inner)
*/
max(user_id)
from
(
select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
)
;

alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

set pagesize 14

-- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
srec2 dbms_stats.statrec;
begin
dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
srec2.epc := 2;
novals := dbms_stats.numarray(
&low_value,
&high_value
);
srec2.bkvals := null;
dbms_stats.prepare_column_values(srec2,novals);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t1',
colname=>'user_id',
distcnt=>distcnt,
nullcnt=>nullcnt,
srec=>srec2,
avgclen=>avgclen,
density=>density
);
end;
/

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_nl(@inner t1)
use_hash_aggregation(@inner)
*/
max(user_id)
from
(
select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
)
;

alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

Obviously it is very similar to the HASH GROUP BY test case; this time, however, I've used a DISTINCT clause and replaced the HASH JOIN with a NESTED LOOPS join, which makes the generated trace files easier to read, since there is exactly one workarea involved in this execution.

And this is what I get from 11.1.0.7, this time using AMM, with PGA_AGGREGATE_TARGET therefore set to 0 (you'll get the same result from 10.2.0.4 with a corresponding PGA_AGGREGATE_TARGET setting):

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 0
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.00
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 use_hash_aggregation(@inner)
8 */
9 max(user_id)
10 from
11 (
12 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
13 from t1, t2, t2 t3
14 where t1.user_id = t2.user_id
15 )
16 ;

MAX(USER_ID)
------------
26

Elapsed: 00:00:01.79
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 7cnqz02uwcs1a, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
use_hash_aggregation(@inner) */ max(user_id) from ( select /*+
qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from t1,
t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 1541846686

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.78 | 40179 | 1470 | 1470 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.78 | 40179 | 1470 | 1470 | | | | |
| 2 | VIEW | | 1 | 572K| 810K|00:00:00.97 | 40179 | 1470 | 1470 | | | | |
| 3 | HASH UNIQUE | | 1 | 572K| 810K|00:00:00.97 | 40179 | 1470 | 1470 | 25M| 3296K| 6029K (1)| 13312 |
| 4 | NESTED LOOPS | | 1 | 810K| 810K|00:00:00.01 | 40179 | 0 | 0 | | | | |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
|* 8 | TABLE ACCESS FULL | T1 | 729 | 1111 | 810K|00:00:00.03 | 40095 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

30 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 use_hash_aggregation(@inner)
8 */
9 max(user_id)
10 from
11 (
12 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
13 from t1, t2, t2 t3
14 where t1.user_id = t2.user_id
15 )
16 ;

MAX(USER_ID)
------------
26

Elapsed: 00:00:01.26
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 7cnqz02uwcs1a, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
use_hash_aggregation(@inner) */ max(user_id) from ( select /*+
qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from t1,
t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 1541846686

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.26 | 40179 | 1470 | 1470 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.26 | 40179 | 1470 | 1470 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.98 | 40179 | 1470 | 1470 | | | | |
| 3 | HASH UNIQUE | | 1 | 1 | 810K|00:00:00.98 | 40179 | 1470 | 1470 | 25M| 3296K| 6093K (1)| 13312 |
| 4 | NESTED LOOPS | | 1 | 1 | 810K|00:00:00.01 | 40179 | 0 | 0 | | | | |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
|* 8 | TABLE ACCESS FULL | T1 | 729 | 1 | 810K|00:00:00.01 | 40095 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

30 rows selected.

Elapsed: 00:00:00.03
SQL>

As you can see, no matter what the estimates look like, on this system the first execution of a HASH UNIQUE will not get more than 6M - subsequent executions of the same cursor benefit from the workarea statistics of the previous run, as seen before.

Again, when switching to a SORT UNIQUE and reducing the data set accordingly, the problem cannot be reproduced:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 0
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 150
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.15
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.07
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 15) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
15 29

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(user_id)
11 from
12 (
13 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 )
17 ;

MAX(USER_ID)
------------
14

Elapsed: 00:00:00.73
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID ckb2sbz3y2z14, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
no_swap_join_inputs(@inner t1)
no_use_hash_aggregation(@inner) */ max(user_id) from ( select
/*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from
t1, t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 3828303002

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.71 | 12423 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.71 | 12423 | | | |
| 2 | VIEW | | 1 | 318K| 450K|00:00:00.62 | 12423 | | | |
| 3 | SORT UNIQUE | | 1 | 318K| 450K|00:00:00.62 | 12423 | 22M| 1744K| 20M (0)|
| 4 | NESTED LOOPS | | 1 | 450K| 450K|00:00:00.01 | 12423 | | | |
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 225 | 2000 | 450K|00:00:00.01 | 12375 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

31 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 15,
old 14: &high_value
new 14: 29

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(user_id)
11 from
12 (
13 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 )
17 ;

MAX(USER_ID)
------------
14

Elapsed: 00:00:00.74
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID ckb2sbz3y2z14, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
no_swap_join_inputs(@inner t1)
no_use_hash_aggregation(@inner) */ max(user_id) from ( select
/*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from
t1, t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 3828303002

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.73 | 12423 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.73 | 12423 | | | |
| 2 | VIEW | | 1 | 1 | 450K|00:00:00.61 | 12423 | | | |
| 3 | SORT UNIQUE | | 1 | 1 | 450K|00:00:00.61 | 12423 | 22M| 1744K| 20M (0)|
| 4 | NESTED LOOPS | | 1 | 1 | 450K|00:00:00.01 | 12423 | | | |
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 225 | 1 | 450K|00:00:00.01 | 12375 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

31 rows selected.

Elapsed: 00:00:00.01
SQL>
SQL> set doc off
SQL> doc
SQL>

In 10.2.0.5 and 11.2.0.1 / 11.2.0.2 this issue is apparently fixed and the cardinality estimates are used for the initial workarea size calculation. However, this doesn't mean that the dynamic resize problem is fixed - it simply means that in those versions the HASH UNIQUE behaves exactly like the HASH GROUP BY, and the memory usage of the first execution depends on the cardinality estimates.

Summary

If your database uses automatic PGA management, then every initial execution of a hash aggregation (HASH GROUP BY / HASH UNIQUE) operation that is based on bad cardinality estimates potentially has a problem, because it might not make efficient use of the available memory.

The same applies to applications that share cursors, however in that case only the initial execution after re-optimization / invalidation is affected, subsequent executions benefit from the workarea statistics feedback mechanism.

Furthermore in 10g versions below 10.2.0.5 and 11g versions below 11.2.0.1 the initial execution of a HASH UNIQUE operation ignores the cardinality estimates and will always be based on minimum assumptions.

So you might want to carefully check the runtime execution statistics of your critical hash aggregations.
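One way to perform such a check is to query V$SQL_WORKAREA for aggregation workareas that did not run optimally in memory. This is only a sketch: it assumes 11g column names and suitable privileges, and the OPERATION_TYPE literals should be verified against your version.

```sql
-- Sketch: hash aggregation workareas that had to spill to disk
-- (assumes V$SQL_WORKAREA column names as of 11g)
select sql_id
     , operation_type
     , policy
     , estimated_optimal_size
     , last_memory_used
     , optimal_executions
     , onepass_executions
     , multipasses_executions
from   v$sql_workarea
where  operation_type like 'GROUP BY%'
and    (onepass_executions > 0 or multipasses_executions > 0);
```

Workareas showing one-pass or multi-pass executions despite an ESTIMATED_OPTIMAL_SIZE well below the available PGA memory are candidates for the problem described here.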

Possible Workarounds

Different strategies are available as workarounds, depending on the situation:

- Upgrading to versions where the HASH UNIQUE operation at least considers the optimizer estimates might be beneficial

- Obviously good cardinality estimates are crucial - with those in place, and on the right Oracle versions, you should be fine. The undocumented OPT_ESTIMATE hint (or the deprecated, equally undocumented CARDINALITY hint) might help in cases where other options, like manually crafted statistics, cannot help the optimizer come up with reasonable cardinality estimates.

- Applications that are able to share cursors might not be affected too much, since subsequent executions of the same cursor can use the workarea statistics feedback from previous executions.

- The described problems disappear when switching to a manual workarea size policy and allowing for sufficient workarea memory. Interestingly it is the less obvious SORT_AREA_SIZE that manually controls the HASH GROUP BY operation, not the HASH_AREA_SIZE. Of course using manual PGA memory management system-wide is strongly discouraged, so this might only be a workaround in certain cases of large batch jobs where manual workarea sizes are used anyway. Also note that when switching to manual workareas you should be aware of a nasty bug that was introduced in the 10.2.0.3 patchset. For more information see MOS note 6053134.8 "ALTER SESSION to set SORT_AREA_SIZE not honoured" and, for example, my and Jonathan Lewis' posts about the issue. According to the MOS note the bug has been fixed in 10.2.0.4.3, 10.2.0.5 and 11.1.0.7 / 11.2.

- Obviously hash aggregation can be avoided at statement level using the NO_USE_HASH_AGGREGATION hint, or at session / system level using the _GBY_HASH_AGGREGATION_ENABLED parameter. Since sort-based aggregation can be less efficient (but it doesn't have to be - it depends on the individual grouping and data pattern), this doesn't necessarily solve the problem either, since the sort workarea might spill to disk where a hash aggregation working as expected would have stayed in memory.
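A few of the workarounds above can be sketched in SQL*Plus. Treat this strictly as an illustration: OPT_ESTIMATE and _GBY_HASH_AGGREGATION_ENABLED are undocumented, the scale_rows factor of 100 and the 100M workarea size are made-up values, and the exact OPT_ESTIMATE syntax should be verified on your version before relying on it.

```sql
-- Sketch only - undocumented hints / parameters, illustrative values

-- (a) Correct a bad join cardinality estimate via the undocumented
--     OPT_ESTIMATE hint (here: scale the T1/T2 join estimate by 100):
select /*+ opt_estimate(@inner, join, (t1, t2), scale_rows=100) */
       max(user_id)
from   (
         select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t2.user_id
         from   t1, t2
         where  t1.user_id = t2.user_id
       );

-- (b) Manual workarea sizing for a controlled batch session. Note that it is
--     SORT_AREA_SIZE that governs the HASH GROUP BY workarea, not HASH_AREA_SIZE:
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;

-- (c) Avoid hash aggregation at statement level ...
select /*+ no_use_hash_aggregation */ user_id, count(*)
from   t1
group by user_id;

-- ... or at session level via an undocumented parameter:
alter session set "_gby_hash_aggregation_enabled" = false;
```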

Final Note

Of course there are a lot of things that I haven't touched yet or only briefly, like Parallel Execution of the hash aggregations, workareas used for index sorts, and all the other details when the workarea spills to disk.

If you want to get an idea what kind of nasty things can happen in that case you might want to read Jonathan Lewis' post on analytic functions.

I also haven't tried yet to fiddle around with the undocumented _smm* related parameters to see whether setting them to non-default values allows working around the issue.

And of course then there are bugs like this one: Bug 6817844 - Multi pass sort with auto memory management even with plenty of PGA [ID 6817844.8], which is also mentioned in Jonathan's post.

As a final note, the odd TEMP tablespace I/O pattern issue that Jonathan describes in his post (Bug 9041800) is marked as fixed in the 11.2.0.2 patch set, but I haven't tested this yet.

Adaptive Cursor Sharing

This post was motivated by a recent discussion I had in which I mentioned that Adaptive Cursor Sharing (ACS) does not work with embedded SQL in PL/SQL, but when looking for an existing post I realized that I couldn't find any good and freely available examples of that limitation of ACS.

Therefore I decided to put together this post, which outlines what ACS is supposed to do and its inherent limitations, and finally sheds some light on the specific limitation that started its life with the description that it "does not work with embedded SQL in PL/SQL".

Adaptive Cursor Sharing (ACS) was introduced in Oracle 11g to address a potential threat of cursor sharing and bind variables - yes, you have read correctly: The usage of bind variables can be a threat if different bind values are executed using the same execution plan but would actually require different execution plans to avoid poorly performing executions.

So ACS allows Oracle to detect such problems and selectively "unshare" cursors, which means that Oracle creates one or more additional child cursors for different bind values (or ranges of bind values).

However, there are a couple of limitations to this strategy. Let's first have a look at a working example of ACS. To that end I ran the following script against database versions 11.1.0.7 and 11.2.0.2:

--------------------------------------------------------------------------------
--
-- File name: adaptive_cursor_sharing_1.sql
--
-- Purpose: Demonstrate a simple case of adaptive cursor sharing
--
-- Author: Randolf Geist
--
-- Last tested: December 2010
--
-- Versions: 11.1.0.7
-- 11.2.0.2
--------------------------------------------------------------------------------

set echo on define on tab off

drop table t;
purge table t;
create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);
create index i on t(object_id);

exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

set echo on

select count(*), min(object_id)
, max(object_id) from t;

set echo on linesize 200 heading off newpage none tab off serveroutput off

alter session set statistics_level = all;

alter system flush buffer_cache;

alter system flush shared_pool;

var x number

exec :x := 100000

-- pause

set timing on

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

column prev_sql_id new_value sql_id noprint
column sid new_value session_id noprint

select
prev_sql_id
, sid
from
v$session
where
sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 1

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

exec :x := 1

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

set heading on

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = '&sql_id';

and I get the following result from 11.2.0.2 (and 11.1.0.7):

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> drop table t;

Table dropped.

SQL> purge table t;

Table purged.

SQL> create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);

Table created.

SQL> create index i on t(object_id);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

SQL>
SQL> set echo on
SQL>
SQL> select count(*), min(object_id)
2 , max(object_id) from t;

COUNT(*) MIN(OBJECT_ID) MAX(OBJECT_ID)
---------- -------------- --------------
1457480 2 75795

SQL>
SQL> set echo on linesize 200 heading off newpage none tab off serveroutput off
SQL>
SQL> alter session set statistics_level = all;

Session altered.

SQL>
SQL> alter system flush buffer_cache;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> var x number
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

SQL>
SQL> -- pause
SQL>
SQL> set timing on
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.09
SQL>
SQL> column prev_sql_id new_value sql_id noprint
SQL> column sid new_value session_id noprint
SQL>
SQL> select
2 prev_sql_id
3 , sid
4 from
5 v$session
6 where
7 sid = sys_context('userenv', 'sid');

Elapsed: 00:00:00.15
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 3 | | | |
| 2 | VIEW | | 1 | 13 | 0 |00:00:00.01 | 3 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 13 | 0 |00:00:00.01 | 3 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 13 | 0 |00:00:00.01 | 3 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 13 | 0 |00:00:00.01 | 3 | 3 | | | |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:01.74
SQL>
SQL> exec :x := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.7361E+11

Elapsed: 00:00:17.12
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:17.12 | 1460K| 35371 | 10532 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.12 | 1460K| 35371 | 10532 | | | | |
| 2 | VIEW | | 1 | 13 | 1457K|00:00:16.94 | 1460K| 35371 | 10532 | | | | |
| 3 | WINDOW SORT | | 1 | 13 | 1457K|00:00:16.66 | 1460K| 35371 | 10532 | 93M| 3313K| 37M (1)| 84992 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 13 | 1457K|00:00:12.01 | 1460K| 24839 | 0 | | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 13 | 1457K|00:00:01.53 | 3221 | 3220 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.17
SQL>
SQL> exec :x := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.7361E+11

Elapsed: 00:00:05.10
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
SQL_ID fmbq5ytmh0hng, child number 1
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 2719131525

----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:05.10 | 21539 | 29966 | 8375 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:05.10 | 21539 | 29966 | 8375 | | | | |
| 2 | VIEW | | 1 | 1457K| 1457K|00:00:04.98 | 21539 | 29966 | 8375 | | | | |
| 3 | WINDOW SORT | | 1 | 1457K| 1457K|00:00:04.80 | 21539 | 29966 | 8375 | 74M| 2971K| 37M (1)| 67584 |
|* 4 | TABLE ACCESS FULL| T | 1 | 1457K| 1457K|00:00:01.38 | 21532 | 21528 | 0 | | | | |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:X)

24 rows selected.

Elapsed: 00:00:00.05
SQL>
SQL> set heading on
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = '&sql_id';
old 12: sql_id = '&sql_id'
new 12: sql_id = 'fmbq5ytmh0hng'
SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
fmbq5ytmh0hng 0 2 3 1460762 Y N
fmbq5ytmh0hng 1 1 0 21539 Y Y

Elapsed: 00:00:00.00
SQL>

So I intentionally use bind variables in a dangerous way here: the "object_id > :x" predicate is a potential threat - depending on the value of :x, using the index on OBJECT_ID can be a very good or a very bad idea.

From this example we therefore can tell a couple of things you should know about ACS:

1. ACS is not limited to the typical "skewed column data" / "histogram" case. Although I explicitly did not create any histograms (method_opt => 'for all columns size 1'), ACS still marked the cursor as BIND_SENSITIVE and eventually created a second child cursor with a more appropriate execution plan for the bind value, because it detected the "expression > :x" range predicate.

2. However - and this is one of the most important aspects - for ACS to detect the problem, it first has to go wrong at least once. As you can see from the example, it was only the second execution using the "bad" bind value that created the child cursor. So depending on how "wrong" things can go - think of a more complex execution plan including multiple joins - it might already be "too late" for your application: if, for example, a query is not going to return in a reasonable amount of time, your application might effectively be "broken" even if ACS might be able to "fix" the problem at the next execution...

3. Another important point to consider is that the ACS information is not stored persistently but only exists in the Shared Pool. This means that if the information is aged out of the Shared Pool for whatever reason, the same mistake has to be made again before ACS detects the problem. So if your problematic cursors age out of the Shared Pool, things will potentially go "wrong" again with the next execution before ACS can come to the rescue.

From this you can see that - although ACS is a great feature - it is not a "Silver Bullet" for all possible cases. You still might need to help your database in those special cases where ACS is not able to support your application properly. How can you help your database? More on that later.
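By the way, the additional metadata that ACS maintains per cursor can be inspected while the cursor still resides in the Shared Pool via the V$SQL_CS_* views. A minimal sketch - the SQL_ID shown is just the one from the example above, substitute your own:

```sql
-- Per-child execution statistics recorded by ACS for the peeked bind sets
select child_number, bind_set_hash_value, peeked, executions, rows_processed
from   v$sql_cs_statistics
where  sql_id = 'fmbq5ytmh0hng';

-- Selectivity ranges that each bind-aware child cursor covers
select child_number, predicate, low, high
from   v$sql_cs_selectivity
where  sql_id = 'fmbq5ytmh0hng';
```

Note that this information is subject to the same Shared Pool aging just discussed - once the cursor is flushed, these views are empty for that SQL_ID.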

Now let's have a look at the claim that ACS "does not work with embedded SQL in PL/SQL". To test this I put together the following script, which in principle does the same as the first script, but this time uses the most common techniques for executing SQL from PL/SQL:

--------------------------------------------------------------------------------
--
-- File name: adaptive_cursor_sharing_2.sql
--
-- Purpose: Demonstrate adaptive cursor sharing is not working with PL/SQL when
-- optimizations are enabled, in particular holding cursors open
-- See bug 8357294
--
-- Author: Randolf Geist
--
-- Last tested: December 2010
--
-- Versions: 11.1.0.7
-- 11.2.0.2
--------------------------------------------------------------------------------

set echo on define on tab off

drop table t;
purge table t;
create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);
create index i on t(object_id);

exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

set echo on

select count(*), min(object_id)
, max(object_id) from t;

set echo on linesize 200 heading on tab off serveroutput off

set timing on

-- This is a workaround that disables also the cursor optimizations in PL/SQL
-- In particular the PL/SQL hold cursor open optimization
-- Therefore the cursor gets reparsed and ACS kicks in
alter session set session_cached_cursors = 0;
--
-- Does this make any difference? No
-- alter session set plsql_optimize_level = 0;

variable sql_id varchar2(255)

-- Static SQL
alter system flush buffer_cache;

alter system flush shared_pool;

declare
x integer;
n_dummy number;
begin
for i in 1..10 loop
if i = 1 then
x := 100000;
else
x := 1;
end if;
select
sum(row_num)
into
n_dummy
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > x
);
if i = 1 then
select
prev_sql_id
into
:sql_id
from
v$session
where
sid = sys_context('userenv', 'sid');
end if;
end loop;
end;
/

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = :sql_id;

set pagesize 0

select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));

set pagesize 9999

-- dynamic SQL with binds
alter system flush buffer_cache;

alter system flush shared_pool;

declare
x integer;
n_dummy number;
begin
for i in 1..10 loop
if i = 1 then
x := 100000;
else
x := 1;
end if;
execute immediate '
select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
)' into n_dummy using x;
if i = 1 then
select
prev_sql_id
into
:sql_id
from
v$session
where
sid = sys_context('userenv', 'sid');
end if;
end loop;
end;
/

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = :sql_id;

set pagesize 0

select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));

set pagesize 9999

-- dynamic SQL with literals
alter system flush buffer_cache;

alter system flush shared_pool;

declare
x integer;
n_dummy number;
begin
for i in 1..10 loop
if i = 1 then
x := 100000;
else
x := 1;
end if;
execute immediate '
select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > ' || x || '
)' into n_dummy;
if i = 1 then
select
prev_sql_id
into
:sql_id
from
v$session
where
sid = sys_context('userenv', 'sid');
end if;
end loop;
end;
/

select
b.sql_id
, b.child_number
, b.executions
, b.parse_calls
, b.buffer_gets
, b.is_bind_sensitive
, b.is_bind_aware
from
v$sql a, v$sql b
where
a.force_matching_signature = b.force_matching_signature
and a.sql_id = :sql_id;

set pagesize 0

select
c.*
from
v$sql a
, v$sql b
, table(dbms_xplan.display_cursor(b.sql_id)) c
where
a.force_matching_signature = b.force_matching_signature
and a.sql_id = :sql_id;

set pagesize 9999

-- dynamic SQL with binds + OPEN
alter system flush buffer_cache;

alter system flush shared_pool;

declare
x integer;
n_dummy number;
c sys_refcursor;
begin
for i in 1..10 loop
if i = 1 then
x := 100000;
else
x := 1;
end if;
open c for '
select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
)' using x;
fetch c into n_dummy;
close c;
if i = 1 then
select
prev_sql_id
into
:sql_id
from
v$session
where
sid = sys_context('userenv', 'sid');
end if;
end loop;
end;
/

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = :sql_id;

set pagesize 0

select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));

set pagesize 9999

So this script executes the same query as used in the first script ten times in a row, again deliberately using the same bind values in the same order: the first execution can benefit from the index, all following executions would be better off with a full table scan.

These are the different techniques used:

1. Static embedded SQL, which in PL/SQL is sometimes also called an "implicit cursor". It should exhibit the same characteristics as explicitly declared cursors, which I haven't included in this script.

2. Dynamic SQL using EXECUTE IMMEDIATE and binds, so the cursor can still be shared

3. Dynamic SQL using EXECUTE IMMEDIATE and literals, which leads to different SQL texts and therefore different parent cursors with potentially different execution plans, since both will be parsed / optimized independently of each other

4. Dynamic SQL using OPEN / FETCH / CLOSE and binds, so the cursor again can be shared
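Point 1 mentions explicitly declared cursors, which the script doesn't include. For completeness, such a variant would look like the following sketch - based on the expected behaviour described below, it should act like the implicit cursor case, since PL/SQL applies the same cursor caching optimization despite the explicit CLOSE:

```sql
-- Sketch: static SQL via an explicitly declared cursor (not part of the
-- script above). The value of x is captured at OPEN time.
declare
  x integer;
  n_dummy number;
  cursor c is
    select sum(row_num)
    from (
           select
                  row_number() over (partition by object_type order by object_name) as row_num
                , t.*
           from t
           where object_id > x
         );
begin
  for i in 1..10 loop
    if i = 1 then
      x := 100000;
    else
      x := 1;
    end if;
    open c;
    fetch c into n_dummy;
    close c;
  end loop;
end;
/
```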

And this is the result that I get from 11.2.0.2 (11.1.0.7 showed the same):

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> drop table t;

Table dropped.

SQL> purge table t;

Table purged.

SQL> create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);

Table created.

SQL> create index i on t(object_id);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

SQL>
SQL> set echo on
SQL>
SQL> select count(*), min(object_id)
2 , max(object_id) from t;

COUNT(*) MIN(OBJECT_ID) MAX(OBJECT_ID)
---------- -------------- --------------
1457480 2 75819

SQL>
SQL> set echo on linesize 200 heading on tab off serveroutput off
SQL>
SQL> set timing on
SQL>
SQL> -- This is a workaround that disables also the cursor optimizations in PL/SQL
SQL> -- In particular the PL/SQL hold cursor open optimization
SQL> -- Therefore the cursor gets reparsed and ACS kicks in
SQL> -- alter session set session_cached_cursors = 0;
SQL> --
SQL> -- Does this make any difference? No
SQL> -- alter session set plsql_optimize_level = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> variable sql_id varchar2(255)
SQL>
SQL> -- Static SQL
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.63
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 select
12 sum(row_num)
13 into
14 n_dummy
15 from
16 (
17 select
18 row_number() over (partition by object_type order by object_name) as row_num
19 , t.*
20 from
21 t
22 where
23 object_id > x
24 );
25 if i = 1 then
26 select
27 prev_sql_id
28 into
29 :sql_id
30 from
31 v$session
32 where
33 sid = sys_context('userenv', 'sid');
34 end if;
35 end loop;
36 end;
37 /

PL/SQL procedure successfully completed.

Elapsed: 00:02:50.68
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
g89tq3q521rbg 0 10 1 13146414 Y N

Elapsed: 00:00:00.17
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID g89tq3q521rbg, child number 0
-------------------------------------
SELECT SUM(ROW_NUM) FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY
OBJECT_TYPE ORDER BY OBJECT_NAME) AS ROW_NUM , T.* FROM T WHERE
OBJECT_ID > :B1 )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :B1 (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:B1)

29 rows selected.

Elapsed: 00:00:01.86
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with binds
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.09
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.84
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 execute immediate '
12 select
13 sum(row_num)
14 from
15 (
16 select
17 row_number() over (partition by object_type order by object_name) as row_num
18 , t.*
19 from
20 t
21 where
22 object_id > :x
23 )' into n_dummy using x;
24 if i = 1 then
25 select
26 prev_sql_id
27 into
28 :sql_id
29 from
30 v$session
31 where
32 sid = sys_context('userenv', 'sid');
33 end if;
34 end loop;
35 end;
36 /

PL/SQL procedure successfully completed.

Elapsed: 00:03:17.26
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
3snga4x0swy99 0 10 1 13146434 Y N

Elapsed: 00:00:00.14
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID 3snga4x0swy99, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

31 rows selected.

Elapsed: 00:00:01.35
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with literals
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.05
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.66
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 execute immediate '
12 select
13 sum(row_num)
14 from
15 (
16 select
17 row_number() over (partition by object_type order by object_name) as row_num
18 , t.*
19 from
20 t
21 where
22 object_id > ' || x || '
23 )' into n_dummy;
24 if i = 1 then
25 select
26 prev_sql_id
27 into
28 :sql_id
29 from
30 v$session
31 where
32 sid = sys_context('userenv', 'sid');
33 end if;
34 end loop;
35 end;
36 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:52.29
SQL>
SQL> select
2 b.sql_id
3 , b.child_number
4 , b.executions
5 , b.parse_calls
6 , b.buffer_gets
7 , b.is_bind_sensitive
8 , b.is_bind_aware
9 from
10 v$sql a, v$sql b
11 where
12 a.force_matching_signature = b.force_matching_signature
13 and a.sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
2tgumwrfx40bu 0 1 1 53 N N
03zrk5tr7zsn4 0 9 1 193851 N N

Elapsed: 00:00:00.10
SQL>
SQL> set pagesize 0
SQL>
SQL> select
2 c.*
3 from
4 v$sql a
5 , v$sql b
6 , table(dbms_xplan.display_cursor(b.sql_id)) c
7 where
8 a.force_matching_signature = b.force_matching_signature
9 and a.sql_id = :sql_id;
SQL_ID 2tgumwrfx40bu, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > 100000 )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
| 2 | VIEW | | 13 | 195 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">100000)

SQL_ID 03zrk5tr7zsn4, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > 1 )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 15 | | | |
| 2 | VIEW | | 1457K| 20M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">1)

51 rows selected.

Elapsed: 00:00:01.24
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with binds + OPEN
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.04
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.66
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 c sys_refcursor;
5 begin
6 for i in 1..10 loop
7 if i = 1 then
8 x := 100000;
9 else
10 x := 1;
11 end if;
12 open c for '
13 select
14 sum(row_num)
15 from
16 (
17 select
18 row_number() over (partition by object_type order by object_name) as row_num
19 , t.*
20 from
21 t
22 where
23 object_id > :x
24 )' using x;
25 fetch c into n_dummy;
26 close c;
27 if i = 1 then
28 select
29 prev_sql_id
30 into
31 :sql_id
32 from
33 v$session
34 where
35 sid = sys_context('userenv', 'sid');
36 end if;
37 end loop;
38 end;
39 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:12.12
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
3snga4x0swy99 0 2 3 1460762 Y N
3snga4x0swy99 1 8 7 172312 Y Y

Elapsed: 00:00:00.16
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID 3snga4x0swy99, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

SQL_ID 3snga4x0swy99, child number 1
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 1457K| 18M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:X)

61 rows selected.

Elapsed: 00:00:01.36
SQL>
SQL> set pagesize 9999
SQL>

So this script shows some interesting results:

1. It can be seen that ACS does not seem to work with several of the variations used: both the static SQL and the dynamic SQL using EXECUTE IMMEDIATE and binds do not create a second child cursor and re-use the same execution plan for all ten executions. Although the cursor has been marked as BIND_SENSITIVE, it has not been marked as BIND_AWARE.

Both execution variants also share the fact that there seems to be a PL/SQL optimization in place that keeps the cursors open - there is only a single PARSE call recorded in V$SQL for this SQL statement, although we loop through the same code ten times and therefore should in principle see as many parse calls as execution calls.

You would even see the same optimization in place when explicit cursors were used with explicit OPEN / FETCH / CLOSE calls: despite the explicit CLOSE you would still see only a single PARSE call for the statement - PL/SQL holds the cursor open for maximum performance and scalability.

2. Another interesting side note is that the new "Cardinality Feedback" feature introduced with Oracle 11.2 also does not seem to work in these examples, but that is definitely something for another post.

"Cardinality Feedback" allows Oracle to compare the cardinality estimates of the cost-based optimizer with the actual cardinalities at execution time and perform "instant tuning" by automatically adding OPT_ESTIMATE and other hints to the statement to correct the cardinality estimates and re-optimize the "tuned" statement. Again, this information only resides in the Shared Pool and will potentially be aged out just like the ACS information.
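Whether Cardinality Feedback actually kicked in for a re-optimized child cursor can be checked in the Note section of the DBMS_XPLAN output - a quick sketch (the substitution variable is a placeholder for the SQL_ID in question):

```sql
-- When Cardinality Feedback was used for a child cursor, the plan output
-- ends with a Note section saying "cardinality feedback used for this statement"
select * from table(dbms_xplan.display_cursor('&sql_id', null, 'ALLSTATS LAST +NOTE'));
```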

3. The clear winner in terms of both elapsed time and logical I/O performed is the third variant, which creates two different cursors using literals. For both cursors the optimal execution plan was found and nothing had to go "wrong" first in order to be fixed afterwards, therefore the total elapsed time and number of buffer gets are significantly lower than in the other cases. This already gives us a hint regarding the question "How can we help the database in such cases?"

Note also that although the number of buffer gets is significantly lower compared to the other cases, the elapsed time is not that much different, which is another example that buffer gets are not necessarily a good indicator of the overall work performed. In this particular case a lot of time is spent on writing to and reading from the temporary tablespace for the WINDOW SORT operation, and since this operation has to do more or less the same work in all cases where the whole table needs to be read - in whatever efficient or inefficient way - the overall runtime does not differ as dramatically as the number of buffer gets. You can see this additional work in this example by checking the DISK_READS and DIRECT_WRITES columns of V$SQL, and of course by checking the ASH / extended SQL trace profile.

4. The fourth variant, using dynamic SQL with OPEN / FETCH / CLOSE, obviously does not benefit from the PL/SQL optimization: V$SQL clearly shows that there have been as many PARSE calls as EXECUTE calls. Interestingly, in this case ACS worked and created a second child cursor. This already gives us a hint at what seems to prevent ACS from working.

The shown behaviour of ACS not working is officially documented in My Oracle Support, however not in the Knowledge Base but only as bug number 8357294: ADAPTIVE CURSOR SHARING DOESN'T WORK FOR SQL FROM PL/SQL. It is interesting to note that the very detailed bug description does not mention a fix release or patch, but does mention that setting "session_cached_cursors" to 0 allows ACS to work again.

And indeed, when running the same script again with "session_cached_cursors" set to 0, I get this result:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> drop table t;

Table dropped.

SQL> purge table t;

Table purged.

SQL> create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);

Table created.

SQL> create index i on t(object_id);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

SQL>
SQL> set echo on
SQL>
SQL> select count(*), min(object_id)
2 , max(object_id) from t;

COUNT(*) MIN(OBJECT_ID) MAX(OBJECT_ID)
---------- -------------- --------------
1457480 2 75821

SQL>
SQL> set echo on linesize 200 heading on tab off serveroutput off
SQL>
SQL> set timing on
SQL>
SQL> -- This is a workaround that disables also the cursor optimizations in PL/SQL
SQL> -- In particular the PL/SQL hold cursor open optimization
SQL> -- Therefore the cursor gets reparsed and ACS kicks in
SQL> alter session set session_cached_cursors = 0;

Session altered.

Elapsed: 00:00:00.00
SQL> --
SQL> -- Does this make any difference? No
SQL> -- alter session set plsql_optimize_level = 0;
SQL>
SQL> variable sql_id varchar2(255)
SQL>
SQL> -- Static SQL
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.05
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.71
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 select
12 sum(row_num)
13 into
14 n_dummy
15 from
16 (
17 select
18 row_number() over (partition by object_type order by object_name) as row_num
19 , t.*
20 from
21 t
22 where
23 object_id > x
24 );
25 if i = 1 then
26 select
27 prev_sql_id
28 into
29 :sql_id
30 from
31 v$session
32 where
33 sid = sys_context('userenv', 'sid');
34 end if;
35 end loop;
36 end;
37 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:00.82
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
g89tq3q521rbg 0 2 3 1460742 Y N
g89tq3q521rbg 1 8 7 172312 Y Y

Elapsed: 00:00:00.16
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID g89tq3q521rbg, child number 0
-------------------------------------
SELECT SUM(ROW_NUM) FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY
OBJECT_TYPE ORDER BY OBJECT_NAME) AS ROW_NUM , T.* FROM T WHERE
OBJECT_ID > :B1 )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :B1 (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:B1)

SQL_ID g89tq3q521rbg, child number 1
-------------------------------------
SELECT SUM(ROW_NUM) FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY
OBJECT_TYPE ORDER BY OBJECT_NAME) AS ROW_NUM , T.* FROM T WHERE
OBJECT_ID > :B1 )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 1457K| 18M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :B1 (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:B1)

57 rows selected.

Elapsed: 00:00:01.50
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with binds
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.07
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.70
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 execute immediate '
12 select
13 sum(row_num)
14 from
15 (
16 select
17 row_number() over (partition by object_type order by object_name) as row_num
18 , t.*
19 from
20 t
21 where
22 object_id > :x
23 )' into n_dummy using x;
24 if i = 1 then
25 select
26 prev_sql_id
27 into
28 :sql_id
29 from
30 v$session
31 where
32 sid = sys_context('userenv', 'sid');
33 end if;
34 end loop;
35 end;
36 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:12.41
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
3snga4x0swy99 0 2 3 1460762 Y N
3snga4x0swy99 1 8 7 172312 Y Y

Elapsed: 00:00:00.13
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID 3snga4x0swy99, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

SQL_ID 3snga4x0swy99, child number 1
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 1457K| 18M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:X)

61 rows selected.

Elapsed: 00:00:01.10
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with literals
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.69
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 begin
5 for i in 1..10 loop
6 if i = 1 then
7 x := 100000;
8 else
9 x := 1;
10 end if;
11 execute immediate '
12 select
13 sum(row_num)
14 from
15 (
16 select
17 row_number() over (partition by object_type order by object_name) as row_num
18 , t.*
19 from
20 t
21 where
22 object_id > ' || x || '
23 )' into n_dummy;
24 if i = 1 then
25 select
26 prev_sql_id
27 into
28 :sql_id
29 from
30 v$session
31 where
32 sid = sys_context('userenv', 'sid');
33 end if;
34 end loop;
35 end;
36 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:55.97
SQL>
SQL> select
2 b.sql_id
3 , b.child_number
4 , b.executions
5 , b.parse_calls
6 , b.buffer_gets
7 , b.is_bind_sensitive
8 , b.is_bind_aware
9 from
10 v$sql a, v$sql b
11 where
12 a.force_matching_signature = b.force_matching_signature
13 and a.sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
2tgumwrfx40bu 0 1 1 53 N N
03zrk5tr7zsn4 0 9 9 193851 N N

Elapsed: 00:00:00.11
SQL>
SQL> set pagesize 0
SQL>
SQL> select
2 c.*
3 from
4 v$sql a
5 , v$sql b
6 , table(dbms_xplan.display_cursor(b.sql_id)) c
7 where
8 a.force_matching_signature = b.force_matching_signature
9 and a.sql_id = :sql_id;
SQL_ID 2tgumwrfx40bu, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > 100000 )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
| 2 | VIEW | | 13 | 195 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">100000)

SQL_ID 03zrk5tr7zsn4, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > 1 )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 15 | | | |
| 2 | VIEW | | 1457K| 20M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">1)

51 rows selected.

Elapsed: 00:00:01.17
SQL>
SQL> set pagesize 9999
SQL>
SQL> -- dynamic SQL with binds + OPEN
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.04
SQL>
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.73
SQL>
SQL> declare
2 x integer;
3 n_dummy number;
4 c sys_refcursor;
5 begin
6 for i in 1..10 loop
7 if i = 1 then
8 x := 100000;
9 else
10 x := 1;
11 end if;
12 open c for '
13 select
14 sum(row_num)
15 from
16 (
17 select
18 row_number() over (partition by object_type order by object_name) as row_num
19 , t.*
20 from
21 t
22 where
23 object_id > :x
24 )' using x;
25 fetch c into n_dummy;
26 close c;
27 if i = 1 then
28 select
29 prev_sql_id
30 into
31 :sql_id
32 from
33 v$session
34 where
35 sid = sys_context('userenv', 'sid');
36 end if;
37 end loop;
38 end;
39 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:05.62
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = :sql_id;

SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
3snga4x0swy99 0 2 3 1460762 Y N
3snga4x0swy99 1 8 7 172312 Y Y

Elapsed: 00:00:00.16
SQL>
SQL> set pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display_cursor(:sql_id,null,'+PEEKED_BINDS'));
SQL_ID 3snga4x0swy99, child number 0
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 1399240396

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 18 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 13 | 169 | 18 (6)| 00:00:01 |
| 3 | WINDOW SORT | | 13 | 507 | 18 (6)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 13 | 507 | 17 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I | 13 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

SQL_ID 3snga4x0swy99, child number 1
-------------------------------------
select sum(row_num) from (
select row_number() over (partition by object_type
order by object_name) as row_num , t.*
from t where
object_id > :x )

Plan hash value: 2719131525

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 20588 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 1457K| 18M| | 20588 (1)| 00:04:08 |
| 3 | WINDOW SORT | | 1457K| 54M| 72M| 20588 (1)| 00:04:08 |
|* 4 | TABLE ACCESS FULL| T | 1457K| 54M| | 5894 (2)| 00:01:11 |
-------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :X (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:X)

61 rows selected.

Elapsed: 00:00:01.57
SQL>
SQL> set pagesize 9999
SQL>

Now in all cases where the cursor is potentially shared we can see that ACS eventually kicks in and creates a second child cursor. Notice, however, that this comes at a price: all cases now show as many PARSE calls as executions.

So this seems to mean that disabling the session cursor cache also disabled the PL/SQL optimization that prevents repeated PARSE calls by keeping the cursors open.
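As a minimal sketch of the PL/SQL optimization in question (assuming the table T from the test case exists and SESSION_CACHED_CURSORS is at its default): static SQL inside PL/SQL is held open by the PL/SQL cursor cache, so V$SQL typically reports far fewer PARSE calls than executions. The exact counts are version-dependent, so treat the expectation as an illustration, not a guarantee.

```sql
-- Illustration only: static SQL in PL/SQL is held open by the PL/SQL
-- cursor cache (sized by SESSION_CACHED_CURSORS from 10g on), so
-- repeated executions should not add further PARSE calls.
declare
  n_dummy number;
begin
  for i in 1..10 loop
    -- static SQL: PL/SQL keeps this cursor open across iterations
    select count(*) into n_dummy from t where object_id > 0;
  end loop;
end;
/

-- Expectation (version-dependent): PARSE_CALLS stays low (typically 1)
-- while EXECUTIONS shows 10 for the statement above. Note that PL/SQL
-- normalizes static SQL to uppercase in V$SQL.
select executions, parse_calls
from   v$sql
where  sql_text like 'SELECT COUNT(*) FROM T WHERE OBJECT_ID > 0%';
```

With SESSION_CACHED_CURSORS set to 0 the same block should instead show PARSE_CALLS close to EXECUTIONS, which is exactly the pattern observed above.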

Remember that the session cursor cache is merely a means to make a soft PARSE call even _softer_: it avoids the otherwise required access to the Shared Pool library cache resources because the cursor information has been cached in a private memory area of the process. It does not, however, avoid the PARSE call itself - that can only be done on the application resp. client side, not on the database side.
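A quick way to see this in a session is to compare the "parse count (total)" statistic with "session cursor cache hits" before and after re-running a cached statement - a cache hit still increments the parse count, i.e. the PARSE call still happens, it is just cheaper. This is a sketch of the check, not part of the test case itself:

```sql
-- Snapshot both statistics for the current session; re-run a statement
-- that sits in the session cursor cache and query again: both
-- "parse count (total)" and "session cursor cache hits" should increase.
select sn.name
     , st.value
from   v$statname sn
     , v$sesstat  st
where  sn.statistic# = st.statistic#
and    sn.name in ('parse count (total)', 'session cursor cache hits')
and    st.sid = sys_context('userenv', 'sid');
```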

In order to understand whether the session cursor cache or the "keep cursors open" optimization affects ACS, we can run the following simple script:

--------------------------------------------------------------------------------
--
-- File name: adaptive_cursor_sharing_3.sql
--
-- Purpose: Check if the session cursor cache affects adaptive cursor sharing
--
-- Author: Randolf Geist
--
-- Last tested: December 2010
--
-- Versions: 11.1.0.7
-- 11.2.0.2
--------------------------------------------------------------------------------

set echo on define on tab off

drop table t;
purge table t;
create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);
create index i on t(object_id);

exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

set echo on

select count(*), min(object_id)
, max(object_id) from t;

set echo on linesize 200 heading off newpage none tab off serveroutput off

alter session set statistics_level = all;

-- alter session set session_cached_cursors = 0;

alter system flush buffer_cache;

alter system flush shared_pool;

var x number

exec :x := 100000

-- pause

set timing on

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

column prev_sql_id new_value sql_id noprint
column sid new_value session_id noprint

select
prev_sql_id
, sid
from
v$session
where
sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 100000

column value new_value value_1

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 100000

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 100000

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 100000

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 1

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 1

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

exec :x := 1

select
name
, value
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select
sum(row_num)
from
(
select
row_number() over (partition by object_type order by object_name) as row_num
, t.*
from
t
where
object_id > :x
);

select
name
, value - &value_1 as diff
from
v$statname a
, v$sesstat b
where
a.statistic# = b.statistic#
and a.name = 'session cursor cache hits'
and b.sid = sys_context('userenv', 'sid');

select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));

set heading on

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = '&sql_id';

This script simply runs the same SQL that we already know often enough - at least three times - to ensure that it gets cached in the session cursor cache. Then in one of the subsequent executions the "bad" bind value is used to see if the caching in the session cursor cache somehow prevents ACS from kicking in. Here is the result, this time from 11.1.0.7 (but again I got the same from 11.2.0.2):

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> drop table t;

Table dropped.

SQL> purge table t;

Table purged.

SQL> create table t as select * from dba_objects, (select rownum from dual connect by level <= 20);

Table created.

SQL> create index i on t(object_id);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

SQL>
SQL> set echo on
SQL>
SQL> select count(*), min(object_id)
2 , max(object_id) from t;

COUNT(*) MIN(OBJECT_ID) MAX(OBJECT_ID)
---------- -------------- --------------
1385560 2 71521

SQL>
SQL> set echo on linesize 200 heading off newpage none tab off serveroutput off
SQL>
SQL> alter session set statistics_level = all;

Session altered.

SQL>
SQL> -- alter session set session_cached_cursors = 0;
SQL>
SQL> alter system flush buffer_cache;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> var x number
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

SQL>
SQL> -- pause
SQL>
SQL> set timing on
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.01
SQL>
SQL> column prev_sql_id new_value sql_id noprint
SQL> column sid new_value session_id noprint
SQL>
SQL> select
2 prev_sql_id
3 , sid
4 from
5 v$session
6 where
7 sid = sys_context('userenv', 'sid');

Elapsed: 00:00:00.06
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 3 | | | |
| 2 | VIEW | | 1 | 12 | 0 |00:00:00.01 | 3 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 12 | 0 |00:00:00.01 | 3 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 0 |00:00:00.01 | 3 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 0 |00:00:00.01 | 3 | 3 | | | |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.46
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> column value new_value value_1
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5058

Elapsed: 00:00:00.01
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.01
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5058 as diff
session cursor cache hits 2

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 2 | VIEW | | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 12 | 0 |00:00:00.01 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5063

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.01
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5063 as diff
session cursor cache hits 2

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 2 | VIEW | | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 12 | 0 |00:00:00.01 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5069

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.01
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5069 as diff
session cursor cache hits 3

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 2 | VIEW | | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 12 | 0 |00:00:00.01 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> exec :x := 100000

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5080

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );

Elapsed: 00:00:00.01
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5080 as diff
session cursor cache hits 3

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 2 | VIEW | | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
| 3 | WINDOW SORT | | 1 | 12 | 0 |00:00:00.01 | 3 | 1024 | 1024 | |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 0 |00:00:00.01 | 3 | | | |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> exec :x := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5090

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.5311E+11

Elapsed: 00:00:17.12
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5090 as diff
session cursor cache hits 3

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:17.10 | 1388K| 44110 | 10005 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.10 | 1388K| 44110 | 10005 | | | | |
| 2 | VIEW | | 1 | 12 | 1385K|00:00:16.03 | 1388K| 44110 | 10005 | | | | |
| 3 | WINDOW SORT | | 1 | 12 | 1385K|00:00:16.03 | 1388K| 44110 | 10005 | 88M| 3227K| 55M (1)| 80896 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 1385K|00:00:09.70 | 1388K| 24154 | 0 | | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 1385K|00:00:01.39 | 3060 | 3059 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> exec :x := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5100

Elapsed: 00:00:00.00
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.5311E+11

Elapsed: 00:00:12.94
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5100 as diff
session cursor cache hits 3

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:12.95 | 1388K| 44001 | 10005 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:12.95 | 1388K| 44001 | 10005 | | | | |
| 2 | VIEW | | 1 | 12 | 1385K|00:00:12.29 | 1388K| 44001 | 10005 | | | | |
| 3 | WINDOW SORT | | 1 | 12 | 1385K|00:00:12.29 | 1388K| 44001 | 10005 | 88M| 3232K| 55M (1)| 80896 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 1385K|00:00:06.93 | 1388K| 24059 | 0 | | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 1385K|00:00:00.01 | 3060 | 3059 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

25 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> exec :x := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> select
2 name
3 , value
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
session cursor cache hits 5110

Elapsed: 00:00:00.01
SQL>
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.5311E+11

Elapsed: 00:00:06.02
SQL>
SQL> select
2 name
3 , value - &value_1 as diff
4 from
5 v$statname a
6 , v$sesstat b
7 where
8 a.statistic# = b.statistic#
9 and a.name = 'session cursor cache hits'
10 and b.sid = sys_context('userenv', 'sid');
old 3: , value - &value_1 as diff
new 3: , value - 5110 as diff
session cursor cache hits 3

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'));
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'ALLSTATS LAST'))
new 1: select * from table(dbms_xplan.display_cursor('fmbq5ytmh0hng',null,'ALLSTATS LAST'))
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:12.95 | 1388K| 44001 | 10005 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:12.95 | 1388K| 44001 | 10005 | | | | |
| 2 | VIEW | | 1 | 12 | 1385K|00:00:12.29 | 1388K| 44001 | 10005 | | | | |
| 3 | WINDOW SORT | | 1 | 12 | 1385K|00:00:12.29 | 1388K| 44001 | 10005 | 88M| 3232K| 55M (1)| 80896 |
| 4 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 1385K|00:00:06.93 | 1388K| 24059 | 0 | | | | |
|* 5 | INDEX RANGE SCAN | I | 1 | 12 | 1385K|00:00:00.01 | 3060 | 3059 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)

SQL_ID fmbq5ytmh0hng, child number 1
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 2719131525

----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:05.99 | 21058 | 34648 | 7955 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:05.99 | 21058 | 34648 | 7955 | | | | |
| 2 | VIEW | | 1 | 1385K| 1385K|00:00:05.74 | 21058 | 34648 | 7955 | | | | |
| 3 | WINDOW SORT | | 1 | 1385K| 1385K|00:00:04.35 | 21058 | 34648 | 7955 | 69M| 2893K| 55M (1)| 64512 |
|* 4 | TABLE ACCESS FULL| T | 1 | 1385K| 1385K|00:00:00.01 | 21054 | 21049 | 0 | | | | |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("OBJECT_ID">:X)

49 rows selected.

Elapsed: 00:00:00.21
SQL>
SQL> set heading on
SQL>
SQL> select
2 sql_id
3 , child_number
4 , executions
5 , parse_calls
6 , buffer_gets
7 , is_bind_sensitive
8 , is_bind_aware
9 from
10 v$sql
11 where
12 sql_id = '&sql_id';
old 12: sql_id = '&sql_id'
new 12: sql_id = 'fmbq5ytmh0hng'
SQL_ID CHILD_NUMBER EXECUTIONS PARSE_CALLS BUFFER_GETS I I
------------- ------------ ---------- ----------- ----------- - -
fmbq5ytmh0hng 0 7 8 2777315 Y N
fmbq5ytmh0hng 1 1 0 21058 Y Y

Elapsed: 00:00:00.03
SQL>

So we can see that from the third execution on we seem to get a hit in the session cursor cache (increase from 2 to 3), but still ACS seems to work properly, although this time it had to go wrong twice before ACS created the second child cursor. We can also see that we still get as many PARSE calls as executions, so the session cursor cache does not avoid the parsing as outlined above.

All this seems to suggest that the ACS (and potentially the "Cardinality Feedback", too) code is only triggered by a PARSE call and not evaluated as part of an EXECUTE call.

But if this were the case, then the problem would by no means be limited to PL/SQL and its clever optimizations that try to hold cursors open to avoid repeated PARSE calls.

Actually one of the core principles we are told to follow to make an application as scalable as possible is to parse only once but execute many times. So let's try to simulate such a well behaving application with this simple Java code snippet that accesses the database via JDBC:

import java.sql.* ;

class JDBCQuery
{
    public static void main( String args[] )
    {
        try
        {
            // Load the database driver
            Class.forName( "oracle.jdbc.driver.OracleDriver" ) ;

            // Get a connection to the database
            Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@host:port:sid",
                "username", "pwd");

            // Print all warnings
            for( SQLWarning warn = conn.getWarnings(); warn != null; warn = warn.getNextWarning() )
            {
                System.out.println( "SQL Warning:" ) ;
                System.out.println( "State : " + warn.getSQLState() ) ;
                System.out.println( "Message: " + warn.getMessage() ) ;
                System.out.println( "Error : " + warn.getErrorCode() ) ;
            }

            // Reset the buffer cache and shared pool
            Statement ddl = conn.createStatement();
            ddl.executeUpdate("alter system flush shared_pool");
            ddl.executeUpdate("alter system flush buffer_cache");
            ddl.close();

            // Prepare the statement
            String query =
                "select\n" +
                " sum(row_num)\n" +
                "from\n" +
                " (\n" +
                " select\n" +
                " row_number() over (partition by object_type order by object_name) as row_num\n" +
                " , t.*\n" +
                " from\n" +
                " t\n" +
                " where\n" +
                " object_id > ?\n" +
                " )";

            PreparedStatement stmt = conn.prepareStatement(query);

            System.out.println(query);

            Integer i;
            Integer id;

            for (i = 1; i <= 10; i++)
            {
                if (i == 1)
                {
                    id = 100000;
                }
                else
                {
                    id = 1;
                }

                System.out.println( "Iteration: " + i.toString() ) ;

                stmt.setInt(1, id);

                long startTime = System.currentTimeMillis();

                // Execute the query
                ResultSet rs = stmt.executeQuery();

                // Loop through the result set
                while( rs.next() )
                    System.out.println( rs.getLong(1)) ;

                // Close the result set
                rs.close() ;

                long endTime = System.currentTimeMillis();

                float seconds = (endTime - startTime) / 1000F;

                // Elapsed time
                System.out.println(Float.toString(seconds) + " seconds.");
            }

            stmt.close() ;
            conn.close() ;
        }
        catch( SQLException se )
        {
            System.out.println( "SQL Exception:" ) ;

            // Loop through the SQL Exceptions
            while( se != null )
            {
                System.out.println( "State : " + se.getSQLState() ) ;
                System.out.println( "Message: " + se.getMessage() ) ;
                System.out.println( "Error : " + se.getErrorCode() ) ;

                se = se.getNextException() ;
            }
        }
        catch( Exception e )
        {
            System.out.println( e ) ;
        }
    }
}

/*
Check the Shared Pool by this query:

select
sql_id
, child_number
, executions
, parse_calls
, buffer_gets
, disk_reads
, direct_writes
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_id = 'cq3a8ukkws0su';

*/

This (by no means elegant or clever) piece of Java code again tries to do the same as our previous examples: It executes the known query ten times but behaves well since it parses only once and executes ten times using the two different bind values as already done before.

I only tested this with the JDBC thin driver that comes with Oracle 11.2.0.2 but not with the thick driver. However, since I assume that 95%+ of all Java applications out there actually use the thin driver this probably covers the most relevant case.

The results correspond to those of the PL/SQL examples with the "keep cursors open" optimization enabled: Neither ACS nor "Cardinality Feedback" kicks in, and the SQL is executed every time using the same execution plan. The SQL is only parsed once, so the problem is not really limited to PL/SQL but applies to every application that tries to avoid parse calls.

Summary

ACS is a great feature, but you need to be aware of its limitations, the most important being:

- Things have to go wrong first before ACS will be able to provide a fix. Depending on how "wrong" things can go, this might mean that ACS is actually not able to provide an efficient fix for certain cases
- The ACS information can be aged out of the Shared Pool which might lead to repeated "things have to go wrong" cases
- ACS is only triggered by explicit PARSE calls; it is not activated if the application parses once and executes many times

If you have a case where you need to help (or "co-operate" as Jonathan Lewis calls it in his brilliant presentation of the same name) your database, the simplest strategy is to use literals instead of bind variables. If you can't do this due to the issues introduced by this strategy (excessive hard parses, increased CPU and Shared Pool usage, library cache contention etc.) a more complex strategy - if applicable - is to "know your data" and generate a minimum amount of cursors that try to cover the cases where a different execution plan is desired.

This means something like this in pseudo code:

- If "known value A that requires a special treatment" then
-- execute SQL with dummy hint "A" or additional non-modifying predicate "A" = "A" to make the SQL unique
- If "known value B that requires a special treatment" then
-- execute SQL with dummy hint "B" or additional non-modifying predicate "B" = "B" to make the SQL unique
- Else "in all other cases"
-- execute SQL as is

Notice that using comments to make the SQL unique might not lead to the expected result since there are cases/environments where comments are stripped prior to execution.

Update: From a comment by Sokrates below that for some reason didn't make it to the comments section but also doesn't show up in the "Spam" comments of the blog control panel I see that the above description "dummy hint" might need a bit more explanation.

By "dummy hint" I mean that you should use hint syntax rather than normal comment syntax to make the SQL unique, because a normal comment might get filtered out whereas a comment using hint syntax is supposed to be kept. The hint itself should be an invalid hint in order to prevent any influence on the optimizer, for example /*+ CASE_A */ rather than /* CASE_A */.
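To illustrate the pseudo code in the spirit of the earlier JDBC example, here is a minimal Java sketch. Everything in it is made up for illustration: the cut-off value 100000 (an assumed "known selective value"), the hint names CASE_SELECTIVE and CASE_FULL_SCAN (deliberately invalid hints whose only job is to make each SQL text unique), and the class name.

```java
// Hypothetical sketch of the "know your data" strategy: pick a statement
// variant per known case, made unique via an invalid "dummy" hint in hint
// syntax (so it survives comment stripping) - one cursor per desired plan.
class CursorSplitter
{
    // Shared statement body from the test case above
    static final String BODY =
        "sum(row_num) from (select row_number() over" +
        " (partition by object_type order by object_name) as row_num, t.*" +
        " from t where object_id > ?)";

    // The cut-off 100000 is an assumption made for this illustration
    static String sqlFor(int objectId)
    {
        if (objectId >= 100000)
        {
            // known selective value: index access desired
            return "select /*+ CASE_SELECTIVE */ " + BODY;
        }
        // all other cases: full scan desired
        return "select /*+ CASE_FULL_SCAN */ " + BODY;
    }

    public static void main(String[] args)
    {
        System.out.println(sqlFor(100000));
        System.out.println(sqlFor(1));
    }
}
```

Each variant would then be prepared once and executed many times, so a well-behaving "parse once, execute many" application still ends up with one stable plan per known case.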

Adding user equivalence for RAC the easy way

This is the first time I am setting up a new 11.2.0.2 cluster with the automatic SSH setup. Until now, I ensured user equivalence by copying the SSH RSA and DSA keys manually to all cluster nodes. For two nodes that's not too bad, but recently someone asked a question about a 28 (!) node cluster on a mailing list I subscribe to. At that scale the whole process gets a bit too labour intensive.

So setting up user equivalence using a script may be the solution. You can also use OUI to do the same, but I like to run “cluvfy stage -post hwos” to check everything is ok before even thinking about executing ./runInstaller.

Here's the output of a session; my two cluster nodes are acfsprodnode1 and acfsprodnode2 (yes, they are for 11.2 ACFS replication and encryption testing). I am using the grid user as the owner of Grid Infrastructure, and oracle to own the RDBMS binaries. Start by navigating to the location where you unzipped the Grid Infrastructure patch file. Then change into the directory "sshsetup" and run the command:


[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh
Please specify a valid and existing cluster configuration file.
Either user name or host information is missing
Usage ./sshUserSetup.sh -user  [ -hosts "" | -hostfile  ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile  ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]

Next execute the command; I opted for the -noPromptPassphrase option, as I don't use a passphrase for the key.

[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts "acfsprdnode1 acfsprdnode2" -noPromptPassphrase
The output of this script is also logged into /tmp/sshUserSetup_2010-12-22-15-39-18.log
Hosts are acfsprdnode1 acfsprdnode2
user is grid
Platform:- Linux
Checking if the remote hosts are reachable
PING acfsprdnode1.localdomain (192.168.99.100) 56(84) bytes of data.
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=3 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=5 ttl=64 time=0.018 ms

--- acfsprdnode1.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.017/0.017/0.019/0.004 ms
PING acfsprdnode2.localdomain (192.168.99.101) 56(84) bytes of data.
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=1 ttl=64 time=0.331 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=3 ttl=64 time=0.324 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=4 ttl=64 time=0.256 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=5 ttl=64 time=0.257 ms

--- acfsprdnode2.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.109/0.255/0.331/0.081 ms
Remote host reachability check succeeded.
The following hosts are reachable: acfsprdnode1 acfsprdnode2.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost acfsprdnode1
numhosts 2
The script will setup SSH connectivity from the host acfsprdnode1 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host acfsprdnode1
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
yes

The user chose yes
User chose to skip passphrase related questions.
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host with empty passphrase
Generating public/private rsa key pair.
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
de:e3:66:fa:16:e8:6e:36:fd:c5:e3:77:75:07:9a:b0 grid@acfsprdnode1
Creating .ssh directory and setting permissions on remote host acfsprdnode1
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode1. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode1.
Warning: Permanently added 'acfsprdnode1,192.168.99.100' (RSA) to the list of known hosts.
grid@acfsprdnode1's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode1.
Creating .ssh directory and setting permissions on remote host acfsprdnode2
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode2. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode2.
Warning: Permanently added 'acfsprdnode2,192.168.99.101' (RSA) to the list of known hosts.
grid@acfsprdnode2's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode2.
Copying local host public key to the remote host acfsprdnode1
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode1.
grid@acfsprdnode1's password:
Done copying local host public key to the remote host acfsprdnode1
Copying local host public key to the remote host acfsprdnode2
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode2.
grid@acfsprdnode2's password:
Done copying local host public key to the remote host acfsprdnode2
cat: /home/grid/.ssh/known_hosts.tmp: No such file or directory
cat: /home/grid/.ssh/authorized_keys.tmp: No such file or directory
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user grid.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~grid or ~grid/.ssh on the remote host may not be owned by grid.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the /sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--acfsprdnode1:--
Running /usr/bin/ssh -x -l grid acfsprdnode1 date to verify SSH connectivity has been setup from local host to acfsprdnode1.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
--acfsprdnode2:--
Running /usr/bin/ssh -x -l grid acfsprdnode2 date to verify SSH connectivity has been setup from local host to acfsprdnode2.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
SSH verification complete.
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode1 hostname
acfsprdnode1
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode2 hostname
acfsprodnode2
[grid@acfsprdnode1 sshsetup]$

Nice! That's a lot of work taken away from me, and I can start running cluvfy now to fix problems before OUI warns me about shortcomings on my system.

You should note that, per the above output, the script only distributes the local ssh keys to the remote hosts. In OUI's cluster node addition screen (6 of 16 in the advanced installation) you therefore still need to click the "SSH Connectivity" button and then "Setup" after providing username and password to establish cluster-wide user equivalence.
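Full user equivalence is pairwise: every node must be able to reach every node without a password. The following little Java helper is purely illustrative (it is not part of any Oracle tooling) and just enumerates the checks one would run after setup; the user and host names are the ones from this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: enumerate the pairwise SSH checks needed for full
// cluster-wide user equivalence. BatchMode=yes makes ssh fail instead of
// prompting, so a password prompt immediately shows up as an error.
class EquivalenceCheckList
{
    static List<String> verificationCommands(String user, String[] hosts)
    {
        List<String> cmds = new ArrayList<String>();
        for (String from : hosts)
        {
            for (String to : hosts)
            {
                cmds.add("on " + from + ": ssh -o BatchMode=yes "
                         + user + "@" + to + " date");
            }
        }
        return cmds;
    }

    public static void main(String[] args)
    {
        String[] nodes = { "acfsprdnode1", "acfsprdnode2" };
        for (String c : verificationCommands("grid", nodes))
            System.out.println(c);
    }
}
```

For two nodes this is only four checks, but for the 28-node cluster mentioned above it is already 784, which is exactly why a scripted setup (or OUI's "SSH Connectivity" setup) beats manual key copying.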

Oracle RAC One Node revisited – 11.2.0.2

Since we published the RAC book, Oracle has released patchset 11.2.0.2. Amongst other things, this improved the RAC One Node option, exactly the way we expected.

How it was

A quick recap on the product as it was in 11.2.0.1: RAC One Node is part of Oracle Enterprise Edition; any other software edition is explicitly not allowed. Another restriction exists for 3rd party clusterware: using one is not allowed either. RAC One Node is a hybrid between full blown RAC and an active/passive cluster. The option uses Grid Infrastructure for cluster management and storage provisioning via ASM. The RAC One instance starts its life as a RAC database, limited to only one cluster node. It only ever runs on one node, but that node can change. It is strongly recommended to create a service for that RAC database. Utilities such as raconeinit provide a text-based command line interface to transform that database to a "RAC One Node" instance. In the process, the administrator can elect which nodes should be allowed to run the instance. The "omotion" utility allowed the DBA to move the RAC One Node instance from the current node to another one. Optionally a time threshold could be set after which all ongoing transactions were to move to the new node. This feature required TAF or FAN to be set up correctly. The raconestatus utility allowed you to view the status of your RAC One Node instances. Conversion to full RAC was made possible by the racone2rac utility.

If you were after a Data Guard setup you’d be disappointed: that wasn’t (and AFAIK still is not) supported.

So all in all, that seemed a little premature. A patch to be downloaded and applied, no Data Guard and a new set of utilities are not really user friendly. Plus, initially this patch was available for Linux only. But at least a MOS note (which I didn't find until after having finished writing this!) exists: RAC One — Changes in 11.2.0.2 [ID 1232802.1].

Changes

Instead of having to apply patch 9004119 to your environment, RAC One Node is available "out of the box" with 11.2.0.2. Sadly, the Oracle RAC One Node manual has not been updated, and searches on Metalink reveal no new information. One interesting piece of information: the patch for RAC One Node is listed under the "undocumented Oracle Server" section.

The creation of a RAC One Node instance has been greatly simplified: dbca has added support for it, both on the command line for silent installations and in the interactive GUI. Consider these options for dbca:

$ dbca -help
dbca  [-silent | -progressOnly | -customCreate] {  }  |
 { [ [options] ] -responseFile   }
 [-continueOnNonFatalErrors ]
Please refer to the manual for details.
You can enter one of the following command:

Create a database by specifying the following parameters:
-createDatabase
 -templateName 
 [-cloneTemplate]
 -gdbName 
 [-RACOneNode
 -RACOneNodeServiceName  ]
 [-policyManaged | -adminManaged ]
 [-createServerPool ]
 [-force ]
 -serverPoolName 
 -[cardinality ]
 [-sid ]
...

With RAC One Node you will most likely end up with a policy managed database in the end; I can't see how an admin managed database would make sense.

The srvctl command line tool has been improved to deal with RAC One Node. The most important operations are add, remove, config and status. The nice thing about dbca is that it actually registers the database in the OCR. Immediately after the installation, you see this status information:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: INACTIVE

$ srvctl config database -d rontest
Database unique name: rontest
Database name:
Oracle home: /data/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/rontest/spfilerontest.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: rontest
Database instances:
Disk Groups: DATA
Mount point paths:
Services: rontestsrv
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: rontest
Candidate servers: node2,node3
Database is administrator managed

Note that the instance name, although the instance is administrator managed, changed to ${ORACLE_SID}_1. Relocating now works with the srvctl relocate database command, as in this example:

$ srvctl relocate database -d rontest -n node2

You’ll get feedback about this in the output of the “status” command:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: ACTIVE
Source instance: rontest_1 on node2
Destination instance: rontest_2 on node3

After the command completed, check the status again:

srvctl status database -d rontest
Instance rontest_2 is running on node node2
Online relocation: INACTIVE

The important difference between an admin managed database and a policy managed database is that you are responsible for undo tablespaces. If you don’t create and configure undo tablespaces, the relocate command will fail:

$ srvctl relocate database -d rontest -n node3
PRCD-1222 : Online relocation of database rontest failed but database was restored to its original state
PRCD-1129 : Failed to start instance rontest_2 for database rontest
PRCR-1064 : Failed to start resource ora.rontest.db on node node3
CRS-5017: The resource action "ora.rontest.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Process ID: 1587
Session ID: 35 Serial number: 1

CRS-2674: Start of 'ora.rontest.db' on 'node3' failed

In this case, the database runs on the same node. Check the ORACLE_SID (rontest_2 in my case) and modify the initialisation parameter.

SQL> select tablespace_name from dba_data_files where tablespace_name like '%UNDO%';

TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2

So the tablespace was there, but the initialisation parameter was wrong! Let’s correct this:

SQL> alter system set undo_tablespace='UNDOTBS1' sid='rontest_1';

System altered.

SQL> alter system set undo_tablespace='UNDOTBS2' sid='rontest_2';

System altered.

Now the relocate will succeed.

To wrap this article up, the srvctl convert database command will convert between single instance, RAC One Node and RAC databases.

Build your own 11.2.0.2 stretched RAC

Finally time for a new series! With the arrival of the new 11.2.0.2 patchset I thought it was about time to try and set up a virtual 11.2.0.2 extended distance or stretched RAC. So, it's virtual, fair enough. It doesn't allow me to test things like the impact of latency on the inter-SAN communication, but it allowed me to test the general setup. Think of this series as a guide after all the tedious work has been done, and the SANs happily talk to each other. The example requires some understanding of how XEN virtualisation works, and it's tailored to openSuSE 11.2 as the dom0 or "host". I have tried OracleVM in the past, but back then a domU (or virtual machine) could not mount an iSCSI target without a kernel panic and reboot. Clearly not what I needed at the time. OpenSuSE has another advantage: it uses a recent kernel, not the three year old 2.6.18 you find in Enterprise distributions. Also, xen is recent (openSuSE 11.3 even features xen 4.0!) and so is libvirt.

The Setup

The general idea follows the design you find in the field, but with fewer cluster nodes. I am thinking of 2 nodes for the cluster, and 2 iSCSI target providers. I wouldn't use iSCSI in the real world, but my lab isn't connected to an EVA or similar. A third site will provide quorum via an NFS-provided voting disk.

Site A will consist of filer01 for the storage part, and edcnode1 as the RAC node. Site B will consist of filer02 and edcnode2. The iSCSI targets are going to be provided by openFiler domU installations, and the cluster nodes will make use of Oracle Enterprise Linux 5 update 5. To make it more realistic, site C will consist of another openFiler instance, filer03, to provide the NFS export for the 3rd voting disk. Note that openFiler seems to support NFS v3 only at the time of this writing. All systems are 64bit.

The network connectivity will go through 3 virtual switches, all “host only” on my dom0.

  • Public network: 192.168.99/24
  • Private network: 192.168.100/24
  • Storage network: 192.168.101/24

As in the real world, the private and storage networks have to be separated to prevent iSCSI packets clashing with Cache Fusion traffic. Also, I increased the MTU of the private and storage networks to 9000 instead of the default 1500. If you would like to use jumbo frames you should check whether your switch supports them.

Grid Infrastructure will use ASM to store OCR and voting disks, and the inter-SAN replication will also be performed by ASM in normal redundancy. I am planning on using preferred mirror read and intelligent data placement to see if that makes a difference.

Known limitations

This setup has some limitations, such as the following ones:

  • You cannot test inter-site SAN connectivity problems
  • You cannot make use of udev for the ASM devices: a xen domU doesn't report anything back from /sbin/scsi_id, which makes the mapping to /dev/mapper impossible (maybe someone knows a workaround?)
  • Network interfaces are not bonded: you would certainly use bonded NICs in real life
  • No “real” fibre channel connectivity between the cluster nodes

So much for the introduction; I'll post the setup step by step. The intended series will consist of these articles:

  1. Introduction to XEN on openSuSE 11.2 and dom0 setup
  2. Introduction to openFiler and their installation as a virtual machine
  3. Setting up the cluster nodes
  4. Installing Grid Infrastructure 11.2.0.2
  5. Adding third voting disk on NFS
  6. Installing RDBMS binaries
  7. Creating a database

That’s it for today, I hope I got you interested and following the series. It’s been real fun doing it; now it’s about writing it all up.

Oracle DB 11.2.0.2 patch set is here … Again…

The incredible disappearing patch set has reappeared on MOS. Download it quick before it gets pulled again. :)

Cheers

Tim…

PS. Remember, read the patch note before you download it. You don’t need all 4.8G. Also, check out the new recommended patching instructions.