Free ASH

If you’re running a version of Oracle older than 10g (where v$active_session_history appeared), or if you’re not using Enterprise Edition, or if you just don’t want to pay for the Diagnostic Pack licence, here are some links relating to a free Java program that emulates the “Top Sessions” output of the Enterprise Manager screen:

http://timurakhmadeev.wordpress.com/2010/02/18/ash-viewer/
http://sourceforge.net/projects/ashv/

Important Note:

If you want to query v$active_session_history (or any of the AWR objects) then you need to purchase the licence for the Diagnostic Pack. If you aren’t licensed you should only use the program in emulation mode.

First_Rows

Browsing through the archive for the Oracle-L listserver a couple of days ago I came across this item dated Feb 2011 where the author was puzzled by Oracle’s choice of index for a query.

He was using 10.2.0.3, and running with the optimizer_mode set to first_rows – which you shouldn’t really be doing with that version of Oracle since Oracle Corp. told us about 10 years ago that “first_rows is available only for backwards compatibility”.

I’ve created a model of their problem to demonstrate the effect. As usual, to make it easier to get a reproducible result, I’ve used locally managed tablespaces with 1MB uniform extents, freelist management, and CPU costing disabled:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	to_char(
		to_date('01-Jan-2011','dd-mon-yyyy') +
			trunc((rownum-1)/317),
		'yyyymmdd'
	)			a,
	mod(rownum,317) + 1	b,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

alter table t1 add constraint t1_pk primary key(a,b);
create index t1_ba on t1(b,a);

The SQL creates 317 rows for each date in a list of dates, with the dates stored as eight-character strings in the form YYYYMMDD. The 317 rows for each date are numbered from 1 to 317, and the data is stored in order of date and number. I’ve created a primary key on (date, number), and I’ve also created an index on (number, date) – the PK has a very good clustering_factor and the other index has a very bad one because of the way I generated the data.
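
To see the difference in clustering_factor for yourself once the statistics have been gathered, a quick query against user_indexes is enough. This is just a minimal sketch – the exact figures you get will depend a little on block size and tablespace settings:

select
	index_name, blevel, leaf_blocks, clustering_factor
from
	user_indexes
where
	table_name = 'T1'
order by
	index_name
;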

With this data in hand, and after collecting statistics (compute, no histograms), I run the following SQL (and like the OP I am using 10.2.0.3):

alter session set optimizer_mode = first_rows;

set autotrace traceonly explain

select
	small_vc
from
	t1
where
	a = '20110401'
and	b > 10
order by
	a, b
;

select
	/*+ index(t1(a,b)) */
	small_vc
from
	t1
where
	a = '20110401'
and	b > 10
order by
	a, b
;

I’m after 307 consecutive rows of one date – and I want the data sorted by the date and number. With first_rows optimization the default plan is a little surprising. Here are two execution plans for the query – first the plan that the optimizer chose by default, the second when I hinted the SQL to use the primary key – note that neither plan shows a sort operation:

Default execution plan
---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   307 |  7368 |   617 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |   617 |
|*  2 |   INDEX SKIP SCAN           | T1_BA |   307 |       |   309 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("B">10 AND "A"='20110401')
       filter("A"='20110401' AND "B">10)

Hinted execution plan
---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   307 |  7368 |    10 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |    10 |
|*  2 |   INDEX RANGE SCAN          | T1_PK |   307 |       |     2 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"='20110401' AND "B">10)

How strange – it is clearly better to use the primary key index for this query, yet the optimizer doesn’t do it under first_rows optimisation. (It does if you use the slightly more appropriate first_rows(1) – the “new” improved option from 9i).
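
If you want to see that for yourself it’s easy to repeat the test under the newer mode – a minimal sketch (first_rows_1 is the session-parameter equivalent of the first_rows(1) option mentioned above):

alter session set optimizer_mode = first_rows_1;

select
	small_vc
from
	t1
where
	a = '20110401'
and	b > 10
order by
	a, b
;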

The first thought you might have when looking at this example is that first_rows has a heuristic (i.e. rule) that says “use an index to avoid sorting at all costs if possible (unless the hidden parameter _sort_elimination_cost_ratio is non-zero)”. But that shouldn’t apply here because both indexes would allow Oracle to avoid sorting.

And here’s an even stranger detail: notice that the “order by” clause includes column “a”, which is obviously constant because of the “where” clause. Since it’s constant, removing it won’t make any difference to the final output – but look what happens:

select
	small_vc
from
	t1
where
	a = '20110401'
and	b > 10
order by
	b
;

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   307 |  7368 |    10 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |    10 |
|*  2 |   INDEX RANGE SCAN          | T1_PK |   307 |       |     2 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"='20110401' AND "B">10)
       filter("A"='20110401' AND "B">10)

For no obvious reason the optimizer now picks the right index. What’s going on ? Unfortunately I have to say that I don’t know.

When I checked the 10053 trace file there were a few differences for the two “order by” clauses but I couldn’t see anything that gave me any reasonable ideas. The most significant difference was the choice of indexes examined when the optimizer was doing the “recost for order by” bit. When we ordered by a,b the optimizer considered only the t1_ba index (note – the final costs are slightly higher here because in this run I enabled CPU costing to see if that was having an effect, so there’s a little extra for the CPU):

  Access Path: index (skip-scan)
    SS sel: 0.0030744  ANDV (#skips): 308
    SS io: 308.00 vs. index scan io: 321.00
    Skip Scan chosen
  Access Path: index (SkipScan)
    Index: T1_BA
    resc_io: 617.00  resc_cpu: 23882848
    ix_sel: 0.0030744  ix_sel_with_filters: 0.0030744
    Cost: 618.60  Resp: 618.60  Degree: 1
  Best:: AccessPath: IndexRange  Index: T1_BA
         Cost: 618.60  Degree: 1  Resp: 618.60  Card: 307.44  Bytes: 24

when we ordered by b alone the optimizer considered only the t1_pk index:

  Access Path: index (RangeScan)
    Index: T1_PK
    resc_io: 10.00  resc_cpu: 191334
    ix_sel: 0.0030744  ix_sel_with_filters: 0.0030744
    Cost: 10.01  Resp: 10.01  Degree: 1
  Best:: AccessPath: IndexRange  Index: T1_PK
         Cost: 10.01  Degree: 1  Resp: 10.01  Card: 307.44  Bytes: 24

There really seems to be a flaw in the logic behind the choice of index – and there’s an important point to think about here: if it’s a bug it’s probably not going to be fixed. The first_rows option only exists for “backwards compatibility” and things stop being compatible if you change them.

Footnote: Because the cost of the skip scan path in the original run was 617 and the cost of the primary key range scan path was 10 I could make Oracle choose the primary key by setting the parameter _sort_elimination_cost_ratio to a value just less than 617/10 (say 60); but I mention that only as an idle curiosity. You shouldn’t be using first_rows, and if you do use it you shouldn’t be hacking with undocumented parameters to work around the problems it produces.
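
For completeness, this is the sort of hack I mean – strictly a curiosity for a test system, assuming the hidden parameter is session-modifiable in your version, and definitely not something to put anywhere near production:

alter session set "_sort_elimination_cost_ratio" = 60;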

Expert Oracle Exadata book – Alpha chapters available for purchase!

Hi,

Apress has made the draft versions of our Expert Oracle Exadata book available for purchase.

How this works is:

  1. You purchase the “alpha” version of the Expert Oracle Exadata book
  2. You get access to draft/alpha PDF versions of some chapters now!
  3. As more chapters are added and existing ones updated, you’ll receive an email and you can download those too
  4. You will get a PDF copy of the final book once it’s out!

This is an awesome deal if you can’t wait until the final launch and want to get ahead of the curve with your Exadata skills ;-)

Buy the alpha version of our Expert Oracle Exadata book from Apress here!

If you haven’t heard about this book earlier – I’m one of the three authors, writing it together with Kerry Osborne and Randy Johnson from Enkitec; our official tech reviewer is none other than THE Kevin Closson, and we are also getting some (unofficial) feedback from Oracle database junkie Arup Nanda.

So this book will absolutely rock and if you want a piece of it now, order the alpha book above!

P.S. This hopefully also explains why I’ve been so quiet with my blogging lately – can’t write a book and do many other things at the same time… (at least if you want to do it well…)

Advanced Oracle Troubleshooting v2.0 Online Deep Dives in April and May 2011

Due to a lot of interest I’m going to do another run of my Advanced Oracle Troubleshooting v2.0 Online Deep Dive seminars in April and May (initially I had planned to do it no earlier than Sep/Oct…)

Check the dates & additional info out here:

P.S. People who already attended the AOT2 seminars last year – I will schedule the follow-up Q&A sessions in mid-March!

AWR Reports

A little thought for those who use OEM to generate AWR reports – if you generate the standard HTML form and then try posting it into (say) the OTN database forum when you want some help interpreting it, it won’t look very readable (even if you do remember to put the right tags before and after to reproduce it in fixed font).

So it’s a good idea to know how to produce the text format – which is what I do most of the time (especially since I often end up with a simple telnet or putty session into a client server). Take a look at $ORACLE_HOME/rdbms/admin for all the scripts starting with “awr” – there’s quite a lot of them, and the number keeps growing. Apart from finding a script that will give you the standard AWR report in a good old-fashioned text format, you may find that newer versions of Oracle include a few useful variations on the theme.
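
If you haven’t tried it, producing the text version is just a matter of running the script from SQL*Plus and answering its prompts – a minimal sketch (the “?” is SQL*Plus shorthand for $ORACLE_HOME):

@?/rdbms/admin/awrrpt.sql

-- the script prompts for report_type (enter "text"), the number of days
-- of snapshots to list, the begin and end snapshot ids, and a name for
-- the output file.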

Here’s a list from the 11.2.0.2 home ($ORACLE_HOME/rdbms/admin) with (mostly) the one-line description from the start of file. I’ve put the most useful ones in the first list:

awrrpt.sql      -- basic AWR report
awrsqrpt.sql    -- Standard SQL statement Report

awrddrpt.sql    -- Period diff on current instance

awrrpti.sql     -- Workload Repository Report Instance (RAC)
awrgrpt.sql     -- AWR Global Report (RAC)
awrgdrpt.sql    -- AWR Global Diff Report (RAC)

awrinfo.sql     -- Script to output general AWR information

For most people the awrrpt.sql and awrsqrpt.sql are likely to be sufficient, but the “difference between two periods” can be very useful – especially if you do things like regularly forcing an extra snapshot at the start and end of the overnight batch so that you can (when necessary) find the most significant differences in behaviour between the batch runs on two different nights.

If you get into the ‘RAC difference report’ you’ll need a very wide page – and very good eyesight !

There are also a lot of “infrastructure and support” bits – some of the “input” files give you some nice ideas about how you can write your own code to do little jobs like: “run the most recent AWR report automatically”:

awrblmig.sql    -- AWR Baseline Migrate
awrload.sql     -- AWR LOAD: load awr from dump file
awrextr.sql     -- AWR Extract

awrddinp.sql    -- Get inputs for diff report
awrddrpi.sql    -- Workload Repository Compare Periods Report

awrgdinp.sql    -- Get inputs for global diff reports
awrgdrpi.sql    -- Workload Repository Global Compare Periods Report

awrginp.sql     -- AWR Global Input
awrgrpti.sql    -- Workload Repository RAC (Global) Report

awrinpnm.sql    -- AWR INput NaMe
awrinput.sql    -- Get inputs for AWR report 

awrsqrpi.sql    -- Workload Repository SQL Report Instance

I usually carry copies of the scripts with me when I’m troubleshooting in case I need them at client sites – sometimes I’m not allowed the privileges I really need to do rapid troubleshooting, but if I can persuade the DBA to give me execute privileges on package dbms_workload_repository and select privileges on a couple of the necessary tables and views then I can run the reports from an otherwise “minimal” account.
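
As a rough indication of what I mean by a “minimal” account – the exact list of objects the scripts touch varies a little with version, so treat this as a sketch (and “troubleshooter” is just a made-up account name) rather than a definitive list:

grant create session to troubleshooter;
grant execute on dbms_workload_repository to troubleshooter;

-- the report scripts also read a few dictionary objects, typically:
grant select on dba_hist_snapshot to troubleshooter;
grant select on dba_hist_database_instance to troubleshooter;
grant select on v_$database to troubleshooter;
grant select on v_$instance to troubleshooter;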

There are also a couple of deceptively named files that you might miss in 11.2:

spawrrac.sql  -- Server Performance AWR RAC report
spawrio.sql  -- AWR IO Intensity Report
spadvrpt.sql -- Streams Performance ADVisor RePorT

Although the initial letters in the names suggest that these files might fall in with statspack, they actually report from the AWR tables – however the first one (spawrrac.sql) was only a temporary measure, and prints out the warning message:

This script will be deprecated.  The official release of the Global AWR report is awrgrpt.sql

So if you’re using the spawrrac.sql – stop it.

AWR Snapshots

A couple of days ago I mentioned as a passing comment that you could take AWR snapshots at the start and end of the overnight batch processing so that if you ever had to answer the question: “Why did last night’s batch overrun by two hours?” you had the option of creating and comparing the AWR report from the latest batch run with the report from a previous batch run (perhaps the corresponding night the previous week) and check for any significant differences. Moreover, Oracle supplies you with the code to compare and report such differences from 10.2 (at least) using the script $ORACLE_HOME/rdbms/admin/awrddrpt.sql

The thing I didn’t mention at the time was how to take a snapshot on demand. It’s very easy if you have the appropriate execute privilege on package dbms_workload_repository.

execute dbms_workload_repository.create_snapshot('TYPICAL');

The single input parameter can be ‘TYPICAL’ (the default) or ‘ALL’.

I keep this one liner in a little script called awr_launch.sql – because I can never remember the exact name of the package without looking it up. (And sometimes I run it with sql_trace enabled so that I can see how much work goes into a snapshot as we change versions, features, configuration, workload and platforms.)
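
If you then want to find the snapshot ids to feed into awrrpt.sql or awrddrpt.sql (both scripts prompt you for begin and end snap ids), a simple query against dba_hist_snapshot will do – a quick sketch:

select
	snap_id, begin_interval_time, end_interval_time
from
	dba_hist_snapshot
where
	end_interval_time > sysdate - 7
order by
	snap_id
;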

Constraints

There’s an important detail about constraints – check constraints in particular – that’s easy to forget, and likely to lead you into errors. Here’s a little cut-n-paste demo from an SQL*Plus session:

SQL> select count(*) from t1;

  COUNT(*)
----------
         4

1 row selected.

SQL>
SQL> alter table t1 add constraint t1_ck_v1 check (v1 in ('a','b','c'));

Table altered.

SQL>
SQL> select count(*) from t1 where v1 in ('a','b','c');

  COUNT(*)
----------
         3

1 row selected.

    We count the number of rows in a table – and it’s four.
    We add a constraint to restrict the values for a certain column – and every row survives the check.
    We use the corresponding predicate to count the number of rows that match the check constraint – and we’ve lost a row !

Why ?

A predicate returns a row if it evaluates to TRUE.
A constraint allows a row if it does not evaluate to FALSE - which means it is allowed to evaluate to TRUE or to NULL.

I have one row in the table where v1 is null, and for that row the check constraint evaluates to NULL, which is not FALSE. So the row passes the check constraint, but doesn’t get returned by the predicate. I think it’s worth mentioning this difference because from time to time I see production systems that “lose” data because of this oversight.

To make the constraint consistent with the predicate I would probably add a NOT NULL declaration to the column or rewrite the constraint as (v1 is not null and v1 in ('a','b','c')).
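
In other words, something like one of the following (a sketch, using the same table and column names as the demo above – note that either statement will fail while the row with the null v1 is still in the table, unless you clean the data first or enable the constraint with NOVALIDATE):

alter table t1 modify v1 not null;

-- or fold the null test into the check constraint itself:
alter table t1 drop constraint t1_ck_v1;
alter table t1 add constraint t1_ck_v1 check (v1 is not null and v1 in ('a','b','c'));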

ORA-29913

External tables as a substitute for loading tables through SQL*Loader have become more popular over the last couple of years – which means questions about Oracle error ORA-29913 have also become more popular. So what do you do about something like this:


SQL> select * from extern;
select * from extern
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04063: unable to open log file EXTERN_17063.log
OS error No such file or directory
ORA-06512: at "SYS.ORACLE_LOADER", line 19

The commonest reason for ORA-29913 is simply that Oracle can’t open the operating system files it’s supposed to use. This may be the input data files or the various output files (the “bad”, “discard” and “log” files) that it needs to produce. The reason for failure may be that the directory doesn’t exist, or that the Oracle executable doesn’t have permission to access the directory appropriately, or that the file itself doesn’t have the necessary permissions.

In the example above it looks as if Oracle can’t write to the required log file which, as the “OS error” line suggests, may be a problem with the (unspecified) directory or with the named file (EXTERN_17063.log) in that directory.

So, here are a few quick checks when you see this error while working with external tables (a query to help with the first couple of checks follows the list):

    Check your ‘create directory … ‘ statements
    Check that the directories have appropriate permission for Oracle to read and write to them
    Check that there isn’t a problem with existing file permissions
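
For the first couple of checks the data dictionary will show you what the directory objects point at and who has been allowed to use them – a minimal sketch (run from a suitably privileged account; the O/S permissions on the paths themselves still have to be checked at the operating system):

select
	owner, directory_name, directory_path
from
	dba_directories
order by
	directory_name
;

-- and, if necessary, check who has been granted read/write on them:
select
	grantee, table_name directory_name, privilege
from
	dba_tab_privs
where
	privilege in ('READ', 'WRITE')
order by
	table_name, grantee
;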

I have a piece of framework code for external tables, and its most important feature is catering for all the output files (that I am aware of), and separating the directories for the inputs and the outputs.

define m_filename = 'd11g_ora_1820.trc'
define m_file_dir = 'C:\oracle\diag\rdbms\d11g\d11g\trace'
define m_log_dir  = 'c:\working'

drop table ext;

create or replace directory ext_tab as '&m_file_dir';
create or replace directory log_dir as '&m_log_dir';

create table ext(text_line varchar(4000))
organization external
(
	type oracle_loader
	default directory ext_tab
	access parameters (
		records delimited by newline
		discardfile	log_dir:'read_trace_dis.txt'
		badfile		log_dir:'read_trace_bad.txt'
		logfile		log_dir:'read_trace_log.txt'
		characterset us7ascii
		fields
			rtrim
			reject rows with all null fields
		(
			text_line	(1:4000)	char
		)
	)
	location ('&m_filename')
)	reject limit unlimited
;

As it stands this has been set up to define a standard trace file on my laptop as an external table; it’s a framework I use sometimes on customer sites where I have lots of database privileges but don’t have any rights to see the server: sometimes you really need to see the trace files. (A nice little feature that’s appeared in 11g, by the way, is that you don’t need to work out names for trace files any more, you just need to get at column v$process.tracefile)
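
As an aside, this is the sort of query I mean for picking up the trace file name in 11g – a sketch that joins v$process to v$session for the current session:

select
	p.tracefile
from
	v$process	p,
	v$session	s
where
	p.addr	= s.paddr
and	s.sid	= sys_context('userenv','sid')
;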

A couple of footnotes:

  • If you don’t specify a “log” directory as I have then all the output files go into the “default” directory which, for many people, is likely to be the directory where you keep your data files. (You could have different directories for the “bad” and “discard” files, of course.)
  • The various log files are appended to (not replaced) every time you query the external table – and it’s very easy to forget this and let them become very large (which means that you could get ORA-29913 because the size of an output file exceeds the O/S limit, or has filled the device).
  • If you don’t specify names for the output files then they will contain the name of the external table and the process ID of the session’s shadow process. The presence of the process ID does reduce the risk of a single output file getting too big, of course. (I haven’t checked what this does with shared servers, but the typical use of external tables means you probably shouldn’t be querying them through shared servers.)
  • The default names of the output files vary with version – until 11g I think it was always {table_name}_{process_id}.{type}, but my 11g sandbox seems to introduce another number in the middle of the name to give:  {table_name}_3168_{process_id}.{type}

Got to stop there, the plane’s about to push back – and it’s “all electronics off”. Spelling and grammar may be corrected this evening.

CBOddity

Warning – make sure you read to the end of this post.

Someone sent me an email this morning asking how Oracle calculates the index cardinality of an index range scan. Now, as I’ve often said, I don’t do private email about Oracle – unless it just happens to catch my attention and looks like something that is sufficiently generic to be worth publishing.

Today’s emailer was a lucky winner – he’d sent me a very short email that took about 30 seconds to read, contained a significant error, and (at first sight) probably had the right information in it for me to work with. Here’s the problem, as a cut-n-paste from an SQL*Plus session:

SQL> explain plan for select * from admlxa.QRT_BENCH where QRT_BENCH_DATE < :a3;
Explained. 

Elapsed: 00:00:00.01 

SQL> select * from table(dbms_xplan.display); 

PLAN_TABLE_OUTPUT
-----------------
Plan hash value: 2896103184 

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |   424K|    81M|  3170   (4)| 00:00:05 |
|   1 |  TABLE ACCESS BY INDEX ROWID| QRT_BENCH       |   424K|    81M|  3170   (4)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN          | IDX_QRT_BENCH_1 | 78876 |       |   303   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("QRT_BENCH_DATE"<:A3) 

Look closely at the “Rows” column – there’s clearly a logic error appearing here. If you select 78,876 rowids from an index you can’t possibly acquire 424,000 rows from the table – so where have those two numbers come from ?

The supporting information looked like this:

num_rows for QRT_BENCH table = 8480798
num_distinct for QRT_BENCH_DATE column = 458 

num_rows for IDX_QRT_BENCH_1 = 8763975
distinct_keys for IDX_QRT_BENCH_1 = 537

Of course, I really needed to know whether this was a single-column or multi-column index if I wanted to model the problem correctly and do further checks on when the error appeared, but this was good enough to get started. We note, first, that the 424K for the table cardinality is just the standard “5% due to range predicate with unknown value”: round(0.05 * 8480798) = 424040.

    Step 1: since I don’t know where the number 78876 comes from, let’s try to work backwards – use it in a bit of arithmetic and see what drops out. Let’s try dividing it into the table cardinality: 424040 / 78876 = 5.3760

    Step 2: Does 5.376 look familiar — it may be a coincidence, but isn’t that really close to 1% of the number of distinct keys in the index ?

At this point I don’t have any time to investigate in detail, but a scratch hypothesis is that Oracle is calculating something like: 5 * (number of rows in table / (number of distinct keys in index)); and maybe that magic five appears through a piece of code that takes 5%, but for some reason then divides by the 1% selectivity normally associated with function(col).

If I had the time (and the data set) I’d start playing with dbms_stats.set_index_stats() et al. to see what effect changing num_distinct and num_rows had, and whether my initial guess was somewhere in the right ballpark. As it is I’ve just emailed this note back to the source.
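
For reference, this is the type of test I mean – a sketch only, using the object names from the plan above; on a test copy of the data you could change numdist (or numrows), re-run the explain plan, and see whether the index cardinality moved the way the hypothesis predicts (saving and restoring the original statistics around the experiment, of course):

begin
	dbms_stats.set_index_stats(
		ownname	=> 'ADMLXA',
		indname	=> 'IDX_QRT_BENCH_1',
		numrows	=> 8763975,
		numdist	=> 1000
	);
end;
/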

Erratum:

Before I published this note I got a reply from the original correspondent, with the following comment:

Sure enough – five years ago I had published some details about exactly this “feature” – and that 537 vs. 5.376 really was just a coincidence.

I decided to publish the note anyway for three reasons –

    one: to make sure you realise that I do make mistakes
    two: to show you that simple games with numbers may give you a working hypothesis
    three: to remind you that once you’ve got a working hypothesis it’s often easy to think of ways to demonstrate that your hypothesis is wrong. (A couple of hacks of the statistics would have shown me a constant 0.009 appearing, rather than anything like 5% divided by 1%.)

Upgrade issues

Here’s an example of how a bug-fix can create problems. It’s a code change in 11.2.x.x and (I believe) 10.2.0.5 relating to the costing of queries involving (but perhaps not restricted to) composite partitioned tables. I first saw this change in an email from Doug Burns, who sent me the 10053 traces from a very simple query that had started using the wrong index after an upgrade from 10.2.0.4 to 11.2.0.2.

As part of his testing he had set the optimizer_features_enable parameter back to 10.2.0.4 and found that not only did the choice of index change back to the expected index, but the costs of the two indexes changed dramatically. (The cost of using the “right” index changed from 15 to something in excess of 9,000 on the upgrade!)

The example I’m going to show you demonstrates the principles of the cost change – but with the data set and statistics I’ve generated you won’t see a change of execution plan. This is typical of some of the models I create – it’s enough to establish a principle, after which it’s simple enough to recognise the problems that the principle can cause.

So here’s a data set. I created it in a default 11.2.0.2 install on Windows 32-bit, so it’s running with 8KB blocks, autoallocate extents, and ASSM – but I’ve disabled CPU costing:

create table t1(
	list_col	number,
	range_col	number,
	v100		varchar2(10),
	v10		varchar2(10),
	vid		varchar2(10),
	padding		varchar2(100)
)
partition by range(range_col)
subpartition by list (list_col)
subpartition template (
	subpartition s0		values(0),
	subpartition s1		values(1),
	subpartition s2		values(2),
	subpartition s3		values(3),
	subpartition s4		values(4),
	subpartition s5		values(5),
	subpartition s6		values(6),
	subpartition s7		values(7),
	subpartition s8		values(8),
	subpartition s9		values(9)
)
(
	partition p_10 values less than (10),
	partition p_20 values less than (20),
	partition p_30 values less than (30),
	partition p_40 values less than (40),
	partition p_50 values less than (50),
	partition p_60 values less than (60),
	partition p_70 values less than (70),
	partition p_80 values less than (80),
	partition p_90 values less than (90),
	partition p_100 values less than (100)
)
;

insert into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	mod(rownum,10),
	mod(rownum,100),
	lpad(mod(rownum,10),10),
	lpad(mod(rownum,100),10),
	lpad(rownum,10),
	rpad('x',100)
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(v10, vid, v100, range_col, list_col) using index local;

create index t1_one_col on t1(v100) local;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'

	);
end;
/

You’ll notice that one of the indexes I’ve created has only one column, while the other has five columns (including the single column of the smaller index). Since I’ve created only 1,000,000 rows spread across 100 partitions every partition is small and the corresponding index partitions are also small, so I’m going to use dbms_stats.get_index_stats() and dbms_stats.set_index_stats() to make the indexes appear much larger. Specifically I will set the blevel on the single-column index to 2, and the blevel on the multi-column index to 3. (This difference in settings isn’t just whimsy, it helps to emulate Doug’s problem.) I’ve previously published the type of code to make this possible; in this case I only set the table-level stats because the queries I’m going to use will trigger Oracle to use just the table-level stats.
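
Here’s a sketch of the kind of calls I mean for faking the blevel – I haven’t reproduced the full get/set code here, so treat this as an outline rather than the exact script I ran:

begin
	dbms_stats.set_index_stats(
		ownname		=> user,
		indname		=> 'T1_ONE_COL',
		indlevel	=> 2
	);

	dbms_stats.set_index_stats(
		ownname		=> user,
		indname		=> 'T1_PK',
		indlevel	=> 3
	);
end;
/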

select
	/*+ index(t1(v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

select
	/*+ index(t1(v10, vid, v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

I’m going to show you two sets of execution plans. The first one is where I’ve set optimizer_features_enable to '10.2.0.4', the second is where I’ve left it to default.

10.2.0.4 execution plans:
===================
---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2350 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2350 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   330 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   100 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   100 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

11.2.0.2 execution plans:
===================
Execution Plan
----------------------------------------------------------
Plan hash value: 3019183742

---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2530 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2530 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   510 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   370 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   370 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

Regardless of the setting of optimizer_features_enable, if I hint the single-column index I have a plan that visits each partition and subpartition in turn for the index then table; and if I hint the primary key I do that same walk through each partition and subpartition and visit just the index.

The big difference is in the cost. By default, both execution plans cost more in 11.2.0.2 – but the odd thing is that the change in costs is different. The cost of using the single-column index has gone up by 180, the cost of using the primary key index has gone up by 270; and the change in costs can be attributed completely to the indexes.

Is there a pattern in the change ? Yes, though it’s not immediately obvious and needs a few extra tests to confirm: the change in cost can be calculated as (number of range partitions * (number of list partitions – 1) * blevel of index).

    For the single column index this is: 10 * 9 * 2 = 180
    For the primary key index this is: 10 * 9 * 3 = 270

You don’t have to see an example of a change in plan actually happening once you’ve seen this change in arithmetic. In certain circumstances the cost of using composite partitions goes up when you upgrade to 11.2 – and the change in cost is proportional to the number of data segments (subpartitions) and the blevel of the indexes.

In Doug’s case he had a table with roughly 4,000 subpartitions in total – and he had a single column index with a blevel of 2, and a multi-column index with a blevel of 3: so the cost of using the (smaller, but less efficient) index went up by about 8,000 and the cost of using the (larger but efficient) index went up by about 12,000. The difference of 4,000 between these two increases was far higher than the original cost of using the bad index – so Oracle switched from the good index to the bad index.

The nasty thing about this problem is that it’s correcting an error – 11.2 is allowing for the cost of probing every single index subpartition, that’s what the extra multiples of the blevel represent; unfortunately the optimizer’s underlying paradigm of “every block visit is a disk I/O” makes this correction a threat.

Solutions and Workarounds

There is an /*+ optimizer_features_enable('x.x.x.x') */ hint that could be used if the problem applies to just a couple of queries.
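
For example (a sketch using the test query from above):

select
	/*+ optimizer_features_enable('10.2.0.4') index(t1(v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;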

If the problem applies to classes of queries involving a couple of big tables and indexes you could use dbms_stats.set_index_stats() to adjust the blevel of the critical indexes.

If the problem appears all over the place you could set the optimizer_features_enable parameter to '10.2.0.4' in the parameter file, or in a session logon trigger.

If the problem appears all over the place, but there are other features of the 11.2.0.2 optimizer that are very helpful you could take advantage of the “_fix_control” parameter – after talking to Oracle support. This particular problem comes under bug fix 7132684, which appears in v$system_fix_control under 10.2.0.5, with the description “Fix costing for non prefixed local index”. If you’ve got the right symptoms, then the following statement will bypass the problem:

alter session set "_fix_control"='7132684:OFF';
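
You can check whether the fix is present (and currently enabled) in your version by querying v$system_fix_control first – a quick sketch:

select
	bugno, value, optimizer_feature_enable, description
from
	v$system_fix_control
where
	bugno = 7132684
;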

Final Warning:

This note comes from 30 minutes looking at Doug’s trace files, one hour building a test case, and an hour spent writing this blog. This is not a complete determination of the problem it’s just an initial outline of what can happen and the symptoms to watch out for, posted as an early warning to help a few people save some time. Do not treat this note as a definitive reference.