Who's online

There are currently 0 users and 29 guests online.

Recent comments



Buffer States

Here’s a bit of geek stuff that I’ve been meaning to write up for nearly a year – to the day, more or less – and I’ve finally been prompted to finish the job off by the re-appearance on the OTN database forum of the standard “keep cache” question:

    Why isn’t Oracle keeping an object “properly” when it’s smaller than the db_keep_cache_size and it has been assigned to the buffer_pool keep ?

This is a two-part note – and in the first part I’m just going to run a query and talk about the results. The query is one that has to be run by SYS because it references a couple of x$ structures, and this particular version of the query was engineered specifically for a particular client.

        bitand(bh.flag,power(2,13))     cur,
        count(*)        ct
        x$bh            bh,
        x$kcbwds        wds
        wds.addr = bh.set_ds
and     wds.set_id  between 1 and 24
group by
order by

You’ll notice I’m joining x$bh (the “buffer header” array) to x$kcbwds (the “working data set” array) where I’ve picked sets 1 to 24. On this particular system these were the sets for the KEEP cache. (If you want a generic query to isolate a particular cache then there’s an example here that identifies the RECYCLE cache by reference – but I wanted the query in this note to run as efficiently as possible against this production system, so I did a preliminary lookup against x$kcbwbpd and then used the literal set ids).


Here are a few lines from the resulting output:

       OBJ      STATE        CUR         CT
---------- ---------- ---------- ----------
     40158          1       8192          1

     40189          1          0      87233
                            8192     272789

                    3          0      69804
                            8192     393868

     40192          1          0         87
                            8192      12197

                    3          0      30763
                            8192       1994


    117291          1          0        498
                            8192       4419

                    3          0       3001
                            8192         15

    117294          1          0        243
                            8192       3544

                    3          0       1245
                            8192         23

4294967294          3          0          2
**********                       ----------
sum                                 1216072

Since we’re looking at x$ structures – which rarely have any official documentation – the rest of this note isn’t guaranteed to be correct – and things do change with version so I need to stress that this specific example comes from This is what I think the results show:

The state column is instance-related and is essentially something that’s useful in a RAC enviroment. State 1 translates to ‘XCUR’ (exclusive current) which means that this instance has exclusive rights to the most recent version of the block; state 3 translates to ‘CR’ (only valid for consistent read).

Bit 13 of the flag column is set if the buffer has been “gotten in current mode”. (If you’re interested in the other bits there’s a page on my old website that might keep you entertained for a while – I haven’t yet updated it to 11g, though.)

The problem for the client was this – the total size of all the data segments in the KEEP cache was about 6GB and the total size of the KEEP cache was about 10GB, yet the database was still reporting a constant trickle of physical reads to the objects and, when the code to “re-load” the cache  was executed at a quiet period at the start of the day some 60,000 physical blocks had to be  read. With a 10GB cache for 6GB of data would you really expect to see this I/O ?

Take a look at the figures for object 40189:

There are 272,789 buffers for blocks that were “gotten in current mode” (bit 13 is set) and are also “exclusive current” (state 1) to the instance, but there are also 393,868 buffers that were originally “gotten in current mode” but are now “only valid for consistent read”.

Similarly there are 87,233 buffers for blocks that weren’t “gotten in current mode” but are “exclusive current” to the instance – in other words they are the most up to date version of the block but weren’t fetched with a “db block get”, and again there are 69,804 buffers holding blocks that were not “gotten in current mode” but which are now “only valid for consistent read”.

Buffers that are “only valid for consistent read” are buffers holding blocks that have been generated through one of Oracle’s mechanisms for creating CR (consistent read) clones. As you can see, then, a block that is a CR clone may still be flagged as “gotten in current mode”. In fact, in line with Oracle’s generally “lazy” approach to work you can even find (in some versions of Oracle, at least) CR clones that still have the “dirty” bit set in the flag, even though CR clones can never really be dirty and are NEVER written to disc.

Take another look at the buffer counts – this KEEP cache is sized at 1.2M buffers (10GB), but object 40189 alone has taken out 460,000 of those buffers (3.6GB) in block clones, and for this object there are more clones than originals (at 360,000, which happens to be just a few thousand blocks less than the size of the table). So, when you’re thinking about creating a KEEP cache, remember that you have to allow for block cloning – simply setting the db_keep_cache_size to something “a bit bigger” than the object you want to keep cached may not even be close to adequate.

Part 2 to follow soon.

Hotsos 2011 – Mining the AWR Repository for Capacity Planning, Visualization, and other Real World Stuff

For all that loves Oracle performance.. Hotsos is truly the best conference, all the speakers are performance geeks, all the attendees talks about performance. Everything is about performance! There are lot more stuff that I like about my first Hotsos experience, the following are some of them..

I like that presenters and attendees are curious about what each performance geek has to say..

I like that whenever their brains are already fried. They consume as much coffee/soda as they can.. and just lay their butts on this couch.. and still.. talk about performance.

Statspack Reports

A couple of weeks ago I listed a number of scripts from relating to AWR reports – it seems only sensible to publish a corresponding list for Statspack. In fact, there are two such lists – one for “traditional” Statspack, and one for “standby statspack” – a version of statspack you can pre-install so that you can run statspack reports against a standby database.

Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/spdoc.txt

spreport.sql    ->  Generates a Statspack Instance report
sprepins.sql    ->  Generates a Statspack Instance report for the database and instance specified
sprepsql.sql    ->  Generates a Statspack SQL report for the SQL Hash Value specified
sprsqins.sql    ->  Generates a Statspack SQL report for the SQL Hash Value specified, for the database and instance specified

sppurge.sql     ->  Purges a limited range of Snapshot Id's for a given database instance
sptrunc.sql     ->  Truncates all Performance data in Statspack tables
spuexp.par      ->  An export parameter file supplied for exporting the whole PERFSTAT user

Standby Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/sbdoc.txt

sbreport.sql      - Create a report

sbaddins.sql      - Add a standby database instance to the configuration
sblisins.sql      - List instances in the standby configuration
sbdelins.sql      - Delete an instance from the standby configuration

sbpurge.sql       - Purge a set of snapshots

A warning note about standby statspack – there is a significant structural change from to allowing support of multiple standby databases. I haven’t looked closely at it yet, but it did cross my mind that it might be possible to install the version in an database to avoid late upgrade issues – I may get around to testing the idea one day.

Free ASH

If you’re running a version of Oracle older than 10g (where v$active_session_history appeared), or if you’re not using Enterprise Edition, or if you just don’t want to pay for the Diagnostic Pack licence, here are some links relating a free Java program that emulates the “Top Sessions” output of the Enterprise Manager screen:

Important Note:

If you want to query v$active_session_history (or any of the AWR objects) then you need to purchase the licence for the Diagnostic Pack). If you aren’t licensed you should only use the program in emulation mode.


Browsing through the archive for the Oracle-L listserver a couple of days ago I came across this item dated Feb 2011 where the author was puzzled by Oracle’s choice of index for a query.

He was using, and running with the optimizer_mode set to first_rows – which you shouldn’t really be doing with that version of Oracle since Oracle Corp. told us about 10 years ago that “first_rows is avaiable only for backwards compatibility”.

I’ve created a model of their problem to demonstrate the effect. As usual, to make it easier to get a reproducible result, I’ve used locally managed tablespaces with 1MB uniform extents, freelist management, and CPU costing disabled:

create table t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
		to_date('01-Jan-2011','dd-mon-yyyy') +
	)			a,
	mod(rownum,317) + 1	b,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
	generator	v1,
	generator	v2
	rownum <= 100000

alter table t1 add constraint t1_pk primary key(a,b);
create index t1_ba on t1(b,a);

The SQL creates 317 rows for a list of dates which have been stored as eight character strings in the form YYYYMMDD. The 317 rows are numbered from 1 to 317, and the data is stored in order of date and number. I’ve created a primary key on (date, number), and I’ve also created an index on (number, date) – the PK has a very good clustering_factor and the other index has a very bad one because of the way I generated the data.

With this data in hand, and after collecing statistics (compute, no histograms), I run the following SQL (and like the OP I am using

alter session set optimizer_mode = first_rows;

set autotrace traceonly explain

	a = '20110401'
and	b > 10
order by
	a, b

	/*+ index(t1(a,b)) */
	a = '20110401'
and	b > 10
order by
	a, b

I’m after 307 consecutive rows of one date – and I want the data sorted by the date and number. With first_rows optimization the default plan is a little surprising. Here are two execution plans for the query – first the plan that the optimizer chose by default, the second when I hinted the SQL to use the primary key – note that neither plan shows a sort operation:

Default execution plan
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT            |       |   307 |  7368 |   617 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |   617 |
|*  2 |   INDEX SKIP SCAN           | T1_BA |   307 |       |   309 |

Predicate Information (identified by operation id):
   2 - access("B">10 AND "A"='20110401')
       filter("A"='20110401' AND "B">10)

Hinted execution plan
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT            |       |   307 |  7368 |    10 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |    10 |
|*  2 |   INDEX RANGE SCAN          | T1_PK |   307 |       |     2 |

Predicate Information (identified by operation id):
   2 - access("A"='20110401' AND "B">10)

How strange – it is clearly better to use the primary key index for this query, yet the optimizer doesn’t do it under first_rows optimisation. (It does if you use the slightly more appropriate first_rows(1) – the “new” improved option from 9i).

The first thought you might have when looking at this example is the first_rows has a heuristic (i.e. rule) that says “use an index to avoid sorting at all costs if possible (unless the hidden parametere _sort_elimination_cost_ratio is non-zero)”. But that shouldn’t apply here because both indexes will allow Orace to avoid sorting.

And here’s an even stranger detail: notice that the “order by” clause includes column “a”, which is obviously constant because of the “where” clause. Since it’s constant removing it won’t make any difference to the final output - but look what happens:

	a = '20110401'
and	b > 10
order by

| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT            |       |   307 |  7368 |    10 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   307 |  7368 |    10 |
|*  2 |   INDEX RANGE SCAN          | T1_PK |   307 |       |     2 |

Predicate Information (identified by operation id):
   2 - access("A"='20110401' AND "B">10)
       filter("A"='20110401' AND "B">10)

For no obvious reason the optimizer now picks the right index. What’s going on ? Unfortunately I have to say that I don’t know.

When I checked the 10053 trace file there were a few differences for the two “order by” clauses but I couldn’t see anything that gave me any reasonable ideas. The most significant difference was the choice of indexes examined when the optimizer was doing the “recost for order by” bit. When we ordered by a,b the optimizer considered only the t1_ba index (note – the final costs are slightly higher here because in this run I enabled CPU costing to see if that was having an effect, so there’s a little extra for the CPU):

  Access Path: index (skip-scan)
    SS sel: 0.0030744  ANDV (#skips): 308
    SS io: 308.00 vs. index scan io: 321.00
    Skip Scan chosen
  Access Path: index (SkipScan)
    Index: T1_BA
    resc_io: 617.00  resc_cpu: 23882848
    ix_sel: 0.0030744  ix_sel_with_filters: 0.0030744
    Cost: 618.60  Resp: 618.60  Degree: 1
  Best:: AccessPath: IndexRange  Index: T1_BA
         Cost: 618.60  Degree: 1  Resp: 618.60  Card: 307.44  Bytes: 24

when we ordered by b alone the optimizer considered only the t1_pk index:

  Access Path: index (RangeScan)
    Index: T1_PK
    resc_io: 10.00  resc_cpu: 191334
    ix_sel: 0.0030744  ix_sel_with_filters: 0.0030744
    Cost: 10.01  Resp: 10.01  Degree: 1
  Best:: AccessPath: IndexRange  Index: T1_PK
         Cost: 10.01  Degree: 1  Resp: 10.01  Card: 307.44  Bytes: 24

There really seems to be a flaw in the logic behind the choice of index – and there’s an important point to think about here: if it’s a bug it’s probably not going to be fixed. The first_rows option only exists for “backwards compatibility” and things stop being compatible if you change them.

Footnote: Because the cost of the skip scan path in the original run was 617 and the cost of the primary key range scan path was 10 I could make Oracle choose the primary key by setting the parameter _sort_elimination_cost_ratio to a value just less than 617/10 (say 60); but I mention that only as an idle curiosity. You shouldn’t be using first_rows , and if you do use it you shouldn’t be hacking with undocumented parameters to work around the problems it produces.

Expert Oracle Exadata book – Alpha chapters available for purchase!


Apress has made the draft versions of our Expert Oracle Exadata book available for purchase.

How this works is:

  1. You purchase the “alpha” version of the Expert Oracle Exadata book
  2. You get the access to draft/alpha PDF versions of some chapters now!
  3. As more chapters will be added and existing ones updated, you’ll receive an email and you can download these too
  4. You will get a PDF copy of the final book once it’s out!

This is an awesome deal if you can’t wait until the final launch and want to get ahead of the curve with your Exadata skills ;-)

Buy the alpha version of our Expert Oracle Exadata book from Apress here!

If you haven’t heard about this book earlier – I’m one of the 3 authors, writing it together with Kerry Osborne and Randy Johnson from Enkitec and our official tech reviewer is no other than THE Kevin Closson and we are also getting some (unofficial) feedback from Oracle database junkie Arup Nanda.

So this book will absolutely rock and if you want a piece of it now, order the alpha book above!

P.S. This hopefully also explains why I’ve been so quiet with my blogging lately – can’t write a book and do many other things at the same time… (at least if you want to do it well…)


Advanced Oracle Troubleshooting v2.0 Online Deep Dives in April and May 2011

Due to a lot interest I’m going to do another run of my Advanced Oracle Troubleshooting v2.0 Online Deep Dive seminars in April and May (initially I had planned to do it no earlier than Sep/Oct…)

Check the dates & additional info out here:

P.S. People who already attended the AOT2 seminars last year – I will schedule the follow-up Q&A sessions in mid-March!


AWR Reports

A little thought for those who use OEM to generate AWR reports – if you generate the standard HTML form and then try posting it into (say) the OTN database forum when you want some help interpreting it, it won’t look very readable (even if you do remember to put the right tags before and after to reproduce it in fixed font).

So it’s a good idea to know how to produce the text format – which is what I do most of the time (especially since I often end up with a simple telnet or putty session into a client server). Take a look at $ORACLEHOME/rdbms/admin for all the scripts starting with “awr” – there’s quite a lot of them, and the number keeps growing. Apart from finding a script that will give you the standard AWR report in a good old-fashioned text format, you may that newer versions of Oracle include a few useful variations on the theme.

Here’s a list from the home ($ORACLE_HOME/rdbms/admin) with (mostly) the one-line description from the start of file. I’ve put the most useful ones in the first list:

awrrpt.sql      -- basic AWR report
awrsqrpt.sql    -- Standard SQL statement Report

awrddrpt.sql    -- Period diff on current instance

awrrpti.sql     -- Workload Repository Report Instance (RAC)
awrgrpt.sql     -- AWR Global Report (RAC)
awrgdrpt.sql    -- AWR Global Diff Report (RAC)

awrinfo.sql     -- Script to output general AWR information

For most people the awrrpt.sql and awrsqrpt.sql are likely to be sufficient, but the “difference between two periods” can be very useful – especially if you do things like regularly forcing an extra snapshot at the start and end of the overnight batch so that you can (when necessary) find the most significant differences in behaviour between the batch runs on two different nights.

If you get into the ‘RAC difference report’ you’ll need a very wide page – and very good eyesight !

There are also a lot of “infrastructure and support” bits – some of the “input” files give you some nice ideas about how you can write your own code to do little jobs like: “run the most recent AWR report automatically”:

awrblmig.sql    -- AWR Baseline Migrate
awrload.sql     -- AWR LOAD: load awr from dump file
awrextr.sql     -- AWR Extract

awrddinp.sql    -- Get inputs for diff report
awrddrpi.sql    -- Workload Repository Compare Periods Report

awrgdinp.sql    -- Get inputs for global diff reports
awrgdrpi.sql    -- Workload Repository Global Compare Periods Report

awrginp.sql     -- AWR Global Input
awrgrpti.sql    -- Workload Repository RAC (Global) Report

awrinpnm.sql    -- AWR INput NaMe
awrinput.sql    -- Get inputs for AWR report 

awrsqrpi.sql    -- Workload Repository SQL Report Instance

I usually carry copies of the scripts with me when I’m troubleshooting in case I need them at client sites – sometimes I’m not allowed the privileges I really need to do rapid troubleshooting, but if I can persuade the DBA to give me execute privileges on package dbms_workload_repository and select privileges on a couple of the necessary tables and views then I can run the reports from an otherwise “minimal” account.

There are also a couple of deceptively named files that you might miss in 11.2:

spawrrac.sql  -- Server Performance AWR RAC report
spawrio.sql  -- AWR IO Intensity Report
spadvrpt.sql -- Streams Performance ADVisor RePorT

Although the initial letters in the names suggest that these files might fall in with statspack, they actually report from the AWR tables – however the first one (spawrrac.sql) was only a temporary measure, and prints out the warning message:

This script will be deprecated.  The official release of the Global AWR report is awrgrpt.sql

So if you’re using the spawrrac.sql – stop it.

AWR Snapshots

A couple of days ago I mentioned as a passing comment that you could take AWR snapshots at the start and end of the overnight batch processing so that if you ever had to answer the question: “Why did last night’s batch overrun by two hours?” you had the option of creating and comparing the AWR report from the latest batch run with the report from a previous batch run (perhaps the corresponding night the previous week) and check for any significant differences. Moreover, Oracle supplies you with the code to compare and report such differences from 10.2 (at least) using the script $ORACLE_HOME/rdbms/admin/awrddrpt.sql

The thing I didn’t mention at the time was how to take a snapshot on demand. It’s very easy if you have the appropriate execute privilege on package dbms_workload_repository.

execute dbms_workload_repository.create_snapshot('TYPICAL');

The single input parameter can be ‘TYPICAL’ (the default) or ‘ALL’.

I keep this one liner in a little script called awr_launch.sql – because I can never remember the exact name of the package without looking it up. (And sometimes I run it with sql_trace enabled so that I can see how much work goes into a snapshot as we change versions, features, configuration, workload and platforms.)


There’s an important detail about constraints – check constraints in particular – that’s easy to forget, and likely to lead you into errors. Here’s a little cut-n-paste demo from an SQL*Plus session:

SQL> select count(*) from t1;


1 row selected.

SQL> alter table t1 add constraint t1_ck_v1 check (v1 in ('a','b','c'));

Table altered.

SQL> select count(*) from t1 where v1 in ('a','b','c');


1 row selected.
    We count the number of rows in a table – and it’s four.
    We add a constraint to restrict the values for a certain column – and every column survives the check.
    We use the corresponding predicate to count the number of rows that match the check constraint – and we’ve lost a row !

Why ?

A predicate returns a row if it evaluates to TRUE.
A constraint allows a row if it does not evaluate to FALSE - which means it is allowed to evaluate to TRUE or to NULL.

I have one row in the table where v1 is null, and for that row the check constraint evaluates to NULL, which is not FALSE. So the row passes the check constraint, but doesn’t get returned by the predicate. I think it’s worth mentioning this difference because from time to time I see production systems that “lose” data because of this oversight.

To make the constraint consistent with the predicate I would probably add a NOT NULL declaration to the column or rewrite the constraint as (v1 is not null and v1 in (‘a’,'b’,'c’)).