Top 60 Oracle Blogs

Recent comments

September 2010

Presentation at New York Oracle User Group

Thank you for all those who attended the session - Analyzing Application Performance in RAC - at New York Oracle User Group's Fall Meeting in New York City.

You can download the slides from here. And, please watch out for the article on the same topic in NYOUG Tech Journal.

And now for something completely different

The always excellent Cricinfo is in the process of bringing to an end it’s All Time XI series of articles by selecting an All Time World XI from the world teams previously selected. You can read all about it here . You can also choose your own – if you are at all interested in Cricket [...]

Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – I.

Allocating hugepages for Oracle Database on Linux can be tricky. The following is a short list of some of the common problems associated with faulty attempts to get things properly configured:

  1. Insufficient Hugepages. You can be short just a single 2MB hugepage at instance startup and Oracle will silently fall back to no hugepages. For instance, if an instance needs 10,000 hugepages but there are only 9,999 available at startup Oracle will create non-hugepages IPC shared memory and the 9,999 (x 2MB) is just wasted memory.
    1. Insufficient hugepages is an even more difficult situation when booting with _enable_NUMA_support=TRUE as partial hugepages backing is possible.
  2. Improper Permissions. Both limits.conf(5) memlock and the shell ulimit –l must accommodate the desired amount of locked memory.

In general, list item 1 above has historically been the most difficult to deal with—especially on systems hosting several instances of Oracle. Since there is no way to determine whether an existing segment of shared memory is backed with hugepages, diagnostics are in short supply.

Oracle Database 11g Release 2 (
The fix for Oracle bugs 9195408 (unpublished) and 9931916 (published) is available in In a sort of fast forward to the past, the Linux port now supports an initialization parameter to force the instance to use hugepages for all segments or fail to boot. I recall initialization parameters on Unix ports back in the early 1990s that did just that.

The initialization parameter is called use_large_pages and setting it to “only” results in the all or none scenario. This, by the way, addresses list item 1.1 above. That is, setting use_large_pages=only ensures an instance will not have some NUMA segments backed with hugepages and others without.

Direct Debits. How hard can they be?

A little over 3 weeks ago I decided to do some cost cutting by switching all my insurance (car and house) and utilities to the cheapest services available. Most of the changes were fine, but the power company is proving to be a complete nightmare. It seems they are not even capable of setting up a direct debit. It’s been one screw up after another. Each time I have new instruction on my account waiting to be used, but each time they claim the bank has rejected it, which they obviously haven’t, and throw away all details of the direct debit instruction. So I’m left with a perfectly good direct debit instruction on my account, but a piece of crap power company who can’t draw cash from it.

It started to get escalated up the chain in the power company today. This afternoon I had the pleasure of speaking to a customer services representative who felt the need to explain the concept of a direct debit to me. As bored as I was I would have accepted it provided she had actually understood what a direct debit was. When I suggested she was actually describing a standing order I was told in no uncertain terms that they deal with thousands of direct debits and they know what they are doing. I then read out the definition of a direct debit and a standing order over the phone, at which point she back-tracked and tried to suggest she had said the correct thing and I had misunderstood her.

Next stop, her manager. He blamed the bank, then blamed me, then blamed the bank again. It seems the power company, populated by staff that don’t understand the difference between a direct debit and a standing order, can not possibly make a mistake…

I can only assume their cheap rates are possible because they cut costs by hiring complete muppets. Maybe as part of their training they could include a link to the following pages:



Q: The most fundamental difference between HASH and NESTED LOOP joins?

So, what do you think is the most fundamental difference between NESTED LOOPS and HASH JOINS?
This is not a trick question. You’re welcome to write your opinion in the comments section – and I’ll follow up with an article about it (my opinion) later today…
Update: The answer article is here:

Q: The most fundamental difference between HASH and NESTED LOOP joins?

So, what do you think is the most fundamental difference between NESTED LOOPS and HASH JOINS?

This is not a trick question. You’re welcome to write your opinion in the comments section – and I’ll follow up with an article about it (my opinion) later today…

Update: The answer article is here:


Oracle Exadata Database Machine Offerings: X2-2 and X2-8

For those who followed or attended Oracle OpenWorld last week you may have seen the introduction of the new hardware for the Oracle Exadata Database Machine. Here’s a high level summary of what was introduced: Updated Exadata Storage Server nodes (based on the Sun Fire X4270 M2) Updated 2 socket 12 core database nodes for the X2-2 (based on the Sun Fire X4170 M2) New offering of 8 socket 64 core database nodes using the Intel 7500 Series (Nehalem-EX) processors for the X2-8 (based on the Sun Fire X4800) The major updates in the X2-2 compared to V2 database nodes are: CPUs updated from quad-core Intel 5500 Series (Nehalem-EP) processors to six-core Intel 5600 Series (Westmere-EP) Network updated from 1 GbE to 10 GbE RAM updated from 72 GB to 96 GB The updates to the Exadata Storage Servers (which are identical for both the X2-2 and X2-8 configurations) are: CPUs updated to the six-core Intel 5600 Series (Westmere-EP) processors 600 GB 15k RPM SAS offering now known as HP (High Performance) 2 TB  7.2k RPM SAS offering now known as HC (High Capacity) [previously the 2 TB drives were 7.2k RPM SATA] One of the big advantages of the CPU updates to the Intel [...]

Locked Out

I know many of you enjoy my stories about mistakes I’ve made and how I’ve solved them, so here’s another log for the fire…

I’ll start by talking about an event that occurred about a month ago that made a bit concerned at the time, but not enough for me to take action on it.  Little did I know…

Recently during a development cycle, there was a request to “refresh” the Integrated Test database using the Development database as a source – this is the database after Development in the lifecycle, but before Quality Assurance.  Normally I prefer not to “refresh” a database that’s further upstream in the cycle – I prefer to refresh backwards from Production, but actively deploy incremental changes forwards from Development.  However, since at this location there’s not a very robust database SCM process they don’t have a good record of each change in development as it occurs – and unfortunately they don’t want to turn on DDL auditing as it might interfere with their security and auditing processes.  Since they use ETL to populate the data, the decision was made to simply blow away the Integrated Test schema and re-create it using Export / Import (Data Pump versions) – most likely just the DDL (METADATAONLY).

I didn’t like it – I wanted to generate a change script and apply the changes.  However, I didn’t have the time or energy to argue it.

That was about 1 month ago.

Now, in the Integrated Test database we’ve been running weekly 1-hour load tests – the load test lead is competent – which has provided us with some nice baselines and comparative reports.  Since I was out of town last week at OOW, I sat down today to look at last week’s test.

It was a mess – database servers at 100% CPU throughout the test.  Weird queries showing up in the ADDM report.  Emails talking about bouncing the database to “clear it”.

What the heck happened?

I took one of the queries reported by ADDM and started tearing it apart and was flabbergasted at what the optimizer was telling me about the execution plan.  Merge Joins, Cartesian Products, and a Cost that looked like the US Federal Deficit after 2 years of liberal spending.

That’s when the little voice in my head spoke up.  “Remember 2 weeks ago when we were looking at statistics and one of the developers said that the stats appeared out of date?”  “Remember how you said it wasn’t a problem because the auto stats gather job runs every day and brought them up-to-date, that the table in question probably didn’t have any material changes to it?”  “Maybe you should re-examine your assumptions, buddy!”

Ok, ok.

Double check the tables using ALLTABLES and notice that several of the tables haven’t been analyzed in a while (over 2 weeks).  That in-and-of-itself doesn’t appear bad – maybe they haven’t changed.  What was the name of the “view” that showed how much tables have changed since the last analyze?  Oh yeah, DBATABMODIFICATIONS – I remember when we had to set MONITORING manually and would check that view to see when we should kick off a manual statistics gathering job.  So, check DBATABMODFICATIONS – and see that lots of tables have tons of inserts, updates and deletes showing – that’s not right – especially if the auto stats gather job is running nightly.

So, let’s check on the auto stats gather job – maybe it’s not running, maybe it’s got an error, maybe something else…

Where is that job?  Oh yeah, the new DBMSSCHEDULER.  Now I’m looking through DBASCHEDULE views and I don’t see it.  Did the DBAs drop the job?  Disable it?  Why on earth would they do that?  No, that doesn’t make sense – no one asked for them to do that, and I’m sure they wouldn’t do it on their own.

Ah, the auto stats gather job isn’t a SCHEDULEd job, it’s an (wait for it), AUTOTASK (we’re on 11gR2).  (I’m sure I could have saved time by looking at Metalink Note 779596.1 – How Can We Find Out Status of Task ‘Auto Optimizer Stats Collection’ – but what’s the fun with that?)

So, we check and it’s enabled.  We check and it’s been run (DBAAUTOTASKCLIENTHISTORY and DBAAUTOTASKJOBHISTORY) – heck, it ran last night!  Oddly it ran in only 4 seconds – with a smiling “SUCCEEDED” next to it.  How could the job succeed and not update the stats?

Alright – something else must be wrong – I’m not convinced that the database is lying or has a bug – there must be something else.  Maybe something about the tables?  I remember that back in the day if a table didn’t have any stats on it in the first place, that updating them would fail – which is why I used to always do a quick analyze on tables immediately after creating them.  That doesn’t appear to be the case here, but maybe something else about the tables…

Hmmmm…  Where to check?  How about Enterprise Manager?  There must be something there.

Ah, Manage Optimizer Statistics.  Hmm…  Object Status link – search for the schema, and run the “report”.

All the tables show their statistics as Stale, and Locked.


How the heck did they get Locked?

Again, I don’t think the DBAs went and locked them – no request was submitted to do so.  And I know they wouldn’t have locked them on their own initiative.  So they must have been locked all along.

How do table statistics get locked on creation?

Maybe the Import Data Pump locked them?  Maybe a default parameter during the import?

Search google for impdp / imp lock statistics… and there it was:


Default: ALL


Enables you to filter what Export unloads: data only, metadata only, or both.

Syntax and Description

  • ALL unloads both data and metadata. This is the default.

  • DATA_ONLY unloads only table row data; no database object definitions are unloaded.

  • METADATA_ONLY unloads only database object definitions; no table row data is unloaded. Be aware that if you specify CONTENT=METADATA_ONLY, then when the dump file is subsequently imported, any index or table statistics imported from the dump file will be locked after the import.


Exadata Database Machine X2-2 or X2-8? Sure! Why Not? Part I.

Blog Correction (28-SEP-2010). In the section below about processor comparison I originally stated ab erroneous aggregate core count for the full rack X2-2. I have edited out that mistake and left in place the socket-to-socket comparison which was the center of the point I was trying to make anyway.

I’ve been getting a lot of questions about why one would choose Exadata Database Machine X2-8 over Exadata Database Machine X2-2. That’s actually a tough question, however, some topics do spring to mind. I’ll start a list:

  1. The Exadata Database Machine X2-8 only comes in full-rack configurations. No way to “start small.”
  2. The Exadata Database Machine X2-2 only supports Oracle Linux. If Solaris is attractive to you then the X2-2 is not an option.
  3. Database Host RAM. The aggregate database grid RAM in a full-rack X2-2 system is 768 GB but 2 TB with the X2-8. The list is quite long for areas that benefit from the additional memory. Such topics as large user counts (consolidation or otherwise), join processing, and very large SGA come to mind. And, regarding large SGA, don’t forget, the Exadata Database Machine supports in-memory Parallel Query as well.

Not on the numbered list is the more sensitive topic of processor power. While these sorts of things are very workload-dependent, I’d go with 16 Intel Xeon 7500 (Nehalem EX) processors over 16 Intel Xeon 5600 (Westmere EP) for most any workload.

So, readers, what reasons would motivate you in one direction or the other?

gc buffer busy waits

If you have the opportunity to work in a RAC environment, you probably encountered (or you will encounter soon :-) ) this wait event: ‘GC Buffer busy’. We will explore issues leading to excessive waits for this wait events and how to resolve the issue effectively.

What is a GC buffer busy wait?

In a simple sense, GC buffer busy means that the buffer in the buffer cache, that the session is trying to access is already involved in another ongoing global cache operation. Until that global cache operation completes, session must wait. I will explain this with an example: Let’s say that session #1 is trying to access a block of file #7 block ID 420. That block is in the remote cache and so, session #1 opened a BL lock on that block, requested the remote LMS process to send the block, and waiting for the block shipping to complete. Session #2 comes along shortly thereafter and tries to access the same buffer. But, the block is already involved in a global cache operation and so, session #2 must wait for the session #1 to complete GC (Global Cache) activity before proceeding. In this case, Session #2 will wait for ‘gc buffer busy’ wait event with a time-out and repeatedly tries to access that buffer in a loop.

Consider the scenario if the block is a hot block such as segment header, index branch block or transaction table header block etc. In this case, you can see that many such sessions waiting for the ‘Gc buffer busy’ wait event. This can lead to complex wait scenario quickly as few background processes also can wait for ‘gc buffer busy’ event leading to an eventual database hang situation. If you kill the processes, then pmon might need to access that block to do a rollback, which means that pmon can get stuck waiting for ‘gc buffer busy’ waits too.

Few Scenarios

This wait event can occur for many different reasons, including bugs. For example, I encountered a bug in which the index branch block split can cause excessive ‘gc buffer busy’ waits. So, It is not possible to document all scenarios that can lead to gc buffer busy waits. But, it is worth exploring few most common scenarios, and then discuss a mitigation plans for those scenarios. The methods discussed here will be helpful to understand which types of blocks are involved in this issue too.

1. Right hand growth indexes

Typically, applications generate surrogate keys using a sequence based key generation, an example would be employee_id column in the employee table. These types of unique or primary key columns are usually populated using a sequence generated value with a unique constraint on the column. A unique index may be created to support the unique constraint. Although, it is possible to create a non-unique index to support the unique constraint, and that non-unique index also will suffer from the issues similar to its unique counterparts. This problem is related to more about uniqueness of data and locality of new rows, rather than the type of index.

Indexes store sorted (key[s], ROWID) pair, meaning values in the index are arranged in an ascending or descending key column order. ROWIDs in the (key[s], ROWID) pair points to a specific row in the table segment with that index column value. Also, Indexes are implemented as B-Tree indexes. In the case of unique indexes, on columns populated by sequence based key values, recent entries will be in the right most leaf block of the B-Tree. All new rows will be stored in the right most leaf block of the index. As more and more sessions insert rows in to the table, that right most leaf block will be full. Oracle will split that right most leaf block in to two leaf blocks: One block with all rows except one row and a new block with just one row. (This split, aka Index 90-10 split, needs to modify branch block also ). Now that new leaf block becomes the right most leaf block and the concurrency moves to the new leaf block. Simply put, you can see concurrency issues moving from one block to another block in an orderly fashion.

As you can imagine, all sessions inserting in to the table will insert rows in to the current right most leaf block of the index. This type of index growth termed as “Right Hand Growth” Indexes. As sessions inserts in to the right most leaf block of the index, that index becomes hot block, and concurrency on that leaf block leads to performance issues.

In RAC, this problem is magnified. Sequence cache is instance specific and if the cache is small (defaults to 20), then the right most leaf block becomes hot block, not just in one instance, but across all instances. That hot – right most – leaf block will be transferred back and forth between the instances. If the block is considered busy, then LMS process might induce more delays in transferring the blocks between the instances. While the block is in transit, then the sessions accessing that block must wait on ‘GC buffer busy’ waits and this quickly leads to excessive GC buffer busy waits. Also, immediate branch block of those right most leaf block will play a role in the waits during leaf block splits.

So, how bad can it get? It can be very bad. A complete database hang situation is a possibility. Notice below that over 1000 sessions were waiting for ‘gc buffer busy’ events across the cluster. Application is completely down.

   INST_ID SQL_ID        EVENT              STATE    COUNT(*)
---------- ------------- ------------------ -------- --------
         4 4jtbgawt37mcd gc cr request     WAITING    9
         3 4jtbgawt37mcd gc cr request     WAITING    9
         3 a1bp5ytvpfj48 gc buffer busy    WAITING   11
         4 a1bp5ytvpfj48 gc buffer busy    WAITING   17
         4 14t0wadn1t0us gc buffer busy    WAITING   33
         4 gt1rdqk2ub851 gc buffer busy    WAITING   34
         4 a1bp5ytvpfj48 buffer busy waits WAITING   35
         2 a1bp5ytvpfj48 gc buffer busy    WAITING   65
         1 a1bp5ytvpfj48 gc buffer busy    WAITING  102
         2 7xzqcrdrnyw1j gc buffer busy    WAITING  106
         2 7xzqcrdrnyw1j enq: TX - index c WAITING  173
         1 7xzqcrdrnyw1j gc buffer busy    WAITING  198
         3 7xzqcrdrnyw1j gc buffer busy    WAITING  247
         4 7xzqcrdrnyw1j gc buffer busy    WAITING  247

How do you analyze the problem with right hand key indexes?

First, we need to verify that problem is due to right hand indexes. If you have access to ASH data, it is easy. For all sessions waiting for ‘gc buffer busy’ event query the current_obj#. Following query on ASH can provide you with the object_id involved in these waits. Also, make sure the statement is UPDATE or INSERT statements, not SELECT statement[ SELECT statements are discussed below].

select sample_time,  sql_id, event, current_obj#,sum (cnt)  from  gv$active_session_history
   where sample_time between  to_date ('24-SEP-2010 14:28:00','DD-MON-YYYY HH24:MI:SS') and
     to_date ('24-SEP-2010 14:29:59','DD-MON-YYYY HH24:MI:SS')
    group by  sample_time,  sql_id, event, current_obj#
   order by sample_time

SAMPLE_TIME                             |SQL_ID              |EVENT                                   |CURRENT_OBJ#|  COUNT(*)
26-AUG-10 PM               |14t0wadn1t0us       |gc buffer busy                          |       8366|        33
select owner, object_name, object_type from dba_objects where object_id=8366 or data_object_id=8366;

In this example, current_obj# is 8366, which we can query the dba_objects to find the correct object_id. If this object is an unique index or almost unique index, then you might be running in to a right hand growth indexes.

If you don’t have access to ASH then, you need to sample gv$session_wait (or gv$session from 10g), group by p1, p2 to identify the blocks inducing ‘gc buffer busy’ waits. Then, map those blocks to objects suffering from the issue.

select event,    p1, p2, count(*) from  gv$session s
event ='gc buffer busy' and state='WAITING'
group by event, p1, p2
order by 4

You can also use my script segment_stats_delta.sql to see the objects suffering from ‘gc buffer busy’ waits. See below for an example use:

segment_stats_delta.sql v1.01 by Riyaj Shamsudeen

...Prints Change in segment statistics in the past N seconds.
...Default collection period is 60 seconds.... Please wait for at least 60 seconds...

!!! NOTICE !!! This scripts drops and recreates two types: segment_stats_tbl_type and segment_stats_type.

Following are the available statistics:
Pay close attention to sampled column below. Statistics which are sampled are not exact and so, it might not reflect immediately.

NAME                                                             SAM
---------------------------------------------------------------- ---
logical reads                                                    YES
buffer busy waits                                                NO
gc buffer busy                                                   NO
db block changes                                                 YES
gc cr blocks received                                            NO
Enter value for statistic_name: gc buffer busy
Enter value for sleep_duration: 60
WSH                            | WSH_DELIVERY_DETAILS_U1        | 34
APPLSYS                        | WF_ITEM_ATTRIBUTE_VALUES_PK    | 2
WSH                            | WSH_DELIVERY_DETAILS_U1        | 2
INV                            | MTL_MATERIAL_TRANSACTIONS_U1   | 1
ONT                            | OE_ORDER_LINES_ALL             | 1

PL/SQL procedure successfully completed.

How do you resolve ‘GC buffer busy’ waits due to right hand growth index?

In a simplistic sense, You need to reduce the concurrency on the right most leaf block. There are few options to reduce the concurrency, and hash partitioning that unique index (or almost unique index) is a better solution of all. For example, if we convert the unique index as hash partitioned index with 32 partitions, then you are reducing the concurrency on that right most leaf block by 32 fold. Why? In hash partitioning scheme with 32 partitions, there are 32 index trees and inserts will be spread across 32 right most leaf blocks. In contrast, there is just 1 index tree in the case of non-partitioned index. Essentially, Each partition gets its own index tree. Given an ID column value, that row is always inserted in to a specific partition and that partition is identified by applying hash function over the partitioning column.

In the case of non-partitioned index, for example, values from 1000 to 1010 will be stored in the right most leaf block of the index. In the case of partitioned index with 32 partitions, value 1000 will be stored in partition 24, value 1001 will be stored in partition 19, meaning, values are hashed and spread around 32 partitions leading to improved concurrency. This will completely eliminate Right hand Growth index concurrency issue.

Non-partitioned index

Hash partitioned index with 2 partitions

So, What is the drawback of hash partitioning indexes?

If your query needs to do range scan with a predicate similar to ‘id between 1000 and 1005′, then the index range scan will need to scan 32 index trees, instead of one index tree as in the case of non-partitioned tables. Unless, your application executes these sort of queries millions of times, you probably wouldn’t notice the performance difference. For equality predicate, such as ‘id=:B1′, this is not an issue as the database will need to scan just one index tree.

Let’s discuss about reverse key indexes too. As Michael Hallas and Greg Rahn of real-world performance group said (and I happily concur), reverse key indexes are evil. If you are in Oracle Database version 10g, you can create a partitioned index on a non-partitioned table. So, If your application is suffering from a right hand growth index contention issue, you can convert the non-partitioned index to a hash-partitioned index with minimal risk. So, there aren’t many reasons to use reverse key indexes in 10g [ Remember that range scan is not allowed in the reverse key indexes either]. But, if you are unfortunate enough to support Oracle 9i database, you can NOT create a partitioned index on a non-partitioned table. If your application suffers from right hand growth index concurrency issues in 9i, then your options may be limited to reverse key indexes (or playing with sequences and code change or better yet – upgrade to 10g).

2. What if the statement is a SELECT statement?

It is possible for the SELECT statement to suffer from gc buffer busy waits too. If you encounter a scenario in which the object is an index and the statement is a SELECT statement, then this paragraph applies to you. This issue typically happens if there is higher concurrency on few blocks. For example, excessive index full scan on an index concurrently from many instances can cause ‘gc buffer busy’ waits. In this case, right approach would be tune the SQL statement to avoid excessive access to those index blocks.

In my experience, gc buffer busy waits on SELECT statement generally happens if you have problems with statistics and execution plans. So, verify that execution plan or concurrency didn’t change recently.

3. Freelists, Freelist groups

What if the object ID we queried in Active Session History belongs to a table block and the statement is an INSERT statement? We need to check to see if that block is a segment header block. If there are many concurrent inserts, and if you don’t use ASSM tablespace, then the inserts need to find free blocks. Segment header of an object stores the free list [ if you don't use freelist groups]. So, concurrent inserts in to a table will induce excessive activity on the Segment header block of that table leading to ‘gc buffer busy’ waits.

Concurrent inserts in to a table with 1 freelist/1 freelist groups will also have contention in a non-segment header block too. When the session(s) searches for a free block in a freelist, all those sessions can get one or two free blocks to insert. This can lead to contention on that block.

Right approach in this case is to increase freelists, free list groups and initrans on those objects (and might need reorg for these parameters to take effect ). Better yet, use ASSM tablespaces to avoid these issues.

4. Other Scenarios

We discussed just few common issues. But, ‘gc buffer busy’ waits can happen for many reasons. Few of them are: CPU starvation issues, Swapping issues, interconnect issues etc. For example, if the process that opened the request for a block did not get enough CPU, then it might not drain the network buffers to copy the buffer to buffer cache. Other sessions accessing that buffer will wait on ‘gc buffer busy’ waits. Or If there is a network issue and the Global cache messages are slower, then it might induce higher gc buffer busy waits too. Statistics ‘gc lost packets’ is a good indicator for network issues, but not necessarily a complete indicator.

As a special case, if the sequences have been kept with lower cache value, then blocks belonging to seq$ table can be an issue too. Increasing cache for highly used sequence should be considered as a mandatory task in RAC.


In summary, next time you encounter this issue, drill down to see which object types and statements are involved. Debug to understand the root cause as ‘gc buffer busy’ waits are usually symptoms. You can read this blog in the traditional format as gc buffer busy waits_orainternals.pdf