Who's online

There are currently 0 users and 33 guests online.

Recent comments


Oakies Blog Aggregator

Removing Redundant Startup/Restart for the OMS Service in Windows

I’ve been told many times that the OMS for EM12c can take quite some time to start on Windows.  Some told me it took anywhere from three to up to fifteen minutes and wanted to know why.  I’ve done some research on the challenge and it is a complex one.

Let’s start this post by stating that even though I’m focusing on the OMS service that is part of the Windows installation of EM12c from Oracle, that in no way is it to blame, nor is it the only application to have this problem, (so this post may help many others) and it has more to do with over-engineering on MANY different non-Oracle levels and in no way is it a bug.  At the same time, it can really impact the quality of user experience with EM12c on Windows and it helps if you know WHAT is causing the challenge vs. what will easily have fingers pointed to as the blame. We all know that Oracle is blamed until proven innocent, so it’s important that we understand what is happening to correct the problem vs. just pointing fingers.

As most DBAs aren’t as familiar with the Windows OS Platform, lets quickly review what a Windows service is and why its important-

A Microsoft Windows services, formerly known as NT services, creates long-running executable applications that run in their own Windows sessions. These services can be automatically started when the computer boots, can be paused and restarted, and do not [require a] user interface.

When installing Enterprise Manager 12c on Windows or installing even the Oracle database on the Microsoft Windows OS platform, a service is created to support the application.  This service can be created a number of ways, but for Oracle, they support the following:

oradim - new -[sid] -intpwd [password] -maxusers [number] -startmode [auto|manual] -spfile [directory location of spfile]
emctl create service [-oms_svc_name  -user  -passwd ]

and then we have Windows method of a service command:

sc create [service name] -binPath= "[path to executable to start app and argument]" start= [auto|manual] displayName= [name to display]

Each of these options are supported to create many of the different services that are needed to support different features/targets in Enterprise Manager and are used as part of the installation process via the Database Configuration Assistant, the Network Configuration Assistant and the Oracle Installer.

One of the enhancements that they are working on for EM12c is moving the java thread startup and stop from serial to multi-threaded processing.  This is going to speed up the start and stop of the OMS extensively, (anyone tracing the startup of an OMS to see where time is being spent will undoubtedly see that over 80% is the weblogic tier….)

Until this enhancement is made, the extended time trips a few safety measures that are built at a number of levels into services to ensure they stay up.  If a service isn’t up, well, you aren’t going to be using the application, so unfortunately for us, this is where the OCD of the development world has come back to haunt us…. :)

Tracing and Advanced Logging

First, we need to get more info from our node manager to see what is starting the service and when it’s timing out and what is restarting it.  We can do this by going to the following:


Make a backup copy and then choose to edit the original file

By default, the loglevel=info

There are numerous log level settings:

  • SEVERE (highest value)
  • INFO
  • FINE
  • FINEST (lowest value)

My recommendation is to set it to FINEST if you really want to log whats going on, but don’t leave it there, as it will produce a lot of logging and unless you are trouble-shooting something, there just isn’t any need for this amount of fine detail, so remember, a restart of the OMS service is required to update any change to the logging.

Update the loglevel info, save the file and restart the service.  The data will be saved to the following file:


To understand more about tracing and logging, see the Oracle Documentation that can take you through it, (as well as save me a lot of typing… :))

Trace and Log Files

em.start                    Tells you if there were any time outs and at what step the timeout occurred.

OracleManagementServer_EMGC_OMS1_1srvc.log  This is the logged startup and shutdown of the actual service.

nodemanager.log     This is the log of the nodemanager’s interaction with the OMS service.

EMGC_OMS1.out    Steps of weblogic startup, java threads and times.

emctl.log                  Also shows timeouts set by emctl start process.

emoms_startup.trc  Shows timeout by connections, (including sqlnet timeouts)

emoms_pbs.trc      Shows actual timeouts at java level

There’s more data out there than this, especially if you use the EM Diagnostics kit, but just to start, it’s a good beginning.


The OMS Service in Windows uses a standard naming convention, so it should look very similar to the one below:



Even though we are seeing one service, it can be controlled by many different daemons to ensure it is always running, as well as managing how long it has before timing out when it starts and restart options.

1. Service Timeouts:

There are two in the registry, depending on the version of Windows server that you have.  These are here to assist you, but due to redundancy, they could impact you as well. These two values control how long to wait for a service to start before timing out and how long to before killing a service or if unresponsive to kill.

To see these, you will be working in the registry.  The registry is the nervous system of the OS, so take great care when working with it and always make a backup of the folder you are working in before making any changes.

Click on Start –> Run Enter “Regedit” and click OK Go to Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control

Right click on the Control folder, choose export and save off the registry file as Services_TO.reg
In right hand “details” view and remove the following values, (either or both may be present) or even better, set them to a time that will allow enough time for the OMS to start before these come in and try to restart:

Remember, any changes you make here do not take effect until after you restart the computer.  You can revert any changes by importing the saved registry file backup you made beforehand and performing another restart.

2. Auto restart of the OMS by the Node Manager

The node manager’s job is to ensure that the Windows Service is up and running for OMS. It is there check the OMS service and if it sees its down, restart it.  If this is attempting to restart the OMS service while the registry setting are attempting to restart the OMS service, well, you are going to start seeing the issue here.

To stop Nodemanager from attempting to auto-restart service upon timeout:

Go to $GCINST_HOME/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/nodemanager/

Create backup of the file and then open the file in an editor such as notepad or wordpad:
go to the following line: AutoRestart=true
Change value to “false
Save the changes and the node manager will no longer attempt to autorestart the service if it sees it down once restarted.

3.  Clustered services added to a failover, Oracle Failsafe or other clustering process, (not RAC).

Clustering, at an OS level is primarily for high availability, so redundant checks and restart options are built in everywhere for Windows services added.  In the example of a failover cluster, the OMS service is added to the failover node.


This allows for it to automatically fail over with the shared Virtual server and shared storage to the passive node and start up if there is a failure.  The clu2 Virtual server has policy settings telling the OS what to do in case of failure and how to restart.  This, by default is applied to all dependent resources and shared storage allocated to it.


As you can see in the clu2 properties, the policies have been set if:

  • A failure occurs, restart services and storage on the original active node.
  • If the restart fails, then failover to the passive node.
  • If the service or resource doesn’t start within 15 minutes, timeout.

You’ll also notice there is an option to not restart, as well as how soon a restart should be attempted.

You can update this at the server level properties, which will automatically propagate to the dependent resources, (it is the master of all policy settings, so you should set them here.)


We have now asked in case of failure, do not restart and don’t timeout for 30 minutes.


I’ve shown you all the redundant settings that have been built in to ensure that the service is restarted and how long it can attempt to start before timing out and if it should restart and how long between restarts.  The key to all this is knowing that only ONE should be managing this.  If you decide to let Oracle manage it, then use the Node Manager settings and disable option 1 and 3.  If you decide to let Microsoft handle it at the Service level, then disable 2 and 3 and so on.

Understand that if they are all left to manage on top of each other, you will have one timing out the start up while the another is still attempting to start and another notes it’s down and issues a restart.  If you wonder why it’s taking 15 minutes or more to start your OMS on Windows, I’ll bet money you trace out the session and you’ll find more than one process attempting to start or restart the poor thing in your logs.

Honesty dictates that we shouldn’t just blame a complex issue on any one contributor and realize that with added complexity comes the need for added skills to ensure that you have the best configuration to support the technology.  Taking the time to trace out and understand the issue will help make that happen.







Copyright © DBA Kevlar [Removing Redundant Startup/Restart for the OMS Service in Windows], All Right Reserved. 2014.

Shrink Tablespace

If you start moving objects around to try and reclaim space in a tablespace there are all sorts of little traps that make it harder than you might hope to get the maximum benefit with the minimum effort.  I’ve written a couple of notes in the past about how to proceed and, more recently, one of the difficulties involved. This is just a brief note about a couple of ideas to make life a little easier.

  • Check that you’ve emptied the recyclebin before you start
  • Before you try moving/rebuilding an object check that the total free space “below” that object is greater than  the size of the object or you’ll find that parts of the object move “up” the tablespace.
  • Before moving a table, mark its indexes as unusable. If you do this then (in recent versions of Oracle) the default behaviour is for the index space be freed as the segment vanishes and you may find that the extents of the table can move further “down” the tablespace.  (If you’ve kept tables and indexes in different tablespaces this is irrelevant, of course).
  • When you move an object think a little carefully about whether specifying an minimum initial extent size would help or hinder the move.
  • Don’t assume that moving the “highest” object first is the best strategy - work out where you expect the final tablespace HWM to be and you may find that it makes more sense to move other objects that are above the point first.
  • Moving objects with several small (64KB) extents first may allow you to free up larger (1MB, 8MB) gaps that can be used by other larger objects”””””
  • Creating a new tablespace and moving objects to it from the old tablespace “top down” may be the quickest way to proceed.  Work towards dropping the old tablespace HWM regularly.


HOWTO: Validate XML content against an XML Schema

Somewhere on this site is a post regarding how to validate against an XML schema…

Will the REAL Snap Clone functionality please stand up?

Every so often our competitors like to spread false information about our solutions so as to gain unfair advantage. I wanted to address this for the Enterprise Manager Snap Clone functionality, so this blog post describes the Snap Clone solution as it currently stands (November 2014).

Let’s start by introducing Snap Clone functionality. I blogged on that a few months back, but a few months can be an eternity in software development terms, so here’s an update on the salient points of what Snap Clone does for you over the various EM releases:

  • EM12cR2 provided Snap Clone for NAS storage (NetApp and Sun ZFSSA).  It provided RMAN backup based clones, and included the Snap Clone Analyzer to show you the storage savings you could make using Snap Clone
  • EM12cR3 added in support for Snap Clone using the Solaris File System (ZFS) and admin flows for Snap Clone for PDB’s (pluggable databases)
  • EM12cR4 added a lot more
    • Snap Clone using CloneDB – this is the biggie, as it means Snap Clone can now be used with ANY Oracle database release that supports CloneDB, regardless of what storage it’s on
    • Data Guard standby as a test master – allows offloading the impact of creating the test master from your Production environment
    • NetApp Ontap 8.x cluster mode support
    • Certification for engineered systems, with I/O over Infiniband
    • Support for NFSv4
  • And coming down the pipe, support for:
    • Integrated data lifecycle management
    • Snap Clone using EMC SAN and ASM
    • Admin flows for test master creation
    • Integration with masking, patching, upgrades etc.

Looking at it from the cloning options that are now supported, it means you can either provide full clones using RMAN Restore, RMAN Duplicate or Data Pump, or thin clones via either software solutions (ZFS and CloneDB) or hardware solutions (Sun ZFSSSA, NetApp and soon EMC).  Let’s touch on some of these in a but more detail.

Snap Clone using Solaris File System (ZFS)

Snap Clone using ZFS uses a single stock Solaris 11.1+ image which can be either physical or virtual (note: it doesn’t use the Sun ZS3 appliance).  It supports both NAS and SAN.  If you are using SAN, then mount the LUNs as raw disk and format with the ZFS filesystem.  It’s important to note here that this does NOT require any snapshot/cloning licenses from the storage vendor, as these features are available for free.

Additional features provided with this solution include compression, de-duplication, I/O caching and so on.  If you also need HA in this configuration, that can be handled externally either via Solaris Clusters, or by using the HA features of the underlying hypervisor.

Diagrammatically, the configuration looks like this:


CloneDB using dNFS

With CloneDB using dNFS, you get the ability to create thin copies of a database from RMAN image copies. This uses the NFS v3 client that’s embedded in the database technology since Currently, this is supported for single instance databases, but only on file systems (i.e. ASM is not yet supported).

The advantages of this approach include:

  • It’s easy to setup
  • No special storage software is needed
  • It works on all platforms
  • It’s both time efficent (instantaneous cloning) and space efficient (you can create multiple clones based on one backup
  • It uses dNFS to improve the performance, high availability and scalability of NFS storage

Snap Clone on ASM and EMC Storage

Using Snap Clone on ASM and EMC storage provides the ability to create ‘live’ thin clones of databases that are on ASM. A live clone is NOT snapshot based but rather a live clone of the database that can be within the same cluster or indeed another one. Both single instance and RAC are supported – supported versions are or higher of the database, 11.2 and higher of the Grid Infrastructure code. This functionality works on both EMC VMAX (with Time Finder VPSnap) and VNX storage appliances.

Diagrammatically, the configuration looks like this:


End to End Automation

Now let’s look at the data lifecycle and how data moves through this environment. To start off with, there are a few concepts you need to understand:

  • Production DB – Obviously, you need to identify the production database used for cloning.
  • Backups – As most (hopefully all!) of us currently do, you need to take regular backups (RMAN, datapump exports, etc.) These backups can then be used through this process as well.
  • Masking / Subsetting – When moving data from the Production database, clearly it’s important to mask sensitive data. Optionally, you may also want to (or indeed, have to with very large databases) subset to reduce the storage footprint
  • Test Master – the Test Master is a sanitized (see previous bullet) copy of production data for use in Dev / Test environments OR a Data Guard Standby database. It can then be used as the source for our snapshotting.
  • Clones – Depending on your needs, these may be full clones or snap (thin) clones. Full clones are often used for performance / stress testing; snap clones may often be used for functional testing. Which one you use is generally determined by the amount of storage you have available to you and the size of the Test Master.
  • Refresh – the Refresh process is what you use to keep your clone in sync with data changes in Production.

How these concepts all relate together is possibly best shown by the following:

Data movement1

A couple of points of explanation:

  1. Notice that the data in the Test Master has been masked but still remains in a format that looks similar to the original data. That’s important if you want to use the clones to examine performance plans that may have gone awry for some reason. One drawback of using the Data Guard standby approach is that because of its very nature, masking and subsetting is not possible in this scenario. You would need to take manual, discrete copies of the data from Production, which could of course be automated to occur at scheduled intervals.
  2. On the flip side, using the Data Guard Standby means that data refresh to the Test Master is both automatic and instantaneous, so your data can be much more up to date than it might be if you were using discrete copies.
  3. The refresh process can occur either against the Test Master or to backups of your Production database. If you have configured this as a self-service admin flow, self-service users can then refresh their existing clones with new data without you needing to be involved.

Full or Snap Clone: How It Works

With that in mind, let’s talk now about the details of how it works. In simple terms, the Test Master (or the Data Guard standby if you’re using that) is regularly refreshed with current data from Production. From the Test Master / Standby, you can make as many scheduled or manual storage snapshots or RMAN backups as you like. These are called “Profiles” and are indicated by the t0, t1, t2, … tN times in the diagram below. From each of these profiles, clones can be taken. Each usercan have a personal read-write database clone, as the data was at the time the profile was created, and of course, can take as many private backups of their clone as they desire:

how it works

Self Service Provisioning

Clearly, all of this is not much use to you as an administrator if you’re the one who has to keep building all of this, so it’s important to have a way of allowing users to build their own environments while ensuring resource usage is restricted based on roles. That’s where self service provisioning comes in. EM12c comes with an out of the box self service portal capability. You as the administrator create a catalog of different database configurations, possibly with a variety of datasets, which the self service user can then select. Have a look at the following diagram:

Self Service Portal

The larger box at the back left shows the standard Database Cloud Self Service Portal, as seen by a self service user. To the left, you can see this particular user has created 4 databases, along with the memory and storage consumption they have taken. This particular user has been granted permission to create 12 databases, using a maximum of 50 Gb of memory and 100 Gb of storage. These limits have been set by you as the self service administrator.

You are also responsible for building a number of templates to define the configurations they can choose from. When the self service user clicks on the “Request Database” button on the Database Cloud Self Service Portal, the box on the right appears, showing them the service templates they are allowed to choose from. In this example I’m showing you, they can choose from:

  • Full 1.5 TB Database Clone – this is a full copy of the Production Database
  • Generic Application Schema – a copy of an database
  • HR Sample Schema – allows the user to create a copy of the HR sample schema with data
  • Small 200 Gb database from RMAN backup – provides a subset of an existing database for functional testing
  • StoreFront Application Schema – creates a copy of an in-house application called StoreFront, complete with data

(NOTE: These are just examples of the sort of thing you can add as templates. Obviously some of these would fail if the self service user tried to create them because the resource quotas you have given them would be exceeded. :) )

Now, let’s look at an example of the user interface you would see specifically for the Snap Clone functionality. As the self service user, when you select a template to create a thin clone, it takes you to a page that looks like this:


The inputs you provide are:

  • Database SID – the SID for the database that will be created as part of this request
  • Database Service Name – the service name that you will use to connect to this database after creation
  • Optionally, a schedule – when will the database be created, and how long will it be available for. The defaults are to start the creation immediately, and a duration of indefinitely, but you can change these to meet your needs
  • User Name and Password – the username and password that will be assigned for you to manage the database that is being created
  • Service Instance Properties – again, these are optional but you can specify things like Lifecycle Status, contact name and so on
  • Snapshots – this is really the most important part, as it is here where you specify the snapshot time (i.e. the profile) that you will use to create the thin clone. In the example shown here, we are using a profile built on September 16, 2014

Once the database has been created, you will see it listed in the Services region of the Database Cloud Self Service Portal:


By clicking on the name of the service as shown above, you are taken to the database home page (note this is still all being done by the self service user, NOT the self service administrator):


It is from here that you can click on the “Refresh” button to refresh the data to a more recent profile. Clicking that button brings up a pop-up window allowing you to select the date you want to refresh your clone to. In this example, I can choose a snapshot that was taken on September 19:


That’s how straightforward the process is! And of course, in inimitable Oracle style, there is so much more coming just around the corner. You can soon

Addressing the Misinformation

Now that you’ve seen the capabilities of the Snap Clone product, let’s come back to the issue I raised at the beginning of this post – misinformation. I’m not really wanting to take aim at any particular company involved here, just the data, so I won’t name them here except as “Product X”. While some of what they are claiming is certainly correct, some of it is only partially true and some of it is just plain wrong. The claims that fall into these latter categories included:

  • Snap Clone allows customers to leverage existing NetApp or ZFS storage hardware more efficiently but Product X installs on commodity hardware – well, yes Snap Clone does allow that, but as I mentioned above it also supports CloneDB using dNFS, and ASM on EMC storage. Adding Clone DB using dNFS, which is functionality supported natively since the release of the Oracle database, means that Snap Clone is supported on any hardware that Oracle Database or later is supported on, not just on NetApp or ZFS. And of course, the addition of EMC storage just broadens that support base even further.
  • Product X is the only technology in the industry to provide “Live Archive” – archiving and providing point in time versions of a database. This is EXACTLY what Snap Clone provides, so please don’t say your product is the only one in the industry that does that!
  • Product X is the only technology in the industry to provide “Business Intelligence” – 24×7 ETL windows and fast refresh of new data in minutes. Again, not true. True business intelligence normally requires a summarized copy of your Production data, plus data from other sources as well, so any product that simply refreshes from your Production database would not have the capabilities needed by BI. If, however, your BI requirements are simple enough that they can be resolved by having a copy of just your Production data, then Snap Clone provides that capability as well.
  • Product X is the only technology in the industry to provide “Data Center Migration” – “Product X supports active replication between Product X appliances making cloud migration simple, efficient and continually synced between in house and cloud.” That functionality also exists in EM12c.
  • “Snap Clone is a feature that is a simple and nice enhancement for the usage of specialized existing hardware, either Netapp or ZFS storage to make static thin clones at one point in time.” – a competitor’s words. As already mentioned, Snap Clone is NOT restricted to using specialized hardware AND clones can be refreshed as needed, so this statement is just plain wrong.
  • Scale, scale, scale – With Snap Clone, you can scale from 1 to 1000’s of clones. Some competitive tools that are out there would require multiple instances of their product to be deployed to achieve that, all of which adds overhead.
  • Protection of your existing investments – using Snap Clone, you have the choice between hardware solutions that you might already have, as well as software solutions. We also use trusted technologies like Data Guard for test master refresh.

I could go on, but let’s leave it there and look more at another important area.

Post Provisioning

Once you have provisioned your data, there are a lot of other important areas that need to be looked at as well. It’s simply not enough to provision new databases and then leave the management of those environments alone. All that does is lead to database sprawl, creating management headaches. So what are the other areas you need to look at? They include:

  • Patching – Any computing environment will, over time, require patching, either as security and bugs issues are found and addressed or as more major releases occur. EM12c provides a fully integrated patch management functionality to address this space
  • Compliance – EM 12c provides a rich and powerful compliance management framework that automatically tracks and reports conformance of managed targets to industry, Oracle, or internal standards. Enterprise Manager 12c ships with compliance standards for Oracle hardware and software including Database, Exadata Database Machine, Fusion Middleware, and more. These compliance standards validate conformance to Oracle configuration recommendations, best practices, and security recommendations.
  • Performance Management – EM12c includes a variety of tools to management performance, from ASH Analystics, SQL Performance Analyzer and Database Replay at the database level to a complete Application Performance Management (APM) solution for custom applications and Oracle applications (including E-Business Suite Siebel PeopleSoft JD Edwards and Fusion Applications).
  • Chargeback – Chargeback is used to allocate the costs of IT resources to the people or organizations who consume them. While it can be applied in situations where IT resources are dedicated, it is particularly relevant in situations where resources are shared, as without some way to meter and charge for consumption there will be a tendency for users to use more resources than they need. This problem is exacerbated in cloud environments where users are able to provision their own resources using self-service.

All of these areas and so many more are covered in EM12c, along with the Snap Clone functionality we started this post looking at. EM12c is Oracle’s flagship management products for all your database needs, and is in sync with database releases. We provided support for functionality such as RAC and the multi-tenant architecture from day 1 of the software being released, whereas competitor products can take months to catch up. In addition, EM12c provides a full security model, including role based access control, which is used by many Fortune 1000 customers. So with all of that why would you look at a point solution that only covers one part of managing your Oracle infrastructure?

Foreign Archived Log in #Oracle – what does it mean?

When you look into V$RECOVERY_AREA_USAGE, you see a strange row at the bottom:

SQL> select * from v$recovery_area_usage;

----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         10.18                         0              73          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

Curious what that could be? You will see values other than zero on a Logical Standby Database:

SQL> connect sys/oracle@logst as sysdba
SQL> select database_role from v$database;


SQL> select * from v$recovery_area_usage;

----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         14.93                         0               9          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                  2.03                         0              26          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

In contrast to a Physical Standby Database, this one writes not only into standby logs but also into online logs while being in standby role. That leads to two different kinds of archive logs:
Foreign Archived LogsWhen DML (like insert and update) is done on the primary 1) that leads to redo entries into online logs 2) that are simultaneously shipped to the standby and written there into standby logs 2) also. The online logs on the primary and the standby logs on the standby will be archived 3) eventually. So far that is the same for both physical and logical standby. But now a difference: Logical standby databases do SQL Apply 4) by logmining the standby or the archive logs that came from the primary. That generates similar DML on the standby which in turn leads LGWR there to write redo into online logs 5) that will eventually get archived 6) as well.

A logical standby could do recovery only with its own archive logs (if there was a backup taken before) but not with the foreign archive logs. Therefore, those foreign archive logs can and do get deleted automatically. V$ARCHIVED_LOG and V$FOREIGN_ARCHIVED_LOG can be queried to monitor the two different kinds of logs.

That was one topic of the course Oracle Database 12c: Data Guard Administration that I’m delivering as an LVC this week, by the way. Hope you find it useful :-)

Tagged: Data Guard, High Availability

Parallel Fun

As I write, there’s an ongoing thread on Oracle-L that started with the (paraphrased) question: “I’ve got this query that returns 7 million rows; when I change it to ‘select count(*)’ it returns in 4 seconds but when we display the full result set on screen it takes hours, and every second or two the screen pauses; how do I make it go faster.”

The general rapid response was: “You shouldn’t be running 7M rows to a screen – the time is the time for the network traffic and display.”

The first part of the statement is right – the second part is quite likely to be wrong and there’s a very strong hint in the question that makes me say that, it’s the “pauses every second or two”. Of course we don’t know what the OP isn’t telling us, and we don’t know how accurate he is in what he is telling us, so any ideas we have may be completely wrong. For example, we haven’t been given any idea of how long a “pause” is, we don’t really know how accurate that “second or two” might be and whether “every” is an exaggeration, and maybe the query is returning CLOB columns (and that could make a big difference to what you can do to improve performance).

If we take the statement at face value, though, there is one very obvious inference: although some of the time will be due to network traffic time, most of the time is probably due to Oracle doing something expensive for a significant fraction of the rows returned. The pattern of activity probably looks like this:

  • client: call server to fetch next array of rows
  • server: spend some time populating array  – this is where the client sees a pause
  • client: display result array
  • client: call server to fetch next array of rows
  •  etc…

Here’s a trivial example:

connect / as sysdba
set arraysize 500
set pagesize 40

        o1.spare1 ,
        select  max((ctime))
        from    obj$    o2
        where   o2.owner# = o1.owner#
        and     o2.obj# < o1.obj#
        ) ct
from obj$ o1

On my laptop, running an instance of with about 80,000 rows in obj$ (and a lot of them owned by SYS), I can count seconds and find that (approximately) I alternate between one second watching results scrolling up the screen and one second waiting as the server generates the next 500 rows.

Of course it’s possible to argue that the problem really is the network and nothing but the network struggling to cope with the never-ending stream of little packets produced by 7M rows. Could there be a choke point that causes the data to stop and start with great regularity, maybe – but previous experience says probably not. I have experienced bad network problems in the past, but when they’ve occurred I’ve always observed extremely random stop/go behaviour. The regularity implied in the question makes the Oracle-based problem seem far more likely.

Conveniently a couple of people asked for more clues – like the query text and the execution plan; even more conveniently the OP supplied the answers in this response. Since the email format makes them a little hard to read I’ve copied them here:

SELECT  bunch of stuff.....,

                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr1
                WHERE   sr1.SID                    = slv.SID
                        AND sr1.RELATIONSHIP_LEVEL = '1'
                GROUP BY sr1.SID
                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr2
                WHERE   sr2.SID                    = slv.SID
                        AND sr2.RELATIONSHIP_LEVEL = '2'
                GROUP BY sr2.SID
                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr3
                WHERE   sr3.SID                    = slv.SID
                        AND sr3.RELATIONSHIP_LEVEL = '3'
                GROUP BY sr3.SID
                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr4
                WHERE   sr4.SID                    = slv.SID
                        AND sr4.RELATIONSHIP_LEVEL = '4'
                GROUP BY sr4.SID
                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr5
                WHERE   sr5.SID                    = slv.SID
                        AND sr5.RELATIONSHIP_LEVEL = '5'
                GROUP BY sr5.SID
FROM    service_lookup slv
        LEFT JOIN service_location sl
        ON      sl.service_location_id = slv.service_location_id;

Plan hash value: 1570133209

| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT                 |                      |  7331K|  5593M|  1877   (5)| 00:00:01 |        |      |            |
|   1 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   2 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   7 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   9 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |
|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |
|  11 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  12 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |
|  14 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |
|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |
|  16 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  17 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |
|  19 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |
|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |
|  21 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  22 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |
|  24 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |
|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |
|  26 |  PX COORDINATOR                  |                      |       |       |            |          |        |      |            |
|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |
|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  29 |     PX RECEIVE                   |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |
|  30 |      PX SEND HASH                | :TQ60000             |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |
|  31 |       PX BLOCK ITERATOR          |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |
|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |
|  33 |     PX RECEIVE                   |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  34 |      PX SEND HASH                | :TQ60001             |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |
|  35 |       PX BLOCK ITERATOR          |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |
|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |

We have a simple two-table outer join, and five scalar subqueries in the select list. (Not being very familiar with the various XML calls I had no idea of what the scalar subqueries were doing, or how they produced a result, beyond the fact that they were querying and aggregating multiple rows. In fact the combination of calls does much the same as listagg(), though it allows for a CLOB result (which could be part of the performance problem, of course) rather than being limited to a varchar2() result).

Would you like to guess at this point why I constructed my demonstration query again obj$ the way I did when presenting the idea of high-cost per row queries as a reason for regular pauses in the output ? The execution plan matched one of my two initial guesses about what the query was going to look like. When you “select count(*) from {this query}”, the optimizer will factor out the scalar subqueries and only have to count the result set from the hash join – and it might even manage to use a couple of parallel index fast full scans to get that result rather than doing the tablescans. When you run the query you have to run the scalar subqueries.

If we trust the statistics, we have 5 subqueries to run for each row of the hash join – and the hash join is predicted to return 7.3 million rows. Given that the subqueries are all going to run parallel tablescans against a fairly large table (note – the cost of the tablescans on SERVICE_RELATIONSHIP is 368, compared to the cost of the tablescan on SERVICE_LOCATION which is 366 to return 3.1M rows) that’s an awful lot of work for each row returned – unless we benefit from an enormous amount of scalar subquery caching.

Here’s another performance threat that the plan shows, though: notice where the PX SEND QC operation appears – that means the PX slaves send their (7M) rows to the Query Co-ordinator and the QC is responsible for doing all the work of running the scalar subqueries. Another interesting little threat visible in the plan shows up in the TQ column – the plan uses six “data flow operations” (using the original naming convention, though that changed some time ago but survived in the column names of v$pq_tqstat). In principle each DFO could allocate two sets of PX slaves (and every DFO could have a different degree of parallelism); in this example DFO number 6 (the driving hash join) uses two sets of slave, and the other five DFOs (the scalar subqueries) use a single set each. The upshot of this is that if the default degree of parallelism in play is N this query will allocate 7N parallel query slaves. It gets a little nastier than that, though (based on checking the output from v$sql_plan_monitor), because each time one of the scalar subqueries runs Oracle seems to allocate and deallocate the slaves that are supposed to run it – which is probably going to cause some contention if there are other parallel queries trying to run at the same time.


So what could you do with this query ? It depends on how much change you want to make to the code.

It’s possible that an index on service_relationship(relationship_level, sid) – with compress 1 – might help if it’s very precise, and if the target table stays in the buffer cache for the duration of the query – but, in the absence scalar subquery caching that could still leave the query co-ordinator executing 35 million (5 queries x 7 million rows) subqueries in a serialised process.

A better bet may be to convert from subqueries to joins – remembering that the listagg() / xmlserialize() calls will require you to aggregate (which means sorting in this case) an estimated 25 rows per driving row per relationship_level; in other words you may need to sort 7M * 125 = 875M rows – but at least you could do that in parallel, and there’s always the possibility that the estimated 25 drops off as you work through the different levels. You could choose to do 5 outer hash joins or (as Iggy Fernandez outlined in the thread) you could do a single outer join with a decode on the relationship_level. Another variation on this theme (which would probably have a plan showing ‘join then aggregate’) would be to ‘aggregate then join’. It’s possible that creating a non-mergeable inline view for the 5 values of relationsip_level from a single table access, aggregating it to produce the five required columns, then using the result in an outer join, would be the most efficient option. In the absence of a detailed understanding of the data volume and patterns it’s hard to make any prediction of which strategy would work best.


I may be wrong in my analysis of this problem. When I first saw the question the reason for the performance pattern suggested an “obvious” design error in either the SQL or the infrastructure, and when I saw that the query and execution plan matched my prediction it became very hard for me to think that there might be some other significant cause.

There were a couple of interesting details in the execution plan that made me pursue the problem a little more. In the first case I built a very simple model to get an estimate of the time needed to display 7M rows of a reasonable width in SQL*Plus running across a typical LAN (my estimate was in the order of 45 minutes – not hours); then I spent a little more time (about 10 minutes) to build a model that reproduced the key features of the execution plan shown.

I then spent two or three hours playing with the model, and I’ll be writing a further blog with some of the results later on. One detail to carry away today, though, is that in 12c Oracle can do a new form of subquery unnesting which transformed the query from its 5 scalar subquery form into the seven table join form that was one of the suggestions made on the thread; even more interestingly, if I blocked the unnesting (to force the subquery execution) Oracle came up with a new operator (EXPRESSION EVALUATION) that allowed it to run the subqueries from the PX slaves before passing the results to the query co-ordinator – in other words eliminating the serialisation point.

To be continued …

Quick Links to Helpful Literature on Oracle Database In-Memory Column Store.

I’m surprised to find that Google is not cleanly ranking the helpful set of blog posts by Oracle’s Maria Colgan on the Oracle Database 12c In-Memory Column Store feature so I thought I’d put together this convenient set of links. Google search seems to only return a few of them in random order.

Over time I may add other helpful links regarding Oracle’s new, exciting caching technology.

Starter Information

Getting Started With Oracle Database In-Memory. Part I.

Getting Started With Oracle Database In-Memory. Part II.

Getting Started With Oracle Database In-Memory. Part III.

Getting Started With Oracle Database In-Memory. Part IV. 

In-Memory Column Store With Real Application Clusters

The following are links to information about Oracle Database In-Memory on Real Application Clusters:

Oracle Database In-Memory on RAC. Part I.

In-Memory Product That Requires Proprietary Storage?

How could the brand of storage matter for an in-memory cache feature? Good question.

Fellow Oaktable Network member Christian Antognini has produced a very important article regarding how Oracle Database 12c In-Memory Column Store with Real Application Clusters is questionable unless using Oracle storage (Exadata, SPARC SuperCluster).  I found Christian’s article very interesting because, after all, the topic at hand is an in-memory cache product (a.k.a., In-Memory Column Store). I fail to see any technical reason why Oracle wouldn’t support an in-memory product with blocks from any and all storage. It is in-memory after all, isn’t it? Please visit Christian’s article here: The Importance of the In-Memory DUPLICATE Clause for a RAC System.






Filed under: oracle

Audio semi-Visual Presentation on Clustering Data in Oracle

I suppose it had to happen eventually but one of my presentations has ended up on YouTube. It’s a recent presentation I did for the Oracle Midlands user group in September.

The topic is (as the title of this blog post hints at!)Boosting select performance by clustering data. The video consists of the slides I presented, changing as the presentation progresses, with my audio over the top. It goes on for a bit, close to an hour, but you could watch a section and then go and do something else before watching a bit more.

I have to say, it is very odd hearing my voice (and the slight touch of the “brummie” {Birmingham} accent coming through) and I do wince at the places where I blather or say something slightly wrong or make a joke that involved a visual element that is lost. Oh well, at least you don’t see me wandering around and jumping up,literally, to point out bits on the slides.

I’m glad to say I will be repeating a slightly more polished version of the presentation at this year’s UKOUG Tech14 conference in December. I was a bit under the weather back on September the 16th, due to having just returned to the Working Life, and with this perfect example of what I did then I should be able to make the next shot at it a cracker… maybe.

On the topic of Oracle Midlands, I like this user group that is run by Mike Mckay Dirden, the meetings consist of evening presentations in Birmingham with a bit of support from Redgate. This includes half-time samosas to keep you going. The next meeting is described here and is on Tuesday 25th November. This meeting has two presentations by my friend Bjoern Rost, who is an Oracle Ace Director {gasps of appreciation from the audience} and a consummate presenter. I wish I could make it there as I would like to share a pint with Bjoern (well, he’ll have wine or a cocktail I suspect as he is not a beer fan) as well as some of my other friends up that part of the country.

EM12c, Rel. 4, OMS and OMR Health, Part II

There are a large number of “moving parts” when performance tuning or trouble shooting an Enterprise Manager environment.  The new EM performance features, (available in release are there to assist you in understanding the source of the issue and can really make the difference for those that are unfamiliar with the challenges of Weblogic, java, network or other complexities that make up EM12c and aren’t commonly thought of as part of the DBA’s job role.

Now that we’ve finished with the Health Overview, we can look deeper into the health and performance of the two most well known components of the EM12c architecture, the Oracle Management Repository, (OMR) and the Oracle Management Services, (OMS).

Due to the impressive features offered in the new EM performance consoles, I’m going to break these up into multiple posts and start with OMR and focus on the Repository Tab.

The Repository

The Repository Tab is accessed via the same Setup menu in EM12c console:


Once accessed, there are a number of tabs at the top of the page, Repository, Metrics and Schema.  Starting with the Repository tab, (Left to Right) we’ll inspect what specific performance data is important when reviewing an OMR.

Repository Tab

The Repository page is going to display a number of graphs that tells you everything from specific information about the OMR database, incidents involving the OMR database and even how collections performance at the database repository level.  It is important to remember that this tab is all about the Repository, (OMR) and should not be confused with the Service, (OMS).

Basic Information

We begin by viewing information about database name, space allocated, space used number of current sessions actively connected to the OMR.


All of these are links, so you can click on the link and a you’ll be taken to a detailed view of the data and more information to investigate if you have questions.

For the Target type for the OMR, you can click on the target name, (db name) and the console will take you to the home page for the OMR database.

Click on Last Backup date and the console will take you to the backup activity report for the OMR database.

Click on Space Used and the console will then bring you to the Management Services and Repository page drill down for Repository Tablespace Used.


There is a ton of information in this location and we’ll dig deeper into it as a separate post, but just to understand how user friendly the interface is, note the links you have at your fingertips right here.

If you click on Management Service Repository Sessions, the following table with session type and counts will display:



On the right hand side of the top of the page, we access to the incidents linked to the OMR.  No other incidents will be listed except the ones connected to the OMR, so this is a great place to check first when you are experiencing issues.


Notice that it includes incidents for page processing time outs to the OMS and collection timeouts.  This can be very helpful when you are experiencing slow response and need to know where the issue is sourced from.

Initialization Parameters for the OMR

Not only does the next graph identify what size category you fall into for your Enterprise Manager environment, (small, medium or large) but it also lets you know if any of your parameters are outside of the recommended sizing for that category.


In our example, you can see that we don’t have a set MEMORY_TARGET value and that is outside of our compliance as we recommend to have this set to one.  We can also view each of the values we do have set and how they compare to what Oracle thinks the minimum value for that category of OMR size should be.

Job Scheduler Status

To the right of the Init parameters is all the graph with information pertaining to all the jobs running in the OMR to support the Enterprise Manager environment.  Unlike Job Activity in the console, this is reporting all those jobs that are taking care of the Enterprise Manager.

If a job fails and you have the option to edit the schedule to run again, (the glasses icon) then you can click on the glasses and the following popup will show and you can then enter a new time for the job to retry:


Once you enter in the new time to run the job, click on Save and verify that the job has been successful in the console view, (green check mark vs. a red X.)

Collections Performance

At the bottom left, Collections is the next category that’s covered.  If collections aren’t uploading to the OMR, then the console isn’t able to provide the most up to date data and notifications of incidents and alerts aren’t sent out to notify administrators of issues.  Timely collections and the performance of collections is of a great concern to an EM Cloud Control administrator.


The graph is well laid out and shows clearly the number of collections in backlog and throughput performance.   The top of the graph, when hovered over, will show you the warning and critical threshold line for number of backlogs allowed.

Backlog is an issue, as if it gets too high and hits the threshold, your agent can stop uploading.  You can also see the duration, on average of the collections and view over time if the duration is increasing.  If you use a lot of metric extensions or plug-ins, this is something you’ll want to monitor, so this graph is extremely helpful when inspecting collection performance.

By hovering your cursor over the Collections Backlog line in the graph, I then am offered a number of options to look into the performance:


You have the option to click on Problem Analysis to look into the backlog, Metrics Detail or go to the Target Home.

Problem Analysis

As my EM environment is running quite smoothing at the OMR level, there isn’t a lot to show you in the Problem Analysis, but I wanted to at least give everyone a peak into this cool, new tool.


First of all, if I did have an issue, there would be collections showing in backlog.  This is very important for an administrator to check and ensure that backlog is not occurring.

As there is no backlog, you can see, my resource usage by my collections is pretty consistent and quite below the thresholds expected for most of the resource types shown:


You can also export the data from the table view, (small link at the bottom right of the screen, not shown) if you need the raw data.

You will note that my memory utilization is creeping, little by little to the critical threshold.  This is commonly due to java garbage collection causing a small memory leak and should be reviewed from time to time.  If it is considerable, the java heap should be examined and a more efficient value set.

Adding Metrics to the Performance Analysis

On the right hand side of the Performance Analysis, you will notice the Metric Palette.  This offers you the opportunity to go from the standard configuration to display more data on the existing metrics or add analysis on other metrics, such as Agents and Page Performance.

It’s important to know, even though you can be brought to this page from many different links within the OMR/OMS Performance pages, while you are in the Performance Analysis, you can inspect other performance metric factors than just the original ones you are reviewing.

For our example, we’ll add an additional metric graph,(Time estimates for clearing backlog) for review to the left hand analysis page-


We now have an additional graph on the left hand side analysis to compare to our existing data to see if the load times correlate to resource usage:


This can be done for dozens of metrics and offers some incredible analysis power when researching performance issues with EM12c.  The Performance Analysis link is one of the most powerful tools for locating where a bottleneck in performance is coming from and very quickly.  The fluid ability to add metrics to the graphs section and see how they correspond to the other resource usage is incredibly beneficial as well.

Metric Details

Now back to our Collections graph, if you remember we had three options when we click on the blue line:


By clicking on the Metrics Details link, we’ll then go to performance page for All Metrics.


This page displays information about the number of short and long running collections in backlog and will display the status if the threshold value has been hit for backlog quantity.  The page functions similar to Incidents, in that you can click on the right middle button to display the Task Class information highlighted to full page.

You are also offered the option to modify thresholds if the current values don’t meet the demands of the system is under currently, but know that the recommended values are their for a reason and the option to change them should be seriously researched beforehand.

Target Home

This link takes you to the Overview and Health page for the OMR.  I get to save a lot of typing by just sending you to my blog post on this great feature! :)

A final clarification, too-  the three options available, Performance Analysis, Metric Details and Target Home are options available for each metric by double-clicking in the Repository Collections Performance or the Metric Data Rollup graph, which we’ll discuss next.

Metric Data Rollup Performance

The last graph, in the right hand bottom corner, is the for metric data.  This graph displays the number of metric records rolled up and the throughput per minute for this data to be uploaded into the OMR.

We again have the ability to inspect performance analysis by double-clicking on the metric in the graph.


Each of the three options work almost exactly the same way as I demonstrated for the Collections Performance, but the data is based on the metrics rollup.

The main functionality of each of these sections is to realize how many different ways you can do performance analysis on different performance data:


Yes, even the legend can be clicked on and a detail option chosen.

That completes the review of the Repository Tab, remember, I have two more tabs to cover in posts before we dig into the Management Services and Agents performance consoles.

















Copyright © DBA Kevlar [EM12c, Rel. 4, OMS and OMR Health, Part II], All Right Reserved. 2014.

The Importance of the In-Memory DUPLICATE Clause for a RAC System

With the INMEMORY clause you can specify 4 sub-clauses:

  • The MEMCOMPRESS clause specifies whether and how compression is used
  • The PRIORITY clause specifies the priority (“order”) in which the segments are loaded when the IMCS is populated
  • The DISTRIBUTE clause specifies how data is distributed across RAC instances
  • The DUPLICATE clause specifies whether and how data is duplicated across RAC instances

The aim of this post is not to describe these attribues in detail. Instead, this post shows you the impact of the DUPLICATE clause when a query is executed in a RAC environment. For a short description, have a look to this table in the Database Reference guide. For more details, have a look to the Oracle Database In-Memory white paper.

It is essential to understand that the DUPLICATE clause (also known as “Fault Tolerant In-Memory Column Store”), according to the Oracle Database Licensing Information guide, requires Exadata or Supercluster. Also note that, according to this Twitter conversation, Oracle also plans to support the feature on ODA. Hence, the idea is to support it on all engineered systems.

In this post I would like to show you examples that make clear why the DUPLICATE clause is key to get good performance on a RAC system. To do so, I will execute a simple query like “SELECT sum(col) FROM tab” with different settings.

First of all, let me set the scene…

  • Since the In-Memory option is required, I have to use version
SQL> SELECT banner
  2  FROM v$version;

Oracle Database 12c Enterprise Edition Release - 64bit Production
PL/SQL Release - Production
CORE      Production
TNS for Linux: Version - Production
NLSRTL Version - Production
  • What you cannot see from the previous output is that the database is hosted on RAC/Exadata system
  • The test table stores 12GB of data in about 1.5 million blocks (BTW, this is the same table I used in this post)
SQL> SELECT blocks, round(blocks*8192/1024/1024) AS size_mb
  2  FROM user_tables
  3  WHERE table_name = 'T';

---------- ----------
   1550061      12110
  • Note that parallel processing is disabled at the table level
  • At the beginning, make sure that the In-Memory Column Store (IMCS) is not used for the test table

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

no rows selected


Let me prepare the environment for the first test:

  • One instance is up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  • Specify to use the IMCS for the test table and trigger the population

SQL> SELECT count(*)
  2  FROM t;

  • After waiting for few minutes, check that the population is over (POPULATE_STATUS=COMPLETED) and that the test table is completely stored in the IMCS (BYTES_NO_POPULATED=0).
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

---------- --------- ------------- ---------- -------------------
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Finally, flush the buffer cache (note that the IMCS is not impacted by this action)

Now that the environment is ready, I execute the test query and, at the same time, I activate extended SQL trace and gather runtime information:

  • The execution took 2.75 seconds
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;


Elapsed: 00:00:02.75

SQL> execute dbms_session.session_trace_disable
  • A serial execution taking advantage of an IM scan (TABLE ACCESS INMEMORY FULL) was used
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.73 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.73 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:00.91 |       4 |      2 |
  • The corresponding extended SQL trace information as shown by TKPROF is the following
SELECT /*+ gather_plan_statistics */ sum(n_32)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.72       2.73          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.72       2.73          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2730184 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=907260 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, the execution was, for a query that had to process 12GB of data, quite fast. This was possible because all data was extracted from the IMCS.


The second test uses two instances. As a result, after starting the second instance, the Im-Memory attributes of the test table needed to be changed, and the IMCS had to be re-populated.

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  • Data is duplicated on both instances
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  • After waiting for few minutes, check that the population is over and that test table is completely stored in both IMCS
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

---------- --------- ------------- ---------- -------------------
         2 COMPLETED    1.0921E+10 1.2707E+10                   0
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Flush the buffer cache of both instances

Of course the test is the same as for TEST 1. The only difference is that the test table is stored in the IMCS of both instances.

  • The execution took 2.97 seconds
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;


Elapsed: 00:00:02.97

SQL> execute dbms_session.session_trace_disable
  • It goes without saying that the execution plan is the same as for TEST 1
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.97 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.97 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:01.02 |       4 |      2 |
  • And here is the extended SQL trace information
SELECT /*+ gather_plan_statistics */ sum(n_32)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.95       2.96          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.95       2.97          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2967854 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=1023321 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, there is no real difference between TEST 1 and TEST 2.


The third test also uses two instances. But, this time, data is not fully duplicated on each instance. Instead, it is only distributed. This means that every row of the test table is stored in the IMCS of either one instance or the other.

Let me prepare the environment:

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  • Data is not duplicated on both instances
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  • After waiting for few minutes, check that the population is over. Notice that since no instance contains all data, the BYTES_NOT_POPULATED column remains greater than 0
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

---------- --------- ------------- ---------- -------------------
         2 COMPLETED    5726535680 1.2707E+10          6029524992
         1 COMPLETED    5194317824 1.2707E+10          6652174336
  • Flush the buffer cache of both instances

Once more, the same SQL statements are used to run the test:

  • The execution took 2 minutes and 40 seconds!!!!!
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;


Elapsed: 00:02:39.95

SQL> execute dbms_session.session_trace_disable
  • The execution plan is the same as for TEST 1 and TEST 2. The difference is that a huge number of logical and physical I/O were performed
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:02:39.94 |    1086K|   1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:02:39.94 |    1086K|   1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:02:31.06 |    1086K|   1086K|
  • The information provided by extended SQL trace confirms what dbms_xplan shows. In addition, it also shows that the poor performance is due to the high number of single-block physical reads that were performed (about 275K)
SELECT /*+ gather_plan_statistics */ sum(n_32)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     93.50     159.94    1086979    1086981          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     93.50     159.94    1086979    1086981          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=1086981 pr=1086979 pw=0 time=159942519 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=1086981 pr=1086979 pw=0 time=151063172 us cost=18478 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read            274946        0.03         73.01
  gc cr multi block request                    6368        0.03          4.19
  cell multiblock physical read                6368        0.02         10.00
  latch: cache buffers lru chain                  6        0.00          0.00
  latch: gcs resource hash                        2        0.00          0.00
  latch: gc element                             113        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

Since the bad performance is caused by the number of physical I/O, it goes without saying that if only logical I/O are taking place, the performance is much better. In this case, an immediate re-execution of the query took 8.28 seconds.

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;


Elapsed: 00:00:08.28

SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:08.28 |    1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:08.28 |    1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:08.26 |    1086K|

In summary, only part of the data was accessed through the IMCS. In addition, despite the fact that an IM scan was processed, except if data was in the buffer cache, many single-block physical reads were carried out. As a result, the performance was bad. This behaviour shows that a session is only able to access the IMCS it is managed by the instance it is connected to. This is also a good example showing that accessing data through the IMCS is faster than accessing the same data through the buffer cache.

The fourth and final test is a continuation of the previous one. Its aim is to check the performance of a parallel query. Hence, I added the PARALLEL hint to the test query. Also note that I flushed the buffer cache of both instances before executing the test query.

SQL> SELECT /*+ gather_plan_statistics parallel */ sum(n_32)
  2  FROM t;


Elapsed: 00:00:01.49

SQL> SELECT * FROM table(dbms_xplan.display_cursor('br2as2yzy7gk4',2,'iostats last'));

SQL_ID  br2as2yzy7gk4, child number 2
SELECT /*+ gather_plan_statistics parallel */ sum(n_32) FROM t

Plan hash value: 3126468333

| Id  | Operation                       | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT                |          |      1 |        |      1 |00:00:01.41 |       7 |      1 |
|   1 |  SORT AGGREGATE                 |          |      1 |      1 |      1 |00:00:01.41 |       7 |      1 |
|   2 |   PX COORDINATOR                |          |      1 |        |     10 |00:00:01.34 |       7 |      1 |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE              |          |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   5 |      PX BLOCK ITERATOR          |          |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |
|*  6 |       TABLE ACCESS INMEMORY FULL| T        |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |

Predicate Information (identified by operation id):

   6 - inmemory(:Z>=:Z AND :Z<=:Z)

   - automatic DOP: Computed Degree of Parallelism is 10
   - parallel scans affinitized for inmemory

The execution took 1.49 seconds. In other words, the performance of the parallel query is way better than the one of TEST 3. The reason can be seen in the Note section of the dbms_xplan output: "parallel scans affinitized for inmemory". This means that parallel slaves were started on both instances and that every set of parallel slaves accessed only the data stored in the IMCS of the instance they were attached to. As a result, even though the data was distributed, the performance was good. The only "overhead" compared to the serial execution is that the parallel slaves had to communicate with the query coordinator.


  • DUPLICATE is only applicable to engineered systems (as I write this blog, ODA does not yet support it)
  • On a RAC system, without DUPLICATE, data is distributed across all instances
  • A process can only access data stored in the IMCS of the instance it is connected to
  • Even though an IM scan is used, serial executions on RAC systems might require many logical/physical I/O

In summary, I would not advise to use the In-Memory option on a RAC system that does not support the DUPLICATE clause. At least, not in general.