Search

OakieTags

Who's online

There are currently 1 user and 39 guests online.

Online users

Recent comments

Affiliations

Oakies Blog Aggregator

Database Thin Cloning : clonedb (Oracle)

This post is part of an ongoing series:

(very interested in real life experiences and alternative strategies to using EMC, Netapp, cloned, ZFS or other for thin cloning. Please comment and or email me at kylelf@gmail.com)

 

A production database is full of data that makes sense for its purpose. There are metadata tables for important business facts, tables that hold history, and tables that are used for processing transactions. Whether the database is 10GB or 10TB, the data it contains is the data it needs (with the exception of the odd DBA created temporary table they forgot to drop).

If you take that database and clone it for QA or development, a completely new point of view emerges. Suddenly the data is cumbersome, unnecessary. Terabytes of disk are provisioned simply to hold a replica for the purpose of testing a small subset of it. An entire architecture with its myriad support structures, historical data, indexes, and large objects cloned and ready just to be partially used, trashed, and rebuilt. This is waste, both of time and storage resources. To the business, having a duplicate environment makes absolute sense; however, from an IT point of view the repeated duplication of storage space and  the time drain it cause just makes no sense at all.

When database copies are made from the same master database (the source environment , the master database, being used for cloning), typically 95% or more of the blocks are duplicated across all copies. In a QA, development, and reporting environments there will almost always be some changes exclusive to the cloned system; however, the amount is usually extremely small compared to the size of the source database. The unchanged blocks are redundant and take up massive amounts of disk space that could be saved if the blocks could somehow be shared.

Yet sharing duplicate blocks is not an easy feat in most database environments. It requires a technology that can act as a foundational cornerstone that coordinates and orchestrates access to duplicate blocks. At the same time, it requires the database copies to be writable with their own private modifications that are hidden from the source or other clones made from the source.

There are several technologies available in the industry that can accomplish block sharing across database clones. The primary technology involves filesystem snapshots that can be deployed across multiple clones. This concept is known as thin cloning, and it allows filesystems to share original, unmodified blocks across multiple snapshots while keeping changes on the target clones private to the clone that made the changes.

But as with all new technologies, there are many methods available that all accomplish the same task. Likewise with the taks of  thin cloning, there are multiple vendors and methods that provide this technology.

 

Thin Cloning Technologies

The main requirement of database cloning is that the database files and logs must be in a consistent state on the copied system. This can be achieved either by having datafiles in a consistent states (via a cold backup) or with change logs that can bring the datafiles to a consistent state. The cloning process must also perform prerequisite tasks like producing startup parameter files, database definition files, or other pre-creation tasks for the target database to function. For example, Oracle requires control files, password files, pfile/spfiles, and other pre-created components before the target can be opened for use. Controlfiles, logfiles, and datafiles together constitute a database that can be opened and read by the appropriate DBMS version.

In order to share data between two distinct, non-clustered instances of a database, the two instances must believe they have sole access to the datafiles. Modifications to the datafiles in one instance (a production database, for example) cannot be seen by the other instance (a clone) as it would result in corruption. For thin clones, datafiles are virtual constructs that are comprised of the shared common data and a private area specific to the clone.

There are a number of technologies that can be leveraged to support thin cloning. These technologies can be broken down into

  • Application software made to manage access to shared datafile copies
  • Copy-on-write filesystem snapshots
  • Allocate-on-write filesystem snapshots

The difference in these technologies is substantial and will determine how flexible and manageable the final thin cloning solution will be.

Software Managed Thin Cloning

Because it is part of the existing stack, one approach to sharing databases is to have the database software itself as the thin cloning technology. In this scenario the DBMS software itself orchestrates access to a shared set of database files while modifications are written to an area that is private to each database clone. In this way the clones are managed as part of the DBMS software itself. This is the approach Oracle has taken with Clonedb, a software model introduced in Oracle 11gR2 patchset 2.

Clonedb

Clonedb manages the combination of shared and private data within an Oracle environment. By maintaining a central set of read only datafiles and a private area for each clone. The private area is only visible to each clone, guaranteeing that cross-clone corruption cannot occur. The Oracle RDBMS software orchestrates access to the read only datafiles and private areas maintained by the clones.

Oracle’s Clonedb is available starting in Oracle version 11.2.0.2 and higher. The cloned option takes a set of read only datafiles from an RMAN backup as the basis for the clone. Clonedb then maps a set of ‘sparse’ files to the actual datafiles. These sparse files represent the private area for each clone where all changes are stored. When the clone database needs a datablock, it will look in the sparse file; if it is not found there, then the cloned database looks in the underlying read only backup datafile. Through this approach, many clones can be created from a single set of RMAN datafile backups.

The greatest benefit of this technology is that by sharing the underlying set of read only datafiles, the common data shared between each clone does not have to be replicated. Clones can be created easily and with minimal storage requirements. With time and space no longer a constraint, cloning operations become far more efficiently and with minimal resource requirements.

Many of the business and operational issues summarized in chapters 1 and 2 of this book can be alleviated with this technology. For instance, DBAs can use Clonedb to provision multiple developer copies of a database instead of forcing developers to share the same data set. By using multiple developer copies many delays and potential data contamination issues can be avoided, which speeds development and efficiency during application development. Developers will be able to perform their test work, validate it against his or her private dataset, and then commit their code and merge it into a shared development database. The cloned environment can be trashed, refreshed, or handed over to another developer.

On the other hand, Clonedb requires that all clones be made from the source database at the same point in time. For example, if Clonedb is used to provision databases from an RMAN backup taken a day ago and developers want clones of the source database as it is today, then an entire new RMAN backup must be made. This dilutes the storage and timesavings advantage that Clonedb originally brought. While it is possible to use redo and archive logs to bring the previous day’s RMAN backups up to speed (all changes from the last 24 hours would be applied to yesterday’s RMAN datafile copies), the strategy would only work efficiently in some cases. The farther the clone is from the original RMAN datafile copies, the longer and more arduous the catching up process would be, resulting in wasted time and resources.

Clonedb functionality is effective and powerful in some situations, but it is limited in its ability to be a standard in an enterprise-wide thin cloning strategy.

 Screen Shot 2013-06-03 at 10.28.11 AM

Figure 1. Sparse files are mapped to actual datafiles behind the scenes. The datafile backup copy is kept in a read only state. The cloned instance (using Clonedb) first looks for datablocks in the sparse file. If the datablock is not found, it will then read from the RMAN backup. If the Clonedb instance modifies any data, it will write the changes to the sparse file.

In order to implement Clonedb, Oracle 11.2.0.2 or higher is required. Additionally, Direct NFS (dNFS) must be used for sparse files. The sparse file is implemented on a central NFS mounted directory with files that can be accessed via Oracle’s direct NFS implementation.

To create this configuration, the following high-level steps must be taken:

  • Recompile the Oracle binaries with Oracle dNFS code
  • Run the clonedb.pl script, available through Metalink Document 1210656.1
  • Startup the cloned database with the startup script created by clonedb.pl

The syntax for clonedb.pl is relatively simple:

clonedb.pl initSOURCE.ora create_clone.sql

Three environment variables must be set for the configuration:

MASTER_COPY_DIR=”/rman_backup”
CLONE_FILE_CREATE_DEST=”/nfs_mount”
CLONEDB_NAME=”clone”

Once clonedb.pl is run, running the output file generated by the script will create the dabase clone.

sqlplus / as sysdba @create_clone.sql

The create clone script does the work in four basic steps:

  1. The database is started up in nomount mode with a generated pfile (initclone.ora in this case).
  2. A custom create controlfile command is run that points to the datafiles in the RMAN backup location.
  3. Maps the sparse files on a dNFS mount to the datafiles in the RMAN backup location. For instance: dbms_dnfs.clonedb_renamefile(‘/backup/file.dbf’, ‘/clone/file.dbf’);
  4. The database is brought online in resetlogs mode: alter database open resetlogs;

 Screen Shot 2013-06-03 at 10.28.22 AM

Figure 2. This image shows that multiple Clonedb instances can share the same underlying RMAN backup. Each Clonedb instances writes its changes to its own private sparse files

 Screen Shot 2013-06-03 at 10.28.31 AM

Figure 3. A graphical outline of the process. An RMAN backup is taken of the source database and placed in a location where the Clonedb instances can access them (in this case, an NFS mount). A Clonedb instance can be set up on any host that has access to the NFS filer via dNFS. The Clonedb instances will create sparse files on the NFS filer. The sparse files map to the datafiles in the RMAN backup.

NOTE:    If Clonedb instances are going to be created from two different points in time, then a new RMAN backup has to be taken and copied to the NFS server before it can be used as the source for new clones as shown in Figure 3.

Because Clonedb adds an extra layer of code that requires reads to both the sparse files over dNFS and the RMAN datafile backups, there is a performance hit for using Clonedb. The biggest drawback is the requirement for multiple copies of source databases in order to create clones from different points in time, which diminishes the storage savings. The Clonedb functionality is a powerful option that should be in the back pocket of any Oracle DBA but it has limited use for an automated strategy involving thin cloning.

Reference

The best write-up on clonedb is Tim Hall’s blog post at

http://www.oracle-base.com/articles/11g/clonedb-11gr2.php

To be continued

This topic will be continued on upcoming posts

Solaris Eye for the Linux Guy… Part III (hugepages = ISM)

This post has been a long time coming but recently, I have started working on some SPARC SuperCluster POC’s with customers and I am getting re-acquainted with my old friend Solaris and SPARC.

If you are a Linux performance guy you have likely heard of HugePages.   Huge pages are used to increase the performance of large memory machines but requiring fewer TLB‘s .  I am not going to go into the details TLB’s, but every modern chip supports multiple memory page sizes.

So how do you get huge pages with Solaris?

Do nothing – it is the DEFAULT with Oracle running on Solaris.

The “use_ism” parameter used to be used to control this, but it has been moved to the “_underbar” category these days since there is really no reason whatsoever to change it.   I remember doing tests back in the Solaris 8 days with/without ISM pages to show the performance differences and truly it was and still is a good thing.

How are ISM/Huge pages used with Oracle running on Solaris?

At first, ISM pages are only used for the SGA so OLTP style environments benefited the most from ISM.   With Oracle 10 on Solaris, it was also allowed for the PGA.  So, if you were doing have PGA activity like a HASH join or sort, you would benefit from larger page sizes as well.

With Solaris, it is easy to see if the page sizes of any running process by simply running the “pmap(2)” command. 

root@ssc401:~# pmap -xs 23189
 23189: ora_pmon_HC1
 Address Kbytes RSS Anon Locked Pgsz Mode Mapped File
 0000000100000000 64 64 - - 8K r-x-- oracle
 0000000100010000 48 48 - - - r-x-- oracle
 000000010001C000 64 64 - - 8K r-x-- oracle
 ...
 000000010D420000 256 256 64 - 64K rwx-- oracle
 000000010D460000 64 64 - - - rwx-- oracle
 ...
 ... 
 000000010D540000 2304 2304 2304 - 64K rwx-- [ heap ]
 0000000380000000 262144 262144 - 262144 256M rwxsR [ ism shmid=0xf00007e ]
 0000000390000000 65536 65536 - 65536 4M rwxsR [ ism shmid=0xf00007e ]
 0000000400000000 31457280 31457280 - 31457280 2G rwxsR [ ism shmid=0x600007f ]
 0000000B80000000 1572864 1572864 - 1572864 256M rwxsR [ ism shmid=0x600007f ]
 0000000BE0000000 196608 196608 - 196608 4M rwxsR [ ism shmid=0x600007f ]
 0000000C00000000 24 24 - 24 8K rwxsR [ ism shmid=0x7000000 ]
 FFFFFFFF5A800000 16 16 - - 8K r-x-- libodm11.so
 FFFFFFFF5A902000 8 8 8 - 8K rwx-- libodm11.so
 FFFFFFFF60500000 64 64 - - 64K r-x-- libclsra11.so
 FFFFFFFF60510000 24 24 - - - r-x-- libclsra11.so
 ...
 FFFFFFFF7D1FC000 8 8 - - 8K r-x-- libsched.so.1
 FFFFFFFF7D1FE000 8 8 - - 8K r-x-- libdl.so.1
 FFFFFFFF7D300000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7D400000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7D500000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7FFE0000 128 128 128 - 64K rw--- [ stack ]

Notice that the “text”, “heap”, “anon”, “stack”, and shared memory can all use different page sizes.  In this case, the SGA is backed by 2G, 256M, 4M, 8k ISM pages.

So what about Dynamic ISM?  Should I use ISM or DISM?

So, Dynamic ISM was introduced to resize the SGA.  DISM is really just ISM memory that can be paged.  This can be useful when you have HUGE memory machines and want to resize the SGA without taking down the instance.  But why is this needed?

  • Single-Instance availability on HUGE machines that can dynamically add/replace MEMORY.  Machines like the E10k/E25k/M9000/M10… etc all allow you to add components on the fly without restarted Solaris.  Let’s say have have a failing memory board.  You could “Shrink” the SGA so that it fits into the surviving space and while you service the faulty board.  Also, say you start with a 1/2 populated machine.  You can add memory without and grow the SGA without stopping the instance.
  • Consolidation or Cloud like services.  In this environment you can resize running instances on the fly in order to free up memory for new instances.

Personally, I don’t see a lot of used for DISM with the SuperCluster.   If you have RAC you don’t need DISM for availability reasons and with cloud/consolidation I think multiple instances within a single server is not the best practice going forward.   At one point you needed to use DISM for NUMA features, but that is not case with 11.2.0.3.

Filed under: Exadata, Linux, Oracle, Solaris

Byzantium…

I’ve just got back from watching Byzantium at the cinema.

I had never heard of this film until about 20 minutes before the film started. In this age of watered down and twinkly vampires it’s good to see them portrayed with a gritty and raw edge. There are fairly obvious similarities between this story and Interview with the Vampire, though Byzantium is not so epic. It was a rather slow paced film, but I don’t mean that in a bad way. It wasn’t trying to be horror or action, but a story about the characters, told at a steady pace.

Saoirse Ronan is intense and beautiful. I knew I recognized her from somewhere, but it was not until I checked on IMDB that I saw she was the kid from Hanna. I think she was perfect casting to play a 16 year old that has been “alive” for 200 years. I think this was a good role for Gemma Arterton too. I like her in the less-Hollywood stuff.  was so odd, in a good way, in this film. He looked so thin and awkward, compared to his role as Banshee in the X-Men: First Class.

Overall I would have to say I liked it. If you are a fan of the vampire genre and are sick of the current “My Little Pony” vampires on film and TV, you might want to give it a try.

Cheers

Tim…


Byzantium… was first posted on June 2, 2013 at 12:18 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Detective? Crime Writer? DBA? Which are you?

The DBA role can sometimes feel like a mix of detective, crime writer and DBA all thrown together. What do I mean by that? When you hit some problems you have to play detective, trying to find clues to what is going on. Once you’ve started to gather clues, you have to play crime writer and put them together to form a coherent story that explains what is going on. Once you have the outline of your crime story you can start looking at the facts again and see if they fit with your story. If they do, your story may just be correct. If they don’t, you probably need to check the accuracy of the facts and do some rewriting of the story until the two things fit together. Once things seem to fit, you can then get busy trying to arrest the villain, or fix the problem. :) This process is often necessary as part of root cause analysis.

I’ll use a recent case to highlight what I mean. I’m going to simplify things a little or the post will be too long. :)

Detective

We recently had some intermittent, but very severe performance problems on a system, resulting in the application effectively hanging for some time. As usual, it’s the DBA job to prove it’s not the fault of the database. :) A bit of detective work came up with the following facts:

  • Moderate CPU usage on the application and DB servers.
  • Gradual reduction in free memory on both servers.
  • Over time the systems started to swap.
  • Gradual increase in number of application processes connecting to the database.
  • Lots of I/O waits on the database server.

Crime Writer

So let’s switch to crime writer and think of some stories that could explain this.

Story 1

High load on the application causes lots of database connections, eventually swamping the servers.

Does it fit the facts? The reduction in free memory, eventually leading to swap could be caused by an increase in numbers of processes on the servers. Lots of processes all doing I/O on the server will probably increase the amount of I/O waits as they fight with each other for the disks. That sounds promising.

How do we confirm it? Use the AWR reports to compare the load profiles (amongst other things) of the system compared to other times in the day, days of the week, same time and day last week etc.

Conclusion: The load on the system was not substantially different from other times when the system was running fine. The story doesn’t fit together.

Story 2

Slow I/O is reducing response time. As new requests come in, the backlog of work is getting bigger…

Does it fit the facts? In checking out the AWR reports looking for confirmation of the previous story we noticed the average times of certain physical I/O operations were about 30ms. Normally the average is <1ms. Checking the event histogram section of the AWR report, we could see a very wide spread of timings for physical I/O operations, including some in excess of 1 second. After a few minutes we started to see Warning “aiowait timed out x times” in alert.log [ID 222989.1] messages in the alert log, suggesting some of those I/O events were taking longer than 10 minutes. :(

Now we have a new fact to add to the previous list:

  • Some of the individual I/O waits are extremely long!

We know the application layer of this application is not too clever. If a request comes and there is an idle application process available, it will be reused. If no idle application process is available, a new one is spawned to process the request. The only limit on this is the max number of processes for that user at the OS level.

If the disk I/O is slow, application requests will take a long time to complete. If application requests are taking a long time, the likelihood of new application requests finding an idle application process to reuse is reduced, so the incidence of new processes being spawned increases. As new processes are spawned, we see a gradual increase in memory usage, leading eventually to swap. At the same time, we see an increase in database connections, which require processes, which use memory… You can see where this is going…

Conclusion: This story does seem to fit the facts. Essentially, memory, swap and the number of processes are the symptom, not the cause of the problem. The root cause could be the I/O performance issue.

Catching the Criminal

Now I said this was an intermittent problem. The next time it occurred I was prepared and watched events unfold using Luca Canali’s nifty Latency Heat Map Visualization, which prompted this quick post. The result of this was we had pretty solid evidence to suggest the problems we were experiencing were nothing to do with the application or database, but were as a result of poor I/O performance. The baton was passed to the system administrators, network and storage team to try and bottom out the problem. That process is still ongoing, but smoking gun seems to be the storage network layer.

Conclusion

Over time you build up a level of experience that allows you to spot patterns and quickly get to the root of problems. To observers it can seem almost mystical, which is why us grunt DBAs look at those database performance gurus with awe, but it all comes down to root cause analysis. The more you do this stuff, the better you get at it. Stop doing it for a while and you get rusty.

Detective? Crime Writer? DBA? Which are you? You should be all three. :)

Cheers

Tim…


Detective? Crime Writer? DBA? Which are you? was first posted on June 1, 2013 at 10:12 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Java 7 EE Launch : June 12th & 13th

The Oracle ACE program recently invited members to a teleconference session about the upcoming launch of Java 7 EE. Arun Gupta took us through a preview of the information that will be available to everyone as part of the launch.

If you are interested in getting the low down on this new release, you can register for the launch events here.

Cheers

Tim…


Java 7 EE Launch : June 12th & 13th was first posted on May 31, 2013 at 8:08 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Oracle Security Class and software for Oracle security

I have just agreed a public class dates of my very popular " How to perform a security audit of an Oracle database " with Oracle University to be held on September 24th and 25th in Rome, Italy. The registration....[Read More]

Posted by Pete On 30/05/13 At 05:54 PM

BGOUG Spring 2013 : Photos

I was a little bit rubbish and forgot to get photos of the recent BGOUG event. Fortunately, the conference photos have arrived. You can check them out here.

There are a number of photos of an old fat guy that looks a little like me. I don’t remember meeting him though. Weird… I think he needs the toilet in this shot.

By the way, don’t say stuff like, “Bind variables are a waste of time!”, when you are near to Tom Kyte. He will slap you quicker than Debra Lilley after a negative comment about Fusion Apps. :)

Cheers

Tim…


BGOUG Spring 2013 : Photos was first posted on May 29, 2013 at 7:55 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Forcing Smart Scans on Exadata – is the _serial_direct_read parameter safe to use in production?

One of the most common Exadata performance problems I see is that the direct path reads (and thus also Smart Scans) don’t sometimes kick in when running full scans in serial sessions. This is because in Oracle 11g onwards, the serial full segment scan IO path decision is done dynamically, at runtime, for every SQL execution – and for every segment (partition) separately. Whether you get a direct path read & smart scan, depends on the current buffer cache size, how big segment you’re about to scan and how much of that segment is actually cached at the moment. Note that the automatic IO path decision for index fast full scans is slightly different from table scans.

This dynamic decision unfortunately can cause unexpected surprises and variance in your report/batch job runtimes. Additionally, it looks like the SELECT part of your UPDATE/DELETE statements (the select part finds the rows to update/delete) does not ever automatically get direct path read/smart scan chosen – by design! So, when your SELECT statement may use smart scan and be really fast, the same select operation in an INSERT SELECT (or UPDATE/DELETE) context will not end up using smart scans by default. There’s even a bug explaining that – closed as “not a bug” (Exadata Smartscan Is Not Being Used On Insert As Select[Article ID 1348116.1]).

To work around these problems and force a direct path read/smart scan, you can either:

  1. Run your query in parallel as parallel full segment scans will use direct path reads, unless your parallel_degree_policy = AUTO, then you may still get buffered reads thanks to the dynamic in-memory parallel execution decision of Oracle 11.2
  2. Run your query in serial, but force the serial direct path reads by setting _serial_direct_read = TRUE (or ALWAYS in 11.2.0.2+)

Here are the valid options for this parameter in 11.2.0.2+

SQL> @pvalid _serial_direct_read

Display valid values for multioption parameters matching "_serial_direct_read"...

  PAR# PARAMETER                                                 ORD VALUE                          DEFAULT
------ -------------------------------------------------- ---------- ------------------------------ -------
  1993 _serial_direct_read                                         1 ALWAYS
       _serial_direct_read                                         2 AUTO
       _serial_direct_read                                         3 NEVER
       _serial_direct_read                                         4 TRUE
       _serial_direct_read                                         5 FALSE

And this leads to the question – as _serial_direct_read is an undocumented, hidden parameter – is it safe to use it in production?

In my mind, there are 3 kinds of parameters:

  1. Documented parameters – they should be safe and should work. If they don’t, it’s a bug and should get fixed by Oracle
  2. Undocumented parameters which nobody uses and knows much about
  3. Undocumented parameters which are documented in My Oracle Support (and are widely used in practice)

You shouldn’t use #2 parameters in production without a written blessing by Oracle Support – and I’d like to know some justification, why the recommended parameter ought to help.

The _serial_direct_read, however belongs to category #3 – it’s an undocumented parameter, but widely documented by public use and more importantly (formally), documented in My Oracle Support. If you search MOS for _serial_direct_read, you’ll find plenty of notes recommending the _serial_direct_read as a workaround – but the best of them is MOS note Best Practices for OLTP on the Sun Oracle Database Machine[Article ID 1269706.1], which says this:

Direct reads bypass the buffer cache and go directly into the process PGA. Cell offload operations occur for direct reads only. Parallel query processes always use direct reads and therefore offload any eligible operations (ex: scans). For normal foreground processes running serially, the RDBMS decides whether direct or buffered reads are the best option based on both the state of the buffer cache and the size of the table. In the event that the RDBMS does not make the correct choice to use direct reads/offload processing, you can set _serial_direct_read=TRUE (available at session level). Keep in mind that it is possible that setting this parameter will makes things worse if the app is better off with buffered reads, so make sure you know that you want direct reads/offload processing before setting it.

Comments:

  1. The abovementioned MOS note is actually incorrect stating that “Parallel query processes always use direct reads“, the in-memory parallel execution changes this, as explained above.
  2. The note recommends setting the _serial_direct_read = TRUE, but I’ve used ALWAYS for clarity. The TRUE before 11.2.0.2+ really meant “AUTO”, the dynamic decision and FALSE meant NEVER. But starting from Oracle 11.2.0.2 onwards, the TRUE = ALWAYS, FALSE = NEVER and AUTO means dynamic decision (like TRUE used to mean on 11.2.0.1 and before)
  3. This parameter forces direct reads only for full segment scans (full table scan and fast full index scan variations), your “random” index lookups and range scans etc will still use reads via buffer cache regardless of this parameter – which is great for OLTP and mixed workload systems
  4. The _serial_direct_read parameter controls the direct path read decision both for table and index segment scans

I’m not a fan of setting such parameters at system level, but in past I have created login triggers in mixed-workload environments, where the OLTP sessions will never do a direct path read (or I leave the decision automatic) and the reporting and batch sessions in the same database will have forced direct path reads for their full segment scans.

This article served two purposes, talking about direct path reads and also evaluating whether that cool-undocumented-parameter should be used in production at all. First, you must be able to argument, why would this parameter help at all – and second, is it at least documented as a valid workaround in some MOS article. If unsure, raise an SR and get the Support’s official blessing before taking the risk.


French Philosophy

From Mohamed Houri, here’s a French translation of all my “Philosophy” notes to date.

How to hint

Here’s a live example demonstrating a point I’ve often made – you have to be very detailed in your hinting or Oracle will find a way to obey your hints and do the wrong thing.  A recent posting on the OTN database forum gave use the following query and execution plan:

SELECT
	ERO.DVC_EVT_ID,
	E.DVC_EVT_DTTM
FROM D1_DVC_EVT E,
     D1_DVC_EVT_REL_OBJ ERO
WHERE
	ERO.MAINT_OBJ_CD = 'D1-DEVICE'
AND	ERO.PK_VALUE1 = :H1
AND	ERO.DVC_EVT_ID = E.DVC_EVT_ID
AND	E.DVC_EVT_TYPE_CD IN (
		'END-GSMLOWLEVEL-EXCP-SEV-1',
		'STR-GSMLOWLEVEL-EXCP-SEV-1'
	)
ORDER BY
	E.DVC_EVT_DTTM DESC

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |      1 |  3196 (100)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   1 |  SORT ORDER BY                       |            |      1 |  3196   (1)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   2 |   NESTED LOOPS                       |            |      1 |            |       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   3 |    NESTED LOOPS                      |            |      1 |  3195   (1)|       |       |   1059 |00:00:07.77 |    2138 |  1197 |
|*  4 |     INDEX RANGE SCAN                 | TEST1      |      1 |    30   (0)|       |       |   1059 |00:00:00.07 |      11 |    11 |
|   5 |     PARTITION RANGE ITERATOR         |            |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.69 |    2127 |  1186 |
|*  6 |      INDEX UNIQUE SCAN               | D1T400P0   |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.67 |    2127 |  1186 |
|*  7 |    TABLE ACCESS BY GLOBAL INDEX ROWID| D1_DVC_EVT |   1059 |     2   (0)| ROWID | ROWID |    134 |00:00:06.08 |    1057 |   939 |
-----------------------------------------------------------------------------------------------------------------------------------------

You’ll notice that something close to half the time spent came from the table access in line 7 (This is 11g, and we have a plan which shows the “double nested loop” of an index access followed by a table access – for each rowid returned in line 3 (totalling 7.77 seconds) we access the table through the nested loop driven by line 2 which totals 13.85 seconds).

After a little chat, the suggestion arose to introduce an index that avoided the table access – it’s doing a fairly large amount of random I/O, and we might be able to run the query roughly twice as fast if we didn’t visit it. So the DBA set up a suitable test index (called test2) on the D1_DVC_EVT table, and found that the optimizer didn’t use it (perhaps because the index was larger then the alternative, perhaps because the clustering_factor was much bigger) – so he added a hint to the code: /*+ index (e test2) */ which made Oracle use the index to produce the following plan:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |      1 | 98415 (100)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|   1 |  SORT ORDER BY              |       |      1 | 98415   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  2 |   HASH JOIN                 |       |      1 | 98414   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  3 |    INDEX RANGE SCAN         | TEST1 |      1 |    30   (0)|       |       |   1059 |00:00:00.01 |      11 |      0 |
|   4 |    PARTITION RANGE ALL      |       |      1 | 98249   (1)|1048575|     1 |   7566K|00:03:34.58 |     100K|  96848 |
|   5 |     PARTITION RANGE SUBQUERY|       |    287 | 98249   (1)|KEY(SQ)|KEY(SQ)|   7566K|00:03:10.87 |     100K|  96848 |
|*  6 |      INDEX FULL SCAN        | TEST2 |   2296 | 98249   (1)|1048575|     1 |   7566K|00:02:45.47 |   97412 |  96848 |
----------------------------------------------------------------------------------------------------------------------------

Unfortunately, although Oracle obeyed the hint – it had to, since it was legal and in-context – it didn’t take the path the DBA expected.

When you hint, you have to make it impossible for Oracle find any path you don’t want, and that can take a lot of hints. In this case the DBA simply wanted to use the same nested loop path that he’d originally seen, but using the new index instead. To get the path safely he needed at least 4 hints: one to specify the join order, one to specify the join method, and one for each table to specify the access method. In this case:

/*+
        leading(ero e)
        use_nl(ero e)
        index(ero test1)
        index(e test2)
*/

Once you’ve hinted some SQL and got it working the safe thing to do, in 11g, is to check the outline section of the actual execution plan to see if you’ve missed any important hints and then, if you can’t change the production code, attach the SQL Baseline from your hinted code to the SQL text from the original. (See – for example: http://jonathanlewis.wordpress.com/2011/01/12/fake-baselines/ )

It’s hard to create a full set of hints by hand – and I often see hinted SQL in production systems where the plan that appears happens to be the right one but it’s not the only plan that could be derived from the hints. So my 11g mantra for hinting is this: if you can hint it, baseline it”.