Golden rules of RAC performance diagnostics

After collaborating with many performance engineers on RAC databases, I have come to realize that there are common patterns in the (mis)diagnosis of performance problems. This blog post discusses those issues. I also talked about this at the Hotsos 2014 conference.

Golden rules

Here are the golden rules of RAC performance diagnostics. These rules may not apply to general RAC configuration issues, though.

  1. Beware of top event tunnel vision
  2. Eliminate infrastructure as an issue
  3. Identify the problem-inducing instance
  4. Also review send-side metrics
  5. Use histograms, not just averages
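
To illustrate rule 5, here is a minimal Python sketch showing how an average can hide a slow tail that a histogram exposes. The latency samples are invented and are not taken from any real AWR report or v$ view. The bucket boundaries and the function name `latency_histogram` are likewise assumptions chosen purely for illustration.

```python
from collections import Counter

def latency_histogram(samples_ms, bounds_ms=(1, 2, 4, 8, 16, 32)):
    """Count samples per bucket; a bucket labelled '<N ms' holds latencies below N."""
    counts = Counter()
    for s in samples_ms:
        for b in bounds_ms:
            if s < b:
                counts[f"<{b}ms"] += 1
                break
        else:
            # Sample is at or above the largest boundary.
            counts[f">={bounds_ms[-1]}ms"] += 1
    return dict(counts)

# Hypothetical samples: 95 fast block transfers and 5 very slow ones.
samples = [0.5] * 95 + [30.0] * 5

average = sum(samples) / len(samples)
hist = latency_histogram(samples)

print(f"average = {average:.3f} ms")
for bucket, count in hist.items():
    print(f"{bucket:>7}: {count}")
```

With these made-up numbers the average comes out just under 2 ms, which looks healthy, while the histogram shows that 5% of the requests took about 30 ms.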

It looks like this may be better read as a document, so please use the PDF files of the presentation and the paper. Slide #10 of the presentation covers the gc buffer busy* wait events in depth. I will try to blog about that slide later (hopefully).

Dynamic Resource Mastering in 12c

I blogged about Dynamic Resource Mastering (DRM) in RAC here. DRM freezes the global resources during a reconfiguration event, and no new resources can be allocated while the reconfiguration is in progress. This freeze can induce a huge number of waits for the gc buffer busy [acquire|release] events and for other events such as gcs drm freeze release and gcs remaster. In database version 12c, DRM has been improved further.

A major improvement I see is that not all resources are frozen at the same time. Essentially, resources are broken down into partitions, and only one resource partition is frozen at a time. This improvement should decrease the impact of DRM-related waits tremendously.

LMON Trace file

Hotsos 2014

I will be presenting at the Hotsos Symposium 2014, discussing correct methods to diagnose RAC performance issues. Surprisingly, even very senior performance engineers make mistakes in their analysis when reviewing RAC issues. Come to my presentation and learn the golden rules of RAC performance diagnostics.

Runtime Load Balancing Advisory in RAC 12c-addendum

A reader asked an interesting question yesterday with regard to the previous post on the subject: where did you get your service metrics from when you queried v$servicemetric, the PDB or CDB$ROOT?

I queried the PDB, but this morning repeated the test to make sure the results are consistent, and they are. This is definitely something you’d hope for: you should not have different results in the same v$-view depending on the container you execute your query in for a given CON_ID.

During testing I noticed something interesting though. I queried gv$servicemetric but did not limit the result to the service I wanted to test with (FCFSRV). Here is the query against gv$servicemetric while the system was idle.

Runtime Load Balancing Advisory in RAC 12c

This is a follow-up on yesterday’s post about services in the new 12c database architecture. After having worked out everything I needed to know about TAF and RAC 12c in CDBs I wanted to check how FCF works with PDBs today. While investigating I found out that the Runtime Load Balancing Advisory does not seem to work as expected in some cases. But I’m getting ahead of myself. First of all, here is my test case:

  • Oracle Linux 6.4 x86-64
  • Grid Infrastructure, i.e. January 2014 PSU applied
  • RDBMS, likewise patched with the January PSU
  • A CDB with just 1 PDB for this purpose, named DEMOPDB
  • Service FCFSRV is used

Service setup

The service is specifically created to connect against the PDB:

RAC and Pluggable Databases

In preparation for the OUGN Spring Seminar, and to finally fulfill at least a part of my promise from July, I was getting ready to research RAC, PDBs and services for my demos. It turned out to be a lot more interesting than I first assumed.

RAC and Multi-Tenancy

So the first attempt to really look at how this works started with my 2-node cluster, where I created a RAC database: RAC12C, administrator-managed, with instances RAC12C1 and RAC12C2. The database is registered in Clusterware. Clusterware and RDBMS are patched to the January PSU, i.e.

RAC Plans

Recently appeared on MOS – “Bug 18219084 : DIFFERENT EXECUTION PLAN ACROSS RAC INSTANCES”

Now, I’m not going to claim that the following applies to this particular case – but it’s perfectly reasonable to expect to see different plans for the same query on RAC, and it’s perfectly possible for the two different plans to have amazingly different performance characteristics; and in this particular case I can see an obvious reason why the two nodes could have different plans.

Here’s the query reported in the bug:

Getting up and running with UCP and Application Continuity

I have already posted a couple of articles on the use of Oracle’s Universal Connection Pool with regard to Workload Management and Oracle RAC 11.2. Since then a lot has happened, with the release of Oracle 12c being the most notable event. With 12c you get lots of interesting new features for JDBC, and the one I would like to present today is Application Continuity. This continues the previous post on playing with Application Continuity outside of a middle-tier environment. Well, if you allow me to call Tomcat 7 “middle-tier”, that is.

The aim of this post is to extend my previous posts about setting up UCP with Application Continuity. The basic setup remains unchanged, but this time I tested with JDK 1.6 (build 1.6.0_45-b06) and Tomcat 7.0.47 on Oracle Enterprise Linux 6.4 64bit.

Playing with Application Continuity in RAC 12c

One of the more interesting features in Oracle 12c RAC is Application Continuity. Why do I believe it is? Because it relieves developers of having to think about retrying connections and catching SQLExceptions in their code. I already thought that Fast Application Notification and Fast Connect Failover (FCF) were great, _but_ they required the developer to understand RAC and Oracle, which you can’t take for granted. In fact, looking back over the last few years since I co-wrote Pro Oracle Database 11g RAC on Linux, the number 1 complaint I got from developers was that RAC was too complex (see for example this thread on oracle-l).
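
For context, the kind of hand-written retry logic that developers would otherwise have to maintain looks roughly like the Python sketch below. This is not Oracle, JDBC or UCP code: `TransientConnectionError`, `run_with_retry` and `flaky_query` are hypothetical names, and the failure is simulated rather than caused by a real instance crash.

```python
import time

class TransientConnectionError(Exception):
    """Hypothetical stand-in for a recoverable error such as a lost session."""

def run_with_retry(work, max_attempts=3, delay_s=0.0):
    """Call work() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            time.sleep(delay_s)  # pause before trying again

# Simulated unit of work: fails twice (as if the session was lost), then succeeds.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientConnectionError("session lost")
    return "row fetched"

result = run_with_retry(flaky_query)
print(result, "after", calls["n"], "attempts")
```

Every database call in an application would need a wrapper like this, and the developer would also have to decide whether replaying the work is actually safe.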

Now Application Continuity takes away 2 problems I have seen.

Delete Archived Logs from Standby

This is a little surprising to me because it’s so simple – but I couldn’t find a script anywhere on Oracle Support or on the internet which elegantly (IMHO) cleaned up archived logs on a standby system. (Specifically, a RAC/thread-aware script.)

There are a few scripts published: