I recently realized that I never posted the second version of my presentation, Under The Hood of Oracle Clusterware 2.0: Grid Infrastructure, codenamed UTHOC2, anywhere. I think it would be very useful, as I still see lots of questions being asked and UTHOC1 covers Oracle RAC 10g and 11gR1 only. 11g Release 2 brought many changes to the clusterware, and the slides needed a good refresh.
I’ve purposely left it as an 188.8.131.52 installation as you can get this from OTN without needing access to My Oracle Support (MOS). The process works just as well for 184.108.40.206 and I would recommend you use that if you do have access to MOS. Remember, if you are doing the RAC installation on Oracle Linux 6 you are going to need 220.127.116.11, so OL5 might be the right option if you are playing around with this at home with no access to MOS.
I spent today updating my Oracle 11gR2 RAC installation on OL6 article. The original article used an older version of VirtualBox, which meant some of the screenshots looked a little dated. It’s now updated to VirtualBox 4.2.6, so it should be a little less confusing for anyone who is new to VirtualBox.
I’ll probably update the OL5 RAC article some time next week, since that article uses VirtualBox 3.2.8, which is pretty much ancient history now.
Followers of my blog know I like doing virtual RAC installations. When I do these, I focus very much on the Oracle side of things, leaving the virtualization tool to handle the hardware virtualization, like networking and shared disk.
A few weeks ago Gilbert Standen contacted me to say he had done a virtual RAC installation using OpenvSwitch to virtualize the network components. He posted some basic tips here. When I mentioned it on G+ and Twitter, it generated some interest, so I suggested he write it up in a little more detail. That process has now started on his blog. You can see the first couple of articles here:
A few of my clients have asked many questions about ASMLIB support in RHEL6, as they are gearing up to upgrade their database servers to RHEL6. There is a controversy about ASMLIB support in RHEL6. As usual, I will only discuss technical details in this blog entry.
ASMLIB is applicable only to the Linux platform and does not apply to any other platform.
Now, you might ask why bother, and why not just use OEL and UEK? Well, not every Linux server is used as a database server. In a typical company, there are hundreds of Linux servers and just a few percent of those servers are used as database servers. Linux system administrators prefer to keep one flavor of Linux distribution for ease of management, so asking clients to change the distribution from RHEL to OEL or OEL to RHEL is not always a viable option.
Do you need to use ASMLIB in Linux?
This is a quick note about reverse path filtering and the impact of that feature on RAC. I encountered an interesting problem recently with a client, and it is worth blogging about, in the strong hope that it might help one of you in the future.
The environment is 18.104.22.168 GI on Linux 5.6. In a 3-node cluster, Grid Infrastructure (GI) comes up cleanly on just one node, but never comes up on the other nodes. If we shut down GI on the first node, we can start GI on the second node with no issues. In other words, GI can be up on only one node at any time.
The system admins indicated that there had been no major changes, only a few bug fixes. Seemingly, the problem started after those bug fixes. But there were a few other changes to the environment (an init.ora parameter change, etc.), so the problem was not immediately attributable to the OS changes alone.
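When reverse path filtering is a suspect, a reasonable first step is to look at what each interface is actually set to. The sketch below is only an illustration and is not taken from the original troubleshooting session. It assumes a Linux host that exposes the settings under /proc/sys/net/ipv4/conf, and the function names are made up for this example. The kernel applies the higher of the "all" value and the per-interface value, where 0 means off, 1 means strict, and 2 means loose.

```python
import os

# Meaning of the rp_filter values as documented for the Linux kernel.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def read_rp_filter(conf_dir="/proc/sys/net/ipv4/conf"):
    """Return {interface: configured rp_filter value} found under conf_dir.

    Returns an empty dict if conf_dir does not exist (e.g. not a Linux host).
    """
    values = {}
    if not os.path.isdir(conf_dir):
        return values
    for iface in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, iface, "rp_filter")
        try:
            with open(path) as f:
                values[iface] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip entries that have no readable rp_filter setting.
            continue
    return values


def effective_rp_filter(values):
    """Apply the kernel rule: effective value = max(conf/all, conf/<iface>)."""
    all_value = values.get("all", 0)
    return {
        iface: max(all_value, value)
        for iface, value in values.items()
        if iface not in ("all", "default")
    }


if __name__ == "__main__":
    for iface, mode in effective_rp_filter(read_rp_filter()).items():
        print(f"{iface}: {mode} ({RP_FILTER_MODES.get(mode, 'unknown')})")
```

Running it on each node and comparing the values for the private interconnect interfaces shows quickly whether an OS change altered the filtering mode.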
Let’s first discuss how RAC traffic works before continuing. The environment for the discussion is a 2-node cluster with an 8K database block size, with the UDP protocol used for cache fusion. (BTW, the UDP and RDS protocols are supported on UNIX platforms, whereas Windows uses the TCP protocol.)
UDP protocol, fragmentation, and assembly
UDP is a higher-level protocol implemented over the IP protocol (UDP/IP). Cache Fusion uses UDP to send packets over the wire (Exadata uses the RDS protocol, though).
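To get a feel for the fragmentation arithmetic, here is a rough sketch that counts the IP fragments needed for one UDP datagram. It rests on several assumptions of mine: the UDP payload is exactly one 8K block (ignoring any Oracle message header), IPv4 is used with a 20-byte header and no options, and the MTU values of 1500 and 9000 are just common examples. The function name is hypothetical.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes, carried once per datagram, in the first fragment


def ip_fragments(udp_payload, mtu):
    """Return the number of IP packets needed to carry one UDP datagram."""
    datagram = udp_payload + UDP_HEADER
    max_last = mtu - IP_HEADER        # the final fragment may be any length
    if datagram <= max_last:
        return 1                      # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because fragment offsets are expressed in 8-byte units.
    per_fragment = max_last // 8 * 8
    return 1 + math.ceil((datagram - max_last) / per_fragment)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {ip_fragments(8192, mtu)} packet(s) for an 8K payload")
```

Under these assumptions an 8K payload needs six packets at an MTU of 1500 and a single packet at an MTU of 9000, and the receiver has to reassemble all the fragments before the datagram can be delivered.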
We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. A common misconception is that the packet transfer size for a block transfer is always the database block size (of course, messages are smaller). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduce the bits transferred over the private network). Don’t confuse this note with jumbo frames and MTU size; this note is independent of the MTU setting.
If you are attending Collaborate 2012, you might be interested in my content-rich sessions below:
Session Number: 326
Session Title: SCAN, VIP, HAIP, and other RAC acronyms
Session Date/Time/Room: Tue, Apr 24, 2012 (10:45 AM – 11:45 AM) : Surf C
Session Number: 327
Session Title: Internals and Performance Boot Camp: Truss, pstack, pmap, and more
Session Date/Time/Room: Wed, Apr 25, 2012 (03:00 PM – 04:00 PM) : Palm A
Hope to see you there!
Update: I am uploading the presentation files. The presentations are much more recent than the document.
Last week (March 2012), I was conducting Advanced RAC Training online. During the class, I was recreating ‘gc buffer busy’ waits to explain the concepts and methods for troubleshooting the issue.
Let’s define these events first. The ‘gc buffer busy’ event means that a session is trying to access a buffer, but there is already an open request for a global cache lock for that block, so the session must wait for the GC lock request to complete before proceeding. This wait is instrumented as the ‘gc buffer busy’ event.
From 11g onwards, this wait event is split into ‘gc buffer busy acquire’ and ‘gc buffer busy release’. An attendee asked me to show the difference between these two wait events. Fortunately, we had a problem with LGWR writes, and we were able to inspect the waits with much clarity during the class.