Getting up and running with Universal Connection Pool – Take 2

In yesterday’s post (which I actually didn’t want to post that day) I wrote about the Universal Connection Pool feature. You should be able to get started with the information I gave you, but it didn’t include any hints on how to have a look under the covers of UCP. This can be changed: Oracle includes very fine-grained logging with UCP, but experiments show that you have to use log level FINE or FINEST to get to the real information about what’s going on.

LOGGING

Tomcat uses the java.util.logging (JULI) framework for its own logging, as shown in the catalina.{bat,sh} file:

set LOGGING_CONFIG=-Djava.util.logging.config.file="%CATALINA_BASE%\conf\logging.properties"

One thing you probably don’t want to do is include your application’s own logging configuration there: any change to it requires a restart of Tomcat. Think of a production environment and it becomes clear why such an approach is neither desirable nor practical.

Instead, you can put a file called logging.properties into the src directory within your application. In it you define logging to your heart’s delight. I have taken the following from the documentation and Pas Apicella’s blog. It really opens the floodgates for very detailed logging, so don’t use this in any other than a development environment…

handlers = org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

org.apache.juli.FileHandler.level = ALL
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = ucp.
org.apache.juli.FileHandler.formatter = oracle.ucp.util.logging.UCPFormatter

java.util.logging.FileHandler.level = WARNING

.level = FINEST

# FCF classes
oracle.ucp.common.FailoverEventHandlerThreadBase.level = ALL

oracle.ucp.jdbc.oracle.ONSDatabaseFailoverEvent.level = ALL
oracle.ucp.jdbc.oracle.ONSRuntimeLBEventHandlerThread.level = ALL
oracle.ucp.jdbc.oracle.ONSOracleRuntimeLBEventSubscriber.level = ALL
oracle.ucp.jdbc.oracle.ONSRuntimeLoadBalancingEvent.level = ALL
oracle.ucp.jdbc.oracle.ONSOracleFailoverEventSubscriber.level = ALL
oracle.ucp.jdbc.oracle.ONSDatabaseEventHandlerThread.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverEvent.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverEventSubscriber.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverEventNotification.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverHandler.level = ALL
oracle.ucp.jdbc.oracle.OracleFailoverablePooledConnection.level = ALL

oracle.ucp.jdbc.oracle.OraclePooledConnectionConnectionPool.level = ALL
oracle.ucp.jdbc.oracle.OracleConnectionConnectionPool.level = ALL
oracle.ucp.jdbc.oracle.OracleXAConnectionConnectionPool.level = ALL
oracle.ucp.jdbc.oracle.OracleJDBCConnectionPool.level = ALL

oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfo.level = ALL
oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.level = ALL

# RCLB classes
oracle.ucp.jdbc.oracle.ONSRuntimeLBEventHandlerThread.level = ALL
oracle.ucp.jdbc.oracle.ONSOracleRuntimeLBEventSubscriber.level = ALL
oracle.ucp.jdbc.oracle.OracleRuntimeLoadBalancingHandler.level = ALL
oracle.ucp.jdbc.oracle.ONSRuntimeLoadBalancingEvent.level = ALL

Now when you redeploy your application, a new file ucp.<timestamp> will be created in %CATALINA_HOME%\logs, listing in all its beauty what’s going on. Be warned that the log file is very verbose.

FAST CONNECTION FAILOVER

The previous source code didn’t take full advantage of the fact that the fastConnectionFailover functionality was enabled. For the following to work you might want to disable the validateConnectionOnBorrow setting in META-INF/context.xml; in fact, we’d like to validate the connection ourselves when we get it from the pool. The code of the Hello.java servlet has changed as follows:

// time to get UCP to start
writer.println("<h2>UCP</h2>");

Connection conn = null;
oracle.ucp.jdbc.PoolDataSource ds = null;

try {
  Context ctx = new InitialContext();
  Context envContext = (Context) ctx.lookup("java:/comp/env");
  ds = (oracle.ucp.jdbc.PoolDataSource) envContext.lookup("jdbc/UCPPool");
  writer.println("Got the datasource - ");
  writer.println("FCF enabled? " + ds.getFastConnectionFailoverEnabled());
  conn = ds.getConnection();
} catch (Exception e) {
  try {
    // here's where the FCF comes in
    if (conn == null || !((ValidConnection) conn).isValid()) {
      writer.println("<p>Have to retry connection (" + e.getMessage() + ")</p>");
      OracleJDBCConnectionPoolStatistics stats =
        (OracleJDBCConnectionPoolStatistics) ds.getStatistics();
      writer.println("Pool stats: " + stats.getFCFProcessingInfo());
      if (conn != null) {
        conn.close();
      }
    } else {
      writer.println("Unknown exception: " + e);
    }
  } catch (SQLException sqle) {
    e.printStackTrace();
    return;
  }
}

try {
  writer.println("<p>Connected to Oracle instance</p>");
  Statement stmt = conn.createStatement();
  ResultSet rs = stmt.executeQuery(
    "select 'Hello World from '||sys_context('userenv','instance_name') from dual");
  while (rs.next()) {
    writer.println("<p>" + rs.getString(1) + "</p>");
  }

  rs.close();
  stmt.close();
  conn.close();
} catch (Exception e) {
  writer.println("<p>" + e + "</p>");
}

writer.println("</body>");
writer.println("</html>");
}

I will probably attach the code to the post as the layout is pretty broken.

IS THE POOL INTELLIGENT?

To see if the runtime load balancing feature works at all I shut down the second of my two database instances before restarting the application. I expect all sessions to be on the first instance (hardly surprising, as there is no second one to go to). Is that so?

SQL> ed
Wrote file afiedt.buf

 1  select count(inst_id),inst_id,
 2  status from gv$session where username='SCOTT'
 3* group by inst_id, status
SQL> /
COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
58          1 INACTIVE

Now what happens when I start the second instance?

SQL> !srvctl start instance -d lnrut1d -i LNRUT1D_2

SQL> /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 69          1 INACTIVE

SQL> /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 57          1 INACTIVE
 19          2 INACTIVE

SQL> /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 25          1 INACTIVE
 45          2 INACTIVE

That’s neat: the second instance catches up, and I eventually ended up near equilibrium. The inverse is also true. Consider two instances up, as shown here:

SQL> select count(inst_id),inst_id,
 2  status from gv$session where username='SCOTT'
 3  group by inst_id, status
 4  /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 21          1 INACTIVE
 26          2 INACTIVE
 1          2 ACTIVE
 2          1 ACTIVE

SQL> !srvctl stop instance -d lnrut1d -i LNRUT1D_2 -o abort

SQL> /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 36          1 INACTIVE

With the second instance shut down, the number of sessions on the first node goes up:


SQL> /

COUNT(INST_ID)    INST_ID STATUS
-------------- ---------- --------
 65          1 INACTIVE

I like this!

Getting up and running with Universal Connection Pool

Oracle’s next generation connection pooling solution, Universal Connection Pool, can be a bit tricky to set up. This is especially true when a JNDI data source is to be used: most examples don’t cover such a scenario. A lot of information is out there on the net, but no one seems to have given the full picture. During the research for chapter 11 of “Pro Oracle Database 11g RAC on Linux” I learned this the hard way. Since the book has been published, a few minor changes have been made to the software I used at the time, and those merit an update. Please note that this article’s emphasis is on getting the example running; it is by no means secure enough for a production release! You need to harden the setup considerably for production, but it serves well for demonstration purposes.

THE SETUP

I have used a four node 11.2.0.2 RAC system as the source for my data. A 2 node cluster database with service “TAFTEST” runs on nodes 1 and 2. It’s administrator-managed and the service has both nodes set aside as “preferred” nodes. The database nodes run Oracle Enterprise Linux 5.5 64bit with RAC 11.2.0.2. For the sake of simplicity, I used my Windows laptop to host the Tomcat instance, which is now updated to version 6.0.30. I am using Apache Ant to build the application; the current stable Ant release is 1.8.2. My JDK is also upgraded to the latest and greatest, version 1.6.0_23. I am using the 32bit 11.2.0.2 client package to supply me with ons.jar, ojdbc6.jar and ucp.jar.

ORACLE CLIENT

Part of this exercise is to demonstrate FCF and FAN events, which means we need an Oracle client for the remote ONS configuration (please refer to chapter 11 of the RAC book for a detailed description of local vs remote ONS configurations). I downloaded the 11.2.0.2 32bit client for Windows from support.oracle.com and installed it to c:\oracle\product\11.2.0\client_1, choosing the administrator option in Oracle Universal Installer.

TOMCAT

Start by downloading Tomcat for your platform: I have successfully tested Tomcat on Linux and Windows. I deployed apache-tomcat-6.0.30 to c:\ for this test. Once it’s unzipped, copy the necessary JAR files from the Oracle client installation into %TOMCAT_HOME%\lib. These are ojdbc6.jar, ons.jar and ucp.jar. Next, you should set a few environment variables. To keep things simple, I edited %tomcat_home%\bin\startup.bat and added these:

  • set JAVA_HOME=c:\program files\Java\jdk1.6.0_23
  • set JAVA_OPTS=-Doracle.ons.oraclehome=c:\oracle\product\11.2.0\client_1

I’m also interested in the final content of %JAVA_OPTS%, so I modified catalina.bat as well and added this line into the section below line 164:

echo Using JAVA_OPTS: "%JAVA_OPTS%"

Finally we need to add a user with access to the Tomcat manager application. Edit %tomcat_home%\conf\tomcat-users.xml and create an entry similar to this one:


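The entry itself got mangled in this post, so here is a minimal sketch of what I mean. The user name and password are placeholders you would obviously change, and the role names are the new manager roles documented for Tomcat 6.0.30 (the ant deployment later on needs the scripting role):

<role rolename="manager-gui"/>
<role rolename="manager-script"/>
<user username="tomcat" password="s3cret" roles="manager-gui,manager-script"/>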
This is one of the major differences: starting with Tomcat 6.0.30, the previous “manager” role has been split into the roles shown above to protect against cross-site request forgery attacks. It took me a while to discover the reason for all the HTTP 403 errors I got when I tried to deploy my application… You’d obviously use a strong password here!

This concludes the TOMCAT setup.

ANT

Ant is the build tool used for compiling and deploying the sample application I am going to adapt. Simply download the zip file (version 1.8.2 was current when I wrote this) and unzip it somewhere convenient. Again, a few environment variables are helpful, which I usually put into a file called sp.bat:

@echo off
set ant_home=c:\apache-ant-1.8.2
set path=%ant_home%\bin;%path%
set java_home=c:\program files\Java\jdk1.6.0_23
set path=%java_home%\bin;%path%

This makes building the application a lot easier. Just change into the application directory and enter sp to get set up.

BUILDING AN APPLICATION

My earliest contact with Tomcat was a long time ago with version 3, and ever since I have remembered the well written documentation. The “docs” application, usually accessible as http://localhost:8080/docs, has a section about writing your first application. It is highly recommended reading: http://localhost:8080/docs/appdev/index.html.

To get started, I copied the “sample” directory from %tomcat_home%\webapps\docs\appdev to c:\temp and started adapting it. First of all my sp.bat script was copied in there. From a command prompt I changed into sample and edited the build.xml file. The first major section to go over starts at line 129 of build.xml. I changed it to read like this:

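The XML block is garbled above; it is simply the block of property definitions from the sample build.xml with my own values filled in. A sketch, with example values for my environment (manager.url is the default shipped with the sample):

<property name="app.name"         value="ucp"/>
<property name="app.version"      value="0.1"/>
<property name="catalina.home"    value="c:/apache-tomcat-6.0.30"/>
<property name="manager.url"      value="http://localhost:8080/manager"/>
<property name="manager.username" value="tomcat"/>
<property name="manager.password" value="s3cret"/>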

The properties to change/add are app.name, app.version, catalina.home, manager.username and manager.password. The manager.* properties will come in very handy later as they allow us to automatically deploy the compiled/changed application.

With all this done, try to compile the application as it is:

C:\TEMP\sample>ant compile
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

BUILD SUCCESSFUL
Total time: 0 seconds

C:\TEMP\sample>

That’s it: the application has been compiled. You can now start Tomcat and try to deploy the app. Open a cmd window, change to %tomcat_home%\bin and execute startup.bat. This will start Tomcat; the new window shows any problems it might have when deploying applications. This window is very useful, for example, when it comes to troubleshooting incorrect XML configuration files.

Now the next step is to use the ant target “install” to test the installation of the web archive. I changed the install target slightly so that it depends on the “dist” target being completed before deployment to the Tomcat server. My modified, working install target is defined like this in build.xml:

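Again the XML has not survived the formatting. The idea is simply that the install target declares a dependency on dist and then calls the Tomcat deploy task; a sketch, assuming the property and target names of the sample build.xml:

<target name="install" depends="dist"
        description="Install application to servlet container">

  <deploy url="${manager.url}"
          username="${manager.username}"
          password="${manager.password}"
          path="${app.path}"
          war="file://${dist.home}/${app.name}-${app.version}.war"/>

</target>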

Try invoking it as shown in the example below:

C:\TEMP\sample>ant install
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

javadoc:
 [javadoc] Generating Javadoc
 [javadoc] Javadoc execution
 [javadoc] Loading source files for package mypackage...
 [javadoc] Constructing Javadoc information...
 [javadoc] Standard Doclet version 1.6.0_23
 [javadoc] Building tree for all the packages and classes...
 [javadoc] Building index for all the packages and classes...
 [javadoc] Building index for all classes...

dist:

install:
 [deploy] OK - Deployed application at context path /ucp

BUILD SUCCESSFUL
Total time: 1 second

C:\TEMP\sample>

This operation succeeded. You should also see a line in your tomcat window:

INFO: Deploying web application archive ucp.war

When pointing your browser to http://localhost:8080/ucp/ you should be greeted by the familiar tomcat sample application.

ADDING THE DATASOURCE

So far this hasn’t been groundbreaking at all. It’s only now that it gets more interesting: the JNDI data source needs to be defined and used in our code. Instead of messing around with resource and resource-ref configuration in the global %tomcat_home%\conf directory, it is advisable to add the context to the application.

Back in directory c:\temp\sample, create a new directory web\META-INF. Inside META-INF create a file “context.xml” which takes the JNDI data source definition:


 
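The XML did not survive the formatting either, so here is a sketch of the kind of Resource definition I am talking about. The factory and type attributes follow what the UCP documentation describes for JNDI lookups, so double-check them against your UCP version; host names, ports, pool sizes and the deliberately weak scott/tiger credentials are placeholders for my test environment:

<?xml version="1.0" encoding="UTF-8"?>
<Context>
  <Resource name="jdbc/UCPPool"
            auth="Container"
            type="oracle.ucp.jdbc.PoolDataSource"
            factory="oracle.ucp.jdbc.PoolDataSourceImpl"
            connectionFactoryClassName="oracle.jdbc.pool.OracleDataSource"
            url="jdbc:oracle:thin:@//scan.example.com:1521/TAFTEST"
            user="scott"
            password="tiger"
            connectionPoolName="UCPPool"
            initialPoolSize="5"
            minPoolSize="5"
            maxPoolSize="20"
            validateConnectionOnBorrow="true"
            fastConnectionFailoverEnabled="true"
            onsConfiguration="nodes=node1.example.com:6200,node2.example.com:6200"/>
</Context>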

It really is that simple to implement; if only it were equally simple to find out how to do this… The next step is to modify the Hello.java servlet to reference the JNDI data source. The code for the servlet is shown below; it’s basically the existing servlet code amended with the JNDI and JDBC relevant code. It actually does very little: after looking up the JNDI name it grabs a session from the pool and checks which instance it is currently connected to. It then releases all resources and exits.


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package mypackage;

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.ResultSet;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
import javax.naming.*;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Enumeration;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public final class Hello extends HttpServlet {

public void doGet(HttpServletRequest request,
 HttpServletResponse response)
 throws IOException, ServletException {

 response.setContentType("text/html");
 PrintWriter writer = response.getWriter();

 writer.println("<html>");
 writer.println("<head>");
 writer.println("<title>Sample Application Servlet Page</title>");
 writer.println("</head>");
 writer.println("<body bgcolor=white>");

 writer.println("<table border=\"0\">");
 writer.println("<tr>");
 writer.println("<td>");
 writer.println("<img src=\"images/tomcat.gif\">");
 writer.println("</td>");
 writer.println("<td>");
 writer.println("<h1>Sample Application Servlet</h1>");
 writer.println("This is the output of a servlet that is part of");
 writer.println("the Hello, World application.");
 writer.println("</td>");
 writer.println("</tr>");
 writer.println("</table>");

 // this is the UCP specific part!
 writer.println("<h2>UCP</h2>");

 try {
   Context ctx = new InitialContext();
   Context envContext = (Context) ctx.lookup("java:/comp/env");
   javax.sql.DataSource ds =
     (javax.sql.DataSource) envContext.lookup("jdbc/UCPPool");
   writer.println("Got the datasource");

   Connection conn = ds.getConnection();
   writer.println("<p>Connected to an Oracle instance</p>");

   Statement stmt = conn.createStatement();
   ResultSet rs = stmt.executeQuery(
     "select 'Hello World from '||sys_context('userenv','instance_name') from dual");
   while (rs.next()) {
     writer.println(rs.getString(1));
   }

   rs.close();
   stmt.close();
   conn.close();
   conn = null;
 } catch (Exception e) {
   writer.println("<p>" + e + "</p>");
 }

 writer.println("</body>");
 writer.println("</html>");
 }
}

Done! Now let’s compile the code and deploy it to Tomcat before testing. The most common problem I get with the code is a JNDI error stating that “jdbc/UCPPool” is not defined. This can happen in 2 cases:

  • You have a typo in the resource definition, in which case the context really doesn’t exist (it’s case sensitive)
  • You have non-compatible line breaks in the context.xml file. In this case I’d try putting the entire contents of the file on one line (use “J” in vi to join lines together)

VALIDATION

You should now see a number of sessions as user “scott” against the database: query gv$session for username “SCOTT” and you should see the result of your hard work.
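For example:

select count(inst_id), inst_id, status
  from gv$session
 where username = 'SCOTT'
 group by inst_id, status;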

11.2.0.2 Bundled Patch 3 for Linux x86-64bit Take 2

Yesterday I wrote about the application of Bundle Patch 3 to my 2 node RAC cluster. Unfortunately I ran into problems when applying the patches to the GRID_HOME. I promised a fix to the situation, and here it is.

First of all I was unsure whether I could apply the missing patches manually, but then decided against it. The opatch output for interim patches lists each patch together with a unique patch ID, as shown here:

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

The formatting unfortunately is lost when pasting this here.

I was not sure if that patch/unique patch ID combination would appear if I patched manually, so I decided not to be brave and rolled the bundle patch back altogether before applying it again.

Patch Rollback

This was actually very simple: I opted to roll back all the applied patches from the GRID_HOME. The documentation states that you simply have to append the “-rollback” flag to the opatch command. I tried it on the node where the application of 2 patches had failed:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-04-13.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch 10387939  rollback successful for home /u01/app/oragrid/product/11.2.0.2
The patch  10157622 does not exist in /u01/app/oragrid/product/11.2.0.2
The patch  10626132 does not exist in /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

So that was simple enough. Again, you won’t see anything in your opatch session and you might think that the command has somehow stalled. I usually start a “screen” session on my terminal and open a new window to tail the opatchauto log file in $GRID_HOME/cfgtoollogs/.
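In other words, something like this in the second window, using the file name opatch auto reported:

tail -f $GRID_HOME/cfgtoollogs/opatchauto2011-02-03_09-04-13.log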

Re-Applying the Patch

The next step is to re-apply the patch. The initial failure was due to a lack of disk space, as was evident from the log file.

2011-02-02 15:57:45: The apply patch output is Invoking OPatch 11.2.0.1.4

 Oracle Interim Patch Installer version 11.2.0.1.4
 Copyright (c) 2010, Oracle Corporation.  All rights reserved.

 UTIL session

 Oracle Home       : /u01/app/oragrid/product/11.2.0.2
 Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
 OPatch version    : 11.2.0.1.4
 OUI version       : 11.2.0.2.0
 OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
 Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_15-57-35PM.log

 Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

 Invoking utility "napply"
 Checking conflict among patches...
 Checking if Oracle Home has components required by patches...
 Checking conflicts against Oracle Home...
 OPatch continues with these patches:   10157622

 Do you want to proceed? [y|n]
 Y (auto-answered by -silent)
 User Responded with: Y

 Running prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(2086171834) is not available.
 UtilSession failed: Prerequisite check "CheckSystemSpace" failed.

 OPatch failed with error code 73

2011-02-02 15:57:45: patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
2011-02-02 15:57:45: Performing Post patch actions

So this time around I ensured that I had enough free space (2.5G recommended minimum) available in my GRID_HOME. The procedure is the inverse of the rollback:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-27-39.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node1 stage]#

And this time it worked: all 3 patches were applied. However, my free space diminished quite drastically from 2.5G to around 780M, and that was after purging lots of logs from $GRID_HOME/log/`hostname -s`/. Nevertheless, this concludes the patch application.

Summary

In summary I am quite impressed with this patch. It looks as if it had been designed to be deployable by OEM: it’s silent, doesn’t require input (except for the ocm.rsp file) and is a rolling patch. However, the user has to check the installed patches against the list of installable targets, manually or through a script, to ensure that all patches have been applied. You also have to ensure you have enough free space on your $GRID_HOME mount point.

As an enhancement request I’d like opatch to give feedback that it has started doing things: initially I hit CTRL-C thinking the command had stalled, while it was actually busy in the background shutting down my CRS stack. The “workaround” is to tail the logfile with the “-f” option.

11.2.0.2 Bundled Patch 3 for Linux x86-64bit

It has become a habit recently to issue explicit warnings when presenting information that-if it doesn’t work as described-might cause you harm. So to follow suit, here’s mine:

Warning! This is a Patch Set Update installation war story: it worked for me, but that by no means implies that the same will work for you! So always apply the PSU or bundle patch on a non-critical test system first. And always take a backup of your ORACLE_HOME and inventory before applying any patches. You have been warned; I’d rather be safe than sorry.

A good one I think …  First of all thanks to Coskan for the pointer to the Oracle blog entry: http://blogs.oracle.com/UPGRADE/2011/01/11202_bundled_patch_3_for_linu.html which actually prompted me to apply the patch in the first place.

My environment is a 2 node 11.2.0.2 cluster on OEL 5.5 64bit. Judging by the readme file I need the latest OPatch, 11.2.0.1.4 (patch p6880880), and the bundle patch itself (patch 10387939). Unfortunately Oracle leaves it as an exercise to the reader to make sense of the patch readme and note 1274453.1, which is applicable if ACFS is involved on the cluster. Although I am using ACFS, all my ACFS volumes were disabled before patching (i.e. they are not shared ORACLE_HOMEs). If you have ACFS in use then follow note 1274453.1 and shut all of the volumes down before proceeding. Note 1274453.1 is divided into a number of sections, and it is important to find the right case first: I wanted to try the opatch auto feature this time, but patch the GRID_HOME independently of the RDBMS_HOME. I knew from other users that opatch auto had gained a bad reputation in the past, but was curious whether or not Oracle had fixed it. Regardless of your configuration, you have to stop any dbconsole instances on all nodes.

If applicable, the next step is to stop all database instances and their corresponding ACFS volumes. The ACFS file systems also have to be unmounted. If you are unsure, you can use /sbin/acfsutil registry to list the ACFS volumes; those registered with Clusterware show up in the output of crsctl status resource -t, as sketched below.
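A sketch of the commands involved; the database name and mount point are placeholders for whatever runs out of the homes you are about to patch:

# stop the databases using the RDBMS home to be patched
srvctl stop database -d MYDB

# list the registered ACFS volumes ...
/sbin/acfsutil registry

# ... unmount them ...
umount /u01/app/oracle/acfsmounts/myvol

# ... and cross-check the ACFS resources Clusterware knows about
crsctl status resource -t | grep -i acfs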

Important! If you have not already done so, update OPatch on all nodes for both the GI and RDBMS homes. It’s a lot of manual work; I wish there were a cluster-wide OPatch location… it would make my life so much easier. Just updating OPatch on 4 nodes is a pain, and I don’t want to imagine a cluster of 8 or more nodes right now.
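For reference, updating OPatch is just a matter of unzipping patch p6880880 into every home on every node. A sketch, using the home locations from my environment and the usual file name of the 11.2 Linux x86-64 download (yours may differ):

# as the grid infrastructure owner
unzip -o p6880880_112000_Linux-x86-64.zip -d /u01/app/oragrid/product/11.2.0.2

# as the RDBMS owner
unzip -o p6880880_112000_Linux-x86-64.zip -d /u01/app/oracle/product/11.2.0.2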

Now proceed with the patch installation. I’ll only let it patch the GRID_HOME at this stage. The patch should be rolling which is good. Note that you’ll be root, which can cause its own problems in larger organisations. My patch is staged in /u01/app/oracle/product/stage, and the patch readme suggests I should use the opatch executable in the $GRID_HOME.

[root@node1 ~]# cd /u01/app/oracle/product/stage
[root@node1 stage]# cd 10387939
[root@node1 10387939]# export PATH=/u01/app/oragrid/product/11.2.0.2/OPatch:$PATH

Oracle Configuration Manager response file

Unlike previous patches, which prompted the user each time you ran opatch, this time the OCM configuration is not part of the patch installation itself. Instead, you have to create an OCM configuration (“response”) file before you apply the patch. The file is created in the current directory. As the GRID_OWNER, execute $ORACLE_HOME/OPatch/ocm/bin/emocmrsp as shown here:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/ocm/bin/emocmrsp
OCM Installation Response Generator 10.3.1.2.0 - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.

Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  y
The OCM configuration response file (ocm.rsp) was successfully created.
[oragrid@node1 ~]$ pwd
/home/oragrid

I don’t want automatic email notifications in this case, so I left the email address blank, and I prefer to remain uninformed of future security issues here. I then moved the file to /u01/app/oracle/product/stage/ocm.rsp as root for the next steps. It is good practice to save the detailed Oracle inventory information for later. Again as the grid owner I ran this command:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv -detail > /tmp/gridhomedetail

What can (will) be patched?

The bundle.xml file shipped with the patch lists the components that can be patched with the bundle, together with the target types each patch applies to.

So in other words the bundle consists of 3 patches – 10387939, 10157622 and 10626132 – and for each of these a target type is shown. SIHA is short for Single Instance High Availability, which has been renamed “Oracle Restart”. I assume SIDB is single instance DB, but don’t know for sure.

Patching the GRID_HOME

I’m now ready to apply the patch to the GRID home. The command to do so is simple; the example below worked for me:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh

/u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_15-50-04.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:

Before entering the path I opened another terminal to look at the log file, which is in $ORACLE_HOME/cfgtoollogs/ and named opatchauto plus a timestamp. After entering the full path to my ocm.rsp file nothing happened in my opatch session. Oracle could have told us that it was about to apply the patch, because that is exactly what it was doing: go back to your other terminal and watch the log messages fly by! The opatch auto command automatically does everything the DBA previously had to do by hand, including unlocking the CRS stack and calling opatch napply for the relevant bits and pieces. This is indeed a nice step forward. I can vividly remember having to apply portions of a PSU to the GRID_HOME and others to the RDBMS home, 6 or 7 steps in total before a patch was applied. That was hard work indeed.

Only after CRS has been shut down (quite a while after entering the path to the ocm.rsp file!) will you be shown this line:

Successfully unlock /u01/app/oragrid/product/11.2.0.2

Even further down the line, after the patches have been applied, you see these:

patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

And with that I got my prompt back. One patch failed to apply: the log file indicated a lack of space (2G needs to be free). Tomorrow I’ll post an update and remove/reapply the patch manually.

I checked the other node and found that the patch has indeed been applied on the local node only. It hasn’t propagated across, which is good, as the readme wasn’t really clear about whether the patch was rolling. From what I’ve seen I’d call it a local patch, similar to the “opatch napply -local” we did manually before the opatch auto option was introduced. And even better, it worked. Querying opatch lsinventory I got this result:

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_16-01-27PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (1) :

Patch  10387939     : applied on Wed Feb 02 15:56:50 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

Patching the RDBMS home

Now, with that done, I’ll try applying it to the RDBMS home on the same node:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oracle/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh

/u01/app/oracle/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-05-29.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
[root@node1 stage]#

Cool! That was indeed easy. Did it work?

$ORACLE_HOME/OPatch/opatch lsinv
[...]
--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:08:27 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:19 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10178670, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676

Patch  10387939     : applied on Wed Feb 02 16:07:18 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

It did indeed. As a nice side effect I didn’t even have to worry about the srvctl start/stop home commands; they were run automatically for me. From the log:

2011-02-02 16:08:45: /u01/app/oracle/product/11.2.0.2/bin/srvctl start home -o /u01/app/oracle/product/11.2.0.2 -s

/u01/app/oracle/product/11.2.0.2/srvm/admin/stophome.txt -n node1 output is
2011-02-02 16:08:45: Started resources from datbase home /u01/app/oracle/product/11.2.0.2

Now I’ll simply repeat this on the remaining nodes and should be done. To test the stability of the process I didn’t limit the opatch command to a specific home but had it patch them all.

Patching all homes in one go

This takes a little longer but otherwise is just the same. I have added the output here for the sake of completeness:

[root@node2 stage]# which opatch
/u01/app/oragrid/product/11.2.0.2/OPatch/opatch
[root@node2 stage]# opatch auto /u01/app/oracle/product/stage/10387939
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-36-09.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node2 stage]#

Well, that’s it for the technical part. When I compared the number of patches recorded in the $GRID_HOME on node 2 (which simply used “opatch auto”) I found 3 interim patches applied, compared to only 1 on node 1 where I patched the GRID_HOME on its own. I’ll have to raise this with Oracle…

[oragrid@node2 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv
Invoking OPatch 11.2.0.1.4

Oracle Interim Patch Installer version 11.2.0.1.4
Copyright (c) 2010, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oragrid/product/11.2.0.2
Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.4
OUI version       : 11.2.0.2.0
OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_17-13-40PM.log

Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_17-13-40PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:48:59 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:48:34 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:33 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676
 9974223, 10260251

Patch  10387939     : applied on Wed Feb 02 16:46:29 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node2
 Remote node = node1

--------------------------------------------------------------------------------

OPatch succeeded.

Viewing Runtime Load Balancing Events

Yesterday I ran a benchmark on a 2 node RAC cluster (ProLiant BL685c G6 servers, each with four six-core AMD Opteron 8431 processors and 32G RAM). It’s running Oracle Grid Infrastructure 11.2.0.2 as well as an Oracle 11.2.0.2 database on Oracle Enterprise Linux 5.5 64bit with device-mapper-multipath.

I was testing how the system would react under load, but also wanted to see if Runtime Load Balancing was working. The easiest way to check this is to view the AQ events that are generated for a service when AQ HA notifications are set to true. They can either be dequeued from the database, as described in chapter 11 of Pro Oracle Database 11g RAC on Linux, or alternatively queried from the database. The latter is the quicker method and this article will focus on it.

Before you can make use of Runtime Load Balancing you need to set at least 2 properties in your service:

  • Connection Load Balancing Goal (either SHORT or LONG)
  • Runtime Load Balancing Goal (SERVICE_TIME or THROUGHPUT)

.Net applications require AQ HA notifications to be set to true because, as mentioned in the introduction, they can’t make direct use of Fast Application Notification (FAN) events. My JDBC application is fully capable of using FAN events; however, as you will see later, I am using the AQ notifications anyway to view the events.

Connected as the owner of the Oracle binaries, I created a new service to make use of both instances:

$ srvctl add service -d TEST -s TESTSRV -r TEST1,TEST2 -P BASIC  \
> -l PRIMARY -y MANUAL -q true -x false -j short -B SERVICE_TIME \
> -e SESSION -m BASIC -z 0 -w 0

The service TESTSRV for database TEST has TEST1 and TEST2 as preferred instances, and the service should be started (manually) when the database is in the primary role. AQ notifications are enabled, and I chose a connection load balancing goal of “short” (usually ok with web applications and connection pooling) and a runtime load balancing goal of service time (also appropriate for the many short transactions typical of a web environment). The remaining parameters define Transparent Application Failover. Please refer to the output of “srvctl add service -h” for more information about the command line parameters.

The result of this endeavour can be viewed with srvctl config service:

$ srvctl config service -d TEST -s TESTSRV
Service name: TESTSRV
Service is enabled
Server pool: TEST_TESTSRV
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: MANUAL
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: TEST1,TEST2
Available instances:

So to begin with I created the order entry schema (SOE) in preparation for a Swingbench run. (I know that Swingbench’s Order Entry is probably not the best benchmark out there, but my client knows and likes it.) Once about 10G of data had been generated I started a Swingbench run with 300 users and a reasonably low think time (minimum of 20ms and maximum of 60ms per transaction). The connect string was //scan1.example.com:1800/TESTSRV

A query against gv$session showed an even balance of sessions, which was good:

select count(inst_id), inst_id
 from gv$session
where username = 'SOE'
group by inst_id

However, whatever I did, I couldn’t get the Runtime Load Balancing events in sys.sys$service_metrics_tab to change. They always looked like this (column user_data):

{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} }

That sort of made sense, as neither of the nodes broke into a sweat: the system was > 50% idle with a load average of about 12. So that wouldn’t cut it. Instead of experimenting with the Swingbench parameters, I decided to revert to the silly CPU burner: a while loop that generates random numbers. I wasn’t interested in I/O at this stage, and created this minimal script:

$ cat dothis.sql
declare
 n number;
begin
 while (true) loop
 n:= dbms_random.random();
 end loop;
end;
/

A simple for loop can be used to start the load test:

$ for i in $(seq 30); do
> sqlplus soe/soe@scan1.example.com:1800/TESTSRV @dothis &
done

This created an even load on both nodes. I then started another 20 sessions directly against TEST1 on node 1 to trigger the change in behaviour. And fair enough, the first few lines of “top” revealed the difference. The output for node 1 was as follows:


top - 10:59:30 up 1 day, 21:16,  6 users,  load average: 42.44, 20.23, 10.07
Tasks: 593 total,  48 running, 545 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11978912k used, 20981776k free,   441260k buffers
Swap: 16777208k total,        0k used, 16777208k free,  8078336k cached

Node 2, on the other hand, was relatively idle:

top - 10:59:22 up 5 days, 17:45,  4 users,  load average: 15.80, 10.53, 5.74
Tasks: 631 total,  16 running, 605 sleeping,  10 stopped,   0 zombie
Cpu(s): 58.8%us,  0.6%sy,  0.0%ni, 40.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11770080k used, 21190608k free,   376672k buffers
Swap: 16777208k total,        0k used, 16777208k free,  7599496k cached

Would that imbalance finally make a difference? It did, as the user_data column (truncated here for better readability) reveals:

SQL> select user_data
2  from sys.sys$service_metrics_tab
3  order by enq_time desc;

{instance=TEST1 percent=4 flag=GOOD aff=TRUE}{instance=TEST2 percent=96 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:01:16')
{instance=TEST1 percent=6 flag=GOOD aff=TRUE}{instance=TEST2 percent=94 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:46')
{instance=TEST1 percent=10 flag=GOOD aff=TRUE}{instance=TEST2 percent=90 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:16')
{instance=TEST1 percent=18 flag=GOOD aff=TRUE}{instance=TEST2 percent=82 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:46')
{instance=TEST1 percent=28 flag=GOOD aff=TRUE}{instance=TEST2 percent=72 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:16')
{instance=TEST1 percent=35 flag=GOOD aff=TRUE}{instance=TEST2 percent=65 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:46')
{instance=TEST1 percent=40 flag=GOOD aff=TRUE}{instance=TEST2 percent=60 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:16')
{instance=TEST1 percent=43 flag=GOOD aff=TRUE}{instance=TEST2 percent=57 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:46')
{instance=TEST1 percent=44 flag=GOOD aff=TRUE}{instance=TEST2 percent=56 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:16')
{instance=TEST1 percent=48 flag=GOOD aff=TRUE}{instance=TEST2 percent=52 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:46')
{instance=TEST1 percent=49 flag=GOOD aff=TRUE}{instance=TEST2 percent=51 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:16')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:46')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:16')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:46')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:16')

Where the advisory was initially even at 50-50 it soon became imbalanced, and TEST2 would be preferred after a few minutes into the test. So everything was working as expected; I just didn’t manage to put enough load on the system initially.

Adding storage dynamically to ASM on Linux

Note: This discussion is potentially relevant only to OEL 5.x and RHEL 5.x; I haven’t been able to verify that it works the same way on other Linux distributions, although I would assume so. Before starting with the article, here are some facts:

  • OEL/RHEL 5.5 64bit
  • Oracle 11.2.0.2
  • native multipathing: device-mapper-multipath

The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Having mocked “reboot-me” operating systems like Windows, I soon went quiet when it came to discussing the addition of a LUN to ASM on Linux. Today I learned how to do it, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleasure to work with in 2009 and 2010. And since I have a short-lived memory I decided to write it down.

I'll describe the process from top to bottom, from the addition of the LUN to the server all the way up to the addition of the ASM disk to the disk group.

Adding the storage to the cluster nodes

The first step, obviously, is to get the LUN assigned to the server(s). This is the easy part, and outside the control of the Linux/Oracle admin: the storage team will provision a new LUN to the hosts in question. At this stage, Linux has no idea about the new storage; to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way in RHEL 5 is to issue this command:

[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done

The new, unpartitioned LUN will appear in /proc/partitions. If it doesn't, then there is probably something wrong on the SAN side - check /var/log/messages and talk to your storage administrator. If it's not a misconfiguration, you may have no option but to reboot the node.
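If you want to be sure the LUN really arrived, a simple before/after comparison helps; the commands below are generic and not tied to this particular system:

[root@node1 ~]# wc -l /proc/partitions
[root@node1 ~]# tail -50 /var/log/messages | grep -i scsi

Run the wc command before and after the rescan - the line count should go up by one line per new path, and the syslog should show the newly attached SCSI disks.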

Configure Multipathing

So far so good; the next step is to add the device to the multipath configuration. First of all, you need to find out the WWID of the new device. In my case that's simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask someone who can check the WWID on a console to the array. It's important to get this right at this stage :)
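If the new line in /proc/partitions is not conclusive, scsi_id can read the WWID straight from the device. The device name sdu below is made up purely for illustration; the WWID returned should be the one you then put into multipath.conf (here it is the one used for ACFS0001 in the next step):

[root@node1 ~]# /sbin/scsi_id -g -u -s /block/sdu
360000970000294900664533030344239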

To add the new disk to the multipath.conf file, all you need to do is add a new section, as in the following example:


multipaths {
 ..
 multipath {
  wwid 360000970000294900664533030344239
  alias ACFS0001
  path_grouping_policy failover
 }
 ..
}

By the way, I have covered configuring multipathing in more detail in a previous blog post here. Don't forget to replicate the changes to the other cluster nodes!

Now you reload multipathd using /etc/init.d/multipathd reload on each node, and voila, you should see the device in /dev/mapper/ - my ACFS disk appeared as /dev/mapper/ACFS0001.
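To double-check that multipathd really picked the device up, query it by its alias - the alias is the one defined above, and the output (not shown here) should list both paths as active and ready:

[root@node1 ~]# multipath -ll ACFS0001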

Now the tricky bit is partitioning it, if you need to - a partition is no longer mandatory with 11.1 and newer, but some software, such as EMC's Replication Manager, still requires one. I succeeded by locating the device in /dev/disk/by-id and then running fdisk against it, as in this example:

...
# fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 23251.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (32-47619839, default 32): 128
Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839):
Using default value 47619839

Command (m for help): p

Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes
64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors
Units = sectors of 1 * 512 = 512 bytes

 Device Boot                                                         Start  End         Blocks     Id  System
/dev/disk/by-id/scsi-360000970000294900664533030344239p1             128    47619839    23809855+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once you are in fdisk, the commands are identical to single-pathed storage. Type "n" to create a new partition, "p" for a primary partition, and specify the start and end as needed. Type "p" to print the partition table, and if you are happy with it use "w" to write it. You might wonder why I added an offset and changed the unit ("u") - this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don't simply repeat this in your environment - check with the storage team first.

Before adding the partitions to ACFS0001 and ACFS0002 I had 107 partitions:


[root@node1 ~]# wc -l /proc/partitions
107 /proc/partitions

Let's check /proc/partitions again after the two fdisk commands have completed:


[root@node1 ~]# wc -l /proc/partitions
 107 /proc/partitions

When you check /dev/mapper, you still don't see the new partitions either - the naming convention is to append p<n> to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition and so on.

kpartx to the rescue! This superb utility reads the partition table of a device and can add (or remove) the corresponding partition mappings. Initially my setup was as follows:


[root@node1 ~]# ls -l /dev/mapper/ACFS*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002

Usually I would have rebooted the node at this stage, as I didn't know how to make the kernel pick up the new partition table. But with kpartx ("yum install kpartx" to install) this is no longer needed. Consider the example below:

[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001
ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0001
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0002
ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0002

[root@node1 ~]# ls -l /dev/mapper/ACFS000*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1

"kpartx -l" lists the partitions found in the partition table, and "kpartx -a" adds the corresponding mappings, as the example shows. No more need to reboot! However, as has been pointed out in the comments section (see below), kpartx doesn't add both paths, so you should run the partprobe command to add the missing ones:


[root@node1 ~]# partprobe
[root@node1 ~]# wc -l /proc/partitions
109 /proc/partitions

See how there are now 109 entries listed instead of just 107 from before - the two missing paths have been added (one for each device).

Add disks to ASM

With this done, you can add the disks to ASM. I personally like the intermediate step of creating an ASMLib disk on top of each new partition first - see the sketch below.
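Purely as a sketch - assuming ASMLib is installed and configured, and using the partition names from above - the stamping could look like this; create the disks on one node only and rescan on the others:

[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0001 /dev/mapper/ACFS0001p1
[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0002 /dev/mapper/ACFS0002p1
[root@node2 ~]# /usr/sbin/oracleasm scandisks
[root@node2 ~]# /usr/sbin/oracleasm listdisks

Once the ASMLib disks are visible on all nodes, connect to ASM as sysasm and add them to the disk group using the alter diskgroup command: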

SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';

Now just wait for the rebalance operation to complete.
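If you want to keep an eye on the rebalance, v$asm_operation reports the progress and an estimate of the remaining time; a quick check as sysasm could look like this:

SQL> select group_number, operation, state, power, est_minutes
  2  from v$asm_operation;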

Adding user equivalence for RAC the easy way

This is the first time I have set up a new 11.2.0.2 cluster with the automatic SSH setup. Until now, I ensured user equivalence by copying the ssh RSA and DSA keys manually to all cluster nodes. For two nodes that's not too bad, but recently someone on a mailing list I subscribe to asked a question about a 28 (!) node cluster. That's where the whole process gets a bit too labour intensive.

So setting up user equivalence using a script may be the solution. You can also use OUI to do the same, but I like to run “cluvfy stage -post hwos” to check everything is ok before even thinking about executing ./runInstaller.

Here's the output of a session; my two cluster nodes are acfsprodnode1 and acfsprodnode2 (yes, they are for 11.2 ACFS replication and encryption testing). I am using the grid user as the owner of Grid Infrastructure, and oracle to own the RDBMS binaries. Start by navigating to the location where you unzipped the Grid Infrastructure patch file, then change into the directory "sshsetup" and run the command:


[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh
Please specify a valid and existing cluster configuration file.
Either user name or host information is missing
Usage ./sshUserSetup.sh -user  [ -hosts "" | -hostfile  ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile  ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]

Next, execute the command. I opted for the noPromptPassphrase option, as I don't use a passphrase for the key.

[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts "acfsprdnode1 acfsprdnode2" -noPromptPassphrase
The output of this script is also logged into /tmp/sshUserSetup_2010-12-22-15-39-18.log
Hosts are acfsprdnode1 acfsprdnode2
user is grid
Platform:- Linux
Checking if the remote hosts are reachable
PING acfsprdnode1.localdomain (192.168.99.100) 56(84) bytes of data.
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=3 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=5 ttl=64 time=0.018 ms

--- acfsprdnode1.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.017/0.017/0.019/0.004 ms
PING acfsprdnode2.localdomain (192.168.99.101) 56(84) bytes of data.
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=1 ttl=64 time=0.331 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=3 ttl=64 time=0.324 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=4 ttl=64 time=0.256 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=5 ttl=64 time=0.257 ms

--- acfsprdnode2.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.109/0.255/0.331/0.081 ms
Remote host reachability check succeeded.
The following hosts are reachable: acfsprdnode1 acfsprdnode2.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost acfsprdnode1
numhosts 2
The script will setup SSH connectivity from the host acfsprdnode1 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host acfsprdnode1
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
yes

The user chose yes
User chose to skip passphrase related questions.
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host with empty passphrase
Generating public/private rsa key pair.
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
de:e3:66:fa:16:e8:6e:36:fd:c5:e3:77:75:07:9a:b0 grid@acfsprdnode1
Creating .ssh directory and setting permissions on remote host acfsprdnode1
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode1. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode1.
Warning: Permanently added 'acfsprdnode1,192.168.99.100' (RSA) to the list of known hosts.
grid@acfsprdnode1's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode1.
Creating .ssh directory and setting permissions on remote host acfsprdnode2
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode2. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode2.
Warning: Permanently added 'acfsprdnode2,192.168.99.101' (RSA) to the list of known hosts.
grid@acfsprdnode2's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode2.
Copying local host public key to the remote host acfsprdnode1
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode1.
grid@acfsprdnode1's password:
Done copying local host public key to the remote host acfsprdnode1
Copying local host public key to the remote host acfsprdnode2
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode2.
grid@acfsprdnode2's password:
Done copying local host public key to the remote host acfsprdnode2
cat: /home/grid/.ssh/known_hosts.tmp: No such file or directory
cat: /home/grid/.ssh/authorized_keys.tmp: No such file or directory
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user grid.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~grid or ~grid/.ssh on the remote host may not be owned by grid.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the /sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--acfsprdnode1:--
Running /usr/bin/ssh -x -l grid acfsprdnode1 date to verify SSH connectivity has been setup from local host to acfsprdnode1.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
--acfsprdnode2:--
Running /usr/bin/ssh -x -l grid acfsprdnode2 date to verify SSH connectivity has been setup from local host to acfsprdnode2.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
SSH verification complete.
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode1 hostname
acfsprdnode1
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode2 hostname
acfsprodnode2
[grid@acfsprdnode1 sshsetup]$

Nice! That's a lot of work taken away from me, and I can start running cluvfy now to fix problems before OUI warns me about shortcomings on my system.
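For reference, the check I run at this point is the post hardware/OS stage verification, using the staged runcluvfy.sh from the directory where the Grid Infrastructure software was unzipped (one level above sshsetup - adjust the path to your environment):

[grid@acfsprdnode1 grid]$ ./runcluvfy.sh stage -post hwos -n acfsprdnode1,acfsprdnode2 -verbose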

You should note that, per the above output, the script only distributes the local ssh keys to the remote hosts. In OUI's cluster node addition screen (6 of 16 in the advanced installation) you therefore still need to click the "SSH Connectivity" button and then "Setup", after providing username and password, to establish cluster-wide user equivalence.

Configuring device mapper multipath on OEL5 update 5

I have always wondered how to configure the device mapper multipath package for a Linux system. I knew how to do it in principle, but had never been involved in a configuration from the start. Today I got the chance to work on this. The system is used for a lab test and is not a production box (otherwise I probably wouldn't have been allowed on). Actually, it's part of a two node cluster.

So the first step is to find out which partitions are visible to the system. The Linux kernel presents this information in the /proc/partitions table, as in the following example:


[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1     152586 cciss/c0d0p1
 104     2   71497282 cciss/c0d0p2
 8     0       2880 sda
 8    16  190479360 sdb
 8    32   23809920 sdc
 8    48   23809920 sdd
 8    64   23809920 sde
 8    80   23809920 sdf
 8    96   23809920 sdg
 8   112    1048320 sdh
 8   128    1048320 sdi
 8   144    1048320 sdj
 8   160       2880 sdk
 8   176  190479360 sdl
 8   192   23809920 sdm
 8   208   23809920 sdn
 8   224   23809920 sdo
 8   240   23809920 sdp
 65     0   23809920 sdq
 65    16    1048320 sdr
 65    32    1048320 sds
 65    48    1048320 sdt
 253     0    5111808 dm-0
 253     1   25591808 dm-1
 253     2   10223616 dm-2
 253     3    1015808 dm-3
 253     4   16777216 dm-4
[root@node1 ~]#

With a keen eye you can see that sdk is the same size as sda, so this probably means we have two paths to each of sda through sdj. We'll confirm this later. The more HBAs and paths you have, the more partitions you are going to see. This is where the multipathing software comes into play: it abstracts from the physical paths and presents a single logical device, and it offers additional goodies such as path failover and limited load balancing.

Before proceeding I checked the status of the multipath daemon:

[root@node1 ~]# service multipathd status
multipathd is stopped
[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@node1 ~]# chkconfig multipathd on

As you can see it was not started, and would not have started after a reboot either - it was necessary to enable the service at boot time using the chkconfig command. This automatically creates links in /etc/rc.d/rcx.d to start and stop the service. As an additional benefit the command respects the dependencies the authors of the startup script have defined and creates the {K,S}xxmultipathd links accordingly.
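You can see the effect immediately; these checks are only meant as an illustration, the exact link names depend on the init script:

[root@node1 ~]# chkconfig --list multipathd
[root@node1 ~]# ls /etc/rc.d/rc3.d | grep -i multipath

After enabling the service there should be an S-link for multipathd in the runlevel directories chkconfig switched on.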

I next loaded the necessary modules - dm-multipath and dm-round-robin:

[root@node1 ~]# modprobe dm-multipath
[root@node1 ~]# modprobe dm-round-robin

With the multipathing preparation nearly done, I need to get the WWIDs of all attached devices. At some point the WWIDs start repeating - that is where the second set of paths begins, and where you stop creating new meta devices. Let's have a look at the output first. You have to change directory to /sys, as the scsi_id commands are relative to it.

[node1 sys]# for i in `cat /proc/partitions | awk '{print $4}' |grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
### sda: 360000970000294900664533030303238
### sdb: 360000970000294900664533030344133
### sdc: 360000970000294900664533030344142
### sdd: 360000970000294900664533030344143
### sde: 360000970000294900664533030344144
### sdf: 360000970000294900664533030344239
### sdg: 360000970000294900664533030344241
### sdh: 360000970000294900664533030344244
### sdi: 360000970000294900664533030344245
### sdj: 360000970000294900664533030344246
### sdk: 360000970000294900664533030303238
### sdl: 360000970000294900664533030344133
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
[node1 sys]#

Here you see again that sda and sdk have the same WWID. I like to assign alias names to the multipathed devices - that's going to make it easier to find out what they are used for. I now have to get the disk sizes and map them to their intended use.

Getting disk sizes:

[node1 sys]# fdisk -l 2>/dev/null | grep ^Disk
Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes        local
Disk /dev/sda: 2 MB, 2949120 bytes                ignore
Disk /dev/sdb: 195.0 GB, 195050864640 bytes
Disk /dev/sdc: 24.3 GB, 24381358080 bytes
Disk /dev/sdd: 24.3 GB, 24381358080 bytes
Disk /dev/sde: 24.3 GB, 24381358080 bytes
Disk /dev/sdf: 24.3 GB, 24381358080 bytes
Disk /dev/sdg: 24.3 GB, 24381358080 bytes
Disk /dev/sdh: 1073 MB, 1073479680 bytes
Disk /dev/sdi: 1073 MB, 1073479680 bytes
Disk /dev/sdj: 1073 MB, 1073479680 bytes
Disk /dev/sdk: 2 MB, 2949120 bytes                ignore
Disk /dev/sdl: 195.0 GB, 195050864640 bytes
Disk /dev/sdm: 24.3 GB, 24381358080 bytes
Disk /dev/sdn: 24.3 GB, 24381358080 bytes
Disk /dev/sdo: 24.3 GB, 24381358080 bytes
Disk /dev/sdp: 24.3 GB, 24381358080 bytes
Disk /dev/sdq: 24.3 GB, 24381358080 bytes
Disk /dev/sdr: 1073 MB, 1073479680 bytes
Disk /dev/sds: 1073 MB, 1073479680 bytes
Disk /dev/sdt: 1073 MB, 1073479680 bytes

The cleared, consolidated view on the storage:

### sdb: 360000970000294900664533030344133    195G
### sdc: 360000970000294900664533030344142    24.3G
### sdd: 360000970000294900664533030344143    24.3G   
### sde: 360000970000294900664533030344144    24.3G   
### sdf: 360000970000294900664533030344239    24.3G   
### sdg: 360000970000294900664533030344241    24.3G   
### sdh: 360000970000294900664533030344244    1G
### sdi: 360000970000294900664533030344245    1G
### sdj: 360000970000294900664533030344246    1G

### sdl: 360000970000294900664533030344133    repeat - second path
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246

Finally here's the mapping I will use:

  • sdb    DATA001
  • sdc    REDO001
  • sdd    FRA001
  • sde    FRA002
  • sdf    ACFS001
  • sdg    ACFS002
  • sdh, sdi, sdj    VOTINGOCR{1,2,3}

The mapping between WWID and alias happens in the /etc/multipath.conf file. The defaults section has been taken from MOS note 555603.1. The devnode_blacklist section has to be set up according to your storage configuration - in my case I ignore IDE devices and the internal RAID adapter.

[root@node1 ~]# cat /etc/multipath.conf
defaults {
 udev_dir                /dev
 polling_interval        10
 selector                "round-robin 0"
 path_grouping_policy    multibus
 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
 prio_callout            /bin/true
 path_checker            readsector0
 rr_min_io               100
 rr_weight               priorities
 failback                immediate
 no_path_retry           fail
 user_friendly_names     no
}

devnode_blacklist {
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z]"
 devnode "^cciss!c[0-9]d[0-9]*"
}

multipaths {
 multipath {
 wwid 360000970000294900664533030344133
 alias data001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344142
 alias redo001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344143
 alias fra001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344144
 alias fra002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344239
 alias acfs001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344241
 alias acfs002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344244
 alias votingocr001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344245
 alias votingocr002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344246
 alias votingocr003
 path_grouping_policy failover
 }
}

The mapping is really simple - for each device you use, create a "multipath" section and enter the WWID, an alias and a path grouping policy. Done! See if that worked by starting the multipath daemon:

[root@node1 ~]# service multipathd start

As always, /var/log/messages is a good place to check:

Nov 16 16:34:58 loninengblc204 kernel: device-mapper: table: 253:5: multipath: error getting device
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: ioctl: error adding target to table
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: load table [0 5760 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:160 1000]
Nov 16 16:34:58 loninengblc204 multipathd: data001: load table [0 380958720 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:176 1000]
Nov 16 16:34:58 loninengblc204 multipathd: redo001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:192 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:48 1000 round-robin 0 1 1 8:208 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:64 1000 round-robin 0 1 1 8:224 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:240 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 1 1 65:0 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:112 1000 round-robin 0 1 1 65:16 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:128 1000 round-robin 0 1 1 65:32 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:144 1000 round-robin 0 1 1 65:48 1000]
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: data001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: redo001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: path checkers start u

Great - are all paths working?

[root@node1 ~]# multipath -ll | head
fra002 (360000970000294900664533030344144) dm-9 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:4 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:4 sdo 8:224 [active][ready]
fra001 (360000970000294900664533030344143) dm-8 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdn 8:208 [active][ready]
acfs002 (360000970000294900664533030344241) dm-11 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:6 sdq 65:0  [active][ready]

Congratulations - distribute the working multipath.conf to all cluster nodes and start multipathd.
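For a two node cluster that boils down to something like this (the node name is an assumption):

[root@node1 ~]# scp /etc/multipath.conf node2:/etc/multipath.conf
[root@node1 ~]# ssh node2 "chkconfig multipathd on && service multipathd start"
[root@node1 ~]# ssh node2 "multipath -ll | head"

The last command is just a sanity check that the aliases and paths look the same as on the first node.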

The beauty of this over a solution such as PowerPath is that the device names are consistent across the cluster. With PowerPath I have come across a situation where /dev/rdsk/emcpower1a on node1 was /dev/rdsk/emcpower4a on node2 and yet another device on the remaining nodes. Not really user friendly, but not a big issue with ASM either: it will read the information from the disk headers anyway. It was more of a problem before 11.2, when you had to use block devices to store the OCR and voting files.

Oracle RAC One Node revisited – 11.2.0.2

Since we published the RAC book, Oracle has released patchset 11.2.0.2. Amongst other things, this improved the RAC One Node option, exactly the way we expected.

How it was

A quick recap of the product as it was in 11.2.0.1: RAC One Node is part of Oracle Enterprise Edition; other software editions are explicitly not allowed, and a further restriction rules out third party clusterware. RAC One Node is a hybrid between full blown RAC and an active/passive cluster. The option uses Grid Infrastructure for cluster management and storage provisioning via ASM. The RAC One instance starts its life as a RAC database limited to a single cluster node. It only ever runs on one node, but that node can change. It is strongly recommended to create a service for that RAC database. Utilities such as raconeinit provide a text based command line interface to transform that database into a "RAC One Node" instance; in the process, the administrator can elect which nodes should be allowed to run the instance. The "omotion" utility allowed the DBA to move the RAC One Node instance from the current node to another one. Optionally, a time threshold could be set after which all ongoing transactions had to move to the new node; this feature required TAF or FAN to be set up correctly. The raconestatus utility allowed you to view the status of your RAC One Node instances, and conversion to full RAC was made possible by the racone2rac utility.

If you were after a Data Guard setup you’d be disappointed: that wasn’t (and AFAIK still is not) supported.

So all in all, that seemed a little premature. A patch to be downloaded and applied, no Data Guard and a new set of utilities are not really user friendly. Plus, initially this patch was available for Linux only. But at least a MOS note (which I didn’t find until after having finished writing this!) exists, RAC One — Changes in 11.2.0.2 [ID 1232802.1]

Changes

Instead of having to apply patch 9004119 to your environment, RAC One Node is available "out of the box" with 11.2.0.2. Sadly, the Oracle RAC One Node manual has not been updated, and searches on Metalink reveal no new information. One interesting detail: the RAC One Node patch is listed under the "undocumented Oracle Server" section.

The creation of a RAC One Node instance has been greatly simplified - dbca now supports it, both on the command line for silent installations and in the interactive GUI. Consider these options for dbca:

$ dbca -help
dbca  [-silent | -progressOnly | -customCreate] {  }  |
 { [ [options] ] -responseFile   }
 [-continueOnNonFatalErrors ]
Please refer to the manual for details.
You can enter one of the following command:

Create a database by specifying the following parameters:
-createDatabase
 -templateName 
 [-cloneTemplate]
 -gdbName 
 [-RACOneNode
 -RACOneNodeServiceName  ]
 [-policyManaged | -adminManaged ]
 [-createServerPool ]
 [-force ]
 -serverPoolName 
 -[cardinality ]
 [-sid ]
...

With RAC One Node you will most likely end up with a policy managed database; I can't see how an admin managed database would make sense.
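Just to illustrate what that could look like, here is an untested sketch of a silent dbca call pieced together from the help output above - the template name, storage options and passwords are placeholders you would have to adapt:

$ dbca -silent -createDatabase \
  -templateName General_Purpose.dbc \
  -gdbName rontest -sid rontest \
  -RACOneNode -RACOneNodeServiceName rontestsrv \
  -policyManaged -createServerPool -serverPoolName rontest -cardinality 1 \
  -storageType ASM -diskGroupName DATA \
  -sysPassword change_me -systemPassword change_me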

The srvctl command line tool has also been improved to deal with RAC One Node; the most important operations are add, remove, config and status. The nice thing about dbca is that it actually registers the database in the OCR. Immediately after the installation, you see this status information:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: INACTIVE

$ srvctl config database -d rontest
Database unique name: rontest
Database name:
Oracle home: /data/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/rontest/spfilerontest.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: rontest
Database instances:
Disk Groups: DATA
Mount point paths:
Services: rontestsrv
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: rontest
Candidate servers: node2,node3
Database is administrator managed

Note that the instance name, although the instance is administrator managed, changed to ${ORACLE_SID}_1 - rontest_1 in my case. Relocating now works with the srvctl relocate database command, as in this example:

$ srvctl relocate database -d rontest -n node2

You’ll get feedback about this in the output of the “status” command:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: ACTIVE
Source instance: rontest_1 on node2
Destination instance: rontest_2 on node3

After the command completed, check the status again:

srvctl status database -d rontest
Instance rontest_2 is running on node node2
Online relocation: INACTIVE

The important difference between an admin managed and a policy managed database is that you are responsible for the undo tablespaces. If you don't create and configure them, the relocate command will fail:

$ srvctl relocate database -d rontest -n node3
PRCD-1222 : Online relocation of database rontest failed but database was restored to its original state
PRCD-1129 : Failed to start instance rontest_2 for database rontest
PRCR-1064 : Failed to start resource ora.rontest.db on node node3
CRS-5017: The resource action "ora.rontest.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Process ID: 1587
Session ID: 35 Serial number: 1

CRS-2674: Start of 'ora.rontest.db' on 'node3' failed

In this case, the relocation failed and the database keeps running on the same node. Check the ORACLE_SID (rontest_2 in my case) and correct the initialisation parameter.

SQL> select tablespace_name from dba_data_files where tablespace_name like '%UNDO%';

TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2

So the tablespaces were there, but the initialisation parameter was wrong! Let's correct this:

SQL> alter system set undo_tablespace='UNDOTBS1' sid='rontest_1';

System altered.

SQL> alter system set undo_tablespace='UNDOTBS2' sid='rontest_2';

System altered.

Now the relocate will succeed.

To wrap this article up, the srvctl convert database command will convert between single instance, RAC One Node and RAC databases.
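From memory - do verify the exact options with srvctl convert database -h - converting my test database to full RAC and back would look roughly like this:

$ srvctl convert database -d rontest -c RAC -n node2
$ srvctl convert database -d rontest -c RACONENODE -w 30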

Installing RAC 11.2.0.2 on Solaris 10/09 x64

One of the major adventures this time of the year involves installing RAC 11.2.0.2 on Solaris 10 10/09 x86-64. The system setup included EMC Power Path 5.3 as the multipathing solution to shared storage.

I initially asked for four BL685 G6 with 24 cores, but in the end "only" got two - still plenty of resources to experiment with. I especially like the output of this command:

$ /usr/sbin/psrinfo | wc -l
 24

Nice! Actually, it’s 4 Opteron processors:

$ /usr/sbin/prtdiag | less
System Configuration: HP ProLiant BL685c G6
 BIOS Configuration: HP A17 12/09/2009
 BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
==== Processor Sockets ====================================
Version                          Location Tag
 -------------------------------- --------------------------
 Opteron                          Proc 1
 Opteron                          Proc 2
 Opteron                          Proc 3
 Opteron                          Proc 4

So much for the equipment. The operating system showed four NICs, all called bnxen, where n ranges from 0 to 3. The first interface, bnxe0, will be used for the public network. The second NIC is to be ignored, and the final two, bnxe2 and bnxe3, will be used for the highly available cluster interconnect feature. This way I can avoid SFRAC, which inevitably would have meant a clustered Veritas file system instead of ASM.

One interesting point to note is that Oracle MOS document 1210883.1 specifies that the interfaces for the private interconnect are all on the same subnet. So node1 will use 192.168.0.1 for bnxe2 and 192.168.0.2 for bnxe3; similarly, node2 uses 192.168.0.3 for bnxe2 and 192.168.0.4 for bnxe3. The Oracle example is actually a bit more complicated than it needed to be, as it uses a /25 subnet mask, but ipcalc confirms that the addresses used are all well within the subnet:

 Address:   10.1.0.128            00001010.00000001.00000000.1 0000000
 Netmask:   255.255.255.128 = 25  11111111.11111111.11111111.1 0000000
 Wildcard:  0.0.0.127             00000000.00000000.00000000.0 1111111
 =>
 Network:   10.1.0.128/25         00001010.00000001.00000000.1 0000000 (Class A)
 Broadcast: 10.1.0.255            00001010.00000001.00000000.1 1111111
 HostMin:   10.1.0.129            00001010.00000001.00000000.1 0000001
 HostMax:   10.1.0.254            00001010.00000001.00000000.1 1111110
 Hosts/Net: 126                   (Private Internet)

This setup will have some interesting implications which I’ll describe a little later.

Part of the test was to find out how mature the port to Solaris on Intel was. So I decided to start off by installing Grid Infrastructure on node 1 first, and extend the cluster to node2 using the addNode.sh script in $ORACLE_HOME/oui/bin.

The installation uses 2 different accounts to store the Grid Infrastructure binaries separately from the RDBMS binaries. Operating system accounts are oragrid and oracle.

Oracle: uid=501(oracle) gid=30275(oinstall) groups=309(dba),2046(asmdba),2047(asmadmin)
OraGrid: uid=502(oragrid) gid=30275(oinstall) groups=309(dba),2046(asmdba),2047(asmadmin)

I started off by downloading files 1, 2 and 3 of patch 10098816 for my platform. The download ratio for this patch was 243 to 751 between x64 and SPARC, so not a massive uptake of this patchset for Solaris it would seem.

As the oragrid user I created user equivalence with RSA and DSA ssh keys. A little utility will now do this for you, but I'm old-school and created and exchanged the keys on the hosts myself - not too bad a task on only two nodes.

The next step was to find out about the shared storage. And that took me a little while, I freely admit: I hadn't used the EMC PowerPath multipathing software before and found it difficult to approach, mainly because of the lack of information about it. Or maybe I just didn't find it, but device-mapper-multipath, for instance, is easier to understand. Additionally, the fact that this was Solaris on Intel made it a little more complex. First I needed to know what the device names actually mean. As on Solaris SPARC, /dev/dsk lists the block devices and /dev/rdsk/ lists the raw devices, so that's where I'm heading. Next I checked the devices, emcpower0a to emcpower9a. In the course of the installation I found out how to deal with these. First of all, on Solaris Intel you have to create an fdisk partition on the LUN before it can be dealt with in the SPARC way. So for each device you would like to use, fdisk the emcpowerxp0 device, i.e.

# fdisk /dev/rdsk/emcpower0p0

If there is no partition yet, simply answer "y" to the question whether you want to use the whole disk for Solaris and exit fdisk. Otherwise, delete the existing partition (AFTER HAVING double/triple CHECKED THAT IT'S REALLY NOT NEEDED!) and create a new one of type "Solaris2". It didn't seem necessary to make it active.

Here’s a sample session:

bash-3.00# fdisk /dev/rdsk/emcpower0p0
No fdisk table exists. The default partition for the disk is:
a 100% "SOLARIS System" partition
Type "y" to accept the default partition,  otherwise type "n" to edit the partition table.
Y

Now let’s check the result:

bash-3.00# fdisk /dev/rdsk/emcpower0p0
Total disk size is 1023 cylinders
Cylinder size is 2048 (512 byte) blocks
Cylinders
Partition   Status    Type          Start   End   Length    %
=========   ======    ============  =====   ===   ======   ===
1       Active    Solaris2          1  1022    1022    100
SELECT ONE OF THE FOLLOWING:
1. Create a partition
2. Specify the active partition
3. Delete a partition
4. Change between Solaris and Solaris2 Partition IDs
5. Exit (update disk configuration and exit)
6. Cancel (exit without updating disk configuration)
Enter Selection: 6

bash-3.00#

This particular device will be used for my OCRVOTE disk group, which is why it's only 1G. The next step is identical to SPARC: start the format utility, enter the partition menu, change the fourth partition to use the whole disk (with an offset of 3 cylinders at the beginning of the slice) and label it. With that done, exit the format application.

This takes me back to the discussion of the emcpower device names. The letters [a-p] refer to the slices of the device, while p stands for a partition. /dev/emcpowernc, for example, is slice 2 of multipathed device n, in other words the whole disk. I usually create a slice 4, which translates to emcpowerne. After having completed the disk initialisation, I had to ensure that the devices I was working on were really shared. Unfortunately the emcpower devices are not named consistently across the cluster: what is emcpower0a on node1 turned out to be emcpower2a on the second node. How to check? The powermt tool to the rescue. Similar to "multipath -ll" on Linux, the powermt command shows the underlying disks which are aggregated under the emcpowern pseudo device. So I wanted to know if my device /dev/rdsk/emcpower0e was shared; what I was really interested in was the native device:

# powermt display dev=emcpower0a | awk \
 > '/c[0-9]t/ {print $3}'
 c1t50000974C00A611Cd6s0
 c2t50000974C00A6118d6s0

Well, does that exist on the other node?

# powermt display dev=all | /usr/sfw/bin/ggrep -B8  c1t50000974C00A611Cd6s0
Pseudo name=emcpower3a
Symmetrix ID=000294900664
Logical device ID=0468
state=alive; policy=SymmOpt; priority=0; queued-IOs=0;
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode    State   Q-IOs Errors
==============================================================================
3072 pci@39,0/pci1166,142@12/pci103c,1708@0/fp@0,0 c1t50000974C00A611Cd6s0 FA  8eA   active  alive       0      0

So yes, it was there. Cool! I checked the two other OCR/voting LUNs and they were shareable as well. The final piece was to change the ownership of the devices to oragrid:asmdba and the permissions to 0660.
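For completeness, that was nothing more exotic than the usual chown/chmod on the slice 4 raw devices, repeated on every node with its respective device names. Only emcpower0e is taken from this system, the other two names are made up for the example:

# chown oragrid:asmdba /dev/rdsk/emcpower0e /dev/rdsk/emcpower1e /dev/rdsk/emcpower2e
# chmod 0660 /dev/rdsk/emcpower0e /dev/rdsk/emcpower1e /dev/rdsk/emcpower2e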

Project settings

Another item to look at is the project settings for the grid owner and oracle. It's important to set projects correctly, otherwise the installation will fail when ASM is starting. All newly created users inherit the settings from the default project; unless the sysadmins have set the default project high enough, you will have to change them. Use "prctl -i project default" to check all the values for this project.

I usually create a project for the grid owner, oragrid, as well as for the oracle account. My settings are as follows for a maximum SGA size of around 20G:

projadd -c "Oracle Grid Infrastructure" 'user.oracle'
projmod -s -K "process.max-sem-nsems=(privileged,256,deny)" 'user.oracle'
projmod -s -K "project.max-shm-memory=(privileged,20GB,deny)" 'user.oracle'
projmod -s -K "project.max-shm-ids=(privileged,256,deny)" 'user.oracle'

Repeat this for the oragrid user, then log in as oragrid and check that the project is actually assigned:

# id -p oragrid
uid=223(oragrid) gid=30275(oinstall) projid=100(user.oragrid)

Installing Grid Infrastructure

Finally ready to start the installer! The Solaris installation isn't any different from Linux, except for the aforementioned fiddling with the raw devices.

The installation went smoothly; I ran orainstroot.sh and root.sh without any problem. If anything it was a bit slow, taking 10 minutes to complete root.sh on node1. You can tail the rootcrs_node1.log file in /data/oragrid/product/11.2.0.2/cfgtoollogs/crsconfig to see what's going on behind the scenes. This is certainly one of the biggest improvements over 10g and 11g Release 1.

Extending the cluster

The MOS document I alluded to earlier suggests, as I said, putting all the private NIC IP addresses into the same subnet. That isn't necessarily to the liking of cluvfy: the communication over bnxe3 fails on both hosts, as shown in this example. Tests executed from node1:

bash-3.00# ping 192.168.0.1
192.168.0.1 is alive
bash-3.00# ping 192.168.0.2
192.168.0.2 is alive
bash-3.00# ping 192.168.0.3
192.168.0.3 is alive
bash-3.00# ping 192.168.0.4
^C
192.168.0.4 is not replying

Tests executed on node 2

bash-3.00# ping 192.168.0.1
192.168.0.1 is alive
bash-3.00# ping 192.168.0.2
^C
bash-3.00# ping 192.168.0.3
192.168.0.3 is alive
bash-3.00# ping 192.168.0.4
192.168.0.4 is alive

I decided to ignore this for now, and sure enough, the cluster extension didn’t fail. As I’m not using GNS, the command to add the node was

$ ./addNode.sh -debug -logLevel finest "CLUSTER_NEW_NODES={loninengblc208}" \
 "CLUSTER_NEW_VIRTUAL_HOSTNAMES={loninengblc208-vip}"

This is actually a little more verbose than I needed, but it's always good to be prepared for an SR with Oracle.

However, the OUI command performs a pre-requisite check before the actual call to runInstaller, and that check repeatedly failed, complaining about connectivity on the bnxe3 network. Checking the contents of the addNode.sh script I found an environment variable, "$IGNORE_PREADDNODE_CHECKS", which can be set to "Y" to make the script skip the pre-requisite checks. With that set, the addNode operation succeeded, as sketched below.
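In other words, something along these lines - the variable name is taken from the script, the rest is the same call as before:

$ export IGNORE_PREADDNODE_CHECKS=Y
$ ./addNode.sh -debug -logLevel finest "CLUSTER_NEW_NODES={loninengblc208}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={loninengblc208-vip}"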

RDBMS installation

This is actually not really worth reporting - it's pretty much the same as on Linux. However, there is a small caveat specific to Solaris x86-64: many files in the Oracle inventory didn't have the correct permissions, and when launching runInstaller to install the binaries I was bombarded with complaints about file permissions.

For example, oraInstaller.properties had the wrong permissions on Solaris Intel:

# ls -l oraInstaller.properties
 -rw-r--r--   1 oragrid  oinstall     317 Nov  9 15:01 oraInstaller.properties

On Linux:

$ ls -l oraInstaller.properties
 -rw-rw---- 1 oragrid oinstall 345 Oct 21 12:44 oraInstaller.properties

There were a few more; I fixed them using these commands from within the inventory directory:

$ chmod 770 ContentsXML
$ chmod 660 install.platform
$ chmod 770 oui
$ chmod 660 ContentsXML/*
$ chmod 660 oui/*

Once the permissions were fixed the installation succeeded.

DBCA

Nothing to report here, it’s the same as for Linux.