Search

Top 60 Oracle Blogs

Recent comments

RAC Book

Pro Oracle Database 11g RAC on Linux is available for Kindle

Addmittedly I haven’t checked for a little while, but an email by my co-author Steve Show prompted me to go to the Amazon website and look it up.

And yes, it’s reality! Our book is now finally available as a kindle version, how great is that?!?

There isn’t really a lot more to say about this subject. I’ll wonder how many techies are intersted in the kindle version after the PDF has been out for quite a while. If you read this and decide to get the kindle version, could you please let me know how you liked it?  Personally I think the book is well suited for the Amazon reader as it’s mostly text which suits the device well.

Happy reading!

Summer Seminars

I am doing a couple of one day seminars with Oracle University, currently planned for Austria and Switzerland. They go by the title “Grid Intrastructure and Database High Availability Deep Dive”, and can be accessed via these links.

To save you from having to get the abstract, I copied it from the Oracle University website:

Providing a highly available database architecture fit for today’s fast changing requirements can be a complex task. Many technologies are available to provide resilience, each with its own advantages and possible disadvantages. This seminar begins with an overview of available HA technologies (hard and soft partitioning of servers, cold failover clusters, RAC and RAC One Node) and complementary tools and techniques to provide recovery from site failure (Data Guard or storage replication).

In the second part of the seminar, we look at Grid Infrastructure in great detail. Oracle Grid Infrastructure is the latest incarnation of the Clusterware HA framework which successfully powers every single 10g and 11g RAC installation. Despite its widespread implementation, many of its features are still not well understood by its users. We focus on Grid Infrastructure, what it is, what it does and how it can be put to best use, including the creation of an active/passive cold failover cluster for web and database resources. Special focus will be placed on the various storage options (Cluster File System, ASM, etc), the cluster interconnect and other implementation choices and on troubleshooting Grid Infrastructure. In the final part of the seminar, we explore Real Application Clusters and its various uses, from HA to scalability to consolidation. We discuss patching and workload management, coding for RAC and other techniques that will allow users to maximise the full potential of the package.

See you there if you are interested!

RAC One Node and Database Protection

An email from fellow Oak Table Member James Morle about RAC One Node and failover got me thinking about the capabilities of the product.

I have written about RON (Rac One Node) in earlier posts, but haven’t really explored what happens with session failover during a database relocation.

Overview

So to clarify what happens in these two scenarios I have developed a simple test. Taking a RON database + a service I modified both to suit my test needs. Connected to the service I performed a database relocation to see what happens. Next I killed the instance (I wasn’t able to reboot the node) t o simulate what happens when the node crashes.

Setup

The setup used an existing database, “RON”. It also had a service defined already, but that needed tweaking. The database was defined as follows:

$ srvctl config database -d RON
Database unique name: RON
Database name: RON
Oracle home: /u01/app/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/RON/spfileRON.ora
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: RON
Database instances:
Disk Groups: DATA
Mount point paths:
Services: RON_APP.example.com
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RON
Candidate servers: node1,node2
Database is administrator managed

The service was initially defined as follows:

srvctl config service -d RON
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

Some of these attributes require special attention, we have 3 categories to deal with: preferred instances for TAF, the TAF setup itself and the runtime load balancing (RLB) advisory.

Transparent Application Failover with “real” RAC only works if there are two preferred instances. As you can see the service has only one preferred instance. Not sure if that can be changed though…Let’s try:

srvctl modify service -s RON_APP -d RON -i RON_1,RON_2
PRKO-2007 : Invalid instance name: RON_2

I wasn’t surprised-RON is not a RAC database so it has only 1 active instance. When registering the RAC One Node database you don’t add instances as you would with a RAC database, instead you set the database type to RACONENODE (srvctl add database -d name -c RACONEONE … )

I recommend setting TAF properties on the service level-that way you don’t miss crucial parameters in your tnsnames.ora file. This is the preferred way of doing it at least since 11.2. Changing the service is straight forward:

srvctl modify service -d RON -s RON_APP.example.com \
> -P BASIC -e SESSION -m BASIC

This piece of code instructs the service to use a BASIC TAF policy, a failover type of SESSION and the BASIC failover method. These parameters were normally be configured in the CONNECT_DATA section of your TNSNames.ora file.

With these changes made, the service configuration has changed to the below:

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

I also wanted to change the defaults to a more suitable RLB configuration. Instead of a CLB goal of “LONG” I wanted to set it up for “SHORT”. And I needed emphasis on SERVICE_TIME as well (my intention was to run swingbench). The change and resulting service configuration are shown below:

$ srvctl modify service -d RON -s RON_APP.example.com \
> -j short -B service_time

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

My SCAN name was scan1.example.com, and I ensured that the local_listener was pointing to my none default port of 1821 and the remote_lister was using the EZConnect synatax (“scan1.example.com:1825). It is very important to set the local_listener parameter if you are not using the default port of 1521!

On my client system I defined a local TNS alias “RON” to connect  to the RON database. Note that it doesn’t use any TAF parameters.

C:\oracle\product\11.2.0\client_1\network\admin>tnsping ron

TNS Ping Utility for 32-bit Windows: Version 11.2.0.2.0 - Production on 15-FEB-2011 17:07:44

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = scan1.example.com)(PORT = 1825)
)) (CONNECT_DATA = (SERVICE_NAME = LRON_APP.uk.db.com)))
OK (10 msec)

Database Relocation

Oracle’s promise is that you don’t lose your session during a database relocation. Let’s see if that is actually true. Using my setup I connected to the RON database (sorry for the broken formatting!):

C:\oracle\product\11.2.0\client_1\network\admin>sqlplus martin/test@ron

SQL*Plus: Release 11.2.0.2.0 Production on Tue Feb 15 16:50:08 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
------------------------------ ---------- ------------- ---
MARTIN                         BASIC      SESSION       NO

OK, so TAF is working. At this time I started the relocate command:

[RON_1]oracle@node1.example.com $ srvctl relocate database -d RON -n node2 -w 1 -v
Configuration updated to two instances
Instance RON_2 started

I then repeatedly required my SQL statement above while the instance was relocating. The output is shown below.

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 1 node1.example.com:RON_1
 2 node2.example.com:RON_2

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

As you can see the configuration change to 2 instances is reflected in the output of my query to v$active_instances. You also see the number of sessions increasing, pay attention to the inst_id colum: a new session is created on the second instance.

In my other session I saw the relocation completing:

Services relocated
Waiting for 1 minutes for instance RON_1 to stop.....
Instance RON_1 stopped
Configuration updated to one instance

I required one more time to see what happened-would my session survive?

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-25408: can not safely replay call

SQL> /

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 2 MARTIN                         BASIC      SESSION       YES

SQL>
SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 2 node2.example.com:RON_2

Well it did survice. Note the ORA-25408 error. That’s expected, since I’m using the SQL*Plus client I don’t have the opportunity to trap the error and replay my OCI call. You should capture this SQLException in Java or your preferred development environment. I have provided an example in chapter 11 of Pro Oracle Database 11g RAC on Linux.

Node failure

I couldn’t see how a session would survive in the case of a node failure… I said this in my email to James:

> I cannot see how TAF or FCF work in case of a node failure. In my
> tests I did for the book TAF only worked if there was a service with
> at least 2 preferred instances. And FCF requires a FAN aware
> connection pool which is rare to find. RAC one however only has only 1
> active node (unless you relocate it).

But better test before jumping to conclusions!

I could only kill an instance rather than the server which would have been a better test. I assumed Clusterware would try to restart the failed instance on the same node a few times and then relocate the resource if the stat was not successful.

I knew the database was now running on node 2 so I killed the SMON process. That sure results in an instance crash.

oracle@node2.example.com $ ps -ef | grep RON | grep smon
oracle   14208     1  0 16:52 ?        00:00:00 ora_smon_RON_2
$ kill -9 14208

Just as you would expect, the session didn’s survive this (how could it? There is no active second instance!)

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15649
Session ID: 183 Serial number: 3

And just as expected, Clusterware restarted the failed instance the second it detected the failure. The node’s alert log showed this:

System State dumped to trace file /u01/app/oracle/product/admin/RON/admin/diag/rdbms/RON/RON_2/trace/RON_2_diag_14174.trc
ORA-1092 : opitsk aborting process
2011-02-15 16:58:29.255000 +00:00
ORA-1092 : opitsk aborting process
License high water mark = 6
2011-02-15 16:58:32.569000 +00:00
Instance terminated by PMON, pid = 14164
USER (ospid: 16101): terminating the instance
Instance terminated by USER, pid = 16101
2011-02-15 16:58:35.286000 +00:00
Starting ORACLE instance (normal)
2011-02-15 16:58:36.477000 +00:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2011-02-15 16:58:42.687000 +00:00
Private Interface ‘bond1:1′ configured from GPnP for use as a private interconnect.
[name='bond1:1',...
..., use=public/1]

Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0.2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is enabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production

Summary

That concludes my testing. RON works better than a “classic” active/passive cluster and allows sessions to stay connected when migrating the database to a different host. And it makes it easier to convert the database to a full RAC database (I have an example but it needs a bit of tidying up before posting). On the other hand, virtualisation technology has allowed us to do the same for quite some time now. Xen and OracleVM can relocate domUs, and vmotion is the commercial alternative. Whatever suits your needs.

Troubleshooting ora.net1.network on an 8 node cluster

It seems I am doing a lot of fixing broken stuff recently. So this time I have been asked to repair a broken 8 node RAC cluster on OEL 5.5 with Oracle RAC 11.2.0.2. The system has been moved into a different, more secure network, and its firewalls prevented all access to the machines except for ILO. Another way of “security through obscurity”. The new network didn’t allow any clients to connect to any of the 8 node RAC which means that it is actually quite expensive kit to sit idle. The cluster is not in production, it’s still being build to specification but this accessibility problem has been a holdup to the project for a little while now. Yesterday has been a breakthrough-the netops team found an error to their configuration and for the first time the hosts could be accessed via ssh. Unfortunately for me that access is possible via audited gateways using PowerBroker to which I don’t have access.An alternative was the ILO interface which has not yet been hardened to production standards. So after some discussion internally I was given the ILO access credentials. This is good and bad: good, because it was a thoroughly broken system, and bad because there is no copy and paste with a java based console. And if that wasn’t bad enough, I had to contend myself with 80×24 characters on the console (however in very big letters). I pretty much needed all of my 24″ screen to display it. But I digress.

When logging on, I found the following situation:

  • Only 1 out of 8 nodes had OHAS/CRSD started. The others were still down, a kernel upgrade has taken place, but the asmlib kernel module hasn’t been upgraded at the same time. The first node had the correct RPM installed and ASMLib has done its magic on this node
  • Clusterware’s lower stack was up. However the ora.net1.network and all resources depending on it (listener, scan, scan listener, etc) were down. Not a single byte went over the public interconnect. That was strange.

Running /sbin/ifconfig has been a dream on this machine – I saw all 3 SCAN IPs on it, and all 8 node virtual IP addresses. Plus it has 6 NICs for Oracle, bonded into pairs of 2. And this is exactly where the confusion starts. I found the following bonded interfaces defined:

  • bond0
  • bond1.251
  • bond0.212

It took a while to figure out why these interfaces were named as they were, but apparently the suffix is a VLAN name. It also filtered through that one of my colleagues has tried to replace the previously used bond0.212 with bond0 as the public interconnect. He was however not successful in doing so, leaving the cluster in the state it was in.

He used the following commands to update the public interface:

$ oifcfg getif
bond1.251  172.xxx.0  global  cluster_interconnect
bond0  10.2xxx8.0  global  public

He also changed the vip configuration, with the result shown here:

srvctl config vip -n node11
VIP exists: /node1-vip/10.2xx8.13/10.2xx8.0/255.255.255.0/bond0, hosting node node11

However

The VIP however remained unimpressed:

srvctl start vip -n node1
PRCR-1079 : Failed to start resource ora.node1.vip
CRS-2674: Start of 'ora.net1.network' on 'node1' failed
CRS-2632: There are no more servers to try to place resource 'ora.node1.vip' on that would satisfy its placement policy

That’s where I have been asked to cast a keen eye over the installation.

The Investigation

First of all I could find nothing wrong with what has been done so far. So starting my investigation I first thought there was something wrong with the public network so I decided to shut it down:

# ifdown bond0

I then checked the network configuration of /etc/sysconfig/network-scripts. The setting is shown here:

ifcfg-bond0

device=bond0
bonding_opts="use_carrier=0 miimon=0 mode=1 arp_interval=10000 arp_ip_target=10.xxx.4 primary=eth0"
bootproto=none
onboot=yes
network=10.2xxx.0
netmask=255.255.254.0
ipaddr=10.xxx.2
userctl=no

ifcfg-eth0

device=eth0
hwaddr=f4:ce:46:87:fa:d0
bootproto=none
onboot=yes
master=bond0
slave=y
userctl=no

ifcfg-eth1

device=eth1
hwaddr=f4:ce:46:87:fa:d4
bootproto=none
onboot=yes
master=bond0
slave=yes
userctl=no

The MAC addresses of ifcfg-eth* matched the output from the ifconfig command. In the lab I occasionally have the problem that my configurartion files don’t match the real MAC addresses and therefore my NICs don’t come up. But this wasn’t the case here.

I then checked if the kernel module is loaded correctly. Usually you’d find that in /etc/modprobe.conf but there was not entry. I added these lines as per the documentation:

alias bond0 bonding
alias bond1 bonding
alias bond1.251 bonding

With that all done I brought the bond0 interface back up (don’t ever try to bring down the private interconnect-it will cause a node eviction!). Still nothing. The output of crsctl status resource -t remained “OFFLINE” for resource ora.net1.network. BTW, you cannot manually start that a network resource using srvctl (it’s an ora.* resource so don’t even think about trying crsctl start resource ora.net1.network :). All you can do with a network resource is to get its configuration (srvctl config network -k 1…) and modify it (srvctl modify network -k 1…)

ORAROOTAGENT is responsible for starting the network, and it will try to do so every second or so. That’s CRSD’s ORAROOTAGENT by the way, the log file is in $GRID_HOME/log/`hostname -s`/agent/crsd/orarootagent_root/orarootagent_root.log.

After the modification to bond0 I could now ping the IP associated with bond0 so at least that was a success. One thing I learned that day is that the MAC address of the bonded NIC matches the primary eth* interface’s NIC, in my case it was that of eth0, i.e. f4:ce:46:87:fa:d0. If one of the enslaved NICs failed it would probably assume the failback NIC’s MAC address. So in summary:

  • the network bonding was correctly configured
  • I could ping bond0

At this point I could see no reason why starting of the network failed. Maybe a typo in the configuration? The network configuration can be queried with 2 commands: oifcfg and servctl config network. So I tried oifcfg first.oifcfg getif returns:

bond0 10.xx.x2.0           "good"
bond0 10.xx.x8.0           "old/bad"
bind1.251 172.xx.xx.160    interconnect
bind1.251 169.254.0.0

Hmmm, where’s that second bond0 interface from? The bond1.251 interface is in use and working, the 172.xxx IP matches the IP address assigned in ifcfg-bon1.251. The second entry for bind1.251 is created by the HAIP resource and has to do with the high available cluster interconnect which uses multicasting for communication (to the frustration of many users who upgraded to 11.2.0.2 only to find out that the lower stack doesn’t start on the second and other nodes).

So to be sure that I was seeing something unusual I compared the output with another node on the cluster. There I found I only have 3 interfaces …. bond0 and bond1 + the UDP multicast address. I initially tried to remove the bad network with oifcfg delif but that didn’t work. I then verified the output of srvctl config network to see if it matched what I expected to. And here was a surprise: the output of the network listed a wrong subnet mask. Instead of 255.255.254.0 (note the “254″!) i found 255.255.255.0. That was easy to fix and while I was back again trying to delete the old network using oifcfg I suddenly realised that the cluster has sprung back into life. Small typo-big consequences! Finally all the resources depending on ora.net1.network were started, including SCAN VIPs, SCAN listeners, listeners, VIPs…

References for NIC bonding on RHEL5

Getting up and running with Universal Connection Pool

Oracle’s next generation connection pooling solution, Universal Connection Pool, can be a bit tricky to set up. This is especially true when a JNDI data source is to be used-most example don’t assume such a scenario. A lot of information is out there on the net, but no one seems to have given the full picture. During the research for chapter 11 of “Pro Oracle Database 11g RAC on Linux” I learned this the hard way. Since the book has been published, a few minor changes changes have been made to the software I used at the time, and those merit an update. Please note that this article’s emphasis is to get  this example running-it is by no means meant to be secure enough for a production release! You need to harden  the setup considerably for production, but it serves well for demonstration purposes (only).

THE SETUP

I have used a four node 11.2.0.2 RAC system as the source for my data. A 2 node cluster database with service “TAFTEST” runs on nodes 1 and 2. It’s administrator-managed and the service has both nodes set aside as “preferred” nodes. The database nodes run Oracle Enterprise Linux 5.564bit with RAC 11.2.0.2. For the sake of simplicity, I used my Windows laptop to host the Tomcat instance, which is now updated to version 6.0.30. I am using apache Ant to build the application. The current stable ant build is 1.8.2. My JDK is also upgraded to the latest and greatest, version 1.6.0_23. I am using the 32bit 11.2.0.2 client package to supply me with ons.jar, ojdbc6.jar and ucp.jar.ORACLE CLIENT

Part of this excercise is to demonstrate FCF and FAN events, which means we need an Oracle client for the remote ONS configuration (please refer to chapter 11 of the RAC book for a detailed description of local vs remote ONS configurations). I downloaded the 11.2.0.2 32bit client from support.oracle.com for Windows and installed it to c:\oracle\product\11.2.0\client_1, chosing the administrator option in Oracle Universal Installer.

TOMCAT

Start by downloading Tomcat for your platform-I have successfully tested Tomcat on Linux and Windows. I deployed apache-tomcat-6.0.30 to c:\ for this test. Once it’s unzipped, copy the necessary JAR files from the Oracle client installation into %TOMCAT_HOME%\lib. These are ojdbc6.jar, ons.jar and ucp.jar. Next, you should set a few environment variables. To keep things simple, I edited  %tomcat_home%\bin\startup.sh and added these:

  • set JAVA_HOME=c:\program files\Java\jdk1.6.0_23
  • set JAVA_OPTS=-Doracle.ons.oraclehome=c:\oracle\product\11.2.0\client_1

I’m also interested in the final content of %JAVA_OPTS%, so I modified catalina.bat as well and added this line into the section below line 164:

echo Using JAVA_OPTS: “%JAVA_OPTS%”

Finally we need to add a user with access to the tomcat manager application. Edit %tomcat_home%\conf\tomcat-users.xml and create and entry similar to this one:


This is one of the major differences-starting with tomcat 6.0.30, the previous “manager” role has been split into the ones shown above to protect from xref attacks. It took me a while to discover the reason for all the http 403 errors I got when I tried to deploy my application… You’d obviously use a strong password here!

This concludes the TOMCAT setup.

ANT

Ant is a build tool used for deployment and compiling the sample application I am going to adapt. Simply download the zip file (version 1.8.2 was current when I wrote this) and deploy it somewhere conveniently. Again, a few environment variables are helpful which I usually put into a file called sp.bat:

@echo off
set ant_home=c:\apache-ant-1.8.2
set path=%ant_home%\bin;%path%
set java_home=c:\program files\Java\jdk1.6.0_23
set path=%java_home%\bin;%path%

This makes building the application a lot easier. Just change into the application directory and enter sp to get set up.

BUILDING AN APPLICATION

My earliest contact with Tomcat was a long time ago with version 3 and since then I remember the well written documentation. The “docs” application, usually accessible as http://localhost:8080/docs has a section about the first application. It’s highly recommended to read it: http://localhost:8080/docs/appdev/index.html.

To get started, I copied the “sample” directory from %tomcat_home%\webapps\docs\appdev to c:\temp and started adapting it. First of all my sp.bat script is copied in there. With a command prompt I changed into sample and edited the build.xml file. The first major section to go over is starting in line 129. I changed it to read like this:

 
 
 
 
 
 
 
 
 
 
 
 

The properties to change/add are app.name, app.version, catalina.home, manager.username and manager.password. The manager.* properties will come in very handy later as they allow us to automatically deploy the compiled/changed application.

With all this done, try to compile the application as it is:

C:\TEMP\sample>ant compile
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

BUILD SUCCESSFUL
Total time: 0 seconds

C:\TEMP\sample>

That’s it-the application has been created. You can now start tomcat and try to deploy the app. Open a cmd window, change to %tomcat_home%\bin and execute startup.bat. This will start tomcat-the new window shows any problems it might have when deploying application. This window is very useful for example when it comes to troubleshoot incorrect XML configuration files.

Now the next step is to use the ant target “install” to test the installation of the web archive. I changed the install target slightly in the way that I depend on the “dist” target to be completed before deployment to the tomcat server. My modified working install target is defined as this in build.xml:

 

 
 

Try invoking it as shown in the example below:

C:\TEMP\sample>ant install
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=las
t; set to false for repeatable builds

javadoc:
 [javadoc] Generating Javadoc
 [javadoc] Javadoc execution
 [javadoc] Loading source files for package mypackage...
 [javadoc] Constructing Javadoc information...
 [javadoc] Standard Doclet version 1.6.0_23
 [javadoc] Building tree for all the packages and classes...
 [javadoc] Building index for all the packages and classes...
 [javadoc] Building index for all classes...

dist:

install:
 [deploy] OK - Deployed application at context path /ucp

BUILD SUCCESSFUL
Total time: 1 second

C:\TEMP\sample>

This operation succeeded. You should also see a line in your tomcat window:

INFO: Deploying web application archive ucp.war

When pointing your browser to http://localhost:8080/ucp/ you should be greeted by the familiar tomcat sample application.

ADDING THE DATASOURCE

So far this hasn’t been groundbreaking at all. It’s only now that it’s getting more interesting: the JNDI data source needs to be defined and used in our code. Instead of messing around with resources and res-ref configuration in the global %tomcat_home%\conf directory it is advisable to add the context to the application.

Back in directory c:\temp\sample create a new directory web\META-INF. Inside META-INF you create a file “context.xml” which takes the JNDI data source definition:


 

It really is that simple to implement-if it only were equally simple to find how to do this… The next step is to modify the Hello.java Servlet to reference the JNDI data source. The code for the servlet is shown below-it’s basically the existing servlet code amended with the JNDI and JDBC relevant code. It actually does very little: after looking the JDNI name up it grabs a session from the pool and checks which instance it is currently connected to. It then releases all resources and exits.


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package mypackage;

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.ResultSet;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
import javax.naming.*;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Enumeration;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public final class Hello extends HttpServlet {

public void doGet(HttpServletRequest request,
 HttpServletResponse response)
 throws IOException, ServletException {

 response.setContentType("text/html");
 PrintWriter writer = response.getWriter();

 writer.println("");
 writer.println("");
 writer.println("Sample Application Servlet Page");
 writer.println("");
 writer.println("");

 writer.println("");
 writer.println("");
 writer.println("");
 writer.println("");
 writer.println("");
 writer.println("
"); writer.println(""); writer.println(""); writer.println("

Sample Application Servlet

"); writer.println("This is the output of a servlet that is part of"); writer.println("the Hello, World application."); writer.println("
"); // this is the UCP specific part! writer.println("

UCP

"); try { Context ctx = new InitialContext(); Context envContext = (Context) ctx.lookup("java:/comp/env"); javax.sql.DataSource ds = (javax.sql.DataSource) envContext.lookup ("jdbc/UCPPool"); writer.println("Got the datasource"); Connection conn = ds.getConnection(); writer.println("

Connected to an Oracle intance

"); Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery("select 'Hello World from '||sys_context('userenv','instance_name') from dual"); while (rs.next()) { writer.println(rs.getString(1)); } rs.close(); stmt.close(); conn.close(); conn = null; } catch (Exception e) { writer.println("
" + e + "

");
}

writer.println("

");
writer.println("");
}
}

Done! Now let’s compile the code and deploy it to Tomcat before testing. The most common problem I get with the code is a JNDI error, stating the “jdbc/UCPPool” is not defined. This can happen in 2 cases:

  • You have a typo in the resource definition, in which case the context really doesn’t exist (it’s case sensitive)
  • You have non-compatible line breaks in the context.xml file. In this case I’d try having all the contents of the file in 1 line (use “J” in vi to join lines together)

VALIDATION

You should now see a number of sessions as user “scott” against the database-query gv$session for username “SCOTT” and you should see the result of your hard work.

5 out of 5 for Pro Oracle Database 11g RAC on Linux!

For a long time I didn’t dare look at Amazon reviews for “Pro Oracle Database 11g RAC on Linux”, the book Steve Shaw and I worked on for the greatest part of the last 18 months. Well I actually joined him in March this year to be fair.

So now Steve sent me an email to notify me of some great reviews on the Amazon USA website. How cool was this? Have a look at the screenshot I took:

I would have never thought of such great reviews.

  • one of the “must have” books on Oracle RAC 11g, November 13, 2010
  • Excellent Resource for RAC Admins, November 18, 2010
  • Just when you start to think that no one book can “have it all”, December 4, 2010
  • Another RAC Gem by Steve Shaw and Martin Bach., December 14, 2010

Thank you all so much for this and purchasing the book. The link to the book and the reviews (very thorough and worth reading!) is here:

http://www.amazon.com/Pro-Oracle-Database-11g-Linux/dp/1430229586/ref=cm_cr_pr_product_top

Oracle RAC One Node revisited – 11.2.0.2

Since we published the RAC book, Oracle has released patchset 11.2.0.2. Amongst other things, this improved the RAC One Node option, exactly the way we expected.

How it was

A quick recap on the product as it was in 11.2.0.1: RAC One Node is part of Oracle Enterprise Edition, any other software editions are explicitly not allowed. Another restriction exists for 3rd party Clusterware: it’s not allowed to use one. RAC One Node is a hybrid between full blown RAC and the active/passive cluster. The option uses Grid Infrastructure for cluster management and storage provisioning via ASM. The RAC One instance starts its life as a RAC database, limited to only one cluster node. It only ever runs on one node only, but that node can change. It is strongly recommended to create a service for that RAC database. Utilities such as raconeinit provide a text based command line interface to transform that database to a “RAC One Node”-instance. In the process, the administrator can elect which nodes should be allowed to run the instance. The “omotion” utilities allowed the DBA to move the RAC One Node instance from the current node to another one. Optionally a time threshold could be set after which all ongoing transactions were to move to the new node. This feature required TAF or FAN to be set up correctly. The raconestatus utility allowed you to view the status of your RAC One Node instances. Conversion to full RAC was made possible by the racone2rac utility.

If you were after a Data Guard setup you’d be disappointed: that wasn’t (and AFAIK still is not) supported.

So all in all, that seemed a little premature. A patch to be downloaded and applied, no Data Guard and a new set of utilities are not really user friendly. Plus, initially this patch was available for Linux only. But at least a MOS note (which I didn’t find until after having finished writing this!) exists, RAC One — Changes in 11.2.0.2 [ID 1232802.1]

Changes

Instead of having to apply patch 9004119 to your environment, RAC One Node is available “out of the box” with 11.2.0.2. Sadly, the Oracle RAC One Node manual has not been updated, and searches on Metalink reveal no new information. One interesting piece of information: the patch for RAC One Node is listed as “undocumented Oracle Server” section.

The creation of a RAC One Node instance has been greatly simplified-dbca has added support for it, both from the command line for silent installations as well as the interactive GUI. Consider these options for dbca.

$ dbca -help
dbca  [-silent | -progressOnly | -customCreate] {  }  |
 { [ [options] ] -responseFile   }
 [-continueOnNonFatalErrors ]
Please refer to the manual for details.
You can enter one of the following command:

Create a database by specifying the following parameters:
-createDatabase
 -templateName 
 [-cloneTemplate]
 -gdbName 
 [-RACOneNode
 -RACOneNodeServiceName  ]
 [-policyManaged | -adminManaged ]
 [-createServerPool ]
 [-force ]
 -serverPoolName 
 -[cardinality ]
 [-sid ]
...

With RAC One Node you will most likely end up with a policy managed database in the end, I can’t see how an admin managed database made sense.

The srvctl command line tool has been improved to deal with the RAC One node. The most important operations are to add, remove, config and status. The nice thing about dbca is that it actually registers the database in the OCR. Immediately after the installation, you see this status information:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: INACTIVE

$ srvctl config database -d rontest
Database unique name: rontest
Database name:
Oracle home: /data/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: + DATA/rontest/spfilerontest.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: rontest
Database instances:
Disk Groups: DATA
Mount point paths:
Services: rontestsrv
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: rontest
Candidate servers: node2,node3
Database is administrator managed

Note that the instance_name, although the instance is administrator managed, changed to $ORACLE_SID_1. Relocating works now with the srvctl relocate database command as in this example:

$ srvctl relocate database -d rontest -n node2

You’ll get feedback about this in the output of the “status” command:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: ACTIVE
Source instance: rontest_1 on node2
Destination instance: rontest_2 on node3

After the command completed, check the status again:

srvctl status database -d rontest
Instance rontest_2 is running on node node2
Online relocation: INACTIVE

The important difference between an admin managed database and a policy managed database is that you are responsible for undo tablespaces. If you don’t create and configure undo tablespaces, the relocate command will fail:

$ srvctl relocate database –d rontest -n node3                       <
PRCD-1222 : Online relocation of database rontest failed but database was restored to its original state
PRCD-1129 : Failed to start instance rontest_2 for database rontest
PRCR-1064 : Failed to start resource ora.rontest.db on node node3
CRS-5017: The resource action "ora.rontest.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Process ID: 1587
Session ID: 35 Serial number: 1

CRS-2674: Start of 'ora.rontest.db' on 'node3' failed

In this case, the database runs on the same node. Check the ORACLE_SID (rontest_2 in my case) and modify the initialisation parameter.

SQL> select tablespace_name from dba_data_files where tablespace_name like ‘%UNDO%’;

TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2

So the tablespace was there, but the initialisation parameter was wrong! Let’s correct this:

SQL> alter system set undo_tablespace='UNDOTBS1' sid='rontest_1';

System altered.

SQL> alter system set undo_tablespace='UNDOTBS2' sid='rontest_2';

System altered.

Now the relocate will succeed.

To wrap this article up, the srvctl convert database command will convert between single instance, RAC One Node and RAC databases.

Kindle version of Pro Oracle Database 11g RAC on Linux

I had a few questions from readers whether or not there was going to be a kindle version of Pro Oracle Database 11g RAC on Linux.

The good news for those waiting is: yes! But it might take a couple of weeks for it to be released.

I checked with Jonathan Gennick who expertly oversaw the whole project and he confirmed that Amazon have been contacted to provide a kindle version.

As soon as I hear more, I’ll post it here.

Pro Oracle Database 11g RAC on Linux is out

So it has finally happened, the day many of you who patiently waited for. Amazon and other publishers are fulfilling pre-orders! The book I helped publish, Pro Oracle Database 11g RAC on Linux is out in print.

I received my author copies last week, and was very happy to see that after the book was out in the USA first, the “pre-order now” recommendation has been removed from Amazon’s UK and German websites when I last checked.

I have to give great credit to all of those who helped me make this possible with extra input and pointers to related information. On the Linux part Steve has done an incredible job compiling all this useful information you don’t find in this form anywhere else.

From now on, feedback and new research with specific reference to the book will appear on my wordpress blog (the one you are reading now), in a separate category-”RAC Book”.

So thanks again for all your patience while we were working very hard to finish the book.