Top 60 Oracle Blogs

Recent comments

February 2011

DB Optimizer diagrams 100 tables

People ask me what the limit is to the number of tables  in a SQL statement  that DB Optimizer can handle when tuning.  Above is a tiny thumbnail for a 100 table SQL statemement. The SQL actually only has a select from one view  but that view expands to 100 different tables. Pretty awesome.

Scientific Method

Troubleshooting on an 8 node cluster

It seems I am doing a lot of fixing broken stuff recently. So this time I have been asked to repair a broken 8 node RAC cluster on OEL 5.5 with Oracle RAC The system has been moved into a different, more secure network, and its firewalls prevented all access to the machines except for ILO. Another way of “security through obscurity”. The new network didn’t allow any clients to connect to any of the 8 node RAC which means that it is actually quite expensive kit to sit idle. The cluster is not in production, it’s still being build to specification but this accessibility problem has been a holdup to the project for a little while now. Yesterday has been a breakthrough-the netops team found an error to their configuration and for the first time the hosts could be accessed via ssh. Unfortunately for me that access is possible via audited gateways using PowerBroker to which I don’t have access.An alternative was the ILO interface which has not yet been hardened to production standards. So after some discussion internally I was given the ILO access credentials. This is good and bad: good, because it was a thoroughly broken system, and bad because there is no copy and paste with a java based console. And if that wasn’t bad enough, I had to contend myself with 80×24 characters on the console (however in very big letters). I pretty much needed all of my 24″ screen to display it. But I digress.

When logging on, I found the following situation:

  • Only 1 out of 8 nodes had OHAS/CRSD started. The others were still down, a kernel upgrade has taken place, but the asmlib kernel module hasn’t been upgraded at the same time. The first node had the correct RPM installed and ASMLib has done its magic on this node
  • Clusterware’s lower stack was up. However the and all resources depending on it (listener, scan, scan listener, etc) were down. Not a single byte went over the public interconnect. That was strange.

Running /sbin/ifconfig has been a dream on this machine – I saw all 3 SCAN IPs on it, and all 8 node virtual IP addresses. Plus it has 6 NICs for Oracle, bonded into pairs of 2. And this is exactly where the confusion starts. I found the following bonded interfaces defined:

  • bond0
  • bond1.251
  • bond0.212

It took a while to figure out why these interfaces were named as they were, but apparently the suffix is a VLAN name. It also filtered through that one of my colleagues has tried to replace the previously used bond0.212 with bond0 as the public interconnect. He was however not successful in doing so, leaving the cluster in the state it was in.

He used the following commands to update the public interface:

$ oifcfg getif
bond1.251  global  cluster_interconnect
bond0  10.2xxx8.0  global  public

He also changed the vip configuration, with the result shown here:

srvctl config vip -n node11
VIP exists: /node1-vip/10.2xx8.13/10.2xx8.0/, hosting node node11


The VIP however remained unimpressed:

srvctl start vip -n node1
PRCR-1079 : Failed to start resource
CRS-2674: Start of '' on 'node1' failed
CRS-2632: There are no more servers to try to place resource '' on that would satisfy its placement policy

That’s where I have been asked to cast a keen eye over the installation.

The Investigation

First of all I could find nothing wrong with what has been done so far. So starting my investigation I first thought there was something wrong with the public network so I decided to shut it down:

# ifdown bond0

I then checked the network configuration of /etc/sysconfig/network-scripts. The setting is shown here:


bonding_opts="use_carrier=0 miimon=0 mode=1 arp_interval=10000 primary=eth0"





The MAC addresses of ifcfg-eth* matched the output from the ifconfig command. In the lab I occasionally have the problem that my configurartion files don’t match the real MAC addresses and therefore my NICs don’t come up. But this wasn’t the case here.

I then checked if the kernel module is loaded correctly. Usually you’d find that in /etc/modprobe.conf but there was not entry. I added these lines as per the documentation:

alias bond0 bonding
alias bond1 bonding
alias bond1.251 bonding

With that all done I brought the bond0 interface back up (don’t ever try to bring down the private interconnect-it will cause a node eviction!). Still nothing. The output of crsctl status resource -t remained “OFFLINE” for resource BTW, you cannot manually start that a network resource using srvctl (it’s an ora.* resource so don’t even think about trying crsctl start resource :). All you can do with a network resource is to get its configuration (srvctl config network -k 1…) and modify it (srvctl modify network -k 1…)

ORAROOTAGENT is responsible for starting the network, and it will try to do so every second or so. That’s CRSD’s ORAROOTAGENT by the way, the log file is in $GRID_HOME/log/`hostname -s`/agent/crsd/orarootagent_root/orarootagent_root.log.

After the modification to bond0 I could now ping the IP associated with bond0 so at least that was a success. One thing I learned that day is that the MAC address of the bonded NIC matches the primary eth* interface’s NIC, in my case it was that of eth0, i.e. f4:ce:46:87:fa:d0. If one of the enslaved NICs failed it would probably assume the failback NIC’s MAC address. So in summary:

  • the network bonding was correctly configured
  • I could ping bond0

At this point I could see no reason why starting of the network failed. Maybe a typo in the configuration? The network configuration can be queried with 2 commands: oifcfg and servctl config network. So I tried oifcfg first.oifcfg getif returns:

bond0 10.xx.x2.0           "good"
bond0 10.xx.x8.0           "old/bad"
bind1.251 172.xx.xx.160    interconnect

Hmmm, where’s that second bond0 interface from? The bond1.251 interface is in use and working, the IP matches the IP address assigned in ifcfg-bon1.251. The second entry for bind1.251 is created by the HAIP resource and has to do with the high available cluster interconnect which uses multicasting for communication (to the frustration of many users who upgraded to only to find out that the lower stack doesn’t start on the second and other nodes).

So to be sure that I was seeing something unusual I compared the output with another node on the cluster. There I found I only have 3 interfaces …. bond0 and bond1 + the UDP multicast address. I initially tried to remove the bad network with oifcfg delif but that didn’t work. I then verified the output of srvctl config network to see if it matched what I expected to. And here was a surprise: the output of the network listed a wrong subnet mask. Instead of (note the “254″!) i found That was easy to fix and while I was back again trying to delete the old network using oifcfg I suddenly realised that the cluster has sprung back into life. Small typo-big consequences! Finally all the resources depending on were started, including SCAN VIPs, SCAN listeners, listeners, VIPs…

References for NIC bonding on RHEL5

Where’s my money gone? Update…

Followers of the blog will know I’ve had a little trouble with Oracle Norway randomly taking money off me for no reason.

Today I got the money refunded, but there was a snag. I was refunded less than the amount that was taken. I’m guessing this has something to do with exchange rates etc. So as it stands I am about £40 out of pocket, which is significantly better than the several thousand pounds I was before.

As you can image, I sent an email off this morning asking for the missing money. Let’s see how quickly that is dealt with.



On the Topic of Copyright

February 11, 2011 (Updated February 25, 2011) On August 26, 2010, just before cancelling my first three month old order for the “Oracle Tuning: The Definitive Reference Second Edition” book, I wrote the following in a blog article (side note: the second order for the book is now more than three months old, and I have [...]

Nokia signs its own death warrant?

I’m really not sure what to think about the collaboration between Nokia and Microsoft. Prior to my recent switch to HTC, I’ve always used Nokia phones, so I have a soft spot for the company, but this recent announcement has me in two minds.

My first reaction was this move is a complete disaster for Nokia and a big bonus for Microsoft. Nokia ships a serious amount of phones, so Microsoft will quickly get some impressive numbers, which is great for them, but what do Nokia get out of it? Currently it seems they get a mobile platform that nobody really wants or cares about.

My second reaction was maybe this is the right move for Nokia. Symbian has a lot of the market share at the moment, but it is going down hill very quickly. They need to make a move, but where to go? If they go Android they will be just another manufacturer in the mix. They would be better off than they are now, but could they dominate this market? If they go Windows they could mark themselves out as the dominant force in this market. The other offerings in this space look rather weak. As Windows Phone develops, with Nokia’s help, maybe this could be a very attractive market.

Of course, only time will tell, but I know one thing. As the mobile OS market currently stands, I won’t be buying a Nokia phone running Windows Phone.



Direct NFS (DNFS) Clonedb…

A bit before Christmas I got an email from Kevin Closson asking me to take a look at a new undocumented Direct NFS (DNFS) feature in the patch set. I think he wanted to see what a regular DBA would think of it. What with Christmas and some family issues, I didn’t get too much done. As soon as I hit the first hurdle I kinda caved in and left if for the new year.

Well, January came and went, then I finally got round to looking at it again. I like to think my constant questions and dumb mistakes has helped to prepare Oracle for the sort of thing that will happen when other idiots like me are let loose on it. Anyway, the result of that little journey is documented here.

As I’ve said in the article, things are still in a state of flux and I will no doubt have to do some alterations once the My Oracle Support (MOS) Note 1210656.1 is released that will properly document it.

So as a regular DBA what do I think? I think it is awesome!

Old-style cloning of databases isn’t hard, but it’s boring and can take ages depending on the size of the database and storage being used. It’s one of those tasks that always makes me sigh, before I get off my ass and start it. Clonedb turns that on its head because it is really quick and simple. There is a bit of setup, but that is really going to be a one-time thing on most servers. You are doing your backups anyway, so there is no big deal there. Now you can just run a script and bang, you have a running clone.

I think this is going to please a lot of DBAs out there!




Warning – make sure you read to the end of this post.

Someone sent me an email this morning asking how Oracle calculates the index cardinality of an index range scan. Now, as I’ve often said, I don’t do private email about Oracle – unless it just happens to catch my attention and looks like something that is sufficiently generic to be worth publishing.

Today’s emailer was a lucky winner – he’d sent me a very short email that took about 30 seconds to read, contained a significant error, and (at first sight) probably had the right information in it for me to work with. Here’s the problem, as a cut-n-paste from an SQL*Plus session:

SQL> explain plan for select * from admlxa.QRT_BENCH where QRT_BENCH_DATE < :a3;

Elapsed: 00:00:00.01 

SQL> select * from table(dbms_xplan.display); 

Plan hash value: 2896103184 

| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                 |   424K|    81M|  3170   (4)| 00:00:05 |
|   1 |  TABLE ACCESS BY INDEX ROWID| QRT_BENCH       |   424K|    81M|  3170   (4)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN          | IDX_QRT_BENCH_1 | 78876 |       |   303   (5)| 00:00:01 |

Predicate Information (identified by operation id):
   2 - access("QRT_BENCH_DATE"<:A3) 

Look closely at the “Rows” column – there’s clearly a logic error appearing here. If you select 78,876 rowids from an index you can’t possibly acquire 424,000 rows from the table – so where have those two numbers come from ?

The supporting information looked like this:

num_rows for QRT_BENCH table = 8480798
num_distinct for QRT_BENCH_DATE column = 458 

num_rows for IDX_QRT_BENCH_1 = 8763975
distinct_keys for IDX_QRT_BENCH_1 = 537

Of course, I really needed to know whether this was a single-column or multi-column index if I wanted to model the problem correctly and do further checks on when the error appeared, but this was good enough to get started. We note, first, that the 424K for the table cardinality is just the standard “5% due to range predicate with unknown value”: round(0.05 * 8480798) = 424040.

    Step 1: since I don’t know where the number 78876 comes from, let’s try to work backwards – use it in a bit if arithmetic and see what drops out. Let’s try dividing it into the table cardinality: 424040 / 78876 = 5.3760

    Step 2: Does 5.376 look familiar — it may be a coincidence, but isn’t that really close to 1% of the number of distinct keys in the index ?

At this point I don’t have any time to investigate in detail, but a scratch hypothesis is that Oracle is calculating something like: 5 * (number of rows in table / (number of distinct keys in index)); and maybe that magic five appears through a piece of code that takes 5%, but for some reason then divides by the 1% associated with the selectivity normally associated with function(col).

If I had the time (and the data set) I’d start playing with dbms_stats.set_index_stats() to see the effect of changing num_distinct and num_rows to see if my initial guess was somewhere in the right ballpark. As it is I’ve just emailed this note back to the source.


Before I published this note I got a reply from the original correspondent, with the following comment:

Sure enough – five years ago I had published some details about exactly this “feature” – and that 537 vs. 5.376 really was just a coincidence.

I decided to publish the note anyway for three reasons –

    one: to make sure you realise that I do make mistakes
    two: to show you that simple games with numbers may give you a working hypothesis
    three: to remind you that once you’ve got a working hypothesis it’s often easy to think of ways to demonstrate that your hypothesis is wrong. (A couple of hacks of the statistics would have shown me a constant 0.009 appearing, rather than anything like 5% divided by 1%.)


I’ve been ignoring the “social network” phenomenon for a long time but I’ve finally given in and, as some people have already discovered, signed up to LinkedIn. I don’t know whether this is a good, bad, or pointless thing to do, but I thought I’d give it a go for a while and see what happens.

At present, though, I’m only linking to people who fall into one of several restrictive categories: I know them “outside” the internet, or have worked for them, or collaborated with them, or had a length technical exchange with them either by email or in person at a technical conference.

So if you send me a “Join my network” and I don’t do anything about it, please don’t be offended – it just means you don’t belong to a fairly restricted group. (On the other hand, since I’ve averaged about 3 days per site at 40 client sites per year for the last 10 years, it’s possible that you’ve been let down by my appalling memory for names.)

Oracle Support-final update to SR

Just had a really pleasent exchange with Oracle support. I was after a way to purge the repository database of an OEM 11.1 Grid Control installation without having to blow it all away. Unfortunately, there is no such option. However, what I liked was this final update from the support member:

Generic Note

From sunny Colorado – blue sky and SNOW! – I do wish we could have provided a better option.

But I do want to thank you so much for your kindness and patience. You are the best kind of customer to work with. That means a lot, in these challenging jobs.

Very best,

The whole SR was well and competently managed by Thom, and at no time did he come up with techniques to buy more time by asking for irrelevant log files or similar. I wish more support staff were like him.