Who's online

There are currently 0 users and 30 guests online.

Recent comments


August 2011

Dual Boot: Near Disaster…

Today has been a little emotional…

I decided I wanted to dual boot a server with Fedora 15 and Oracle VM 3.0.1. The machine already had Fedora 15 on, so I moved some stuff around to free up a 1.5TB drive and decided to install OVM on that.

The installation went well, but I made the schoolboy error of overwriting the bootloader on the MBR. After the restart it was all completely toilet. :(  This incident left me checking my underwear because the Fedora installation contained a load of VMs, including RAC and Data Guard installations. None of them were that important, but they would be a pig to set up again.

To sort it I booted from the Fedora 15 DVD and did a repair, telling it to recreate the bootloader on the MBR. A few seconds later, normal service was resumed. Phew.

So then I retried the OVM installation on the clean drive, remembering not to place the bootloader on the MBR this time. The installation went fine again. I booted into Fedora 15, edited my “/etc/grub.conf” to include a chainloader entry. I rebooted and picked the chainloader line from my grub screen only to find I got “Error 21: Selected disk not found”.

After a bit of investiagtion it seems only my RAID1 system disk is visible to grub. The controller doesn’t seem to be able to present the individual disks, unless I kill the RAID1 config. Crappy controller.

So I now have two options:

  1. Buy another disk (or move this “spare” one) and put it into another machine with a sane controller and dual boot that.
  2. Buy another box that is dedicated to OVM.

I kind like option 2, since dual boot is a total pain. All my boxes have a purpose and having something missing while I play with OVM is not ideal. Trouble is it’s more money and it is a machine that is likely to be unused a lot of the time. Seems a bit of a waste.

If only VirtualBox exposed dmidecode to guests, then I could install OVM on a VirtualBox VM. I know it sounds stupid, but it would make functional testing a lot simpler, even if the performance were a little crappy. Never mind…




This post is by way of reply to Richard Foote’s latest quiz available here Richard wants to know how many columns a histogram will be collected on given certain conditions. I believe that the answer depends on the database version (so 9i will behave differently from 10.2 and above for example). For my tests I’m [...]

METHOD_OPT => SIZE AUTO Quiz (Automatic For The People)

OK, a nice, easy, simple one today. No tricks, honest You have a table with 3 columns and lets say 1M rows. Column 1 is effectively unique, so it has 1M distinct values. Let’s say 1 – 1000000, not that it really matters. Column 2 has 254 distinct values, all evenly distributed so it has [...]

MIN / MAX Quiz Answer (One Shot)

Not only are my regular blog readers a good deal better looking than the average person, but they’re quite a bit smarter as well As most people have correctly identified, the answer I was after to my previous Min/Max Quiz is that Option 1 is indeed the odd one out, as it’s the only option [...]

VirtualBox and Solaris 10 Guest Additions

Just a quick post really to save you some time-as it stands the guest additions for Solaris 10 are broken with Virtual Box 4.1.0. When I tried to install them in my Solaris 10 U9 guest, everything worked until the vboxsf module should have been loaded. The output from dmsg is as follows:

Aug 30 18:45:33 unknown genunix: [ID 819705 kern.notice] /usr/kernel/fs/amd64/vboxfs: undefined symbol
Aug 30 18:45:33 unknown genunix: [ID 819705 kern.notice] /usr/kernel/fs/amd64/vboxfs: undefined symbol
Aug 30 18:45:33 unknown genunix: [ID 819705 kern.notice] /usr/kernel/fs/amd64/vboxfs: undefined symbol
Aug 30 18:45:33 unknown genunix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'vboxfs'

I’m running Virtual Box 4.1 on Windows 7. When I tried to check for updates the application returned “you are already running the current version”. However, checking the Downloads section of the website I found 4.1.2-which I’m trying now.

The workaround for 4.1.0 is to install older guest additions, see for more detail.

Highchart and JSON fun

Anyone want to look through the following JSON and spot the problem?

{{“o_w_ms”:[[1313412960000 , 1.71 ],[1313413020000 , 2 ]], “o_w_sz”:[[1313412960000 , 0 ],[1313413020000 , 0 ]], “o_w_MBt”:[[1313412960000 , 0.01 ],[1313413020000 , 0.01 ]], “o_r_ms”:[[1313412960000 , 0.04 ],[1313413020000 , 0.36 ]], “o_r_sz”:[[1313412960000 , 0 ],[1313413020000 , 0 ]], “o_r_MBt”:[[1313412960000 , 0.05 ],[1313413020000 , 0.05 ]], “D_W_IO_ms”:[[1313412960000 , 0.17 ],[1313413020000 , 0.15 ]], “D_W_NFS_ms”:[[1313412960000 , 0.66 ],[1313413020000 , 0.62 ]], “D_W_NFS_sz”:[[1313412960000 , 12 ],[1313413020000 , 18 ]], “D_W_NFS_MB”:[[1313412960000 , 0.02 ],[1313413020000 , 0.03 ]], “D_W_IO_MB”:[[1313412960000 , 0.28 ],[1313413020000 , 0.33 ]], “D_R_NFS_ms”:[[1313412960000 , 0 ],[1313413020000 , 0 ]], “D_R_IO_ms”:[[1313412960000 , 0 ],[1313413020000 , 0.16 ]], “D_R_NFS_sz”:[[1313412960000 , 0 ],[1313413020000 , 0 ]], “D_R_NFS_MB”:[[1313412960000 , 0 ],[1313413020000 , 0 ]], “D_R_IO_MB”:[[1313412960000 , 0.02 ],[1313413020000 , 0 ]], “vdb_names”:["",""], “vdb_oscpu”:[0.06,0.11], “vdb_oracpu”:[0,0], “vdb_oracpuw”:[0.04,0.01], “vdb_read”:[0,0], “vdb_commit”:[0,0], “vdb_wait”:[0,0], “vdb_readskbt”:[0.02,0.2], “vdb_writeskbt”:[0.01,0.01], “vdb_o_readms”:[0.01,0.05], “vdb_o_writems”:[.007,10.705], “vdb_d_readms”:[0,0], “vdb_d_writems”:[616.99,616.99], “vdb_mxsum”:0.14}

This is also a simplified version of the original which was 30x bigger. I’ll give you one hint – all the braces, backets and quotes match up.  Luckily I found a neat little web site:

That I could copy and past the JSON into and voila there is the issue:

Parse error on line 202:
...writems": [        .007,        10.705

The  decimal  was missing a leading 0.  The issue of decimals and decimal places brings me to another issue with highcharts.  Highcharts is super cool  jquery web graphing package that I’m investigating for use in a  monitoring UI.

I did some prototyping with the example on and things would work just fine for a while, then all of a sudden the lines would disappear off the graph but the points would stay. Strange strange strange. I spent a day trying to debug . My debugging was a bit slow exacerbated by the fact that I know nothing about jquery. Well after a process of trial and error it turns out that if I sent numbers to my highcharts with more than 3 places after the decimal point then the lines would disappear. Not sure why, but after wrapping all my floats with

var = int( var * 100 )/ 100

everything runs flawlessly.

Being new to jquery, I found it useful have trace/debug output into the web page I was working on, but finding out how to debug was not obvious in jquery. Jquery,  unlike perl seems to lack practical resources on the web, or at least if there are practical resources they are obfuscated by a large amount of less practical pages.   Perl is quite different. With Perl if I type “perl debug” or “perl output” into google I get tons of great examples. If I type “jquery output” or “jquery debug” I get tons of  pages with content that is close to what I want but no dice. These no dice URLs  can cause me to spend a lot of my time trying to understand them. Most refer to using an alert for output which is a pain as every alert has to be acknowledged. Instead, as an easy solution, one can just create a region with the “div” tag as in


and in jquery you can output to this region with

$(“#debug”).append(“hello again”);

Using this debug technique  was nice as it enabled me to be able to check some of the values I was sending into the jquery code from the cgi which was quite useful since I was brand new to reading JSON with jquery and extracting variable values.


Just poking around the firebug data and found where I can see the JSON data I send the page, pretty cool:


Experimenting with Highcharts Stock Charts. I wanted a stacked area chart

Performance Tuning for Oracle Developers

One of my recent customers was a company with a somewhat large warehouse (around 60TB) on Oracle 10gR2.  The system was using RAC, though it was a fairly simple setup: two nodes, very large AIX LPARs, workload manually partitioned between them and somewhat evenly balanced.  The most important demand of their business is a large number of reports that must be generated every day from the warehouse.  These reports were beginning to take most of the day and consume a large amount of resources… and the current forecast is for dramatic data growth later this year.  So our project goal was to improve performance.


Here’s a quote that says it all:

Dr Joseph Lykken of Fermilab – in response to some (negative) results from the Large Hadron Collider that suggest the simplest form of SuperSymmetry is wrong:

“It [supersymmetry] is a beautiful idea. It explains dark matter, it explains the Higgs boson, it explains some aspects of cosmology; but that doesn’t mean it’s right.”

Mind you, Feynmann got there years ago:

“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.”

Setting Up Oracle Connection Manager

The Problem

It seems really simple. We have an Oracle database (on all nodes of a full rack Exadata, to be exact), which a lot of end-users connect to through apps designed in a rather adhoc and haphazard manner - on Excel spreadsheets, Access forms, TOAD reports and other assorted tools. We want to control the access from these machines and streamline them.

The database machine sits behind a firewall. To allow the adhoc tools accessing the database from the client machines mean we have to change the firewall rules. Had it been one or two clients, it would have been reasonable; but with 1000+ client machines, it becomes impractical. So I was asked to provide an alternative solution.

The Solution


Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

The modes and types give us some clues about what’s going on: TX/4 is typically about indexes involved in referential integrity (though there are a couple of more exotic reasons such as wait for ITLs, Freelists or tablespace status change); conversion of a TM lock from mode 3 to mode 5 is only possible (as far as I know) in the context of missing foreign key indexes when you delete a parent row.

Here’s a simple data set to help demonstrate the type of thing that could have caused this deadlock:

drop table child;
drop table parent;

create table parent (
	id		number(4),
	name		varchar2(10),
	constraint par_pk
		primary key (id)

create table child(
	id_p	number(4),
	id	number(4),
	name	varchar2(10),
	constraint chi_pk
		primary key (id, id_p),
	constraint chi_fk_par
		foreign key(id_p)
		references parent
		on delete cascade

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1, 1, 'Simon');
insert into child values(2, 1, 'Janet');


Note that I have define the primary key on the child the “wrong way round”, so that the foreign key doesn’t have a supporting index. Note also that the foreign key constraint is defined as ‘on delete cascade’ – this isn’t a necessity, but it means I won’t have to delete child rows explicitly in my demo.

Now we take the following steps:

Session 1: delete from parent where id = 1;

This will delete the child row – temporarily taking a mode 4 (S) lock on the child table – then delete the parent row. Both tables will end up locked in mode 3.

Session 2: insert into child values (1,2,'Sally');

This will lock the parent table in mode 2, lock the child table in mode 3, then wait with a TX mode 4 for session 1 to commit or rollback. If session 1 commits it will raise Oracle error: “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”; if session 1 rolls back the insert will succeed.

Session 1: delete from parent where id = 2;

This will attempt to lock the child table in mode 4, find that there it already has the child locked in mode three (which is incompatible with mode 4) and therefore attempt to convert to mode 5 (SSX). This will make it queue, waiting for session 2 to commit.
Three seconds later session 2 (the first to start waiting) will timeout and report a deadlock with the follow deadlock graph:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-00015818-00000000        14     371    SX             17     368    SX   SSX
TX-0009000e-000054ae        17     368     X             14     371           S
session 371: DID 0001-000E-00000018	session 368: DID 0001-0011-00000005
session 368: DID 0001-0011-00000005	session 371: DID 0001-000E-00000018
Rows waited on:
Session 368: no row
Session 371: no row
Session 368:
  pid=17 serial=66 audsid=2251285 user: 52/TEST_USER
  O/S info: user: HP-LAPTOPV1\jonathan, term: HP-LAPTOPV1, ospid: 2300:3528, machine: WORKGROUP_JL\HP-LAPTOPV1
            program: sqlplus.exe
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  delete from parent where id = 2
End of information on OTHER waiting sessions.
Current SQL statement for this session:
insert into child values(1,2,'new')

You’ll notice that there are no rows waited for – session 1 isn’t waiting for a row it’s waiting for a table and session 2 isn’t waiting for a table row it’s waiting for an index entry.

Footnote: There are several variations on the theme of one session inserting child rows when the other session has deleted (or inserted) the parent. The uncommitted parent change is an easy cause of the TX/4; the delete with unindexed foreign key is a necessary cause of the SX -> SSX.