
Oakies Blog Aggregator

ASM normal redundancy with high protection template

One of the ways you can control how many mirror copies ASM keeps on disk is via diskgroup templates.

This presents an interesting question -- can you turn your normal redundancy disk group into high redundancy by using a diskgroup template with the high protection attribute, thus telling ASM to use triple mirroring for the associated files?

I'll start by creating a diskgroup:

SQL> create diskgroup data normal redundancy
2 disk '/dev/sdb1', '/dev/sdc1', '/dev/sdd1';

Diskgroup created.

I'm not explicitly specifying any failgroups, so each disk will end up in its own failure group, and triple mirroring can spread extents across all three disks. Let's add a high protection template:

SQL> alter diskgroup data add template mirror_high attributes (high coarse);

Diskgroup altered.

Now let's create two tablespaces, one using the default mirroring template (two-way) and one using mirror_high (three-way) template:

SQL> create tablespace mirror_normal datafile '+data' size 256m;

Tablespace created.

SQL> create tablespace mirror_high datafile '+data(mirror_high)' size 256m;

Tablespace created.

We can confirm how many mirror extents our datafiles have:

SQL> select a.name,
  2         case LXN_KFFXP
  3           when 0 then 'primary'
  4           when 1 then 'secondary'
  5           when 2 then 'third'
  6         end, count(*)
  7    from X$KFFXP x, v$asm_alias a
  8   where a.file_number = x.NUMBER_KFFXP
  9   group by a.name, LXN_KFFXP
 10   order by 1, 2;

------------------------------ --------- ----------
MIRROR_HIGH.257.787940101 primary 258
MIRROR_HIGH.257.787940101 secondary 258
MIRROR_HIGH.257.787940101 third 258
MIRROR_NORMAL.256.787940093 primary 258
MIRROR_NORMAL.256.787940093 secondary 258
MIRROR_NORMAL.256.787940093 third 1

6 rows selected.

Indeed, all extents got triple mirrored for the MIRROR_HIGH tablespace and only double mirrored for MIRROR_NORMAL (with the exception of one extent, which probably holds some triple-mirrored metadata).
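As a quick aside on those counts: with ASM's default 1 MB allocation unit (an assumption here; check v$asm_diskgroup if unsure) a 256m datafile maps to 256 one-megabyte extents per mirror copy, so the 258 reported above is those 256 data extents plus a couple of extents of file metadata. A trivial sanity check:

```shell
# 256m datafile divided into 1 MB allocation units (assumed default AU size);
# ASM adds roughly 2 further metadata extents per copy, giving the 258 above.
echo $((256 * 1024 * 1024 / (1024 * 1024)))    # prints 256
```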

Now, because all data essentially got triple mirrored for our MIRROR_HIGH tablespace, it ought to survive a two-disk failure, right? Let's give it a try:

SQL> alter diskgroup data dismount;

Diskgroup altered.

... then I'm going to change the permissions and make two disks inaccessible to ASM:

[oracle@ora11gr2 ~]$ chmod 000 /dev/sdc1
[oracle@ora11gr2 ~]$ chmod 000 /dev/sdd1

Let's see if I could mount it back:

SQL> alter diskgroup data mount force;
alter diskgroup data mount force
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

...looks like the answer is no. ASM still sees that the remaining disk is missing its required partners and does not allow the disk group to be mounted. How many extent copies the datafile had becomes a moot point.

The conclusion is that triple-mirroring inside a normal redundancy disk group can help you in certain cases (like physical corruption of both primary and secondary extents so ASM can use a third one to read the data) but it's not a substitute for a high redundancy disk group.

Update, a little bit later: if anyone is thinking about doing it the other way around, remember that the mirroring attribute only works for normal redundancy disk groups.

Fault Injection Testing. Spurious Space Depletion? Sure, Why Not?

When file systems run out of space bad things happen. We like to investigate what those “bad things” are but to do so we have to create artificially small installation directories and run CPU-intensive programs to deplete the remaining space. There is a better way on modern Linux systems.

If you should find yourself performing Linux platform fault-injection testing you might care to add spurious free-space failures. The fallocate() routine immediately allocates the specified amount of file system space to an open file. It might be interesting to inject random space depletion in such areas as Oracle Clusterware (Grid Infrastructure) installation directories or application logging directories. Could a node ejection occur if all file system space immediately disappeared? What would that look like on the survivors? What happens if large swaths of space disappear and reappear? Be creative with your destructive tendencies and find out!



#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	long int sz;
	char *fname;
	int ret, fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s file new-size-in-gigabytes\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	fname = argv[1];
	sz    = atol(argv[2]);

	if ((fd = open(fname, O_RDWR | O_CREAT | O_EXCL, 0666)) == -1) {
		perror("open");
		exit(EXIT_FAILURE);
	}

	/* Allocate sz gigabytes of real file system space immediately. */
	if ((ret = fallocate(fd, 0, (loff_t)0, (loff_t)sz * 1024 * 1024 * 1024)) != 0) {
		perror("fallocate");
		unlink(fname);
	}

	close(fd);
	return ret;
}

# cc fast_alloc.c
# ./a.out
usage: ./a.out file new-size-in-gigabytes
# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sdc 2.7T 1.6T 1.2T 57% /data1
# time ./a.out bigfile 512
real 0m1.875s
user 0m0.000s
sys 0m0.730s
# du -h bigfile
513G bigfile
# rm -f bigfile
# ./a.out bigfile 512
# ls -l bigfile
-rw-r--r-- 1 root root 549755813888 Jul 1 09:48 bigfile
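As a sanity check, the byte count that ls -l reports for the file really is exactly 512 GiB:

```shell
# 512 GiB expressed in bytes; matches the 549755813888 shown by ls -l above
echo $((512 * 1024 * 1024 * 1024))    # prints 549755813888
```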

Filed under: oracle

Histograms – What is Wrong with this Quote?

July 4, 2012 It has been several months since I posted my review of the first half of the “Oracle Database 11gR2 Performance Tuning Cookbook” book.  I had planned to post the review for the second half of the book a couple of weeks later, however four months have passed by since that time, and the second [...]

Small Blog Anniversary…

I saw, while fiddling with this site, that a small anniversary was in order. Although I started blogging somewhere in 2004, as an alternative to keeping at bay hundreds of SQL and shell scripts, this site came to light around the 1st of April 2007, 5-plus years ago. Initially it

Read More…

OTN Tour of Latin America 2012: Travelling to Cali, Colombia…

The journey from Birmingham to Cali (Colombia) was very long!

It started with a quick flight across to Amsterdam and a 1.5 hour wait for my connection to Panama. The flight to Panama took about 11 hours. It wasn’t a bad flight, but it felt super-long because I knew it wasn’t the last flight of the day. I had a 4+ hour wait at Panama for the final flight to Cali.

Panama airport is pretty small and not really the place to hang around for 4 hours. Just about the only entertainment was watching the “junk in the trunk” that was on display everywhere. I don’t think I’ve ever seen anything like it before. It seemed like everywhere I looked was something that made Jennifer Lopez and Beyonce look like they had flat asses…

The final flight to Cali was over pretty quick. I landed and there was a guy waiting to take me to the hotel. I checked in at about midnight, and got to bed at about 01:00 local time, making it a very long and tiring day. Unfortunately, I woke at 04:00 and I’m typing this at 05:30. Today is the beginning of the jet-lag wars… I must remember to grab any opportunity to sleep, no matter how small.



OTN Tour of Latin America 2012: Travelling to Cali, Colombia… was first posted on July 3, 2012 at 11:29 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Oracle Enterprise Manager Cloud Control 12.1 – Agent Installation, Issues and Solutions

I'm currently busy for a client installing and configuring Oracle Enterprise Manager 12c for database administration and more across the globe. These environments were configured and set up by a different third party, so they don't always follow our wishes and administration guidelines. You can imagine that such environments are also neat environments regarding learning curves

Read More…

XMLTYPE datatype and ORA-22992

Anton has a small but informative post on LOB handling across database links and avoiding a possible ORA-22992. I was not sure if XMLType's CLOB-based storage or Binary XML (SecureFile) storage had the same limitations and/or needed possible workarounds. So the best way to be sure is of course just to

Read More…

Get a feel for enterprise block level replication using drbd

I didn’t really have a lot of exposure to block-level replication on the storage level before an engagement in the banking industry. I’m an Oracle DBA, and I always thought: why would I want to use anything but Oracle technology for replicating my data from one data centre to another? I need to be in control! I want to see what’s happening. Why would I prefer storage replication over Data Guard?

For a great many sites Data Guard is indeed all you need. Especially if you don’t have a storage array with a replication option. But many large enterprises have historically used large storage area networks with many enterprise features, including block level replication from array to array. They all come under their own name, and all major storage vendors have them. With the risk of speaking too generally, all of the block level replication allows you to somehow copy data from array A in data centre A to array B in data centre B. The data centres are usually geographically dispersed so as to avoid the impact of catastrophes. The storage replication happens without any DBA intervention or even visibility, harking back to the 90s mantra of “storage administrator does storage, system administrator does the OS and the database administrator works on the database”. I have written about this in the context of Exadata before.

Taking the responsibility for replication away from the DBA can sound attractive to DBA managers: if it all goes wrong (and that includes human error more than technical problems, at least with the technology I was working with) then it's not their fault. What this point of view misses, though, is that it is ultimately the DBA's responsibility to get the system back, regardless of how long that takes. And I'd rather use OEM or another monitoring solution and proactively prevent the problem before it happens. Nothing is worse than going to DR and then finding out that a LUN hasn't been replicated and the volume group cannot be mounted; time to restore a backup! But as Noons points out in the comment below, this comes down to how well you know and use your technology.

There are actually good arguments for the use of storage replication! As you will see below you can mirror an Oracle home and databases to remote hosts, but unlike with DRBD you do this on the storage array. The mapping of LUNs to a host is not as static in real life as shown here; DRBD really mirrors between hosts, not between storage arrays. So mirroring a Xen domain, for example, allows you to continuously store it on a different host and, if some simple prerequisites are met, start it up without too many problems. That could include the whole stack!

Why this post?

The reason for this post was simple: I want to experiment how storage replication works, especially when it comes to Oracle. And since I can’t buy myself a VMAX or HDS 9000 for use in the house I have to improvise. And since I am seriously in love with Linux, there is always a project at hand which makes such improvising possible. The project this time is DRBD:

The Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution mirroring the content of block devices (hard disks, partitions, logical volumes etc.) between hosts.

The architecture is nicely visualised on the project’s website:

As you can see it uses a kernel module to send data with destination “local disk” over the wire to a remote server where it is read and written to disk as well. Sounds like some enterprise software to me, at least in principle. And again, I wouldn’t use this in production for an Oracle workload.

What I want

I would like to test a common scenario where the Oracle binaries as well as the databases of a Linux host are replicated to a host in the DR data centre. For this purpose, I have two virtual machines running on Xen 4.1.2 with kernel 3.1.10-1.9-xen on my OpenSuSE 12.1 lab server. The domUs are Oracle Linux 6.2 with kernel 2.6.32-300.3.1.el6uek.x86_64, named ol62drbd1 and ol62drbd2.

Both domUs have two NICs: each has a public address and a storage network address (the primary node is ol62drbd1, the standby ol62drbd2). As per my standard build I have a 20G "LUN" for an Oracle volume group. To be a little bit more practical, the volume group is split into 2x10G logical volumes: orabin_lv and oradata_lv. If not replicated, they would be mounted to /u01/app and /u01/oradata respectively.
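For illustration, the volume layout described above could be carved out with LVM roughly as follows. This is a sketch, not taken from the original setup: the device name /dev/xvdb is an assumption, and the commands are echoed so the sequence can be reviewed without touching a system.

```shell
# Dry-run sketch (hypothetical device /dev/xvdb): one 20G PV split into
# two 10G LVs. Drop the echo prefixes to really run these as root.
echo pvcreate /dev/xvdb
echo vgcreate oraclevg /dev/xvdb
echo lvcreate -L 10G -n orabin_lv oraclevg
echo lvcreate -L 10G -n oradata_lv oraclevg
```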

Get DRBD to play with Oracle Linux and UEK1

To start with: the DRBD documentation is excellent. Sadly there was no RPM for Oracle Linux and kernel UEK, so I had to build the software from source (it seems that kernel UEK2 has experimental support for DRBD!). I say sadly, but of course it was not difficult. My system already had the Oracle preinstall RPM installed, which includes the main development tools as dependencies. The only additional RPMs I had to install were flex and kernel-uek-devel for the kernel headers.

I also had to ensure that the ifcfg-eth1 scripts on both domUs had the ONBOOT flag set to YES. For some reason Oracle Linux allows you to configure devices but doesn't automatically enable them when the system boots.

Following the excellent documentation I downloaded the source for drbd-8.4.1 to /usr/src and unpacked it. The configure script has a lot of options, which are well documented. In my case I went for these options:

$ cd /usr/src/drbd-8.4.1
$ ./configure --localstatedir=/var --sysconfdir=/etc --with-km=yes --with-heartbeat=no --with-pacemaker=no

Although drbd has been part of the mainline kernel since 2.6.33, that doesn't mean it is available in Oracle Linux with kernel UEK: that's why you have to create the kernel module using the "--with-km" option. I didn't need support for pacemaker or heartbeat, so those were left out of scope.

Create the userland utilities with “make && make install”, and the kernel module with a

$ cd drbd && make clean all

That's it: you've built DRBD. Obviously you'd repeat this on the other node. I tried building the RPM, but that failed since Oracle Linux is not a supported configuration in drbd…

Configuring the nodes

That’s actually quite simple, again thanks to the excellent documentation. What I learned was that you define just one resource! But I’m getting ahead of myself. The drbd configuration has to be identical on both nodes. The main configuration file is /etc/drbd.conf:

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

As you can see it references files in /etc/drbd.d, as is customary in Linux. The global_common.conf has literally been taken from the documentation:

global {
  usage-count yes;
}

common {
  net {
    protocol C;
  }
}

This very simple file in essence sets the replication mode to sync. The actual resource we want to replicate is defined in r0.res:

resource r0 {
  volume 0 {
    device    /dev/drbd1;
    disk      /dev/oraclevg/orabinlv;
    meta-disk internal;
  }
  volume 1 {
    device    /dev/drbd2;
    disk      /dev/oraclevg/oradatalv;
    meta-disk internal;
  }
  on ol62drbd1.localdomain {
  }
  on ol62drbd2.localdomain {
  }
}

Also quite readable: I want to replicate two volumes, orabin_lv and oradata_lv and map those to /dev/drbd1 and /dev/drbd2 respectively. Once the configuration is ready on the first node, scp it across to the second node.

Initialise the resource for first time use

Do this on both nodes!

[root@ol62drbd2 drbd.d]# drbdadm create-md r0
md_offset 10737414144
al_offset 10737381376
bm_offset 10737053696
Found some data
 ==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
md_offset 10733219840
al_offset 10733187072
bm_offset 10732859392
Found some data
 ==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.

The r0 in this command refers to the resource definition in /etc/drbd.d/r0.res, by the way. You really have to be sure that you don't have data on the underlying block devices! Since I had only just created my logical volumes, that was a given for me.

You can then “up” your resources on both cluster sides:

[root@ol62drbd1 drbd.d]#  drbdadm up r0

I obviously forgot to load the kernel module first, but that’s easily changed:

[root@ol62drbd1 drbd.d]# drbdadm up r0
Could not stat("/proc/drbd"): No such file or directory
do you need to load the module?
try: modprobe drbd
Command 'drbdsetup new-resource r0' terminated with exit code 20
drbdadm: new-minor r0: skipped due to earlier error
drbdadm: new-minor r0: skipped due to earlier error
[root@ol62drbd1 drbd.d]# modprobe drbd
[root@ol62drbd1 drbd.d]# echo $?
[root@ol62drbd1 drbd.d]# lsmod | grep drbd
drbd                  245440  0
libcrc32c               1220  1 drbd

Using the /proc/drbd file you can check the status:

[root@ol62drbd1 drbd.d]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@ol62drbd1.localdomain, 2012-06-29 12:47:26
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:10485404
 2: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:10481308
[root@ol62drbd1 drbd.d]#

The state is Inconsistent/Inconsistent at this stage, but that's expected. I now have to force the information about which side is the primary. This command must not be executed once the system is initialised, and only from one node!

[root@ol62drbd1 drbd.d]# drbdadm primary --force r0
[root@ol62drbd1 drbd.d]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@ol62drbd1.localdomain, 2012-06-29 12:47:26
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:244632 nr:0 dw:0 dr:250264 al:0 bm:14 lo:0 pe:4 ua:20 ap:0 ep:1 wo:f oos:10241692
        [....................] sync'ed:  2.4% (10000/10236)M
        finish: 0:06:59 speed: 24,368 (24,368) K/sec
 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:135936 nr:0 dw:0 dr:140696 al:0 bm:8 lo:0 pe:0 ua:16 ap:0 ep:1 wo:f oos:10345372
        [....................] sync'ed:  1.4% (10100/10232)M
        finish: 0:12:40 speed: 13,592 (13,592) K/sec
[root@ol62drbd1 drbd.d]#

All right, it’s syncing! Actually it’s copying empty tracks to empty tracks. There are ways to speed this up which I didn’t explore.

Eventually this process will finish, and the output is as shown:

[root@ol62drbd1 drbd.d]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@ol62drbd1.localdomain, 2012-06-29 12:47:26
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10485404 nr:0 dw:0 dr:10486088 al:0 bm:640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10481308 nr:0 dw:0 dr:10481992 al:0 bm:640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@ol62drbd1 drbd.d]#

Now we are up to date! Ready to create some data.

Create mount points

DRBD will create a few new block devices, /dev/drbd1 and /dev/drbd2 as defined in /etc/drbd.d/r0.res. It also creates symbolic links in /dev/drbd:

[root@ol62drbd1 drbd.d]# ls -lR /dev/drbd
total 0
drwxr-xr-x. 3 root root 60 Jun 30 22:16 by-disk
drwxr-xr-x. 3 root root 60 Jun 30 22:16 by-res
total 0
drwxr-xr-x. 2 root root 80 Jun 30 22:16 oraclevg
total 0
lrwxrwxrwx. 1 root root 14 Jun 30 22:17 orabinlv -> ../../../drbd1
lrwxrwxrwx. 1 root root 14 Jun 30 22:17 oradatalv -> ../../../drbd2
total 0
drwxr-xr-x. 2 root root 80 Jun 30 22:16 r0
total 0
lrwxrwxrwx. 1 root root 14 Jun 30 22:17 0 -> ../../../drbd1
lrwxrwxrwx. 1 root root 14 Jun 30 22:17 1 -> ../../../drbd2
[root@ol62drbd1 drbd.d]#

I initially wanted to create logical volumes on top of /dev/drbd1 and /dev/drbd2 but that was classified as “advanced” in the documentation so I simply didn’t.

Instead I created the file systems directly on /dev/drbd1 and /dev/drbd2 as ext3. I initially created /dev/drbd2 as xfs, but that caused crashes and kernel panics which I didn't investigate further. The updated /etc/fstab has the following new lines:

/dev/drbd1              /u01/app                ext3    defaults        1 0
/dev/drbd2              /u01/oradata            ext3    defaults        1 0

It is very important to set the sixth field to "0", as shown here. Otherwise you'd end up in system maintenance mode when booting, since the devices aren't ready for fsck!
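The field positions are easy to verify mechanically. The fragment below feeds the fstab lines from above through awk, which exits non-zero if any drbd entry has a non-zero sixth (fsck pass number) field:

```shell
# Fail if a drbd entry has an fsck pass number (field 6) other than 0
awk '$1 ~ /drbd/ && $6 != 0 { bad = 1 } END { exit bad }' <<'EOF' && echo "fsck pass fields OK"
/dev/drbd1              /u01/app                ext3    defaults        1 0
/dev/drbd2              /u01/oradata            ext3    defaults        1 0
EOF
```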

Add drbd service to boot

To ensure that the kernel module is loaded at boot you should add the /etc/init.d/drbd script to the boot sequence. Nowadays that's easy!

# chkconfig --add drbd
# chkconfig drbd on

Note: this might not be the best way to do this on Oracle Linux 6 since it uses upstart, but I didn't check whether chkconfig has been amended accordingly; possibly not.

Sadly, when your system reboots it hasn't preserved the status of the block devices, i.e. you have two secondary devices. You then have to manually assess the situation and promote one to primary. This is where pacemaker or heartbeat, or any other cluster manager, would step in.

The situation

The LUNs on my standby host are not writable, which again is quite similar to enterprise replication software:

[root@ol62drbd2 ~]# mount /dev/drbd1
mount: block device /dev/drbd1 is write-protected, mounting read-only
mount: Wrong medium type
[root@ol62drbd2 ~]#

I have since created a database and binaries on the primary node. The status is as follows:

[root@ol62drbd1 ~]# drbd-overview
  1:r0/0  Connected Primary/Secondary UpToDate/UpToDate C r----- /u01/app     ext3 9.9G 4.2G 5.2G 45%
  2:r0/1  Connected Primary/Secondary UpToDate/UpToDate C r----- /u01/oradata ext3 9.9G 1.8G 7.6G 20%

ol62drbd1 is the primary. Let's switch over! First, gently; after all, it's the first time I'm doing it:

[oracle@ol62drbd1 ~]$ sqlplus / as sysdba
SQL*Plus: Release Production on Mon Jul 2 12:04:47 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select host_name from v$instance;
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
[oracle@ol62drbd1 ~]$ lsnrctl stop listener_drbd
LSNRCTL for Linux: Version - Production on 02-JUL-2012 12:05:56
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ol62drbd1.localdomain)(PORT=1523)))
The command completed successfully
[oracle@ol62drbd1 ~]$

Nothing running on my mount points:

[root@ol62drbd1 ~]# fuser -m /u01/app /u01/oradata

Prepare the switchover:

[root@ol62drbd1 ~]# umount /dev/drbd1
[root@ol62drbd1 ~]# umount /dev/drbd2

And switch! The listing below includes information from both hosts:

[root@ol62drbd1 ~]# drbdadm secondary r0
# on the now primary site
[root@ol62drbd2 ~]# drbdadm primary r0
[root@ol62drbd2 ~]# drbd-overview
  1:r0/0  Connected Primary/Secondary UpToDate/UpToDate C r-----
  2:r0/1  Connected Primary/Secondary UpToDate/UpToDate C r-----
[root@ol62drbd2 ~]# mount /u01/app
[root@ol62drbd2 ~]# mount /u01/oradata/
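The whole manual switchover above condenses into a short sequence. A sketch, with the commands echoed so the order can be reviewed safely (remove the echo prefixes to execute for real, as root, on the host named in each comment):

```shell
# On ol62drbd1 (old primary): stop the database and listener first, then
echo umount /dev/drbd1
echo umount /dev/drbd2
echo drbdadm secondary r0
# On ol62drbd2 (new primary):
echo drbdadm primary r0
echo mount /u01/app
echo mount /u01/oradata
```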

Resume operations

Now it's time to start things up again. It turned out that I had to run /u01/app/oraInventory/ and /u01/app/oracle/product/ on my standby host before starting. I also needed a new listener.ora file with the new hostname; all things that ought to be addressed. Once the listener was up I started the database. Cool!

What to do next

This post is already too long, but there are things worth addressing:

  • The switchover isn't seamless: the new host has a new IP address, requiring a change to DNS or the listener configuration
  • The configuration is very basic and must be updated. With the current setup for example the drbd service will wait forever until all nodes are started.
  • Understand the volume resizing
  • and many more

Again, I would like to stress that such a configuration is not supported or suitable for most Oracle production databases. The above only serves as an example to make you understand what you can and cannot do with replication technologies.

Putting SLOB (The Silly Little Oracle Benchmark) To Use For Knowledge Sake!

This is just a short blog entry here to refer folks interested in SLOB to the following links:

About SLOB:  Introducing SLOB – The Silly Little Oracle Benchmark

Filed under: oracle