Top 60 Oracle Blogs

Adding Comments to SQL Statements Improves Performance?

January 15, 2011 While reading the “Pro Oracle SQL” book I learned something interesting.  Commenting your work can improve database performance.  You certainly are aware that thoroughly documenting what you do could prevent hours of headaches that might appear later when trying to investigate problems or add additional features to an existing procedure (I think that was [...]

An Axiomatic Approach to Algebra and Other Aspects of Life

Not many days pass that I don’t think a time or two about James R. Harkey. Mr. Harkey was my high school mathematics teacher. He taught me algebra, geometry, analytic geometry, trigonometry, and calculus. What I learned from Mr. Harkey influences—to this day—how I write, how I teach, how I plead a case, how I troubleshoot, .... These are the skills I’ve used to earn everything I own.

EHCC Mechanics – Proof that whole CUs are not decompressed

I saw an interesting post recently in which Greg Rahn talked about HCC mechanics. He claimed that an update to a record stored in HCC format does not require decompressing the whole Compression Unit (CU), which consists of several Oracle blocks. I’m assuming by this he meant that all the records contained in the CU did not get written back to storage in a non-HCC format due to a single record being updated. Greg then showed an example proving row migration occurred for an updated record. He didn’t show that the other records had not been decompressed, though. So since I was already working on an HCC chapter for the upcoming Apress Exadata book, I thought I would take time off from the book writing to post this (hopefully the editors will forgive me).

Here’s the recipe: basically we’ll update a single row, see that its rowid has changed, verify that we can still get to the record via its original rowid, and check whether the TABLE FETCH CONTINUED ROW statistic gets updated when we access the row via its original rowid, thus proving basic row migration (this is what Greg has already shown). Then we’ll look at block dumps of the original and new blocks to see what’s there.
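A quick note on the old_rowid() function used throughout the listings below: it isn’t a built-in. A minimal sketch of such a helper, built on the documented dbms_rowid package, might look like this (the author’s actual version may differ):

```sql
create or replace function old_rowid (p_rowid in rowid) return varchar2 as
begin
        -- return the pre-Oracle-8 style fileno.blockno.slot notation
        return dbms_rowid.rowid_relative_fno(p_rowid) || '.' ||
               dbms_rowid.rowid_block_number(p_rowid) || '.' ||
               dbms_rowid.rowid_row_number(p_rowid);
end;
/
```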

-bash-3.2$ sqlplus / as sysdba
SQL*Plus: Release Production on Fri Jan 14 14:16:20 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SYS@SANDBOX1> select rowid, old_rowid(rowid) location from kso.skew_hcc3 where pk_col=16367;
ROWID              LOCATION
------------------ --------------------
AAATCBAAIAAF8uSFc9 8.1559442.22333
SYS@SANDBOX1> -- so my row is in file 8, block 1559442, slot 22333
SYS@SANDBOX1> update kso.skew_hcc3 set col1=col1 where pk_col=16367;
1 row updated.
SYS@SANDBOX1> select rowid, old_rowid(rowid) location from kso.skew_hcc3 where pk_col=16367;
ROWID              LOCATION
------------------ --------------------
AAATCBAAHAAMGMMAAA 7.3171084.0
SYS@SANDBOX1> -- Ha! The rowid has changed – the row moved to file 7, block 3171084, slot 0
SYS@SANDBOX1> -- Let's see if we can still get to it via the original rowid
SYS@SANDBOX1> select pk_col from kso.skew_hcc3 where rowid = 'AAATCBAAIAAF8uSFc9';
    PK_COL
----------
     16367
SYS@SANDBOX1> -- Yes we can! – can we use the new rowid?
SYS@SANDBOX1> select pk_col from kso.skew_hcc3 where rowid = 'AAATCBAAHAAMGMMAAA';
    PK_COL
----------
     16367
SYS@SANDBOX1> -- That works too! – It’s a migrated Row!
SYS@SANDBOX1> -- Let’s verify with “continued row” stat
SYS@SANDBOX1> @mystats
Enter value for name: table fetch continued row
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
table fetch continued row                                                         2947
SYS@SANDBOX1> -- select via the original rowid
SYS@SANDBOX1> select pk_col from kso.skew_hcc3 where rowid = 'AAATCBAAIAAF8uSFc9';
    PK_COL
----------
     16367
SYS@SANDBOX1> @mystats
Enter value for name: table fetch continued row
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
table fetch continued row                                                         2948
SYS@SANDBOX1> -- Stat is incremented – so definitely a migrated row!

So the row has definitely been migrated. Now let’s verify that the migrated row is not compressed. We can do this by dumping the block where the newly migrated record resides.

SYS@SANDBOX1> !cat dump_block.sql
alter system dump datafile &fileno block &blockno;
SYS@SANDBOX1> @dump_block
Enter value for fileno: 7
Enter value for blockno: 3171084
System altered.

Now let’s look at the trace file produced in the trace directory. Here is an excerpt from the block dump.

Block header dump:  0x01f0630c
 Object id on Block? Y
 seg/obj: 0x13081  csc: 0x01.1e0574d4  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x1f06300 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x002f.013.00000004  0x00eec383.01f2.44  ----    1  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x01f0630c
data_block_dump,data header at 0x2b849c81307c
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x2b849c81307c
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f60
tab 0, row 0, @0x1f60
tl: 32 fb: --H-FL-- lb: 0x1  cc: 5
col  0: [ 4]  c3 02 40 44
col  1: [ 2]  c1 02
col  2: [10]  61 73 64 64 73 61 64 61 73 64
col  3: [ 7]  78 6a 07 15 15 0b 32
col  4: [ 1]  59

Notice that there is only one row in the block (nrow=1). Also notice that the object_id is included in the block (in hex format), labeled “seg/obj:”. The table has 5 columns, whose values are displayed – also in hex format. Just to verify that we have the right block and row, we can translate the object_id and the value of the first column as follows:

SYS@SANDBOX1> !cat obj_by_hex.sql
col object_name for a30
select owner, object_name, object_type
from dba_objects
where object_id = to_number(replace('&hex_value','0x',''),'XXXXXX');
SYS@SANDBOX1> @obj_by_hex
Enter value for hex_value: 0x13081
OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
KSO                            SKEW_HCC3                      TABLE
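The hex-to-decimal conversion that obj_by_hex.sql performs can also be checked at the shell prompt; the seg/obj value 0x13081 from the dump is object_id 77953:

```shell
# convert the seg/obj value from the block dump to a decimal object_id
printf '%d\n' 0x13081
```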
SYS@SANDBOX1> desc kso.skew_hcc3
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 PK_COL                                 NUMBER
 COL1                                   NUMBER
 COL2                                   VARCHAR2(30)
 COL3                                   DATE
 COL4                                   VARCHAR2(1)
SYS@SANDBOX1> !cat display_raw.sql
col value for a50
select display_raw(replace('&string',' ',''),nvl('&TYPE','VARCHAR2')) Value from dual;
SYS@SANDBOX1> @display_raw
Enter value for string: c3 02 40 44
Enter value for type: NUMBER

VALUE
--------------------------------------------------
16367

As you can see, this is the record that we updated earlier in the SKEW_HCC3 table. Note: display_raw.sql depends on a little function called display_raw() which coincidentally I got from Greg Rahn. Here’s a script to create the function: create_display_raw.sql
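For the curious, display_raw’s NUMBER decoding can be sanity-checked by hand. A positive Oracle NUMBER is stored as an exponent byte (excess-193, counting base-100 digit positions) followed by base-100 digits, each stored +1. A quick sketch in shell for the positive-integer case only (display_raw handles the general case):

```shell
# decode the positive Oracle NUMBER c3 02 40 44 from the block dump
e=$(( 0xc3 - 193 ))                 # base-100 exponent: 2
v=0
for d in 0x02 0x40 0x44; do         # base-100 digits, each stored +1
  v=$(( v * 100 + d - 1 ))
done
n=3                                 # number of digit bytes present
# pad with trailing base-100 zeros if fewer digits than exponent+1
i=$n
while [ $i -le $e ]; do v=$(( v * 100 )); i=$(( i + 1 )); done
echo "$v"
```

The result, 16367, matches PK_COL of the row we updated.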

Now for a quick look back at the original block (note that in my testing I updated another row in this same block):

tab 0, row 1, @0x32
tl: 5086 fb: --H-F--N lb: 0x3  cc: 1
nrid:  0x0217cb93.0
col  0: [5074]
Compression level: 03 (Archive Low)
 Length of CU row: 5074
kdzhrh: ------PC CBLK: 1 Start Slot: 00
 NUMP: 01
 PNUM: 00 POFF: 5054 PRID: 0x0217cb93.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0x982dec03
CU total length: 11403
CU flags: NC-U-CRD-OP
ncols: 5
nrows: 32759
algo: 0
CU decomp length: 7266   len/value length: 945436
row pieces per row: 1
num deleted rows: 2
deleted rows: 22333, 30848,
 00 00 13 d2 1f 01 00 00 00 01 00 00 13 be 02 17 cb 93 00 00 00 4b 44 5a 30
 03 ec 2d 98 00 00 2c 8b eb 06 00 05 7f f7 00 0e 6d 1c 01 00 02 00 00 00 00

So this little excerpt shows that this is an HCC compressed block (Compression level: 03 (Archive Low), plus the many CU references). The nrows line shows us that the block contains 32759 rows. It also shows that 2 rows have been deleted from the block (num deleted rows). Notice that one of the deleted rows is the one in slot 22333 (sound familiar?). If you look back at the original rowid in the old format (fileno.blockno.slot) you’ll see that it is the row we updated. It was “deleted” from this block when it was migrated to the new block. Of course there is still a pointer left behind.

SYS@SANDBOX1> select old_rowid('AAATCBAAIAAF8uSFc9') location from dual;

LOCATION
--------------------
8.1559442.22333
SYS@SANDBOX1> select old_rowid(rowid) location, a.* from kso.skew_hcc3 a where rowid = 'AAATCBAAIAAF8uSFc9';
LOCATION                 PK_COL       COL1 COL2                           COL3      C
-------------------- ---------- ---------- ------------------------------ --------- -
7.3171084.0               16367          1 asddsadasd                     21-JUL-06 Y
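As an aside, the DATE in col 3 of the earlier block dump (bytes 78 6a 07 15 15 0b 32) decodes to this row’s COL3 value. Oracle stores a DATE as century+100, year-in-century+100, month, day, then hour, minute and second each +1, which is easy to verify in shell:

```shell
# decode the DATE bytes 78 6a 07 15 15 0b 32 from the block dump
cc=$((0x78)); yy=$((0x6a)); mm=$((0x07)); dd=$((0x15))
hh=$((0x15)); mi=$((0x0b)); ss=$((0x32))
printf '%04d-%02d-%02d %02d:%02d:%02d\n' \
  $(( (cc - 100) * 100 + yy - 100 )) "$mm" "$dd" \
  $(( hh - 1 )) $(( mi - 1 )) $(( ss - 1 ))
```

That prints a timestamp on 21 July 2006, agreeing with the 21-JUL-06 shown above.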

It’s hard to prove a negative, but it does not appear that any records are decompressed other than those that are actually updated. The other rows in the block appear to remain in HCC format.

Clustered ASM and RAC password file maintenance

A recurring question during the Grid Infrastructure and RAC courses I teach is: “How do you manage Oracle password files in a clustered environment?”. The answer isn’t as straightforward as you might think, because there are significant differences between ASM and RAC (i.e. clustered database) environments. Additionally, in recent releases changes were made concerning password file […]

Statspack on RAC

Some time ago I was on a client site which was busy setting up a RAC cluster using Oracle 10g. Although I wasn’t involved with that particular system there were a couple of coffee-breaks where I ended up chatting with the consultant that the client had hired to install the cluster. In one of our breaks he surprised me by making the casual comment: “Of course, you can’t run Statspack on RAC because it locks up the system.”

Now there’s no reason why Statspack should lock up a RAC cluster – in principle. But this was an 8-node cluster and if you set up the automatic job to take snapshots by running the default spauto.sql script all eight nodes could start running every hour on the hour – so they would all be fighting constantly for every block of every table and index in the Statspack schema.  (I’m exaggerating for effect, of course, but not by much). You might not notice the global cache contention in a 2-node cluster but eight nodes could, indeed, “lock up the system” at this point.

Ideally, of course, you would like to run all eight snapshots simultaneously, so that you get information from the same moment across all nodes. In practice you have to stagger the snapshots.

The code for taking snapshots for the AWR (automatic workload repository) gets around this problem by using the ‘WF’ lock for cross-instance synchronisation. One instance gets the lock exclusively, and that’s the instance that sets the snapshot time and coordinates the flushing to disc for all the other instances. (The other instances collect their stats at the same moment – but are cued to flush them to disc one after the other)

For Statspack you have to do something manually – and the simplest approach people sometimes take is to run a slightly different version of spauto.sql on each node, so that each node is scheduled to start its snapshot a couple of minutes after the previous one. But there is an alternative that is a little smarter and eliminates the risk of two instances overlapping because of time slippage or a slow snapshot.

Taking the idea from AWR we can make use of dbms_lock to ensure that each snapshot starts as soon as possible after the previous one. We simply create a wrapper procedure for Statspack that tries to take a specified user-defined lock in exclusive mode before calling the snapshot procedure, and releases the lock after the snapshot is complete. We can schedule this procedure to run on every node at the start of each hour – which means every node on the system will try to take the lock simultaneously – but only one node will get it and the rest will queue. After a node completes its snapshot and releases the lock, the next node in the queue immediately acquires the lock and starts its snapshot.

Here’s some sample code for the wrapper. I’ve included a couple of debug messages, and a piece of code that makes the procedure time out without taking a snapshot if it has to wait for more than 10 minutes. Note that you have to grant execute on dbms_lock to perfstat for this to work.

create or replace procedure rac_statspack (i_snap_level in number) as
        m_status        number(38);
        m_handle        varchar2(60);
begin
        sys.dbms_lock.allocate_unique(
                lockname        => 'Synchronize Statspack',
                lockhandle      => m_handle
        );

        m_status := sys.dbms_lock.request(
                lockhandle              => m_handle,
                lockmode                => dbms_lock.x_mode,
                timeout                 => 600,         -- default is dbms_lock.maxwait
                release_on_commit       => false        -- which is the default
        );

        if (m_status = 0) then
                dbms_output.put_line(
                        to_char(sysdate,'dd hh24:mi:ss') ||
                        ': Acquired lock, running statspack'
                );

                statspack.snap(i_snap_level => i_snap_level);

                dbms_output.put_line(
                        to_char(sysdate,'dd hh24:mi:ss') ||
                        ': Snapshot completed'
                );

                m_status := sys.dbms_lock.release(
                        lockhandle      => m_handle
                );
        else
                dbms_output.put_line(
                        to_char(sysdate,'dd hh24:mi:ss') ||
                        case m_status
                                when 1 then ': Lock wait timed out'
                                when 2 then ': deadlock detected'
                                when 3 then ': parameter error'
                                when 4 then ': already holding lock'
                                when 5 then ': illegal lock handle'
                                       else ': unknown error'
                        end
                );
        end if;
end;
/


And here’s a little bit of pl/sql you can run on each node in turn to install the procedure under dbms_job.

declare
	m_job	number;
	m_inst	number;
	m_date	date;
	m_jqs	number;
begin
	select	instance_number
	into	m_inst
	from	v$instance;

	dbms_job.submit(
		job		=> m_job,
		what		=> 'rac_statspack(7);',
		next_date	=> trunc(sysdate + 1 / 24,'HH'),
		interval	=> 'trunc(SYSDATE + 1 / 24,''HH'')',
		no_parse	=> TRUE,
		instance	=> m_inst,
		force		=> true
	);
	commit;

	select	next_date
	into	m_date
	from	dba_jobs
	where	job = m_job;

	select	to_number(value)
	into	m_jqs
	from	v$parameter
	where	name = 'job_queue_processes';

	dbms_output.put_line('Job number: ' || m_job);
	dbms_output.put_line('Next run time: ' || to_char(m_date,'dd-Mon-yyyy hh24:mi:ss'));
	dbms_output.put_line('Current Job Queues: ' || m_jqs || ' (must be greater than zero)');
end;
/


Warning: Judging by the date stamps on my files it’s at least 18 months since I last took this approach with a system – so (a) you might want to test it carefully before you use it and (b) you might want to modify the code to use dbms_scheduler to run the job rather than dbms_job.

[Further Reading on Statspack]

Adding storage dynamically to ASM on Linux

Note: this discussion is potentially relevant only to OEL 5.x and RHEL 5.x – I haven’t been able to verify that it works the same way on other Linux distributions, although I would assume so. Before starting with the article, here are some facts:

  • OEL/RHEL 5.5 64bit
  • Oracle
  • native multipathing: device-mapper-multipath

The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Having mocked “reboot-me” OSes like Windows, I soon went quiet when it came to discussing the addition of a LUN to ASM on Linux. Today I learned how to do this, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleasure to work with in 2009 and 2010. And since I have a short-lived memory I decided to write it down.

I’ll describe the process from top to bottom, from the addition of the LUN to the server all the way up to the addition of the ASM disk to the disk group.

Adding the storage to the cluster nodes

The first step is obviously to get the LUN assigned to the server(s). This is the easy part, and outside the control of the Linux/Oracle admin: the storage team will provision a new LUN to the hosts in question. At this stage, Linux has no idea about the new storage; to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way to do this in RHEL 5 is to issue this command:

[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done

The new, unpartitioned LUN will appear in /proc/partitions. If it doesn’t, then there’s probably something wrong on the SAN side: check /var/log/messages and talk to your storage administrator. If it’s not a misconfiguration then you may have no option but to reboot the node.

Configure Multipathing

So far so good; the next step is to add the device to the multipath configuration. First of all, you need to find out the WWID of the new device. In my case that’s simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask someone with a console to the storage array to confirm the WWID. It’s important to get this right at this stage :)

To add the new disk to the multipath.conf file, all you need to do is to add a new section, as in the following example:

multipaths {
  multipath {
    wwid 360000970000294900664533030344239
    alias ACFS0001
    path_grouping_policy failover
  }
}

By the way, I have written a more detailed post about configuring multipathing in a previous blog post here. Don’t forget to replicate the changes to the other cluster nodes!

Now reload multipathd using /etc/init.d/multipathd reload on each node, and voilà, you should see the device in /dev/mapper/ – my ACFS disk appeared as /dev/mapper/ACFS0001.

Now the tricky bit is to partition it (if you need to – partitioning is no longer mandatory with 11.1 and newer, but some software, such as EMC’s Replication Manager, still requires it). I succeeded by checking the device in /dev/disk/by-id and then running fdisk against it, as in this example:

# fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 23251.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
Partition number (1-4): 1
First sector (32-47619839, default 32): 128
Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839):
Using default value 47619839

Command (m for help): p

Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes
64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors
Units = sectors of 1 * 512 = 512 bytes

 Device Boot                                                         Start  End         Blocks     Id  System
/dev/disk/by-id/scsi-360000970000294900664533030344239p1             128    47619839    23809855+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once you are in fdisk, the commands are identical to those for single-pathed storage. Type “n” to create a new partition, “p” for a primary partition, and specify the start and end sectors as needed. Type “p” to print the partition table, and if you are happy with it use “w” to write it. You might wonder why I added an offset and changed the unit (“u”) – this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don’t simply repeat this in your environment – check with the storage team first.
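The arithmetic behind that offset is straightforward: with 512-byte sectors, starting the partition at sector 128 puts it on a 64 KiB boundary.

```shell
# 128 sectors * 512 bytes/sector = 65536 bytes = 64 KiB
echo $(( 128 * 512 ))
```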

Before adding the partitions to ACFS0001 and ACFS0002 I had 107 partitions:

[root@node1 ~]# wc -l /proc/partitions
107 /proc/partitions

The new partitions are recognised after the 2 fdisk commands completed:

[root@node1 ~]# wc -l /proc/partitions
 107 /proc/partitions

But when you check /dev/mapper now you still don’t see the partition – the naming convention is to append p<n> to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition, and so on.

kpartx to the rescue! This superb utility reads the partition table of a device and creates (or removes) the corresponding device-mapper entries. Initially my setup was as follows:

[root@node1 ~]# ls -l /dev/mapper/ACFS*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002

Usually I would have rebooted the node at this stage, as I didn’t know how to make the kernel pick up the new partition table. With kpartx (“yum install kpartx” to install) this is no longer needed. Consider the example below:

[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001
ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0001
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0002
ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0002

[root@node1 ~]# ls -l /dev/mapper/ACFS000*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1

“kpartx -l” lists the partitions of a device, and “kpartx -a” adds the corresponding mappings, as the example shows. No more need to reboot! However, as has been pointed out in the comments section (see below), kpartx doesn’t add both paths, so you should run the partprobe command to add the missing ones:

[root@node1 ~]# partprobe
[root@node1 ~]# wc -l /proc/partitions
109 /proc/partitions

See how there are 109 partitions listed now instead of just 107 before – the two missing paths have been added (one for each device).

Add disks to ASM

With this done, you can add the disk to ASM – I personally like the intermediate step of creating an ASMLib disk. Connect to ASM as sysasm and add the disks using the alter diskgroup command:

SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';

Now just wait for the rebalance operation to complete.
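You can watch the rebalance by querying the standard v$asm_operation view; it returns no rows once the operation has finished:

```sql
-- check rebalance progress; EST_MINUTES is a rough time-to-completion estimate
select operation, state, power, sofar, est_work, est_minutes
  from v$asm_operation;
```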

New Indexing Seminars Scheduled For Vienna and Tallinn (New Europeans)

I have two Oracle Index Internals and Best Practices seminars scheduled in the coming months, in Austria (Vienna) and Estonia (Tallinn). These could very well be my last seminars for quite a while as I’m unsure whether I’ll be able to travel again this year, so this could be your last opportunity to see me [...]

Technologies Supporting Exadata, Part 6: Advanced Compression

Table compression with Oracle Advanced Compression also has a major impact on Parallel Query performance. From the Oracle Advanced Compression White Paper:


Technologies Supporting Exadata, Part 5: Oracle Partitioning

From the Oracle Data Sheet: