Search

Top 60 Oracle Blogs

Recent comments

March 2011

Review: Oracle Application Express 4.0 Cookbook

I decided to add a twist to my usual blogs - a book review, which I have not done before. I have been reading a book - Oracle Apex 4.0 Cookbook by Marcel van der Plas and Michel van Zoest and published by Packt, a UK based publisher. Michel is one of the first Apex Certified Experts in the world - a no small feat. It has been technically reviewed by a well known cast - Oracle ACE Director Dimitri Gielis, who also won Oracle Magazine's Apex Developer of the Year in '09; Maarten van Luijtelaar and Oracle ACE and frequent blogger Surachart Opun.

ASSM ouch!

Here’s a nasty little surprise I got last week while investigating an oddity with stats collection. I wanted to create a table in an ASSM tablespace and populate it from two or three separate sessions simultaneously so that I could get some “sparseness” in the data load. So I created a table and ran up 17 concurrent sessions to insert a few rows each. Because I wanted to know where the rows were going I got every session to dump the bitmap space management block at the start of the segment – the results were surprising.

I was using 11.1.0.6, with a locally managed tablespace with 1MB uniform extents, 8KB blocks, and ASSM. Here’s the basic code:

create table t1 (
	n1	number,
	v1	varchar2(100)
)
tablespace test_8k_assm
;

--
--  17 sessions then synchronised to do the following
--

begin
	for i in 1..3 loop
		insert into t1 values(i, rpad('x',100));
		commit;
	end loop;
end;
/

alter system flush buffer_cache;
execute dump_seg('t1',1,'table',0)

(The dump_seg() procedure is just a simple bit of code I wrote to dump selected blocks from a named segment.)
The results I saw in the bitmap block were so bizarre that I then ran the following query:

select
	dbms_rowid.rowid_block_number(rowid) as block_id,
	count(*)
from
	t1
group by
	dbms_rowid.rowid_block_number(rowid)
order by
	block_id
;

Remember, I was using 8KB blocks and 1MB uniform extents – which means 128 blocks per extent – and I had started with a completely new, clean, tablespace. So how many blocks do you think I had allocated to the table ? Here are the results of my query:

  BLOCK_ID   COUNT(*)
---------- ----------
        40          3
        41          3
        43          3
       106          3
       110          3
       112          3
       165          3
       173          3
       236          3
       303          3
       433          3
       464          3
       541          3
       555          3
       563          3
       626          3
      1000          3

What you see here is a table which has managed to allocate several extents – despite the fact that the 17 sessions could have inserted their data into 17 separate blocks in the first extent of the table. The table had grown to nine extents – although it hadn’t even put data into all the extents – and according to dbms_stats.gather_table_stats() there were 1,088 blocks in the table !

This looks pretty disastrous for ASSM – but I do have to say that flushing the buffer cache (17 times) could have introduced an extreme pinning problem that resulted in this surprising result. When I removed the flush and dump lines from my test the “realistic” results showed 17 blocks used in the first extent – so don’t panic if you’re using ASSM in a highly concurrent system; but if you have a couple of tables that seem to be much bigger than expected, maybe you now know what to look for.

Implicit datatype conversion in the parsing phase – something new I learned today!

Wow, I wasn’t aware that Oracle can also do an implicit datatype conversion for literal strings during parsing phase!
SQL> @desc t Name Null? Type ------------------------------- -------- ---------------------------- 1 A NUMBER(38) SQL> select * from t where a = '1' || RPAD('0',5,'0'); no rows selected SQL> @x Display execution plan for last statement for this session from library cache...

Implicit datatype conversion in the parsing phase – something new I learned today!

Wow, I wasn’t aware that Oracle can also do an implicit datatype conversion for literal strings during parsing phase!

SQL> @desc t
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      A                                        NUMBER(38)

SQL> select * from t where a = '1' || RPAD('0',5,'0');

no rows selected

SQL> @x
Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
SQL_ID  d7r6md8wfu74d, child number 0
-------------------------------------
select * from t where a = '1' || RPAD('0',5,'0')

Plan hash value: 1601196873

--------------------------------------------------------
| Id  | Operation         | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT  |      |        |     2 (100)|
|*  1 |  TABLE ACCESS FULL| T    |      1 |     2   (0)|
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"=100000)

You see what happened? The expression ’1′ || RPAD(’0′,5,’0′) has been evaluated, which returns a string. And this string ’100000′ has been converted to a NUMBER 100000 during parsing phase .. otherwise you would see quotes around the number above with a TO_NUMBER() function around it (so that Oracle could compare the NUMBER column “A” to the same datatype)…

I add a TO_CHAR() around the column A just for demoing that a varchar datatype (as the original “literal” in my query is) will be shown with quotes like every normal string:

SQL> select * from t where to_char(a) = '1'||rpad('0',5,'0');

no rows selected

SQL> @x
Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
SQL_ID  7yf6j8fdyrvk7, child number 0
-------------------------------------
select * from t where to_char(a) = '1'||rpad('0',5,'0')

Plan hash value: 1601196873

--------------------------------------------------------
| Id  | Operation         | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT  |      |        |     2 (100)|
|*  1 |  TABLE ACCESS FULL| T    |      1 |     2   (0)|
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_CHAR("A")='100000')

Let’s see whether this trick is somehow done also for bind variables:

SQL> var x varchar2(10)
SQL> exec :x:= '1' || RPAD('0',5,'0');

PL/SQL procedure successfully completed.

SQL> print x

X
--------------------------------
100000

SQL> select * from t where a = :x;

no rows selected

SQL> @x
Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  45f39y7580bdp, child number 2
-------------------------------------
select * from t where a = :x

Plan hash value: 1601196873

--------------------------------------------------------
| Id  | Operation         | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT  |      |        |     2 (100)|
|*  1 |  TABLE ACCESS FULL| T    |      1 |     2   (0)|
--------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (VARCHAR2(30), CSID=873): '100000'

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"=TO_NUMBER(:X))

Apparently not! And this kind of makes sense – as if this string to number conversion is done during parse phase – Oracle doesn’t know what the actual value is yet (in the bind variable memory) so it can’t convert it to number in advance either :-)

This is a little interesting detail… I didn’t know that in addition to the implicit datatype conversion during query execution (using TO_CHAR, TO_NUMBER functions etc) Oracle can sometimes convert a string literal to number datatype under the hood during the parse time!

P.S. I tested this on Oracle 11.2.0.2 with optimizer_features_enable set from 11.2.0.2 to all the way back to 8.0.0 and the behavior was the same. I didn’t find any mention of this conversion in the CBO tracefile although after a filter pushdown transformation (FPD) the string literal was already shown as a number datatype. If anyone still has access to ancient Oracle database versions (like 9.2 and 10.1 ;-) then let me know whether you see the same results!

Share

Implicit datatype conversion in the parsing phase – something new I learned today!

Wow, I wasn’t aware that Oracle can also do an implicit datatype conversion for literal strings during parsing phase!
SQL> @desc t Name Null? Type ------------------------------- -------- ---------------------------- 1 A NUMBER(38) SQL> select * from t where a = '1' || RPAD('0',5,'0'); no rows selected SQL> @x Display execution plan for last statement for this session from library cache...

AIX note

This is just a temporary link to a note from Kurt Van Meerbeeck that should be of interest to anyone who’s using AIX and thinking of upgrading Oracle to 11.2 in the near future: Doubled memory footprint on AIX.

Troubleshooting Grid Infrastructure startup

This has been an interesting story today when one of my blades decided to reboot after an EXT3 journal error. The hard facts first:

  • Oracle Linux 5.5 with kernel 2.6.18-194.11.4.0.1.el5
  • Oracle 11.2.0.2 RAC
  • Bonded NICs for private and public networks
  • BL685-G6 with 128G RAM

First I noticed the node had problems when I tried to get all databases configured on the cluster. I got the dreaded “cannot communicate with the CRSD”

[oracle@node1.example.com] $ srvctl config database
PRCR-1119 : Failed to look up CRS resources of database type
PRCR-1115 : Failed to find entities of type resource that match filters (TYPE ==ora.database.type) and contain attributes DB_UNIQUE_NAME,ORACLE_HOME,VERSION
Cannot communicate with crsd

Not too great, especially since everything worked when I left yesterday. What could have gone wrong?An obvious reason for this could be a reboot, and fair enough, there has been one:

[grid@node1.example.com] $ uptime
09:09:22 up  2:40,  1 user,  load average: 1.47, 1.46, 1.42

The next step was to check if the local CRS stack was up, or better, to check what was down. Sometimes it’s only crsd which has a problem. In my case everything was down:

[grid@node1.example.com] $ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[grid@node1.example.com] $ crsctl check cluster -all
**************************************************************
node1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
node2,node3, node4, node5, node6, node7, node8

The CRS-4404 was slightly misleading, I assumed all cluster nodes were down after a clusterwide reboot. Sometimes a single node reboot triggers worse things. However, logging on to node 2 I saw that all but the first node were ok.

CRSD really needs CSSD to be up and running, and CSSD requires the OCR to be there. I wanted to know if the OCR was impacted in any way:

[grid@node1.example.com] $ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

Well it seemed that the OCR location was unavailable. I know that on this cluster, the OCR is stored on ASM. Common reasons for the PROC-26 error are

  • Unix admin upgrades the kernel but forgets to upgrade the ASMLib kernel module (common grief with ASMLib!)
  • Storage is not visible on the host, i.e. SAN connectivity broken/taken away (happens quite frequently with storage/sys admin unaware of ASM)
  • Permissions not set correctly on the block devices (not an issue when using asmlib)

I checked ASMLib and it reported a working status:

[oracle@node1.example.com] $ /etc/init.d/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes

That was promising, /dev/oracleasm/ was populated and the matching kernel modules loaded. /etc/init.d/oracleasm listdisks listed all my disks as well. Physical storage not accessible (PROC-26) seemed a bit unlikely now.

I could rule out permission problems since ASMLib was working fine, and I also rule out the kernel upgrade/missing libs problem by comparing the RPM with the kernel version: they matched. So maybe it’s storage related?

Why did the node go down?

Good question, usually to be asked towards the unix administration team. Luckily I have a good contact placed right inside that team and I could get the following excerpt from /var/log/messages arond the time of the crash (6:31 this morning):

Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192116, count = 1
Mar 17 06:26:06 node1 kernel: Aborting journal on device dm-2.
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192216, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 56 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192166, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192122, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192140, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 56 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192174, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 10 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_truncate: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_orphan_del: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_delete_inode: Journal has aborted
Mar 17 06:26:06 node1 kernel: __journal_remove_journal_head: freeing b_committed_data
Mar 17 06:26:06 node1 kernel: ext3_abort called.
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Mar 17 06:26:06 node1 kernel: Remounting filesystem read-only
Mar 17 06:26:06 node1 kernel: __journal_remove_journal_head: freeing b_committed_data
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Received SNMP packet(s) from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:41076
Mar 17 06:26:06 node1 snmpd[25651]: Received SNMP packet(s) from UDP: [127.0.0.1]:41076
Mar 17 06:26:09 node1 kernel: SysRq : Resetting
Mar 17 06:31:15 node1 syslogd 1.4.1: restart.

So it looks like a file system error triggered the reboot-I’m glad the box came back up ok on it’s own. The $GRID_HOME/log/hostname/alerthostname.log didn’t show anything specific to storage. Normally you would see that it starts counting a node down if it lost contact to the voting disks (in this case OCR and voting disks share the same diskgroup).

And why does Clusteware not start?

After some more investigation it seems there was no underlying problem with the storage, so I tried to manually start the cluster, traililng the ocssd.log file for possible clues.

[root@node1 ~]# crsctl start cluster
CRS-2672: Attempting to start ‘ora.cssd’ on ‘node1′
CRS-2674: Start of ‘ora.cssd’ on ‘node1′ failed
CRS-2679: Attempting to clean ‘ora.cssd’ on ‘node1′
CRS-2681: Clean of ‘ora.cssd’ on ‘node1′ succeeded
CRS-5804: Communication error with agent process
CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘node1′
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘node1′

… the command eventually failed. The ocssd.log file showed this:

...
2011-03-17 09:47:49.073: [GIPCHALO][1081923904] gipchaLowerProcessNode: no valid interfaces found to node for 10996354 ms, node 0x2aaab008a260 { host 'node4', haName 'CSS_lngdsu1-c1', srcLuid b04d4b7b-a7491097, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [61 : 61], createTime 10936224, flags 0x4 }
2011-03-17 09:47:49.084: [GIPCHALO][1081923904] gipchaLowerProcessNode: no valid interfaces found to node for 10996364 ms, node 0x2aaab008a630 { host 'node6', haName 'CSS_lngdsu1-c1', srcLuid b04d4b7b-2f6ece1c, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [61 : 61], createTime 10936224, flags 0x4 }
2011-03-17 09:47:49.113: [    CSSD][1113332032]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 30846440, LATS 10996434, lastSeqNo 30846437, uniqueness 1300108895, timestamp 1300355268/3605443434
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 3, node3, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355257, LATS 10996434, lastSeqNo 31355254, uniqueness 1300344405, timestamp 1300355268/10388584
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 4, node4, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31372473, LATS 10996434, lastSeqNo 31372470, uniqueness 1297097908, timestamp 1300355268/3605182454
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 5, node5, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31384686, LATS 10996434, lastSeqNo 31384683, uniqueness 1297098093, timestamp 1300355268/3604696294
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 6, node6, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31388819, LATS 10996434, lastSeqNo 31388816, uniqueness 1297098327, timestamp 1300355268/3604712934
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 7, node7, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 29612975, LATS 10996434, lastSeqNo 29612972, uniqueness 1297685443, timestamp 1300355268/3603054884
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 8, node8, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31203293, LATS 10996434, lastSeqNo 31203290, uniqueness 1297156000, timestamp 1300355268/3604855704
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 3, node33, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355258, LATS 10996434, lastSeqNo 31355255, uniqueness 1300344405, timestamp 1300355268/10388624
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 4, node4, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31372474, LATS 10996434, lastSeqNo 31372471, uniqueness 1297097908, timestamp 1300355268/3605182494
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 5, node5, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31384687, LATS 10996434, lastSeqNo 31384684, uniqueness 1297098093, timestamp 1300355268/3604696304
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 6, node6, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31388821, LATS 10996434, lastSeqNo 31388818, uniqueness 1297098327, timestamp 1300355268/3604713224
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 7, node7, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 29612977, LATS 10996434, lastSeqNo 29612974, uniqueness 1297685443, timestamp 1300355268/3603055224
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 30846441, LATS 10996474, lastSeqNo 30846438, uniqueness 1300108895, timestamp 1300355269/3605443654
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 3, node3, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355259, LATS 10996474, lastSeqNo 31355256, uniqueness 1300344405, timestamp 1300355268/10389264
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 8, node8, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31203294, LATS 10996474, lastSeqNo 31203291, uniqueness 1297156000, timestamp 1300355269/3604855914
2011-03-17 09:47:49.619: [    CSSD][1116485952]clssnmSendingThread: sending join msg to all nodes
...

The interesting bit is the “BUT NO NETWORK HB”, i.e. something must be wrong with the network configuration. I quickly checked the output of ifconfig and found a missing entry for my private interconnect. This is defined in the GPnP profile if you are unsure:


 
 
 

Now that’s a starting point! If tried to bring up bond1.251, but that failed:

[root@node1 network-scripts]# ifup bond1.251
ERROR: trying to add VLAN #251 to IF -:bond1:-  error: Invalid argument
ERROR: could not add vlan 251 as bond1.251 on dev bond1

The “invalid argument” didn’t mean too much to me, so I ran ifup with the “-x” flag to get more information about which argument was invalid:

[root@node1 network-scripts]# which ifup
/sbin/ifup
[root@node1 network-scripts]# view /sbin/ifup
# turned out it's a shell script! Let's run with debug output enabled
[root@node1 network-scripts]# bash -x /sbin/ifup bond1.251
+ unset WINDOW
...
+ MATCH='^(eth|hsi|bond)[0-9]+\.[0-9]{1,4}$'
+ [[ bond1.251 =~ ^(eth|hsi|bond)[0-9]+\.[0-9]{1,4}$ ]]
++ echo bond1.251
++ LC_ALL=C
++ sed 's/^[a-z0-9]*\.0*//'
+ VID=251
+ PHYSDEV=bond1
+ [[ bond1.251 =~ ^vlan[0-9]{1,4}? ]]
+ '[' -n 251 ']'
+ '[' '!' -d /proc/net/vlan ']'
+ test -z ''
+ VLAN_NAME_TYPE=DEV_PLUS_VID_NO_PAD
+ /sbin/vconfig set_name_type DEV_PLUS_VID_NO_PAD
+ is_available bond1
+ LC_ALL=
+ LANG=
+ ip -o link
+ grep -q bond1
+ '[' 0 = 1 ']'
+ return 0
+ check_device_down bond1
+ echo bond1
+ grep -q :
+ LC_ALL=C
+ ip -o link
+ grep -q 'bond1[:@].*,UP'
+ return 1
+ '[' '!' -f /proc/net/vlan/bond1.251 ']'
+ /sbin/vconfig add bond1 251
ERROR: trying to add VLAN #251 to IF -:bond1:-  error: Invalid argument
+ /usr/bin/logger -p daemon.info -t ifup 'ERROR: could not add vlan 251 as bond1.251 on dev bond1'
+ echo 'ERROR: could not add vlan 251 as bond1.251 on dev bond1'
ERROR: could not add vlan 251 as bond1.251 on dev bond1
+ exit 1

Hmmm so it seemed that the underlying interface bond1 was missing-which was true. The output of ifconfig didn’t show it as configured, and trying to start it manually using ifup bond1 failed as well. It turned out that the ifcfg-bond1 file was missing and had to be recreated from the documentation. All network configuration files in Red Hat based systems belong into /etc/sysconfig/network-scripts/ifcfg-interfaceName. With the recreated file in place, I was back in the running:

[root@node1 network-scripts]# ll *bond1*
-rw-r–r– 1 root root 129 Mar 17 10:07 ifcfg-bond1
-rw-r–r– 1 root root 168 May 19  2010 ifcfg-bond1.251
[root@node1 network-scripts]# ifup bond1
[root@node1 network-scripts]# ifup bond1.251
Added VLAN with VID == 251 to IF -:bond1:-
[root@node1 network-scripts]#

Now I could try to start the lower stack again:

CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘node1′
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘node1′
CRS-2676: Start of ‘ora.cssd’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘node1′
CRS-2672: Attempting to start ‘ora.ctssd’ on ‘node1′
CRS-2676: Start of ‘ora.ctssd’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.evmd’ on ‘node1′
CRS-2676: Start of ‘ora.evmd’ on ‘node1′ succeeded
CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘node1′ succeeded
CRS-2679: Attempting to clean ‘ora.asm’ on ‘node1′
CRS-2681: Clean of ‘ora.asm’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.asm’ on ‘node1′
CRS-2676: Start of ‘ora.asm’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.crsd’ on ‘node1′
CRS-2676: Start of ‘ora.crsd’ on ‘node1′ succeeded
[root@node1 network-scripts]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Brilliant-problem solved. This is actually the first time that an incorrect network config prevented a cluster I looked after from starting. The best indication in this case is in the gipcd log file, but it didn’t occur to me to have a look at is as the error was clearly related to storage.

Telling the Truth in IT

I’ve been doing presentations for many years, mostly on Oracle Technology, occasionally on management topics. However, my favorite presentation to give is one about when things go wrong.

The title is usually something like “Surviving Survivable Disasters” or “5 ways to Seriously Screw up a Project” and, though the specific examples and flow may vary, the general content is the same. I talk about IT situations that have gone wrong or things that strike me as daft/silly/mindless in IT. My aim is to be entertaining and have a laugh at the situations but I also want to explore what causes disasters and how we might go about avoiding at least some of them.

When doing the presentation I have a couple of ground rules:

  • I must have witnessed the situation myself or know personally, and trust, the individual who is my source.
  • I do not name organisations or individuals unless I am specifically given permission {by individuals that is, organisations never get named. Except one}.
  • I try to resist the temptation to embellish. It’s not hard to resists, a good disaster usually stands on it’s own merits.

It’s a great talk for introducing some light relief into a series of very technical presentations or for opening up a day of talks, to get people relaxed. It’s also the only talk I get seriously nervous about doing – if you are aiming to be entertaining and you miss, you stand to die on stage. The first time I did the talk I was physically sweating. However, it went down a storm. I did it 4 or 5 more times over as many years and it always went down well.

However, about 4 years ago I did the presentation just as I was about to go back to being self employed. After the talk a very good friend came over and said something like “Really entertaining talk but…maybe you should tone it down? A lot. Potential employers are going to take a dim view of you doing this, they will worry they will appear in the next talk”. I protested that I never mention companies or people and, surely, all organisations are able to admit that things go wrong and it is to everyone’s benefit if we all learn from them? My friend was adamant that though companies want to benefit from other disasters, they never, ever want to in any way be the source of that benefit. He was sure it would be very damaging to my potential career. Hmmmm…. I could see his point.

I was already scheduled to do the talk again in a couple of months and I took heed of his advice for it. I toned down the material, I removed some of the best stories and I added several disclaimers. I also died on stage. It went from an amusing 45 minutes to a preachy and stodgy affair.

I have not done it since.

The question is, should I have pulled back from doing that talk? Is it really going to harm my potential employability? (After all, no work has ever come my way from presenting). Why can’t we be honest that issues occur and that learning from them is far more valuable than covering them up? After all, do we believe a person who claims never to have made mistakes?

What prompted this thread is that I have been asked to do the talk again – and I have agreed to do so. I’ll be doing it next week, with the title “5 ways to advance your career through IT Disasters” for the UK Oracle user group Back to Basics event. This is a day of introductory talks for people who are fairly new to Oracle, the brain-child of Lisa Dobson. Lisa realised a few years ago that there were not enough intro-type presentations, most technical talks are by experts for fellow experts {or, at least, people wanting to become experts}.

I’m very happy to support helping those who are new to Oracle and I think it is important that people who are new to IT are exposed to what can go wrong – and any advice that might help them avoid it. After all, it’s better they learn from our mistakes than just repeat them again. OK, they’ll just repeat them again anyway, but they might spot that they are doing so just in time :-)

Is this a good idea? What the hell, I want more free time to do things like this blog – and get on top of the garden.

RAC One Changes in 11.2.0.2 Patchset

Oracle patchsets used to be simple - they were bugfixes only; no additional functionalities were added. Oracle added stuff quietly in 10.2.0.4 patchset (the workload capture); but that was part of a new functionality anyway.


The 11.2.0.2 patchset changed all the rules. Several new functionalities were added to the patchsets. Several functionalities have been added to RACOne. Unfortunately they didn't make it to the manuals. Some of them I discovered only recently. If you have been using RACOne, or considering it, you will be delighted to know these.

buffer flush

This is part 2 of an article on the KEEP cache. If you haven’t got here from part one you should read that first for an explanation of the STATE and CUR columns of the output.

Here’s a little code to demonstrate some of the problems with setting a KEEP cache – I’ve set up a 16MB cache, which gives me 1,996 buffers of 8KB in 10.2.0.3, and then created a table that doesn’t quite fill that cache. The table is 1,900 data blocks plus one block for the segment header (I’ve used freelist management to make the test as predictable as possible, and fixed the pctfree to get one row per block).

create table t1
pctfree 90
pctused 10
storage (buffer_pool keep)
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1900
;

alter system flush buffer_cache;

-- scan the table to load it into memory

select
	/*+ full(t1) */
	count(small_vc)
from	t1
where
	id > 0
;

-- check the content of x$bh from another connection

-- update every fifth row (380 in total)

update t1
	set small_vc = upper(small_vc)
where
	mod(id,5) = 0
;

-- check the content of x$bh from another connection

The query I ran under the SYS account was this:

select
	bpd.bp_blksz,
	bpd.bp_name,
	wds.set_id,
	bh.state,
	cur,
	ct
from
	(
	select
		set_ds,
		state,
		bitand(flag, power(2,13))	cur,
		count(*)  ct
	from
		x$bh
	group by
		set_ds,
		state,
		bitand(flag, power(2,13))
	)		bh,
	x$kcbwds	wds,
	x$kcbwbpd	bpd
where
	wds.addr = bh.set_ds
and	bpd.bp_lo_sid <= wds.set_id
and	bpd.bp_hi_sid >= wds.set_id
and	bpd.bp_size != 0
order by
	bpd.bp_blksz,
	bpd.bp_name,
	wds.set_id,
	bh.state,
	bh.cur
;

In my test case this produced two sets of figures, one for the DEFAULT cache, and one for the KEEP cache but I’ve only copied out the results from the KEEP cache, first after the initial tablescan, then after the update that affected 380 blocks:


  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     0     0         95
                                  1     0       1901
           ******* **********             ----------
           sum                                  1996

  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     1     0       1462
                                  1  8192        380
                                  3     0        323
           ******* **********             ----------
           sum                                  1996

In the first output you see the 1901 buffers holding blocks from the table (1,900 data plus one segment header), with the remaining 95 buffers still “free” (state 0). The table blocks are all shown as XCUR (state 1, exclusive current)

In the second output you see 380 buffers holding blocks with state ‘XCUR’ with bit 13 of the flag column set, i.e. “gotten in current mode”. These are the 380 blocks that have been updated – but there are also 323 blocks shown as CR (state 3, “only valid for consistent read”). A detailed check of the file# and dbablk for these buffers shows that they are clones of (most of) the 380 blocks in the XCUR buffers.

Do a bit of arithmetic – we have 1462 blocks left from the original tablescan, plus 380 blocks in CUR mode (of which there are 323 clones) for a total of 1,842 blocks – which means that 59 blocks from the table are no longer in the cache. As we clone blocks we can lose some of the blocks we want to KEEP.

Unfortunately for us, Oracle has not given any preferential treatment to buffers which hold blocks in the XCUR state – any buffer which reaches the end of LRU chain and hasn’t been accessed since it was first loaded will be dumped so that the buffer can be used to create a clone (but see footnote). This means that a constant stream of inserts, updates, deletes, and queries could result in lots of clones being created in your KEEP cache, pushing out the data you want to keep.

If you want to size your KEEP cache to minimise this effect, you probably need to start by making it somewhat larger than the objects it is supposed to KEEP, and then checking to see how many clones you have in the cache – because that will give you an idea of how many extra buffers you need to stop the clones from pushing out the important data.

Upgrades:

When I wrote and ran the test cases in this note the client was running Oracle 10.2 – while writing up my notes I happened to run the test on 11.1.0.6 (still using freelists rather than ASSM) and got the following output from my scan of the KEEP cache:

  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     1     0       1901
                                  3     0         91
           ******* **********             ----------
           sum                                  1992

Apart from the fact that you get slightly fewer buffers per granule in 11g (the x$bh structure has become slightly larger – and x$bh is a segmented array where each segment shares the granule with the buffers it points to) you can see that we only have 91 clones in the KEEP cache, and apparently we’ve managed to update our 380 blocks without changing their flag to “gotten in current mode”. Doing an update is, of course, just one way of making clones appear – but perhaps 11g will generally have more success in keeping current versions of blocks in memory than earlier versions.

There is , unfortunately, a very special feature to this test case – it’s using a single tablescan to update the table. So having said in part 1 that I was going to write a two-part article, I’ve got to this point, done a few more tests, and decided I need to write part three as well.  Stay tuned.

Footnote: Technically there are a couple of circumstances where Oracle will bypass the buffer and walk along the LRU chain looking for another block – but I’ve excluded them from this demonstration.