Oakies Blog Aggregator


If anyone is expecting email from me any time in the near future, you may have to hold your breath.

In a staggering feat of customer relations BT (British Telecom), or Yahoo acting on their behalf, has managed to introduce random blocks on any email I send out. At present I have something like a 30% chance of an email being blocked by their SMTP servers, with reports like the following:

An unknown error has occurred. Subject ‘Re: {deleted}‘, Account: ‘{deleted}‘, Server: ‘’, Protocol: SMTP, Server Response: ’554 Transaction failed : Cannot send message due to possible abuse; please visit for more information’, Port: {deleted}, Secure(SSL): {deleted}, Server Error: 554, Error Number: 0x800CCC6F

Take a look at the URL supplied in the response – I’ve highlighted it to make it easy to spot – and then try going to that URL.  (Page doesn’t exist – excellent service, isn’t it?)

Sometimes I can get my reply through if I delete any of the original text, sometimes I can get my reply through if I create a whole new message (rather than hitting reply) and cut just my text into the new message. Sometimes I can hit reply and get a reply through if it contains nothing but the comment (with no signature, and nothing from the incoming post):

British Telecomm is blocking virtually every email I try to send to you as “possible abuse”.
I can’t figure out why, so it may be some time before I get a proper reply through.

I tried calling customer support and was told that the problem was obviously my configuration and not their mail servers; but they could email me the URL of the BT broadband help page so that I could find out how to set up my mail program properly; or I could let the call centre employee log on to my machine over the internet; or I could pay them to get a member of the support staff to help.

Of course there was no answer to the question:  “so what have BT/Yahoo changed on their mail servers in the last few days?” and there was no answer to the question: “why have several other people suddenly hit the same problem at the same time if it’s my configuration?”.

So, BT, get your act together - send an email to your customers explaining why your email servers might choose to block apparently random email as "possible abuse"; and brief your call centre staff to recognise the description of the problem so they don't try the standard brush-off.

I hear that using twitter, facebook, blogs etc. to make a fuss about poor quality service is more effective nowadays than phoning call centres in distant countries - so I thought I'd give it a go. I know I'm not the only one in the UK facing this problem at the moment, so I'm curious to see how many more people I can get complaining about it.

Footnote: I have a yahoo mail account via BT - and the fact that I could send an email from it "proved" to the call centre person that it wasn't BT's problem. But it doesn't, and I don't want to use a service that floods half my screen with moving adverts all the time.


Clustered Indexes

… which, for those in the know, means something to do with SQL Server. (The closest physical feature in Oracle is the index-organized table; the closest-sounding name for an Oracle feature is the index cluster - which, just to add to the confusion, must have a specific index that is called the cluster index.)

Redgate has arranged for an online debate between an Oracle specialist (me) and a SQL Server specialist (Grant Fritchey) to talk about the strengths and weaknesses of the two mechanisms, discuss why Oracle users seem to be biased towards one implementation and SQL Server users towards the other, and then see where the conversation takes us. This will be followed by a Q&A session.

If this sounds interesting – and I think it’s a great idea, you can learn an awful lot more from a discussion than you can by listening to a monologue – you need to sign up early. The event will be held on 7th June at 16:00 BST.

Further details (including corrections for different timezones) and registration at this URL.

Update: It’s only a week away, now, so I thought I’d pop this advert to the top of the stack to remind people about it.

Table High Water Mark and How Empty the Table Is

On a recent post about reducing the high water mark in a table by shrinking it, someone asked if you could tell how much “space” there was in a table that could be recovered by shrinking it.

Yes. As pseudo-code:

1. Multiply num_rows by avg_row_len: that is how much data there is.
2. Divide by the block size, to give how many blocks-worth of data you have.
3. Scale this up by an “overhead” fudge factor of about 20-25% to allow for wasted space (see later).
4. Compare the result to the number of blocks used by the table.
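As a minimal sketch of those steps in Python (the 8K block size and 1.25 fudge factor are the values used in the worked example below):

```python
def pct_spc_used(num_rows, avg_row_len, alloc_blocks,
                 block_size=8192, fudge=1.25):
    data_bytes  = num_rows * avg_row_len     # how much data there is
    data_blocks = data_bytes / block_size    # blocks-worth of data
    mod_blocks  = data_blocks * fudge        # allow ~20-25% wasted space
    return mod_blocks / alloc_blocks         # compare to allocated blocks

# Figures from the example table: 50,000 rows, avg_row_len 213,
# 1664 blocks allocated in the segment.
print(round(pct_spc_used(50000, 213, 1664), 8))   # 0.97659918
```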

Here is some code to do it, showing some extra data and how we calculate the effective percentage of space used (pct_spc_used). It runs against an example table called HWM:

select uts.blocks                                     blks_used
      ,uts.avg_space                                  avg_space
      ,uts.num_rows                                   num_rows
      ,uts.avg_row_len                                avg_row_len
      ,uts.empty_blocks                               empty_blks
      ,usse.blocks                                    alloc_blks
      ,greatest(uts.blocks,1)/greatest(usse.blocks,1) pct_hwm
      ,uts.num_rows*uts.avg_row_len                   data_in_bytes
      ,(uts.num_rows*uts.avg_row_len)/8192            data_in_blks
      ,((uts.num_rows*uts.avg_row_len)/8192)*1.25     mod_data_in_blks
      ,(((uts.num_rows*uts.avg_row_len)/8192)*1.25)/usse.blocks pct_spc_used
from user_tab_statistics uts
    ,user_segments       usse
where uts.table_name='HWM'
and   uts.table_name=usse.segment_name;
 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630       1221      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918

I am collecting the data from USER_TAB_STATISTICS and USER_SEGMENTS. For this code to work you must have reasonably good stats against the table.

I’d like to run through some of the columns I have selected:
First of all, columns AVG_SPACE and EMPTY_BLOCKS are not populated by dbms_stats.gather_table_stats. They are populated by the deprecated ANALYZE command, which you should NOT use to gather table stats since V10 came along. These columns are populated here because I did an ANALYZE to get the data in there, as well as a dbms_stats gather.
Next, I collect the allocated blocks from USER_SEGMENTS {and for this demo I just ignored the potential for partitioned tables} and I compare this to the blocks used to get the High Water Mark as a percentage of the table. I do this because EMPTY_BLOCKS is set to zero if you have never used ANALYZE and, even if you did, unless you use this deprecated command all the time the value will not change.
On the second line of output I calculate DATA_IN_BYTES as a simple num_rows*avg_row_len and convert it into blocks {for simplicity I do not collect the block size, I know it is 8k}. I then apply my “overhead” fudge factor. A block has a header using around 100 bytes {I’ve not checked the exact figure for years}, pctfree can be varied but defaults to 10%, and as only whole rows fit, an average of half a row of space is empty in each “full” block. Thus I reduce the space available by 20-25%; in this case 25%, as my rows are large.
Finally, I compare this modified data volume to the used blocks to get the effective percentage of space used.

Below I run through creating some test data, looking at the stats and my calculated High Water Mark and pct_spc_used, and finally shrink my table to see if my guesstimate is reasonable:

drop table hwm purge;
prompt populate table
set feed on
create table hwm
(id   number(10)
,num1 number(2)
,vc1  varchar2(100)
,vc2  varchar2(100)
);
-- the column values in the original insert were lost in formatting;
-- these are illustrative values giving rows of roughly the reported length
insert into hwm
select rownum
      ,mod(rownum,99)
      ,lpad('A',100,'A')
      ,lpad('B',100,'B')
from dual connect by level < 50001;
50000 rows created.

exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'HWM')

--where is the HWM compared to total segment size

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630          0      50000         210          0       1664   .97957
     10500000   1281.73828       1602.17285   .962844262

The high water mark is 1630 blocks out of 1664 in the segment
My calculated PCT_SPC_USED is 96%. That is probably close enough.
{remember, the last used block will be only partly used, accounting for a bit of the difference}

-- I will use ANALYZE to fill the missing columns
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630       1150      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918

Now those two columns are populated.
Note the slightly different AVG_ROW_LEN, even though dbms_stats used a 100% sample
(as the table is so small) and ANALYZE was a compute.

-- clear 90% of the data randomly

45461 rows deleted.

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630       1150       4539         210         34       1664   .97957
       953190   116.356201       145.445251   .087407002

PCT_HWM is not altered, of course, but PCT_SPC_USED has dropped dramatically.
The table is now only 8.7% used, according to my calculations (having deleted
90% of the rows).

The BLKS_USED does not change. The AVG_SPACE and EMPTY_BLOCKS are the same because I
used dbms_stats to update the statistics and it DOES NOT ALTER the columns that it does not
populate. Thus you have no idea how recent those columns are if you use a mixture of commands.

-- clear some blocks completely by deleting a range

2181 rows deleted.

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630       1150       2358        210         34       1664   .97957
       495180   60.4467773       75.5584717   .045407735

Now the PCT_SPC_USED is down to 4.5%

-- has EMPTY_BLOCKS changed if I use ANALYZE?
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630       7682       2358         213         34       1664   .97957
       502254   61.3103027       76.6378784   .046056417

As you can see, if I use ANALYZE, AVG_SPACE alters. But EMPTY_BLOCKS does not, even though I cleared
a chunk of the table. So there are blocks that can be reused, but they are not listed as empty.

I'll just take a quick side-step and show an "oddity" of dbms_stats.
--deleting the stats (using dbms_stats)

exec dbms_stats.delete_table_stats(ownname=>user,tabname=>'HWM')

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED

dbms_stats.delete_table_stats clears ALL statistics, even the ones it does not populate

--and now collect them via dbms_stats again

exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'HWM')

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
      1630          0       2358         210          0       1664   .97957
       495180   60.4467773       75.5584717   .045407735

--now to shrink the table
alter table hwm enable row movement;
alter table hwm shrink space;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
        72          0       2358         210          0         80   .90000
       495180   60.4467773       75.5584717   .944480896

So I calculated that there were about 75 blocks of data in that table. Having shrunk it, I was
only a little bit out.
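As a quick sanity check of that estimate against the post-shrink figures above:

```python
# Values from the output above: 495,180 bytes of data remain in the table.
data_bytes    = 495180
est_blocks    = data_bytes / 8192 * 1.25   # estimated blocks-worth of data
actual_blocks = 72                         # BLKS_USED reported after the shrink

print(round(est_blocks, 1))                # 75.6 estimated, versus 72 actual
```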

Having run through those examples, we can see that the accuracy of PCT_SPC_USED comes down to the fudge factor employed, but 25% is probably close enough. After all, you are only likely to shrink a table that very clearly would benefit from it.

scsi_id and UDEV issues (update)…

Last month I wrote about a problem I saw with scsi_id and UDEV in OL5.8. As it screwed up all my UDEV rules it was a pretty important issue for me. It turned out this was due to a mainline security fix (CVE-2011-4127) affecting the latest kernels of both RHEL/OL5 and RHEL/OL6. The comments on the previous post show a couple of workarounds.

Over the weekend I started to update a couple of articles that mentioned UDEV rules (here and here) and noticed the problem had disappeared. I updated two VMs (OL5.8 and OL6.2) with the latest changes, including the UEK updates, ran the tests again, and here’s what I got.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
# uname -r
# scsi_id -g -u -s /block/sda/sda1

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
# /sbin/scsi_id -g -u /dev/sda1

So it looked like normal service had been resumed. :) Unfortunately, the MOS Note 1438604.1 associated with this issue is still not public, so I couldn’t tell if this was a unilateral change in UEK, or part of a mainline fix for the previous change.

To check I fired up a CentOS 6.2 VM with the latest kernel updates and switched an Oracle Linux VM to the latest RHEL compatible kernel and did the test on both. As you can see, they both still don’t report the scsi_id for partitions.

# cat /etc/redhat-release
CentOS release 6.2 (Final)
# uname -r
# /sbin/scsi_id -g -u /dev/sda1

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
# /sbin/scsi_id -g -u /dev/sda1

It could be the associated fix has not worked through the mainline to RHEL and CentOS yet. I’ll do a bit of digging around to see what is going on here.



Update: It appears the reversion of this functionality may not be permanent, so I’ve updated my articles to use a “safer” method of referencing the parent (disk) device, rather than the partition device.

scsi_id and UDEV issues (update)… was first posted on April 30, 2012 at 12:46 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only.

Hotsos Symposium 2012 Summary

For those of you who don't follow Twitter (and, let's face it, I still think there are solid reasons to avoid the beast or at least treat it with caution), the 10th Annual Hotsos Symposium finished almost two months ago. It's just taken me a while to blog about it! (Unfortunately, blogging activity has taken a back-seat compared to, well, a bunch of other activities ...)

I'm going to try to talk less about the various dinners and meet-ups with names most people don't recognise anyway, but Hotsos probably has the highest percentage of any conference of people I know and don't see often (maybe not UKOUG), so there was plenty of food, chat and somewhat more gentle drinking than at most conferences. Or at least the nights finish much earlier than I'm used to at other conferences!


As this was the 10th Hotsos Symposium, it kicked off with some presentations to 10 year veteran attendees and, much more impressively, 10 year veteran *speaker* Wolfgang Breitling. It was a shame that Cary Millsap wasn't also around to pick up his jacket. Stephan Haisley would have been in the 10-year crowd but couldn't make it. Well, he sort of made it by playing drums via live feed and then it was onto Gary Goodman's keynote presentation which was all about focussing on the right issues for your organisation, rather than getting distracted by too much detail and not being able to see the forest for the trees.

They'd lined up Maria Colgan for the single-stream technical session immediately after the keynote and, despite having seen a lot of the content of "Inside The 11g Optimizer: Removing The Mystery" before (although Maria would no doubt claim it was entirely new), she was as entertaining and informative as always. I was discussing it with a fellow speaker afterwards and we agreed that it's unusual to hear someone who speaks so quickly and energetically but covers all the points properly too. Well, almost all. Alex Gorbachev tweeted his disenchantment that she didn't cover the One-Pass Distinct Sampling algorithm, although all the geeky detail is available for anyone who is interested enough. Anyone wanting to hear Maria present a lot more should attend next year's Symposium Training Day, as she's the planned speaker.

Iordan Iotzov - "Advanced Methods For Managing Statistics Of Volatile Tables In Oracle" was next for me. It wasn't quite what I was expecting because of my own focus on frequent partition-exchange operations rather than standard DML and I'm not sure how much the after-DML trigger techniques used would translate very well to my own work, but it was thought-inspiring stuff, nonetheless.

Despite having left myself with far too much work to do on my own presentations, and so needing to spend a good bit of Monday and Tuesday on that, I wasn't going to miss Cary Millsap talking about "Instrumentation: Why You Should Bother". Which was a good job because, as well as his usual high standard, he had a lot to say about the importance of using DBMS_APPLICATION_INFO, which really comes into its own when you start using the ASH Analytics I was due to cover in my second presentation. I was also able to perform a couple of useful functions. First I helped carry Alex Gorbachev's ridiculously heavy gift from Cary back to his seat (I'm starting to wonder if Cary just takes heavy stuff with him whenever he knows I'll be there to help carry it) and I was also able to model the 'I Can Help You Trace It' t-shirt, which I'd worn for a bit of free publicity/support without thinking of the likely implications of wearing it into the same room as Cary's presentation. Regardless of the embarrassments, Cary was as thought-provoking as always and it was over too quickly.

All I had time left for after that was a brief visit in to see Mark Rittman make his Symposium debut with "Inside Oracle Exalytics And Oracle TimesTen In-Memory Database". It was interesting to see another experienced speaker go through the unusually nervous experience of stepping up on to the podium in Dallas. I don't quite know what it is, but it certainly feels more scary and challenging than most conferences, and I'm convinced it brings the best out of speakers. Certainly, although I could tell Mark was more nervous than usual, he also seemed very focussed and deserved the beer I bought him afterwards, before we headed off to the Enkitec Exadata Panel.

As well as beers, there was a good spread of Exadata talent on that panel (and in the room, I suppose) and I think the informal atmosphere worked well. (Picture me telling Jonathan Lewis to shut up before proceeding to heckle about Exadata OLTP which, of course, drew a well-deserved retaliatory rebuke!)

Although there were probably a couple of beers after all that, most of my attention was now focussed on my own presentation so I had to skip dinner with a few Oak Table chums. I knew it would be worth it in the end, though ....


By waking up early and skipping breakfast, I managed to attend a few bits and pieces of presentations here and there, kicking off with Jonathan Lewis' - "The Beginners Guide to Becoming an Expert". It's an umbrella presentation that I've seen quite a few times now but the detailed content changes every time so is never boring. As well as a live demo of human reaction times using a bunch of barely-organised volunteers ;-), the gist of the presentation was that you need to learn about the basic mechanics of Oracle so that you can understand what you would reasonably expect Oracle to do in different circumstances and then verify those initial assumptions. Or something like that - it *was* a long time ago now!

Although I managed to pop my head through a couple of doors to see how fellow presenters like Alex Gorbachev and Christian Antognini were getting on, I was primarily focussed on my laptop and making sure that my stats presentation was ready to go. Because I was trying a slightly unusual introduction involving apologies for talking to my mates, thanks for all the help I'd got and introducing Paddington Bear to the audience, I needed to think about some non-technical issues too. The main problem with the presentation was that there was probably too much repetition of the previous year's presentation on the same subject, but the evidence I keep seeing suggests that I probably need to keep banging on about partition stats for a while yet!

As I finished and slinked off for my smoke, Maria Colgan suggested we have a quick word afterwards and my initial (and vocal) reaction was a humorous but despairing 'Oh, what *now*?'. i.e. What mistake had I made this time? LOL

For the record, she pointed out that the GRANULARITY parameter value required to maintain Incremental Stats does not *have* to be AUTO, but *must* include Global Stats. e.g. ALL or GLOBAL AND PARTITION. I've not tested this properly yet, but it is a useful clarification because I might want to choose to only gather stats on a single partition whilst maintaining the Incremental Global Stats (by using GLOBAL AND PARTITION), rather than leaving it to DBMS_STATS to determine which Partitions need stats gathered on them. (Oh, and if I've *still* got this wrong, I'm sure she'll be along to correct me imminently!)

At least with the first presentation out of the way, I was able to catch a couple more sessions in between worrying about my demos. First was Tanel Poder - Exadata Performance Method - during which Tanel talked a lot of sense. Simply installing your existing applications on Exadata might make them go two or three times quicker, based on a simple hardware refresh, but as most people will have heard by now - if you're not benefitting from Smart Scan properly then don't expect to see the blistering performance that you've heard people rave about.

My final session of the day was Wolfgang Breitling talking about his SQL Tuning Kit: A Guide To Diagnosing Performance Issues. As someone who probably uses fewer performance analysis scripts than I used to (which can be both a good and a bad thing), I was interested in comparing the particular data Wolfgang was looking at with what is available in OEM these days, and I'd say that in most cases all of the useful stuff is there and that I personally find it quicker *these days* than a bunch of scripts. However, there's definitely still a place for useful scripts here and there, Wolfgang had a good box-full and, as we were discussing later, his don't require the Tuning and Diagnostics Pack licences or anything more than sqlplus, perl and maybe a little Java. (I've had this conversation several times recently, though - take-up of Diagnostics and Tuning seems to be pretty wide at the sites I and my fellow UK consultants/contractors work at these days, so this is less of an issue than some people would have you believe.) Good presentation, though, even if I should have worn my glasses to see the tiny writing!

The evening ended with a combination of good drinks and conversation and the regular Symposium Party Night. Terrific fun as always, although I'd suggest that this photo illustrates a few things:

- I'm duller than I thought I was - playing around with my phone when there's supposed to be a party on. Probably tweeting :-(
- Jonathan Lewis is clearly duller still as I've decided I can't be bothered with this chat. (Or maybe he can't be bothered with mine?)
- Free glasses of wine tend to be fuller than the paid-for variety! Thanks sponsors!
- I'm a geek really - check out my Sinclair ZX81 T-Shirt. Thanks Mads!

Ultimately, though, it's difficult to enjoy a party fully when you have demos to worry about the next day :-(


Wednesday morning was an utter write-off for me because I was so nervous about how my OEM 12c demo would go but determined not to screw it up after the car-crash that was my previous attempt at an OEM presentation at the Symposium. (Note that I've provided a link to my *post* about it, not the *video* that exists! In fairness, Marco Gralike did manage to catch the small amount of that hour when the presentations actually worked.) I walked through all of the demos time and again in my room and captured screen-shots that I could show people instead if it went wrong again, but it really ruins what I'm trying to show if I can't use an interactive seat-of-my-pants approach. I was also determined to atone for my poor first attempt so I tried to cover all of the angles, including the specific one of making sure I was performing the demos with my wireless net connection *disabled* ;-)

Imagine my sinking heart when, even allowing myself 15 or 20 minutes to setup (I love lunch-breaks! Just not lunch), my carefully prepared laptop VM wouldn't drive the projector properly. Going through a whole restart procedure solved the problem but it was a nerve-wracking process whilst a growing audience watched on. Checking later, it seemed that quite a few of the big audience had attended the previous minor disaster, so I reckon they were all sadists ;-)

In the end it didn't all go perfectly and there are areas I could and will improve, but it was such a relief to get through the whole experience in one piece and hopefully show people some new, cool and useful stuff with some sensible suggestions on getting the best use from ASH Analytics. There were certainly lots of people who were very complimentary afterwards and very excited about using OEM 12c but, as I think I said to every one of them, it's the new tools that are the star here. Giving presentations on cool subjects that you really believe in is the most fun. So thanks to JB et al. at Oracle for coming up with another winner.

As I've said many times, watching any presentation immediately before or after one of my own is something I really struggle with, either through nerves on the front-end or post-presentation adrenalin come-down afterwards, but I simply had to catch Paul Matuszyk talking about Oracle 11g Extended Statistics. There's a bit of a story behind this. Paul and I used to work for a certain large satellite television company in Scotland and when I went off to present at Hotsos, he was really interested. Being independent contractors we moved on and lost touch a little but then he showed up at the Symposium one year because he'd always wanted to go after what I'd told him about it. (Note to certain User Groups - maybe you gain more new attendees from speaker recommendations than you might think! I know I've encouraged a whole bunch of people who have later turned up at conferences ...) But he was particularly interested in presenting and after what I think was one failed submission attempt, he was presenting this year! :-) (Note to people thinking of submitting abstracts - the Hotsos Symposium Agenda is not *just* about existing speakers!)

His presentation on Extended Statistics contained some really detailed and useful information and, sitting next to Maria Colgan throughout, it all seemed accurate too. He was perhaps a little nervous as it was his first Hotsos presentation, but he did an amazing job and I'm glad I got to see it!

Which left time for just one more presentation - Kerry Osborne on Real World Exadata. Initially I was a little disappointed when Kerry pointed out that instead of the customer performance case studies he'd planned to present on, he was going to discuss an informal survey he'd conducted looking at the usage patterns of Enkitec's many Exadata customers. As it turned out, I found the presentation utterly fascinating, enjoyable and a really good break from the usual technical presentations. I can't remember all the detailed numbers but, as usual, I was surprised by just how many customers are buying quarter racks and how many Exadata implementations are for systems that aren't classic Data Warehouse or Reporting systems. There was a debate later about whether optimizer_index_cost_adj has any place in the 21st century (and I have a plan to blog about this soon) but the bulk of the presentation was just sensible real world stuff that I've come to expect from Kerry. Nice way to wrap up the conference.

Well, kind of. As usual, the Symposium is a conference that's still just about small enough to have a final short Farewell session before heading off for some relaxed Mexican food and cocktails with lots of friends. Great end to the main conference.


Although the weather had been a bit hit-and-miss earlier in the week, Thursday dawned to extremely strong wind and rain. I was quite taken aback (having never experienced anything like it in March at Hotsos), although others were there 'the year that it snowed'! It softened the blow a little to hear that Texas was in the midst of a bad drought and needed the water and to realise that I was likely to spend most of the day indoors listening to Jonathan Lewis' Training Day, catching up with things I'd missed whilst not in the office or catching up with some sleep. Man, I was *tired* and only managed to last until about 2pm!

Unsurprisingly, I really enjoyed what I saw though and the good thing about the Training Day is all of the material is printed to take away so I was able to review anything that I'd missed later on. There were a lot of topics covered in very quick succession, so it was a lot to take in! Good stuff.

The evening was spent catching up on email, finally spending a bit more time with a few friends and slowly wading through tons of mail :-( But it was good to catch my breath and have one more night's sleep before the long flight home.


Although I hadn't originally planned to attend and only stepped in at the last minute to fill in for Randolf Geist, I enjoyed the Symposium as much as usual, which is largely down to the perfectly sized number of attendees, lots of friends in attendance and the brilliant organisation skills of all the Hotsos team, particularly Rhonda and Becky! The agenda was as excellent as always and so Cary Millsap deserves a note of thanks for helping to keep the standard high. As usual, thanks to the Oracle ACE Director program at OTN for helping me to travel over there and a final note of thanks to all those who spent time listening to my presentations. I hope you got something useful from them.

Usual disclosure: My travel and accommodation expenses were covered by the Oracle ACE Director program.

Broken links fixed :-)

Ok, it took only a year or so, but I’ve fixed most of the broken links (to my scripts etc) in my blog :-)

Please let me know if you hit any more broken links from now on….


We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. A common misconception is that the packet transfer size is always the database block size for a block transfer (of course, messages are smaller in size). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduce the bits transferred over the private network). Don’t confuse this note with Jumbo frames and MTU size; this note is independent of the MTU setting.

In a nutshell, if the free space in a block exceeds a threshold (_gc_fusion_compression) then, instead of sending the whole block, LMS sends a smaller packet, reducing private network traffic. Let me give an example to illustrate my point. Let’s say that the database block size is 8192 and a block to be transferred is a recently NEWed block with, say, 4000 bytes of free space. Transfer of this block over the interconnect from one node to another node in the cluster will result in a packet size of ~4200 bytes. Transfer of the bytes representing free space can be avoided completely; just a symbolic notation of the free space begin offset and free space end offset is good enough to reconstruct the block on the receiving side without any loss of data. This optimization makes sense, as there is no need to clog the network unnecessarily.

Remember that this is not compression in the traditional sense, but rather the avoidance of sending unnecessary bytes.

The parameter _gc_fusion_compression determines the threshold and defaults to 1024. So, if the free space in the block exceeds 1024, the block is a candidate for the reduction in packet size.
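The decision can be sketched as a toy model in Python. This is my own sketch, not Oracle code: the 144-byte overhead is back-fitted from the 8336-byte packets observed later in this note for whole 8192-byte block transfers, and should be treated as purely illustrative.

```python
# Toy model of the free-space elision (my sketch, not Oracle's actual code).
# OVERHEAD is back-fitted from the observed 8336-byte whole-block packets;
# treat the numbers as illustrative only.

BLOCK_SIZE = 8192
THRESHOLD = 1024      # default value of _gc_fusion_compression
OVERHEAD = 144        # assumed fixed per-transfer overhead

def wire_size(free_space: int) -> int:
    """Approximate bytes on the wire for one block transfer."""
    if free_space > THRESHOLD:
        # ship only the used bytes; the free-space run is reconstructed
        # on the receiving side from the begin/end offsets
        return (BLOCK_SIZE - free_space) + OVERHEAD
    # free space at or below the threshold: ship the whole block
    return BLOCK_SIZE + OVERHEAD

print(wire_size(6980))   # a nearly empty block travels as a small packet
print(wire_size(900))    # 8336: below the threshold, the whole block goes across
```

Note how the saving disappears entirely once free space drops to the threshold or below.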

Test cases and dumps

From the test cases, I see that three fields in the block can be used to determine the free space available in the block. If you dump a block using the ‘alter system dump datafile..’ syntax, you will see the following three fields:


fsbo stands for Free Space Begin Offset; fseo stands for Free Space End Offset; avsp stands for AVailable free SPace;
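To make the arithmetic concrete, here is a throwaway Python parser for those three fields. The line format is assumed from a typical block dump; adjust the regular expression if yours differs.

```python
import re

# Pull fsbo, fseo and avsp out of a block-dump line (format assumed from
# typical 'alter system dump datafile' output); values are hex.
def parse_free_space(dump_line: str) -> dict:
    fields = re.findall(r'(fsbo|fseo|avsp)=0x([0-9a-f]+)', dump_line)
    return {name: int(value, 16) for name, value in fields}

line = "fsbo=0x14 fseo=0x1b58 avsp=0x1b44 tosp=0x1b44"
info = parse_free_space(line)
print(info['avsp'])   # 6980 bytes free, well above the 1024 threshold
```

With avsp converted to decimal you can immediately see whether a given block qualifies for the smaller packet.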

It seems to me from the test cases that the LMS process looks up these fields and constructs the buffer depending upon the value of the avsp field. If avsp exceeds 1024, then the buffer is smaller than 8K (smaller than 7K, for that matter). The following few lines explain my test results.

Initially, I had just one row (row length = 105 bytes), and the Wireshark packet analysis shows that one 8K block transfer resulted in a 690-byte packet. Meaning, the size of the network packet was just 690 bytes for an 8192-byte block transfer. A massive reduction in GC traffic.

In test case #2, with 10 rows in the block, the size of the packet transfer was 1680 bytes. The block dump shows avsp=0x1b44 (6980 bytes) in a block with just 1212 bytes of useful information. The cache fusion code avoided sending those 6980 bytes and reduced the transferred packet size to just 1680 bytes.

In test case #3, with 50 rows in the block, the size of the transferred packet was 5776 bytes; free space in the block was 2620 bytes.

This behavior continued until the free space was just above 1024. When the free space dropped below 1024 (I accidentally added more rows, and so free space dropped to ~900 bytes), the whole block was transferred and the size of the packet was 8336 bytes.


These test cases prove that cache fusion code is optimizing the packet transfer by eliminating the bytes representing free space.

More test cases

So, what happens if you delete rows in the block? Remember that rows are not physically deleted, just tagged with a D flag in the row directory, and so the free-space information remains the same. Even if you delete 90% of the rows in the block, the avsp field is not updated until block defragmentation happens. This means that mere deletion of rows will still result in whole-block transfers until the block is defragmented.

# After deletion of nearly all rows in the block.

I increased the value of the _gc_fusion_compression parameter to 4096, then to 8192, and repeated the tests. The behavior is confirmed: when I set this parameter to 8192, the transfer of a block with just one row resulted in a packet size of 8336 bytes, meaning this optimization simply did not kick in (as the free space in a block can never be greater than 8192).


Yes, with 0x6 exclamation symbols! This note is to improve the understanding of cache fusion traffic, not a recommendation for you to change it. This parameter is better left untouched.

This is a very cool optimization feature, useful in data warehouse databases with a 32K block size. I am not sure in which version this optimization was introduced, though.


Just in on the Oracle-L list server – if you want to hear from the author of SQLTXPLAIN (MOS 215187.1) how to install and use its main features, Carlos Sierra is presenting a one-hour seminar on May 15th. In his words:

If you or someone you know may want to attend, please register following link below. Capacity for this 1hr SQLTXPLAIN Webinar on May 15 is limited, so please register early so you can reserve one connection. Feel free to share this link with your customer(s).

Event Information: Using SQLTXPLAIN to diagnose SQL statements performing poorly

The event is scheduled for 8:00 am in San Francisco, which is 4:00 pm BST (GMT+1) if you’re in the UK.


Friday Philosophy – It’s not “Why Won’t It Work!” it’s “What Don’t I Understand?”

I had a tricky performance problem to solve this week. Some SQL was running too slow to support the business need. I made the changes and additions I could see were needed to solve the problem and got the code running much faster – but it would not run faster consistently. It would run like a dream, then run slow, then run like a dream again 2 or 3 times and then run like a wounded donkey 3 or 4 times. It was very frustrating.

For many this would provoke the cry of “Why won’t it work!!!”. But I didn’t, I was crying “What don’t I understand???”. {I think I even did a bit of fist-pounding, but only quietly as my boss was sitting on the desk opposite me.}

I think I’ve always been a bit like that in respect of “How Things Work”, but it has been enhanced within me by being blessed to work with or meet people for whom understanding why something is not working is more important than fixing it.

I was reminded of this by a thoughtful comment in an email that one of the Oaktable members sent to the list. They made the comment that what they felt was common between members of the Oaktable is “that we’re not interested (really) in what the solution is of most of our problems, but actually, what is the underlying issue that really causes the problem?”

It struck a real chord with me. Quite a few people I’ve come across seem to be fixated on wanting to know solutions, so that they can look knowledgeable and be “one of the best”. But that’s just stamp collecting really. It’s like one of those ‘games card’ fads that each generation of children has, where you want to collect the best cards so you can win. I never got it as a kid, as there are few rules, tactics or ‘how it works’ to elucidate. What success is there in winning when it’s just down to the cards you have? {And being candid, I didn’t like them partly because I never had the money to buy many cards and partly because I was rubbish at trading them. No sales skills.}

I know the solve-it-don’t-just-fix-it position is a topic I have touched on before, but I think the attitude of trying to fix problems by understanding how things work is far more satisfying than doing so by knowing a set of solutions. You develop a deeper understanding to help solve new problems than any amount of solution-stamp-collecting ever will. However, another wise voice on the Oaktable discussion pointed out that you can be in a work environment where there is no time to investigate, and you simply have to try your set of fixes and move on if you hit one that works. Your work environment can strongly influence how you work and, in some ways, the way you think.

I bet some people are wondering what my problem at the start of this post actually was? Well, a nice technical blog about it may appear over the weekend, but the core reason for the toggling of working/not-working was partition swap. We have data coming into the system very fast. We build a new summary of the key data in one table and then swap it into active play via partition swap. On the live system, stats had not been gathered on the “swap” table we had introduced but had on the active table. So, each time the partition swapped, we went from good stats to “empty” stats or the other way around. The empty stats gave a quite, quite dreadful execution plan.

Little things I didn’t know: difference between _enable_NUMA_support and numactl

In preparation for a research project and potential UKOUG conference papers I am researching the effect of NUMA on x86 systems.

NUMA is one of the key features to understand in modern computer organisation, and I recommend reading “Computer Architecture, Fifth Edition: A Quantitative Approach” from Hennessy and Patterson (make sure you grab the 5th edition). Read the chapter about cache optimisation and also the appendix about the memory hierarchy!

Now why should you know about NUMA? First of all, there is an increasing number of multi-socket systems. AMD has pioneered the move to a lot of cores, but Intel is not far behind. Although AMD is currently leading in the number of cores (“modules”) on a die, Intel doesn’t need to: the Sandy Bridge-EP processors are way more powerful in a one-to-one comparison than anything AMD has at the moment.

In the example, I am using a blade system with Opteron 61xx processors. The processor has 12 cores according to the AMD hardware reference. The output of /proc/cpuinfo lists 48 “processors”, so it should be fair to say that there are 48/12 = 4 sockets in the system. An AWR report on the machine lists it as 4 sockets, 24 cores and 48 processors. I didn’t think the processor was using SMT; when I find out why AWR reports 24c48t, I’ll update the post.

Anyway, I ensured that the kernel command line (/proc/cmdline) didn’t include numa=off, which the oracle-validated RPM sets. Then after a reboot here’s the result:

$ numactl --hardware
available: 8 nodes (0-7)
node 0 size: 4016 MB
node 0 free: 378 MB
node 1 size: 4040 MB
node 1 free: 213 MB
node 2 size: 4040 MB
node 2 free: 833 MB
node 3 size: 4040 MB
node 3 free: 819 MB
node 4 size: 4040 MB
node 4 free: 847 MB
node 5 size: 4040 MB
node 5 free: 834 MB
node 6 size: 4040 MB
node 6 free: 851 MB
node 7 size: 4040 MB
node 7 free: 749 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10

Right, I have 8 NUMA nodes (0-7), and the total RAM on the machine is 32GB. There are huge pages allocated for another database to allow for a 24GB SGA. A lot of information about NUMA can be found in SYSFS, which is now mounted by default on RHEL and Oracle Linux. Check the path /sys/devices/system/node:

$ ls
node0  node1  node2  node3  node4  node5  node6  node7

$ ls node0
cpu0  cpu12  cpu16  cpu20  cpu4  cpu8  cpumap  distance  meminfo  numastat

For each NUMA node shown in the output of numactl --hardware there is a subdirectory nodeN, where you can also see the processors that form the node. Oracle Linux 6.x offers a file called cpulist; previous releases with the RHEL-compatible kernel should have subdirectories cpuX. Interestingly, you find memory information local to the NUMA node in the file meminfo, as well as the distance matrix you can query with numactl --hardware. So far I have only seen distances of 10 or 20; if anyone knows where these numbers come from or has seen other figures, please let me know!
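The distance matrix printed by numactl --hardware is easy to work with programmatically. Here is a small Python sketch that parses it, shown with a trimmed-down three-node sample in the same layout as the output above:

```python
# Parse the node-distance matrix from `numactl --hardware` output
# (layout assumed to match the listing shown above).
def parse_distances(text: str) -> list[list[int]]:
    rows = []
    for line in text.strip().splitlines():
        head, _, rest = line.partition(':')
        if head.strip().isdigit():          # skip the "node 0 1 2 ..." header
            rows.append([int(d) for d in rest.split()])
    return rows

sample = """\
node   0   1   2
  0:  10  20  20
  1:  20  10  20
  2:  20  20  10
"""
matrix = parse_distances(sample)
# local access is always the cheapest: distance 10 on the diagonal
print(all(row[i] == 10 for i, row in enumerate(matrix)))
```

Parsing the matrix like this makes it easy to script checks, for example flagging node pairs with an unusually high distance.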

Another useful tool to know is numastat, which presents per-node memory statistics, including cross-node memory requests.

$ numastat
                           node0           node1           node2           node3
numa_hit                 3048548        25344114        14523218        13498057
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit              8196          390371          415719          458362
local_node               2415628        24965781        14059618        12907752
other_node                632920          378333          463600          590305

                           node4           node5           node6           node7
numa_hit                 9295098         4072364         3730878         3659625
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit            512399          451099          417627          390960
local_node               8637176         3483582         3152133         3159090
other_node                657922          588782          578745          500535
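If you want a single figure out of those counters, a simple locality ratio (my own helper, not part of numastat) can be computed from local_node and other_node:

```python
# Share of allocations satisfied on the local node, from numastat counters.
# A value close to 1.0 means good memory locality.
def locality_ratio(local_node: int, other_node: int) -> float:
    total = local_node + other_node
    return local_node / total if total else 1.0

# node0 figures from the numastat output above
ratio = locality_ratio(2415628, 632920)
print(f"{ratio:.1%}")   # roughly 79% of node0 allocations were local
```

Watching this ratio before and after a configuration change is a quick way to see whether processes are allocating memory off-node.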

Oracle and NUMA

Oracle has an if-then-else approach to NUMA, as a post from Kevin Closson has explained already. On my release I need to use “_enable_NUMA_support” to enable NUMA support in the database. Before that, however, I thought I’d give the numactl command a chance and bind the instance to node 7 (both for processor and memory).

This is easily done:

[oracle@server1 ~]> numactl --membind=7 --cpunodebind=7 sqlplus / as sysdba <

Have a look at the numactl man page if you want to learn more about the options.

Now how can you check whether it respected your settings? Simple enough: the tool is called “taskset”. Despite what the name may suggest, you can not only set a task’s affinity but also retrieve it. A simple one-liner does that for my database SLOB:

$ for i in `ps -ef | awk '/SLOB/ {print $2}'`; do taskset -c -p $i; done
pid 1434's current affinity list: 3,7,11,15,19,23
pid 1436's current affinity list: 3,7,11,15,19,23
pid 1438's current affinity list: 3,7,11,15,19,23
pid 1442's current affinity list: 3,7,11,15,19,23
pid 1444's current affinity list: 3,7,11,15,19,23
pid 1446's current affinity list: 3,7,11,15,19,23
pid 1448's current affinity list: 3,7,11,15,19,23
pid 1450's current affinity list: 3,7,11,15,19,23
pid 1452's current affinity list: 3,7,11,15,19,23
pid 1454's current affinity list: 3,7,11,15,19,23
pid 1456's current affinity list: 3,7,11,15,19,23
pid 1458's current affinity list: 3,7,11,15,19,23
pid 1460's current affinity list: 3,7,11,15,19,23
pid 1462's current affinity list: 3,7,11,15,19,23
pid 1464's current affinity list: 3,7,11,15,19,23
pid 1466's current affinity list: 3,7,11,15,19,23
pid 1470's current affinity list: 3,7,11,15,19,23
pid 1472's current affinity list: 3,7,11,15,19,23
pid 1489's current affinity list: 3,7,11,15,19,23
pid 1694's current affinity list: 3,7,11,15,19,23
pid 1696's current affinity list: 3,7,11,15,19,23
pid 5041's current affinity list: 3,7,11,15,19,23
pid 13374's current affinity list: 3,7,11,15,19,23

Is that really node7? Checking the cpus in node7:

$ ls node7
cpu11  cpu15  cpu19  cpu23  cpu3  cpu7

That’s us! Ok that worked.


The next test I did was to see how Oracle handles NUMA in the database. There was a bit of an enable/don’t-enable back and forth from 10.2 to 11.2. If the MOS notes are correct, then NUMA support is turned off by default now; the underscore parameter _enable_NUMA_support turns it on again. At least on my Linux system, no relinking of the oracle binary was necessary.

But to my surprise I saw this after starting the database with NUMA support enabled:

$ for i in `ps -ef | awk '/SLOB/ {print $2}'`; do taskset -c -p $i; done
pid 17513's current affinity list: 26,30,34,38,42,46
pid 17515's current affinity list: 26,30,34,38,42,46
pid 17517's current affinity list: 26,30,34,38,42,46
pid 17521's current affinity list: 26,30,34,38,42,46
pid 17523's current affinity list: 26,30,34,38,42,46
pid 17525's current affinity list: 26,30,34,38,42,46
pid 17527's current affinity list: 26,30,34,38,42,46
pid 17529's current affinity list: 26,30,34,38,42,46
pid 17531's current affinity list: 0,4,8,12,16,20
pid 17533's current affinity list: 24,28,32,36,40,44
pid 17535's current affinity list: 1,5,9,13,17,21
pid 17537's current affinity list: 25,29,33,37,41,45
pid 17539's current affinity list: 2,6,10,14,18,22
pid 17541's current affinity list: 26,30,34,38,42,46
pid 17543's current affinity list: 27,31,35,39,43,47
pid 17545's current affinity list: 3,7,11,15,19,23
pid 17547's current affinity list: 24,28,32,36,40,44
pid 17549's current affinity list: 26,30,34,38,42,46
pid 17551's current affinity list: 26,30,34,38,42,46
pid 17553's current affinity list: 26,30,34,38,42,46
pid 17555's current affinity list: 26,30,34,38,42,46
pid 17557's current affinity list: 26,30,34,38,42,46
pid 17559's current affinity list: 26,30,34,38,42,46
pid 17563's current affinity list: 26,30,34,38,42,46
pid 17565's current affinity list: 26,30,34,38,42,46
pid 17568's current affinity list: 0,4,8,12,16,20
pid 17577's current affinity list: 0,4,8,12,16,20
pid 17584's current affinity list: 0,4,8,12,16,20
pid 17597's current affinity list: 0,4,8,12,16,20
pid 17599's current affinity list: 24,28,32,36,40,44

Interesting: so the database, with an otherwise identical pfile (and a SLOB PIO SGA of 270 MB), is now distributed across lots of NUMA nodes… watch out for that interleaved memory transfer!

Trying to use numactl to force the creation of processes on a node doesn’t help either: Oracle now seems to use NUMA API calls internally and overrides your command:

$ numactl --membind=7 --cpunodebind=7 sqlplus / as sysdba < startup
$ for i in `ps -ef | awk '/SLOB/ {print $2}'`; do taskset -c -p $i; done
pid 20155's current affinity list: 3,7,11,15,19,23
pid 20157's current affinity list: 3,7,11,15,19,23
pid 20160's current affinity list: 3,7,11,15,19,23
pid 20164's current affinity list: 3,7,11,15,19,23
pid 20166's current affinity list: 3,7,11,15,19,23
pid 20168's current affinity list: 3,7,11,15,19,23
pid 20170's current affinity list: 3,7,11,15,19,23
pid 20172's current affinity list: 3,7,11,15,19,23
pid 20174's current affinity list: 0,4,8,12,16,20
pid 20176's current affinity list: 24,28,32,36,40,44
pid 20178's current affinity list: 1,5,9,13,17,21
pid 20180's current affinity list: 25,29,33,37,41,45
pid 20182's current affinity list: 2,6,10,14,18,22
pid 20184's current affinity list: 26,30,34,38,42,46
pid 20186's current affinity list: 27,31,35,39,43,47
pid 20188's current affinity list: 3,7,11,15,19,23
pid 20190's current affinity list: 24,28,32,36,40,44
pid 20192's current affinity list: 3,7,11,15,19,23
pid 20194's current affinity list: 3,7,11,15,19,23
pid 20196's current affinity list: 3,7,11,15,19,23
pid 20198's current affinity list: 3,7,11,15,19,23
pid 20200's current affinity list: 3,7,11,15,19,23
pid 20202's current affinity list: 3,7,11,15,19,23
pid 20206's current affinity list: 3,7,11,15,19,23
pid 20208's current affinity list: 3,7,11,15,19,23
pid 20211's current affinity list: 0,4,8,12,16,20
pid 20240's current affinity list: 0,4,8,12,16,20
pid 20363's current affinity list: 0,4,8,12,16,20
sched_getaffinity: No such process
failed to get pid 20403's affinity

Little things I didn’t know! So next time I benchmark I will have that in mind!