Oakies Blog Aggregator

SQL Trace and Oracle Portal

Recently I was involved in a project where I had to trace the database calls of an application based on Oracle Portal 10.1.4. The basic requirements were the following:

  • Tracing takes place in the production environment
  • Tracing has to be enabled for a single user only
  • Instrumentation code cannot be added to the application

Given that Oracle Portal uses a pool of connections and that each HTTP call can use several database sessions, statically enabling SQL trace for specific sessions was not an option.

Knowing nothing about Oracle Portal I started RTFM and, gladly, I discovered that there is a simple way to inject a piece of code before and after a requested procedure is called. This is done by setting, via the administration GUI, the parameters PlsqlBeforeProcedure and PlsqlAfterProcedure.

Since Oracle Portal provides a function (WWCTX_API.GET_USER) to get the current user, I decided to create the following procedures to set/clear the client identifier before/after every call. Note that I added the call to SUBSTR and the exception handler to make sure that the procedures do not raise exceptions (hey, it’s a production system and I do not want to impact everyone!).

CREATE PROCEDURE tvd_set_client_identifier AS
BEGIN
  -- SUBSTR keeps the value within the 64-byte client identifier limit
  dbms_session.set_identifier(substr(portal.wwctx_api.get_user, 1, 64));
EXCEPTION
  -- never let the instrumentation break the application
  WHEN others THEN NULL;
END;
/

CREATE PROCEDURE tvd_clear_client_identifier AS
BEGIN
  dbms_session.clear_identifier;
EXCEPTION
  WHEN others THEN NULL;
END;
/

To “enable” these procedures we did the following:

  • Set PlsqlBeforeProcedure=portal.tvd_set_client_identifier
  • Set PlsqlAfterProcedure=portal.tvd_clear_client_identifier
  • Restarted the application server

With this configuration in place, enabling SQL trace for a single user was simply a matter of calling the DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE procedure, specifying the user to be traced as the value of the CLIENT_ID parameter.
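For example (a sketch; the user name ALBERT is a placeholder, and the waits/binds arguments are optional):

```sql
-- Enable SQL trace, including wait events and bind values, for every
-- database session that sets this client identifier:
BEGIN
  dbms_monitor.client_id_trace_enable(
    client_id => 'ALBERT',   -- placeholder Portal user name
    waits     => TRUE,
    binds     => TRUE
  );
END;
/

-- ...and disable it again when done:
BEGIN
  dbms_monitor.client_id_trace_disable(client_id => 'ALBERT');
END;
/
```

Because the setting is keyed on the client identifier rather than on a session, it follows the user across all the pooled connections that Portal happens to use.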

Headlong rush to Conference – Preparing the Presentations

With only a few days to go before the UKOUG conference this year I'm preparing my presentations. I know pretty much what I want to say and, for the IOT talk at least, it is not as if I do not have enough material already prepared – some of which has appeared in blog posts and some of which has not. (Though it did strike me that I could just fire up the blog and walk through the thread, taking questions.)

My big problem is not what to say – it is what not to say.

I’ve always had this problem when I want to impart knowledge, I have this desire to grab the audience by the throat, take what I know about the subject and somehow just cram the information into the heads of the people in front of me. All of it. I want them to know everything about it that I know, the core knowledge, the oddities, the gotchas, how it meshes with other topics. It’s ridiculous of course, if I’ve spent many hours (days, weeks, 20 years) acquiring experience, reading articles and learning, I can’t expect to pass that all on in a one hour presentation – especially as I like to provide proof and examples for what I say. But I think the desire to do so is part of what makes me a good presenter and tutor. I bounce around in front of the audience, lobbing information at them and constantly trying to judge if I need to backup and give more time to anything or if I can plough on, skipping the basics. Hint, if you are in the audience and I’m going too fast or garbling my topic, then I am always happy to be asked questions or told to reverse up a little. I’ve never been asked to speed up though :-)

It gets even worse. If I am putting myself up there to talk about a topic then I don’t want to be found wanting. I want to be able to handle any question and have a slide or example up my sleeve to demonstrate it. It’s exhausting and, again, pointless. At somewhere like the UKOUG there is bound to be someone who knows something I don’t know about any topic.

For me the trick is to pare it down, to keep reminding myself that if the audience leaves with more knowledge than they came in with, that is a win. If they actually enjoyed the experience I'm even happier. Maybe I should forget the topic and just take drinks and nibbles…

So, I’m currently doing what I always do, which is trying to force myself to remove stuff that is not directly relevant whilst still leaving a few little oddities and interesting items. Plus getting the 200 slides down to something more reasonable – like say 120 :-)

If I can get it down to one slide per minute (some of which I skip on the day as they are there for anyone downloading the talk) then I’m OK.

Of course, having done this, the day before the course I’ll do one last “final review” – and add a couple of dozen slides to just clarify a few points…

Mike Carey: Thicker Than Water…

“Thicker Than Water” is book 4 in the Felix Castor series by Mike Carey.

I’m not even going to try and summarize this bad-boy. It’s just a crazy-paced page turner. We find out more about Felix’s life before he became a freelance exorcist. More about his brother, the Catholic Priest. More about the Anathemata, the excommunicated millitant arm of the Catholic Church. Juliet, the succubus, gets involved in the action again. It’s just bam, bam, bam. Love it. :)

I missed my fifth blogging anniversary by a month – my first post was dated 24th Oct 2006 on the topic of the parameter optimizer_index_cost_adj.

Since then I’ve posted about 670 articles (and deleted a few temporary ones) and drafted another 110; I’ve acquired 750 followers;  and the blog has inspired a little  over 6,000 comments. It’s amazing how time passes and the numbers accumulate.

To celebrate the event I’ve created a new category (see top right) of “Site History” so that I could label that first post and the various posts I’ve made about viewing figures and popular posts.

Recycle bin

Recent Charles Hooper’s post on the topic of Recycle bin (which is, BTW, documented behavior) reminded me of an issue with that functionality I’ve seen recently. The problem was a single-row INSERT INTO table VALUES () statement was hanging for more than an hour burning CPU.

Here is how the SQL Monitoring report for the statement looked:

SQL Monitoring Report
SQL Text
INSERT INTO TAB(X1,X2,X3,X4) VALUES (:1,:2,:3,:4)
Global Information
Status              :  EXECUTING                  
Instance ID         :  1                          
Session             :  USER1 (101:34227)
SQL ID              :  55wauznayyk1r              
SQL Execution ID    :  16777303                   
Execution Started   :  11/23/2011 13:11:41        
First Refresh Time  :  11/23/2011 13:11:49        
Last Refresh Time   :  11/23/2011 14:26:50        
Duration            :  4511s                      
Module/Action       :  JDBC Thin Client/-         
Service             :  SYS$USERS                  
Program             :  JDBC Thin Client           
Global Stats
| Elapsed |   Cpu   |    IO    | Concurrency | PL/SQL  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
|    4528 |    4525 |     0.71 |        0.01 |    0.01 |     2.46 |   202M |   71 | 568KB |
SQL Plan Monitoring Details (Plan Hash Value=0)
| Id |         Operation         | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail |
|    |                           |      | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |
|  0 | INSERT STATEMENT          |      |         |      |           |        |     1 |          |          |                 |
|  1 |   LOAD TABLE CONVENTIONAL |      |         |      |           |        |     1 |          |          |                 |

And pstack output for Oracle process looked like this most of the time:

 ffffffff7c5da730 times (64, 4a6f0, 49e2794, 10c20aba8, 10c1c04b8, 10c1ce850) + 8
 000000010633d76c opiosq0 (4a, e, ffffffff7ffdc650, 20, 0, ffffffff7ffdc6c9) + 4cc
 00000001035a0444 opiall0 (ffffffff7ffdc728, 1, ffffffff7ffdce10, 0, ffffffff7ffdcd28, ffffffff7ffdccd0) + 964
 00000001035cf01c opikpr (65, 1c, ffffffff74167bd8, 10c000, 10c1ca000, 1648) + 21c
 00000001035b86b4 opiodr (10a755, 1c, 1035cee00, 10a400, 1400, ffffffff7bc20178) + 594
 0000000105143a64 rpidrus (ffffffff7ffdda78, 10c1c9ce8, 1, 0, 10a6a2, 10a400) + c4
 00000001077e6260 skgmstack (ffffffff7ffddce0, 10c1c9ce8, 1051439a0, ffffffff7ffddcb8, ffffffff7ffddce0, ffffffff7ffddcb8) + 60
 0000000105143fa4 rpidru (ffffffff7ffde5c8, 10a686, 10a400, 105143f00, 105000, 10a400) + a4
 0000000105142cac rpiswu2 (67252c600, 0, ffffffff7ffde438, 10a6a2, 10c1ca490, 0) + 28c
 000000010514b588 kprball (64, 8, 0, 2000, 105143f00, 4000) + 588
 0000000105c23814 kdpurcrb (1154, 11, 0, ffffffff7ffdf308, 10a6a2, 10c1ce) + 1d4
 0000000105c27a9c kdpurts (ffffffff7ffdf580, ffffffff7ffdf550, 1, ffffffff7ffdf5fc, 1a00000000, 380016) + 1a7c
 0000000105c28250 kdpursp (5, 100, ffffffff7ffdf838, 1b0, 7ffffffc, 7ffffc00) + 70
 00000001023c2fd4 ktsx_purge_bin (5, 80, ffffffff7ffdff7c, ffffffff7ffdff78, 5, 10c000) + 134
 0000000102441ad0 ktspnr_next (ffffffff7ffe06c0, 102440e20, 10ac50558, 10ac50500, 10ac00, 10a6a2) + cb0
 0000000101a0bca0 ktrsexec (ffffffff7ffe06c0, ffffffff7ffe08b0, 380000, 380025000, 380025, b) + 240
 0000000102436128 ktspbmphwm (ffffffff79af7fe8, 0, 0, 4cd9980, 76, 1527801) + 468
 0000000102430204 ktspmvhwm (ffffffff7ffe0ec0, 1024313cc, e, 10c1c9, ffffffff79af7fe8, 409) + 24
 0000000102431448 ktsp_bump_hwm (10a400, ffffffff7ffe3fa8, 794, 10c1ca490, 1, 582) + c8
 00000001024321f4 ktspgsp_main (0, 152780100000000, 380024ed8, 0, 0, 3) + 5d4
 0000000103888dbc kdisnew (ffffffff7bc78860, 50c43e2, ffffffff7ffe4678, 2, ffffffff79af7fe8, 0) + 11c
 000000010388bdc8 kdisnewle (ffffffff79af7fd0, 50c43e2, 2d, ffffffff7ffe4678, 0, 0) + a8
 00000001038b2334 kdisle (0, 2d, 1527803, 50c43e2, 1, 2c) + 1e34
 00000001037c3c54 kdiins0 (6371cfdb0, 0, 0, ffffffff7fff59a8, 0, 0) + 2bb4
 000000010381a1a0 kdiinsp (6371cfdb0, ff000000000000, 0, ffffffff7fff59a8, 2, 2) + 60
 0000000100c71154 kauxsin (10a400, ffffffff7bc78860, 0, 10c000, 10a6a2000, 0) + 7d4
 00000001062d2764 qesltcLoadIndexList (ffffffff79ce5530, 6371cfe30, 0, 0, 0, 636f17848) + 384
 00000001061d5a00 qerltcNoKdtBufferedInsRowCBK (ffffffff79ce54e8, 636f17848, 6371d0140, 10c1ca490, 500080025, 1061d5880) + 180
 00000001061d10bc qerltcLoadStateMachine (1061d4, 636f17848, 10c1ca2e0, 1061d4ee0, ffffffff79ce54e8, 1) + dc
 00000001061d1a38 qerltcInsertSelectRop (ffffffff7fff7f88, ffffffff7fff7ee8, ffffffff79cc0518, 10c1ca490, 104c00, 0) + d8
 0000000106068284 qeruaRowProcedure (69000001, 7fff, ffffffff7fff7c18, 800000, 1061d1960, 80000) + 124
 0000000106125330 qerfiFetch (642d22848, ffffffff7bc6ce58, 106068160, ffffffff7fff7c18, 7fff, 107) + 50
 000000010272148c rwsfcd (642d227d0, ffffffff7bc6ce88, 106068160, ffffffff7fff7c18, 7fff, 106068160) + 6c
 0000000106068620 qeruaFetch (33000002, 33008002, 106068160, ffffffff7fff7c18, 642d227d0, ffffffff79ce51b0) + 1a0
 000000010272148c rwsfcd (66375c718, ffffffff79ce5458, 1061d1960, ffffffff7fff7f88, 7fff, 1061d1960) + 6c
 00000001061cdfbc qerltcFetch (800000, 636f17848, 104a70c00, ffffffff79ce54e8, 10c1ca2e0, 1061d1960) + 39c
 0000000104a711dc insexe (10c1df, 1061cdc20, 66, 0, 10c000, 104800) + 35c
 00000001035c305c opiexe (49, 3, ffffffff7fffa26c, 0, 0, bfffffff) + 263c
 0000000103e6d560 kpoal8 (5e, 1c, ffffffff7fffd728, 10c000, 10c1ca000, 1648) + 940
 00000001035b86b4 opiodr (10a755, 1c, 103e6cc20, 10a400, 1400, 10c1c9000) + 594
 00000001070fd3a0 ttcpip (0, 10a686d94, 10c1ca2e0, ffffffff7fffd728, ffffffff7fffc170, 10c1e0e98) + 420
 000000010358c018 opitsk (10a686d94, 10c1e62e8, 10c1e0ca4, 10c1defa8, 0, 10c1c9fa0) + 5f8
 000000010359d2e8 opiino (2270, 10c1e0d20, 10c000, 380000, 9b, ffffffff7ffff080) + 3e8
 00000001035b86b4 opiodr (10c1e0000, 4, 10359cf00, 10c000, 1400, 10c1c9000) + 594
 0000000103580c4c opidrv (3c, 4, ffffffff7ffff080, 1ea1c8, ffffffff7c742fd0, ffffffff7ffff560) + 44c
 00000001046e6f3c sou2o (ffffffff7ffff058, 3c, 4, ffffffff7ffff080, 10c000, 10b800) + 5c
 0000000100604430 opimai_real (0, ffffffff7ffff328, ffffffff7f73aeb8, 10b800, 1, 2) + 130
 00000001046fd140 ssthrdmain (10c000, 2, 44d000, 100604300, 10c1ef000, 10c1ef) + 140
 0000000100604214 main (2, ffffffff7ffff438, 0, ffffffff7ffff338, ffffffff7ffff448, ffffffff7eb00200) + 134
 00000001006040bc _start (0, 0, 0, 0, 0, 0) + 17c

So, looking at this process stack trace, it's reasonable to assume that Oracle tried to bump the high water mark (line 19) while executing the client's INSERT statement, then tried to free space from the recycle bin (line 14), which in turn resulted in executing a recursive statement (line 5). To find out which statement the process was executing, we used errorstack at level 3, and the culprit SQL was dumped at the beginning of the trace:

----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=b52m6vduutr8j) -----
delete from RecycleBin$                     where bo=:1 

This statement was executed in a recursive session by Oracle, probably in a loop. Cumulative SQL runtime execution statistics showed that it had been executed ~1M times, making ~800M buffer gets. With 1,024 blocks occupied by recyclebin$ and an average row length of 50 bytes (a low estimate), it looks like 1) there is a great deal of excess execution, and 2) Oracle tries to clean up the whole recycle bin, which is strange. I think SQL tracing of such a situation could give more detail on how Oracle manages data residing in the recycle bin. For me it was enough to know that 1) the issue is definitely caused by the Recycle Bin functionality, and 2) we have application installation code that can re-create tables many times, which was most likely the catalyst for the problem.
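In a similar situation, a quick first check is to see how much the recycle bin actually holds, and, if appropriate, empty it. A sketch (purging is irreversible, so treat this as an illustration rather than a recommendation):

```sql
-- How many dropped objects, and how much space, the recycle bin holds:
select owner, count(*) as objects, sum(space) as blocks
from   dba_recyclebin
group  by owner;

-- Empty the current user's recycle bin (irreversible):
purge recyclebin;
```

A large object count here would be consistent with installation code that drops and re-creates tables many times.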

Filed under: Oracle, Performance Tagged: hang

BIN$ Index Found in the Execution Plan – Digging through the Recycle Bin

November 28, 2011 There are a few articles that can be found on the Internet that describe the cause of indexes with names similar to BIN$ld5VAtb88PngQAB/AQF8hg==$0 and BIN$PESygWW5R0WhbOaDugxqwQ==$0 appearing in execution plans.  As is likely known by readers, the Oracle Database documentation describes that these object names are associated with the recycle bin that was introduced in Oracle Database [...]

Oracle Core

28th November 2011

I’ve just received an email telling me that Apress is having a “Happy Cyber Monday”. Translated into English this means that for today only they’re selling every eBook at  $15.00 each. If you were planning to get an eBook version of Oracle Core, today might be the best day to do it. Here’s a link to the specific page on the Apress site. I’ve repeated this link in the picture of the book at the top-right of this page.

If you want the paper copy, it started shipping from Amazon in the US quite recently but isn’t yet shipping from Amazon UK.

[Update: Although Amazon UK is not shipping the book, you will find that there are a couple of alternative suppliers listed on Amazon UK and one who claims that they can deliver in 3 to 4 working days.]

Note – the book is available in ePub, Mobi, and pdf formats.

Note 2 – it’s possible that some local sales tax will be added dependent on where you are purchasing from.

Update 9th Dec

I’ve uploaded the source code library to my blog as a zip file. Instructions for download are on my index page for the book.

Update 10th Dec

Although the book isn’t available directly from Amazon UK, it’s been on their rankings as the number 1 Oracle book for the last 24 hours. That’s nice, of course, but the reason I wrote this update wasn’t to brag, it was to point out that one of the other categories that Amazon has assigned to the book to is Home Office Databases!

Update 14th Dec

If you go to the main page I've set up for the book you will find details of two discount codes for the book – one of them expires on 16th Dec. (You'll also find the link to the source code that I've set up for my copy of the zip file.)


Making Simple Performance Charts

Before I dive into this blog post, quick heads up for anyone attending UKOUG: on Tuesday only, I’ll be hanging out with some very smart people from the IOUG RAC Special Interest Group in the “gallery” above the exhibition hall. We’re ready to help anyone run a RAC cluster in a virtual environment on their own laptop. And if your laptop doesn’t meet the minimum requirements then you can try with one of our demo workstations. Come find us!!

Why Make Charts

I’ve heard Kyle Hailey speak on a few different occasions, and more than once he’s talked about the power of visualizing data. (In fact Kyle was a key person behind Grid Control’s performance screens.)

I couldn’t agree more. I regularly visualize data when I’m working. Two reasons come immediately to mind:

  1. It helps me to better understand what’s happening. There have been times when I’ve had an “aha” moment very quickly after seeing the picture.
  2. It helps others more easily understand what I'm trying to communicate. It's great for management reports and such – not because it's fluff, but because it's a good communication tool.

Last week, I made a few quick charts as illustrations for a performance report. The process really isn’t that complicated, but I thought I’d put the steps into a blog post… for myself to reference in the future and for anyone else who might find this helpful. :)

Making Simple Charts

This demonstration will use data from the AWR to build graphs. Note that if you run these queries, Oracle requires you to have purchased the extra-cost Diagnostic Pack license. But similar queries could be written against the free Statspack or S-ASH tables.

You need multiple data points to make a graph. For this demo, the AWR was configured to take snapshots every 30 minutes and I'm looking at a query which ran for about 10 hours. It was also the only query running in the instance for most of that time – so I will look at some instance-wide statistics as well.

  1. The first step is to get any needed parameters for pulling performance data. In the case of the AWR, I will need the INSTANCE_NUMBER, the SQL_ID and the first/last SNAP_ID.

    It’s pretty easy to get this information from Grid Control or from Database Console. But if you don’t have access to the web console then you can still get the info from SQLPlus.

    Here’s a useful query to get an overview of the SNAP_IDs:

    SQL> select to_char(BEGIN_INTERVAL_TIME,'MON YYYY') month, 
           min(snap_id) min_snap, max(snap_id) max_snap
         from dba_hist_snapshot 
         where instance_number=4
         group by to_char(BEGIN_INTERVAL_TIME,'MON YYYY');
    MONTH        MIN_SNAP     MAX_SNAP
    -------- ------------ ------------
    APR 2009        10239        10240
    DEC 2009        28752        31924
    FEB 2010        38115        40939
    MAY 2010        47498        48783
    AUG 2010        54975        55013
    NOV 2010        60979        61986
    DEC 2010        61987        64218
    JAN 2011        64219        66448
    FEB 2011        66449        67803
    MAR 2011        67804        69291
    APR 2011        69292        70731
    MAY 2011        70732        72219
    JUN 2011        72220        73655
    JUL 2011        73656        75139
    AUG 2011        75140        76608
    SEP 2011        76609        78048
    OCT 2011        78049        79536
    NOV 2011        79537        80338
    18 rows selected.

    Something similar to this might also be useful:

    SQL> select snap_id,instance_number,begin_interval_time,snap_level 
         from dba_hist_snapshot
         where begin_interval_time between '11-nov-11 17:30' and '11-nov-11 19:00'
           and instance_number=4
         order by snap_id, instance_number;

    For this demo I’m going to use INSTANCE_NUMBER 4 and SQL_ID 8suhywrkmpj5c between snaps 80298 and 80318.

  2. Now create a new spreadsheet in your office suite. I use the free OpenOffice spreadsheet application, but Excel or iWork Numbers should work pretty much the same.

    In the second row of the new spreadsheet, enter the time of the first snapshot you’re going to analyze. In the third row, enter this formula:

    = A2 + 1/24/60 * [minutes between snaps]

    Select several rows below this formula and select Edit > Fill > Down to copy the formula to the following rows. Repeat this until you have reached the end of your analysis window.
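Instead of typing the snapshot times by hand, you can also pull them straight from the snapshot view and paste them into the first column. A sketch, using the same (example) instance number and snapshot range as above:

```sql
-- One timestamp per snapshot, in spreadsheet-friendly form
select to_char(begin_interval_time, 'DD-MON-YY HH24:MI') as snap_time
from   dba_hist_snapshot
where  instance_number = 4
and    snap_id between 80298 and 80318
order  by snap_id;
```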

  3. Open a SQL*Plus session. We will copy-and-paste directly from SQL*Plus into the spreadsheet.

    Use a SQL like this to retrieve data for one system statistic:

    set pagesize 999
    col value format 999999999999999
    select value from dba_hist_sysstat
     where instance_number=4 and snap_id between 80298 and 80318
       and stat_name='transaction tables consistent read rollbacks'
     order by snap_id

    You can copy the statistic name directly from an AWR report if there's a certain stat you're interested in. You can find more information about system statistics in Oracle's docs.

    Now move right to the next empty column. First, copy the name of this statistic into the first row. Then, in the second cell, enter a formula to find the difference between the value to its left and the value above that one. For cell C3, the formula is:

    = B3 - B2

    Choose Edit > Fill > Down again, as before.

    You can repeat this step to add further system statistics. You can also create another column in which you divide or multiply existing columns by each other.
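If you would rather compute the per-snapshot deltas in the database than in the spreadsheet, an analytic function can do the subtraction for you. A sketch, using the same example instance, snapshot range, and statistic name as above:

```sql
-- Per-snapshot delta computed with LAG instead of a spreadsheet formula
select snap_id,
       value - lag(value) over (order by snap_id) as delta
from   dba_hist_sysstat
where  instance_number = 4
and    snap_id between 80298 and 80318
and    stat_name = 'transaction tables consistent read rollbacks'
order  by snap_id;
```

The first row's delta is NULL (there is no earlier snapshot to subtract), which matches leaving the first spreadsheet cell empty.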

  4. The previous SQL statement retrieved system statistics. Another easy query runs against historical V$SQLSTAT snapshots. (This only works for long-running queries.)

    SQL> select BUFFER_GETS_TOTAL value from dba_hist_sqlstat
         where instance_number=4 and snap_id between 80298 and 80318
           and sql_id='8suhywrkmpj5c'
         order by snap_id

    Once again, you can read about the available fields and statistics in the Oracle docs. You can repeat this step to quickly get additional statistics for a particular SQL, and you can then combine some stats for better graphs.

That’s it! I know, really not that complicated. I hope it’s helpful. :)

Post Script

For anyone who’s curious, the charts in this article are related to a SQL report which was recently discussed on the mailing list.

Is 61.11% Fragmentation Too Fragmented For An XFS File System? No!

Thought of the day:

An XFS file system with 98% free space, 6 files and 61.11% fragmentation:

# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       100G  1.1G   99G   2% /test
# find . -type f -print | wc -l
6
# xfs_db -r -c frag /dev/sdb1
actual 18, ideal 7, fragmentation factor 61.11%

When I asked about this oddity in a conversation with Dave Chinner (XFS Kernel owner) I was expecting a lot of complex background on what this 61.11% actually means. His response? I’ll quote:

(18 - 7) / 18 = 0.6111111

[…]it’s been that way forever. Ignore it – it’s much more important to look at the files themselves […]

I like Dave’s candor and have found that individual file analysis does yield  interesting information as well as I showed in my post entitled Little Things Doth Crabby Make – Part XVII. I See xfs_mkfile(8) Making Fragmented Files.

As for deprecated tools, I also have no problem with that. There may have been a day when this command spat out useful information (perhaps in XFS's previous SGI Unix life?) and folks have scripted against it. Basically, OS distributions can't just discard such a command. It just goes that way… no problem.

Way Off Topic
Maybe the next time Dave is in the Bay Area we can repeat the curry! That would be nice.

Filed under: oracle

IOT Answer

It was good to see the answers to the last Quiz Night accumulating. The problem posed was simply this: I have two IOTs and I've inserted the same data into them with the same “insert as select” statement. Can you explain the cost of a particular query (which is the same for both tables) and the extreme difference in the work actually done? Here's the query, the critical stats on the primary key indexes, the shared plan, and the critical execution statistic for running the plan.

SQL> select max(v1) from tX;

---------- ------- ----------- ---------- ---------- -------
T3_PK            1           8          1          1    2000
T4_PK            1          12          1          1    2000

| Id  | Operation             | Name  | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT      |       |     1 |     3 |  2003 |
|   1 |  SORT AGGREGATE       |       |     1 |     3 |       |
|   2 |   INDEX FAST FULL SCAN| Tx_PK |  2000 |  6000 |  2003 |

Statistics (for tX = table t3)
       2012  consistent gets

Statistics (for tX = table t4)
         17  consistent gets

The answer was in the definition of the two tables – which is why I didn't supply the DDL to create the structures. In fact I had made it virtually impossible for anyone to get to a totally accurate result because two of the columns were defined as varchar2(40) but held values which were no more than 4 characters long; nevertheless several people deduced that the answer related to overflow segments, and different declarations for the overflow for the two tables.

Here are the original declarations of the two tables (the t1 table used the same select statement, selecting 2,000 rows, to create a simple heap table):

create table t3 (
	id1, id2, v1, v2, padding,
	constraint t3_pk primary key(id1, id2)
)
organization index
including id2                      --  Note the included column
overflow
as
select
	mod(rownum,20)			id1,
	trunc(rownum,100)		id2,
	to_char(mod(rownum,20))		v1,
	to_char(trunc(rownum,100))	v2,
	rpad('x',500)			padding
from
	all_objects                -- row source reconstructed; any 2,000-row source will do
where
	rownum <= 2000
;

create table t4 (
	id1, id2, v1, v2, padding,
	constraint t4_pk primary key(id1, id2)
)
organization index
including v2                      --  Note the included column
overflow
as
select
	mod(rownum,20)			id1,
	trunc(rownum,100)		id2,
	to_char(mod(rownum,20))		v1,
	to_char(trunc(rownum,100))	v2,
	rpad('x',500)			padding
from
	all_objects                -- row source reconstructed; any 2,000-row source will do
where
	rownum <= 2000
;

Note especially the including clause and the overflow keyword. A better name for the including clause would be the “exclude everything after” clause, because that is (essentially) the effect it has: all columns after the included column are stored in a separate segment and do not appear in the index segment.

With this in mind, we can understand that the v1 column will be in the overflow segment for t3, but in the index segment for t4. The existence of an overflow segment accounts for the cost reported, and the difference in the content of the overflow accounts for the difference in the work done.
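You can see where Oracle has drawn the line by querying the data dictionary. A sketch (INCLUDE_COLUMN in USER_INDEXES reports the column id of the last column kept in the index segment of an IOT, and the overflow segment appears as a table with IOT_TYPE = 'IOT_OVERFLOW'):

```sql
-- Which column id is the last one stored in the index segment:
select index_name, include_column
from   user_indexes
where  index_name in ('T3_PK', 'T4_PK');

-- The overflow segments themselves, and the IOTs they belong to:
select table_name, iot_type, iot_name
from   user_tables
where  iot_type = 'IOT_OVERFLOW';
```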

Overflow costing

When we have an overflow segment, Oracle can produce a clustering_factor for the index segment. The optimizer then assumes that ANY column – other than primary key columns – referenced in the select list must be in the overflow, whether or not that is in accordance with the object definition. This behaviour accounts for the large cost reported for both queries: the data scatter in the overflow segment is very bad because of the way I generated the data, and the optimizer assumes that the v1 value will require a visit to the overflow in both queries.

Run-time resources

When I run the queries, Oracle finds the v1 column in the index segment for t4 as it does the index fast full scan, so the work done is basically the buffer visits for each block in the index segment (plus a couple extra for the segment header visits); however, for each index entry visited in the t3 fast full scan Oracle finds that it has to jump to a different overflow block from the previous one it was looking at, so the number of block visits is essentially the same as the number of rows in the table. (autotrace and tkprof didn’t quite agree on the number of blocks visited – which is why the autotrace figure is a little lower than you might expect.)


The including clause requires a little more subtlety than just checking “columns after the included column”. The internal (data dictionary) column order is not necessarily the same as the visible order when you create or describe the object. One of the critical differences is that Oracle moves the primary key columns to be the first columns in the internal order; it will also move a LONG column (if you have one) to be the last column.