Tuning

Exadata CAN do smart scans on bitmap indexes

As I’m finishing up a performance chapter for the Exadata book (a lot of work!), I thought I’d take a quick break and write a blog entry.

This is not really worth putting into my Oracle Exadata Performance series (which so far has only one article in it anyway), so this is a little stand-alone article…

Everybody knows that the Exadata smart scan can be used when scanning tables (and table partitions). You should also know that smart scan can be used with fast full scans on Oracle B-tree indexes – a fast full scan on an index segment is just like a full table scan, only on the index segment (and ignoring branch blocks).

For some reason there’s a (little) myth circulating that smart scans aren’t used for scanning bitmap indexes.

So, here’s evidence that smart scan can be used when scanning bitmap indexes:

SQL> select /*+ tanel3 */ count(*) from t1 where owner like '%XYZXYZ%';

...

Plan hash value: 39555139

-----------------------------------------------------------------------------------
| Id  | Operation                             | Name        | E-Rows | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |        |   505 (100)|
|   1 |  SORT AGGREGATE                       |             |      1 |            |
|   2 |   BITMAP CONVERSION COUNT             |             |    400K|   505   (0)|
|*  3 |    BITMAP INDEX STORAGE FAST FULL SCAN| BI_T1_OWNER |        |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - storage(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))
       filter(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))

So, as you can see, the execution plan indeed shows a FAST FULL SCAN on a BITMAP INDEX segment, which happens to be on Exadata STORAGE.

Also, you see a storage() predicate applied on line 3 of the execution plan, which means that Oracle will attempt to use smart scan predicate offload – but this can’t always be done!

So, you can’t really determine whether a smart scan happened during execution just by looking at the execution plan; you should check some V$SESSION statistics too. That’s where my Snapper script comes in handy.

I started Snapper on my session just before running the above query. The “smart table scan” and “smart index scan” performance counters are updated right after Oracle has opened the segment header and determined, from the number of blocks in the segment, whether to use the smart scan codepath or not. In other words, the smart scan counters are incremented at the beginning of the segment scan.

The output follows (some irrelevant counters are stripped for brevity):


@snapper all 5 1 "301"
Sampling SID 301 with interval 5 seconds, taking 1 snapshots...
setting stats to all due to option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
    301, TANEL     , STAT, physical read total IO requests                           ,         13,        2.6,
    301, TANEL     , STAT, physical read total multi block requests                  ,          4,         .8,
    301, TANEL     , STAT, physical read requests optimized                          ,          1,         .2,
    301, TANEL     , STAT, physical read total bytes optimized                       ,      8.19k,      1.64k,
    301, TANEL     , STAT, physical read total bytes                                 ,      4.63M,     925.7k,
    301, TANEL     , STAT, cell physical IO interconnect bytes                       ,     10.02k,         2k,
    301, TANEL     , STAT, physical reads                                            ,        565,        113,
    301, TANEL     , STAT, physical reads cache                                      ,          1,         .2,
    301, TANEL     , STAT, physical reads direct                                     ,        564,      112.8,
    301, TANEL     , STAT, physical read IO requests                                 ,         13,        2.6,
    301, TANEL     , STAT, physical read bytes                                       ,      4.63M,     925.7k,
    301, TANEL     , STAT, db block changes                                          ,          1,         .2,
    301, TANEL     , STAT, cell physical IO bytes eligible for predicate offload     ,      4.62M,    924.06k,
    301, TANEL     , STAT, cell physical IO interconnect bytes returned by smart scan,      1.82k,      364.8,
    301, TANEL     , STAT, cell blocks processed by cache layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by txn layer                        ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by index layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks helped by minscn optimization                 ,        564,      112.8,
    301, TANEL     , STAT, cell index scans                                          ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (full)                              ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (direct read)                       ,          1,         .2,
    301, TANEL     , STAT, bytes sent via SQL*Net to client                          ,        334,       66.8,
    301, TANEL     , STAT, bytes received via SQL*Net from client                    ,        298,       59.6,
    301, TANEL     , STAT, SQL*Net roundtrips to/from client                         ,          2,         .4,
    301, TANEL     , STAT, cell flash cache read hits                                ,          1,         .2,
    301, TANEL     , TIME, hard parse elapsed time                                   ,     1.17ms,    233.8us,      .0%, |          |
    301, TANEL     , TIME, parse time elapsed                                        ,      1.5ms,    300.2us,      .0%, |          |
    301, TANEL     , TIME, DB CPU                                                    ,       11ms,      2.2ms,      .2%, |          |
    301, TANEL     , TIME, sql execute elapsed time                                  ,     82.2ms,    16.44ms,     1.6%, |@         |
    301, TANEL     , TIME, DB time                                                   ,    84.36ms,    16.87ms,     1.7%, |@         |
    301, TANEL     , WAIT, enq: KO - fast object checkpoint                          ,    16.18ms,     3.24ms,      .3%, |          |
    301, TANEL     , WAIT, gc cr grant 2-way                                         ,      223us,     44.6us,      .0%, |          |
    301, TANEL     , WAIT, gc current grant 2-way                                    ,      136us,     27.2us,      .0%, |          |
    301, TANEL     , WAIT, cell smart index scan                                     ,    56.04ms,    11.21ms,     1.1%, |@         |
    301, TANEL     , WAIT, SQL*Net message to client                                 ,        7us,      1.4us,      .0%, |          |
    301, TANEL     , WAIT, SQL*Net message from client                               ,      4.42s,   884.47ms,    88.4%, |@@@@@@@@@ |
    301, TANEL     , WAIT, cell single block physical read                           ,      541us,    108.2us,      .0%, |          |
    301, TANEL     , WAIT, events in waitclass Other                                 ,     2.22ms,    443.2us,      .0%, |          |
--  End of Stats snap 1, end=2011-03-13 19:36:31, seconds=5

As you see from the above “cell index scans” statistic – indeed one index segment was scanned using the cell smart scan method.
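
If you don’t have Snapper at hand, you can eyeball the same counters for your own session straight from V$MYSTAT. Here’s a minimal sketch (the statistic names are as I recall them from 11.2 – check V$STATNAME on your version; snapshot the values before and after the query and compare the deltas):

select sn.name, st.value
from   v$mystat st, v$statname sn
where  st.statistic# = sn.statistic#
and    sn.name in ('cell scans', 'cell index scans');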

So, I would rather call this feature “smart segment scan” to reflect that smart scan can scan more than just tables…

I guess one of the reasons why few people have seen smart bitmap index scans in action is that (single-column) bitmap indexes tend to be small – smaller than the corresponding table segments and B-tree index segments. On partitioned tables they are much more likely to fall under the “_small_table_threshold” calculation, which is used for determining whether to do a direct path full segment scan or not (yes, _small_table_threshold applies to fast full B-tree and bitmap index scans too, not just table scans). So, it’s likely that Oracle chooses to do a regular, buffered full bitmap segment scan and thus won’t even consider using smart scan (as smart scans require direct path reads).
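
If you’re curious what the threshold is on your system, you can peek at the hidden parameter as SYS – a read-only sketch against the X$ views (hidden parameters are unsupported territory, so look but don’t touch):

select p.ksppinm name, v.ksppstvl value
from   x$ksppi p, x$ksppsv v
where  p.indx = v.indx
and    p.ksppinm = '_small_table_threshold';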

By the way – the direct path read (or not) decision is made per segment – not per object (like a table or index). So if you have 10 partitions in a table (or index) and half of them are large while half are smaller, then Oracle may end up using direct path reads (and smart scan) on 5 of them and a buffered (dumb) scan on the other 5. If you run something like Snapper on the session, you’d see the smart scan counters go up by only 5. As written above, Oracle decides whether to do direct path reads (and smart scan) right after opening the header block of a segment (partition) and reading out how many blocks that partition’s segment has below the HWM.
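
To get a feel for which partitions are likely to fall on which side of that decision, compare their sizes – a rough sketch (DBA_SEGMENTS block counts are only a proxy for blocks below the HWM, and the owner/table names here are made up):

select partition_name, blocks
from   dba_segments
where  owner = 'TANEL'
and    segment_name = 'T1'
order by blocks;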

The above applies to serial direct path reads – the Parallel Execution slaves should always read using direct path mode, right? … Wrong :)

Well, partially wrong… In 11.2.0.2, if parallel_degree_policy = MANUAL, then yes, PX slaves behave as usual and always force a direct path read (and try to use a smart scan). However, with parallel_degree_policy = AUTO, which is the future of PX auto-management, Oracle can decide to do a buffered parallel scan instead, again disabling the use of smart scan…
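
So if you rely on PX smart scans and see buffered parallel scans creeping in under AUTO, one workaround (shown here just as an illustration – test the side effects before relying on it) is to revert the session to the old behavior:

alter session set parallel_degree_policy = manual;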

One more note – I didn’t say anything about whether you should or should not use (bitmap) indexes on Exadata – that’s an entirely different discussion. I just pointed out that smart scan is used for scanning table segments, B-tree index segments and bitmap index segments if the conditions are right.

And in the end I have to say… even with this evidence you can’t be fully sure that a smart scan was used throughout the entire segment, but more about this in the book and perhaps in a later blog article. We have interesting times ahead ;-)


Exadata Training – I’ll be speaking at the 1-day UKOUG Exadata Special Event on 18th April

Hi all,

As my frequent readers know, I have promised to not travel anymore as it’s just too much hassle compared to the benefit of being “there”. This is why I’m going to fly to London on Monday, 18th April to speak at the UKOUG Exadata Special Event. This event is just too sexy to be missed, so I made an exception (the last one, I promise!)… and it’s probably going to be warmer there as well compared to where I am now :-)

I will be talking about what’s been my focus area for last year or so – Oracle Exadata Performance.

Dan Norris and Alex Gorbachev will be speaking there too, so it should end up being a pretty awesome event!

More details here:

My abstract is as follows:

Understanding Exadata Performance: Metrics and Wait Events

In order to systematically troubleshoot and optimize Exadata performance, one must understand the meaning of its performance metrics.

This session provides a deep technical walkthrough of how Exadata IO and smart scans work and how to use relevant metrics for troubleshooting related performance issues. We will review both Exadata database and cell-level metrics, cell wait events and tools useful for troubleshooting. We will also look into metrics related to Exadata Hybrid Columnar Compression and the cell Flash Cache usage.

P.S. The reason why I called this post “Exadata Training” is that you’ll learn some real world practical stuff there… as opposed to the marketing material (and marketing material copy material) overdose out there… ;-)


Oracle Exadata Performance series – Part 1: Should I use Hugepages on Linux Database Nodes?

There was a question in LinkedIn forum about whether Linux Hugepages should be used in Oracle Exadata Database layer, as they aren’t enabled by default during ACS install. I’m putting my answer into this blog entry – apparently LinkedIn forums have a limit of 4000 characters per reply… (interestingly familiar number, by the way…:)

So, I thought it’s time to start writing the Oracle Exadata Performance series articles that I’ve planned for a while… with some war stories from the field, some issues I’ve overcome while researching for the Expert Oracle Exadata book, etc.

I’ve previously published an article about Troubleshooting Exadata Smart Scan performance and some slides from my experience with VLDB Data Warehouse migrations to Exadata.

Here’s the first article (initially planned as a short response on LinkedIn, but it turned out much longer):

As far as I’ve heard, the initial decision not to enable hugepages by default was made because hugepages aren’t flexible and dynamic enough – you always have to configure the hugepages at the OS level to match your desired SGA size (to avoid wastage). Different shops may want radically different SGA sizes (a larger SGA for single-block-read-oriented databases like transactional/OLTP or OLAP cubes, a smaller SGA for smart scan/parallel scan oriented DWs). If you configure 40GB of hugepages on a node but only use 1GB of SGA, then 39GB of memory is just reserved, not used – wasted, as hugepages are pre-allocated. AMM, using regular pages, will only use the pages that it touches, so there’s no memory wastage due to pre-allocation…
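
For what it’s worth, the sizing arithmetic itself is simple: with 2 MB hugepages (the usual size on x86-64 Linux), an instance needs its SGA size divided by 2 MB, rounded up. A rough per-instance sketch (remember to sum this across all instances on the node, including ASM):

select ceil(sum(value) / (2 * 1024 * 1024)) as hugepages_needed
from   v$sga;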

So, Oracle chose an approach which is more universal and doesn’t require extra OS level configuration (which isn’t hard at all if you pay attention, but not all people do). So, fewer people will end up in trouble with their first deployments, although they might not be getting the most out of their hardware.

However, before enabling hugepages “because it makes things faster”, you should ask yourself what exact benefit they would bring you.

There are 3 main reasons why hugepages may be useful in Linux:

1) Smaller kernel memory usage, thanks to fewer PTEs resulting from larger page sizes

Larger pages mean fewer pagetable entries (PTEs) and thus less kernel memory usage. The bigger your SGA and the more processes you have logged on, the bigger the pagetable memory usage.

You can measure this in your case – just “grep Page /proc/meminfo” and see how big a portion of your RAM has been used by “PageTables”. Many people have blogged about this, but Kevin Closson’s blog is probably the best source to read about it:

2) Lower CPU usage, due to fewer TLB misses in the CPU and less soft page-fault processing when accessing the SGA.

It’s harder to measure this on Linux with standard tools, although it is certainly possible (on Solaris you can just run prstat -m to get microstate accounting and look into the TFL, DFL and TRP stats).

Anyway, the catch here is that if you are running parallel scans and smart scans, then you don’t access that much of the buffer cache in the SGA at all; all IOs or smart scan result-sets are read directly into the PGAs of server processes – which don’t use large pages at all, regardless of whether hugepages for the SGA have been configured. There are some special cases – like when a block clone has to be rolled back for read consistency and you have to access some undo blocks via the buffer cache – but again, this should be a small part of the total workload.

So, in a DW that mostly uses smart scans or direct path reads, there won’t be much of a CPU efficiency win from large pages, as you bypass the buffer cache anyway and use the small pages of private process memory. All the sorting, hashing etc. happens using small pages anyway. Again I have to mention that on (my favorite OS) Solaris it is possible to configure even PGAs to use large pages (via the _realfree_heap_pagesize_hint parameter)… so it’ll be interesting to see how this would help DW workloads on the Exadata X2-8 monsters which can run Solaris 11.

3) Lock SGA pages into RAM so they won’t be paged out when memory shortage happens (for whatever reason).

Hugepages are pre-allocated and never paged out. So, when you have an extreme memory shortage, your SGAs won’t be paged out “by accident”. Of course it’s better to ensure that such memory shortages don’t happen – configure the SGA/PGA_AGGREGATE_TARGET sizes properly and don’t allow third-party programs to consume crazy amounts of memory. There is also the lock_sga parameter in Oracle, which should allow you to do this on Linux with small pages too, but I have never used it on Linux so I don’t know whether it works properly at all – and with 11g AMM, perhaps the mlock() calls aren’t supported on the /dev/shm files at all (I haven’t checked, and don’t care – it’s better to stay away from extreme memory shortages). Read more about how the AMM MEMORY_TARGET (/dev/shm) works in my article written back in 2007 when 11g came out ( Oracle 11g internals – Automatic Memory Management ).
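
For completeness, this is how that parameter would be set – it’s a static parameter, so it takes an spfile change and an instance restart (and, as said above, I haven’t verified how well it behaves on Linux):

alter system set lock_sga = true scope=spfile;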

So, the only realistic win (for a DW workload) would be the reduction of the kernel pagetable structure size – and you can measure this using the PageTables statistic in /proc/meminfo. Kevin demonstrated in his article that 500 connections to an instance with a ~8 GB SGA consisting of small pages resulted in 7 GB of kernel pagetables usage, while the usage with large pages (still 500 connections, 8 GB SGA) was about 265 MB. So you could win back over 6 GB of RAM, which you could then give to PGA_AGGREGATE_TARGET or use to further increase the SGA. The more processes you have connected to Oracle, the more pagetable space is used… Similarly, the bigger the SGA is, the more pagetable space is used…

This is great, but the tradeoff here is manageability and some extra effort: you have to always check whether the large pages actually got used or not. After starting up your instance, you should really check whether HugePages_Free in /proc/meminfo shrank and HugePages_Rsvd increased (when the instance has just started up and Oracle hasn’t touched all the SGA pages yet, some pages will show up as Rsvd – reserved).

With a single instance per node this is trivial – you know how much SGA you want and pre-allocate the amount of hugepages for that. If you want to increase the SGA, you’ll have to shut down the instance and increase the Linux hugepages setting too. This can be done dynamically by issuing a command like echo N > /proc/sys/vm/nr_hugepages (where N is the number of huge pages), BUT in real life this may not work out well: if the Linux kernel can’t free enough small pages in the right physical RAM locations to consolidate 2 or 4 MB contiguous pages, the command may fail to create the requested number of new hugepages.

And this means you may have to restart the whole node to make the change. Note that if you increase your SGA beyond what the configured hugepages can accommodate (or you forget to increase the memlock setting in /etc/security/limits.conf accordingly), then your instance will silently just use small pages, while all the memory pre-allocated for hugepages stays reserved for hugepages and is not usable for anything else!

So, this may become more of a problem when you have multiple database instances per cluster node or you expect to start up and shut down instances on different nodes based on demand (or when some cluster nodes fail).

Long story short – I do configure hugepages in “static” production environments, to save kernel memory (and some CPU time in OLTP-type environments that use the buffer cache heavily), also on Exadata. However, for various test and development environments with lots of instances per server and constant action, I don’t bother myself (and the client) with hugepages and make everyone’s life easier… Small instances with a small number of connections won’t use that many PTEs anyway…

For production environments with multiple database instances per node (and where failovers are expected) I would take the extra effort to ensure that whatever hugepages I have preallocated, won’t get silently wasted because an instance wants more SGA than the available hugepages can accommodate. You can do this by monitoring /proc/meminfo’s HugePage entries as explained above. And remember, the ASM instance (which is started before DB instances) will also grab itself some hugepages when it starts!


Expert Oracle Exadata book – Alpha chapters available for purchase!

Hi,

Apress has made the draft versions of our Expert Oracle Exadata book available for purchase.

How this works is:

  1. You purchase the “alpha” version of the Expert Oracle Exadata book
  2. You get access to draft/alpha PDF versions of some chapters now!
  3. As more chapters are added and existing ones updated, you’ll receive an email and can download those too
  4. You will get a PDF copy of the final book once it’s out!

This is an awesome deal if you can’t wait until the final launch and want to get ahead of the curve with your Exadata skills ;-)

Buy the alpha version of our Expert Oracle Exadata book from Apress here!

If you haven’t heard about this book before – I’m one of the 3 authors, writing it together with Kerry Osborne and Randy Johnson from Enkitec; our official tech reviewer is none other than THE Kevin Closson, and we are also getting some (unofficial) feedback from Oracle database junkie Arup Nanda.

So this book will absolutely rock and if you want a piece of it now, order the alpha book above!

P.S. This hopefully also explains why I’ve been so quiet with my blogging lately – can’t write a book and do many other things at the same time… (at least if you want to do it well…)


Advanced Oracle Troubleshooting v2.0 Online Deep Dives in April and May 2011

Due to a lot of interest, I’m going to do another run of my Advanced Oracle Troubleshooting v2.0 Online Deep Dive seminars in April and May (initially I had planned to do this no earlier than Sep/Oct…).

Check the dates & additional info out here:

P.S. People who already attended the AOT2 seminars last year – I will schedule the follow-up Q&A sessions in mid-March!


Cost – again

Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:

My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.

Now I know that this is a statement that pretty much paraphrases something that Tom Kyte wrote on AskTom several years ago – but it’s wrong. As I’ve pointed out in the past, “Cost is Time”. The cost of a query represents the optimizer’s estimate of how long it will take that query to run – so it is perfectly valid to compare the cost of two queries to see which one the optimizer thinks will be faster; but, thanks to limitations and defects in the optimizer, it may not be entirely sensible to do so.
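
As a reminder of where that statement comes from: the CPU costing formula in Oracle’s own Performance Tuning Guide expresses cost as estimated execution time normalised to units of single-block read time – roughly:

    Cost = ( #SRds * sreadtim + #MRds * mreadtim + #CPUCycles / cpuspeed ) / sreadtim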

The point I want to address in this post though is the comment that “it’s valid to compare the cost of different plans, but not to compare the cost of two different queries”. Consider the following queries:

select
	t1.v1
from
	t1
where	t1.id in (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
	)
;

select
	t1.v1
from
	t1
where	exists (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
;

select
	t1.v1
from
	(
		select	distinct t2.id
		from	t2
		where	t2.n1 = 15
	) t2,
	t1
where
	t1.id = t2.id
;

select
	v1.v1
from	(
	select
		distinct
			t1.rowid,
			t2.id,
			t1.v1
	from
		t2, t1
	where
		t2.n1 = 15
	and	t1.id = t2.id
	) v1
;

Tables t1 and t2 have the following definitions, so all four queries are logically equivalent:

Name                    Null?    Type
----------------------- -------- ----------------
ID                      NOT NULL NUMBER
N1                      NOT NULL NUMBER
V1                               VARCHAR2(6)
PADDING                          VARCHAR2(100)

According to the claim, it is not valid to compare the costs that the optimizer gives you for these four different queries – but they are the same query. In principle the optimizer might transform the IN to an EXISTS, it might simply unnest, it might unnest and merge – so when the optimizer is “comparing different costs for the same query”, it will also be “comparing costs of different queries”.

Cost IS time – but only in theory. The “trick” to sorting out optimization problems lies in recognising where the optimizer model is not right for your data, or the optimizer arithmetic is too simplistic or has a bug.

Index join – 4

In a recent note I wrote about index joins I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.

Here are two similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:

create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from all_objects where rownum <= 3000
;

-- collect stats, compute, no histograms

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

select
	val1, val2
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    24 |    24 |
|*  1 |  VIEW                  | index$_join$_001 |     3 |    24 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    24 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    24 |    11 |
---------------------------------------------------------------------------

select
	val1, val2, rowid
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    60 |    17 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| INDJOIN |     3 |    60 |    17 |
|*  2 |   INDEX FULL SCAN           | IJ_V1   |   102 |       |     9 |
-----------------------------------------------------------------------

When we include the rowid in the query the optimizer stops using the index join – and it won’t even use the mechanism if we hint it. Apparently, for the purposes of analysing the query, Oracle doesn’t recognise the rowid as a column in the table and this automatically precludes the possibility of using the index join as the access method. So we have to use the manual rewrites I introduced in an earlier article.

You might wonder why this matters – but consider a case where a “perfect” index doesn’t exist for the following query:

select
	padding
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

The only access path available to the optimizer at this point is a full tablescan – but what if the two indexes are very small compared to the table? Wouldn’t it be a good idea to use an index hash join between the two indexes to get a list of rowids and visit the table only for those rows? Unfortunately this isn’t a path the optimizer can derive – so we might try something like:


select
	t.padding
from
	(
	select
		/*+
			index_join(ij ij_v1 ij_v2)
			no_merge
		*/
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	and	val2 between 50 and 150
	)	v1,
	indjoin	t
where
	t.rowid = v1.rowid
;

But, as we’ve just seen, you can’t do an index join if you select the rowid, so this code won’t follow the strategy we want. (In fact, when I tried it, there was something distinctly bug-like about the plan – but I won’t go into that now). But we can do the following:


select
	t.padding
from
	(
	select
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	indjoin	t
where
	v2.rowid = v1.rowid
and	t.rowid = v2.rowid
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |  1632 |    10 |
|   1 |  NESTED LOOPS               |         |     3 |  1632 |    10 |
|*  2 |   HASH JOIN                 |         |     3 |    96 |     7 |
|*  3 |    INDEX FAST FULL SCAN     | IJ_V1   |   102 |  1632 |     3 |
|*  4 |    INDEX FAST FULL SCAN     | IJ_V2   |   102 |  1632 |     3 |
|   5 |   TABLE ACCESS BY USER ROWID| INDJOIN |     1 |   512 |     1 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V2".ROWID="V1".ROWID)
   3 - filter("VAL1">=100 AND "VAL1"<=200)
   4 - filter("VAL2">=50 AND "VAL2"<=150)

It’s amazing what you can make the optimizer do (even without hinting) if you think about the mechanics underneath the basic operations.

[Further reading on Index Joins]

How to Lock SQL Profiles Generated by SQL Tuning Advisor

I’ve mentioned (many times) that I think SQL Profiles that are generated by the SQL Tuning Advisor (STA) tend to sour over time.

After seeing it happen at a few sites I began to wonder why. So first a few facts about the SQL Profiles that STA generates:

  1. They are simply a set of hints that get applied to statements behind the scenes during parsing
  2. They consist mainly of OPT_ESTIMATE hints which modify optimizer calculations
  3. They may also contain direct statistics modification hints (COLUMN_STATS, TABLE_STATS)
  4. They usually contain an OPTIMIZER_FEATURES_ENABLE hint
  5. They very occasionally contain other environment type hints (FIRST_ROWS, etc…)
  6. They do not contain directive hints (FULL, INDEX, NESTED_LOOP, etc.)
  7. The names of STA profiles start with SYS_SQLPROF
  8. STA’s goal is to do a more thorough job of analyzing a SQL statement to get a better plan

I wrote a little query (sql_profile_distinct_hints.sql) to pull a list of hints from a 10g database along with the number of their occurrences, and ran it on several production systems where STA Profiles had been created. Here’s the output from a system that had 14 STA Profiles.

SQL> @sql_profile_distinct_hints
Enter value for profile_name: SYS_SQLPROF%
 
HINT                                                 COUNT(*)
-------------------------------------------------- ----------
COLUMN_STATS                                               13
FIRST_ROWS                                                  1
IGNORE_OPTIM_EMBEDDED_HINTS                                 1
INDEX_STATS                                                 1
OPTIMIZER_FEATURES_ENABLE                                  14
OPT_ESTIMATE                                              178
TABLE_STATS                                                 2

Notice that the vast majority of hints are of the OPT_ESTIMATE variety. Now let’s have a look at the actual hints contained in a STA Profile.

 
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_0126f1743c7d0005
 
HINT
------------------------------------------------------------------------------------------------
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(default) 
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
COLUMN_STATS("KSO"."SKEW", "PK_COL", scale, length=5)
COLUMN_STATS("KSO"."SKEW", "COL1", scale, length=4 distinct=828841 nulls=12.8723033 min=1 max=1000000)
TABLE_STATS("KSO"."SKEW", scale, blocks=162294 rows=35183107.66)
 
7 rows selected.

So in this particular STA Profile, the OPT_ESTIMATE hint has been used to tell the optimizer to change its estimate of rows for table A in query block SEL$6 by multiplying it by 2207 (roughly). In addition, there are hints which basically hard-code table stats and column stats. So as you can see, these hints, while they may be accurate when the Profile is created, are unlikely to remain accurate over the long haul. In fairness, the OPT_ESTIMATE hint does make sense in situations where the optimizer will never get a calculation correct because of a shortcoming in its abilities (correlated columns are a good example of this type of situation). And in those conditions, implementing an STA-generated Profile is a valid long-term approach. But in my experience this is the exception rather than the rule.

So what are STA Profiles good for? Well two things:

First, they are very good at showing us where the optimizer is having a problem. If you look at the hints that are generated, it is easy to identify the OPT_ESTIMATE hints where the scaling factors are off the chart (hint: anything with an exponent is a place where the optimizer is struggling). This is easy to do with my sql_profile_hints.sql script by the way. Here’s a set of OPT_ESTIMATE hints. Can you spot the place where the optimizer is really having a problem?

OPT_ESTIMATE(@"SEL$5DA710D3", INDEX_FILTER, "F"@"SEL$1", IDX$$_1AA260002, SCALE_ROWS=8.883203639e-06) 
OPT_ESTIMATE(@"SEL$5DA710D3", INDEX_SKIP_SCAN, "F"@"SEL$1", IDX$$_1AA260002, SCALE_ROWS=8.883203639e-06) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=4.446153275) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("C"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=7.884506683) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("E"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=25.60960842) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("F"@"SEL$1", "B"@"SEL$1"), SCALE_ROWS=26.34181566) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("F"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=839.9683673) 
OPT_ESTIMATE(@"SEL$5DA710D3", TABLE, "D"@"SEL$1", SCALE_ROWS=5.083144565e+11)
OPT_ESTIMATE(@"SEL$5", INDEX_SCAN, "C"@"SEL$5", ORDER_FG_ITEM_IX3, SCALE_ROWS=0.2507281101)

It’s the first two lines – whatever alias F refers to is our problem area. The OPT_ESTIMATE hint tells the optimizer to scale its estimate by a factor of 8.883203639e-06. So the optimizer has vastly overestimated the rows that will be returned by the index.

Second, STA Profiles are sometimes capable of producing better plans. This is primarily due to the fact that STA can take as long as you give it to analyze a statement, making sure that all the optimizer’s calculations are correct. It does this by running various pieces of the statement and checking whether the number of rows the optimizer has estimated is actually correct. Obviously this can take a while on complex statements – much longer than the optimizer is allowed when parsing a statement. But as I’ve already shown, the SQL Profiles that get created to enable those better plans have a pretty good chance of going sour on us over time.

Which leads me to the point of this post. We can have our cake and eat it too! We can create the SQL Profile as recommended by STA and then “lock” the plan into place by converting the OPT_ESTIMATE hints to directive-type hints. I put the word “lock” in quotes because there is really no such thing as “locking” a plan. It’s just that using directive hints, as opposed to OPT_ESTIMATE hints, significantly lowers the probability of the plan changing in the future. So how do we make this conversion? Well, I have a script for that called lock_STA_profile.sql. Here’s an example showing how it works.

 
SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: 
 
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
PROFILE_fgn6qzrvrjgnz          DEFAULT         DISABLED select /*+ index(a SKEW_COL1) */ avg(pk_col) from kso.skew a           NO
PROFILE_8hjn3vxrykmpf          DEFAULT         DISABLED select /*+ invalid_hint (doda) */ avg(pk_col) from kso.skew where col1 NO
PROFILE_69k5bhm12sz98          DEFAULT         DISABLED SELECT dbin.instance_number,        dbin.db_name, dbin.instance_name,  NO
PROFILE_8js5bhfc668rp          DEFAULT         DISABLED select /*+ index(a SKEW_COL2_COL1) */ avg(pk_col) from kso.skew a wher NO
PROFILE_bxd77v75nynd8          DEFAULT         DISABLED select /*+ parallel (a 4) */ avg(pk_col) from kso.skew a where col1 >  NO
PROFILE_7ng34ruy5awxq          DEFAULT         DISABLED select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.flags,i.prope NO
PROF_6kymwy3guu5uq_1388734953  DEFAULT         ENABLED  select 1                                                               YES
PROFILE_cnpx9s9na938m_MANUAL   DEFAULT         ENABLED  select /*+ opt_param('statistics_level','all') */ * from kso.skew wher NO
PROF_79m8gs9wz3ndj_3723858078  DEFAULT         ENABLED  /* SQL Analyze(252,1) */ select avg(pk_col) from kso.skew              NO
PROFILE_9ywuaagwscbj7_GPS      DEFAULT         ENABLED  select avg(pk_col) from kso.skew                                       NO
PROF_arcvrg5na75sw_3723858078  DEFAULT         ENABLED  select /*+ index(skew@sel$1 skew_col1) */ avg(pk_col) from kso.skew wh NO
SYS_SQLPROF_01274114fc2b0006   DEFAULT         ENABLED  select i.table_owner, i.table_name, i.index_name, FUNCIDX_STATUS, colu NO
SYS_SQLPROF_0127d10ffaa60000   DEFAULT         ENABLED  select table_owner||'.'||table_name tname , index_name, index_type, st NO
SYS_SQLPROF_01281e513ace0000   DEFAULT         ENABLED  SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORDERED */ T.TAS NO
coe_abwg9nwg8prsj_3723858078   DEFAULT         ENABLED                                                                         NO
PROF_84q0zxfzn5u6s_2650913906  TEST            ENABLED  select avg(pk_col) from kso.skew                                       NO
PROF_0pvj94afp6faw_FULL        DEFAULT         ENABLED  select /* test 1 hint */ avg(pk_col) from kso.skew a where col1 = 2222 NO
PROF_875qbqc2gw2qz_4201340344  DEFAULT         ENABLED  select /* NOT IN */ department_name                                    NO
PROF_09gdkwq1bs48h_167097056   DEFAULT         ENABLED  select /*+ index (skew skew_col3_col2_col1) */ count(*) from kso.skew  NO
PROFILE_4cp821ufcwvgc_moved    DEFAULT         ENABLED  select count(*) from kso.skew where col3 = '01-jan-10'                 NO
PROF_8wvgj0n4kh6dx_2650913906  DEFAULT         ENABLED  select avg(pk_col) from kso.skew a where col1 = 333333                 NO
PROFILE_g737q1pfmbvjj_moved    DEFAULT         ENABLED  select /*+ full (skew) */ avg(pk_col) from kso.skew where col1 = 13613 NO
PROFILE_cvdnr0b8dcxzz_MANUAL   DEFAULT         ENABLED  select /* aasdas */ avg(pk_col) from kso.skew where col1 = 136133      NO
PROF_719syuvrm29tq_931251584   DEFAULT         ENABLED  SELECT IOBJID, IDOBJID, INAME, IOWNER, IOWNERID, ISPACE, ITSNO, IFILEN NO
PROF_g4gp07gt2z920_105323984   DEFAULT         ENABLED  update sys.scheduler$_job set  last_start_date = :1, running_instance  NO
 
25 rows selected.
 
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_01281e513ace0000
 
HINT
------------------------------------------------------------------------------------------------
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(default)
FIRST_ROWS(1)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
 
5 rows selected.
 
SYS@LAB112> @find_sql
Enter value for sql_text: SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORD%
Enter value for sql_id: 
 
SQL_ID         CHILD  PLAN_HASH EXECS AVG_ETIME  AVG_LIO SQL_TEXT
------------- ------ ---------- ----- --------- -------- --------------------------------------------------
bqfx5q2jas08u      0 2496534803    86       .00       12 SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE
                                                         (T) ORDERED */ T.TASK_ID FROM (SELECT * FROM DBA_A
                                                         DVISOR_TASKS ORDER BY TASK_ID DESC) T, DBA_ADVISOR
                                                         _PARAMETERS_PROJ P1, DBA_ADVISOR_PARAMETERS_PROJ P
                                                         2 WHERE T.ADVISOR_NAME='ADDM' AND T.STATUS = 'COMP
                                                         LETED' AND T.EXECUTION_START >= (SYSDATE - 1) AND
                                                         T.HOW_CREATED = 'AUTO' AND T.TASK_ID = P1.TASK_ID
                                                         AND P1.PARAMETER_NAME = 'INSTANCE' AND P1.PARAMETE
                                                         R_VALUE = SYS_CONTEXT('USERENV','INSTANCE') AND T.
                                                         TASK_ID = P2.TASK_ID AND P2.PARAMETER_NAME = 'DB_I
                                                         D' AND P2.PARAMETER_VALUE = TO_CHAR(:B1 ) ORDER BY
                                                          T.TASK_ID DESC) TASK_LIST WHERE ROWNUM = 1
 
 
1 row selected.
 
SYS@LAB112> @dplan
Enter value for sql_id: bqfx5q2jas08u
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bqfx5q2jas08u, child number 0
-------------------------------------
SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORDERED */
T.TASK_ID FROM (SELECT * FROM DBA_ADVISOR_TASKS ORDER BY TASK_ID DESC)
T, DBA_ADVISOR_PARAMETERS_PROJ P1, DBA_ADVISOR_PARAMETERS_PROJ P2 WHERE
T.ADVISOR_NAME='ADDM' AND T.STATUS = 'COMPLETED' AND T.EXECUTION_START
>= (SYSDATE - 1) AND T.HOW_CREATED = 'AUTO' AND T.TASK_ID = P1.TASK_ID
AND P1.PARAMETER_NAME = 'INSTANCE' AND P1.PARAMETER_VALUE =
SYS_CONTEXT('USERENV','INSTANCE') AND T.TASK_ID = P2.TASK_ID AND
P2.PARAMETER_NAME = 'DB_ID' AND P2.PARAMETER_VALUE = TO_CHAR(:B1 )
ORDER BY T.TASK_ID DESC) TASK_LIST WHERE ROWNUM = 1
 
Plan hash value: 2496534803
 
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                        |       |       |     9 (100)|          |
|*  1 |  COUNT STOPKEY                     |                        |       |       |            |          |
|   2 |   VIEW                             |                        |     2 |    26 |     9   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                    |                        |       |       |            |          |
|   4 |     NESTED LOOPS                   |                        |     2 |   240 |     9   (0)| 00:00:01 |
|*  5 |      FILTER                        |                        |       |       |            |          |
|   6 |       NESTED LOOPS OUTER           |                        |     2 |   188 |     7   (0)| 00:00:01 |
|   7 |        NESTED LOOPS                |                        |     2 |   126 |     5   (0)| 00:00:01 |
|*  8 |         TABLE ACCESS BY INDEX ROWID| WRI$_ADV_TASKS         |     2 |    74 |     3   (0)| 00:00:01 |
|   9 |          INDEX FULL SCAN DESCENDING| WRI$_ADV_TASKS_PK      |   822 |       |     2   (0)| 00:00:01 |
|* 10 |         TABLE ACCESS BY INDEX ROWID| WRI$_ADV_PARAMETERS    |     1 |    26 |     1   (0)| 00:00:01 |
|* 11 |          INDEX UNIQUE SCAN         | WRI$_ADV_PARAMETERS_PK |     1 |       |     0   (0)|          |
|* 12 |        TABLE ACCESS BY INDEX ROWID | WRI$_ADV_EXECUTIONS    |     1 |    31 |     1   (0)| 00:00:01 |
|* 13 |         INDEX UNIQUE SCAN          | WRI$_ADV_EXECS_PK      |     1 |       |     0   (0)|          |
|* 14 |      INDEX UNIQUE SCAN             | WRI$_ADV_PARAMETERS_PK |     1 |       |     0   (0)|          |
|* 15 |     TABLE ACCESS BY INDEX ROWID    | WRI$_ADV_PARAMETERS    |     1 |    26 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
   5 - filter((DECODE(NVL("E"."STATUS","A"."STATUS"),1,'INITIAL',2,'EXECUTING',3,'COMPLETED',4,'INTER
              RUPTED',5,'CANCELLED',6,'FATAL ERROR')='COMPLETED' AND
              NVL("E"."EXEC_START","A"."EXEC_START")>=SYSDATE@!-1))
   8 - filter(("A"."ADVISOR_NAME"='ADDM' AND "A"."HOW_CREATED"='AUTO' AND
              BITAND("A"."PROPERTY",6)=4))
  10 - filter("A"."VALUE"=TO_CHAR(:B1))
  11 - access("A"."ID"="A"."TASK_ID" AND "A"."NAME"='DB_ID')
  12 - filter("A"."ADVISOR_ID"="E"."ADVISOR_ID")
  13 - access("A"."ID"="E"."TASK_ID" AND "A"."LAST_EXEC_NAME"="E"."NAME")
  14 - access("A"."ID"="A"."TASK_ID" AND "A"."NAME"='INSTANCE')
  15 - filter("A"."VALUE"=SYS_CONTEXT('USERENV','INSTANCE'))
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
   - SQL profile SYS_SQLPROF_01281e513ace0000 used for this statement
 
 
57 rows selected.
 
SYS@LAB112> @lock_STA_profile
Enter value for sql_id: bqfx5q2jas08u
Enter value for child_no (0): 0
Enter value for new_profile_name (PROF_sqlid_planhash): 
Enter value for force_matching (FALSE): 
 
PL/SQL procedure successfully completed.
 
SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: %bqfx5q2jas08u%
 
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
PROF_bqfx5q2jas08u_2496534803  DEFAULT         ENABLED  SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORDERED */ T.TAS NO
 
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: PROF_bqfx5q2jas08u_2496534803
 
HINT
------------------------------------------------------------------------------------------------
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
DB_VERSION('11.2.0.1')
FIRST_ROWS(1)
NO_PARALLEL
OUTLINE_LEAF(@"SEL$86A1760A")
MERGE(@"SEL$5")
MERGE(@"SEL$532C0C35")
MERGE(@"SEL$6")
OUTLINE_LEAF(@"SEL$1")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$5")
OUTLINE(@"SEL$532C0C35")
MERGE(@"SEL$4")
OUTLINE(@"SEL$6")
OUTLINE(@"SEL$58B2FD6B")
ELIMINATE_OBY(@"SEL$3")
OUTLINE(@"SEL$4")
OUTLINE(@"SEL$3")
NO_ACCESS(@"SEL$1" "TASK_LIST"@"SEL$1")
INDEX_DESC(@"SEL$86A1760A" "A"@"SEL$4" ("WRI$_ADV_TASKS"."ID"))
INDEX_RS_ASC(@"SEL$86A1760A" "A"@"SEL$6" ("WRI$_ADV_PARAMETERS"."TASK_ID" "WRI$_ADV_PARAMETERS"."NAME"))
INDEX_RS_ASC(@"SEL$86A1760A" "E"@"SEL$4" ("WRI$_ADV_EXECUTIONS"."TASK_ID" "WRI$_ADV_EXECUTIONS"."NAME"))
INDEX(@"SEL$86A1760A" "A"@"SEL$5" ("WRI$_ADV_PARAMETERS"."TASK_ID" "WRI$_ADV_PARAMETERS"."NAME"))
LEADING(@"SEL$86A1760A" "A"@"SEL$4" "A"@"SEL$6" "E"@"SEL$4" "A"@"SEL$5")
USE_NL(@"SEL$86A1760A" "A"@"SEL$6")
USE_NL(@"SEL$86A1760A" "E"@"SEL$4")
USE_NL(@"SEL$86A1760A" "A"@"SEL$5")
NLJ_BATCHING(@"SEL$86A1760A" "A"@"SEL$5")
 
29 rows selected.
 
SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: SYS%
 
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
SYS_SQLPROF_01274114fc2b0006   DEFAULT         ENABLED  select i.table_owner, i.table_name, i.index_name, FUNCIDX_STATUS, colu NO
SYS_SQLPROF_0127d10ffaa60000   DEFAULT         ENABLED  select table_owner||'.'||table_name tname , index_name, index_type, st NO
SYS_SQLPROF_01281e513ace0000   SAVED           ENABLED  SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORDERED */ T.TAS NO
 
3 rows selected.
 
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_01281e513ace0000
 
HINT
------------------------------------------------------------------------------------------------
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(default)
FIRST_ROWS(1)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
 
5 rows selected.
 
SYS@LAB112>

So in this example I listed all the SQL Profiles in existence on the system (using sql_profiles.sql). Then I showed the hints associated with the STA Profile SYS_SQLPROF_01281e513ace0000 with sql_profile_hints.sql. Then I located the SQL statement in v$sql using the find_sql.sql script. Then I used dbms_xplan (via the dplan.sql script) to show the plan for the statement (proving that it was using the STA Profile). Then I used the lock_STA_profile.sql script to create a directive-hint-based Profile in place of the OPT_ESTIMATE-based one. Then I showed the hints for the new SQL Profile. Note that the original STA Profile is not dropped, but rather moved to the SAVED category, so you can still look at its hints as I have done at the end of this example.
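
Under the covers, a conversion like this boils down to reading the outline (directive) hints of the existing cursor from V$SQL_PLAN.OTHER_XML and re-creating the profile from them with the undocumented DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure. Here’s a hedged sketch of the core mechanics – not the actual lock_STA_profile.sql code, and the profile name here is made up – substitute your own sql_id and child number:

declare
  ar_hints    sys.sqlprof_attr;
  cl_sql_text clob;
begin
  -- pull the outline (directive) hints out of the cursor's plan
  select extractvalue(value(d), '/hint')
  bulk collect into ar_hints
  from   v$sql_plan p,
         table(xmlsequence(
           extract(xmltype(p.other_xml), '/*/outline_data/hint'))) d
  where  p.sql_id = '&sql_id'
  and    p.child_number = &child_no
  and    p.other_xml is not null;

  -- grab the statement text the profile will be attached to
  select sql_fulltext into cl_sql_text
  from   v$sql
  where  sql_id = '&sql_id'
  and    child_number = &child_no;

  -- re-create the profile from the directive hints
  dbms_sqltune.import_sql_profile(
    sql_text    => cl_sql_text,
    profile     => ar_hints,
    name        => 'PROF_&sql_id',
    force_match => false,
    replace     => true);
end;
/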

So that’s it. This is a complex topic and I have blogged about it before on numerous occasions. You may want to look back at this post, Oracle Support Sanctions Manually Created SQL Profiles, to get a better feel for where the hints came from that are used to replace the OPT_ESTIMATE hints. By the way, Jonathan Lewis and Tom Kyte have also written about this feature. (I trust you can find them via Google)

Also, I have written a chapter on Plan Stability in the upcoming Apress book, Pro Oracle SQL. The chapter is 65 or so pages long and it covers SQL Profiles in depth, so if you are hungry for more info on this topic, I highly recommend it. ;)

You can pre-order the book here: Pro Oracle SQL (if you are so inclined)

It should be released in a few weeks.

Frequency Histograms – 6

In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. I’ve already published a note about one such anomaly that can occur with fairly long character strings, this note describes another anomaly that could appear in less extreme cases. Again, we start by constructing a data set.


create table t1  (v1 varchar2(42));

insert	into t1
select	'next day'
from	all_objects
where 	rownum <= 150
;

insert into t1
select	'same day'
from 	all_objects
where 	rownum <= 350
;

insert into t1
select 	'priority high'
from 	all_objects
where 	rownum <= 500
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'t1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 254'
	);
end;
/

select
        endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency,
        hex_val,
        chr(to_number(substr(hex_val, 2,2),'XX')) ||
        chr(to_number(substr(hex_val, 4,2),'XX')) ||
        chr(to_number(substr(hex_val, 6,2),'XX')) ||
        chr(to_number(substr(hex_val, 8,2),'XX')) ||
        chr(to_number(substr(hex_val,10,2),'XX')) ||
        chr(to_number(substr(hex_val,12,2),'XX')),
        endpoint_actual_value
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                                                       prev_endpoint,
                to_char(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')hex_val,
                endpoint_actual_value
        from
                user_tab_histograms
        where
                table_name = 'T1'
        and     column_name = 'V1'
        )
order by
        endpoint_number
;

ENDPOINT_NUMBER  FREQUENCY HEX_VAL                         CHR(TO ENDPOINT_ACTUAL_VALUE
--------------- ---------- ------------------------------- ------ ------------------------------------------
            150        150  6E65787420644D6D50DD04A6400000 next d
            650        500  7072696F72698970D6651967200000 priori
           1000        350  73616D6520645C36B7AD479D600000 same d

3 rows selected.

This example captures a histogram at a point when my data set has only three popular values in it, and those values are sufficiently different that their “6-byte encoding” is enough for Oracle to tell the difference between them, so the histogram doesn’t use the endpoint_actual_value column.

But I have a problem. During the course of the working day I introduce some data with different status codes, and all the queries I run are about these status codes, but by the end of the day (when, possibly, the stats will be collected again by the automatic stats collection job) all the “interesting” data has been updated to one of the three popular values.

What does Oracle do about queries for my “rare” values? It depends on what they look like. Here’s a couple of examples from 10.2.0.3:

select
	count(*)
from
	t1
where
	v1 = 'overnight'
;

select
	count(*)
from
	t1
where
	v1 = 'priority low'
;

set autotrace off

Neither of these values is in the histogram, so I would like Oracle to give them a low cardinality estimate. But one of the values is very similar to a value that DOES exist in the histogram. This leads to an interesting variation in the numbers that appear in the execution plans:

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    11 |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1"='overnight')

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  5500 |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1"='priority low')

In the first case (‘overnight’) the optimizer has decided that the value doesn’t exist in the histogram. Since this database is running 10.2.0.3 it has set the resulting cardinality to 1 (this would be 75 in 10.2.0.4 or 11g – half the lowest known value in the histogram). But in the second case (‘priority low’) the optimizer has done its 6-byte manipulation and decided that ‘priority low’ matches ‘priority high’, and given it a cardinality of 500.

If Oracle had seen ‘priority low’ when creating the histogram it would have used the endpoint_actual_value column and generated two separate rows for ‘priority low’ and ‘priority high’. But it is perfectly feasible that because of timing (the interesting data exists only briefly during the day) or sampling (the interesting data is tiny compared to the popular values) you could see Oracle taking a “large data” execution path when you know that you’re only expecting to handle a small data set.

The solution, of course, is to write code to create a histogram that represents the situation as you want Oracle to see it.
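
A hedged sketch of what such code could look like, using dbms_stats.prepare_column_values and set_column_stats to manufacture a frequency histogram that includes the rare values (the value list and the bucket counts below are purely illustrative):

declare
  srec    dbms_stats.statrec;
  c_array dbms_stats.chararray;
begin
  -- values must be supplied in ascending order
  c_array := dbms_stats.chararray(
    'next day', 'overnight', 'priority high', 'priority low', 'same day');
  srec.epc    := 5;
  srec.bkvals := dbms_stats.numarray(150, 1, 500, 1, 350);
  dbms_stats.prepare_column_values(srec, c_array);
  dbms_stats.set_column_stats(
    ownname => user, tabname => 'T1', colname => 'V1',
    distcnt => 5, density => 1/1002, nullcnt => 0, srec => srec
  );
end;
/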

Good Nulls

I’ve often been heard warning people about the accidents that can happen when they forget the traps that appear once you start allowing columns to be NULL – but sometimes NULLs are good, especially when they help Oracle understand where the important (e.g. not null) data might be.

An interesting example of this came up on OTN a few months ago, where someone was testing the effects of changing a YES/NO column into a YES/NULL column (which is a nice idea because it allows you to create a very small index on the YESes, and avoid creating a histogram to tell the optimizer that the number of YESes is small).
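
To illustrate the idea with a hypothetical schema: entirely-null key entries are not stored in a single-column B-tree index, so an index on a YES/NULL flag column contains only the YES rows and stays tiny:

create table orders (
  id               number primary key,
  payload          varchar2(100),
  needs_processing varchar2(3)    -- 'YES' or NULL, instead of 'YES'/'NO'
);

create index orders_todo_ix on orders(needs_processing);

-- only the 'YES' entries exist in the index, so this query
-- can use a very small index:
select id from orders where needs_processing = 'YES';
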

They were a little puzzled, though, about why their tests showed Oracle using an index to find data in the YES/NO case, but not using the index in the YES/NULL case. I supplied a short explanation on the thread, and was planning to post a description on the blog, but someone on the thread supplied a link to AskTom where Tom Kyte had already answered the question, so I’m just going to leave you with a link to his explanation.