Tuning

Training Schedule for 2011 and Public Appearances

Online Seminars
A lot of people have asked me whether I’d be doing any more seminars in the future. And the answer is yes – at least this year (might be too busy running a company next year ;-)
I have finally put together the schedule for my 2011 seminars. In addition to the Advanced Oracle Troubleshooting seminar I will also deliver my Advanced Oracle SQL Tuning and Oracle Partitioning and Parallel Execution for Performance seminars, which I have previously delivered only onsite.
So, check out the seminars page:
Also don’t forget the Expert Oracle Exadata virtual conference next week!
Public Appearances

Oracle OpenWorld, October 2
  • I will talk about Large-Scale Consolidation onto Oracle Exadata: Planning, Execution, and Validation
  • Session ID 09355
Maybe I’ll lurk around the UKOUG venue as well in December ;-)

Cardinality Feedback

I ran into an interesting issue last week having to do with plan stability. The problem description went something like this:

“I have a statement that runs relatively quickly the first time I run it (around 12 seconds). Subsequent executions always run much slower, usually around 20 or 30 minutes. If I flush the shared pool and run it again elapsed time is back to 12 seconds or so.”

The query looked a little like this:

Embarcadero Performance Panel

Karen Morton, Cary Millsap and I will be participating in an online panel discussion about Oracle performance on July 28th. Since we have all worked together in the past, we thought it would be fun to listen to each other answer questions. Embarcadero is sponsoring this event and invited us to participate. Here’s a graphic from the mailer they sent out.

I only point it out because Cary and Karen look like they are posing for a picture, while I, as usual, look like someone just poured a drink down my pants. That’s normal though. I’ve been told I have a great face for radio.

Indexing

A question about partitioning came up on OTN a few days ago – but the really interesting part of the thread was about the choice of indexes, and how the choice of partitioning, combined with the data patterns, could make a big difference to the choice of indexes. I think the comments I made about indexes are worth seeing, so I’ve linked to the thread.


The First Exadata Virtual Conference in the World!

Kerry Osborne and I have been secretly planning something – and now it’s official – we will host The First Exadata Virtual Conference in the World, on 3-4 August 2011.

This conference takes place a couple of weeks after our Expert Oracle Exadata book is published (on July 18 – check out the awesome new cover design). So, we thought it’d be a good idea to run this conference, where we can explain some things in a different format, do live demos and answer questions from attendees.

On the first day Kerry and Randy will talk about some serious fundamentals of Exadata, like how Exadata Smart Scan Offloading works and how to make the IO resource manager work for you (especially important in mixed workload consolidated environments).

And on the second day we’ll dig even deeper, with Andy Colvin talking about how to survive Exadata patching (he has patched more Exadatas than anyone else I know) and me following up with some complex performance troubleshooting stories I’ve encountered recently (trust me – there are a LOT of issues to troubleshoot ;-)

About the Conference:

Since its release, Oracle Exadata has quickly become a hit. Due to the relative “youth” of the Exadata technology and the internal behavior changes introduced with frequent patch-sets, there’s not much up-to-date, quality technical information and know-how available to the public. This virtual conference brings you a chance to learn from the leading Exadata experts and their experience of working with real Exadata environments, from Exadata V1 to the latest X2-8. Additionally, there is plenty of Q&A time scheduled, so you can also get answers to your Exadata-related questions.

The speakers are probably some of the most experienced Exadata consultants in the world in the field of Exadata deployment, migration, performance and troubleshooting. Also, Kerry, Randy and Tanel are the authors of the Expert Oracle Exadata book published by Apress in July 2011.

Dates:

  • 3-4 August 2011

Location:

  • Online (or should I say “the Cloud” ;-)

Duration:

  • 8am – 12pm (PST) on both days – 2 x 1.5h sessions on each day, with Q&A sessions and a break in between

Speakers:

  • Kerry Osborne, Randy Johnson, Andy Colvin from Enkitec
  • Tanel Poder from right here :-)

All of the speakers are hard-core hands-on professionals who have worked on many different real-life (production) Exadata environments for their clients. The Enkitec dudes didn’t stop there – they bought a half rack for themselves, just to play around with it. Yeah (+1 from me), some people buy a red hot Ferrari, some buy a red hot computer rack with an X on it :-)

Price:

  • 375 USD (early bird until July 22), 475 USD regular price

More information, abstracts and registration:

I don’t think you’ll find an Exadata learning opportunity like this anywhere else (or any time soon), especially considering the price!

This conference is so hot that one of the attendees managed to sign up for it even before I had published this page to the world! :-)


IOUG Select Journal Editor’s Choice Award 2011

In May I received the IOUG Select Journal Editor’s Choice Award for my Systematic Oracle Latch Contention Troubleshooting article where I introduced my LatchProfX tool for advanced drilldown into complex latch contention problems (thanks IOUG and John Kanagaraj!).

As the relevant IOUG webpage hadn’t been updated yet, I thought I’d delay this announcement until the update was done – but then I accidentally found an official enough announcement (press release) on the Reuters site:

Woo-hoo! :-)

The article itself is here:

Thanks to the IOUG crew, John Kanagaraj and everyone else who has read and used my stuff and given feedback! :-)

Knowing what you want to achieve before thinking of how to achieve it – a query optimization example

Today I received a question which was a good example of a systematic problem solving approach. It was about getting a long-running query to run faster. The query took a long time because the correlated subquery in it was not unnested and was re-visited many times, causing the whole subquery subtree in the plan to be executed again and again. The main part of the question was this:

Is there a way to avoid “NOT IN” conversion to “NOT EXISTS” by optimizer … My sub query, I would like it to be fully instantiated as view and then execute it as a Hash Anti join.

The first part of the above question sounds like the type of question I hear quite frequently (e.g. how to force Oracle to use an index or adjust some feature of the execution plan – without any extra context information). These kinds of questions are often the result of the developer not being familiar with the data and hoping to optimize the query by adding some “silver bullet” hint.

But the second part of the above question (in bold) clearly shows that the asker had done his homework and knew exactly what he was trying to achieve. In other words, the asker knew the data model, the data (amount and distribution), understood what the query was trying to achieve and, finally, knew roughly how many rows would be returned from different tables for the given query predicates. And based on that knowledge and an understanding of how Oracle SQL plan execution works, he knew that it’s not a good idea to use the nested FILTER operator for correlated subquery lookups: if the parent query returns millions of rows, you’d potentially have to re-visit the subquery branch of the execution plan tree millions of times too, revisiting the same blocks again and again, driving up (logical) IOs, CPU usage etc.

The asker also knew that Oracle can use a special variation of the hash join for performing anti-joins (NOT IN, NOT EXISTS style subqueries get unnested and are executed using a join mechanism instead of correlated lookups in a FILTER loop). The asker also knew that it’s much more efficient to join / compare millions of rows with a single hash join run (visiting each source table only once) as opposed to millions of correlated lookups revisiting “random” index & table blocks again and again.
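To make the cost difference concrete, here is a small Python sketch (purely illustrative – it models the two access patterns, not Oracle’s internals, and the row counts are made up). The FILTER-style version re-scans the subquery’s rows for every parent row, while the hash anti-join scans each source once and then does cheap hash probes:

```python
# Illustrative sketch (not Oracle internals): compare the work done by a
# FILTER-style correlated anti-join vs. a single hash anti-join.
parent_rows = [{"user_id": i} for i in range(1000)]
subquery_rows = [{"user_id": i} for i in range(0, 1000, 2)]  # even ids only

# FILTER style: the subquery branch is re-executed for every parent row,
# scanning the same rows again and again.
filter_scans = 0
filter_result = []
for p in parent_rows:
    match = False
    for s in subquery_rows:  # the whole subquery branch is revisited
        filter_scans += 1
        if s["user_id"] == p["user_id"]:
            match = True
            break
    if not match:
        filter_result.append(p)

# Hash anti-join style: scan the subquery source once to build a hash
# table, then probe it once per parent row (simple NOT IN, no NULLs here).
hash_table = {s["user_id"] for s in subquery_rows}  # one scan
hash_probes = len(parent_rows)                      # one probe per parent row
hash_result = [p for p in parent_rows if p["user_id"] not in hash_table]
```

Both versions return the same rows, but the FILTER-style loop touches the subquery rows hundreds of thousands of times, while the hash version does one scan plus one probe per parent row.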

As a result, the asker formulated what he was trying to change in the execution plan and only then thought about how to achieve it. This is the systematic way to tune a SQL statement, and it requires you to understand the query, the database’s capabilities and definitely the data this query is using. The problem was that regardless of the hints he tried, he couldn’t achieve the desired plan. (Note the “desired plan” – you know in advance what you want and use tools like hints for achieving that, not the other way around.)

I came up with a small test case (as I often don’t remember things off the top of my head) to illustrate the correct hints for changing the plan the way he wanted:

SQL> CREATE TABLE t1 AS SELECT * FROM all_users;

Table created.

SQL> CREATE TABLE t2 AS SELECT * FROM all_users;

Table created.

SQL> CREATE TABLE t3 AS SELECT * FROM all_users;

Table created.

Now the minimal test query (it’s quite simplistic – and note that it doesn’t even contain a correlated subquery, as the subquery predicates do not reference the parent query’s tables):

SELECT *
FROM t1
WHERE t1.user_id NOT IN (
    SELECT t2.user_id
    FROM t2, t3
    WHERE t2.username = t3.username
);

The CBO trace showed the final form of the query after transformations – and the uncorrelated NOT IN subquery was converted to a correlated NOT EXISTS subquery:

******* UNPARSED QUERY IS *******

SELECT /*+ */
    "SYS_ALIAS_1"."USERNAME" "USERNAME","SYS_ALIAS_1"."USER_ID" "USER_ID","SYS_ALIAS_1"."CREATED" "CREATED"
    FROM "SYSTEM"."T1" "SYS_ALIAS_1"
    WHERE NOT EXISTS (
        SELECT /*+ */ 0 FROM "SYSTEM"."T2" "T2","SYSTEM"."T3" "T3"
        WHERE "T2"."USER_ID"="SYS_ALIAS_1"."USER_ID" AND "T2"."USERNAME"="T3"."USERNAME"
    )

The execution plan uses a correlated FILTER loop for probing the subquery, even though the nature of this query doesn’t really require correlated access. As a result, the HASH JOIN in line #3 below and the table access rowsources under it were visited 26 times – in this case once per row returned from the parent query’s row source (table T1 in line #2):

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|*  1 |  FILTER             |      |      1 |        |            |      0 |00:00:00.01 |     159 |       |       |          |
|   2 |   TABLE ACCESS FULL | T1   |      1 |     26 |     2   (0)|     26 |00:00:00.01 |       3 |       |       |          |
|*  3 |   HASH JOIN         |      |     26 |      1 |     5  (20)|     26 |00:00:00.01 |     156 |  1517K|  1517K|  301K (0)|
|*  4 |    TABLE ACCESS FULL| T2   |     26 |      1 |     2   (0)|     26 |00:00:00.01 |      78 |       |       |          |
|   5 |    TABLE ACCESS FULL| T3   |     26 |     26 |     2   (0)|    351 |00:00:00.01 |      78 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( IS NULL)
   3 - access("T2"."USERNAME"="T3"."USERNAME")
   4 - filter("T2"."USER_ID"=:B1)

The pseudo-bind variable :B1 reported in the predicate section gives a further clue that a correlated lookup loop is used: the FILTER operation takes the next parent query row’s USER_ID value and passes it into the filter predicate on line #4 above with every loop iteration. The execution plan unparser (which generates human-readable execution plan texts) uses a naming convention whereby such FILTER loop variables are reported as bind variables – but the query doesn’t actually use any real (user-supplied) binds. It’s just how these query plan explainers report things…

Anyway, as I knew what kind of change we wanted to achieve, I recommended using the following hints to see whether they would give the desired result.

SELECT *
FROM t1
WHERE t1.user_id NOT IN (
    SELECT /*+ UNNEST HASH_AJ(t2) */
        t2.user_id
    FROM t2, t3
    WHERE t2.username = t3.username
);

And now the important part – this wasn’t a wild guess (like “let’s see, maybe it helps”). I deliberately recommended these hints as they control the features we wanted to change.

The resulting unparsed query text after transformations was the following – the NOT IN (or NOT EXISTS) construct was gone: the subquery had been unnested and converted into an inline view (in bold), joined to the parent query (using the where condition at the bottom of the output below):

******* UNPARSED QUERY IS *******

SELECT "T1"."USERNAME" "USERNAME","T1"."USER_ID" "USER_ID","T1"."CREATED" "CREATED"
FROM
    (SELECT /*+ UNNEST */
        "T2"."USER_ID" "$nso_col_1"
        FROM "SYSTEM"."T2" "T2","SYSTEM"."T3" "T3"
        WHERE "T2"."USERNAME"="T3"."USERNAME"
    ) "VW_NSO_1"
   ,"SYSTEM"."T1" "T1"
WHERE
    "T1"."USER_ID"="VW_NSO_1"."$nso_col_1"

A little detour:

Note that the “unparsed” query text in the CBO tracefile is FYI only. It is not the real query text actually executed! It doesn’t contain all the little details required for executing the query correctly and returning the right data. Unparsing means extracting information from the binary execution plan and converting it into human (or DBA) readable form, but some of the information gets lost in the process. If you ran the above unparsed query, you would get the result set of a regular join, not the NOT IN anti-join originally requested. This is why you should never assume that the unparsed query is the correct text to use in your application. It’s just FYI, for debugging.
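This warning is easy to demonstrate with any SQL engine. Here is a sketch using SQLite via Python (standing in for Oracle, with toy tables and data I made up): the original NOT IN anti-join returns the non-matching rows, while a plain equi-join – which is what the unparsed text would run as once the ANTI property is lost – returns exactly the opposite set:

```python
import sqlite3

# Toy stand-in for the t1 / unnested-subquery situation (names are made up)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (user_id INTEGER);
    CREATE TABLE vw (user_id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO vw VALUES (2), (4);
""")

# The original intent: an anti-join, returning t1 rows with NO match in vw
anti = con.execute(
    "SELECT user_id FROM t1 WHERE user_id NOT IN (SELECT user_id FROM vw)"
).fetchall()

# What the unparsed text looks like: a regular join, returning MATCHING rows
join = con.execute(
    "SELECT t1.user_id FROM t1, vw WHERE t1.user_id = vw.user_id"
).fetchall()

# anti and join contain completely different rows
```

Running the unparsed text as-is would silently flip the result set, which is exactly why it is for debugging only.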

One of the pieces present in the binary execution plan, but not in the unparsed query, is highlighted below. See the HASH JOIN ANTI – the ANTI in the binary plan says: don’t do a regular join (returning matching rows), but exactly the opposite (return the non-matching rows).


-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN ANTI      |          |      1 |      1 |     7  (15)|      0 |00:00:00.02 |       9 |  1023K|  1023K| 1176K (0)|
|   2 |   TABLE ACCESS FULL  | T1       |      1 |     26 |     2   (0)|     26 |00:00:00.01 |       3 |       |       |          |
|   3 |   VIEW               | VW_NSO_1 |      1 |     26 |     5  (20)|     26 |00:00:00.01 |       6 |       |       |          |
|*  4 |    HASH JOIN         |          |      1 |     26 |     5  (20)|     26 |00:00:00.01 |       6 |  1179K|  1179K| 1180K (0)|
|   5 |     TABLE ACCESS FULL| T2       |      1 |     26 |     2   (0)|     26 |00:00:00.01 |       3 |       |       |          |
|   6 |     TABLE ACCESS FULL| T3       |      1 |     26 |     2   (0)|     26 |00:00:00.01 |       3 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."USER_ID"="$nso_col_1")
   4 - access("T2"."USERNAME"="T3"."USERNAME")

When you look at the Starts column in the above plan, you see that now all the tables were scanned only once (that’s what hash joins do). So, no more revisiting the same blocks again and again for each row returned from the table(s) in the parent query. Also, note the new VIEW row source which Oracle has injected in between the parent query and the subquery tables – this is to make sure that the subquery tables don’t get merged with the parent query and joined in the wrong order, possibly with the ANTI-join done in the wrong place. In the stored outline hints, such an injection shows up as a NO_ACCESS hint (telling the optimizer that the parent query block does not have access into a query block under it for merging its contents).

In any case, my little test query now visited far fewer buffers, did less work and did not repeatedly scan through the same tables. Of course, this was just a synthetic test query – but nevertheless, applying the same hints to the real problem query made it run in 2 minutes instead of multiple hours.

Now, there are some outstanding questions here, like whether it’s a good idea to tune a query with hints, whether the optimizer statistics were representative of reality, why the optimizer didn’t come up with the best plan itself, whether this was due to a bug, and so on.

But all this isn’t the point of my blog post – my point is that whenever you want to systematically fix or improve something, you first have to know what you are trying to achieve, then understand why this should actually work, and only then pick the best tools for achieving that goal (the how). As the guy who asked the question had already done most of the work – figuring out how the plan should be executed for best efficiency – finding the way to achieve that was easy.

When manually optimizing SQL, this does require that you actually care about your work enough to take the time to understand the capabilities of your database engine, the query and the data. Of course, you have to be smart about where you optimize manually and where you should try to get Oracle to do the right thing by itself. Phew, I’ve got to write about that stuff some other day :-)

Update: Thanks to Jonathan Lewis’s comment I drilled down into this example further, and it turns out that in this case just using the UNNEST hint would be enough (HASH_AJ is not needed), as once the unnesting is forced, the CBO figures out the best (anti)join method from there. Note that this post wasn’t about hints – it was about emphasizing that if you want to systematically tune a SQL statement’s execution plan, you should first figure out what it should be doing (which join order, join methods, access paths etc) and then find a way to achieve that – in my case it was done with hints.

P.S. I will schedule my Advanced Oracle SQL Tuning online course in September or October – stay tuned ;-)

Video

Here’s a video of the presentation that I did as the keynote for the Turkish Oracle Users’ Group conference earlier this year. Recording courtesy of the people at Formspider.

The topic is tuning a two-table join, and the video is about an hour long (so there’s a lot more that could be said on the topic, and one day I might do a half-day “masterclass” on it).

Optimisation

A question came up on Oracle-L recently about the difference in work done by the following two queries:

SELECT /*+ RULE */
	DOM_NAME
FROM
	DOMAINS,
	TABLE(CAST(:B1 AS DOMAIN_LIST)) DL
WHERE
	DOM_NAME = DL.COLUMN_VALUE
;

SELECT
	DOM_NAME
FROM
	DOMAINS
WHERE
	DOM_NAME IN (
		SELECT	COLUMN_VALUE
		FROM	TABLE(CAST(:B1 AS  DOMAIN_LIST))
	)
;

Before saying anything else, I should point out that these two queries are NOT logically equivalent unless you can guarantee that the table() operator returns a unique set of values – and Oracle doesn’t allow uniqueness to be enforced on collections.
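A quick way to see the non-equivalence is to put a duplicate into the collection. The sketch below uses SQLite via Python (an ordinary table stands in for the table() collection; the table names and data are made up): the join form returns one row per duplicate, while the IN-subquery form does not:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE domains (dom_name TEXT);
    CREATE TABLE dl (column_value TEXT);  -- stands in for table(cast(:b1 ...))
    INSERT INTO domains VALUES ('a'), ('b');
    INSERT INTO dl VALUES ('a'), ('a'), ('b');  -- note the duplicate 'a'
""")

# Join form: every duplicate in dl produces an extra output row
join_rows = con.execute(
    "SELECT dom_name FROM domains, dl WHERE dom_name = dl.column_value"
).fetchall()

# IN-subquery form: duplicates in the subquery do not multiply the result
in_rows = con.execute(
    "SELECT dom_name FROM domains"
    " WHERE dom_name IN (SELECT column_value FROM dl)"
).fetchall()

# join_rows has three rows ('a' appears twice); in_rows has two
```

With a duplicate present, the two forms return different row counts – which is why the author had to guarantee uniqueness before comparing them.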

However, the author of the question had ensured that the results from the table() operator contained no duplicates, so the two versions of the query returned the same result set. The question is: why does one query do fewer buffer visits than the other, as evidenced by the results from sql_trace and tkprof?

Rule-based Join
===============

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.01          0          0          0           0
Fetch        1      0.86       0.86          0     200047          0      115195
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.87       0.87          0     200047          0      115195

Rows     Row Source Operation
-------  ---------------------------------------------------
 115195  NESTED LOOPS  (cr=200047 pr=0 pw=0 time=6355 us)
  99704   COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=284 us)
 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=200047 pr=0 pw=0 time=3191 us)(object id 54309)

CBO with subquery
=================

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.01          0          0          0           0
Fetch        1      0.78       0.78          0     157986          0      115195
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.80       0.80          0     157986          0      115195

Rows     Row Source Operation
-------  ---------------------------------------------------
 115195  NESTED LOOPS (cr=157986 pr=0 pw=0 time=4720 us cost=536 size=6141 card=267)
  99704   SORT UNIQUE (cr=0 pr=0 pw=0 time=417 us)
  99704    COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=259 us)
 115195   INDEX RANGE SCAN DOM_NAME_IDX (cr=157986 pr=0 pw=0 time=3353 us cost=2 size=21 card=1)(object id 54309)

Notice how the optimizer has unnested the subquery and introduced a “sort unique” – this demonstrates the optimizer’s recognition that the two queries are not logically identical. However, the number of rows from the collection is the same (99,704) and the number of rows after joining is the same (115,195) – the data sets have been rigged so that it is a fair comparison. So why does the explicit join take 200,047 buffer visits when the transformed subquery approach takes only 157,986 buffer visits?

The answer is that you can visit buffers without doing “logical I/Os” – and this benefit accrues most frequently to indexed access paths. Thanks to the way Oracle has sorted the collection before doing the join, the nested loop follows the second index (dom_name_idx) in order – this increases the chance that index blocks that were used in the previous cycle of the loop will be re-used in the next cycle, which means that Oracle need not release the pins on the index blocks, but can revisit them, incrementing the statistic “buffer is pinned count”.

I have to say that I haven’t proved that this is exactly what was happening, but when I suggested to the person who had asked the question that he re-run and check to see if the decrease in “session logical reads” was balanced by an increase in “buffer is pinned count” the answer seemed to confirm the hypothesis. [Correction (see comment #1 from Randolf): the results didn’t confirm the hypothesis – but I think that’s because of the change in “pinning” that Randolf describes, so I’ll have to find a way to confirm the hypothesis some other time.]

 

Star Transformation – 2

After publishing my note about emulating star transformations when there is something blocking the optimizer’s ability to do them automatically, I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.

You will recall that I started off with the following statement:

select
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

Although this specific example is so simple that the optimizer could turn it into a star transformation automatically, I used it to demonstrate a “minimum work” version of the statement as follows (with the execution plan from 11.1.0.6):

select
	ord.*
from
	(
	select
		/*+
			leading(prd ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	prid
	from
		products	prd,
		orders		ord
		where
		prd.grp = 50
	and	ord.id_prd = prd.id
		)	prid,
	(
	select
		/*+
			leading(cst ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	crid
	from
		customers	cst,
		orders		ord
	where
		cst.grp = 50
	and	ord.id_cst = cst.id
	)	crid,
	orders	ord
where
	prid.prid = crid.crid
and	ord.rowid = prid.prid
;

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    11 |  1650 |   570 |
|   1 |  NESTED LOOPS               |            |    11 |  1650 |   570 |
|*  2 |   HASH JOIN                 |            |    11 |   264 |   559 |
|   3 |    VIEW                     |            |  3361 | 40332 |   277 |
|   4 |     NESTED LOOPS            |            |  3361 | 87386 |   277 |
|*  5 |      TABLE ACCESS FULL      | CUSTOMERS  |    98 |   882 |    81 |
|*  6 |      INDEX RANGE SCAN       | ORD_CST_FK |    34 |   578 |     2 |
|   7 |    VIEW                     |            |  3390 | 40680 |   281 |
|   8 |     NESTED LOOPS            |            |  3390 | 88140 |   281 |
|*  9 |      TABLE ACCESS FULL      | PRODUCTS   |   100 |   900 |    81 |
|* 10 |      INDEX RANGE SCAN       | ORD_PRD_FK |    34 |   578 |     2 |
|  11 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRID"."PRID"="CRID"."CRID")
   5 - filter("CST"."GRP"=50)
   6 - access("ORD"."ID_CST"="CST"."ID")
   9 - filter("PRD"."GRP"=50)
  10 - access("ORD"."ID_PRD"="PRD"."ID")

Go back to the original SQL statement, though. The select list contains columns from just the orders table, and there’s a fairly well-known suggestion (or possibly guideline) that “tables that are not in the select list should not appear in the from clause, they should appear in subqueries in the where clause”. It’s not difficult to come up with examples where this strategy is a bad idea – but it’s often worth thinking through the consequences of trying to apply it. (Of course, you often find that after you’ve rewritten your SQL to match the strategy the optimizer transforms it back into the join that you had been avoiding.)

Since the products and customers tables don’t appear in the select list, can I find a way of rewriting my statement with subqueries? The answer is yes. Here’s the SQL, with the execution plan I got.

select
	ord.*
from
	orders ord
where
	ord.rowid in (
		select
			/*+
				no_use_hash_aggregation
			*/
			prid.prid
		from
			(
			select	/*+ no_merge */
				ord.rowid 	prid
			from
				products	prd,
				orders		ord
				where
				prd.grp = 50
			and	ord.id_prd = prd.id
			)	prid,
			(
			select	/*+ no_merge */
				ord.rowid 	crid
			from
				customers	cst,
				orders		ord
			where
				cst.grp = 50
			and	ord.id_cst = cst.id
			)	crid
		where
			prid.prid = crid.crid
	)
;

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   138 |   591   (1)| 00:00:08 |
|   1 |  NESTED LOOPS               |            |     1 |   138 |   591   (1)| 00:00:08 |
|   2 |   VIEW                      | VW_NSO_1   |    11 |   132 |   589   (1)| 00:00:08 |
|   3 |    SORT UNIQUE              |            |     1 |   264 |            |          |
|*  4 |     HASH JOIN               |            |    11 |   264 |   589   (1)| 00:00:08 |
|   5 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|   6 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|*  7 |        TABLE ACCESS FULL    | PRODUCTS   |   100 |   900 |    94   (2)| 00:00:02 |
|*  8 |        INDEX RANGE SCAN     | ORD_PRD_FK |    33 |   561 |     2   (0)| 00:00:01 |
|   9 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|  10 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|* 11 |        TABLE ACCESS FULL    | CUSTOMERS  |   100 |   900 |    94   (2)| 00:00:02 |
|* 12 |        INDEX RANGE SCAN     | ORD_CST_FK |    33 |   561 |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("PRID"."PRID"="CRID"."CRID")
   7 - filter("PRD"."GRP"=50)
   8 - access("ORD"."ID_PRD"="PRD"."ID")
  11 - filter("CST"."GRP"=50)
  12 - access("ORD"."ID_CST"="CST"."ID")

You’ll notice that this plan is remarkably similar to the plan I got from the joins with inline views – although the new plan has one extra “sort unique” operation at line 3. The optimizer has transformed my query by unnesting the subqueries and turning them into inline views – adding in a “distinct” operation because that’s what happens when you turn an “IN subquery” into a join when there isn’t a suitable uniqueness constraint on the join column.

There are two reasons for adopting this subquery approach. There’s the (highly subjective) argument that the code is a little easier to understand in this form, especially if you then want to join the orders table onwards to other tables. There’s also an objective argument relating to the “sort unique”. Note that I included the hint “no_use_hash_aggregation” to stop Oracle from using a “hash unique” operation at this point. By forcing Oracle to sort for uniqueness I know that the rowids will appear in (at least) an “extent-based” order rather than a completely random order. It’s possible that walking the fact table in physical order will be a little more efficient than accessing it in a completely randomised order. The former might find multiple rows in a block or benefit from some form of O/S or SAN read-ahead; the latter is more likely to turn one row into one random physical read.
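The locality argument can be illustrated with a toy model in Python (this models block locality only – it is not Oracle’s buffer-pinning mechanism, and the block/row counts are invented). Treat each rowid as a (block, slot) pair and count how often a walk over the rows has to move to a different block:

```python
import random

random.seed(42)
# Toy rowids: (block number, slot) for 1,000 rows spread over 100 blocks
rowids = [(random.randrange(100), slot) for slot in range(1000)]

def block_switches(rows):
    """Count how many times consecutive rows land in different blocks."""
    switches, last_block = 0, None
    for block, _slot in rows:
        if block != last_block:
            switches += 1
            last_block = block
    return switches

random_order = block_switches(rowids)          # randomly ordered rowids
sorted_order = block_switches(sorted(rowids))  # rowids after a sort

# After sorting, each block's rows are contiguous, so the number of
# switches collapses to the number of distinct blocks touched - which is
# the floor for any ordering, so sorted can never be worse than random.
```

Real I/O behaviour is far more complicated than this, of course, but it shows why sorted rowids give consecutive accesses a much better chance of landing in the block that is already in hand.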

One final thought – I said that you might want to take this type of approach for queries where the optimizer can’t do a star transformation automatically. There’s one very important case where this is always true – when you’re running Standard Edition, which doesn’t support bitmap indexes. So if you’re running SE and want to do star transformations – you now know how.

Footnote: I haven’t spent much time testing the limits of variations of this code although I did spend 30 minutes extending the example to use three dimension tables. If you think that this strategy might solve a production problem, think about using the sample code to create and test a better model of your production requirement before rewriting the production code – and if you go to production, check very carefully that the resulting code does produce the same results.