Facebook Page Milestone…

I just noticed my Facebook page has hit 100 likes. I’ve been keeping an eye on it over the last couple of weeks, waiting to see if I could hit the mammoth heights of 100, and now I’m there. Thanks to everyone involved… :)

Cheers

Tim…

What to Do When Performance Problems Appear

April 23, 2011 I have a bit of a performance problem. To be more specific, I am preparing to give a presentation on the topic of Oracle Database performance tuning for a specific ERP product, and I have encountered a performance problem. Tuning the performance problem will require dropping too many slides – 59 slides for a [...]

Determining optimal Amazon S3 transfer parallelism

Amazon’s Simple Storage Service (S3) is a robust, inexpensive and highly-available internet data storage service.  At Blue Gecko, we occasionally help our customers design and implement S3-based backup strategies.

EC2 outage reactions showcase widespread ignorance regarding the cloud

Amazon EC2’s high-profile outage in the US East region has taught us a number of lessons. For many, the take-away has been a realization that cloud-based systems (like conventionally-hosted systems) can fail. Of course, we knew that, Amazon knew that, and serious companies who performed serious availability engineering before deploying to the cloud knew that. In cloud environments, as in conventionally-hosted environments, you must implement high availability if you want high availability. You can’t just expect a system to be magically highly available because it is “in the cloud.” Thorough and thoughtful high-availability engineering made it possible for EC2-based Netflix to experience no service interruptions through this event.

Star Transformation

A little while ago I published a note explaining how it was possible to find queries which ran faster if you manually de-coupled the index and table accesses. Here’s a further example that came up in discussion on a client site recently. The query looks something like this (at least in concept, although it was a little more complex, with some messy bits around the edges):

select
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

There are indexes on products(id), customers(id) and orders(id), as well as indexes on orders(id_prd) and orders(id_cst). Basically it’s a collection of primary key and “foreign key” indexes – except there are no defined constraints and none of the indexes is unique.

The production “orders” table is very large (hundreds of millions of rows). There are lots of customers – and the orders for each customer are scattered throughout the entire orders table; similarly there are lots of products, and the orders for each product are scattered throughout the orders table. What execution plans (with the right indexes, of course) might you get for this query? There are two obvious possibilities:

-----------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |    11 |  1584 |   492 |
|*  1 |  HASH JOIN          |           |    11 |  1584 |   492 |
|*  2 |   TABLE ACCESS FULL | CUSTOMERS |   100 |   900 |    14 |
|*  3 |   HASH JOIN         |           |  3314 |   436K|   477 |
|*  4 |    TABLE ACCESS FULL| PRODUCTS  |   100 |   900 |    14 |
|   5 |    TABLE ACCESS FULL| ORDERS    |  1000K|   120M|   446 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ORD"."ID_CST"="CST"."ID")
   2 - filter("CST"."GRP"=50)
   3 - access("ORD"."ID_PRD"="PRD"."ID")
   4 - filter("PRD"."GRP"=50)

-----------------------------------------------------------------------------
| Id  | Operation                      | Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |            |    11 |  1584 | 10223 |
|   1 |  NESTED LOOPS                  |            |       |       |       |
|   2 |   NESTED LOOPS                 |            |    11 |  1584 | 10223 |
|   3 |    NESTED LOOPS                |            |  3314 |   436K|  3614 |
|*  4 |     TABLE ACCESS FULL          | PRODUCTS   |   100 |   900 |    14 |
|   5 |     TABLE ACCESS BY INDEX ROWID| ORDERS     |    33 |  4158 |    36 |
|*  6 |      INDEX RANGE SCAN          | ORD_PRD_FK |    33 |       |     2 |
|*  7 |    INDEX RANGE SCAN            | CST_I1     |     1 |       |     1 |
|*  8 |   TABLE ACCESS BY INDEX ROWID  | CUSTOMERS  |     1 |     9 |     2 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("PRD"."GRP"=50)
   6 - access("ORD"."ID_PRD"="PRD"."ID")
   7 - access("ORD"."ID_CST"="CST"."ID")
   8 - filter("CST"."GRP"=50)

The first option is to hash the two dimension tables into memory then probe each in turn after a tablescan of the orders table – and this could be a very effective choice in some cases; the second option is to drive from one of the dimension tables into the orders table, picking up a relatively large number of rows, then discard a large fraction of those rows by indexing (or possibly doing a hash join) into the second dimension table. (Obviously the roles of products and customers could be reversed in the nested loop plan I’ve shown).
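
For what it’s worth, if you wanted to test each of these two shapes for yourself you could hint the query explicitly. The hint sets below are my own sketch (they don’t come from the original example) and assume the aliases and index names used in this article:

-- Hash join version: build customers and products in memory,
-- then scan orders once.
select
	/*+
		leading(prd ord cst)
		use_hash(ord) use_hash(cst)
		swap_join_inputs(cst)
	*/
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

-- Nested loop version: drive from products into orders by index,
-- then check customers.
select
	/*+
		leading(prd ord cst)
		use_nl(ord cst)
		index(ord ord_prd_fk) index(cst cst_i1)
	*/
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;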

Both plans are likely to be rather expensive – a full scan on orders could be a huge amount of I/O in the form of “db file scattered read” waits; the indexed access path from one dimension to the orders table could be a huge amount of I/O in the form of “db file sequential read” waits, collecting rows we won’t eventually need. In the right circumstances (with the right data patterns), then, we might like to see a star transformation. But there are two possible problems:

  • It is “common knowledge” that you need primary keys on the dimension tables to do a star transformation. (In fact I was sure of this until about 30 minutes ago but then I discovered I was wrong.)
  • It is also “common knowledge” that you need bitmap indexes on the fact table to do a star transformation. Again this is wrong (but fortunately I’ve known about that for years).

So if we enable star transformations (alter session set star_transformation_enabled=true) we can get a plan that looks like this:

----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                       | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                            |     1 |   136 |   102 |

|   1 |  TEMP TABLE TRANSFORMATION            |                            |       |       |       |
|   2 |   LOAD AS SELECT                      | SYS_TEMP_0FD9D6601_17047D9 |       |       |       |
|*  3 |    TABLE ACCESS FULL                  | CUSTOMERS                  |   100 |   900 |    14 |
|   4 |   LOAD AS SELECT                      | SYS_TEMP_0FD9D6601_17047D9 |       |       |       |
|*  5 |    TABLE ACCESS FULL                  | PRODUCTS                   |   100 |   900 |    14 |

|*  6 |   HASH JOIN                           |                            |     1 |   136 |    74 |
|*  7 |    HASH JOIN                          |                            |     1 |   131 |    71 |

|   8 |     TABLE ACCESS BY INDEX ROWID       | ORDERS                     |    11 |  1400 |    68 |
|   9 |      BITMAP CONVERSION TO ROWIDS      |                            |       |       |       |
|  10 |       BITMAP AND                      |                            |       |       |       |

|  11 |        BITMAP MERGE                   |                            |       |       |       |
|  12 |         BITMAP KEY ITERATION          |                            |       |       |       |
|  13 |          TABLE ACCESS FULL            | SYS_TEMP_0FD9D6600_17047D9 |     1 |    13 |     2 |
|  14 |          BITMAP CONVERSION FROM ROWIDS|                            |       |       |       |
|* 15 |           INDEX RANGE SCAN            | ORD_CST_FK                 |       |       |     3 |

|  16 |        BITMAP MERGE                   |                            |       |       |       |
|  17 |         BITMAP KEY ITERATION          |                            |       |       |       |
|  18 |          TABLE ACCESS FULL            | SYS_TEMP_0FD9D6601_17047D9 |     1 |    13 |     2 |
|  19 |          BITMAP CONVERSION FROM ROWIDS|                            |       |       |       |
|* 20 |           INDEX RANGE SCAN            | ORD_PRD_FK                 |       |       |     3 |

|  21 |     TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6600_17047D9 |   100 |   500 |     2 |
|  22 |    TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6601_17047D9 |   100 |   500 |     2 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("CST"."GRP"=50)
   5 - filter("PRD"."GRP"=50)
   6 - access("ORD"."ID_PRD"="C0")
   7 - access("ORD"."ID_CST"="C0")
  15 - access("ORD"."ID_CST"="C0")
  20 - access("ORD"."ID_PRD"="C0")

In this example you can see that Oracle has decided to extract the subset of rows (grp = 50) from each of the dimension tables and save them in global temporary tables, then use hash joins to do a “join-back” to those tables. This join-back is a little odd since the query isn’t selecting any data from those two tables, so we shouldn’t need to do it – but I think that may be a side effect of not having the primary keys declared.
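
If you want to test that conjecture, the build script in the footnote includes the primary key declarations as comments. Running them is a quick experiment (Oracle can use the existing non-unique indexes to enforce the constraints), though I haven’t shown its effect here:

-- Declare the "missing" primary keys; Oracle can use the existing
-- non-unique indexes prd_i1 and cst_i1 to enforce them.
alter table products add constraint prd_pk primary key(id);
alter table customers add constraint cst_pk primary key(id);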

I’ve broken the plan up a little to make it easier to see the critical steps:

  • Lines 11 – 15: we select the index ranges we want from the customer index on orders (ord_cst_fk), convert them to bitmaps, and merge.
  • Lines 16 – 20: we do the same for the product index on orders (ord_prd_fk).
  • Lines 8 – 10: we do the “bitmap and” between the two constructed bit strings, convert the result to rowids, and pick up exactly the rows we want.
  • Lines 6, 7, 21 and 22: the two hash joins to the temporary tables which represent the subsets of rows we have from customers and products.
  • Lines 1 – 5: where we started, by extracting the two subsets into a form of internalised global temporary table.
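
As a side note (this isn’t something the example above tries), star_transformation_enabled also accepts a third value, temp_disable, which allows the star transformation but stops the optimizer from materialising the dimension subsets into temporary tables; that could be one way of checking how much of the plan is a consequence of the temp-table step:

-- Allow the star transformation, but without the temp table step:
alter session set star_transformation_enabled = temp_disable;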

If this doesn’t work well enough for you – and, allowing for a few little tweaks here and there, it really should – or if there is something about your code that makes it impossible for Oracle to do a star transformation, here’s an example of taking total control by restructuring the SQL to do a similar operation:

select
	ord.*
from
	(
	select
		/*+
			leading(prd ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	prid
	from
		products	prd,
		orders		ord
	where
		prd.grp = 50
	and	ord.id_prd = prd.id
	)	prid,
	(
	select
		/*+
			leading(cst ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	crid
	from
		customers	cst,
		orders		ord
	where
		cst.grp = 50
	and	ord.id_cst = cst.id
	)	crid,
	orders	ord
where
	prid.prid = crid.crid
and	ord.rowid = prid.prid
;

With the no_merge hints we produce two result sets: prid – the rowids from the orders table for all rows which match the products we are interested in; and crid – the rowids from the orders table for all rows which match the customers we are interested in.

We then do a join between crid and prid – which has to leave us with just those rowids for the rows which represent the customers we’re interested in who have placed orders for products we’re interested in. So we can then do a join, by rowid, to the orders table to collect the data for those orders. Here’s the plan:

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    11 |  1650 |   570 |
|   1 |  NESTED LOOPS               |            |    11 |  1650 |   570 |
|*  2 |   HASH JOIN                 |            |    11 |   264 |   559 |
|   3 |    VIEW                     |            |  3361 | 40332 |   277 |
|   4 |     NESTED LOOPS            |            |  3361 | 87386 |   277 |
|*  5 |      TABLE ACCESS FULL      | CUSTOMERS  |    98 |   882 |    81 |
|*  6 |      INDEX RANGE SCAN       | ORD_CST_FK |    34 |   578 |     2 |
|   7 |    VIEW                     |            |  3390 | 40680 |   281 |
|   8 |     NESTED LOOPS            |            |  3390 | 88140 |   281 |
|*  9 |      TABLE ACCESS FULL      | PRODUCTS   |   100 |   900 |    81 |
|* 10 |      INDEX RANGE SCAN       | ORD_PRD_FK |    34 |   578 |     2 |
|  11 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRID"."PRID"="CRID"."CRID")
   5 - filter("CST"."GRP"=50)
   6 - access("ORD"."ID_CST"="CST"."ID")
   9 - filter("PRD"."GRP"=50)
  10 - access("ORD"."ID_PRD"="PRD"."ID")

The SQL has obviously become more complex – and complexity is generally something to be wary of as it’s an easy step from “complex” to “wrong”. So don’t try something like this unless you’re sure it’s necessary, and then document what you’re doing very carefully.

Footnote:

If you want to play around with this example, here’s the code to generate the tables. I was using my standard LMT, 1MB uniform extents, 8KB block size, freelist management model. The database was version 11.1.0.6 and CPU costing was disabled.

create table products
as
select
	rownum			id,
	mod(rownum,300)		grp,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	all_objects
where
	rownum <= 30000;

create index prd_i1 on products(id);
-- alter table products add constraint prd_pk primary key(id);

create table customers
as
select
	rownum			id,
	mod(rownum,300)		grp,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	all_objects
where
	rownum <= 30000;

create index cst_i1 on customers(id);
-- alter table customers add constraint cst_pk primary key(id);

create table orders
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 1000)
select
	rownum			id,
	trunc(dbms_random.value(1,30000))	id_prd,
	trunc(dbms_random.value(1,30000))	id_cst,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum  <= 1000000;

create index ord_prd_fk on orders(id_prd);
create index ord_cst_fk on orders(id_cst);
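
The script above doesn’t show how the plans were captured; for completeness, here’s a minimal sketch of the obvious approach (my addition, using dbms_stats and dbms_xplan) for the star transformation case:

-- Gather simple statistics (a step the build script doesn't show):
begin
	dbms_stats.gather_table_stats(user, 'PRODUCTS');
	dbms_stats.gather_table_stats(user, 'CUSTOMERS');
	dbms_stats.gather_table_stats(user, 'ORDERS');
end;
/

-- Enable the feature, then display the plan for the original query:
alter session set star_transformation_enabled = true;

explain plan for
select
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

select * from table(dbms_xplan.display);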

I’ve Updated the White Papers/Presentation Page

I finally took a little time to update the page with some of the presentations I’ve done over the last couple of years. You can get there via the menu bar above or by clicking here: White Papers/Presentations

Maybe some day I’ll have time to organize all the scripts.

Cursor_sharing

For those looking to the next upgrade – here’s an early warning from Oracle:

ANNOUNCEMENT: Deprecating the cursor_sharing = ‘SIMILAR’ setting (Doc ID 1169017.1)

“We recommend that customers discontinue setting cursor_sharing = SIMILAR due to the many problematic situations customers have experienced using it. The ability to set this will be removed in version 12 of the Oracle Database (the settings of EXACT and FORCE will remain available). Instead, we recommend the use of Adaptive Cursor Sharing in 11g.”

The rest of the note contains some interesting information about the behaviour and side effects of this option – which may also help you debug some library cache issues if you’re currently running with this value set in 11g.
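
If you want to check your own systems ahead of the upgrade, here’s a quick sketch (assuming you have access to v$parameter and are running with an spfile; the choice between EXACT and FORCE is application-dependent):

-- Check the current setting:
select name, value
from	v$parameter
where	name = 'cursor_sharing';

-- If it is SIMILAR, move to one of the values that will survive
-- the upgrade (EXACT shown here; FORCE is the other option):
alter system set cursor_sharing = exact scope = both;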

VirtualBox 4.0.6 has arrived…

As reported here, VirtualBox 4.0.6 has arrived…

Cheers

Tim…

Updated XML Content & Paper section

Once per year I try to update the “XML Content” page that, in principle, should contain all my XML-relevant thoughts and “how to” posts, so I can find my gained “wisdom” a bit more quickly (getting old and stuff). This year I had only 14 or so such XML-related posts, which was easily explained once I updated my “Paper” page, where among other things I keep a list of presentations done… Oops… I have been busy…

Anyway. Enjoy the updated XML content overview reference page; it now contains 100+ posts regarding specific XML(DB)-related how-tos, solutions, approaches, ideas, etc.

6th Planboard DBA Symposium

On the 17th of May 2011, Planboard is organising the 6th edition of the Planboard DBA Symposium. The symposium is a whole-day event in Amstelveen, near Amsterdam.

There is a wide range of Oracle database topics, including APEX, Oracle GoldenGate, Oracle E-Business Suite and Oracle Exadata (presented by me :) !).

If you are interested, take a look at the full agenda!