Search

OakieTags

Who's online

There are currently 0 users and 39 guests online.

Recent comments

Affiliations

Troubleshooting

Speaking engagements 2011

Despite my attempts to not travel (much) anymore, I’m still going to speak at two events in first half of 2011.

The first event is at the Suncoast Oracle User Group meeting in Tampa, FL on Thursday, 27th January. I’ll speak about Advanced Oracle Troubleshooting in 60 minutes.

I will deliver the second presentation at Hotsos Symposium 2011 near Dallas, TX on Wednesday 9th March. I will talk about Troubleshooting the Most Complex Oracle Performance Problem I’ve ever seen. If you think that this performance problem was complex, then be prepared to see an even worse one there!

Share

ASSM Again

While checking my backlog of drafts (currently 75 articles in note form) I came across this one from August 2009 and was a little upset that I hadn’t finished it sooner – it’s a nice example of geek stuff that has the benefit of being useful.

From the Oracle newsgroup comp.databases.oracle.server, here’s an example of how to recreate a performance problem due to maintenance on ASSM bitmaps in 10.2.0.4.

Create a table in a tablespace with an 8KB block size, locally managed tablespace with uniform 1MB extents, and automatic segment space management (ASSM). Check the newsgroup thread if you want complete details on reproducing the test:

Session 1: Insert 100,000 rows of about 490 bytes into a table using a pl/sql loop and commit at end.
Session 1: Insert 1,000 rows into the table with single SQL inserts and no commits
Session 1: delete all data from the table with a single statement – but do not commit

Session 2: Insert 1,000 rows into the table with single SQL inserts and no commits – it’s very slow.

As one person on the thread pointed out – it looks as if Oracle is doing a full tablescan of the table, one block at a time showing “db file sequential read” waits all the way through the table. (If your db_cache_size is large enough you might not see this symptom).

I simplified the test – inserting just 100,000 of the rows (with the commit), then deleting them all (without the commit), then inserting one row from another session. Taking a snapshot of x$kcbsw and x$kcbwh, I got the following figures for the activity that took place inserting that one extra row (this was on Oracle 10.2.0.3):

---------------------------------
Buffer Cache - 23-Dec 11:50:36
Interval:-  0 seconds
---------------------------------
          Why0          Why1          Why2    Other Wait
          ----          ----          ----    ----------
         1,457             0             0             0 ktspfwh10: ktspscan_bmb
             8             0             0             0 ktspswh12: ktspffc
             1             0             0             0 ktsphwh39: ktspisc
         7,061             0             0             0 ktspbwh1: ktspfsrch
             1             0             0             0 ktuwh01: ktugus
         7,060             0             0             0 ktuwh05: ktugct
             1             0             0             0 ktuwh09: ktugfb
             2             0             0             0 kdswh02: kdsgrp
             2             0             0             0 kdiwh06: kdifbk
             2             0             0             0 kdiwh07: kdifbk
          ----          ----          ----    ----------
        15,595             0             0             0 Total: 10 rows

The figures tell us how much work Oracle has to do to find a table block that could accept a new row. The idea is simple – Oracle checks the first “level 3″ bitmap block (which is actually part of the segment header block) to find a pointer to the current “level 2″ bitmap block; it checks this level 2 bitmap block to find a pointer to the current “level 1″ bitmap block; and finally it checks the “level 1″ bitmap block to find a pointer to a data block that shows some free space.

Unfortunately every block in our table is apparently empty – but that’s only going to be true once session 1 commits. In this version of Oracle the blocks are all visible as “x% free” in the level 1 bitmaps – but when Oracle visits each block (“ktspbwh1: ktspfsrch”) it checks the ITL entry, which points it to the transaction table slot in the related undo segment header block to Get the Commit Time for the transaction (“ktuwh05: ktugct”) and finds that the transaction is not committed so the space is not really free. So Oracle has to visit the next block shown as free in the bitmap.

In our “bulk delete / no commit” case, we end up visitng every (or nearly every) block in the entire table before we find a block we can actually use – and, given the nature of the ASSM bitmap implementation, the order of the block visits is the “natural” table order, so we see something that looks like a full tablescan operating through single blocks reads (apart, perhaps, from a few blocks that are still cached).

I can’t explain why we do 1,457 visits to bitmap blocks (“ktspfwh10: ktspscan_bmb”) in this version of Oracle, but perhaps it’s simply an indication that Oracle picks (say) five “free block” entries from the bitmap block each time it visits it and therefore has to visit each bitmap block about 12 times if it doesn’t find a data block with space that really is free in it search.

Note – these results will be hugely dependent on version of Oracle – in an earlier version of Oracle the bitmap blocks were not updated by the delete until some time after the commit – and this variation of delayed block cleanout produced other unpleasant anomalies; and, just to make like difficult in later versions of Oracle, the x$kcbsw / x$kcbwh objects are not populated properly in 11g.

Footnote: In case you hadn’t realised, ASSM is a mechanism aimed at OLTP systems with a reasonable degree of concurrency – so it’s not too surprising that you can find problems and weak spots if you hit it with processing which is biased towards the DW and batch processing end of the spectrum.

Subquery Selectivity – 2

Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.

We start with some simple SQL to create a test data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,729)		id1,
	rownum			id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(id1, id2);

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,737)		id1,
	trunc((rownum-1)/737)	id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t2 add constraint t2_pk primary key(id1, id2);

-- gather statistics, compute, no histograms

Having created these data sets, we can run the following query against them. It’s a very simple query with aggregate subquery. I’ve included a no_unnest hint because many newer versions of Oracle would do a cost-based transformation on something this simple and unnest the subquery. I want to keep the code and numbers simple while demonstrating a principle. (In the earlier article I used a two-table join with two existence subqueries to make a point about how a change in arithmetic could cause a change in execution plan; in this article I’m just going to show you the change in arithmetic.)


select
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest */
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

Here are the execution plans from 9.2.0.8 and 11.1.0.6 respectively (the 10.2 plan matches the 11g plan). For consistency I am not using CPU costing (system statistics) in either case:


Execution Plan from 9.2.0.8
-----------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             | 10000 |  1201K|    13M|
|*  1 |  FILTER                       |             |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1          | 10000 |  1201K|  2712 |
|   3 |   SORT AGGREGATE              |             |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2          |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK       |  1357 |       |     6 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("T2"."N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

Execution Plan from 11.1.0.6 (matches plan from 10.2.0.3)
-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    14 |  1722 |   996K|
|*  1 |  FILTER                       |       |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1    |  1000K|   117M|  2712 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

You will notice, of course, that the estimated cardinality of the result (line 0) is different, and the estimated cardinality of the scan on t1 (line 2) is different. There was an important code change introduced in 10g relating to the handling of subqueries that explains this effect.

In the previous article I pointed out that an existence subquery introduces a selectivity of 5% – and that the upgrade to 10g changed the point in the optimizer’s arithmetic where this 5% was applied (hence where the plan showed a change in cardinality). In the example above my code is of the form ” = (subquery) “, which means Oracle applies a “magic” 1% rather than 5%. (For a range-based predicate – i.e. <, > etc – it would have been the 5% that we got for existence.)

So in the 9i plan, line 2 shows a cardinality of 10,000 because that’s 1% of the original 1,000,000 rows but, as I pointed out in the earlier article, this means the optimizer will be working out the rest of the plan based on the estimated effect of a filter subquery that has not yet run. (Do read the previous article if you don’t follow that last comment.)

From 10g onwards (and in our 11.1 plan) the cardinality for line 2 is 1,000,000 because that’s how many rows will appear in the rowsource as the table is scanned. But then you have to ask why Oracle didn’t print a cardinality on line 1 (the FILTER line) because it’s at that line that we should see a cardinality of 10,000 as the filter selectivity of 1% is applied.

But there is another problem. The final cardinality of the 11.1 plan is NOT reported as 10,000; it’s reported as 14. As Oracle has fixed one defect in the optimizer it has introduced another (whether by accident, or by design, I don’t know). The estimate of 14 rows is effectively double-counting. The optimizer has applied the the 1% selectivity for the subquery, and then applied the selectivty of the predicate “t1.id1 = {unknown value}”. (If the predicate had been “> (subquery)” or some other range-based operator, or existence, then the arithmetic would have applied a 5% selectivity on top of that “t1.id1 = {unknown value}” selectivity.

In this case, of course, it probably doesn’t matter that the final cardinality is wrong – but in more complex cases this error may be introduced somewhere in the middle of the plan, making the optimizer do something very silly from that point onwards.

There is a surprising “workaround” to this issue – which may not be relevant in all cases and may not even work in some. If you force Oracle into running the subquery early (with the push_subq hint) the double-counting doesn’t occur:

select
	/*+ push_subq */	-- 9i hint
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest push_subq */	-- 10g+ hint (if you don't use query block naming)
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       | 10000 |  1201K|  4075 |
|*  1 |  TABLE ACCESS FULL            | T1    | 10000 |  1201K|  2712 |
|   2 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  4 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ PUSH_SUBQ NO_UNNEST */ MAX("N1") FROM "T2"
              "T2" WHERE "T2"."ID1"=:B1)=1)
   4 - access("T2"."ID1"=:B1)

It strikes me as a little bizarre that this hint has any effect at all in this case since there is no “earlier point” at which the subquery can run. But it has pushed the optimizer through a different code path, which has produced a plan with a different shape and avoided the double-counting.

Footnote: I’ve just included in the final query text a reminder that the push_subq hint changed in the upgrade from 9i to 10g. Originally it went at the top of the query and caused every subquery to be pushed; from 10g you have to be selective about the subqueries you want pushed. The hint at the top will be “ignored” and you apply the hint to each subquery that you want pushed by writing it in the subquery (as I have here) or by naming each subquery with the qb_name hint and then putting the push_subq(@{your subquery name}) hint(s) at the top of the query, naming the subqueries to be pushed.

Fake Baselines

SQL Baslines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.

We start with a sample data set and a query that is going to do “the wrong thing”. As usual I’ve got a locally managed tablespace with 8KB blocks and 1MB uniform extents, freelist management, and I’m running with CPU Costing disabled (and running 11.1.0.6 in this case):


set serveroutput off

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n1000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create index t1_n1 on t1(n1000);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/
select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

The final query is the one that I want to fake a baseline for. With my current set up it does an index range scan to pick up 10 rows, but I’m going to make it do a tablescan instead. I’m going to need to pull the exact text of the query from memory in a moment, so I’ll find its sql_id and child_number by searching for the “pseudo-hint” that I’ve included in the text, and I’ll report the execution plan to show that I’ve picked up the right thing (I’m assuming that there’s only one piece of SQL that’s going to include the text “target_query”, of course):

column	sql_id			new_value	m_sql_id_1
column 	plan_hash_value		new_value	m_plan_hash_value_1
column	child_number		new_value	m_child_number_1

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%target_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',&m_child_number_1));

SQL_ID  306m98cpu9yz7, child number 0
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 1420382924

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |    11 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |    80 |    11 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |    10 |       |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1000"=500)

The next step is to create a new query (and in a production system that might simply mean running a heavily hinted version of the target query) that uses the execution plan I want to see; and I’ll use the same search technique to find it and report its plan:

select
	/*+ full(t1) alternate_query */
	id
from
	t1
where
	n1000 = 500
;

column	sql_id			new_value	m_sql_id_2
column 	plan_hash_value		new_value	m_plan_hash_value_2
column	child_number		new_value	m_child_number_2

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%alternate_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_2',&m_child_number_2));

SQL_ID  bvpb73bb6c6uy, child number 0
-------------------------------------
select  /*+ full(t1) alternate_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

After been running and reporting these queries and their plans, I’ve captured the SQL_Id, child_number, and plan_hash_value for each query; and this is more than enough information to make it possible to create an SQL Baseline for one query using the execution plan for the other query.

declare
	m_clob	clob;
begin
	select
		sql_fulltext
	into
		m_clob
	from
		v$sql
	where
		sql_id = '&m_sql_id_1'
	and	child_number = &m_child_number_1
	;

	dbms_output.put_line(m_clob);

	dbms_output.put_line(
		dbms_spm.load_plans_from_cursor_cache(
			sql_id 			=> '&m_sql_id_2',
			plan_hash_value		=> &m_plan_hash_value_2,
			sql_text		=> m_clob,
			fixed			=> 'NO',
			enabled			=> 'YES'
		)
	);

end;
/

I used the SQL_ID and child_number from the first query to get the full SQL text of the query into an in-memory CLOB, and then use the SQL_id and plan_hash_value from the second query to associate the second plan with the first query – storing the result as a SQL Baseline that is enabled and ready for use.

You’ll have to take my word that I haven’t faked the following text, but this is what I get when I re-run the original query (flushing the shared pool first to make sure that I don’t accidentally end up picking up the original child cursor):

alter system flush shared_pool;

select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',null));

SQL_ID  306m98cpu9yz7, child number 1
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Note
-----
   - cpu costing is off (consider enabling it)
   - SQL plan baseline SYS_SQL_PLAN_3c0ea7f3dbd90e8e used for this statement

It’s so much easier than the fiddling one had to do for stored outlines which was quite easy in 8i, but got a little nasty in 9i because of the extra (undocumented) details that looked as if they might have been necessary when the third table appeared in the outln schema. However, in 10g, the dbms_outln package was enhanced to allow you to create outlines from the library cache – see this note from Randolf Geist for more details – but remember that stored outlines will be deprecated in Oracle 12.

Footnote: the dbms_spm package is created by script $ORACLE_HOME/rdbms/admin/dbmsspm.sql, and there is a comment near the top saying:


--           Grant the execute privilege on SPM package to public.           --
--           But ADMINISTER SQL MANAGEMENT OBJECT privilege is checked       --
--           before a user is allowed to execute.                            --

SQL Plan Baselines

Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.

Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.

Maria Colgan of Oracle Corp.

Christian Anognini:

Kerry Osborne

Tim Hall

Jason Arneil

Tracking Down Windows Memory Leaks

This is just a small windows O/S related note, covering how to track down memory usage of a specific windows service. This all happened on Windows Vista. The same steps are likely to work perfectly well on Windows 7 and both flavours of Server 2008. I’d expect them to also work on earlier versions of [...]

Join Surprise

Imagine I have a simple SQL statement with a “where clause” that looks like this:


	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2

Would you expect it to run several times faster (25 minutes instead of a few hours) when the only change you made was to swap the order of the join predicates to:


	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1

You may recall that a couple of years ago I wrote about some bugs in the optimizer, and pointed you to a blog article by Alberto Dell’Era that demonstrated an anomaly in cardinality calculations that made this type of thing possible. But here’s an example which has nothing to do with cardinality errors. We start with a suitable dataset – running on 11.1.0.6.


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	trunc(dbms_random.value(1,1000))	id1,
	trunc(dbms_random.value(1,1000))	id2,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 7
)
select
	t1.id1,
	t1.id2,
	v1.id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',70)		padding
from
	t1		t1,
	generator	v1
;

-- collect stats, compute, no histograms

This data set models a problem – stripped to the bare essentials – that I came across at a client site some time ago. We have a “parent/child” relationship between the tables (although I haven’t declared the referential integrity), with roughly seven child rows per parent. The parent rows are quite long, the child rows are quite short. Some parents may not have children (although in this data set they do).

We now run a “report” that generates data for a number-crunching tool that extracts all the data from the tables – using an outer join so that parent rows don’t get lost. For various reasons the tool wanted the data sorted in a certain order – so there’s also an order by clause in the query. I’m going to show you the original query – first unhinted, and then hinted to use a merge join:


select
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      | 10000 |    10M|       |  3720   (1)| 00:00:45 |
|   1 |  SORT ORDER BY         |      | 10000 |    10M|    22M|  3720   (1)| 00:00:45 |
|*  2 |   HASH JOIN RIGHT OUTER|      | 10000 |    10M|  6224K|  1436   (1)| 00:00:18 |
|   3 |    TABLE ACCESS FULL   | T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
|   4 |    TABLE ACCESS FULL   | T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 10000 |    10M|       |  6343   (1)| 00:01:17 |
|   1 |  SORT ORDER BY       |      | 10000 |    10M|    22M|  6343   (1)| 00:01:17 |
|   2 |   MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   3 |    SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   4 |     TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  5 |    SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   6 |     TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")
       filter("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")

But there’s something a little odd about how the optimizer has chosen to do the merge join. Although our join condition references the join columns in the order (id1, id2) our final sort order is on (id2, id1) – and the optimizer hasn’t taken advantage of the fact that it could do the “sort join” operations in the order (id2, id1) and avoid the final “sort order by” at line 1.

So let’s rewrite the query to make the order of the join predicates match the order of the order by clause, and see what happens to the plan:


select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1
order by
	t1.id2,
	t1.id1
;

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   1 |  MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   2 |   SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  4 |   SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   5 |    TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")
       filter("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

The plan no longer has the final “sort order by” operation – and the cost of the plan is much lower as a consequence.. You’ll also notice that the predicate sections (always check the predicate section) are a little different – the order of evaluation has been reversed.

In my test case the cost of the merge join still hasn’t fallen below the cost of the hash join – but in the case of the client changing the order of predicates – without adding any hints – made the cost of the merge join much cheaper than the cost of the hash join. Fortunately this was a case where the cost was a realistic indication of run time and avoiding a sort operation of some 35GB of join result was a very good move.

So watch out – with multi-column joins, the order of the join predicates can make a big difference to the way Oracle operates a merge join.

Index ITL fix

Over the last year I’ve written a short collection of articles describing how a defect in the code for handling indexe leaf node splits can cause a significant waste of space in indexes that are subject to a high degree of concurrent inserts. Finally I can announce some good news – Oracle has a patch for the code.

The problem is described in MOS (the database formerly known as Metalink) under bug 8767925 which is reported as “fixed in version 12.1″.

Backports may become available – I’ve already asked for one for 11.1.0.7 on AIX for one of my clients (but’s it has been a little slow in arriving) and another client should have put in a bid for a backport to 10.2.0.4 (also on AIX) in the last couple of days.

[Further reading on Index ITL Explosion]

Geek Stuff – 2

This is a post specially for Junping Zhang, who has been waiting patiently for a follow-up to my geek post about find the system-level setting for parameter use_stored_outlines. His question was: “Is it also possible to find the session-level setting ?”

The answer is yes – but it’s harder, version specific, requires a little preparation, and only allows you to see the setting for your own session.

The biggest problem is that the session-level setting is stored in the session’s “fixed global area”, and Oracle Corp. has not exposed an map of this memory as an X$ structure – so we have to find a way of mapping it ourselves – and the map will change with version and platform.

We start by using oradebug to dump a copy of the session global area to a trace file (and this operation requires you to have a fairly high level of privilege in your system) then searching through the trace file for the value “uso”. Once you’ve done this once you find that the exact name of the variable you want is ugauso_p, and you can find its address.

Once you know the exact name of the variable you’re after you can dump its details to screen rather than to a trace file – the following is a cut-n-paste from a 10.2.0.3 session (with the first few lines of the trace file dump inserted into the middle of the output):


SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump global_area 4
Statement processed.

	********************************
	----- Dump of the Fixed UGA -----
	ub4 ksmugmg1 [91173FC, 9117400) = 0000ABAB
	kghds * ksmughpp [9117400, 9117404) = 09117404
	ksmpy ksmuh_p [9117404, 9117520) = 05E1F6D0 0000FFB8 21595A04 091E0048 ...
	... etc. ...
	********************************

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
SQL> alter session set use_stored_outlines=ABCDEFG;

Session altered.

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000001 42410007 46454443 00000047 00000000 00000000 00000000 00000000 00000000
SQL>

The numbers in the brackets [9117A44, 9117A68) give you the address of the variable in your session’s virtual memory space – so if you can find this pair of addresses from SQL or PL/SQL you can write some code to read x$ksmmem to get the current value for the variable. One way of working out the addresses is to query x$ksmpp – the list of pga memory pointers.


SQL> select * from x$ksmpp;

ADDR           INDX    INST_ID KSMCHCOM         KSMCHPTR   KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR
-------- ---------- ---------- ---------------- -------- ---------- -------- ---------- --------
091E8AA4          0          1 kpuinit env han  08EEAB80       1584 freeabl           0 00
091E8A6C          1          1 Alloc environm   08EE9B50       4144 recr           4095 08EEB108
...
091E7F0C         53          1 Fixed Uga        091173F0      20620 freeabl           0 00
...
091E7DBC         59          1 PLS non-lib hp   091CBFB0       8224 freeabl           0 08F9E02C
091E7D84         60          1 free memory      091C0010      49056 free              0 00

61 rows selected.

SQL> spool off

The item we are interested in is the “Fixed Uga” which (for this session, on this machine, on this operating system, on this version of Oracle, after this startup of the instance, etc. etc. etc.) is based at 0x091173f0. Although there are many reasons why the location of the fixed uga isn’t fixed in absolute terms, we know that for a given platform and version, the contents of the fixed uga really are “fixed”. So we can work out the offset of the ugauso_p for this platform and version by subtracting the address of the start of the fixed uga from the address we go from oradebug for ugauso_p, viz: 0x9117A44 – 0x091173F0 = 0×0654 = 1,620.

Once we have the offset we can write a pl/sql function to find the start of the fixed uga, jump to the offset, then walk through virtual memory picking up the bytes we need. Basically it’s the same code that I used for the system-level parameter, with a change in how to find the starting address of the value.

create or replace function session_uso
return varchar2
as
	uso		varchar2(30);
	mem_addr	x$ksmmem.addr%type := hextoraw('0E9A7838');
	mem_ct		number;
	mem_indx	number;
	raw_val		x$ksmmem.ksmmmval%type;
	indx_temp	number;
	col_size	number;
	raw_char	varchar2(64) := null;

begin
	select
		type_size
	into
		session_uso.col_size
	from
		v$type_size
	where
		type = 'SIZE_T'
	;

	select
		ksmchptr,
		ksmchsiz
	into
		session_uso.mem_addr,
		session_uso.mem_ct
	from
		x$ksmpp
	where
		ksmchcom = 'Fixed Uga'
	;

	select
		indx
	into
		session_uso.mem_indx
	from
		x$ksmmem
	where
		addr = session_uso.mem_addr
	;

	dbms_output.put_line('Memory address: ' || rawtohex(session_uso.mem_addr));
	dbms_output.put_line('Start location: ' || session_uso.mem_indx);
	dbms_output.put_line('Nominal Length: ' || session_uso.mem_ct);

--
--	Assume that the 11.1.0.6 offset to USO is 2504 bytes
--	And that the first t_type is a counter, then each
--	value after that is a letter until we hit 00 for a
--	maximum of 32 letters
--

	mem_indx := mem_indx + 2504/col_size;		-- 11.1.0.6 on Windows XP 32-bit
--	mem_indx := mem_indx + 1620/col_size;		-- 10.2.0.3 on windows XP 32 bit

	for i in 1 .. 32 loop

--		Have to do the silly addition, or
--		we get a "tablescan"  of x$ksmmem

		session_uso.indx_temp := session_uso.mem_indx + i;

		select
			--+ index(x$ksmmem)
			ksmmmval
		into
			session_uso.raw_val
		from
			x$ksmmem
		where
			indx = session_uso.indx_temp
		;

		dbms_output.put_line(session_uso.raw_val);
		exit when session_uso.raw_val = hextoraw('00');

		raw_char :=
			rawtohex(session_uso.raw_val) ||
			rawtohex(session_uso.raw_char);

	end loop;

	if session_uso.raw_char is null then
		session_uso.uso := 'use_stored_outlines is not set';
	else
		for i in reverse 1..(length(session_uso.raw_char)-4) / 2 loop
			exit when substr(session_uso.raw_char, 2*i - 1, 2) = '00';
			session_uso.uso :=
				session_uso.uso ||
				chr(to_number(substr(session_uso.raw_char, 2*i - 1, 2), 'XX'));
		end loop;
	end if;
	return	session_uso.uso;

end;
/

Just as a quick demonstration – here’s a cut-n-paste (including the debug output) from a normal session after creating the function in the sys account of a 10.2.0.3 database and making the function publicly available through a synonym:

SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
00
use_stored_outlines is not set

PL/SQL procedure successfully completed.

SQL> alter session set use_stored_outlines = long_category;

Session altered.

SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
4F4C000D
435F474E
47455441
0059524F
00
LONG_CATEGORY

PL/SQL procedure successfully completed.

SQL>

Asynch descriptor resize wait event in Oracle

A lot of people have started seeing “asynch descriptor resize” wait event in Oracle 11gR2. Here’s my understanding of what it is. Note that I didn’t spend too much time researching it, so some details may be not completely accurate, but my explanation will at least give you an idea of why the heck you suddenly see this event in your database.

FYI, there’s a short, but incomplete explanation of this wait event also documented in MOS Note 1081977.1



The “direct path loader” (KCBL) module is used for performing direct path IO in Oracle, such as direct path segment scans and reading/writing spilled over workareas in temporary tablespace. Direct path IO is used whenever you see “direct path read/write*” wait events reported in your session. This means that IOs aren’t done from/to buffer cache, but from/to PGA directly, bypassing the buffer cache.



This KCBL module tries to dynamically scale up the number of asynch IO descriptors (AIO descriptors are the OS kernel structures, which keep track of asynch IO requests) to match the number of direct path IO slots a process uses. In other words, if the PGA workarea and/or spilled-over hash area in temp tablespace gets larger, Oracle also scales up the number of direct IO slots. Direct IO slots are PGA memory structures helping to do direct IO between files and PGA.



In order to be able to perform this direct IO asynchronously, Oracle also dynamically scales up the number of OS asynch IO descriptors, one for each slot (up to 4096 descriptors per process). When Oracle doesn’t need the direct IO slots anymore (when the direct path table scan has ended or a workarea/tempseg gets cancelled) then it scales down the number of direct IO slots and asynch IO descriptors. Scaling asynch IO descriptors up/down requires issuing syscalls to OS (as the AIO descriptors are OS kernel structures).



I guess this is supposed to be an optimization, to avoid running out of OS AIO descriptors, by releasing them when not they’re not needed, but as that Metalink note mentioned, the resize apparently sucks on Linux. Perhaps that’s why other ports also suffer and have seen the same wait event.



The “asynch descriptor resize” event itself is really an IO wait event (recorded in the wait class Other though), waiting for reaping outstanding IOs. Once this wait is over, then the OS call to change the amount of asynch IO descriptors (allocated to that process) is made. There’s no wait event recorded for the actual “resize” OS call as it shouldn’t block.



So, the more direct IO you do, especially when sorting/hashing to temp with frequent workarea closing/opening, the more of this event you’ll see (and it’s probably the same for regular tablespace direct path IO too).



This problem wouldn’t be noticeable if Oracle kept async io descriptors cached and wouldn’t constantly allocated/free them. Of course then you may end up running out of aio descriptors in the whole server easier. Also I don’t know whether there would be some OS issues with reusing cached aio descriptors, perhaps there is a good reason why such caching isn’t done.



Nevertheless, what’s causing this wait event is too frequent aio descriptor resize due to changes in direct IO slot count (due to changes in PGA workarea/temp segment and perhaps when doing frequent direct path scans through lots of tables/partitions too).



So, the obvious question here is what to do about this wait event? Well, first you should check how big part of your total response time this event takes at all?



  1. If it’s someting like 1% of your response time, then this is not your problem anyway and troubleshooting this further would be not practical – it’s just how Oracle works :)
  2. If it’s something like 20% or more of your response time, then it’s clearly a problem and you’d need to talk to Oracle Support to sort out the bug
  3. If it’s anything in between, make sure you don’t have an IO problem first, before telling that this is a bug. In one recent example I saw direct path reads take over a second on average when this problem popped up. The asynch descriptor resize wait event may well disappear from the radar once you fix the root cause – slow IO (or SQL doing too much IO). Remember, the asynch descriptor resize wait event, at least on Linux, is actually an IO wait event, the process is waiting for outstanding IO completion before the descriptor count increase/decrease can take place.

Share