Subquery Selectivity – 2

Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.

We start with some simple SQL to create a test data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,729)		id1,
	rownum			id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(id1, id2);

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,737)		id1,
	trunc((rownum-1)/737)	id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t2 add constraint t2_pk primary key(id1, id2);

-- gather statistics, compute, no histograms
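A quick sanity check on the data shape (a sketch in Python rather than SQL): t2 holds 1,000,000 rows built with mod(rownum,737), so id1 has 737 distinct values and each value identifies roughly 1,357 rows – which is where the "Rows" estimate on the index range scan in the plans below comes from.

```python
# t2.id1 is mod(rownum,737) over 1,000,000 rows, so id1 has 737 distinct
# values and each value matches about the same number of rows.
rows_in_t2 = 1_000_000
distinct_id1 = 737

rows_per_id1 = rows_in_t2 / distinct_id1
print(round(rows_per_id1))   # 1357 -- the index range scan estimate in the plans
```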

Having created these data sets, we can run the following query against them. It’s a very simple query with an aggregate subquery. I’ve included a no_unnest hint because many newer versions of Oracle would do a cost-based transformation on something this simple and unnest the subquery, and I want to keep the code and numbers simple while demonstrating a principle. (In the earlier article I used a two-table join with two existence subqueries to make a point about how a change in arithmetic could cause a change in execution plan; in this article I’m just going to show you the change in arithmetic.)


select
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest */
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

Here are the execution plans from 9.2.0.8 and 11.1.0.6 respectively (the 10.2 plan matches the 11g plan). For consistency I am not using CPU costing (system statistics) in either case:


Execution Plan from 9.2.0.8
-----------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             | 10000 |  1201K|    13M|
|*  1 |  FILTER                       |             |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1          | 10000 |  1201K|  2712 |
|   3 |   SORT AGGREGATE              |             |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2          |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK       |  1357 |       |     6 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("T2"."N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

Execution Plan from 11.1.0.6 (matches plan from 10.2.0.3)
-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    14 |  1722 |   996K|
|*  1 |  FILTER                       |       |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1    |  1000K|   117M|  2712 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

You will notice, of course, that the estimated cardinality of the result (line 0) is different, and the estimated cardinality of the scan on t1 (line 2) is different. There was an important code change introduced in 10g relating to the handling of subqueries that explains this effect.

In the previous article I pointed out that an existence subquery introduces a selectivity of 5% – and that the upgrade to 10g changed the point in the optimizer’s arithmetic where this 5% was applied (hence where the plan showed a change in cardinality). In the example above my code is of the form “= (subquery)”, which means Oracle applies a “magic” 1% rather than 5%. (For a range-based predicate – i.e. <, >, etc. – it would have been the 5% that we got for existence.)

So in the 9i plan, line 2 shows a cardinality of 10,000 because that’s 1% of the original 1,000,000 rows but, as I pointed out in the earlier article, this means the optimizer will be working out the rest of the plan based on the estimated effect of a filter subquery that has not yet run. (Do read the previous article if you don’t follow that last comment.)

From 10g onwards (and in our 11.1 plan) the cardinality for line 2 is 1,000,000 because that’s how many rows will appear in the rowsource as the table is scanned. But then you have to ask why Oracle didn’t print a cardinality on line 1 (the FILTER line) because it’s at that line that we should see a cardinality of 10,000 as the filter selectivity of 1% is applied.

But there is another problem. The final cardinality of the 11.1 plan is NOT reported as 10,000; it’s reported as 14. As Oracle has fixed one defect in the optimizer it has introduced another (whether by accident or by design, I don’t know). The estimate of 14 rows is effectively double-counting. The optimizer has applied the 1% selectivity for the subquery, and then applied the selectivity of the predicate “t1.id1 = {unknown value}”. (If the predicate had been “> (subquery)” or some other range-based operator, or existence, then the arithmetic would have applied a 5% selectivity on top of that “t1.id1 = {unknown value}” selectivity.)

In this case, of course, it probably doesn’t matter that the final cardinality is wrong – but in more complex cases this error may be introduced somewhere in the middle of the plan, making the optimizer do something very silly from that point onwards.
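The arithmetic behind the two final estimates can be reproduced in a few lines (a Python sketch; the 1% "magic" guess and the 729 distinct values come straight from the test data above):

```python
# 9i applies only the magic 1% for "= (subquery)"; from 10g the optimizer
# applies the 1% AND the selectivity of "t1.id1 = {unknown value}" --
# t1.id1 is mod(rownum,729), so it has 729 distinct values.
num_rows = 1_000_000
subquery_sel = 0.01      # magic 1% for an equality against a subquery
id1_sel = 1 / 729        # selectivity of "t1.id1 = {unknown value}"

print(round(num_rows * subquery_sel))            # 10000 -- the 9i final estimate
print(round(num_rows * subquery_sel * id1_sel))  # 14    -- the double-counted 11g estimate
```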

There is a surprising “workaround” to this issue – which may not be relevant in all cases and may not even work in some. If you force Oracle into running the subquery early (with the push_subq hint) the double-counting doesn’t occur:

select
	/*+ push_subq */	-- 9i hint
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest push_subq */	-- 10g+ hint (if you don't use query block naming)
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       | 10000 |  1201K|  4075 |
|*  1 |  TABLE ACCESS FULL            | T1    | 10000 |  1201K|  2712 |
|   2 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  4 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ PUSH_SUBQ NO_UNNEST */ MAX("N1") FROM "T2"
              "T2" WHERE "T2"."ID1"=:B1)=1)
   4 - access("T2"."ID1"=:B1)

It strikes me as a little bizarre that this hint has any effect at all in this case since there is no “earlier point” at which the subquery can run. But it has pushed the optimizer through a different code path, which has produced a plan with a different shape and avoided the double-counting.

Footnote: I’ve just included in the final query text a reminder that the push_subq hint changed in the upgrade from 9i to 10g. Originally it went at the top of the query and caused every subquery to be pushed; from 10g you have to be selective about the subqueries you want pushed. A hint at the top of the query will be “ignored”; instead you apply the hint to each subquery you want pushed, either by writing it in the subquery (as I have here) or by naming each subquery with the qb_name hint and then putting push_subq(@{your subquery name}) hints at the top of the query, naming the subqueries to be pushed.

Fake Baselines

SQL Baselines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.

We start with a sample data set and a query that is going to do “the wrong thing”. As usual I’ve got a locally managed tablespace with 8KB blocks and 1MB uniform extents, freelist management, and I’m running with CPU Costing disabled (and running 11.1.0.6 in this case):


set serveroutput off

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n1000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create index t1_n1 on t1(n1000);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/
select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

The final query is the one that I want to fake a baseline for. With my current set up it does an index range scan to pick up 10 rows, but I’m going to make it do a tablescan instead. I’m going to need to pull the exact text of the query from memory in a moment, so I’ll find its sql_id and child_number by searching for the “pseudo-hint” that I’ve included in the text, and I’ll report the execution plan to show that I’ve picked up the right thing (I’m assuming that there’s only one piece of SQL that’s going to include the text “target_query”, of course):

column	sql_id			new_value	m_sql_id_1
column 	plan_hash_value		new_value	m_plan_hash_value_1
column	child_number		new_value	m_child_number_1

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%target_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',&m_child_number_1));

SQL_ID  306m98cpu9yz7, child number 0
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 1420382924

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |    11 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |    80 |    11 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |    10 |       |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1000"=500)

The next step is to create a new query (and in a production system that might simply mean running a heavily hinted version of the target query) that uses the execution plan I want to see; and I’ll use the same search technique to find it and report its plan:

select
	/*+ full(t1) alternate_query */
	id
from
	t1
where
	n1000 = 500
;

column	sql_id			new_value	m_sql_id_2
column 	plan_hash_value		new_value	m_plan_hash_value_2
column	child_number		new_value	m_child_number_2

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%alternate_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_2',&m_child_number_2));

SQL_ID  bvpb73bb6c6uy, child number 0
-------------------------------------
select  /*+ full(t1) alternate_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Having run and reported these queries and their plans, I’ve captured the sql_id, child_number, and plan_hash_value for each query; and this is more than enough information to create a SQL Baseline for one query using the execution plan for the other query.

declare
	m_clob	clob;
begin
	select
		sql_fulltext
	into
		m_clob
	from
		v$sql
	where
		sql_id = '&m_sql_id_1'
	and	child_number = &m_child_number_1
	;

	dbms_output.put_line(m_clob);

	dbms_output.put_line(
		dbms_spm.load_plans_from_cursor_cache(
			sql_id 			=> '&m_sql_id_2',
			plan_hash_value		=> &m_plan_hash_value_2,
			sql_text		=> m_clob,
			fixed			=> 'NO',
			enabled			=> 'YES'
		)
	);

end;
/

I used the sql_id and child_number from the first query to get the full SQL text of that query into an in-memory CLOB, then used the sql_id and plan_hash_value from the second query to associate the second query’s plan with the first query’s text – storing the result as a SQL Baseline that is enabled and ready for use.

You’ll have to take my word that I haven’t faked the following text, but this is what I get when I re-run the original query (flushing the shared pool first to make sure that I don’t accidentally end up picking up the original child cursor):

alter system flush shared_pool;

select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',null));

SQL_ID  306m98cpu9yz7, child number 1
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Note
-----
   - cpu costing is off (consider enabling it)
   - SQL plan baseline SYS_SQL_PLAN_3c0ea7f3dbd90e8e used for this statement

It’s so much easier than the fiddling one had to do for stored outlines – which was quite easy in 8i, but got a little nasty in 9i because of the extra (undocumented) details that looked as if they might be necessary when the third table appeared in the outln schema. In 10g, however, the dbms_outln package was enhanced to allow you to create outlines from the library cache – see this note from Randolf Geist for more details – but remember that stored outlines will be deprecated in Oracle 12.

Footnote: the dbms_spm package is created by script $ORACLE_HOME/rdbms/admin/dbmsspm.sql, and there is a comment near the top saying:


--           Grant the execute privilege on SPM package to public.           --
--           But ADMINISTER SQL MANAGEMENT OBJECT privilege is checked       --
--           before a user is allowed to execute.                            --

SQL Plan Baselines

Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.

Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.

Maria Colgan of Oracle Corp.

Christian Antognini:

Kerry Osborne

Tim Hall

Jason Arneil

Tracking Down Windows Memory Leaks

This is just a small windows O/S related note, covering how to track down memory usage of a specific windows service. This all happened on Windows Vista. The same steps are likely to work perfectly well on Windows 7 and both flavours of Server 2008. I’d expect them to also work on earlier versions of [...]

Join Surprise

Imagine I have a simple SQL statement with a “where clause” that looks like this:


	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2

Would you expect it to run several times faster (25 minutes instead of a few hours) when the only change you made was to swap the order of the join predicates to:


	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1

You may recall that a couple of years ago I wrote about some bugs in the optimizer, and pointed you to a blog article by Alberto Dell’Era that demonstrated an anomaly in cardinality calculations that made this type of thing possible. But here’s an example which has nothing to do with cardinality errors. We start with a suitable dataset – running on 11.1.0.6.


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	trunc(dbms_random.value(1,1000))	id1,
	trunc(dbms_random.value(1,1000))	id2,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 7
)
select
	t1.id1,
	t1.id2,
	v1.id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',70)		padding
from
	t1		t1,
	generator	v1
;

-- collect stats, compute, no histograms

This data set models a problem – stripped to the bare essentials – that I came across at a client site some time ago. We have a “parent/child” relationship between the tables (although I haven’t declared the referential integrity), with roughly seven child rows per parent. The parent rows are quite long, the child rows are quite short. Some parents may not have children (although in this data set they do).
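A trivial cross-check of the data volumes (Python): each of the 10,000 parent rows is multiplied by the seven-row generator, which is where the 70,000-row estimate for T2 in the plans below comes from.

```python
parents = 10_000           # rows in t1
children_per_parent = 7    # generator limited to rownum <= 7

rows_in_t2 = parents * children_per_parent
print(rows_in_t2)          # 70000 -- the T2 row count reported in the plans
```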

We now run a “report” that generates data for a number-crunching tool that extracts all the data from the tables – using an outer join so that parent rows don’t get lost. For various reasons the tool wanted the data sorted in a certain order – so there’s also an order by clause in the query. I’m going to show you the original query – first unhinted, and then hinted to use a merge join:


select
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      | 10000 |    10M|       |  3720   (1)| 00:00:45 |
|   1 |  SORT ORDER BY         |      | 10000 |    10M|    22M|  3720   (1)| 00:00:45 |
|*  2 |   HASH JOIN RIGHT OUTER|      | 10000 |    10M|  6224K|  1436   (1)| 00:00:18 |
|   3 |    TABLE ACCESS FULL   | T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
|   4 |    TABLE ACCESS FULL   | T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 10000 |    10M|       |  6343   (1)| 00:01:17 |
|   1 |  SORT ORDER BY       |      | 10000 |    10M|    22M|  6343   (1)| 00:01:17 |
|   2 |   MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   3 |    SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   4 |     TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  5 |    SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   6 |     TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")
       filter("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")

But there’s something a little odd about how the optimizer has chosen to do the merge join. Although our join condition references the join columns in the order (id1, id2) our final sort order is on (id2, id1) – and the optimizer hasn’t taken advantage of the fact that it could do the “sort join” operations in the order (id2, id1) and avoid the final “sort order by” at line 1.

So let’s rewrite the query to make the order of the join predicates match the order of the order by clause, and see what happens to the plan:


select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1
order by
	t1.id2,
	t1.id1
;

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   1 |  MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   2 |   SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  4 |   SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   5 |    TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")
       filter("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

The plan no longer has the final “sort order by” operation – and the cost of the plan is much lower as a consequence. You’ll also notice that the predicate sections (always check the predicate section) are a little different – the order of evaluation has been reversed.

In my test case the cost of the merge join still hasn’t fallen below the cost of the hash join – but in the case of the client changing the order of predicates – without adding any hints – made the cost of the merge join much cheaper than the cost of the hash join. Fortunately this was a case where the cost was a realistic indication of run time and avoiding a sort operation of some 35GB of join result was a very good move.

So watch out – with multi-column joins, the order of the join predicates can make a big difference to the way Oracle operates a merge join.
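The principle is easy to model outside Oracle. Here’s a toy sort-merge outer join (a Python sketch of the idea, not of Oracle’s implementation): when the sort-join key matches the order by clause, the join output emerges already sorted and no final sort is needed.

```python
# Toy sort-merge outer join -- a sketch of the principle, not Oracle's code.
t1 = [(3, 1, 'p3'), (1, 2, 'p1'), (2, 2, 'p2')]   # parent rows: (id1, id2, padding)
t2 = [(1, 2, 'c1'), (1, 2, 'c2')]                 # child rows of parent (1, 2)

def sort_merge_outer(left, right, key):
    # The two SORT JOIN operations: sort both inputs on the join key.
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    for l in left:                                 # simplified probe (no cursors)
        matches = [r for r in right if key(r) == key(l)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))                  # outer join: keep childless parents
    return out

# Key the join on (id2, id1) -- the same order as the ORDER BY clause --
# and the output is already in the required order: no final sort needed.
rows = sort_merge_outer(t1, t2, key=lambda r: (r[1], r[0]))
order_by_cols = [(l[1], l[0]) for (l, _r) in rows]
print(order_by_cols == sorted(order_by_cols))      # True
```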

Index ITL fix

Over the last year I’ve written a short collection of articles describing how a defect in the code for handling index leaf node splits can cause a significant waste of space in indexes that are subject to a high degree of concurrent inserts. Finally I can announce some good news – Oracle has a patch for the code.

The problem is described in MOS (the database formerly known as Metalink) under bug 8767925, which is reported as “fixed in version 12.1”.

Backports may become available – I’ve already asked for one for 11.1.0.7 on AIX for one of my clients (but it has been a little slow in arriving) and another client should have put in a bid for a backport to 10.2.0.4 (also on AIX) in the last couple of days.

[Further reading on Index ITL Explosion]

Geek Stuff – 2

This is a post specially for Junping Zhang, who has been waiting patiently for a follow-up to my geek post about finding the system-level setting for the parameter use_stored_outlines. His question was: “Is it also possible to find the session-level setting?”

The answer is yes – but it’s harder, version specific, requires a little preparation, and only allows you to see the setting for your own session.

The biggest problem is that the session-level setting is stored in the session’s “fixed global area”, and Oracle Corp. has not exposed a map of this memory as an X$ structure – so we have to find a way of mapping it ourselves, and the map will change with version and platform.

We start by using oradebug to dump a copy of the session global area to a trace file (an operation that requires a fairly high level of privilege on your system), then searching the trace file for the string “uso”. Having done this once, you find that the exact name of the variable you want is ugauso_p, and you can find its address.

Once you know the exact name of the variable you’re after you can dump its details to screen rather than to a trace file – the following is a cut-n-paste from a 10.2.0.3 session (with the first few lines of the trace file dump inserted into the middle of the output):


SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump global_area 4
Statement processed.

	********************************
	----- Dump of the Fixed UGA -----
	ub4 ksmugmg1 [91173FC, 9117400) = 0000ABAB
	kghds * ksmughpp [9117400, 9117404) = 09117404
	ksmpy ksmuh_p [9117404, 9117520) = 05E1F6D0 0000FFB8 21595A04 091E0048 ...
	... etc. ...
	********************************

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
SQL> alter session set use_stored_outlines=ABCDEFG;

Session altered.

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000001 42410007 46454443 00000047 00000000 00000000 00000000 00000000 00000000
SQL>

The numbers in the brackets [9117A44, 9117A68) give you the address of the variable in your session’s virtual memory space – so if you can find this pair of addresses from SQL or PL/SQL you can write some code to read x$ksmmem to get the current value for the variable. One way of working out the addresses is to query x$ksmpp – the list of pga memory pointers.


SQL> select * from x$ksmpp;

ADDR           INDX    INST_ID KSMCHCOM         KSMCHPTR   KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR
-------- ---------- ---------- ---------------- -------- ---------- -------- ---------- --------
091E8AA4          0          1 kpuinit env han  08EEAB80       1584 freeabl           0 00
091E8A6C          1          1 Alloc environm   08EE9B50       4144 recr           4095 08EEB108
...
091E7F0C         53          1 Fixed Uga        091173F0      20620 freeabl           0 00
...
091E7DBC         59          1 PLS non-lib hp   091CBFB0       8224 freeabl           0 08F9E02C
091E7D84         60          1 free memory      091C0010      49056 free              0 00

61 rows selected.

SQL> spool off

The item we are interested in is the “Fixed Uga” which (for this session, on this machine, on this operating system, on this version of Oracle, after this startup of the instance, etc. etc. etc.) is based at 0x091173F0. Although there are many reasons why the location of the fixed uga isn’t fixed in absolute terms, we know that for a given platform and version the contents of the fixed uga really are “fixed”. So we can work out the offset of ugauso_p for this platform and version by subtracting the address of the start of the fixed uga from the address we got from oradebug for ugauso_p, viz: 0x9117A44 - 0x091173F0 = 0x0654 = 1,620.
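The subtraction is easy to get wrong by hand, so here it is spelled out (Python, using the two addresses reported above):

```python
fixed_uga_base = 0x091173F0   # "Fixed Uga" KSMCHPTR reported by x$ksmpp
ugauso_addr    = 0x09117A44   # start address reported by oradebug dumpvar

offset = ugauso_addr - fixed_uga_base
print(hex(offset), offset)    # 0x654 1620
```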

Once we have the offset we can write a pl/sql function to find the start of the fixed uga, jump to the offset, then walk through virtual memory picking up the bytes we need. Basically it’s the same code that I used for the system-level parameter, with a change in how to find the starting address of the value.

create or replace function session_uso
return varchar2
as
	uso		varchar2(30);
	mem_addr	x$ksmmem.addr%type := hextoraw('0E9A7838');
	mem_ct		number;
	mem_indx	number;
	raw_val		x$ksmmem.ksmmmval%type;
	indx_temp	number;
	col_size	number;
	raw_char	varchar2(64) := null;

begin
	select
		type_size
	into
		session_uso.col_size
	from
		v$type_size
	where
		type = 'SIZE_T'
	;

	select
		ksmchptr,
		ksmchsiz
	into
		session_uso.mem_addr,
		session_uso.mem_ct
	from
		x$ksmpp
	where
		ksmchcom = 'Fixed Uga'
	;

	select
		indx
	into
		session_uso.mem_indx
	from
		x$ksmmem
	where
		addr = session_uso.mem_addr
	;

	dbms_output.put_line('Memory address: ' || rawtohex(session_uso.mem_addr));
	dbms_output.put_line('Start location: ' || session_uso.mem_indx);
	dbms_output.put_line('Nominal Length: ' || session_uso.mem_ct);

--
--	Assume that the 11.1.0.6 offset to USO is 2504 bytes
--	And that the first t_type is a counter, then each
--	value after that is a letter until we hit 00 for a
--	maximum of 32 letters
--

	mem_indx := mem_indx + 2504/col_size;		-- 11.1.0.6 on Windows XP 32-bit
--	mem_indx := mem_indx + 1620/col_size;		-- 10.2.0.3 on windows XP 32 bit

	for i in 1 .. 32 loop

--		Have to do the silly addition, or
--		we get a "tablescan"  of x$ksmmem

		session_uso.indx_temp := session_uso.mem_indx + i;

		select
			--+ index(x$ksmmem)
			ksmmmval
		into
			session_uso.raw_val
		from
			x$ksmmem
		where
			indx = session_uso.indx_temp
		;

		dbms_output.put_line(session_uso.raw_val);
		exit when session_uso.raw_val = hextoraw('00');

		raw_char :=
			rawtohex(session_uso.raw_val) ||
			rawtohex(session_uso.raw_char);

	end loop;

	if session_uso.raw_char is null then
		session_uso.uso := 'use_stored_outlines is not set';
	else
		for i in reverse 1..(length(session_uso.raw_char)-4) / 2 loop
			exit when substr(session_uso.raw_char, 2*i - 1, 2) = '00';
			session_uso.uso :=
				session_uso.uso ||
				chr(to_number(substr(session_uso.raw_char, 2*i - 1, 2), 'XX'));
		end loop;
	end if;
	return	session_uso.uso;

end;
/

Just as a quick demonstration – here’s a cut-n-paste (including the debug output) from a normal session after creating the function in the sys account of a 10.2.0.3 database and making the function publicly available through a synonym:

SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
00
use_stored_outlines is not set

PL/SQL procedure successfully completed.

SQL> alter session set use_stored_outlines = long_category;

Session altered.

SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
4F4C000D
435F474E
47455441
0059524F
00
LONG_CATEGORY

PL/SQL procedure successfully completed.

SQL>
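To see why those four 32-bit words spell “LONG_CATEGORY”: the dump is little-endian, the first two bytes hold the string length, and the characters follow. A Python sketch of the decoding (for the 10.2 “ABCDEFG” dump earlier, skip the leading 00000001 flag word):

```python
import struct

def decode_uso(words):
    """Decode a ugauso_p dump: little-endian 32-bit words holding a
    2-byte length followed by the parameter value's characters."""
    data = b''.join(struct.pack('<I', int(w, 16)) for w in words)
    length = struct.unpack('<H', data[:2])[0]
    return data[2:2 + length].decode('ascii')

print(decode_uso(['4F4C000D', '435F474E', '47455441', '0059524F']))  # LONG_CATEGORY
print(decode_uso(['42410007', '46454443', '00000047']))              # ABCDEFG
```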

Asynch descriptor resize wait event in Oracle

A lot of people have started seeing the “asynch descriptor resize” wait event in Oracle 11gR2. Here’s my understanding of what it is. Note that I didn’t spend too much time researching it, so some details may not be completely accurate, but my explanation will at least give you an idea of why the heck you suddenly see this event in your database.

FYI, there’s a short, but incomplete explanation of this wait event also documented in MOS Note 1081977.1



The “direct path loader” (KCBL) module is used for performing direct path IO in Oracle, such as direct path segment scans and reading/writing spilled over workareas in temporary tablespace. Direct path IO is used whenever you see “direct path read/write*” wait events reported in your session. This means that IOs aren’t done from/to buffer cache, but from/to PGA directly, bypassing the buffer cache.



This KCBL module tries to dynamically scale up the number of asynch IO descriptors (AIO descriptors are the OS kernel structures, which keep track of asynch IO requests) to match the number of direct path IO slots a process uses. In other words, if the PGA workarea and/or spilled-over hash area in temp tablespace gets larger, Oracle also scales up the number of direct IO slots. Direct IO slots are PGA memory structures helping to do direct IO between files and PGA.



In order to be able to perform this direct IO asynchronously, Oracle also dynamically scales up the number of OS asynch IO descriptors, one for each slot (up to 4096 descriptors per process). When Oracle doesn’t need the direct IO slots anymore (when the direct path table scan has ended or a workarea/tempseg gets cancelled) then it scales down the number of direct IO slots and asynch IO descriptors. Scaling asynch IO descriptors up/down requires issuing syscalls to OS (as the AIO descriptors are OS kernel structures).



I guess this is supposed to be an optimization, to avoid running out of OS AIO descriptors by releasing them when they’re not needed, but as that Metalink note mentioned, the resize apparently sucks on Linux. Perhaps that’s why other ports have also seen the same wait event.



The “asynch descriptor resize” event itself is really an IO wait event (recorded in the wait class Other though), waiting for reaping outstanding IOs. Once this wait is over, then the OS call to change the amount of asynch IO descriptors (allocated to that process) is made. There’s no wait event recorded for the actual “resize” OS call as it shouldn’t block.



So, the more direct IO you do, especially when sorting/hashing to temp with frequent workarea closing/opening, the more of this event you’ll see (and it’s probably the same for regular tablespace direct path IO too).



This problem wouldn’t be noticeable if Oracle kept asynch IO descriptors cached and didn’t constantly allocate/free them. Of course, then you might run out of AIO descriptors across the whole server more easily. Also, I don’t know whether there would be some OS issues with reusing cached AIO descriptors; perhaps there is a good reason why such caching isn’t done.
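To see why caching would help, here’s a toy Python model (purely my own illustration of the trade-off, nothing like Oracle’s actual implementation) comparing the number of alloc/free syscalls when the direct IO slot count bounces up and down, as it does when workareas open and close repeatedly:

```python
def resize_syscalls(slot_counts, cached=False):
    """Count hypothetical descriptor alloc/free syscalls for a sequence
    of direct-IO slot counts. Without caching, every change in the slot
    count triggers a resize syscall; with caching, descriptors are only
    allocated when the high-water mark grows and are never freed."""
    syscalls = 0
    current = 0
    hwm = 0
    for n in slot_counts:
        if cached:
            if n > hwm:
                syscalls += 1      # grow the cached pool once
                hwm = n
        else:
            if n != current:
                syscalls += 1      # resize on every up/down transition
        current = n
    return syscalls

# workareas repeatedly opening (32 slots) and closing (0 slots)
pattern = [32, 0] * 10
print(resize_syscalls(pattern))                # 20 resizes
print(resize_syscalls(pattern, cached=True))   # 1 allocation
```

The downside of the cached variant is visible too: the pool never shrinks, which is exactly the “running out of AIO descriptors in the whole server” risk mentioned above.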



Nevertheless, what’s causing this wait event is too frequent aio descriptor resize due to changes in direct IO slot count (due to changes in PGA workarea/temp segment and perhaps when doing frequent direct path scans through lots of tables/partitions too).



So, the obvious question here is what to do about this wait event? Well, first you should check how big a part of your total response time this event accounts for:



  1. If it’s something like 1% of your response time, then this is not your problem anyway and troubleshooting it further would not be practical – it’s just how Oracle works :)
  2. If it’s something like 20% or more of your response time, then it’s clearly a problem and you’d need to talk to Oracle Support to sort out the bug
  3. If it’s anything in between, make sure you don’t have an IO problem first, before claiming that this is a bug. In one recent example I saw direct path reads take over a second on average when this problem popped up. The asynch descriptor resize wait event may well disappear from the radar once you fix the root cause – slow IO (or SQL doing too much IO). Remember, the asynch descriptor resize wait event, at least on Linux, is actually an IO wait event: the process is waiting for outstanding IO completion before the descriptor count increase/decrease can take place.
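The triage above can be sketched as a trivial helper (the thresholds are the rules of thumb from the list, not hard limits, and the function is hypothetical – you would feed it times taken from your session or AWR wait-event data):

```python
def assess_wait(event_time, total_response_time):
    """Rough triage of the 'asynch descriptor resize' share of response
    time, following the three buckets above."""
    pct = 100.0 * event_time / total_response_time
    if pct <= 1:
        return "ignore: not your problem"
    if pct >= 20:
        return "clearly a problem: involve Oracle Support"
    return "check IO performance first before calling it a bug"

print(assess_wait(0.5, 100))   # well under 1% of response time
print(assess_wait(30, 100))    # 30% of response time
print(assess_wait(8, 100))     # somewhere in between
```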


Quiz Night

Apart from the fact that the “Rows” figure for the FILTER operation at line 6 is blank, what’s the obvious error in this extract from an execution plan:

-------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  |
-------------------------------------------------------------------------
        ...
|   5 |      NESTED LOOPS                    |                  |  3864 |
|   6 |       FILTER                         |                  |       |
|   7 |        HASH JOIN OUTER               |                  |  3864 |
|   8 |         HASH JOIN OUTER              |                  |   282K|
|   9 |          TABLE ACCESS BY INDEX ROWID | PRODUCT          |   282K|
|  10 |           INDEX RANGE SCAN           | PRD_SUPP_I1      |   282K|
|  11 |          VIEW                        |                  |  2293K|
|  12 |           HASH GROUP BY              |                  |  2293K|
|  13 |            PARTITION LIST SINGLE     |                  |  5790K|
|  14 |             TABLE ACCESS FULL        | PRODUCT_PRICING  |  5790K|
|  15 |         VIEW                         |                  |  2307K|
|  16 |          HASH GROUP BY               |                  |  2307K|
|  17 |           PARTITION LIST SINGLE      |                  |  5703K|
|  18 |            TABLE ACCESS FULL         | PRODUCT_PRICING  |  5703K|
        ...
-------------------------------------------------------------------------

Update 21/Nov/2010:
Once again I am reminded of two things – it’s important to be precise in your use of language if you want people to understand the question; and you can see a lot if you look carefully.

If you start to think about the activity that the plan represents, and the SQL that might have produced it, there are some ideas you might get about re-writing the query to be more efficient – but the point I was trying to make is that there is clearly an error in the content that the optimizer is displaying. The error suggests either that the optimizer has done the wrong arithmetic, or that the output is not a correct copy of the results produced by the optimizer.

The answer I was expecting comes from line 7. Stripping the error back to the bare minimum we see this:

--------------------------------------------------------------------------
| Id  | Operation                             | Name             | Rows  |
--------------------------------------------------------------------------
        ...
|   7 |        HASH JOIN OUTER                |                  |  3864 |
|   8 |         rowsource 1 (HASH JOIN OUTER) |                  |   282K|
|  15 |         rowsource 2 (VIEW)            |                  |  2307K|
        ...
--------------------------------------------------------------------------

As Milo points out in comment 3, in an outer join the result set cannot have fewer rows than the “preserved” rowsource (which, in this case, is the result set from line 8). I mentioned the fact that the “Rows” figure for the FILTER operation at line 6 was blank – it’s just possible that the optimizer has overwritten the figure in line 7 with the figure that should have been in line 6; there are cases where a FILTER operation and the operation you would normally think of as its first child are combined, so it’s possible that a little storage glitch has appeared in some cases where the combination rule doesn’t apply.
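The outer-join invariant is easy to convince yourself of with a tiny sketch (a minimal nested-loop left outer join in Python, just to illustrate the row-count rule – nothing to do with how Oracle executes a HASH JOIN OUTER):

```python
def left_outer_join(left, right, key=lambda row: row[0]):
    """Minimal left outer join: every row of the preserved (left)
    rowsource appears at least once, so the result can never have
    fewer rows than the left input."""
    out = []
    for l in left:
        matches = [r for r in right if key(r) == key(l)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))   # preserved row with no match
    return out

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (1, "y")]
result = left_outer_join(left, right)
assert len(result) >= len(left)     # the invariant the plan violates
print(len(result))                  # 4 rows from 3 preserved rows
```

In the plan extract, line 7 claims 3,864 rows from an outer join whose preserved input (line 8) supplies 282K rows – which this invariant says is impossible.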

Someone did mention the FILTER operation and pointed out that it wasn’t filtering any data. The commonest forms of FILTER operation essentially check that some predicate is true for each row in their first child rowsource – and it is possible for someone to write code that has a filter that doesn’t eliminate any rows. In fact, though, this plan is probably saying: “line 7 will produce 282K rows, and the filter at line 6 will reduce that to 3,864.” (There’s also a comment about a “group by” not reducing the size of the rowsource – the comment was caused by a parallax error, but it is possible, of course, for Oracle to decide that a “group by” is going to produce an output with just as many rows as the input.)

Sean Molloy’s opening comment asks how you can get two different estimates from the same tablescan — and follows up with one answer, which is that since we are looking at PARTITION LIST SINGLE the two tablescans could be of different partitions. But it would only be a puzzle if there were no predicates on the tablescans and, as Pavol points out in comment 7, there are no “star flags” in the Id column to suggest the presence of any predicates – but there are no stars anywhere, and there have to be some predicates in the plan, since you can’t do a hash join, index range scan, or filter without a predicate. As Timur points out, you don’t get the predicate section in the report from dbms_xplan.display_awr(), so you don’t get the stars.

Speaking of missing information, Dave Costa in comment 4 suggests that the user made an error in choosing which bits of the plan to copy. I did the choosing – how could it possibly be wrong! Given the number of times I’ve said “you must include the predicate section”, why isn’t it there? (answer: it’s display_awr). In fact lines 5 and 6 were redundant as far as the “obvious problem” was concerned – but I thought that the blank entry and the reappearance of the same cardinality might be a helpful visual clue.

The SQL:

Several people have commented on the rationale for code that does two outer join aggregations on the same table. It does look a little unusual, but it’s not possible to come to any conclusion about whether it’s a good thing or a bad thing without knowing the data and the intent of the SQL. For example the intent could be something like:


select
        product_name, min(offer_price), max(offer_price) ...
from

(You can assume that in the actual code, the min() and max() would be hidden inside a couple of inline views)
In this case code which visits the product_pricing table twice might be a very good idea – because there is a special “index (min/max) range scan” optimisation that works very well if (a) you have the right indexes in place, (b) you use a nested loop, and (c) you only ask for min, or max, but not both. Perhaps our problem is that the optimizer is doing a hash join when it should be doing a nested loop.

Another possibility is that we have something like:


select
        product_name, min(gbp_offer_price), min(usd_offer_price) ...
from

(Again you can assume the min() and max() would be hidden inside inline views, and the different column names would be derived names rather than coming from two separate columns in a table).
Notice that the plan shows list partitioning – maybe we have one currency per partition, and we’ve written the query to maximise the benefits of partition elimination (as well as leaving the data nicely normalised, thus maximising efficiency of maintenance).

Bottom line on the SQL – in this case I wasn’t asking people to guess what was wrong with the code; but it’s very interesting to see how many different topics of thought can come out from a starting point of a few lines extracted from a plan.

Autotrace oddity

I got an unexpected error message from SQL*Plus a couple of days ago while trying to enable autotrace with statistics:

SQL> set autotrace  traceonly statistics;
ERROR:
ORA-24315: illegal attribute type

SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL>

This was a little puzzling, but since I get through a lot of different sites, databases, and instances, I did a quick check to make sure that the plustrace role had been created on this system and that the schema I was using to log on had been granted the role. Everything looked okay … except for this error message. So I reconnected (almost as good as rebooting as a method for solving problems) and got the following error message:

SQL> connect XXXXXXXX/XXXXXXXX@XXXXXXXX       -- no names, no pack drill
ERROR:
ORA-28002: the password will expire within 7 days

Problem solved. When you enable autotrace statistics SQL*Plus tries to create a second session through your current process (which is one of the reasons why v$session can end up with more rows than v$process). By an unfortunate coincidence of timing, my password had reached the expiry warning point while my initial session was logged on, so the recursive session got error message ORA-28002 when it tried to connect, and SQL*Plus reported this back in the only way that seemed appropriate.

Once I’d worked out what the problem was, all I did was change my password, end all my active sessions, and then log on again. If you want a demonstration of the effect, here’s a simple demo of a very similar problem (cut-n-paste from an SQL*Plus session connected to an 11gR1 database – using schema test_user, when the password wasn’t “fred”):

SQL> alter user test_user identified by fred;

User altered.

SQL> set autotrace traceonly statistics;
ERROR:
ORA-01017: invalid username/password; logon denied

SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL>