
Oakies Blog Aggregator

Add Constraint

Here’s a quirky little detail that may make you think carefully about how you define and load large tables.
I have a large table which I load with data and then apply the following:

alter table t_15400 modify (id not null, small_vc not null);

Would you really expect to find Oracle doing two tablescans on the table to enable these constraints ? This is what I found in a trace file (with a lot of db file scattered read waits and other stuff in between) when I ran the test recently on 11.2.0.3:

select /*+ all_rows ordered */ A.rowid, :1, :2, :3 from "SYS"."T_15400" A where( "ID" is null)
select /*+ all_rows ordered */ A.rowid, :1, :2, :3 from "SYS"."T_15400" A where( "SMALL_VC" is null)

It’s just a little difficult to come up with a good reason for this approach, rather than a single statement that validates both constraints at once.

Somewhere I think I’ve published a note that points out that when you add a primary key constraint Oracle first checks that the key column (or column set) is not null – which means that adding a primary key may also result in a tablescan for every column in the index before the index is created – but in that case you can’t see the SQL that checks each column, you have to infer the check from the number of tablescans and number of rows fetched by tablescan. The trace file is rather more helpful if all you’re doing is adding the not null constraints.
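For comparison, the single-statement approach the author has in mind is easy to sketch outside the database. This Python toy is purely illustrative (it has nothing to do with Oracle's internal implementation): it validates several NOT NULL constraints in one pass over the rows instead of one scan per column:

```python
# Toy illustration (not Oracle internals): checking several NOT NULL
# constraints in a single pass instead of one scan per column.
rows = [
    {"id": 1,    "small_vc": "a"},
    {"id": 2,    "small_vc": None},
    {"id": None, "small_vc": "c"},
]

def validate_not_null_single_pass(rows, columns):
    """Return the set of columns that contain a NULL, found in one scan."""
    violations = set()
    for row in rows:              # one pass over the "table"
        for col in columns:       # check every constrained column per row
            if row[col] is None:
                violations.add(col)
    return violations

print(validate_not_null_single_pass(rows, ["id", "small_vc"]))
```

Oracle's observed behaviour is the equivalent of calling this once per column, re-reading all the rows each time.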

Recent Oracle 8-Socket Xeon E7 TPC-C Result. Big NUMA Box, No NUMA Parameters.

I’ve read through the full disclosure report from Oracle’s January 2012 TPC-C result and found that it was obtained without using any NUMA init.ora parameters (e.g., _enable_NUMA_support). The storage was a collection of Sun x64 servers running COMSTAR to serve up F5100 flash storage. The storage connectivity was 8GFC fibre channel. This was a non-RAC result with an 8s80c160t Xeon E7. The only things that stand out to me are:

  1. The setting of disk_asynch_io=TRUE. This was ASM on raw disk, so I should think asynchronous I/O would be the default. Interesting.
  2. Overriding the default number of DBWR processes by setting db_writer_processes. The default number of DBWR processes would be 20, so the benchmark team increased that by 60%. Since sockets are NUMA “nodes” on this architecture, the default of 20 would render 2.5 DBWR per “node.” In my experience it is beneficial for the number of DBWR processes to be an even multiple of the number of sockets (NUMA nodes), so if the benchmark team was thinking the way I do, they went with 4x the socket count.
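The arithmetic behind point 2 is easy to check; the socket count and defaults below are the ones quoted above:

```python
# Sketch of the DBWR-per-NUMA-node arithmetic from the post.
sockets = 8                 # 8s80c160t Xeon E7: 8 sockets act as NUMA nodes
default_dbwr = 20           # default db_writer_processes on this box

per_node_default = default_dbwr / sockets      # 2.5 -> uneven spread per node
benchmark_dbwr = 4 * sockets                   # "4x socket count" = 32
increase_pct = (benchmark_dbwr - default_dbwr) / default_dbwr * 100

print(per_node_default, benchmark_dbwr, increase_pct)   # 2.5 32 60.0
```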

The FDR is here: http://c970058.r58.cf2.rackcdn.com/fdr/tpcc/Oracle_X4800-M2_TPCC_OL-UEK-FDR_011712.pdf

For more information about the missing _enable_NUMA_support parameter see: Meet _enable_NUMA_support: The if-then-else Oracle Database 11g Release 2 Initialization Parameter.

For a lot more about NUMA as it pertains to Oracle, please visit: QPI-Based Systems Related Topics (e.g., Nehalem EP/EX, Westmere EP, etc)

On the topic of increasing DBWR processes I’d like to point out that doing so isn’t one of those “some is good so more must be better” situations. For more reading on that matter I recommend:

Over-Configuring DBWR Processes Part I

Over-Configuring DBWR Processes Part II

Over-Configuring DBWR Processes Part III

Over-Configuring DBWR Processes Part IV


Got A Big NUMA Box For Running Oracle? Take Care To Get Interrupt Handling Spread Across The Sockets Evenly
Page 310 of the FDR shows the following script used to arrange good affinity between the FC HBA device drivers and the sockets. I had to do the same sort of thing with the x4800 (aka Exadata X2-8) back before I left Oracle’s Exadata development organization. This sort of thing is standard but I wanted to bring the concept to your attention:


#!/bin/bash
service irqbalance stop
last_node=-1
declare -i count=0
declare -i cpu cpu1 cpu2 cpu3 cpu4
for dir in /sys/bus/pci/drivers/qla2xxx/0000*
do
    node=`cat $dir/numa_node`
    irqs=`cat $dir/msi_irqs`
    if [ "`echo $irqs | wc -w`" != "2" ] ; then
        echo >&2 "script expects 2 interrupts per device"
        exit 1
    fi
    first_cpu=`sed 's/-.*//' < $dir/local_cpulist`
    echo $node $irqs $first_cpu $dir
done | sort | while read node irq1 irq2 cpu1 dir
do
    cpu2=$cpu1+10
    cpu3=$cpu1+80
    cpu4=$cpu1+90
    if [ "$node" != "$last_node" ]
    then
        count=1
        cpu=$cpu1
    else
        count=$count+1
        case $count in
            2) cpu=$cpu2;;
            3) cpu=$cpu3;;
            4) cpu=$cpu4;;
            *) echo >&2 "more devices than expected on node $node"; count=1; cpu=$cpu1;;
        esac
    fi
    last_node=$node
    echo "#$dir"
    echo "echo $cpu > /proc/irq/$irq1/smp_affinity_list"
    echo "echo $cpu > /proc/irq/$irq2/smp_affinity_list"
    echo
    echo $cpu > /proc/irq/$irq1/smp_affinity_list
    echo $cpu > /proc/irq/$irq2/smp_affinity_list
done
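For readers puzzled by the offsets, the CPU-selection logic of the script can be sketched in a few lines of Python. The offsets 0, 10, 80 and 90 are taken straight from the script; how they map onto cores and hyperthread siblings depends on the box's CPU numbering, so treat this purely as a model of the script's arithmetic:

```python
# Sketch of the CPU-selection logic in the script above: up to four devices
# on the same NUMA node are pinned to first_cpu, first_cpu+10, first_cpu+80
# and first_cpu+90 (offsets taken from the script; their exact meaning
# depends on the machine's CPU numbering).
def pick_cpu(first_cpu, device_index_on_node):
    offsets = [0, 10, 80, 90]
    if device_index_on_node >= len(offsets):
        raise ValueError("more devices than expected on node")
    return first_cpu + offsets[device_index_on_node]

# A node whose local_cpulist starts at CPU 0, hosting three HBA devices:
print([pick_cpu(0, i) for i in range(3)])   # [0, 10, 80]
```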

Filed under: oracle

Vote Peter Robson!

The UKOUG Council elections are in progress and I think User Groups are a critical part of the community. Peter Robson is both a friend and the type of person I admire and want to represent me within the UKOUG but, sadly, the grumpy old sod is possibly the least Web 2.0 chap I know so, concerned that his campaigning efforts might be hindered, I asked him if he'd like to make his pitch here. Over to you, Peter ....


Thanks to Doug for giving me this impromptu platform for a bit more shouting about my candidacy for the Council of the UK Oracle Users Group. It’s only right, as it was he who first persuaded me to stand for the Board (as it then was) all those years ago.

So if you are not a member of the UKOUG, I suppose you’d better stop reading now, unless I can persuade you to join up! My pitch here is, first of all, to persuade you, as a member of UKOUG, to at least vote, and secondly to convince you that I should be one of your five votes.

I have been actively involved in the UKOUG for about eight years now, both as a director of the original Board, and as an active member of the Scottish User Group committee. Now that I’m retired, I have the time to spend on these things, and my word, it can eat up one’s time, make no mistake! But it is so worthwhile. Time after time we have found that the one thing members value most is the opportunity to get together and talk about their work. Just ask Doug – he always turns up to the Birmingham Conference. Indeed, he is so obsessed with networking that he will even travel to San Francisco for Oracle OpenWorld! We can’t rival OOW, but our conferences and SIGs do provide the best opportunity for the Oracle crowd to network here in the UK.

Personally, I have been able to bring lots of experience to input towards the organisation of these events by virtue of having presented in most of the European User Group conferences, as well as a few in North America, not to mention the Chris Date seminars that I also organise across the UK and Europe. Close to my heart is the Scottish community, which languished for many years, but is now thriving. Our annual conference is June 13 – Tom Kyte is presenting – make a note and come up to Scotland!

You can read the usual huff and puff about me on the election pages, but now I want to say something about the sort of person that you should look to elect to the Council. Most of all we need passionate, committed individuals, and not people who think they might try it for the benefit of their CV. We need people with the time to devote to the business of UKOUG, people who REALLY have the time to give. This is so important, as not only do we have a large user base to offer a service to, but we have a large office staff of salaried people who depend on us for their livelihood. No way are the office staff coasting – they are all stretched, and give of their very best. Indeed, it has been an absolute pleasure to work with them over the past years. We can only do our very best for them, which is why I stress that we need committed members of Council with sufficient time to devote to the job. With such people, and the skills and expertise of the office staff, I see no reason why UKOUG cannot thrive and grow.

Thanks for reading this stuff – did I ask you to vote for me? Yeah, go on, I would really appreciate that!

Peter Robson

Join Views, ROWIDs And Query Transformations

Here is an odd little bug that was discussed a couple of weeks ago on the OTN forums.

It's about queries on join views by ROWID that fail with "ORA-01410: invalid ROWID" under certain circumstances. The bug can only be reproduced when using the 11.2 code base. In fact the same setup will cause an internal error in 11.1, but 10.2 will return correct results.

It's probably not a very common scenario but it is an interesting example of how features that work fine by themselves can cause problems when used together.

First of all (hopefully) some of you may ask: How is it possible to query from a join view by ROWID, since the view is based on multiple objects and hence doesn't have a simple one-to-one mapping to a ROWID of a single table?

The answer is: It requires "Key-Preserved Tables". This is a term that is frequently used by Oracle in their documentation, in particular in the context of updatable join views. This means that you cannot query every join view by ROWID - it has to meet certain prerequisites otherwise Oracle will throw an error "ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table".

Obviously Oracle needs to be able to determine that there is at least one table in the view for which it is guaranteed that its rows can show up at most once in the result set of the view, which means that there needs to be a minimal set of unique or primary key constraints, at least if the view involves more than a single table. Furthermore Oracle needs to be able to determine this automatically, and depending on the complexity and expressions used, this is not always the case. See this post by Jonathan Lewis for a discussion of possible improvements in this regard. Other database vendors seem to allow, at least to some degree, more complex queries to be recognized as what Oracle calls "key-preserved".
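The definition is easy to demonstrate mechanically. This Python toy is not Oracle's algorithm (which reasons from declared constraints, not from data); it simply joins two small row sets and counts how often each base-table row appears in the result. The table whose rows appear at most once is the key-preserved one:

```python
# Illustration of "key-preserved" (definition only, not Oracle's algorithm):
# a base table is key-preserved in a join view if each of its rows can
# appear at most once in the view's result set.
from collections import Counter

t1 = [(1, 10), (2, 10), (3, 20)]      # (id, fk_id); id is the primary key
t2 = [(10, "x"), (20, "y")]           # (id, filler); id is the primary key

# the join: t1.fk_id = t2.id
result = [(a, b) for a in t1 for b in t2 if a[1] == b[0]]

t1_counts = Counter(a[0] for a, b in result)   # occurrences of each t1 row
t2_counts = Counter(b[0] for a, b in result)   # occurrences of each t2 row

print(max(t1_counts.values()))   # 1 -> T1 is key-preserved
print(max(t2_counts.values()))   # 2 -> T2 is not: its rows can repeat
```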

Let's start with a simple example that creates a join view on three tables along with appropriate constraints. The sample generates more constraints than strictly necessary to get a key-preserved table. The advantage of these additional constraints is that Oracle can perform a table or join elimination transformation when querying data only from a subset of the tables joined in the view.

drop table t1;

purge table t1;

drop table t2;

purge table t2;

drop table t3;

purge table t3;

drop view v;

drop view v_t1;

create table t1
as
select
rownum as id
, mod(rownum, 1000) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

create table t2
as
select
rownum as id
, mod(rownum, 100) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

create table t3
as
select
rownum as id
, mod(rownum, 10) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 100
;

exec dbms_stats.gather_table_stats(null, 't3', method_opt => 'for all columns size 1')

alter table t1 add constraint t1_pk primary key (id);

alter table t2 add constraint t2_pk primary key (id);

alter table t3 add constraint t3_pk primary key (id);

alter table t1 add constraint t1_t2_fk foreign key (fk_id) references t2 (id);

alter table t2 add constraint t2_t3_fk foreign key (fk_id) references t3 (id);

alter table t1 modify fk_id not null;

alter table t2 modify fk_id not null;

alter table t3 modify fk_id not null;

create or replace view
v
as
select
t1.id as t1_id
, t1.fk_id as t1_fk_id
, t1.filler as t1_filler
, t2.id as t2_id
, t2.fk_id as t2_fk_id
, t2.filler as t2_filler
, t3.id as t3_id
, t3.fk_id as t3_fk_id
, t3.filler as t3_filler
from
t1
, t2
, t3
where
t1.fk_id = t2.id
and t2.fk_id = t3.id
;

So this is a simple join, and given the current setup the key-preserved table is T1.

We can verify this by this simple query:

SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T1

So Oracle automatically picks T1 as the source for the ROWID.

Let's see the join table elimination feature in action by using the following sample:

SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVTGAAIAAAAEBAAA

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVTGAAIAAAAEBAAA'

T1_ID
----------
1

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID 5stwqhguqgjr0, child number 0
-------------------------------------
select t1_id from v where rowid = 'AAAVTGAAIAAAAEBAAA'

Plan hash value: 487051824

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------

13 rows selected.

SQL>
SQL> column rid clear

Oracle 11.2 reduces this query to the simplest possible form - the tables T2 and T3 do not even show up in the execution plan. The validated and non-deferrable foreign key constraints ensure that Oracle can safely ignore the join to these tables given that only data from T1 is accessed.
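The decision Oracle is making here can be caricatured in a few lines. This is a deliberately simplified, hypothetical sketch (the function name and structure are mine, and the real transformation checks considerably more), but it captures the two conditions just described: a trustworthy foreign key and no reference to the parent table's columns:

```python
# Hypothetical, simplified sketch of the join-elimination decision described
# above: a join to a parent table can be dropped when a validated,
# non-deferrable FK on a NOT NULL column guarantees exactly one match and no
# column of the parent table is actually referenced by the query.
def can_eliminate_join(referenced_columns, parent_table, fk):
    fk_ok = fk["validated"] and not fk["deferrable"] and fk["not_null"]
    parent_unused = not any(col.startswith(parent_table + ".")
                            for col in referenced_columns)
    return fk_ok and parent_unused

fk_t1_t2 = {"validated": True, "deferrable": False, "not_null": True}

# "select t1_id from v" references only T1 columns -> the T2 join can go:
print(can_eliminate_join({"t1.id"}, "t2", fk_t1_t2))                # True
# referencing t2.filler keeps the join:
print(can_eliminate_join({"t1.id", "t2.filler"}, "t2", fk_t1_t2))   # False
```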

So the Query Transformation seems to work pretty well and everything is fine so far.

Now let's modify the setup a little bit. As you might have noticed I've crafted the T2 and T1 tables deliberately in such a way that I can define a 1:1 relationship between them: The T1.FK_ID column is also unique.

So let's declare a unique constraint on T1.FK_ID to tell the database about this 1:1 relationship. Furthermore I'll define the simplest possible view on top of T1 that does nothing else than simply a SELECT * FROM T1:

alter table t1 add constraint t1_uq unique (fk_id);

create or replace view v_t1 as select * from t1;

create or replace view
v
as
select
t1.id as t1_id
, t1.fk_id as t1_fk_id
, t1.filler as t1_filler
, t2.id as t2_id
, t2.fk_id as t2_fk_id
, t2.filler as t2_filler
, t3.id as t3_id
, t3.fk_id as t3_fk_id
, t3.filler as t3_filler
from
v_t1 t1
, t2
, t3
where
t1.fk_id = t2.id
and t2.fk_id = t3.id
;

Finally I've changed the view V to refer to the view V_T1 instead of the base table.

Let's repeat the check about the source of the ROWID now:

SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T2

Oracle has now selected T2 as the key-preserved table, which is fine since T2 is now also guaranteed to be unique in the result set of the view.

Finally, let's repeat the query that I used to demonstrate the join elimination query transformation:

SQL> column rid clear
SQL> set echo on verify on
SQL>
SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVTHAAIAAAAGBAAA

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID 8ns6u01cr94xa, child number 0
-------------------------------------
select rowid as rid from v where rownum <= 1

Plan hash value: 1420877628

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | NESTED LOOPS | | 1 | 20 | 2 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| T1_UQ | 1000 | 4000 | 2 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | T2_PK | 1 | 16 | 0 (0)| |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=1)
4 - access("FK_ID"="T2"."ID")

22 rows selected.

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'
select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'
*
ERROR at line 1:
ORA-01410: invalid ROWID

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID cgvynnw6sthrw, child number 0
-------------------------------------
select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'

Plan hash value: 487051824

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------

13 rows selected.

SQL>
SQL> column rid clear

Oops, that doesn't look too good: Oracle applied the ROWID to the wrong table, having eliminated T2 from the execution plan even though it used T2 to obtain the ROWID. As you can see from the execution plan of the initial query that fetches the first ROWID from the view, T2 is not eliminated in that case.

So the moral of the story: Simple View Merging (another Query Transformation), together with Join Elimination, causes Oracle 11.2 to apply the ROWID to the wrong table when multiple possible candidates for key-preserved tables exist. You can see this in the optimizer trace file, where suddenly a T1.ROWID = '...' predicate pops up.

When replacing the view with the base table the problem cannot be reproduced. Preventing one of the transformations (Join Elimination or View Merging) also prevents the issue.

Interestingly, changing the optimizer features to something below 11g also avoided the bug in the OTN thread, but with the given test case here I can still reproduce the problem on 11.2.0.3 (but not 11.2.0.1 and 11.2.0.2) when setting OPTIMIZER_FEATURES_ENABLE to 10.2.0.4, for example:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter session set optimizer_features_enable = '10.2.0.4';

Session altered.

SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T2

SQL>
SQL> set echo on verify on
SQL>
SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVAUAAEAAAEnTAAA

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'
select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'
*
ERROR at line 1:
ORA-01410: invalid ROWID

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 72yy78z1ggn75, child number 1
-------------------------------------
select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'

Plan hash value: 396691268

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | NESTED LOOPS | | 1 | 27 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | T2_PK | 1 | | 0 (0)| |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FK_ID"="T2"."ID")

20 rows selected.

SQL>
SQL> column rid clear

Although the lower optimizer features setting resulted in a different plan, one in which T2 wasn't eliminated, the ROWID predicate was still applied to the wrong table. This is clearly a change in behaviour compared to previous releases, in particular 10.2.0.4. So this is again one of the cases where setting OPTIMIZER_FEATURES_ENABLE doesn't reproduce exactly the same plan.

So my preliminary analysis in the OTN thread wasn't entirely correct: the new Outer Join Elimination Query Transformation introduced in 11g wasn't causing the problems in the original case - those additional transformations were merely triggering the side effect. Reproducing the bug needs both a join elimination and a view merging transformation, which is the reason why the test case includes a third table. When omitting the third table, or when querying columns from the third table that prevent the elimination of T3, the error doesn't reproduce either.

Footnote: There are more variations of similar bugs. See for example "Bug 10129357: Assorted errors referencing ROWID from a view with subquery" where other kinds of query transformations result in invalid SQL throwing ROWID related errors.

Advert: Oracle XML Training With Marco Gralike

I was asked by Jože Senegačnik if I would be interested in doing a Masterclass/Seminar in Slovenia and, yes of course, I really liked the idea. So after having a quick look at my agenda, regarding my free time, we started to set things up. This 2 day seminar will take place the

Read More…

Missing Filter

I see that Christian Antognini posted a note a little while ago about an interesting little defect in Enterprise Manager - it doesn’t always know how to interpret execution plans. The problem appears in Christian’s example when a filter subquery predicate is applied during an index range scan. It’s a topic I wrote about a few months ago under the title “filter bug”: the plan shows (or, rather, fails to show) a “missing” filter operation, which has been subsumed into the predicate section of what would otherwise have been the first child of the filter operation. As a result the rule of recursive descent through the plan breaks, and the ordering that OEM gives for the operations goes wrong.

Christian’s example shows the filter hiding itself when it should be applied to an index scan – as does my example from a few months ago – but the operation can also go missing when it is supposed to apply to a table access (whether a full scan, or an access by rowid), and the recursive descent algorithm that OEM is (probably) using to work out the execution order breaks in that case too.

The easiest way to construct an example of the anomaly is to write a query with a subquery which is blocked from unnesting (I’ve done this by using an outer join in the correlated predicate, but you could simply use a no_unnest hint), and then pushed (push_subq) to execute as early as possible in the plan:

select	/*+
                qb_name(main)
                leading(t1@main t2@subq1 t3@main)
                push_subq(@subq1)
        */
        t1.v1
From
        t1, t3
Where
        t1.n2 = 15
and     exists (
                select --+ qb_name(subq1)
                        null
                from    t2
                where   t2.n1(+) = 15
                and     t2.id(+) = t1.id
        )
and     t3.n1 = t1.n1
and     t3.n2 = 15
;

select * from table(dbms_xplan.display(null,null,'basic rows predicate'));

-------------------------------------------------------
| Id  | Operation                     | Name  | Rows  |
-------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     9 |
|*  1 |  HASH JOIN                    |       |     9 |
|*  2 |   TABLE ACCESS FULL           | T1    |     8 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |     1 |
|*  4 |     INDEX UNIQUE SCAN         | T2_PK |     1 |
|*  5 |   TABLE ACCESS FULL           | T3    |   157 |
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."N1"="T1"."N1")
   2 - filter("T1"."N2"=15 AND  EXISTS (SELECT /*+ PUSH_SUBQ QB_NAME
              ("SUBQ1") */ 0 FROM "T2" "T2" WHERE "T2"."ID"(+)=:B1 AND
              "T2"."N1"(+)=15))
   3 - filter("T2"."N1"(+)=15)
   4 - access("T2"."ID"(+)=:B1)
   5 - filter("T3"."N2"=15)

As you can see, there’s a filter predicate at line 2, but this doesn’t come from a filter operation; it’s simply a filter subquery applied to the tablescan. To make it easier to read complex cases I sometimes take a text editor to a plan like this and put back the missing filter operation – which means inserting one line where the filter predicate appears and moving the filtered operation (in this case the full tablescan) and any descendent(s) thereof one step to the right, as follows:

-------------------------------------------------------
| Id  | Operation                     | Name  | Rows  |
-------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     9 |
|*  1 |  HASH JOIN                    |       |     9 |
|*  2a|   FILTER                      |       |     9 |
|*  2b|    TABLE ACCESS FULL          | T1    |     8 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |     1 |
|*  4 |     INDEX UNIQUE SCAN         | T2_PK |     1 |
|*  5 |   TABLE ACCESS FULL           | T3    |   157 |
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."N1"="T1"."N1")
   2a- filter("T1"."N2"=15 AND  EXISTS (SELECT /*+ PUSH_SUBQ QB_NAME
              ("SUBQ1") */ 0 FROM "T2" "T2" WHERE "T2"."ID"(+)=:B1 AND
              "T2"."N1"(+)=15))
   3 - filter("T2"."N1"(+)=15)
   4 - access("T2"."ID"(+)=:B1)
   5 - filter("T3"."N2"=15)

If I don’t push the subquery (i.e. allow Oracle to execute late) then the plan changes to the following – showing you the filter operation that you might normally expect on a filter subquery:

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |
|*  1 |  FILTER                      |       |       |
|*  2 |   HASH JOIN                  |       |   173 |
|*  3 |    TABLE ACCESS FULL         | T1    |   157 |
|*  4 |    TABLE ACCESS FULL         | T3    |   157 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |
|*  6 |    INDEX UNIQUE SCAN         | T2_PK |     1 |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ1") */ 0 FROM "T2" "T2"
              WHERE "T2"."ID"(+)=:B1 AND "T2"."N1"(+)=15))
   2 - access("T3"."N1"="T1"."N1")
   3 - filter("T1"."N2"=15)
   4 - filter("T3"."N2"=15)
   5 - filter("T2"."N1"(+)=15)
   6 - access("T2"."ID"(+)=:B1)

Here’s an image captured from the OEM screen (11g) for the plan that actually appeared. The run-time action starts with the full tablescan of t1 at the third line of the plan, and for each row we acquire from t1 we run the subquery at the fourth and fifth lines (fifth first, then fourth) – but as you can see, OEM thinks the action starts with a unique scan of index t2_pk at the fifth line of the plan.

Moral: whenever you know there’s a filter subquery in the plan (and you’ll see it in the predicate section) make sure you think carefully about when it runs.
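The "rule of recursive descent" is easy to state as code: ignoring the per-row subtleties of filter subqueries, the execution order of a plan is a post-order walk in which a parent appears after its children, visited left to right. A minimal sketch (my own toy, shaped like the un-pushed plan above):

```python
# Sketch of the "recursive descent" rule OEM (and humans) use to read plans:
# a parent executes after its children, children are visited left to right.
# (Simplified: a real filter subquery actually runs once per candidate row.)
def execution_order(plan, node=0, order=None):
    """plan: {id: (operation, [child_ids])}; returns ids in run order."""
    if order is None:
        order = []
    for child in plan[node][1]:
        execution_order(plan, child, order)
    order.append(node)
    return order

# The un-pushed plan above: FILTER over a hash join of T1/T3, plus subquery.
plan = {
    0: ("SELECT STATEMENT", [1]),
    1: ("FILTER", [2, 5]),
    2: ("HASH JOIN", [3, 4]),
    3: ("TABLE ACCESS FULL T1", []),
    4: ("TABLE ACCESS FULL T3", []),
    5: ("TABLE ACCESS BY INDEX ROWID T2", [6]),
    6: ("INDEX UNIQUE SCAN T2_PK", []),
}
print(execution_order(plan))   # [3, 4, 2, 6, 5, 1, 0] - T1 scan runs first
```

When the FILTER line goes missing from the displayed plan, the subquery operations attach to the wrong parent at the wrong depth, and this walk (which is essentially what OEM computes) returns the wrong order.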

[Pythian Ads Network] Mastering Oracle Trace Data Online Class Reunion

This is a quick announcement that Method-R is organizing the online class reunion for the participants of their Mastering Oracle Trace Data classes. Cary Millsap and Ron Crisco will entertain us with stories and useful tips around processing and analyzing Oracle 10046 traces. Having Method-R done special training for Pythian about a year ago, I [...]

Pitfalls of Using Parallel Execution with SQL Developer

[This post was originally published on 2012/02/29 and was hidden shortly thereafter. I’m un-hiding it as of 2012/05/30 with some minor edits.]

Many Oracle Database users like tools with GUI interfaces because they add features and functionality that are not easily available from command line interfaces like SQL*Plus. One of the more popular tools, in my experience, is Oracle SQL Developer, in part because it’s a free tool from Oracle. Given SQL Developer’s current design (as of version 3.1.07.42), some issues frequently show up when using it with Oracle Databases and Parallel Execution. SQL Developer also contains a bug that exacerbates this issue.

The Issue

The crux of the issue with SQL Developer (and possibly other similar tools) and Parallel Execution comes down to how the application uses cursors. By default, SQL Developer has the array fetch size set to 50. This means that for any cursor SQL Developer opens for scrolling, it will fetch the first 50 rows, and when you scroll to the bottom of those rows in a grid it will fetch the next 50 rows, and so on. The array size can be controlled by going into Properties and changing Database -> Advanced -> SQL Array Fetch Size, which allows a maximum setting of 500 rows. This is good in the sense that the JDBC application can fetch an array of rows with a single JDBC database call. However, with Parallel Execution, the PX servers used for such a cursor will not be released until the cursor is canceled or the last row is fetched. Currently the only way to force reading to the end of the cursor in SQL Developer is to issue a Control+End in the data grid. As a result, any action that uses Parallel Execution and has not fetched all the rows or been canceled/closed will squat on those Parallel Execution resources and prevent them from being used by other users. If enough users have open cursors backed by Parallel Execution, it is possible to consume all of the Parallel Execution servers, forcing subsequent Parallel Execution requests down to Serial Execution because resources are not available, even if the system is completely idle.
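The resource-squatting effect can be modelled with a toy pool. This is purely illustrative (real PX allocation involves two slave sets per statement, downgrade rules, parameters like parallel_min_percent, and so on), but it shows how idle-but-open cursors starve later statements:

```python
# Toy model of the Parallel Execution server pool behaviour described above:
# an open cursor holds its PX servers until it is fetched to completion or
# closed, and once the pool is empty new statements fall back to serial.
class PXPool:
    def __init__(self, servers):
        self.free = servers

    def open_cursor(self, dop):
        """Grant the requested degree if servers are free, else serial (0).
        Simplified: one server per requested degree, no partial downgrade."""
        if self.free >= dop:
            self.free -= dop    # servers now held by the open cursor
            return dop
        return 0                # downgraded to Serial Execution

    def close_cursor(self, granted):
        self.free += granted    # fetch-to-end or close releases the servers

pool = PXPool(servers=8)
c1 = pool.open_cursor(4)        # user 1 opens a grid, never scrolls to end
c2 = pool.open_cursor(4)        # user 2 does the same
c3 = pool.open_cursor(4)        # user 3: pool exhausted -> serial
print(c1, c2, c3, pool.free)    # 4 4 0 0
```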

The SQL Developer Bug

When experimenting with SQL Developer for this blog post I also found and filed a bug (bug 13706186) because it leaks cursors when a user browses data in a table by expanding Tables (in the left pane), clicking on a table name and then the Data tab. Unfortunately this bug adds insult to injury if the table is decorated with a parallel degree attribute because the leaked cursors do not release the Parallel Execution servers until the session is closed, thus preventing other sessions from using them.

This bug is easily demonstrated using the SCOTT schema, but any schema or table will do as long as the table has more rows than the array fetch size. For my example, I’m using a copy of the EMP table, called EMP2, which contains 896 rows and was created using the following SQL:
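The original creation script did not survive in this aggregated copy. A minimal sketch that yields an EMP2 table of 896 rows (the 14 EMP rows replicated 64 times) with a parallel attribute might look like the following; the CONNECT BY multiplier and the DEGREE value are my assumptions, not the author's exact script:

```sql
-- Hypothetical reconstruction: EMP2 as 64 copies of EMP (14 * 64 = 896 rows),
-- decorated with a parallel degree so queries against it use PX servers.
create table emp2
parallel (degree 2)
as
select e.*
from   emp e,
       (select level as n from dual connect by level <= 64);
```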

The steps to demonstrate this issue are as follows:

  1. Set up the EMP2 table using the above script or equivalent.
  2. Use SQL Developer to connect to the SCOTT schema or equivalent.
  3. Expand Tables in the Browser in the left pane.
  4. Click EMP2.
  5. Click on the Data tab of EMP2.
  6. Check the open cursors.
  7. Close the EMP2 tab in the right pane.
  8. Check the open cursors.
  9. Go to Step #4 and repeat.

I’m going to repeat this process two times for a total of three open and close operations. I’ll use this query to show the open cursors for the Data grid query for the EMP2 table (adjust if necessary if you are not using my example):
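The author's exact monitoring query is not reproduced here. A rough equivalent against V$SQL_MONITOR (available in 11g, where parallel statements are monitored by default) could look like this; the SQL_TEXT filter is an assumption and would need adjusting for a different table:

```sql
-- Hypothetical monitoring query: list executions of the EMP2 data-grid
-- cursor and the number of PX servers each execution is holding.
select sql_id,
       sql_exec_id,
       status,
       px_servers_requested,
       px_servers_allocated
from   v$sql_monitor
where  sql_text like 'SELECT%EMP2%'
order  by sql_exec_id;
```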

If we look at the output (scott_emp2_cursors.txt below the EM graphic) from the query, we’ll see that the first time the EMP2 Data tab is opened, it opens two identical cursors (sql_exec_id 16777216 & 16777217). After closing the EMP2 Data tab, 16777216 is still open. The second time the EMP2 Data tab is opened, two more identical cursors are opened (sql_exec_id 16777218 & 16777219). The third time, two more cursors are opened (sql_exec_id 16777220 & 16777221). After closing the tabs we still see three cursors open (sql_exec_id 16777216, 16777218 & 16777220), each of which is squatting on two PX servers.

This behavior can also be seen in 11g Enterprise Manager (or dbconsole) on the SQL Monitoring page by sorting the statements by time — notice the (leaked cursor) statements with the green spinning pinwheels after all tabs have been closed (all parallel statements are monitored by default).

Sqlmon Cursor Leak

By the way, the cursor leak applies for tables without a parallel degree setting as well, but has more significant impact if the table is parallel because PX servers are a shared resource.

(scott_emp2_cursors.txt below)

My Thoughts

Obviously the cursor leak is a SQL Developer bug and needs fixing, but in the interim DBAs should be aware that this behavior can have a global impact because Parallel Execution servers are shared by all database users. Also, if SQL Developer users run Parallel Queries and keep the results grid open without fetching all the rows via Control+End, those Parallel Execution servers will be unavailable to other users and could negatively impact other users' queries that leverage Parallel Execution.

Personally I’d like to see a few enhancements to SQL Developer to avoid these pitfalls:

  1. Disable Parallel Execution for Table Data browsing.

    Browsing data in a table via a scrollable grid is a “small data” problem and does not require the “big data” resources of Parallel Execution. This could easily be done by adding a NOPARALLEL hint when SQL Developer builds the query string.

  2. Add a property with functionality to read all rows without requiring the Control+End command (always fetch until end of cursor), or until a max number of rows is read (or a max amount of memory is used for the result set), then close the cursor.

    By fetching until end of cursor or fetching a max number of rows and closing the cursor, the client will release any Parallel Execution resources it may have used. Obviously fetching all rows could be a problem with large result sets and cause SQL Developer to run out of memory and crash, which would be a bad user experience, but not releasing PX resources can easily lead to many bad user experiences.

    I’ve seen the issue of potentially large result sets dealt with in other JDBC based GUI tools that connect to parallel databases by the tool appending a “LIMIT X” clause to queries, where the user can control the value of “X” in a property setting. To the best of my knowledge, no other parallel databases support cursors the way that Oracle does (being able to fetch rows, pause, then resume fetching), so there is no issue with squatting resources in them (once they start fetching they must continue until the last row is fetched or the statement is canceled). As of release 11.2, Oracle does not support the LIMIT clause, but this functionality could be implemented in the client by using some upper limit on “array fetch size” * “number of fetches”, or by wrapping queries with a “select * from ([query text]) where rownum <= X” or similar.
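The rownum wrapper described above can be sketched as follows; the inner query text and the limit of 500 are placeholders a client tool would substitute:

```sql
-- Client-side row limiting: wrap the user's original query text so the
-- cursor reaches end-of-fetch (and releases PX servers) after X rows.
select *
from   (select * from emp2 order by empno)  -- [query text] goes here
where  rownum <= 500;                       -- X = client row-limit property
```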

There are some “clever” server-side ways to deal with this as well, such as adding a logon trigger that disables parallel query when the V$SESSION.PROGRAM name is “SQL Developer”, but a robust, clean client-side solution is preferred by me and likely other DBAs as well. It’s really just a simple matter of programming.
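That server-side workaround could be sketched as a logon trigger like the one below. This is my illustration, not the author's code; matching on V$SESSION.PROGRAM is fragile (the program string can vary by client version), and the trigger owner needs SELECT on V$SESSION plus ADMINISTER DATABASE TRIGGER:

```sql
-- Hypothetical logon trigger: force serial execution for SQL Developer sessions.
create or replace trigger disable_pq_for_sqldev
after logon on database
declare
  v_program v$session.program%type;
begin
  select program
  into   v_program
  from   v$session
  where  sid = sys_context('userenv', 'sid');

  if v_program like 'SQL Developer%' then
    execute immediate 'alter session disable parallel query';
  end if;
end;
/
```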

Summary

When using SQL Developer or similar tools, be aware of the potential to squat on Parallel Execution resources if the client tool holds open cursors. Educate your SQL Developer users on how they can play well with others in an Oracle Database using Parallel Execution by closing unneeded tabs with open cursors. Be aware of the impact of the cursor leak bug in SQL Developer 3.1.07.42 (and possibly earlier releases) until it is fixed.

Personally I’d like to see an enhancement to deal with this behavior and I don’t think it would require much programming. It certainly would allow DBAs to feel more confident that SQL Developer is a tool that can be used on production systems and not result in any issues. What are your thoughts? Do my enhancement requests seem reasonable and warranted?

What if Oracle 11gR2 never gets certified on RHEL6/OL6?

I’ve been involved in a number of blog comment, email and twitter exchanges over the last few months about the 11gR2 on RHEL6/OL6 certification issue.

The last time I blogged specifically about it was in October and it’s now over 6 months since Red Hat completed their part in the certification of 11gR2 on RHEL6, yet still no news.

In the course of these conversations I’ve come across a number of ridiculous conspiracy theories, as well as statements from people who know a hell of a lot more about Oracle platform certification than me. It’s worth saying at this point that none of the sources of these ideas are current Oracle employees, so they are not privy to “inside” information. Same goes for me. I’m just another person trying to figure out what is going on.

Here are some of the points from the last few months that stand out to me:

  • Oracle software working on a platform and certifying it on that platform are not the same thing.
  • Platform certification is a labor intensive operation, most of which is the responsibility of the platform vendor.
  • Even though RH have completed their part of the RHEL6 certification process, Oracle have not done the same for OL6. Oracle will *never* let RHEL6 be certified if OL6 is not.
  • Certification of Oracle on OL6 will have an impact on all Oracle appliances and engineered systems currently on sale. All of these systems currently use OL5.x. How long after certification will customers start demanding an OS upgrade?
  • Oracle have no pressing need to certify RHEL6/OL6, since all the performance improvements of the RHEL6 kernel are already in the OL5.x UEK. Oracle are a business, so why throw resources at certifying an “old” version of the database on a “new” platform when a new Oracle version is just around the corner?
  • Distro version is unimportant on an Oracle server. The kernel is the biggest factor. Most of the software in a Linux distro is useless guff as far as an Oracle installation is concerned. Do you really care which version of the browser or LibreOffice ships with your server OS?
  • Oracle 12c is currently in beta. The rumor is it will be announced/released at OOW12. Once it is released Oracle will have to go into overdrive to make sure it is certified on all the important platforms and presumably shipping on all their appliances and engineered systems. That is going to be a mammoth task. Do you really see them wasting time on 11gR2 at this point in the DB lifecycle?
  • The support cycle for RHEL and OL has increased to 10 years, so there is no pressing need to upgrade your OS from a support perspective.

Of course, nobody on the outside really knows what is going on, and I imagine anyone on the inside would be looking for a new job if they let it slip. From this point on I will follow the advice of people far more qualified than me and assume that “Oracle 11gR2 will never be certified on RHEL6/OL6”. If by some fluke it does happen, then it will be a happy surprise.

To end this depressing post on a lighter note, this is one of my recent tweets on the subject…

Cheers

Tim…

PS. I purposely didn’t attribute names to these points. Not everyone wants to be outed to the world, especially when their opinions were expressed via email.

University of Birmingham CS Alumni Dinner…

Every year the computer science department of The University of Birmingham brings together alumni from the department and current students for an informal dinner, allowing the students to get a feel for what goes on in the outside world. It’s sort of like career speed dating. Even though I’m not an alumnus of the CS department (I did biology), I’ve been asked if I will come to the next one in March. I’m kinda excited and nervous at the same time.

I feel another outing for the oraclenerd t-shirt coming on! :)

Cheers

Tim…