Search

OakieTags

Who's online

There are currently 0 users and 9 guests online.

Recent comments

Affiliations

Oracle database internals

Query transformation – Part 1

Query transformation is a set of techniques used by the optimizer to rewrite a query and optimizer it better. Few optimization paths open up to the optimizer after query transformation. Some query transformations must be costed to be chosen and some do not need to be costed. For example, if a table can be eliminated completely from the join, then that transformation is applied and need to cost that transformation is minimal.

Test case

We will use the following test case to illustrate the concepts behind Query transformation. Some of the optimizations that we see here works from version 11gR1 onwards and so, these test cases might not work in the versions 10g and below.


create table backup.t1 (n1 number not null primary key, n2 number not null, n3 varchar2(256) );
insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);
create table backup.t2 (n1 number not null primary key , n2 number not null, n3 varchar2(256) );
alter table backup.t1 add constrainT t1_fk foreign key  (n2) references backup.t2(n1)
insert into backup.t2 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);
insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);

Join elimination (JE)

RAC object remastering ( Dynamic remastering )

In RAC, every data block is mastered by an instance. Mastering a block simply means that master instance keeps track of the state of the block until the next reconfiguration event (due to instance restart or otherwise).

Hash to the master

These data blocks are mastered in block ranges. For example, range of blocks starting from file 10, block 1 through block 128 may be mastered by instance 1, blocks from file 10, block 129 through 256 are mastered by instance 2 etc. Of course, there are differences between various versions 10g, 11g etc, but Idea here is that block ranges are uniformly mastered between various instances so that Global cache grants are evenly distributed among the instances. Interestingly, length of the block range is 128 from 10g onwards (Julian Dyke mentioned that is 1089 in 9i, but I have not personally tested it). Of course, Support recommends you to unset db_file_multiblock_read_count which will be auto adjusted to 128 which means that Full block range can be read with fewer GC messages, I suppose. I digress.

Further, Michael Möller pointed out that this hash-algorithm is further optimized: The hash-algorithm used when initially computing the master node from the DBA, results in a “virtual master”, which is then translated to a real (online&open) master by a lookup table (the length of which is the maximum number of possible nodes (128 ?). This means that when one node goes off/on-line, RAC does NOT have to recalculate the hash for all blocks, but only distribute the new Hash-to-node table. (One can later visualize dynamic remastering as an additional lookup table between the hash value and node. This table also needs redistributing on node changes.)

RMOUG 2010: My presentations

It is very disappointing to me that I had to cancel my trip to RMOUG training days. I am sick and was not able to catch the flight due to that.

But, I can always share my presentations here. I had two presentations planned in this training day and can be accessed as below:

Advanced RAC troubleshooting
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_doc
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_ppt

Why optimizer hates my sql
Riyaj_Why_optimizer_hates_my_sql_2010

RMOUG training days audience: Please accept my sincere apologies.

RAC performance tuning: Understanding Global cache performance

Global cache performance metrics are not correctly measured. It is not understood clearly either. There are even few blogs and web pages disseminating incorrect information. This blog entry is an attempt to offer few methods and scripts to understand global cache performance.

Always review all instances

It is very important to review the performance metrics from all instances in that RAC cluster, not just one instance that you are connected. If you have access to AWR reports, then it is critical to generate AWR reports (or statspack reports) from all instances. But, the problem is that, DBAs tend to generate AWR reports after logging in to each instance iteratively, enter couple of parameters and then reports are generated. Not exactly a convenient practice.

  REM connect to each instance separately, type in the beginning snap_id and ending snap_id for each node etc..
   sqlplus mydba@proddb1
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb2
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb3
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;

There are few issues with this approach. It is a cumbersome practice if the instance count is higher. In addition to that, all of AWR reports are, in turn, accessing underlying AWR tables. Physically, rows from all instances are together in the same block and so, by executing these reports connecting to various instances, Global cache traffic is increased. If the database is suffering from Global cache (GC) performance issues then generating reports connecting to various instances is probably not a grand idea.

Exciting seminars in Dallas arena

If you live in Dallas area, there are two very important seminars coming up:

  1. Tanel Poder is presenting his Advanced Oracle troubleshooting seminar on September 9-11, 2009. .
  2. Jonathan Lewis presenting two day seminar cost based optimization and writing efficient SQL on Oct 28 and Oct 29 2009
  3. .

These two gentlemen needs no introduction. Tanel Poder is an expert in Oracle internals and Jonathan Lewis is a Guru in Cost based optimization and writing efficient SQL.

Tell them I sent ya :-)

Update: I just heard from Tanel and he is unable to make it to Dallas this September. He is rescheduling his seminar and I can’t wait for his seminar. I will update this blog with his rescheduled dates.

Shared pool freelists (and durations)

My earlier blog about shared pool duration got an offline response from one of my reader:
” So, you say that durations aka mini-heaps have been introduced from 10g onwards. I have been using Steve Adams’ script shared_pool_free_lists.sql. Is that not accurate anymore?”

Shared pool free lists

I have a great respect for Steve Adams . In many ways, he has been a great virtual mentor and his insights are so remarkable.

Coming back to the question, I have used Steve’s script before and it is applicable prior to Oracle version 9i. In 9i, sub-heaps were introduced. Further, shared pool durations were introduced in Oracle version 10g. So, his script may not be applicable from version 9i onwards. We will probe this further in this blog.

This is the problem with writing anything about internals stuff, they tend to change from version to version and In many cases, our work can become obsolete in future releases(including this blog!).

In version 9i, each sub-heap of the shared_pool has its own free list. In version 10g and 11g, each duration in sub-heap has its own free list. This is visible through x$ksmsp and column x$ksmsp.ksmchdur indicates the duration that chunk belongs to. In 9i, that column always has a value of 1 (at least, that I have experimented so far). In 10g & 11g (up to 11.1.0.7), there are exactly 4 durations in each sub-heap and values range from 1-4 for this column ksmchdur. Each duration has its own free list.

Shared_pool_free_list.sql script

ORA-4031 and Shared Pool Duration

After reading my earlier post on shared pool A stroll through shared pool heap , one of my client contacted me with an interesting ORA-4031 issue. Client was getting ORA-4031 errors and shared pool size was over 4GB ( in a RAC environment). Client DBA queried v$sgastat to show that there is plenty of free memory in the shared pool. We researched the issue and it is worth blogging. Client DBA was confused as to how there can be ORA-4031 errors when the shared pool free memory is few GBs.

Heapdump Analysis

At this point, it is imperative to take heapdump in level 2 and Level 2 is for the shared pool heap dump. [ Please be warned that it is not advisable to take shared pool heap dumps excessively, as that itself can cause performance issue. During an offline conversation, Tanel Poder said that heapdump can freeze instance as his clients have experienced.]. This will create a trace file in user_dump_dest destination and that trace file is quite useful in analyzing the contents of shared pool heap. Tanel Poder has an excellent script heapdump_analyzer . I modified that script adding code for aggregation at hea, extent and type levels to debug this issue further and it is available as heapdump_dissect.ksh . ( with a special permission from Tanel to publish this script.)

Shared pool review

RAC, parallel query and udpsnoop

I presented about various performance myths in my ‘battle of the nodes’ presentation. One of the myth was that how spawning parallel query slaves across multiple RAC instances can cause major bottleneck in the interconnect. In fact, that myth was direct result of a lessons learnt presentation from a client engagement. Client was suffering from performance issues with enormous global cache waits running in to 30+ms average response time for global cache CR traffic and crippling application performance. Essentially, their data warehouse queries were performing hundreds of parallel queries concurrently with slaves spawning across three node RAC instances.

Of course, I had to hide the client details and simplified using a test case to explain the myth. Looks like either a)my test case is bad or b) some sort of bug I encountered in 9.2.0.5 version c) I made a mistake in my analysis somewhere. Most likely it is the last one :-( . Greg Rahn questioned that example and this topic deserves more research to understand this little bit further. At this point, I don’t have 9.2.0.5 and database is in 10.2.0.4 and so we will test this in 10.2.0.4.

udpsnoop

UDP is one of the protocol used for cache fusion traffic in RAC and it is the Oracle recommended protocol. In this article, UDP traffic size must be measured. Measuring Global cache traffic using AWR reports was not precise. So, I decided to use a dtrace tool kit tool:udpsnoop.d to measure the traffic between RAC nodes. There are two RAC nodes in this setup. You can read more about udpsnoop.d. That tool udpsnoop.d can be downloaded from dtrace toolkit . Output of this script is of the form: