Query transformation is a set of techniques used by the optimizer to rewrite a query and optimizer it better. Few optimization paths open up to the optimizer after query transformation. Some query transformations must be costed to be chosen and some do not need to be costed. For example, if a table can be eliminated completely from the join, then that transformation is applied and need to cost that transformation is minimal.
Test case
We will use the following test case to illustrate the concepts behind Query transformation. Some of the optimizations that we see here works from version 11gR1 onwards and so, these test cases might not work in the versions 10g and below.
create table backup.t1 (n1 number not null primary key, n2 number not null, n3 varchar2(256) ); insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100); create table backup.t2 (n1 number not null primary key , n2 number not null, n3 varchar2(256) ); alter table backup.t1 add constrainT t1_fk foreign key (n2) references backup.t2(n1) insert into backup.t2 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100); insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);
Join elimination (JE)
In RAC, every data block is mastered by an instance. Mastering a block simply means that master instance keeps track of the state of the block until the next reconfiguration event (due to instance restart or otherwise).
Hash to the master
These data blocks are mastered in block ranges. For example, range of blocks starting from file 10, block 1 through block 128 may be mastered by instance 1, blocks from file 10, block 129 through 256 are mastered by instance 2 etc. Of course, there are differences between various versions 10g, 11g etc, but Idea here is that block ranges are uniformly mastered between various instances so that Global cache grants are evenly distributed among the instances. Interestingly, length of the block range is 128 from 10g onwards (Julian Dyke mentioned that is 1089 in 9i, but I have not personally tested it). Of course, Support recommends you to unset db_file_multiblock_read_count which will be auto adjusted to 128 which means that Full block range can be read with fewer GC messages, I suppose. I digress.
Further, Michael Möller pointed out that this hash-algorithm is further optimized: The hash-algorithm used when initially computing the master node from the DBA, results in a “virtual master”, which is then translated to a real (online&open) master by a lookup table (the length of which is the maximum number of possible nodes (128 ?). This means that when one node goes off/on-line, RAC does NOT have to recalculate the hash for all blocks, but only distribute the new Hash-to-node table. (One can later visualize dynamic remastering as an additional lookup table between the hash value and node. This table also needs redistributing on node changes.)
It is very disappointing to me that I had to cancel my trip to RMOUG training days. I am sick and was not able to catch the flight due to that.
But, I can always share my presentations here. I had two presentations planned in this training day and can be accessed as below:
Advanced RAC troubleshooting
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_doc
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_ppt
Why optimizer hates my sql
Riyaj_Why_optimizer_hates_my_sql_2010
RMOUG training days audience: Please accept my sincere apologies.
I will be presenting on two topics in HOTSOS’ 2010, an Oracle performance focused conference, in my home town Dallas, Texas. You can read about my presentation topics in HOTSOS’ 2010 Riyaj . This symposium is a valued conference for performance engineers, DBAs and developers who are interested to know learn about performance. There are many great speakers presenting in this conference and the main page for this conference is HOTSOS ’2010 . BTW, My friend Alex Gorbachev interviewed Gary Goodman and posted a video in his blog also.
Tanel Poder is conducting the HOTSOS training day this year. You can’t miss his training day and I heard that he is working on a MOTS (Mother Of all Tuning Script) and planning to release that in HOTSOS ’2010.
On behalf of the Dallasites, I invite you to visit Dallas and attend this great conference.
I wish you a Happy and prosperous New Year ’2010!
Global cache performance metrics are not correctly measured. It is not understood clearly either. There are even few blogs and web pages disseminating incorrect information. This blog entry is an attempt to offer few methods and scripts to understand global cache performance.
Always review all instances
It is very important to review the performance metrics from all instances in that RAC cluster, not just one instance that you are connected. If you have access to AWR reports, then it is critical to generate AWR reports (or statspack reports) from all instances. But, the problem is that, DBAs tend to generate AWR reports after logging in to each instance iteratively, enter couple of parameters and then reports are generated. Not exactly a convenient practice.
REM connect to each instance separately, type in the beginning snap_id and ending snap_id for each node etc.. sqlplus mydba@proddb1 @$ORACLE_HOME/rdbms/admin/awrrpt.sql exit; sqlplus mydba@proddb2 @$ORACLE_HOME/rdbms/admin/awrrpt.sql exit; sqlplus mydba@proddb3 @$ORACLE_HOME/rdbms/admin/awrrpt.sql exit;
There are few issues with this approach. It is a cumbersome practice if the instance count is higher. In addition to that, all of AWR reports are, in turn, accessing underlying AWR tables. Physically, rows from all instances are together in the same block and so, by executing these reports connecting to various instances, Global cache traffic is increased. If the database is suffering from Global cache (GC) performance issues then generating reports connecting to various instances is probably not a grand idea.
I was reviewing a performance issue with a client recently. Problem is that increased global cache waits causing application slowdown affecting few critical business functions. Using one of my script gc_traffic.sql > and graphing the results with Excel spreadsheet, it is established that there is a marked increase in GC traffic today compared to week earlier. Similar jobs runs every day and so comparing two week days is sufficient to show the GC traffic increase. Graph is between total blocks and AWR snap time in 30 minutes interval. [Click the picture below to review the graph clearly.]
Identifying the object creating this increased GC traffic is essential to identify root cause. We were able to quickly determine that this increase in GC traffic was localized around few SQL statements using ADDM and AWR reports. We decided to focus on one SQL with an obvious increase in elapsed time compared to prior week. So, first question asked, is there a change in the plan? plan_hash_value was reviewed and quickly determined that there is no change in the plan_hash_value.
Little bit of history, there were few changes performed by the vendor over the weekend as a part of few bug fixes. Vendor’s argument was that since there is no change to the plan_hash_value, SQL access plan did not change and so, this can’t be due to vendor changes. My Client’s argument was that there were no changes to the environment and problem started after the application changes.
My earlier blog about shared pool duration got an offline response from one of my reader:
” So, you say that durations aka mini-heaps have been introduced from 10g onwards. I have been using Steve Adams’ script shared_pool_free_lists.sql. Is that not accurate anymore?”
Shared pool free lists
I have a great respect for Steve Adams . In many ways, he has been a great virtual mentor and his insights are so remarkable.
Coming back to the question, I have used Steve’s script before and it is applicable prior to Oracle version 9i. In 9i, sub-heaps were introduced. Further, shared pool durations were introduced in Oracle version 10g. So, his script may not be applicable from version 9i onwards. We will probe this further in this blog.
This is the problem with writing anything about internals stuff, they tend to change from version to version and In many cases, our work can become obsolete in future releases(including this blog!).
In version 9i, each sub-heap of the shared_pool has its own free list. In version 10g and 11g, each duration in sub-heap has its own free list. This is visible through x$ksmsp and column x$ksmsp.ksmchdur indicates the duration that chunk belongs to. In 9i, that column always has a value of 1 (at least, that I have experimented so far). In 10g & 11g (up to 11.1.0.7), there are exactly 4 durations in each sub-heap and values range from 1-4 for this column ksmchdur. Each duration has its own free list.
Shared_pool_free_list.sql script
After reading my earlier post on shared pool A stroll through shared pool heap , one of my client contacted me with an interesting ORA-4031 issue. Client was getting ORA-4031 errors and shared pool size was over 4GB ( in a RAC environment). Client DBA queried v$sgastat to show that there is plenty of free memory in the shared pool. We researched the issue and it is worth blogging. Client DBA was confused as to how there can be ORA-4031 errors when the shared pool free memory is few GBs.
Heapdump Analysis
At this point, it is imperative to take heapdump in level 2 and Level 2 is for the shared pool heap dump. [ Please be warned that it is not advisable to take shared pool heap dumps excessively, as that itself can cause performance issue. During an offline conversation, Tanel Poder said that heapdump can freeze instance as his clients have experienced.]. This will create a trace file in user_dump_dest destination and that trace file is quite useful in analyzing the contents of shared pool heap. Tanel Poder has an excellent script heapdump_analyzer . I modified that script adding code for aggregation at hea, extent and type levels to debug this issue further and it is available as heapdump_dissect.ksh . ( with a special permission from Tanel to publish this script.)
Shared pool review
I presented about various performance myths in my ‘battle of the nodes’ presentation. One of the myth was that how spawning parallel query slaves across multiple RAC instances can cause major bottleneck in the interconnect. In fact, that myth was direct result of a lessons learnt presentation from a client engagement. Client was suffering from performance issues with enormous global cache waits running in to 30+ms average response time for global cache CR traffic and crippling application performance. Essentially, their data warehouse queries were performing hundreds of parallel queries concurrently with slaves spawning across three node RAC instances.
Of course, I had to hide the client details and simplified using a test case to explain the myth. Looks like either a)my test case is bad or b) some sort of bug I encountered in 9.2.0.5 version c) I made a mistake in my analysis somewhere. Most likely it is the last one
. Greg Rahn questioned that example and this topic deserves more research to understand this little bit further. At this point, I don’t have 9.2.0.5 and database is in 10.2.0.4 and so we will test this in 10.2.0.4.
udpsnoop
UDP is one of the protocol used for cache fusion traffic in RAC and it is the Oracle recommended protocol. In this article, UDP traffic size must be measured. Measuring Global cache traffic using AWR reports was not precise. So, I decided to use a dtrace tool kit tool:udpsnoop.d to measure the traffic between RAC nodes. There are two RAC nodes in this setup. You can read more about udpsnoop.d. That tool udpsnoop.d can be downloaded from dtrace toolkit . Output of this script is of the form:
Recent comments
1 week 12 hours ago
1 week 19 hours ago
4 weeks 55 min ago
11 weeks 6 days ago
14 weeks 6 days ago
19 weeks 3 days ago
19 weeks 3 days ago
19 weeks 5 days ago