
How Exadata smartscans work

I guess everybody who works with Oracle databases and has been involved with Oracle Exadata in any way knows about smart scans. It is the smart scan that makes the magic happen: full segment scans with sometimes enormously reduced scan times. The Oracle database performs smart scans using something that is referred to as ‘offloading’. This is all generally known information.

But how does that work? I assume more people are like me and are anxious to understand how it works exactly. But information on smart scans is extremely scarce. Of course there is the public Oracle material, which looks technical but is little or nothing more than marketing. On My Oracle Support I can’t find anything on the inner workings. Even the ‘Expert Oracle Exadata’ book (which I still regard as the best source of Exadata-related information) has no material on the mechanics of smart scans.
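For those who want to poke at this themselves, a quick (and admittedly superficial) way to see offloading from the database layer is to compare the offload-related columns in v$sql for a statement. This is just a minimal sketch, assuming an 11.2 database; '&sql_id' is a placeholder for the statement you are interested in:

-- compare bytes eligible for offload with what actually travelled over the interconnect
select sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes,
       io_interconnect_bytes
from   v$sql
where  sql_id = '&sql_id';

If the eligible bytes are zero, no offloading took place for that cursor.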

[Pythian Ads Network] Mastering Oracle Trace Data Online Class Reunion

This is a quick announcement that Method-R is organizing the online class reunion for the participants of their Mastering Oracle Trace Data classes. Cary Millsap and Ron Crisco will entertain us with stories and useful tips around processing and analyzing Oracle 10046 traces. Method-R having done special training for Pythian about a year ago, I [...]

“Mastering Oracle Trace Data” by Cary Millsap right after the UKOUG Conference in Birmingham

My good friend (and personal hero) Cary Millsap is doing a series of one day classes around the world — Mastering Oracle Trace Data. One of them is conveniently scheduled in Birmingham Thursday next week right after the UKOUG Conference. It’s not far from the Birmingham ICC where the UKOUG Technology and Business Suite Conference [...]

Essential tools for Exadata performance experiments

As I said in a previous post, I have started working in the Exadata performance field, which is really exciting, especially once you get things to run really fast!

Also, finding out what your session is spending time on is important if you are just getting started.

I found the following indispensable tools for Exadata performance analysis:

  • Session Snapper from Tanel Poder, available from his website
  • The Tuning and Diagnostic Pack license
  • The Real Time SQL Monitor package
  • A good understanding of DBMS_XPLAN
  • Oracle 10046 traces
  • collectl – I have blogged about it before here and mention it for reference only

There are probably a lot more than those, but these are the bare essentials. Let’s have a look at them in a bit more detail.

Snapper

I don’t think I need to say much about Snapper; it’s an established tool for performance diagnostics, taught by Tanel in his advanced troubleshooting course. I use Snapper primarily for the analysis of smart scans. Below is an example for a session using serial execution to scan a table (best viewed with Firefox):

SQL> select count(1) from order_items;

SQL> @snapper all 5 1 243
Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,     95.02k,        19k,
243, SOE       , STAT, user I/O wait time                                        ,        331,       66.2,
243, SOE       , STAT, non-idle wait time                                        ,        332,       66.4,
243, SOE       , STAT, non-idle wait count                                       ,      1.51k,      302.2,
243, SOE       , STAT, physical read total IO requests                           ,        776,      155.2,
243, SOE       , STAT, physical read total multi block requests                  ,        746,      149.2,
243, SOE       , STAT, physical read requests optimized                          ,         30,          6,
243, SOE       , STAT, physical read total bytes                                 ,    778.67M,    155.73M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,    778.67M,    155.73M,
243, SOE       , STAT, consistent gets                                           ,     95.06k,     19.01k,
243, SOE       , STAT, consistent gets from cache                                ,          7,        1.4,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,          7,        1.4,
243, SOE       , STAT, consistent gets direct                                    ,     95.05k,     19.01k,
243, SOE       , STAT, physical reads                                            ,     95.05k,     19.01k,
243, SOE       , STAT, physical reads direct                                     ,     95.05k,     19.01k,
243, SOE       , STAT, physical read IO requests                                 ,        776,      155.2,
243, SOE       , STAT, physical read bytes                                       ,    778.67M,    155.73M,
243, SOE       , STAT, calls to kcmgcs                                           ,          7,        1.4,
243, SOE       , STAT, file io wait time                                         ,        394,       78.8,
243, SOE       , STAT, Number of read IOs issued                                 ,        776,      155.2,
243, SOE       , STAT, no work - consistent read gets                            ,     95.15k,     19.03k,
243, SOE       , STAT, cell flash cache read hits                                ,         30,          6,
243, SOE       , TIME, DB CPU                                                    ,      1.96s,   391.94ms,    39.2%, |@@@@      |
243, SOE       , TIME, sql execute elapsed time                                  ,         6s,       1.2s,   120.0%, |@@@@@@@@@@|
243, SOE       , TIME, DB time                                                   ,         6s,       1.2s,   120.0%, |@@@@@@@@@@|
243, SOE       , WAIT, direct path read                                          ,       3.3s,   660.48ms,    66.0%, |@@@@@@@   |
243, SOE       , WAIT, kfk: async disk IO                                        ,        5ms,        1ms,      .1%, |          |
--  End of Stats snap 1, end=2011-08-23 16:20:03, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
74% | 1p7y7pvzmxx3y   | direct path read          | User I/O
26% | 1p7y7pvzmxx3y   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-23 16:20:03, seconds=5, samples_taken=47

Watch out for the “cell smart” entries to figure out whether a smart scan is happening. In the above example, there wasn’t one.

The same query again, but this time making use of offloading:

SQL> select /*+full(t)*/ count(1) from order_items t;

SQL> @snapper all 5 1 243
Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,    427.69k,     85.54k,
243, SOE       , STAT, user I/O wait time                                        ,        165,         33,
243, SOE       , STAT, non-idle wait time                                        ,        167,       33.4,
243, SOE       , STAT, non-idle wait count                                       ,      3.52k,        703,
243, SOE       , STAT, enqueue waits                                             ,         12,        2.4,
243, SOE       , STAT, enqueue requests                                          ,         10,          2,
243, SOE       , STAT, enqueue conversions                                       ,         15,          3,
243, SOE       , STAT, enqueue releases                                          ,         10,          2,
243, SOE       , STAT, global enqueue gets sync                                  ,         25,          5,
243, SOE       , STAT, global enqueue releases                                   ,         10,          2,
243, SOE       , STAT, physical read total IO requests                           ,      4.56k,      911.6,
243, SOE       , STAT, physical read total multi block requests                  ,      4.31k,      862.2,
243, SOE       , STAT, physical read total bytes                                 ,       3.5G,     700.6M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,      1.52G,    303.95M,
243, SOE       , STAT, ges messages sent                                         ,         12,        2.4,
243, SOE       , STAT, consistent gets                                           ,    427.54k,     85.51k,
243, SOE       , STAT, consistent gets from cache                                ,         75,         15,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,         75,         15,
243, SOE       , STAT, consistent gets direct                                    ,    427.47k,     85.49k,
243, SOE       , STAT, physical reads                                            ,    427.47k,     85.49k,
243, SOE       , STAT, physical reads direct                                     ,    427.47k,     85.49k,
243, SOE       , STAT, physical read IO requests                                 ,      4.56k,        911,
243, SOE       , STAT, physical read bytes                                       ,       3.5G,    700.36M,
243, SOE       , STAT, calls to kcmgcs                                           ,         75,         15,
243, SOE       , STAT, file io wait time                                         ,     17.45k,      3.49k,
243, SOE       , STAT, cell physical IO bytes eligible for predicate offload     ,      3.49G,       699M,
243, SOE       , STAT, cell smart IO session cache lookups                       ,          5,          1,
243, SOE       , STAT, cell smart IO session cache hits                          ,          5,          1,
243, SOE       , STAT, cell physical IO interconnect bytes returned by smart scan,      1.52G,    303.25M,
243, SOE       , STAT, cell session smart scan efficiency                        ,         -3,        -.6,
243, SOE       , STAT, table scans (long tables)                                 ,          5,          1,
243, SOE       , STAT, table scans (direct read)                                 ,          5,          1,
243, SOE       , STAT, table scan rows gotten                                    ,    129.59M,     25.92M,
243, SOE       , STAT, table scan blocks gotten                                  ,    422.53k,     84.51k,
243, SOE       , STAT, cell scans                                                ,          5,          1,
243, SOE       , STAT, cell blocks processed by cache layer                      ,    501.97k,    100.39k,
243, SOE       , STAT, cell blocks processed by txn layer                        ,    501.97k,    100.39k,
243, SOE       , STAT, cell blocks processed by data layer                       ,     426.9k,     85.38k,
243, SOE       , STAT, cell blocks helped by minscn optimization                 ,    501.91k,    100.38k,
243, SOE       , STAT, cell simulated session smart scan efficiency              ,       3.5G,    699.17M,
243, SOE       , STAT, cell IO uncompressed bytes                                ,       3.5G,    699.17M,
243, SOE       , TIME, DB CPU                                                    ,      3.98s,   796.68ms,    79.7%, |@@@@@@@@  |
243, SOE       , TIME, sql execute elapsed time                                  ,      6.31s,      1.26s,   126.2%, |@@@@@@@@@@|
243, SOE       , TIME, DB time                                                   ,      6.31s,      1.26s,   126.2%, |@@@@@@@@@@|
243, SOE       , WAIT, enq: KO - fast object checkpoint                          ,     2.01ms,      402us,      .0%, |          |
243, SOE       , WAIT, cell smart table scan                                     ,      1.65s,   329.33ms,    32.9%, |@@@@      |
243, SOE       , WAIT, events in waitclass Other                                 ,     5.69ms,     1.14ms,      .1%, |          |
--  End of Stats snap 1, end=2011-08-23 16:21:16, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
66% | adhm3mbjfzysd   | ON CPU                    | ON CPU
34% | adhm3mbjfzysd   | cell smart table scan     | User I/O

--  End of ASH snap 1, end=2011-08-23 16:21:16, seconds=5, samples_taken=44

The “enq: KO - fast object checkpoint” and the “cell smart table scan” wait events are giveaways for a smart scan.
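If you don’t have Snapper at hand, a minimal check along the same lines is to look for the word “smart” in the session statistics and wait events directly. This is only a sketch; SID 243 is taken from the example above:

-- smart scan related statistics for one session
select sn.name, st.value
  from v$sesstat st, v$statname sn
 where st.statistic# = sn.statistic#
   and st.sid = 243
   and sn.name like 'cell%smart%'
   and st.value > 0;

-- and the corresponding wait events
select event, total_waits, time_waited
  from v$session_event
 where sid = 243
   and event like '%smart%';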

DBMS_XPLAN

Another proven, useful tool in the arsenal of the performance analyst. For Exadata you might want to use the display_cursor function. It can take a sql_id, a child number and a format parameter. I like the format ‘ALL’ best, as it gives me the most information about a statement.

Note that even though you might get the word “storage” in the execution plan, it doesn’t mean you actually see a smart scan in the end! Always check the session wait events and session counters for the word “smart” to make sure one is occurring.

SQL> select * from table(dbms_xplan.display_cursor('3xjhbw9m5u9qu',format=>'ALL'))
2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  3xjhbw9m5u9qu, child number 0
-------------------------------------
select count(*) from order_items where product_id > 10

Plan hash value: 2209137760

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |       |       |   249K(100)|          |
|   1 |  SORT AGGREGATE               |                 |     1 |     4 |            |          |
|*  2 |   INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX |   338M|  1293M|   249K  (2)| 00:50:00 |
-------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
2 - SEL$1 / ORDER_ITEMS@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("PRODUCT_ID">10)
filter("PRODUCT_ID">10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) COUNT(*)[22]

31 rows selected.

This leads me to the next tool, SQL Monitoring, and it rocks.

Real Time SQL Monitoring

This is yet another awesome tool in your repertoire if you have the license. It allows you to check what’s happening during the execution of a SQL statement, which is more than useful for larger data warehouse style queries.

The easiest and most accessible way to view the report is to use OEM Grid Control or the database console. I personally like to run the report in a PuTTY session on a large screen. To make it easier to run the report based on a SQL ID I created the below script, report.sql:

set trim on long 2000000 longchunksize 2000000
set trimspool on  pagesize 0 echo off timing off linesize 1000
select DBMS_SQLTUNE.REPORT_SQL_MONITOR(
sql_id => '&1',
report_level=>'ALL')
from dual;

With this at hand you can query v$session for a SQL_ID and feed it into the script.
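A minimal way to find the SQL_ID to pass in, assuming you can identify the session by its username (SOE in my examples), is something like this:

-- find the SQL_ID of the statement currently executed by the session of interest
select sid, sql_id, sql_child_number
  from v$session
 where username = 'SOE'
   and status = 'ACTIVE';

-- then feed it into the script
@report.sql <sql_id>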

Unfortunately I can’t upload text files to WordPress for security reasons, and posting the output here isn’t possible since the report is rather wide. I decided to run the report with HTML output of type “EM” instead and post the screenshots. The command I used is shown below:

SQL> select dbms_sqltune.report_sql_monitor(sql_id=>'&1', type=>'EM', report_level=>'+histogram') from dual;

The output of the script would be far too wide, so I moved it into an attachment; the EM report type output shown here is for an insert /*+ append */ into table ... select … statement.

You can even use this to look at statistics gathering: in the below two examples I executed dbms_stats. This report has been created for a stats gathering session with a DOP of 16.
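For reference, here is a sketch of the kind of call behind such a stats gathering session; the schema and table names are made up, only the degree of 16 matters:

-- gather table statistics with a degree of parallelism of 16
begin
  dbms_stats.gather_table_stats(
    ownname => 'SOE',
    tabname => 'ORDER_ITEMS',
    degree  => 16,
    cascade => true);
end;
/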

What I like best about the monitoring report is that it points out where you are in the execution plan, and it provides an estimate of how much work has already been done. More intelligent people have written lots about using these tools; off the top of my head I’d check Greg Rahn’s as well as Doug Burns’s weblogs for more information.

Oracle 10046 trace

Cary Millsap has written the standard work on extended trace files, and I strongly recommend reading his book and papers on the subject.

Whilst you can run queries in isolation and easily identify them during testing, the matter gets considerably more difficult as soon as you have to trace a session coming from a connection pool or another third-party application. Most of these applications are not instrumented, which makes it hard to single out the right session to trace.

Of course you could use the MRTools to enable tracing on a wider scale and then trawl through the trace files with their help, but it might be difficult to get permission to do so on a busy production system.

The lesson learned for me was how to enable tracing for parallel query: the MOS note “Master Note: How to Get a 10046 trace for a Parallel Query [ID 1102801.1]” has the answer. It becomes more difficult if your PQ spans instances, but there are ways around this. You should also consider using DBMS_MONITOR to enable session trace if you can’t use “alter session set events” for some reason. Be sure to read all of the note, especially the bit at the bottom which explains the new 11g interface using the “alter session set sql_trace” syntax.
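To give an idea, here is a minimal sketch of enabling an extended trace for your own session via DBMS_MONITOR; the tracefile identifier is arbitrary and only there to make the file easy to find:

-- tag the trace file so it stands out in the ADR
alter session set tracefile_identifier = 'PQ_TEST';

-- roughly the equivalent of a 10046 level 8 trace (waits, no binds) for the current session
exec dbms_monitor.session_trace_enable(waits => true, binds => false)

-- run the statement(s) to be traced, then switch tracing off again
exec dbms_monitor.session_trace_disable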

I personally like setting the tracefile identifier to something more easily identifiable. ADRCI has a nice option to show trace files: if your ADR is consolidated on a cluster file system you can simply run “show tracefile -t” inside adrci to list them all and pick the ones with your custom tracefile identifier.

What about command line utilities?

In the past I liked to use strace to see what a process was doing while “on cpu”. Unfortunately this isn’t really an option with Exadata, as most of the calls are purely network related. After all, RDS is a network protocol. Consider this output from strace during a 10 second query (taken on a database node):

#  strace -cp 29031
Process 29031 attached - interrupt to quit
Process 29031 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.85    0.033064           2     14635         5 poll
  5.47    0.002035           0     14754      7377 setsockopt
  2.68    0.000999         999         1           munmap
  2.43    0.000906           0     22075      7321 recvmsg
  0.56    0.000209           0      7399           sendmsg
  0.00    0.000000           0         8         6 read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         3           mmap
  0.00    0.000000           0        60           rt_sigprocmask
  0.00    0.000000           0        15         5 rt_sigreturn
  0.00    0.000000           0        30           setitimer
  0.00    0.000000           0         1           semctl
  0.00    0.000000           0       262           getrusage
  0.00    0.000000           0       112           times
  0.00    0.000000           0         1           io_setup
  0.00    0.000000           0         1           io_destroy
  0.00    0.000000           0        23           semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.037213                 59382     14714 total

Not too enlightening: all the I/O happens in the cells.

Summary

These are the tools I found very useful in getting started with Exadata performance analysis, and they are well suited to the job at hand. Note that Snapper can also be used to get information about parallel query, and SQL Monitor is a blessing when it comes to looking at huge parallel queries involving lots of tables.

Creating Optimizer Trace Files

Many Oracle DBAs are probably familiar with what Optimizer trace files are and likely know how to create them. When I say “Optimizer trace”, more than likely you think of event 10053, right? SQL code like this is probably familiar then:

alter session set tracefile_identifier='MY_10053';
alter session set events '10053 trace name context forever';
select /* hard parse comment */ * from emp where ename = 'SCOTT';
alter session set events '10053 trace name context off';

In 11g, a new diagnostic events infrastructure was implemented, and there are various levels of debug output that you can control for SQL compilation. ORADEBUG shows us the hierarchy.
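A minimal way to look at that hierarchy yourself (connected as SYSDBA) is the ORADEBUG DOC command; SQL_Compiler is the component group the optimizer trace belongs to:

SQL> oradebug setmypid
SQL> oradebug doc component SQL_Compiler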