Search

OakieTags

Who's online

There are currently 0 users and 20 guests online.

Recent comments

Affiliations

Oakies Blog Aggregator

More on oracle.com

This is just a quick update (basically agreeing with Robin Moffat in the comments here and Dom Brooks here ). One of the blogs I follow is that of the optimizer team. . They’ve recently just released quick update because their excellent white papers have moved – and er there’s no redirection in place. Oracle, [...]

Summer Seminars

I am doing a couple of one day seminars with Oracle University, currently planned for Austria and Switzerland. They go by the title “Grid Intrastructure and Database High Availability Deep Dive”, and can be accessed via these links.

To save you from having to get the abstract, I copied it from the Oracle University website:

Providing a highly available database architecture fit for today’s fast changing requirements can be a complex task. Many technologies are available to provide resilience, each with its own advantages and possible disadvantages. This seminar begins with an overview of available HA technologies (hard and soft partitioning of servers, cold failover clusters, RAC and RAC One Node) and complementary tools and techniques to provide recovery from site failure (Data Guard or storage replication).

In the second part of the seminar, we look at Grid Infrastructure in great detail. Oracle Grid Infrastructure is the latest incarnation of the Clusterware HA framework which successfully powers every single 10g and 11g RAC installation. Despite its widespread implementation, many of its features are still not well understood by its users. We focus on Grid Infrastructure, what it is, what it does and how it can be put to best use, including the creation of an active/passive cold failover cluster for web and database resources. Special focus will be placed on the various storage options (Cluster File System, ASM, etc), the cluster interconnect and other implementation choices and on troubleshooting Grid Infrastructure. In the final part of the seminar, we explore Real Application Clusters and its various uses, from HA to scalability to consolidation. We discuss patching and workload management, coding for RAC and other techniques that will allow users to maximise the full potential of the package.

See you there if you are interested!

Kevin Closson Joins EMC Data Computing Division To Focus On Greenplum Performance Engineering!

Last week the email account associated with my blog amassed no less than 83 emails from readers asking what I’m up to in response to the cliff-hanger I left in my post entitled Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything?

I appreciate all the email and I regret I was unable to answer any of them as I was taking some time away with my family.

I’ve resigned from my position of the last 4 years as performance architect in Oracle’s Exadata development organization and have joined the EMC Data Computing Division to focus on Greenplum in a performance engineering role.  While this is a big and exciting news piece for me personally, I need to make this a small and quick blog entry at this time.

Filed under: oracle

Tanel Poder's Online Seminars

What are good ways to learn about the inner workings of Oracle to troubleshoot performance and availability issues?

Fortunately there is only one good answer - just one good source - Tanel Poder's virtual seminars at http://tech.e2sn.com/oracle-training-seminars

Why? Let's examine the typical objectives and the various means to accomplish them. I'm sure you have heard this complaint before - "the database is slow". If I had a dime every time I heard it, well ... you know the rest! Most DBAs by now know the next best thing they should do - check the wait interface - V$SESSION. That's a very good first step.

From the view they figured out the session is waiting on cache buffer chain latch, and has been waiting 1234.56 seconds, measured to the precise microseconds and counting. They also got all the other data such as session Id, the statement and so and so forth. But then what? It's like a radio host reporting the stick market - x number of stocks are up, y are down, Dow is up by n, Nasdaq is down by m, and so on. All are factually correct; but none helps you in answering your question - why  your specific portfolio of stocks is down and what other stocks you should consider.

Of course, you have to pay a professional to get that information. In your database issue at the moment, that's what you have to do as well - you have to be a professional to decipher the further information. And that's the vital second step. After you identified what is the ailment, you have to understand how to eliminate that. Most seasoned professionals stop at the first step when the second step is the most important. And that's where you need Tanel's highly acclaimed Advanced Oracle Performance Troubleshooting seminar. It's not based on slides; but showing the demos right in front of you, using code you can understand and reuse.

In this specific example, in the first part of his series, under the section "Latches" he shows you how to get the specifics of the latch, for instance the latch children. There are some traditional fixes, of course; but the big question is what if (and, that is a big IF) these well understood fixes do not work? What do you do next. Pray?

No; turn to Tanel's class. He explains how to get information from various sources inside the database. In this case the trick is to find out who has the latch and who is waiting for it, and what specific latch is so popular. This picture helps (from Tanel's class). Reproduced with permission:

As they say, a picture is worth thousand words. Now that you know how to find that popular latch, you are well on your way to troubleshooting. Hopefully this is all you have to do.

But what if it's not. The problem still eludes a solution? Have no fear. In his hang analysis section, Tanel talks about taking system dumps and explains how to analyze them - again with a few slides; and showing the actual trace file and interpreting the file right in front of you.

Talk about tough luck; suppose the problem is still not solved. Tanel takes a process stack dump. Yes, it is not for the faint of the heart; but with the right training anyone can do it. He shows you how to take the stack trace and analyzes one right in front of you. Here is an example of how to interpret the stack trace (reproduced with permission)

And he explains each section with how to interpret the different data to come to the solution.

At this point no problem is big enough to stand this type of scrutiny. If the problem persists, well, it's most likely a bug then. Oh, yes, Tanel will state the bugs that could affect performance issues and point to the MetaLink notes as well.

Interested? I guess more like excited. I certainly am. Tanels' first class is running April 11-15 http://tech.e2sn.com/oracle-training-seminars. It's a virtual class; so you don't have to step away from wherever you are, in whatever state you are in. As long you have the internet connection and an agile mind to absorb the superb information presented, you are in for a massive dose of superior learning.

What about the scripts Tanel uses in the class? Are they proprietary? Do you have to buy them to use them? Do you need to write them down so fast that you capture all the details? Not at all. As a part of the course, Tanel will expose his entire script library to you.

Intersted? Oops; I lied! Actually the scripts and tools are all free, even if you don't attend his class. They are at http://tech.e2sn.com/oracle-scripts-and-tools. Download to your heart's content.

So, why is Tanel giving away his hard work for free? Well, buying the best tool from Home Depot will not make me a good carpenter; I must learn how to use them to build a deck. A script is only 20% of the solution; the rest 80% is knowledge. And the 80% is what Tanel's virtual seminars are all about. Hope you make the right choice. I know I have.

Speaking at a Virtual Conference #VirtaThon

I was putting together abstracts for Oracle Open World #oow11 this year and remembered something someone had asked me earlier - to present at a conference like this, how does one overcome the fear of delivering a session? In fact, it is a question asked of me several times.

This is not a trivial issue; it's a real problem. There are many folks who are otherwise excellent sources of knowledge, in fact fountains of practical ideas; but when asked to speak in front of a live audience, they would rather kiss a frog than step in from of the podium. The mortal fear of public speaking is one of the many challenges to get good speakers for conferences.

The second challenge is cost. Conferences are conducted at some physical place. Unless you live in that city, or within commutable distance, you have to travel there. Add to the plane fare, cost of hotel, food, rental car, and all that extra expenses the emotional turmoil of being away from the family, especially those folks with small children who would miss one of their parents a is not something you can just brush aside. Even if you are  not a parent, you may be a caregiver to a loved one and your absence will be hard on the cared one.

Finally, the change is not something most people like. You may be comfortable with your present surroundings, among familiar people you work with every day. Traveling to a new city and spending time with strangers may be exciting or daunting based on how you look at it. Perhaps you work from home everyday in your PJs. Getting the wrinkles off the pants to go to the conferences may be a lot. At least to some people.

The answer to all this may be a trend I see developing now - web based conferences. You can attend them in your PJs and speak at them in your PJs as well; no iron needed. One such conference is #VirtaThon (http://bit.ly/hc2Vjh) where several speakers, most of them widely known in the Oracle user community are speaking. Being a virtual conference, you don't have to travel anywhere to attend the sessions; you attend from the privacy and comfort of your own home or office. Remember, they are *not* webcasts; they are virtual conference sessions. So you actually participate in the sessions as you would do in a normal conference - ask questions, interact via chat and have follow ups after talks. If you are a speaker, there is nothing better - you don't have to travel to the venue. You need a computer with an internet connection. If you are uncomfortable speaking to a very live audience, it should be much easier speaking to virtual audience.No traveling, no TSA checks and no red eye flights.

Just to pique your interest, here is just a sampling of the speakers (in no particular order)

  • Dan Hotka
  • Bert Scalzo
  • Riyaj Shamsudeen
  • Syed Jaffer Hussain
  • Steven Fuerurestein
  • Jeremy Schneider
  • Guy Harrison 
  • Brian Huff
  • Lewis Cunningham
  • Mike Ault

And many more. Oh, I am speaking as well.

Interested? Submit an abstract at http://bit.ly/hc2Vjh as soon as possible. The deadline is approaching fast.

Oh, yeas, a little detail. You will not be speaking for free; you will earn money for your efforts. So, what's stopping you?

When Total Information Technology Failures Happen What Do You Do? I Drive for 11 Hours And Then Blog About It. Alaska Airlines / Horizon Air Computer Crash Crashes Spring Break 2011.

As I set out to make this blog entry I considered using the “OT” (off-topic) preface so as to respect readers’ time in case this ended up seeming like a SPAM entry. After typing for a moment I realized this is completely on-topic.  Consider the following quotes from the below-referenced web news pieces (bold font added for effect):

The central computer system for Alaska Airlines […]

We are working to restore the computer system and to accommodate our passengers […]

The computer system is used to plan all flights […]

A statement posted on the airline’s website said technical specialists had made some progress in restoring the system since it first went down at 3 a.m. […]

All of you who are regular readers of this site know why I highlighted certain words in bold font!

Why isn’t there any news yet questioning the obvious lack of business continuity systems, procedures, operations, switch-over to whatever redundant system Alaska Airlines /Horizon Air must certainly have in place?

Here is my take on the computer crash that crashed spring break. Please give it a read and then, perhaps, comment on the DR/BC failure that put this whole blog entry into motion:

Alaska Airlines / Horizon Air Computer Crash Crashes Spring Break

References:

CNN coverage of the Alaska Airlines / Horizon Air Computer Infrastructure Meltdown

http://www.businessweek.com/ap/financialnews/D9M72I4O0.htm

Filed under: oracle

How to Tune an Exadata

Q: How do you tune Exadata?
A: Look for long running queries that are not using Smart Scans and then fix them so they do.

We’ve worked on a bunch of Proof of Concepts (POC’s) for customers over the last year or so. These usually involve loading a few terabytes of data and running a bunch queries or some other workload on the data. Generally speaking, anything we have thrown at Exadata has blown the doors off of the platforms that the applications were previously running on. But occasionally we run into a situation where the speed up is just not what we’ve come to expect. Generally speaking it’s because we’ve done something that has kept Exadata from doing what it does best – Smart Scans. While my lead in is obviously a tongue in cheek over simplification, it is basically true. Unfortunately, it’s not as easy as it sounds to determine whether a statement has been offloaded, because our main tool for looking at how a statement was executed (the execution plan) doesn’t tell us whether a Smart Scan was used or not. So in this post, my intent is to give you a couple of options for determining whether Smart Scans are happening or not. Here’s a quick example showing a basic execution plan:

 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
 
 
INSTANCE_NAME    STARTUP_TIME      CURRENT_TIME         DAYS    SECONDS
---------------- ----------------- ----------------- ------- ----------
SANDBOX1         24-MAR-2011 16:19 25-MAR-2011 22:57    1.28     110283
 
SYS@SANDBOX> 
SYS@SANDBOX> set timing on
SYS@SANDBOX> @avgskew3
SYS@SANDBOX> select avg(pk_col) from kso.skew3
  2  where col1 > 0
  3  /
 
AVG(PK_COL)
-----------
 16093750.2
 
Elapsed: 00:00:34.80
SYS@SANDBOX> select sql_id, sql_text from v$sql
  2  where sql_text like 'select avg(pk_col) from kso.skew3 where col1 > 0';
 
SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------
4p62g77m9myak select avg(pk_col) from kso.skew3 where col1 > 0
 
Elapsed: 00:00:00.14
SYS@SANDBOX> @dplan
Enter value for sql_id: 4p62g77m9myak
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4p62g77m9myak, child number 0
-------------------------------------
select avg(pk_col) from kso.skew3 where col1 > 0
 
Plan hash value: 2684249835
 
------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   535K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |    11 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   383M|  4028M|   535K  (1)| 01:47:02 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("COL1">0)
       filter("COL1">0)
 
 
20 rows selected.
 
Elapsed: 00:00:00.22

The storage line in the predicate section indicates that a Smart Scan is possible, but it doesn’t actually tell us that one occurred. So how can you tell. Well there are several ways.

  1. You can Millsap it. (generate a 10046 trace)
  2. You can Poder it. (use Tanel Poder’s snapper script to check stats and wait events)
  3. You can Wolfgang it. (generate a 10053 trace) – well actually this doesn’t work since the optimizer doesn’t know whether a statement will do a Smart Scan or not.
  4. Or you can look in v$sql – I wrote a little script called fsx.sql (short for Find_Sql_eXadata.sql) to do that.

I think that tracing is the most foolproof way to verify a Smart Scan (just look for “cell smart table/index scan” wait events). But it can be a little cumbersome to generate a trace and then find it. (Note: Method-R has a great tool to make this easier called MR Trace which is a plug in for Oracle’s SQL Developer). Tanel’s snapper script is an awesome tool that is very versatile – so it’s a very valid option as well. But both of these methods depend on the fact that you can catch the statement of interest while it is executing. They provide no way of looking back at statements that ran in the past. My fsx script is not nearly as comprehensive as either of these approaches, but it has an advantage in that it looks at values stored in v$sql (which are also captured in AWR by the way). This allows us to do analysis that is not limited to what is happening right now. (i.e. we don’t have to catch the query while it’s running).

So how does it work?

The v$sql view contains a column (IO_CELL_OFFLOAD_ELIGIBLE_BYTES) which tells us whether this child has been offloaded or not. Very simply, if the column contains a value greater than 0, then the statement was processed with a Smart Scan. Here’s the basic trick:

   decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,'No','Yes') Offload 

So basically the fsx script just does a decode on the column and if it’s value is 0 then it returns ‘No’, otherwise it returns ‘Yes’. The script does a few other things too like attempting to calculate the savings in terms of reduced volume of data being transferred across the InfiniBand fabric that the Smart Scan was responsible for. I called the output column IO_SAVED_% although it’s not really I/O, it’s bytes transferred between the storage layer and the database layer. There are some situations where the results of this calculation don’t seem to make much sense, by the way. But that’s too much to go into here. (There are more details in our upcoming Apress Exadata Book if you’re so inclined.) Anyway, here’s a quick example of using the fsx script.

SYS@SANDBOX> @fsx
Enter value for sql_text: select avg(pk_col) from kso.skew3%
Enter value for sql_id: 
 
SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0

So that’s pretty easy and straight forward. You can enter a bit of a SQL statement’s text or a SQL_ID or both to locate statements of interest in v$sql. The script’s output will tell you whether it was offloaded or not. But what if we want to get a feel for how the whole system is running with regard to Offloading. We could simply add a couple of additional where clauses to the fsx script to allow us to limit the rows returned based on whether the statements were offloaded or not and maybe add a filter on average execution time as well so we can just look at the long running statements. Have a look at fsxo.sql which does just that.

SYS@SANDBOX> @fsxo
Enter value for sql_text: 
Enter value for sql_id: 
Enter value for min_etime: 10
Enter value for offloaded: 
 
SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
09m6t5qpgkywx      0 1885411402      1     116.79      0 No             .00 select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2, sum(a.co
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0
5zruc4v6y32f9      0          0      2     362.05      0 No             .00 DECLARE job BINARY_INTEGER := :job;  next_date TIMESTAMP WITH TIME ZON
b6usrg82hwsa3      0          0      2     305.30      0 No             .00 call dbms_stats.gather_database_stats_job_proc (  )
 
6 rows selected.
 
SYS@SANDBOX> @fsxo
Enter value for sql_text: 
Enter value for sql_id: 
Enter value for min_etime: 
Enter value for offloaded: YES
 
SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
0qa98gcnnza7h      0  568322376      2       3.56      0 Yes        -905.77 select avg(pk_col) from kso.skew where col1 > 0
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
266gctwscrnn2      0  568322376      3       1.06      0 Yes        -805.04 select /*+ result_cache */ avg(pk_col) from kso.skew where col1 > 1
2uzgbm8azqqv3      0 2974987230      2       1.56      0 Yes          71.79 select avg(pk_col) from kso.skew_encrypt where col1 > 0
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0
d15cdr0zt3vtp      0   62424106      1        .31      0 Yes          99.96 SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:
 
7 rows selected.
 
SYS@SANDBOX> @fsxo
Enter value for sql_text: %skew%
Enter value for sql_id: 
Enter value for min_etime: 5
Enter value for offloaded: 
 
SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
09m6t5qpgkywx      0 1885411402      1     116.79      0 No             .00 select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2, sum(a.co
1nfa7trushhpm      0 2684249835      2      15.87      0 Yes          74.60 select /*+ result_cache */ avg(pk_col) from kso.skew3 where col1 > 1
35q8ahgw2xhsp      0 3734762968      1      37.12      0 Yes          34.22 select /*+ bloom join  use_hash (skew temp_skew) */ a.col2, sum(a.col1
399m90n8jzpu6      0 1923773943      2       5.09      0 No             .00 select avg(pk_col) from kso.skew
4p62g77m9myak      0 2684249835      2      18.31      0 Yes          71.85 select avg(pk_col) from kso.skew3 where col1 > 0

So the idea is to be able to take a high level look at what’s being offloaded and what’s not. Obviously you can do the same thing with AWR data for a longer look back in history. But I’ll leave that as an exercise for the reader. Note that my lab system was recently bounced and so the number of statements is very small. On larger production systems you will probably want to limit yourself to longer running statements.

By the way, there is another tool that can be very helpful in determining if a statement was offloaded, DBMS_SQLTUNE.REPORT_SQL_MONITOR. But I’ll have to save that for another post.

My Secret iPad App Addictions…

When using my iPad, I spend the vast majority of my time in a browser. I don’t have many apps and most of the ones I have I don’t use regularly, but I have a couple that have become a secret addiction…

We City – I love this app. Like all these “earn money and build stuff” apps, you don’t have to spend a lot of time on each visit, but you tend to come back several times a day. I guess I check in about 10 times a day, each time for about 1 minute. I’ve found myself taking my iPad round to friends houses so I can check in a couple of times when I’m there, under the guise of checking my emails. Truly pathetic I know, but that’s addictions for you. :)

GarageBand – I’ve been hearing people rave about GarageBand for ages, but I always assumed it was some sort of Guitar Hero game so I ignored it. It was all a bit random, but the other day I installed it to see what all the fuss was about and it is totally awesome. I was in bands at University, but it’s been years since I’ve done anything musical. I pick up a guitar form time to time, but I can barely play anymore. Anyway, a few minutes with this app and you can start building up pretty cool sounding tunes. You can play the instruments themselves, throw in some autoplay stuff for the instruments you don’t play and even use some of the predefined loops. If you get the right connectors, you can even plug in real guitars, keyboards and mics. I don’t think you’ll be hearing me on the radio any time soon, but it’s great fun and well worth the £2.99 I paid for it. I hadn’t realized I already have GarageBand installed on my MacBook Pro. I guess I should take a look and see what that can do that the iPad app can’t, but to be honest, it’s more fun messing with the iPad than sitting at a keyboard.

Cheers

Tim…




Ambiguity

“There is no space problem.”

If you saw this comment in the middle of a thread about some vaguely described Oracle problem, which of the following would you think was the intended meaning:

    There is a problem – we have no space.
    We do not have a problem with space

Wouldn’t it make life so much easier to choose between:

    We are not seeing any Oracle errors.
    We are seeing Oracle error: “ORA-01653: unable to extend table X by N in tablespace Z”

(That’s just one of many possible space-related errors, of course.)

A Marathon...

If you don't see me updating asktom too much in the next two weeks - it'll be because:

then I'm off to
And then you'd think I'd be heading home - but not quite - I have a quick stop in Orlando Florida on Monday April 11th for IOUG Collaborate and then I go home :)