# Oakies Blog Aggregator

## Comparative Window Functions...

I've been known as a huge fan of analytic functions (as evidenced by the Rock and Roll linkability!).

And - they could be getting better in the near future. Read this document for a proposal to allow an analytic function to compare the current row's value against any other row's value in a defined window.

I've already supplied them with my feedback (which started with "this is an awesome idea") - and you can too - by posting it here. They'll be checking back to see what you say.

Also, this is being proposed as well:

Another window function extension, not contained in the attached proposal, is the notion of VALUE based windows. Currently, we have ROW based (physical) and RANGE based (logical) windows. A RANGE window has the limitation that there can be only one sort key in the window ORDER BY. A ROW based window, on the other hand, is agnostic to column values and can be non-deterministic.

The new VALUE based window allows one to include all rows with "n" values before or after the current row's value. For example, VALUE 2 PRECEDING AND 3 FOLLOWING would include all rows with the 2 distinct values that precede the current row's value and all rows with the 3 distinct values that follow it in sort order.

```
ticker  txndate  volume
orcl          1      10
orcl          2      10  <------ start of window for (orcl,6,12)
orcl          2      11
orcl          2      11
orcl          3      11
orcl          6      12  <====== assume this is current row
orcl          7      12
orcl         11      11
orcl         11      12
orcl         11      12
orcl         13      11  <------ end of window for (orcl,6,12)
```

A similar RANGE window would have rows [orcl,6,12] through [orcl,7,12]. A similar ROW window would include rows [orcl,3,11] through [orcl,11,11].

The VALUE based window would find usefulness when there are gaps in the dataset. For example, consider a query like "find the intra-day maximum for a stock over the past three trading days". Today, to do this one has to aggregate on trading date and then compute the moving max over the past 3 days.
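To see what that workaround looks like today, here is a hypothetical Python simulation of the aggregate-then-moving-max approach (the ticker data and the 3-trading-day window are my own illustration, not from the proposal):

```python
from itertools import groupby

# Sample intra-day trades: (trading_date, price). Dates have gaps (no day 4 or 5).
trades = [(1, 10), (1, 12), (2, 11), (3, 15), (3, 9), (6, 14), (7, 13)]

# Step 1: aggregate to one row per trading date (the daily maximum).
daily_max = [(d, max(p for _, p in grp))
             for d, grp in groupby(sorted(trades), key=lambda t: t[0])]

# Step 2: moving max over the past 3 *rows* (i.e. 3 trading days, gaps skipped),
# mimicking MAX(...) OVER (ORDER BY txndate ROWS 2 PRECEDING).
moving_max = [(d, max(m for _, m in daily_max[max(0, i - 2):i + 1]))
              for i, (d, _) in enumerate(daily_max)]

print(moving_max)  # one row per trading day, max over the last 3 trading days
```

The extra aggregation step is exactly what a VALUE based window would let you skip, since "3 distinct values preceding" already means "3 trading days" even when calendar dates are missing.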

A VALUE based window can have multiple keys in its ORDER BY.
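To make the proposal concrete, here is a hypothetical Python simulation of how VALUE 2 PRECEDING AND 3 FOLLOWING would resolve against the data above, alongside the classic RANGE window. The proposed syntax doesn't exist in any shipping release, so the logic here is just my own reading of the proposal:

```python
rows = [("orcl", 1, 10), ("orcl", 2, 10), ("orcl", 2, 11), ("orcl", 2, 11),
        ("orcl", 3, 11), ("orcl", 6, 12), ("orcl", 7, 12), ("orcl", 11, 11),
        ("orcl", 11, 12), ("orcl", 11, 12), ("orcl", 13, 11)]

def value_window(rows, current_value, preceding, following):
    """Rows whose sort-key value is the current value, one of the `preceding`
    distinct values before it, or one of the `following` distinct values after it."""
    values = sorted({r[1] for r in rows})
    i = values.index(current_value)
    allowed = set(values[max(0, i - preceding):i + following + 1])
    return [r for r in rows if r[1] in allowed]

def range_window(rows, current_value, preceding, following):
    """Classic RANGE window: values within an arithmetic offset of the current value."""
    return [r for r in rows
            if current_value - preceding <= r[1] <= current_value + following]

vw = value_window(rows, 6, 2, 3)
rw = range_window(rows, 6, 2, 3)
print(vw[0], vw[-1])  # start and end of the VALUE window for (orcl, 6, 12)
print(rw)             # the much narrower RANGE window
```

Run against the example data, the VALUE window spans (orcl,2,10) through (orcl,13,11), matching the diagram, while the RANGE window catches only the txndate 6 and 7 rows.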

Thanks in advance for any feedback or ideas you might have on this.

On November 6th MyOracleSupport went into production to replace Metalink. MyOracleSupport is built using Flash technology, which isn’t totally accessible to visually impaired people who rely on screen readers. Although Flash can be made accessible, it remains difficult to use in my opinion, and I prefer using an HTML interface where available. When logging in to [...]

## Data Modeling

Most readers of this blog are probably DBAs, or do DBA work along with development or other duties.
Though my title is DBA, Data Modeling is something I really like to do.
When first learning Oracle, I cut my teeth on data modeling, and used CASE 5.1 on unix to model a database system. True, CASE 5.0 used an Oracle Forms 3.x based interface, and the GUI modeling was unix only.
That was alright with me, as the Form interface allowed manual changes to be made quite quickly.
And the graphic modeling tool was fairly decent, even on a PC running Hummingbird X Server.
When Designer 2000 came out, it was clearly a more capable tool. Not only did it do everything that CASE 5.1 could do, it could do more. I won't make any silly claim that I was ever able to fully exploit D2K, as it was an end-to-end tool that could do much more than model data and databases.
What it could do with just the databases however was quite good.  Data models could be created, and then a physical database could be generated from the model.
Changes in the physical database could be reverse engineered back to the model, and changes in the model could be forward engineered into the physical database. D2K could truly separate logical and physical models, and allow changes to be migrated back and forth between the two.
There are other high end tools such as Erwin which can no doubt accomplish the same thing, but I have not used them.
One important differentiation for me between D2K and other tools was that D2K worked with Barker Notation, which is the notation I first learned, and the one I still prefer.
I should not speak of Designer 2000 in past tense I guess, as it is still available from Oracle as part of the Oracle Development Suite, but is now called Oracle Designer.  It just hasn't received much attention in the past few years, as I think many people have come to think of data modeling as too much overhead.
I've tried several low end tools in the past few years, and while some claim to separate logical and physical models, those that I have tried actually do a rather poor job of it.
All this leads to some new (at least, new to me) developments from, of all places, Microsoft.
Maybe you have heard of Oslo, Microsoft's data modeling toolset that has been in development for the past couple of years.
If you're just now hearing about it, you will likely be hearing much more. The bit I have read has made me think this will be a very impressive tool.
If you have done data modeling, you have likely used traditional tools that allow you to define entities, drop them on a graphical model, and define relationships.
The tool you used may even have allowed you to create domains that could be used to provide data consistency among the entities.
Oslo is different.
Oslo incorporates a data definition language called M. M definitions can be translated to T-SQL, which in turn can be used to create the physical aspects of the model. M also allows easy creation of strongly typed data types, which are carried over into the model.
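I haven't written any M myself, so purely as a hypothetical illustration of that idea (strongly typed definitions feeding a DDL generator), here is a small Python sketch. The class names and the generated T-SQL shape are my own inventions, not M syntax:

```python
from dataclasses import dataclass

# A toy stand-in for a strongly typed model definition.
@dataclass
class Field:
    name: str
    sql_type: str
    nullable: bool = False

def to_tsql(table: str, fields: list) -> str:
    """Translate a typed entity definition into a T-SQL CREATE TABLE statement."""
    cols = ",\n".join(
        f"    {f.name} {f.sql_type} {'NULL' if f.nullable else 'NOT NULL'}"
        for f in fields)
    return f"CREATE TABLE {table} (\n{cols}\n);"

ddl = to_tsql("Customer", [Field("CustomerId", "INT"),
                           Field("Name", "NVARCHAR(100)"),
                           Field("Email", "NVARCHAR(255)", nullable=True)])
print(ddl)
```

The interesting part of Oslo, as I read it, is that the type information survives the translation rather than being flattened away, which is what most low-end modeling tools get wrong.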
Whether Oslo will allow round-trip engineering à la D2K, I don't yet know.
I do however think this is a very innovative approach to modeling data.
Here are a few Oslo related links to peruse:
You may be thinking that I have given SQL Developer Data Modeler short shrift.
Along with a lot of other folks, I eagerly anticipated the arrival of SQL Developer Data Modeler.
And along with many others, I was disappointed to learn that this add-on to SQL Developer would set us back a cool \$3000 US per seat. That seems a pretty steep price for a tool that is nowhere near as capable as Oracle Designer, which is included as part of the Oracle Internet Developer Suite. True, the Suite's price of \$5800 is nearly double that of SQL Developer Data Modeler, but you get quite a bit more than just Designer with the Suite.
As for the cost of Oslo, it's probably too early to tell.
Some reading suggests that it will be included as part of SQL Server 2008.
Why all the talk about a SQL Server specific tool?
Because data modeling has been in a rut for quite some time, and Microsoft seems to have broken out of that rut.  It's time for Oracle to take notice and provide better tools for modeling, rather than upholding the status quo.

## Why We Made Method R

Twenty years ago (well, a month or so more than that), I entered the Oracle ecosystem. I went to work as a consultant for Oracle Corporation in September 1989. Before Oracle, I had been a language designer and compiler developer. I wrote code in lex, yacc, and C for a living. My responsibilities had also included improving other people's C code: making it more reliable, more portable, easier to read, easier to prove, and easier to maintain; and it was my job to teach other people in my department how to do these things themselves. I loved all of these duties.

In 1987, I decided to leave what I loved for a little while, to earn an MBA. Fortunately, at that time, it was possible to earn an MBA in a year. After a year of very difficult work, I had my degree and a new perspective on business. I interviewed with Oracle, and about a week later I had a job with a company that a month prior I had never heard of.

By the mid-1990s, circumstances and my natural gravity had matched to create a career in which I was again a software developer, optimizer, and teacher. By 1998, I was the manager of a group of 85 performance specialists called the System Performance Group (SPG). And I was the leader of the system architecture and system management consulting service line within Oracle Consulting's Global Steering Committee.

My job in the SPG role was to respond to all the system performance-related issues in the USA for Oracle's largest accounts. My job in the Global Steering Committee was to package the success of SPG so that other practices around the world could repeat it. The theory was that if a country manager in, say, Venezuela, wanted his own SPG, then he could use the financial models, budgets, hiring plans, training plans, etc. created by my steering committee group. Just add water.

But there was a problem. My own group of 85 people consisted of two very different types of people. About ten of these 85 people were spectacularly successful optimizers whom I could send anywhere with confidence that they'd thrive at either improving performance or proving that performance improvements weren't possible. The other 75 were very smart, very hard-working people who would grow into the tip of my pyramid over the course of more years, but they weren't there yet.

The problem was, how do you convert good, smart, hard-working people in the base of the SPG pyramid into people in the tip? The practice manager in Venezuela would need to know that. The answer, of course, is supposed to be the Training Plan. Optimally, the Training Plan consists of a curriculum of a few courses, a little on-the-job training, and then, presto: tip of the pyramid. Just add water.

But unfortunately that wasn't the way things worked. What I had been getting instead, within my own elite group, was a process that took many years to convert a smart, hard-working person into a reasonably reliable performance optimizer whom you could send anywhere. Worse yet, the peculiar stresses of the job—like being away from home 80% of the time, continually visiting angry people each week, and having to work for me—caused an outflow of talent that approximately equaled the inflow of people who made it to the tip of the pyramid. The tip of my pyramid never grew beyond roughly 10 people.

The problem, by definition, was the Training Plan. It just wasn't good enough. It wasn't that the instructors of Oracle's internal "tuning" courses were doing a poor job of teaching courses. And it wasn't that the course developers had done a poor job of creating courses. On the contrary, the instructors and course developers were doing excellent work. The problem was that the courses were focusing on the wrong thing. The reason that the courses weren't getting the job done was that the very subject matter that needed teaching hadn't been invented yet.

I expect the people who write, say, the course called "Braking System Repair for Boeing 777" to have themselves invented the braking system they write about. So, the question was, who should be responsible for inventing the subject matter on how to optimize Oracle? I decided that I wanted that person to be me. I deliberated carefully and decided that my best chance of doing that the way I wanted to do it would be outside of Oracle. So in October 1999, ten years and one week after I joined the company, I left Oracle with the vision of creating a repeatable, teachable method for optimizing Oracle systems.

Ten years later, this is still the vision for my company, Method R Corporation. We exist not to make your system faster. We exist to make you faster at making all your systems faster. Our work is far from done, but here is what we have done:

• Written white papers and other articles that explain Method R to you at no cost.
• Written a book called Optimizing Oracle Performance, where you can learn Method R at a low cost.
• Created a Method R course (on which the book is based), to teach you how to diagnose and repair response time problems in Oracle-based systems.
• Spoken at hundreds of public and private events where we help people understand performance and how to manage it.
• Provided consulting services to make people awesome at making their systems faster and more efficient.
• Created the first response time profiling software ever for Oracle software applications, to let you analyze hundreds of megabytes of data without drudgery.
• Created a free instrumentation library so that you can instrument the response times of Oracle-based software that you write.
• Created software tools to help you be awesome at extracting every drop of information that your Oracle system is willing to give you about your response times.
• Created a software tool that enables you to record the response time of every business task that runs on your system so you can effortlessly manage end-user performance.

As I said, our work is far from done. It's work that really, really matters to us, and it's work we love doing. I expect it to be a journey that will last long into the future. I hope that our journey will intersect with yours from time to time, and that you will enjoy it when it does.

## Latency Hiding

A few weeks ago, James Morle posted an article called "Latency hiding for fun and profit." Latency hiding is one of the fundamental skills that, I believe, distinguishes the people who are Really On The Ball from the people who Just Don't Get It.

Last night, I was calling to my 12-year-old boy Alex to come look at something I wanted him to see on my computer. At the same time, his mom was reminding him to hurry up if he wanted something to eat, because he only had five minutes before he had to head up to his bedroom. "Alex, come here," I told him, putting a little extra pressure on him. "Just a second, Dad." I looked up and noticed that he was unwrapping the ready-made ham and cheese sandwich that he had gotten out of the freezer. He dropped it into the microwave and initiated its two-minute ride, and then he came over to spend two minutes looking at my computer with me while his sandwich cooked. Latency hiding. Excellent.

James's blog helped me put a name to a game that I realize that I play very, very often. Today, I realized that I play the latency hiding game every time I go through an airport security checkpoint. How you lay your stuff on the X-ray machine conveyor belt determines how long you're going to spend getting your stuff off on the other side. So, while I'm queued for the X-ray, I figure out how to optimize my exit once I get through to the other side.

When I travel every week, I don't really have to think too much about it; I just do the same thing I did a few days ago. When I haven't been through an airport for a while, I go through it all in my mind a little more carefully. And of course, airport rules change regularly, which adds a little spice to the analysis. Some airports require me to carry my boarding pass through the metal detector; others don't. Some airports let me keep my shoes on. Some airports let me keep my computer in my briefcase.

Today, the rules were:

• I had my briefcase and my carry-on suitcase.
• Boarding pass can go back into the briefcase.
• Shoes off.
• 1-quart ziplock bag of liquids and gels: out.
• MacBook: out.

Here's how I put my things onto the belt, optimized for latency hiding. I grabbed two plastic boxes and loaded the belt this way:

1. Plastic box with shoes and ziplock bag.
2. Suitcase.
3. Plastic box with MacBook.
4. Briefcase.

That way, when I cleared the metal detector, I could perform the following operations in this order:

1. Box with shoes and ziplock bag arrive.
2. Put my shoes on.
3. Take the ziplock bag out of the plastic box.
4. Suitcase arrives.
5. Put the ziplock bag back into my suitcase.
6. Box with MacBook arrives.
7. Take my MacBook out.
8. Stack the two boxes for the attendant.
9. Briefcase arrives.
10. Put the MacBook into the briefcase.
11. Get the heck out of the way.

Latency hiding helps me exit a slightly uncomfortable experience a little more quickly, and it helps me cope with time spent queueing—a process that's difficult to enjoy—for a process that's itself difficult to enjoy.
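The same trick is, of course, how software overlaps slow operations with useful work. As a hypothetical sketch (with the two-minute microwave ride and the two minutes at the computer scaled down to fractions of a second), overlapping the waits makes the total elapsed time close to the longest single wait rather than the sum:

```python
import threading
import time

def microwave_sandwich():
    time.sleep(0.2)  # the two-minute microwave ride, scaled down

def look_at_dads_computer():
    time.sleep(0.2)  # the two minutes at the computer, scaled down

# Sequential: do one thing, then the other.
start = time.perf_counter()
microwave_sandwich()
look_at_dads_computer()
sequential = time.perf_counter() - start

# Latency hiding: start the microwave, then do the other task while it runs.
start = time.perf_counter()
t = threading.Thread(target=microwave_sandwich)
t.start()
look_at_dads_computer()
t.join()
overlapped = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, overlapped: {overlapped:.2f}s")
```

Alex's version needed no threads, just the instinct to press "start" on the microwave before walking over.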

I don't know what a lot of the other people in line are thinking while they're standing there for their 15 minutes, watching 30 people ahead of them go through the same process they'll soon endure, 30 identical times. Maybe it's finances or football or cancer or just their own discomfort from being in unusual surroundings. For me, it's usually latency hiding.

## MetaLink, we barely knew ye

But, we wish we had more time to get better acquainted.

If you work with Oracle, you probably know that MetaLink went the way of the Dodo as part of an upgrade to My Oracle Support during the weekend of November 6th, 2009.

And so far it hasn't gone too well, as evidenced by these threads on Oracle-L:

Many people were lamenting the loss of MetaLink well before its demise, but I don't think any were quite expecting the issues that are currently appearing.

A few have reported that it is working fine for them, but personally, I have found it unusable all morning.

At least one issue with MetaLink appears to have been cleared up in MOS, at least as of last week, while I was still able to log in to it.

During a routine audit of who had access to our CSI numbers, I came across a group of consultants that were no longer working for our company, and froze their accounts.  The next day I received a frantic voice mail  from a member of the consulting firm, and he informed me that they had no access to MetaLink because I had frozen their accounts.

I returned the call just a few minutes later, but they had already been able to resolve the issue, as one of their consultants with admin rights had been unaffected, and was able to unfreeze their accounts.
Removing them from the CSI is the better procedure, but in the past when I have attempted to do so, I found that there were still open issues owned by the accounts, and could not remove them. The application owners had been very clear that this access should be removed, so I froze the accounts instead, and that is what I did on this occasion as well.

This all seemed quite bizarre to me. There must be a very strange schema in the ML user database, and some strange logic to go along with it. By granting a user access to a CSI, MetaLink was effectively giving me carte blanche to remove them from MetaLink.
How has My Oracle Support fixed this? Try as I might, I could not find a 'freeze' button in user administration in MOS. So the fix seems to have been "remove the button".

## The Oracle Wait Interface Is Useless (sometimes) – Part One: The Problem Definition

So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.

I thought it would be logical to start this process by producing a more concrete problem definition, so that’s the focus of this part. It’s unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!

So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow – that’s quite a claim, isn’t it? This isn’t the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap’s works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here’s why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.

So what’s wrong with the Oracle wait interface? Just one thing, actually – it is designed to provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself, then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using `gettimeofday(2)` system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, gettimeofday() calls take 1.5 microseconds). So, using this kind of mechanism for events that take a relatively long time makes perfect sense – disk I/O, for example, which takes at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, it makes no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that’s the beginning of the justification that the wait interface is useless, in a nutshell.
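The overhead is easy to get a feel for yourself. This hypothetical Python sketch times a tight loop of timer calls (using Python's `time.perf_counter()` rather than a raw `gettimeofday(2)`, so the absolute numbers will differ from the C figures above, but the principle is identical):

```python
import time

N = 100_000

# Measure the cost of the timing call itself by calling it in a tight loop.
start = time.perf_counter()
for _ in range(N):
    time.perf_counter()
end = time.perf_counter()

per_call_us = (end - start) / N * 1e6
print(f"~{per_call_us:.3f} microseconds per timing call")

# Two timing calls bracket every wait, so for a 50-microsecond event the
# measurement overhead as a fraction of the event is roughly:
overhead_pct = 2 * per_call_us / 50 * 100
print(f"~{overhead_pct:.1f}% overhead on a 50 us event")
```

Swap the 50 for a 5000-microsecond disk read and the percentage becomes negligible, which is exactly why the wait interface works so well for I/O and so poorly for very short operations.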

But hang on, isn’t 50 microseconds really, really fast? Well no, actually, it isn’t. Taking Intel’s Nehalem processor (with QuickPath) as an example, memory latency is around 50 ns – three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards) – in this case the CPU has to wait for about 150 cycles while that memory access takes place. So it’s very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.

Time for an example. Actually, this is a variation on the example that made me start thinking about this – I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!

OK, so a user just called you, complaining that his query is simply not ever completing. Following the usual course of action, we might take a few samples from v\$session_wait (or v\$session from 10g onwards) to look at the current wait state for the process, just in case it’s something obvious:

```
 SID EVENT                          PROCESS                  STATE
---- ------------------------------ ------------------------ -------------------
   8 SQL*Net message to client      15032                    WAITED SHORT TIME
```
Well, that isn’t too revealing. Let’s now turn on extended SQL tracing to get a list of all wait events as they transition:
```
SQL> oradebug setospid 15033
Oracle pid: 20, Unix process pid: 15033, image: oracle@elise03.sa.int (TNS V1-V3)
SQL> oradebug event 10046 trace name context forever, level 12
Statement processed.
```

After a few seconds, let’s see if there is anything in the trace file:

```
*** 2009-11-09 10:54:36.934
*** SESSION ID:(8.10393) 2009-11-09 10:54:36.934
*** CLIENT ID:() 2009-11-09 10:54:36.934
*** SERVICE NAME:(SYS$USERS) 2009-11-09 10:54:36.934
*** MODULE NAME:(sqlplus@elise03.sa.int (TNS V1-V3)) 2009-11-09 10:54:36.934
*** ACTION NAME:() 2009-11-09 10:54:36.934

Received ORADEBUG command (#1) 'event 10046 trace name context forever, level 12' from process 'Unix process pid: 15082, image: '

*** 2009-11-09 10:54:36.935
Finished processing ORADEBUG command (#1) 'event 10046 trace name context forever, level 12'
```

No, this is not truncated output – there is nothing in this file at all, apart from the actual invocation of the tracing. OK, so what’s next? How about looking at ‘top’ to see if the process is busy:

```
top - 10:53:15 up 2 days, 19:50,  4 users,  load average: 1.04, 0.55, 0.57
Tasks: 148 total,   2 running, 146 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.8%us,  0.4%sy,  0.0%ni, 73.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3995468k total,  3877052k used,   118416k free,   115432k buffers
Swap:  6029304k total,   170388k used,  5858916k free,  3289460k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15033 oracle    25   0 2901m 1.1g 1.1g R 99.9 28.5   3:01.79 oracle
3284 oracle    15   0 2903m  54m  46m S  1.7  1.4  60:19.33 oracle
15071 oracle    16   0 2902m  24m  21m S  1.3  0.6   0:00.04 oracle
15069 oracle    15   0 12740 1112  816 R  0.7  0.0   0:00.07 top
```

Crikey, our process is more or less consuming 100% of a CPU! So, we are not waiting for anything, but the user still has no results, and the process is very busy doing something. Let’s now try the next logical step – let’s truss/strace the process:

```
[oracle@elise03 trace]$ strace -tp 15033
Process 15033 attached - interrupt to quit
11:28:06 gettimeofday({1257766086, 104118}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:06 gettimeofday({1257766086, 104402}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 105930}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 106186}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 110887}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 111143}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
```
Not much to see here. Those system calls are emitted every two seconds, and are almost certainly just the result of Oracle updating the statistics for the database time model (v\$sess_time_model), but we’re really not doing much at all in terms of system calls. Hang on a minute, the database time model – surely that will help us here?
```
SQL> select stat_name,value from v$sess_time_model where sid=8;
STAT_NAME                                          VALUE
--------------------------------------------- ----------
DB time                                       2802976117
DB CPU                                        2801767067
background elapsed time                                0
background cpu time                                    0
parse time elapsed                                 74853
hard parse elapsed time                            66626
sql execute elapsed time                      2802910770
connection management call elapsed time            21308
failed parse elapsed time                              0
failed parse (out of shared memory) elapsed t          0
hard parse (sharing criteria) elapsed time         62605
hard parse (bind mismatch) elapsed time                0
PL/SQL execution elapsed time                       5377
inbound PL/SQL rpc elapsed time                        0
PL/SQL compilation elapsed time                    18303
Java execution elapsed time                            0
repeated bind elapsed time                             0
RMAN cpu time (backup/restore)                         0

19 rows selected.

SQL>
```

At last, we have some kind of a symptom, and confirmation that the user is indeed actually trying to do useful work rather than just spinning on the CPU. The symptom is that there is an increasing amount of time being allocated as ‘sql execute elapsed time’. In my opinion, this is where the time model statistics fail to deliver on the opportunity presented to them. There are just 19 statistics reported here on 11gR2, and the only help we are given from the output is that we are using a lot of DB time, a lot of DB CPU, and a lot of sql execute time. So we can surmise that we are doing a very CPU-intensive query, and that’s about it – no finer granularity is provided, and this would seem to be the logical place for such instrumentation…

OK, so it is now confirmed – we have a nasty query that is just using CPU and never waiting for anything. Let’s not start guessing at this stage what the problem is; let’s try and find out the real answer. At this stage, we might jump to v\$sql_plan_statistics_all to find out what is going on. These statistics are only updated when statistics_level is set to ALL, though, and even then do not update until the statement is complete. In our little example here, the query is already running – we can’t set statistics_level=all, and we can’t wait until the query completes – it might never do so! Of course, it’s probably OK to get the user to kill the query and restart with statistics_level=all, or even a ctrl-c would allow the stats to update. However, let’s assume neither of those things is possible, as this is only an example case.
So what techniques can we use to find the problem? One of them might be the new SQL Execution Monitoring in 11g (which looks very nice), if you have the required license – but that only covers the specific case where the problem is a SQL execution problem. What if it is not? We need a more general method for finding the answer. That’s the subject of the next installment – over to Tanel for Part Two!

## another (possible) nonsense correlation

I was reading news stories on Reuters this morning and came across a new study. Researchers have determined that men who work in unchallenging jobs with little control over their future tend to be less active off the job as well. Now, I don't doubt that there is a relationship between a passive work role and the amount of activity someone engages in off the job. However there are a few quotes

## Explaining the number of Consistent Gets

Last week I received an email from a friend, who wishes to remain anonymous, asking why Oracle needed 8 consistent gets to perform a full table scan on a table where all the rows are stored in just one data block. There are several possibilities that can cause this, and that is what [...]

## Starting Oracle Blog

Quite a long time ago I was tempted to start blogging about Oracle, and then I decided not to do that; instead I started to blog about my flying around Europe to present at Oracle conferences. However, I created the blog but never activated it. The nomination for Oracle ACE changed this decision, and I'll try to write about technical stuff from time to time, but don't expect that I will be as active as some other Oracle bloggers.