Oakies Blog Aggregator

a formula for failure (or an expensive redesign)

If 'Premature optimization is the root of all evil.' then 'Premature automation is the propagator of many evils.' else 'Failure to optimize is the abyss.' end;

Excited about NoCOUG Winter Conference

NoCOUG is hosting its winter conference next week, on February 11th.
As usual, we'll have the best speakers and presentations ever. This time I'm extra happy because two of the speakers, Dr. Neil Gunther and Robyn Sands, are coming because I was wowed by them at a previous conference and asked our Director of Conference Programming to invite them. And they agreed! I believe this is the first time either of them has presented at NoCOUG, and I'm very excited about it.

I'm sure I don't need to introduce Robyn Sands to any Oracle professional: she's an OakTable member who talks a lot about the right ways to manage performance. She is scientific and precise, but her advice is practical and widely applicable.

Dr. Neil Gunther is a well-known performance expert. So well known that he has his own Wikipedia article. I first came across his work about 6 years ago, when I was doing performance testing. From his articles, I learned the importance of having performance models: without one, you cannot interpret your results or tell when your tests were faulty. I ran into him again when Tanel Poder mentioned that Dr. Neil Gunther is now doing work that will be relevant to Oracle professionals. He appeared at HotSos a few years back, and now we get to see him at NoCOUG, with both a keynote session and a technical session. He has invited people to ask questions on his blog, so you can participate.

In addition to these two prestigious names, we have a few local celebrities giving presentations. Ahbaid Gaffoor, lead DBA at Amazon, will show his make-based deployment methodology; if you don't have a deployment methodology, this presentation is a must-see. Maria Colgan will give a presentation about data loading for data warehouses. Although she's an Oracle presenter, which sometimes means "marketing", Maria is smart and knowledgeable, and if you are doing data warehouse work, she is worth listening to.

I’ll be presenting “What Every DBA Should Know About TCP/IP Networks”. The presentation is about network problems I’ve had to solve in the last year and how I solved them with some basic knowledge of networks, a packet sniffer and an envelope. If you ever wondered how to make your network admin take you seriously, how to get more bang from your bandwidth and whether or not you should care about your SDU, you should definitely show up.

I'm looking forward to meeting old and new friends at the conference. It's going to be a blast.

DEVCON Luzon 2010

I recently became a member of the PSIA Tech Council. The company I work for is a member of PSIA, whose members make up 90% of the country's software sector. PSIA promotes the growth and global competitiveness of the Philippine software industry and is an active partner of the government and academe in implementing programs that benefit the industry.

PSIA and the PSIA Tech Council, together with the awesome and cool sponsors, will be holding the Luzon leg of DEVCON here in Manila!

Below are the details of this awesome event:

09 February 2010, 4-9pm, SMX Convention Center Function Room 1

Sync. Support. Succeed.

Get together to be connected, enhance skills and support each other to achieve success!

Designed to be a premier gathering of all Filipino software engineers, DEVCON facilitates collaboration, interaction and mentoring among leading practitioners of the Philippine software industry. DEVCON adapts global best practices for skills improvement and professional advancement among Filipino software engineers. It features three main elements, all formats that have proven successful at international technology gatherings:

> Lightning Talks – a fast-paced presentation on any topic of interest
> Birds of a Feather – a dynamic discussion of opposing perspectives on mutual topics
> Hackathon – providing rapid learning of a new technology through hands-on demonstration or joint coding onsite

Register online for your FREE seat.

Oracle Performance Visualization…

Coskan Gundogar and Karl Arao have written two interesting articles about Oracle performance analysis and visualization. Check these out!

Coskan’s article:

Karl’s article:

Note that in March I will be releasing PerfSheet v3.0, which will have lots of improvements! ;-)


Bind Variable Peeking – execution plan inefficiency

In my Beyond Oracle Wait Interface article I troubleshot a test case where an execution plan somehow went "crazy": it started burning CPU and doing lots of logical IOs, and the query never completed.

I have uploaded the test case to my new website, in a section where I will publish some of the demo scripts I show at my seminars (so you can download and test them yourself too):

http://tech.e2sn.com/oracle-seminar-demo-scripts

Basically what I do is this:

  1. I run the query with bind variable values where only a handful of rows match the filter condition, so Oracle picks a nested loop join (with an indexed access path).
  2. Then I run the same query with different bind values, where a lot of rows match the filter condition. Oracle reuses the existing execution plan (with nested loops!!!) and ends up visiting a lot of blocks again and again, because a nested loop visits the "right" side of the join once for every row coming from the "left" side of the join.
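To see why the reused plan hurts, here is a toy cost model in Python. The logical-I/O figures (`PROBE_LIOS`, `INNER_TABLE_BLOCKS`) and the function names are made-up assumptions for illustration, not Oracle's costing formulas. The only point is that nested loop work grows with the number of outer rows, while a hash join reads each input once.

```python
# Toy model (hypothetical numbers, not Oracle's optimizer math):
# logical I/Os for joining an outer row source to an indexed inner table.

PROBE_LIOS = 4               # assumed LIOs per inner probe (index blocks + table block)
INNER_TABLE_BLOCKS = 10_000  # assumed size of the inner ("right") table in blocks


def nested_loop_lios(outer_rows: int, outer_blocks: int) -> int:
    """Read the outer side once, then probe the inner side once per outer row."""
    return outer_blocks + outer_rows * PROBE_LIOS


def hash_join_lios(outer_blocks: int) -> int:
    """Read each input once; the outer row count does not multiply the work."""
    return outer_blocks + INNER_TABLE_BLOCKS


def cheapest_plan(outer_rows: int, outer_blocks: int) -> str:
    nl = nested_loop_lios(outer_rows, outer_blocks)
    hj = hash_join_lios(outer_blocks)
    return "nested loops" if nl <= hj else "hash join"


# Step 1: selective bind values -> few rows -> nested loops is cheapest.
first_plan = cheapest_plan(outer_rows=10, outer_blocks=1)

# Step 2: unselective bind values, but the cached plan is reused.
rows, blocks = 1_000_000, 20_000
reused_cost = nested_loop_lios(rows, blocks)  # what the reused plan costs
better_cost = hash_join_lios(blocks)          # what a re-optimized plan would cost

print(first_plan, reused_cost, better_cost)  # prints: nested loops 4020000 30000
```

With these invented numbers the reused nested loop plan does more than 100 times the work of a hash join; real ratios depend entirely on the data.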

Using nested loops over lots of rows is a sure way to kill your performance.

And interestingly, with my script the problem still happens in Oracle 11.1 and 11.2 too!

Oracle 11g has Adaptive Cursor Sharing, right? This should take care of such a problem, right? Well, no. Adaptive bind variable peeking is a reactive technique: it only kicks in after the problem has happened!
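A toy simulation of "reactive" in Python. `ToyCursor`, `plan_for` and the 1,000-row threshold are hypothetical; real Adaptive Cursor Sharing marks cursors bind-sensitive and bind-aware and can keep several child cursors, none of which this sketch models. It only shows the timing: even the "adaptive" cursor has to suffer one bad execution before it changes plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def plan_for(rows: int) -> str:
    """Hypothetical optimizer rule: few rows -> nested loops, many -> hash join."""
    return "nested loops" if rows < 1_000 else "hash join"


@dataclass
class ToyCursor:
    adaptive: bool = False
    plan: Optional[str] = None
    history: List[Tuple[int, str, bool]] = field(default_factory=list)

    def execute(self, rows: int) -> str:
        if self.plan is None:
            self.plan = plan_for(rows)  # "peek" at the binds on hard parse
        used = self.plan
        bad = used != plan_for(rows)    # was the cached plan wrong for these binds?
        self.history.append((rows, used, bad))
        if bad and self.adaptive:
            self.plan = None            # react only *after* the bad execution
        return used


def bad_runs(cursor: ToyCursor, workload: List[int]) -> int:
    """Run the workload and count executions that used an unsuitable plan."""
    for rows in workload:
        cursor.execute(rows)
    return sum(1 for _, _, bad in cursor.history if bad)


workload = [10, 1_000_000, 1_000_000]
plain = ToyCursor()
adaptive = ToyCursor(adaptive=True)
print(bad_runs(plain, workload), bad_runs(adaptive, workload))  # prints: 2 1
```

The adaptive cursor fixes the third execution, but the second one still runs with nested loops over a million rows.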

So feel free to download the script, review it and test it out!


Small Intermezzo Towards Miracle OpenWorld 2010

I created a small YouTube introduction movie, as requested by Moans, for this year's Miracle OpenWorld 2010 event. I am happy they have more or less resolved their issues with Oracle Legal regarding the naming of the event (see the movie and the news section on the Miracle OpenWorld 2010 main page).

Anyway, I made a small attempt (in HD) to say a bit about XML on YouTube.

As said:

    “An exploration in mending and gluing things together, while it absolutely doesn’t make any sense what I am saying, but then again, who cares…its bogus anyway…”

Hope to CU around somewhere this year.

:-)

Tuning the Application, Database and Hardware


Tuning Example in 3 parts



Part 1 - "The Database is hanging!" AKA "the application has problems, good grief"


I wonder if you can imagine, or have had, the experience of the application guys calling with anger and panic in their voices, saying "the database is sooo slow, you've got to speed it up."
What's your first reaction? What tools do you use? How long does it take to figure out what's going on?
Let's take a look at how this works with DB Optimizer. When I get a call like this, I look at the database in DB Optimizer:



I can clearly see that the database is not bottlenecked and that the problem must be in the application.
Why do I think it's the application and not the database? The database is showing plenty of free CPU in the load chart, the largest chart, at the top of the image above. In the load chart there is a horizontal red line, which represents the number of CPUs on the system, in this case 2. The bars represent the load on the database, measured in average active sessions, and they rarely cross the CPU line. The session activity is averaged over 5 samples taken over 5 seconds, so each bar is 5 seconds wide. The bars above mostly hover around 1 average active session, and they are rarely green. Green represents CPU load; any other color indicates sessions waiting. The main wait in this case is orange, which is log file sync, i.e., waits for commits. Why is the database more or less idle, and why are most of the waits we do see for commits? I look at the code coming to the database and see something like this:
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
Doing single-row inserts and committing after each one is very inefficient. A lot of time is wasted on network communication, which is why the database is mostly idle: while the application thinks it is running full speed ahead, it is actually waiting mostly on network round trips and commits. If we commit less and batch the work we send to the database, reducing network communication, we will run much more efficiently. Changing the code to
begin
for i in 1..1000 loop
insert into foo values ('a');
-- commit;
end loop;
end;
/
commit;
reduces the communication delay, and now we get a fully loaded database, but we run into database configuration issues.
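The effect of batching can be put into rough numbers with a Python cost model. Every latency below is an assumption chosen for illustration, not a measurement. What matters is that row-by-row processing pays a network round trip and a commit wait per row, while the PL/SQL block pays them once.

```python
# Back-of-envelope model (all latencies are assumed, in microseconds).
RTT_US = 500       # assumed client <-> database network round trip
COMMIT_US = 2_000  # assumed "log file sync" wait per commit
ROW_US = 20        # assumed database CPU per single-row insert


def elapsed_us(rows: int, round_trips: int, commits: int) -> int:
    return round_trips * RTT_US + commits * COMMIT_US + rows * ROW_US


def row_by_row(rows: int) -> int:
    # one round trip for each INSERT and one for each COMMIT
    return elapsed_us(rows, round_trips=2 * rows, commits=rows)


def plsql_block(rows: int) -> int:
    # the whole loop ships as one call, plus one call for the final COMMIT
    return elapsed_us(rows, round_trips=2, commits=1)


for name, fn in [("row by row", row_by_row), ("PL/SQL block", plsql_block)]:
    total = fn(1_000)
    busy = 1_000 * ROW_US / total  # share of elapsed time spent on insert CPU
    print(f"{name}: {total / 1e6:.3f} s, database CPU on inserts {busy:.1%}")
```

With these made-up numbers the row-by-row version spends well under 1% of its elapsed time doing actual insert work, which is the "mostly idle database" picture in the load chart.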


Part 2 - It *is* the database (i.e., DBA, get to work)


In the above DB Optimizer screen, the same workload was run 4 times. We can see that the time (the width of the load) decreased with each run, and the percentage of activity on CPU increased.
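The load chart's bars and the "percentage of activity on CPU" can be derived from sampled session states, as the 5-samples-per-bar description in Part 1 suggests. A Python sketch, assuming each 1-second sample is simply a list of the states of the active sessions; the sample format, function names and data are hypothetical, not DB Optimizer's internals.

```python
from collections import Counter
from typing import Dict, List

Sample = List[str]  # states of the active sessions seen in one 1-second sample


def load_bars(samples: List[Sample], bar_seconds: int = 5) -> List[Dict[str, float]]:
    """Average active sessions per state, one bar per bar_seconds samples."""
    bars = []
    for start in range(0, len(samples), bar_seconds):
        window = samples[start:start + bar_seconds]
        counts = Counter(state for sample in window for state in sample)
        bars.append({state: n / len(window) for state, n in counts.items()})
    return bars


def cpu_fraction(bar: Dict[str, float]) -> float:
    """Share of the bar's load that is on CPU (the green part)."""
    total = sum(bar.values())
    return bar.get("CPU", 0.0) / total if total else 0.0


# Hypothetical samples: a commit-bound run and a CPU-bound run, 5 seconds each.
run_a = [["log file sync"], ["CPU", "log file sync"], ["log file sync"], ["CPU"], []]
run_b = [["CPU", "CPU"], ["CPU", "CPU"], ["CPU", "log file sync"], ["CPU", "CPU"], ["CPU"]]

for name, samples in [("run A", run_a), ("run B", run_b)]:
    bar = load_bars(samples)[0]
    print(name, bar, f"on CPU: {cpu_fraction(bar):.0%}")
```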
Runs:
1. "log file sync", the orange color, is the biggest color area, which means users are still waiting on commits even though we are committing less in the code. In this case we moved the log files to a faster device. (You can see the checkpoint activity just after run 1, where we moved the log files.)
2. "buffer busy wait", the burnt red, is the biggest color area. We drilled down on the buffer busy wait event in the Top Event section, and the details told us to move the table from a normal tablespace to an Automatic Segment Space Managed tablespace.
3. "log file switch (checkpoint incomplete)", the dark brown, is the largest color area, so we increased the size of the log files. (You can see the IO time spent creating the new redo logs just after run 3.)
4. The run time is the shortest and all the time is spent on CPU, which was our goal: to take advantage of all the processors and run the batch job as quickly as possible.
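Each run follows the same loop: find the biggest color area, apply the matching fix, rerun. A Python sketch of that loop; the event-to-remedy table paraphrases the runs above, and the activity numbers are invented for illustration, not DB Optimizer output.

```python
from typing import Dict, Tuple

# Remedies paraphrased from the four runs; the activity data below is hypothetical.
REMEDIES = {
    "log file sync": "move the redo logs to a faster device",
    "buffer busy waits": "move the table to an ASSM tablespace",
    "log file switch (checkpoint incomplete)": "increase the redo log size",
    "CPU": "done: the job is CPU-bound, using all processors",
}


def next_step(activity: Dict[str, float]) -> Tuple[str, str]:
    """Pick the biggest color area and look up what to do about it."""
    top = max(activity, key=activity.get)
    return top, REMEDIES.get(top, "drill down on this event")


# Hypothetical activity breakdowns (average active sessions) for the four runs.
runs = [
    {"log file sync": 1.2, "buffer busy waits": 0.5, "CPU": 0.6},
    {"buffer busy waits": 1.0, "log file sync": 0.3, "CPU": 0.8},
    {"log file switch (checkpoint incomplete)": 1.1, "CPU": 0.9},
    {"CPU": 1.9, "log file sync": 0.1},
]

for i, activity in enumerate(runs, 1):
    event, action = next_step(activity)
    print(f"run {i}: {event} -> {action}")
```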


Part 3 - It's the machine (rock paper scissors)
Now that the application and the database are tuned, let's run a bigger load:

We can see that the CPU load is constantly over the max CPU line. How can we have a bigger CPU load than there are CPUs on the machine? It means that the demand for CPU is higher than the CPU available on the machine. In the image above there are 2 CPUs on the machine but an average of 3 users who think they are on the CPU, which means that on average 1 user is not really on the CPU but is ready to run and waiting for it.
At this point we have two options. In this case we are only running one kind of load, the insert, so we can tune this insert even further and use Oracle's bulk load commands:
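The arithmetic in the previous paragraph, as a small Python function. It assumes the load bar counts every session that wants CPU, and that at most one session per CPU can actually run at a time; the rest queue.

```python
from typing import Tuple


def cpu_split(cpu_demand: float, cpus: int) -> Tuple[float, float]:
    """Split average CPU demand (sessions wanting CPU) into running vs. waiting."""
    running = min(cpu_demand, float(cpus))  # at most one session per CPU can run
    waiting = max(0.0, cpu_demand - cpus)   # the rest are ready but queued for CPU
    return running, waiting


print(cpu_split(3.0, 2))  # prints: (2.0, 1.0)
```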
declare
TYPE IDX IS TABLE OF Integer INDEX BY BINARY_INTEGER;
MY_IDX IDX;
BEGIN
for i in 1..8000 loop
MY_IDX(i):=1;
end loop;
FORALL indx IN MY_IDX.FIRST .. MY_IDX.LAST
INSERT INTO foo ( dummy )
VALUES ( MY_IDX(indx) );
COMMIT;
end;
/
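From client code, the same idea (bind a whole array in one call, commit once) is usually exposed as `executemany` in Python DB-API drivers; python-oracledb's `Cursor.executemany` is the Oracle one. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs anywhere. It illustrates the calling pattern only, not Oracle's FORALL implementation, and `bulk_insert` is a hypothetical helper name.

```python
import sqlite3
from typing import Iterable


def bulk_insert(conn: sqlite3.Connection, values: Iterable[int]) -> None:
    """Insert all values with one executemany call and a single commit."""
    with conn:  # commits once at the end (rolls back if an exception is raised)
        conn.executemany(
            "INSERT INTO foo (dummy) VALUES (?)",
            ((v,) for v in values),  # one parameter tuple per row
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (dummy INTEGER)")

my_idx = [1] * 8000  # mirrors MY_IDX(i) := 1 for i in 1..8000
bulk_insert(conn, my_idx)

row_count = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(row_count)  # prints: 8000
```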

But if this were an application with a lot of different SQL, and the SQL load was well distributed across the system, then we'd have a case for adding more hardware. Deciding to add more hardware can be difficult because, in general, the information needed to make the decision is unknown, unclear or just plain confusing, but DB Optimizer makes it easy and clear, which can save weeks and months of wasteful meetings and debates. For example:

If we look at the bottom left, no single SQL statement takes up a significant amount of the load; i.e., there is no outlier SQL that we could tune to gain back a lot of wasted CPU. We'd have to tune many, many SQL statements and improve most of them to gain back enough CPU to get our load below the max CPU line. In this case, adding CPUs to the machine might be the easiest and most cost-effective solution.
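The "is there an outlier?" question can be made concrete with a greedy count in Python: tune the heaviest statement first and count how many you would have to tune before the load drops under the CPU line. The 50% improvement per tuned statement and the SQL loads are assumptions for illustration; the SQL ids are hypothetical.

```python
from typing import Dict, List, Optional


def statements_to_tune(load_by_sql: Dict[str, float], cpus: int,
                       improvement: float = 0.5) -> Optional[List[str]]:
    """Greedy: tune the heaviest SQL first until total load fits under the CPU line.

    `improvement` is an assumed fraction of each statement's load that tuning
    removes. Returns None if tuning every statement is still not enough.
    """
    remaining = sum(load_by_sql.values())
    tuned: List[str] = []
    for sql_id, load in sorted(load_by_sql.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= cpus:
            break
        remaining -= load * improvement
        tuned.append(sql_id)
    return tuned if remaining <= cpus else None


# Hypothetical loads (average active sessions); both total 3.0 on a 2-CPU box.
outlier = {"sql_a": 2.0, "sql_b": 0.5, "sql_c": 0.25, "sql_d": 0.25}
spread = {f"sql_{i:02d}": 0.25 for i in range(12)}

print(statements_to_tune(outlier, cpus=2))      # prints: ['sql_a']
print(len(statements_to_tune(spread, cpus=2)))  # prints: 8
```

One statement to tune is a tuning job; eight out of twelve, each needing a 50% gain, is the point where adding CPUs starts to look cheaper.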
Conclusion:
With the load chart we can quickly and easily identify the bottlenecks in the database, take corrective actions, and see the results. In part 1 we had an application problem, in part 2 we had 3 database configuration issues, and in part 3 we had a hardware sizing issue. In all 3 parts, DB Optimizer provides a clear and easy presentation of the data and issues, making the solutions clear.


Upcoming Presentation Slots, Spring 2010

I will be presenting the "Oracle Wait Interface is Useless" dog and pony show a couple of times this spring. It was first presented at UKOUG 2009 with Tanel Poder, and will be seeing the light of day at the following events:

I hope to see some of you there. I will get the final part posted before RMOUG, so the picture will be complete. If you've got any questions, it's a good opportunity to come and heckle at one of those events!

Craig Shallahamer is now blogging!

Craig Shallahamer is now blogging… check out his initial post here!

Earlier I exchanged a few emails with him and got to check out his new work, called the Stress Identifier. Hmm... so how will it be helpful? Well, if you have already characterized your workload, you may want to definitively determine which internal database activity is stressing your CPU, IO, and network subsystems. This tool takes the guesswork out of selecting that "unit of work"... check out the link for more details.

And most of all, grab a copy of his new book, Oracle Performance Firefighting, which has lots of good stuff on performance.

Hung database and can't connect?

It can be exasperating when the database is slow or hanging and I can't even connect to see what's happening. In those cases, here is a trick from Tanel Poder:

If you have such an instance hang you can use a preliminary connection (which starts the process and attaches to SGA, but doesn't initialize SGA structs nor allocate any state objects):

sqlplus -prelim "/as sysdba"

oradebug setmypid
oradebug dump latches 1
oradebug dump hanganalyze 4

Here's a good write-up by Arup Nanda.