New create_1_hint_sql_profile.sql

I modified my create_1_hint_sql_profile.sql script (which I blogged about here: Single Hint Profiles) to allow any arbitrary text string, including quote characters. This is a script that I use fairly often to apply a hint to a single SQL statement that is executing in a production system where we can’t touch the code for some reason. For example, it’s sometimes useful to add a MONITOR hint or a GATHER_PLAN_STATISTICS hint to a statement that’s behaving badly so we can get more information about what the optimizer is thinking. The new quote handling is useful when you want to add something like an OPT_PARAM hint, which takes quoted arguments.
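
For illustration, here’s the kind of hint text that used to trip the script up and now goes through cleanly. This is a hypothetical sketch: the parameter, table and bind variable are just placeholders, not from any real system.

-- Hint text with quoted arguments, as you might pass it to the script:
--   OPT_PARAM('star_transformation_enabled' 'true')
--
-- The intended effect is the same as if we could edit the statement itself:
select /*+ opt_param('star_transformation_enabled' 'true') gather_plan_statistics */
       count(*)
from   some_table
where  some_column = :b1;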

dbms_xplan (4)

This little note on how dbms_xplan behaves was prompted by a very simple question on OTN which raised a point that I often manage to forget (temporarily). I’ve chosen to explain it through a little demonstration. 

Session 1 – cut-n-paste (with minor cosmetic changes):

SQL> select max(n2) from t1 where n1 = 15;

   MAX(N2)
----------
        15

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------
SQL_ID  b8ud16xgnsgt7, child number 0
-------------------------------------
select max(n2) from t1 where n1 = 15

Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |   120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=15)

20 rows selected.

Session 2 – cut-n-paste:

SQL> alter session set workarea_size_policy = manual;

Session altered.

SQL> select max(n2) from t1 where n1 = 15;

   MAX(N2)
----------
        15

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------
SQL_ID  b8ud16xgnsgt7, child number 1
-------------------------------------
select max(n2) from t1 where n1 = 15

Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |   120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=15)

20 rows selected.

SQL>

Because I’ve changed the optimizer environment for the second session, Oracle has created a second child cursor for the query – even though the execution plan turned out to be exactly the same. (The fact that you can get two child cursors with the same plan sometimes surprises people, but it’s not a rare occurrence.) You’ll notice that the two sessions report different values for child number.
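
If you want to see why Oracle refused to share the existing child cursor, v$sql_shared_cursor reports a Y/N flag for every possible reason. Here’s a quick sketch (assuming a version, such as 11g, where the view exposes sql_id and child_number); in this example you’d expect the optimizer_mismatch column to show ‘Y’ against child 1:

select child_number, optimizer_mismatch
from   v$sql_shared_cursor
where  sql_id = 'b8ud16xgnsgt7';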

So let’s use a third session to find the plans for the sql_id that the previous outputs show: b8ud16xgnsgt7. Here’s the complete cut-n-paste (again with minor cosmetic changes):

SQL> select * from table(dbms_xplan.display_cursor('b8ud16xgnsgt7'));

PLAN_TABLE_OUTPUT
--------------------
SQL_ID  b8ud16xgnsgt7, child number 0
-------------------------------------
select max(n2) from t1 where n1 = 15

Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |   120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=15)

20 rows selected

Question: We’ve got the plan for child number 0, what happened to child number 1 ?
Answer: We didn’t ask for it.

The default value for the second parameter to display_cursor() is zero. If you want to see all the available plans for a given sql_id, you need to supply an explicit null to the call, viz:

SQL> select * from table(dbms_xplan.display_cursor('b8ud16xgnsgt7',null));

(I won’t bother to cut-n-paste the output – it just lists the two plans one after the other in child order, reporting a total of 40 rows selected.)
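
If you just want to check how many child cursors exist for the statement before printing all their plans, a quick query against v$sql will tell you (a minimal sketch):

select child_number, plan_hash_value, executions, optimizer_env_hash_value
from   v$sql
where  sql_id = 'b8ud16xgnsgt7'
order by child_number;

In this example you’d expect two rows with the same plan_hash_value but different optimizer_env_hash_value values – the clue that the extra child exists because of the changed optimizer environment.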

If you want to read other notes that make significant points about dbms_xplan, there’s an entry for it in the Categories drop-down list to the top right of the screen.

Facebook Adopts Oracle Exalogic Elastic Cloud, Replaces Tens Of Thousands Of Servers. Memorable Conference Keynote Addresses.

BLOG UPDATE 2011.08.17: In the original I misstated the Facebook page view rate to be 260,000,000 per day when it is in fact 260,000,000,000 per month. I used the correct per-second rate anyway so this update does not change the original in that regard.

BLOG UPDATE 2011.08.16: The blog title is a come-on.

Conference Keynotes: One Year Ago versus Eleven Years Ago
I spent some time rummaging through my scrapbook over the weekend. I stumbled on a program for the IBM Partnerworld 2000 conference. It might be difficult to see, but I scanned it in to show that on the left hand side there is an announcement of the keynote session with (former) U.S. President George H.W. Bush and on the bottom of the right hand page is an announcement of the session I was offering the same day. That was a really cool conference! The audience (certainly me included) thoroughly enjoyed the keynote. That was eleven years ago, but I recall that my session was a bunch of blah-blah-blah compared to the keynote.

There was another keynote, only one year ago, that is burned much more clearly into my memory. The conference was Oracle Openworld 2010 and, of course, the keynote speaker was Larry Ellison. Unlike the IBM conference of long ago I had no speaking session. I certainly would have had a session, due to the overwhelming number of votes racked up by my proposed “Oracle Mix Suggest-a-Session” (105 votes), but the Oracle Mix/Conference folks decided, but didn’t share the fact until afterwards, that Oracle employees weren’t eligible to participate in the Suggest-A-Session program. Boo hoo. Attendees missed a lot because the session topic was Do-It-yourself Exadata-level Performance.

Do It Yourself  Exadata-Level Performance
A DIY Exadata-level performance configuration is not what I’m blogging about. However, it seems I just stepped in it.  Quote me on this:

 If you configure a lot of I/O bandwidth and a lot of CPU you can match Exadata performance with Real Application Clusters and conventional hardware.

…but I’m not blogging about that.

Big Sounding Claims
So, back to the comparison between the circa-2000 IBM Partnerworld keynote and last year’s Openworld keynote. The majority of the fanfare last year was about Exalogic. I remember this Larry Ellison sound bite:

A single setup is capable of handling 1 million HTTP requests per second; two running side-by-side could handle Facebook’s HTTP request workload

For those who don’t know, the following is true: Facebook serves roughly 260,000,000,000 page views per month.

That’s roughly 100,000 requests per second ((260 * 10^9)/(30*24*60*60)). So the claim that a two-rack Exalogic configuration could handle Facebook’s “HTTP request workload” is actually quite true. So why am I blogging about it then? The numbers are indeed stunning! There is a huge difference between handling the “HTTP request workload” and serving meaningful content. But I’m not blogging about that.

Those Facebook Engineers are Real Dummies
Well, they must be, if they were using  “thousands and thousands” of servers to do what they were doing even back in 2008:

Because we have thousands and thousands of computers, each running a hundred or more Apache processes […]

Silly Facebook engineers. They could save so much space, power and cooling if they only listened to certain keynote wisdom.

Your Choice: A Two-Rack Exalogic Configuration Or 25 Itsy-Bitsy Xeon 5600 Cores
If a two-rack Exalogic configuration can supplant all the processing power of Facebook’s web serving farm, I have to do the math. I won’t argue that Exalogic can perform “1 million HTTP requests per second.” Because it surely can. I will, however, focus on just how much Westmere-EP CPU it takes to handle the 100,000 HTTP requests/sec that make up the request portion of Facebook’s workload.

I think I’ll fire up httpd on one of my Linux servers and crank up ab(8) (ApacheBench) to see how much CPU it takes to handle a given rate of requests. The workload is pulling about 4K for each request. The server is a 2-socket Westmere-EP (Xeon 5600) system running Linux 2.6:

# ab -n 1000000 -c 24 http://172.28.8.250/
 This is ApacheBench, Version 2.0.40-dev apache-2.0
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking 172.28.8.250 (be patient)
 Completed 100000 requests
 Completed 200000 requests
 Completed 300000 requests
 Completed 400000 requests
 Completed 500000 requests
 Completed 600000 requests
 Completed 700000 requests
 Completed 800000 requests
 Completed 900000 requests
 Finished 1000000 requests
Server Software: Apache/2.2.3
 Server Hostname: 172.28.8.250
 Server Port: 80
Document Path: /
 Document Length: 3985 bytes
Concurrency Level: 24
 Time taken for tests: 62.656311 seconds
 Complete requests: 1000000
 Failed requests: 0
 Write errors: 0
 Non-2xx responses: 1000002
 Total transferred: 4183008366 bytes
 HTML transferred: 3985007970 bytes
 Requests per second: 15960.08 [#/sec] (mean)
 Time per request: 1.504 [ms] (mean)
 Time per request: 0.063 [ms] (mean, across all concurrent requests)
 Transfer rate: 65196.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     0    1   4.4      1     885
Waiting:        0    0   4.5      0     884
Total:          0    1   4.4      1     885

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      3
  98%      4
  99%      5
 100%    885 (longest request)

No tuning, out of the box, gives me 15960.08 HTTP requests/second. And the CPU? Well, it peaked at 66% idle, or 34% utilized. Of course this is SMT (hyperthreaded), so 34% of the box’s 24 processor threads works out to about 8 threads, or 4 cores:

# vmstat 3
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 0 127816 259408 230712 47307252 0 0 92 2 0 0 0 0 100 0 0
 0 0 127816 259400 230712 47307352 0 0 0 0 1105 432 0 0 100 0 0
 0 0 127816 259408 230712 47307352 1 0 1 12 1074 330 0 0 100 0 0
 0 0 127816 259392 230712 47307352 0 0 0 0 1114 441 0 0 100 0 0
 5 0 127816 259044 230712 47308164 0 0 0 8 7439 23103 4 4 92 0 0
 10 0 127816 237848 230728 47318736 1 0 1 4963 7509 94804 17 14 70 0 0
 2 0 127816 244196 230736 47331552 0 0 0 48 6944 69394 16 12 72 0 0
 6 0 127816 231328 230744 47330476 0 0 0 5927 10704 62483 13 10 78 0 0
 14 0 127816 227280 230756 47307412 1 0 1 0 5756 85341 15 13 72 0 0
 18 0 127816 239084 230764 47306240 0 0 0 6467 6003 83381 15 13 72 0 0
 10 0 127816 237880 230780 47306440 1 0 1 6036 6257 82766 15 12 73 0 0
 1 0 127816 268676 230784 47302892 0 0 0 0 7258 42912 9 7 85 0 0
 9 0 127816 259512 230788 47307268 0 0 0 3169 19715 31181 8 6 85 0 0
 6 0 127816 236700 230804 47319992 1 0 1 3 5636 106346 19 15 66 0 0
 3 0 127816 240576 230816 47330944 0 0 1 7223 12153 49646 12 9 79 0 0
 2 0 127816 229308 230828 47329748 0 0 0 5669 8747 87512 15 12 72 0 0
 3 0 127816 241880 230840 47332748 1 0 1 1 6669 77569 16 13 71 0 0
 7 0 127816 229984 230848 47327988 0 0 0 5672 13008 56876 13 10 77 0 0
 8 0 127816 235400 230864 47330736 0 0 0 51 5411 101470 18 15 67 0 0
 2 0 127816 246684 230868 47328748 1 0 1 7179 12156 34923 9 7 84 0 0
 8 0 127816 234116 230880 47325008 0 0 0 3977 13480 55217 12 9 79 0 0
 10 0 127816 229848 230888 47301592 1 0 1 0 5712 86969 15 13 72 0 0
 9 0 127816 242936 230900 47302292 0 0 0 6248 5907 82740 15 13 72 0 0
 7 0 127816 234464 230908 47312100 0 0 0 0 6211 81525 14 12 74 0 0
 3 0 127816 260600 230916 47310580 1 0 1 6019 10948 49947 10 8 81 0 0
 0 0 127816 252816 230920 47314964 0 0 0 2847 11985 19545 5 4 90 0 0
 0 0 127816 252840 230920 47316000 0 0 0 0 1078 358 0 0 100 0 0
 0 0 127816 252952 230920 47316000 11 0 11 15 1108 426 0 0 100 0 0

With 15,960 requests per second across those 4 cores (3,990 HTTP requests per second per core), I only need 25 Westmere-EP cores (100,000 / 3,990) to “handle Facebook’s HTTP request workload.”  That’s 13 2U servers.

It’s both absurd, and insulting to Facebook, to suggest a couple of Exalogic systems can do anything worthwhile under Facebook’s insanely huge web workload. Since Facebook opted not to replace all their servers with a few tiles of Exalogic goodies, what did they do? Well, they built a behemoth data center in my part of the world. These are panorama shots so click and drag right to left:

Thanks to the Scobleizer for that. I’ve seen the outside of this building while driving by. If you read about the state of the art environmental engineering these guys put into this facility you’ll know why I’d love to get in there to see it—and you’ll certainly know why the odds are, um, slim Exalogic could replace even a partial fragment of a fraction of a percentage of half of one side of a single aisle of this datacenter.

So, no, Facebook didn’t really replace hundreds of metric tonnes of hardware with two 42U racks worth of Exalogic Elastic Cloud. I meant to say Facebook apparently didn’t get the memo.

Summary
Keynotes are interesting. Some are memorable.


Enough Already MOS!

So another My Oracle Support update at the weekend. Today I get the following results when searching the knowledge base: Thanks a bunch, Oracle! We pay for this stuff and you continually screw it up in basic ways. And no, Chrome is not an unusual browser. And yes, Flash is up to date. It isn’t [...]

Real-Time SQL Monitoring - Retention

As I suggested in my last post, there's at least one more reason that your long-running SQL statements might not appear in SQL Monitoring views (e.g. V$SQL_MONITOR) or the related OEM screens.

When the developers at my current site started to use SQL Monitoring more often, they would occasionally contact me to ask why a statement didn't appear in this screen, even though they knew for certain that they had run it 3 or 4 hours ago and had selected 'All' or '24 Hours' from the 'Active in last' drop-down list.
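
A quick way to see how far back the monitoring information on an instance actually stretches is to query V$SQL_MONITOR directly. A minimal sketch, assuming 11g column names:

select min(sql_exec_start) oldest_entry,
       max(sql_exec_start) newest_entry,
       count(*)            monitored_statements
from   v$sql_monitor;

If the oldest entry only goes back an hour or two, you're seeing the retention limit described below rather than anything the developers did wrong.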

I noticed when I investigated that some of our busiest test systems only displayed statements from the past hour or two, even when selecting 'All' from the drop-down. I asked some friends at Oracle about this and they informed me that there is a configurable limit on how many SQL plans will be monitored, controlled by the _sqlmon_max_plan hidden parameter. It has a default value of the number of CPUs * 20 and controls the size of a memory area dedicated to SQL Monitoring information. This is probably a sensible approach in retrospect, because who knows how many long-running SQL statements might be executed over a period of time on your particular system?
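
If you want to check the current value on your own system, the usual hidden-parameter query against the x$ fixed tables does the job. A sketch only: it has to be run as SYS, the x$ tables are undocumented, and changing hidden parameters is something to agree with Oracle Support first.

select i.ksppinm  name,
       v.ksppstvl value,
       i.ksppdesc description
from   x$ksppi  i,
       x$ksppsv v
where  i.indx    = v.indx
and    i.ksppinm = '_sqlmon_max_plan';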

I included this small snippet of information in my SQL Monitoring presentations earlier this year because it’s become a fairly regular annoyance, and I planned to blog about it months ago, but first I wanted to check which memory area would be increased and whether there would be any significant implications.

Now that I’ve suggested to my client that we increase it across our systems, I’ve had a dig around in various V$ views to try to identify the memory implications but didn’t notice anything obvious. My educated guess is that the additional memory requirement is unlikely to be onerous on modern systems, but I would still like to know for sure, so I’ll keep digging. If anyone knows already, I’d be very interested ...

Updated later - thanks to Nick Affleck for pointing out the additional 's' I introduced on the parameter name. Fixed now to read _sqlmon_max_plan

Advert: Cary Millsap in London

If you're interested in Oracle performance (or system performance in general) and you don't know who Cary Millsap is then I'm surprised. If you've never read 'Optimizing Oracle Performance', then I think you should.

Most of all, and this isn't to diminish Cary's other work in any way, if you've never had the pleasure of hearing Cary explain complex performance topics in a way that makes them fun and easy to understand then you really are missing something special.

Well now's your chance to put this right. Although it's pretty easy to hear Cary speaking at one of the many user group conferences he pops up at over the pond, the chance to hear him speak for a whole day in the UK is pretty rare.

Sign up. I know training budgets are tight at the moment, but I bet you won't regret it!

(Oh, and in case you think this is just rampant advertising for a friend, you should see the number of mails I get these days asking me to review things I don't have the slightest interest in, in the hope I'll blog about them. I only blog about things I think are worth my and your time ...)

How To Become An Oracle Expert (The Scientist)

I’ve been invited by the Sydney Oracle Meetup Group to participate in an open discussion on “How To Become An Oracle Expert” next Tuesday, 16 August 2011 at 5:30pm. There’s a great lineup, including:

  • Chris Muir (Oracle ACE Director, Australia)
  • Connor McDonald (Oracle ACE Director, Australia)
  • Craig Shallahamer (Oracle ACE Director, US)
  • [...]

Friday Philosophy – Blogging Style and Aim

I’ve recently looked back at some of my earlier blog postings and also some notes I made at the time I started. I had a few aims at the start, pretty much in this order:

  • A place to put all those Oracle thoughts and ideas, for my own benefit
  • Somewhere to record stuff that I keep forgetting
  • I’d started commenting on other blogs and felt I was maybe too verbal on them
  • To increase my profile within the Oracle community
  • To share information, because I’m quite socialist in that respect
  • To learn more

It very quickly morphed into something slightly different though.

Firstly, it is not really somewhere that I record thoughts and ideas or where I record stuff that I forget. When I am busy, I sometimes only get half way to the bottom of resolving an issue or understanding some feature of Oracle. I tend to create little documents about them but I can lose track of them. I initially intended to put these on my blog. The thing is though, I don’t feel I can blog about them because I might be wrong or I raise more questions than I answer. I don’t think a public blog about technology is a good place to have half-baked ideas and I certainly don’t want people:

  1. reading and believing something that is wrong
  2. thinking I do not know what I am talking about
  3. seeing my rough notes, as boy are they rough, often with naughty words and slang in them. Converting them to a family-friendly format takes time.

You see, there is the point about increasing my profile in the community. Part of me hates the conceit that you have to be seen as all-knowing or never wrong, as no one is all-knowing and never wrong. In fact, I think most of us find it hard to like people who present themselves as such. But if I put out a blog saying “it works this way” and I am wrong, or I simply say it in a clumsy way, or I assume some vital prior knowledge, I could be making people’s lives harder, not easier, so I spend a lot of effort testing and checking. It takes me a lot, lot longer to prepare a technical blog than I ever thought it would before I started. And yes, I accept I will still get it wrong sometimes.

Another consideration is that I make my living out of knowing a lot about Oracle. If I post a load of blogs saying something like “gosh I wish I understood how Oracle locks parts of the segment as it does an online table rebuild and handles the updates that happen during it”, then I obviously don’t know about that. Or I put out a post about how I currently think it works and I’m wrong. Tsch, I can’t be that good! How much should I have to think about how I am selling myself as a consultant? There is a difference between being liked and being perceived as good at what you do. If you want someone to design a VLDB for you, you probably don’t care if s/he is a nice person to spend an evening in the pub with - but you certainly care if they seem to be fundamentally wrong about Oracle partitioning.

Balancing that, if you saw my recent post on Pickler Fetch you will have seen that I was wrong about a couple of things and that there was some stuff I did not know yet. But I learnt about those wrong things and that lack of knowledge, so I feel good about that. That was one of my original aims, to learn - not only by having to check what I did, but by people letting me know when I was wrong.

What about style? I can be quite flippant and, oh boy, can I go on and on. I know some people do not like this and, if you want a quick solution to an Oracle problem, you probably do not want to wade through a load of side issues and little comments. You just want to see the commands, the syntax and how it works. Well, that is what the manuals are for, and there are a lot of very good web sites out there that are more like that. If you do not like my verbose style then, hey, that’s absolutely fine. But I like to write that way and so I shall.

So after over 2 years of blogging, I seem to have settled into a style and my aims have changed.

  • I try to be helpful and cover things in detail.
  • I try to polish what I present a lot, lot more than I do for my own internal notes. Maybe too much.
  • I’m going to write in a long-winded way that some people will not enjoy but it is my style.
  • I’m going to try and worry less about looking perfect as I am not.

I suppose what I could do is start a second, private blog with my half-baked stuff on it. But I just don’t think I’ve got the time :-)

The New Order Oracle Coding Challenge 4 – Tic Tac Toe

(Back to the Previous Post in the Series) Tic Tac Toe, the game of X’s and O’s, was an oddly popular game in elementary school.  When playing the game you quickly learn a couple of rules: Because X always places his mark first (alternating between X and O), there is an unfair advantage for the player placing [...]

Pimp my collectl - advanced system monitoring using collectl-utils, part I

I have recently written about collectl, a truly superb troubleshooting utility, in a previous post. After comments from Mark Seeger (the author) and Kevin Closson (who has used it extensively and really loves it), I have decided to elaborate a bit more about what you can do with collectl.

Even though it’s hard to believe, collectl’s functionality can be extended by using the collectl-utilities from sourceforge, available here: http://collectl-utils.sourceforge.net/

As with collectl, you can either download a source tgz file or a noarch RPM. Collectl-utils consists of three major tools, of which I’d like to introduce the first one: colplot. When I find time I’ll create a post about the other tools, most likely starting with colmux.

colplot

I mentioned in said previous post that you can use the “-P” option to generate output in a plot format. This in turn can be fed to your favourite spreadsheet application, or alternatively into gnuplot. When choosing a spreadsheet application, it’s your responsibility to decide what to do with the raw data each time you load a plotfile. Maybe one day I’ll write a collectl-analyzer which does similar things to nmon-analyzer, but that has to wait for now. So if you are lazy like me, you need another alternative, and one comes easily accessible in the form of gnuplot.

Although I am very impressed by what gnuplot can do, I never had the time or energy to get to grips with all its options. When at University I used Mathematica 2 (yes it’s been a while) and thought the plot2d() function was complex …

Now for the good news: the complexity of gnuplot is nicely hidden by colplot, which takes the burden of generating the plots away from the user. And to make it more comfortable, all of this happens through a web interface. All you need is a web server such as Apache and a little bit of initial configuration for it to work. I should also note that colplot can be used on the command line as well, but that is out of scope of this article.

This time around I downloaded the source tarball rather than the RPM as I wanted more control over the installation process. If you choose the RPM, it is good to know that it has the intelligence to tell SLES apart from RHEL and updates the web server configuration accordingly. If you decide to manually install colplot, check the INSTALL script as it can help you get started. And don’t forget to read INSTALL-colplot and consult colplot-apache.conf for a sample Apache configuration. The latter can go to /etc/httpd/conf.d on RHEL and will take effect after reloading the Apache configuration. You also need collectl installed on the host running the collectl-utils.

Colplot uses a directory, usually called plotfiles, where the recorded collectl output is stored. By default, it resides in the same directory as colplot but can be changed in the GUI.

I am thinking of using NFS to export the plotfiles directory, so that each monitored host could mount the directory and store output files. The more progressive use of SSHFS is probably out of scope for most database servers, but in my lab I’m king and do what I like. I personally found it easiest to use “collectl -P -f /mnt/sshfs/plotfiles/” to generate the files, where /mnt/sshfs/plotfiles was mounted from the web server host. If you are planning on generating the colplot output file names manually, i.e. not pointing to a directory, then make sure they are unique! This makes it easy to compare systems, as we’ll see below. One thing I noticed is that detail files all get their own trace file name in the form “host-date.type”, where type is dsk for detailed disk information, and so on.

After all the webserver setup is complete, point your browser to the host where you installed colplot. As I said, the “plotfiles” directory is scanned for files, which are processed. You see the following screen:

Using it

In the GUI, the first step is to define which time of day you would like to visualise, from the collectl information you have gathered (open the above screenshot in a separate window to better follow this discussion).

  • You can limit the information to be displayed to a certain time period, i.e. if you captured a day’s worth of statistics but only need the hour from 13:00 to 14:00 that’s a simple setting in the user interface
  • Alternatively, select “last 60 minutes” for the most recent period

You can also list the contents of the plotfiles directory, or even change the location - but bear in mind that the webserver still has to be able to read files from there!

If you like, you can instruct colplot to narrow down the files to be plotted by editing “filenames containing”. If the plotfiles directory contains information to satisfy the period/names you are interested in, it will plot it after a click on “Generate plot”. I suggest a display of “SysPlot” initially, which plots the systems recorded in the colplot files side by side. This is very useful for comparison of system health, especially in clusters. You should experiment with the different plot settings, which are very useful to do all sorts of analysis and allow aggregation on days, systems and plots and various combinations of these. By the way, the system name is derived from the hostname when using the collectl -P -f /path/to/plotfiles/ … command.

Once you have familiarised yourself with the options, you can further narrow down which data you are interested in. I would suggest the “All Plots” option to get you started, unless of course you know what you are after. Colplot, like collectl, differentiates between “summary” and “detail” plots. Of course, it can only plot what you recorded! Each of these has a convenient “All Plots” option to display all the information gathered. Here’s the slightly cropped output from a 2 node cluster I gathered (click for a larger view):

A very useful function is to email the results in either PDF or PNG format. For this to work you need uuencode (package sharutils in RHEL) on the web server host, and you need to be able to send email via the command line - colplot uses the mail(1) utility to send email. Sending a PDF is probably more useful than the PNG option, as the latter will send each graph separately in a tar archive.

Summary

I can’t say how impressed I am with colplot; it’s really great for working out what happened when during a benchmark. The great thing is the side-by-side comparison of systems, which gives clear indications of imbalances and trends. Using colplot is also a lot easier than writing your own spreadsheet macros to visualise the data. I really like it!