I think I made a mistake. I recently set up a Twitter account, @KerryOracleGuy (of course plain old @KerryOsborne was already taken). I only signed up because I wanted to see what @TanelPoder was saying; I wasn’t planning on tweeting at all. Well, the first day a bunch of people signed up to follow me. The whole thing seems a little strange, although it is pretty easy to keep up with what people are doing. So I guess I’m going to give it a try and see if I like it. I have tweeted a few times and I think I am starting to get the hang of it (it’s hard to teach an old dog new tricks). I have figured out how to use tiny URLs now. I think I need to figure out how to tweet pictures as well. I’ve heard there is a TwitPic for doing that. If I use it though, I guess that would make me a “TwitPic-er”, which I don’t much like the sound of. So maybe I’ll stick to text-based tweets for now.
I’m following up on a conference almost half a year later — try to beat that! Actually, this blog post was written more than 3 months ago and was sitting in my drafts waiting for the moment I understood why I really wrote it. 3 months later… I still don’t know, but I thought I should [...]
Tomorrow - July 28th at 10:00 PDT (13:00 East Coast US Time) - I'll be delivering a free Webinar on database security. If you are interested - you can register using this link: http://event.on24.com/r.htm?e=328712&s=1&k=D11C69EADAB1A25981D9E53966EE5D15&partnerref=blog1_sec_dbsecmulti
One of the biggest problems with RAC is getting it installed. It’s not something I do often – which means it takes me far too long to do when I need to do it. But sometimes you’ve just got to go ahead and find the time. Three or four years ago I built myself a RAC stack – four nodes, because that’s the minimum sensible number for testing. (If you want to know why: two nodes is a very special case because you don’t have to worry about three-way negotiations; three nodes is a special case because if you’re testing failure scenarios then three nodes fails to two nodes – which is a special case. So four nodes is the smallest generic RAC installation – and that’s ignoring the question of standby systems, of course.)
That’s four boxes I built from kits, an old laptop to do iSCSI emulation, and a spare 250GB external drive. I’ve got two routers there, one for the public network, one for the private; the boxes have 2GB of RAM and dual-core Pentium chips, and the O/S is Red Hat. When I use it I tend to use Cygwin on my laptop and run up lots of telnet sessions. The system is very good for testing failures and node evictions because even with two big fans and a 4″ air gap (the white bits are polystyrene blocks) the third box down tends to overheat after about 45 minutes and crash – so most of the time I’m actually only using a three-node RAC.
Anyway, the reason for showing you this picture is because I’ve finally retired the old boxes and got some new kit. My new RAC stack, costing only £1,600, is on the right. As you can see, it’s a laptop (if you’ve got a big lap).
It’s got 16GB of RAM, 2 x 500GB discs, and a 2nd generation i7 quad-core CPU which – according to Windows 7 Pro – is also capable of double-threading. I haven’t actually had time to set up the RAC system yet (I think I’ll have a couple of free days for that around the end of August); I’ve only got as far as installing a basic Linux virtual machine as the first step in the process. But for the photo I created four virtual Windows XP Pro machines under VMWare, then started six Oracle instances across the four machines. The battery lasted about 90 minutes, but apart from that I think I’ve got a good indication that it will handle a four-node Linux RAC with a Windows Grid Control front-end if I want it to.
Footnote: Every time I buy some new kit (which is about once a year) I am stunned by the rate of change. My first PC cost me £2,300: it was an Apple II with the full 64KB of RAM, 2 x 360KB floppy disc drives (5.25″), a 1MHz CPU (6502) and the special graphics card that allowed something like 320 x 250 pixels – and Visicalc was still faster than Excel on my latest laptop.
Thursday, July 28 11:00am PDT : Register Here
Last post I talked about the Method R performance tuning tools for Oracle. The creation of these tools was spearheaded by Cary Millsap, and now Cary will be talking live, for free, in two days on a webinar along with Kerry Osborne and Karen Morton.
Get the skinny on Oracle performance tuning straight from the horse’s mouth!
This will be a technical webinar with solid content. The webinar is being hosted by Embarcadero. Here is Embarcadero’s writeup on the webinar.
Oracle performance tuning and optimization is not always easy. Now’s your chance to hear from people who have been there and ask them the questions that are keeping you up at night.
Three senior DBAs and performance tuning specialists, all members of the elite OakTable Network, will discuss best performance optimization and tuning practices for the Oracle® Database.
Participate in this online roundtable to gain specific insight and learn about advanced performance tips for the Oracle® Database. During this event you will:
About our panelists:
Karen Morton is a Senior DBA Performance and Tuning Specialist for Fidelity Information Services. Having used Oracle since the early 90’s, she began consulting and teaching others how to use Oracle over a decade ago. Karen is a frequent speaker at conferences and user groups, an Oracle ACE, and a member of the OakTable network. Karen has co-authored three books from Apress: Beginning Oracle SQL, Pro Oracle SQL and Expert Oracle Practices: Oracle Database Administration from the Oak Table. She blogs at karenmorton.blogspot.com.
Cary Millsap has been part of the Oracle community since 1989. He is the founder and president of Method R Corporation. He is widely known in the Oracle community as a speaker, educator, consultant, and writer. He wrote Optimizing Oracle Performance with Jeff Holt, and he is a co-author of Oracle Insights: Tales of the Oak Table. Cary is also an architect of the Method R software tools that help professionals around the world manage Oracle performance. He blogs at carymillsap.blogspot.com and tweets at twitter.com/CaryMillsap.
Kerry Osborne began working with Oracle in 1982. He has worked both as a developer and a DBA. For the past several years, he has been focused on understanding Oracle internals and solving performance problems. He is an OakTable member and is the author of an upcoming Apress book Expert Oracle Exadata. Mr. Osborne is also a co-founder of Enkitec, an Oracle-focused consulting company headquartered in Dallas, Texas. He blogs at kerryosborne.oracle-guy.com.
Register today. And don’t forget to fill in your questions on the webinar registration form.
I’m all alone, so are we all / We’re all clones / All are one and one are all ~ Alice Cooper. Two separate discussions, one at work and one around the virtual OakTable, have made me aware that maybe not as many Oracle professionals as I imagined are aware that Oracle provides a facility to [...]
Just a short note to remind everyone that the excellent InSync11 Conference to be held this year at the Sydney Convention Centre on 16-17 August 2011 is but a few weeks away. With a great lineup of experts such as Tom Kyte, Tim Hall, Graham Wood, Chris Muir, Connor McDonald, Tony Jambu, Marcelle Kratochvil to name but a very [...]
A number of things to report on the fitness front…
I’ve switched some of my cardio sessions on the elliptical from regular interval training to Tabata Protocol. A little over 2 months ago Dominic Brooks pointed this out to me. I gave it a try then, but I wasn’t fit enough to cope with it. Now it is just a complete nightmare, rather than impossible. So general fitness is going really well. There are the obvious ups and downs, but I feel really good about things and I don’t find the gym daunting now. I know I’m going to survive.
I’ve backed off the weight training a bit. There are a couple of reasons for this. First, I was putting on too much muscle. I used to do weights as a kid and I guess the whole muscle memory thing is true. The tubby meat-head thing is not a look I’m interested in. Second, shifting a lot of weight was starting to aggravate my joints. I’ve had operations on my left elbow and knee and both of them have been playing up for a couple of days after doing heavy lifting. Those two factors combined seemed like pretty clear signals that weight training is no longer the right thing for me.
To replace the absent weight training I’ve been doing some circuit training, mostly using functional training exercises. It feels pretty good and isn’t freaking out my joints so far. I’m not a gym-class type of guy, but I figured I need some technique coaching, so yesterday evening I went to a kettlebells class. The bruises on my arms and shoulders this morning would suggest my technique is not quite right, but I’ve already learned a bunch of stuff, so that’s good. Two visits to the gym yesterday was not a great idea, but I’m glad I got off my ass and made it to the kettlebells class. It would have been very easy to leave it for next week.
As part of the functional training I’ve also started to do handstands again. I used to be pretty good at them, thanks to the yoga, but as I got heavier they got a bit scary and my center of gravity shifted somewhat. Now I’m hitting them pretty well and walking on my hands pretty comfortably. I can even pull out a few half-handstand-pushups on a good day. I was in the dance studio practicing handstands and someone asked me if I was a gymnast, which cracked me up. Is there a gymnastics for fatties competition I can enter?
Still another 30 years and 30 pounds to go…
Someone beat me to it in this good post: Oracle Tip: Counting ROWS for all tables in a Schema. So here’s a reminder for me of where to find it.
SELECT table_name,
       to_number(extractvalue(xmltype(
                   dbms_xmlgen.getxml('select count(*) c from ' || table_name)),
                 '/ROWSET/ROW/C')) count
FROM   user_tables;
That said, I don’t like the use of DBMS_XMLGEN (performance/compatibility reasons), so for future reference I still have to come up with a better one. The use of extractvalue at least has the advantage that this approach should be backwards compatible with older releases.
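In the meantime, a dictionary-based variant avoids DBMS_XMLGEN altogether - a sketch only, with the caveat that NUM_ROWS reflects the last statistics gather rather than the current row count:

```sql
-- Approximate row counts from the data dictionary; NUM_ROWS is only
-- as fresh as the last DBMS_STATS gather, so treat it as an estimate.
SELECT table_name,
       num_rows,
       last_analyzed
FROM   user_tables
ORDER  BY num_rows DESC NULLS LAST;
```

This is obviously not a replacement when an exact count is required, but for a quick "how big are my tables" overview it is cheap and works on any supported release.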
It looks like Oracle has introduced with the 11.2.0.2 patch set a new "cost is time" model for the time estimate of the Cost-Based Optimizer (CBO).
In order to understand the implications let me summarize the evolution of the CBO in terms of cost / time estimate so far:
1. Oracle 7 and 8
The traditional I/O-based costing introduced with Oracle 7 in principle counted the number of required single-block and multi-block reads to arrive at the final cost. A potential drawback of this approach was the missing differentiation between multi-block and single-block reads - one multi-block read generated the same cost as one single-block read. Although the model used an "adjusted" multi-block read count to make full table scans more costly than indicated by larger "db_file_multiblock_read_count" settings (and hence accounted for smaller extents and for blocks already cached in the buffer cache making multi-block reads smaller than requested), it still potentially favoured full table scans over index access paths.
From Oracle 8 on one could play with the OPTIMIZER_INDEX_COST_ADJ / OPTIMIZER_INDEX_CACHING parameters to adjust for this shortcoming of the costing model, in particular for OLTP-biased applications.
2. Oracle 9i
Oracle introduced with Oracle 9i the System Statistics along with a more sophisticated cost calculation model.
In short, System Statistics offer the following features:
- Different treatment of single-block and multi-block operations
- Time-based optimization using average timings for single- and multi-block reads
- Cost calculation includes a CPU cost component
- Gathering of actual hardware capabilities, so that the calculations are based on the actual system capabilities and workload pattern
More details can be found for example in my "Understanding System Statistics" blog series.
So with System Statistics the CBO actually calculates an estimated execution time - you can see this in the EXPLAIN PLAN output: With System Statistics enabled it includes a TIME column.
Put simply, the time estimate is the average time for a single-block read (SREADTIM) times the number of single-block reads, plus the average time for a multi-block read (MREADTIM) times the number of multi-block reads, plus the estimated number of CPU operations divided by the CPU operations per second (CPUSPEED / CPUSPEEDNW). So the cost with System Statistics is actually based on a time estimation.
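Using made-up values for the three components (the read counts, timings and CPU figures below are assumptions for illustration, not measurements), the calculation sketched above works out like this:

```sql
-- Time estimate per the System Statistics model, hypothetical inputs:
--   1,000 single-block reads at SREADTIM = 12 ms each
--   500 multi-block reads at MREADTIM = 26 ms each
--   50,000,000 CPU operations at an assumed 1,000,000 operations per ms
SELECT (1000 * 12             -- single-block read time (ms)
      + 500  * 26             -- multi-block read time (ms)
      + 50000000 / 1000000    -- CPU time (ms)
       ) / 1000 AS est_time_seconds
FROM dual;
-- 12,000 + 13,000 + 50 = 25,050 ms, i.e. roughly 25 seconds
```

Note how small the CPU component typically is compared to the I/O components - which is why the conversion discussed below, that turns this time back into a pure data volume, is so odd.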
For consistency reasons it was decided to use the same unit as before, so the estimated time is simply divided by SREADTIM to arrive at the same cost unit as traditional I/O-based costing, which is the number of single-block reads (although plans involving full segment scan operations usually arrive at different costs than with the traditional costing, so consistency is hardly given anyway).
Right from the beginning in Oracle 9i the System Statistics could be gathered in WORKLOAD mode, which means that Oracle takes two snapshots of certain performance statistics and calculates the System Statistics parameters like SREADTIM, MREADTIM, MBRC etc. from the delta values.
3. Oracle 10g
Starting with Oracle 10g, System Statistics were enabled by default with so-called default NOWORKLOAD settings. It is even possible to have Oracle generate an artificial load on the system by gathering NOWORKLOAD System Statistics, which simply use a predefined I/O pattern to measure the disk transfer speed (IOTFRSPEED) and disk seek time (IOSEEKTIM) - these values are then used to derive the SREADTIM and MREADTIM values, the two most important ingredients of the enhanced cost/time calculation.
So since Oracle 9i there is a built-in functionality to measure the capabilities of the underlying hardware - from 10g on either based on a particular workload pattern or by submitting an artificial predefined load.
Furthermore Oracle provides a well-defined API as part of the DBMS_STATS package for dealing with System Statistics: they can be gathered, deleted, exported, imported, manually defined and even gathered directly into a separate statistics table to build a history of gathered System Statistics.
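A sketch of that DBMS_STATS API for reference (the statistics table name MY_SYSSTATS and the statid are made up for illustration; appropriate privileges are required):

```sql
-- Gather NOWORKLOAD System Statistics (runs the predefined I/O pattern)
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

-- Gather WORKLOAD System Statistics over a 30-minute interval
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', interval => 30);

-- Keep a history: gather directly into a user statistics table
EXEC DBMS_STATS.CREATE_STAT_TABLE(USER, 'MY_SYSSTATS');
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD', stattab => 'MY_SYSSTATS', statid => 'RUN1');

-- Manually override a single value, or wipe everything
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 12);
EXEC DBMS_STATS.DELETE_SYSTEM_STATS;
```

Export and import of System Statistics work via the corresponding EXPORT_SYSTEM_STATS / IMPORT_SYSTEM_STATS calls in the same package - exactly the kind of complete, symmetric API that, as outlined below, the I/O calibration is missing.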
4. Oracle 11g
In 11g Oracle introduced the I/O calibration routine as part of the Resource Manager. Note that so far this feature didn't have a direct relationship to the Cost-Based Optimizer - it could, however, be used to control the maximum parallel degree via the PARALLEL_IO_CAP_ENABLED parameter.
The first thing that puzzled me when dealing with this new functionality was the lack of a well-defined API for maintaining the gathered information. There is a single call in the Resource Manager package (DBMS_RESOURCE_MANAGER.CALIBRATE_IO) to run the I/O calibration, but apart from that no additional functionality for maintenance: no way to delete the calibration results, export or import them, or manually override them.
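For reference, a sketch of that single entry point (the num_physical_disks and max_latency inputs are assumptions about the storage, not recommendations):

```sql
-- Run the I/O calibration; requires suitable privileges,
-- TIMED_STATISTICS = TRUE and asynchronous I/O enabled on the datafiles.
SET SERVEROUTPUT ON
DECLARE
  l_max_iops PLS_INTEGER;
  l_max_mbps PLS_INTEGER;
  l_latency  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 4,    -- assumption about the storage layout
    max_latency        => 10,   -- acceptable latency in ms
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = '  || l_max_iops ||
                       ', max_mbps = ' || l_max_mbps ||
                       ', latency = '  || l_latency);
END;
/
```

And that is it - there is no corresponding delete, export, import or set call, which is exactly the point being made here.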
If you want to get an understanding of what this means, have a look at the MOS document "Automatic Degree of Parallelism in 11.2.0.2 [ID 1269321.1]" which, besides stating that there can be problems with the actual I/O calibration (like gathering unreasonable values or not running to completion), shows you how to manipulate an internal SYS table to override the gathered values - a change that also requires bouncing the instance in order to become effective.
I find it hard to understand why Oracle hasn't addressed these handling shortcomings in the meantime, particularly given that with Oracle 11.2.0.2 the I/O resource calibration becomes mandatory if you want to make use of the new Auto-DOP feature introduced with 11.2.0.1. Fiddling with a SYS-owned table doesn't sound like a well-designed feature to me, and the calibration functionality is not exactly "brand-new".
5. Oracle 11.2.0.2
So starting with 11.2.0.2 the new "cost is time" calculation comes into the picture. If you have values in the corresponding SYS.RESOURCE_IO_CALIBRATE$ table (which is simply externalized by the DBA_RSRC_IO_CALIBRATE view) then something really odd happens:
The cost calculated according to the System Statistics model - which is already a time estimate based on three different components: time for single-block reads, time for multi-block reads, and the estimated CPU time - is now converted into a data volume by simply multiplying the resulting cost by the default block size. Dividing this data volume by the throughput indicated by the I/O calibration results (it looks like the value MAX_PMBPS is relevant) yields a new estimated execution time.
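Whether such calibration results are present can be checked through the externalized view, for example:

```sql
-- I/O calibration results as externalized by the data dictionary;
-- if this returns a row, the new "cost is time" calculation applies
SELECT max_iops, max_mbps, max_pmbps, latency
FROM   dba_rsrc_io_calibrate;
```

Deleting the row from the underlying SYS.RESOURCE_IO_CALIBRATE$ table (and bouncing the instance) is, as discussed above, the only way to get back to the previous behaviour - there is no supported API call for it.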
Let's have a look at a working example. With default NOWORKLOAD System Statistics, an 8KB default block size and an unset db_file_multiblock_read_count - which results in a MultiBlockReadCount (MBRC) of 8 being used internally for the calculation of a full table scan (FTS) - the time estimate for a FTS of 10,000 blocks (80MB) will be based on 1,250 multi-block reads, which are estimated to take 26ms each. This gives us a time estimate of 32.5 seconds. The CPU time associated with the full table scan operation will be added on top, so the final result will be somewhere between 32.5 and 33 seconds. Let's stick with the 32.5 seconds: this time estimate corresponds to approx. 2,710 single-block reads, obtained by simply dividing the time by 12ms, which happens to be the SREADTIM value for default NOWORKLOAD System Statistics with the above configuration. This value will be close to the cost shown (minor variations depend on the CPU speed determined).
Cost / time estimate for a FTS of a 10,000 block segment with 8KB block size, default NOWORKLOAD System Statistics and default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8):
Now if you happen to have a MAX_PMBPS value of 4MB/sec as the I/O Resource Calibration result (I chose this very conservative value deliberately because it happens to be the same transfer rate that the default NOWORKLOAD System Statistics assume: 4,096 bytes per millisecond), the following new time calculation will happen instead:
2,710 will be multiplied by the 8KB default block size to arrive at a data volume, in this case approx. 21 MB
This approx. 21 MB is now divided by the 4MB/sec, to arrive at a new time estimate of approx. 5.3 seconds, rounded up to 6 seconds. Note that the original time estimate was 32.5 seconds.
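The whole conversion can be replayed with a quick query against DUAL, using only the figures from the example above:

```sql
-- Old model: 1,250 multi-block reads * 26 ms, expressed as a cost in
-- 12 ms single-block reads; new model: cost * block size / MAX_PMBPS
SELECT round(10000 / 8)                          AS multi_block_reads, -- 1250
       round(10000 / 8 * 26 / 1000, 1)           AS old_time_sec,      -- 32.5
       round(10000 / 8 * 26 / 12)                AS cost_sreads,       -- ~2708
       round(10000 / 8 * 26 / 12 * 8 / 1024)     AS data_volume_mb,    -- ~21
       ceil(10000 / 8 * 26 / 12 * 8 / 1024 / 4)  AS new_time_sec       -- 6
FROM dual;
```

The discrepancy is plain to see: the segment is 80MB, yet the derived "data volume" is roughly 21MB, and the time estimate shrinks from 32.5 to 6 seconds.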
Cost / time estimate for a FTS of a 10,000 block segment with 8KB block size, default NOWORKLOAD System Statistics and default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8) but MAX_PMBPS set to 4MB/sec:
You can see this also happening in the 10053 CBO trace file:
Now this approach strikes me as odd for several reasons:
- A sophisticated time estimate (remember that it even includes a CPU time component that has nothing to do with any I/O volume) is turned into a data volume to arrive at a new time estimate using a rather simplistic approach
- As you can see from the above example, the "data volume" calculated does not correspond to the actual I/O volume that we know from the System Statistics cost/time calculation - remember that the actual segment size in this case was 80MB, not 20MB. This is of course caused by the underlying calculation of the original time estimate being based on multi-block reads. Why we would turn the cost/time into some data volume that has nothing to do with the actual data volume used for the original cost/time calculation is beyond me
- There is already an I/O calibration routine available as part of the System Statistics functionality that can be used to arrive at more realistic time estimates based on the gathered System Statistics information - so why has a second one been introduced? Furthermore this raises the question: if I'm required to run the I/O calibration to enable Auto-DOP, shouldn't I then also "calibrate" my System Statistics to arrive at a "calibrated" cost estimate? After all, the new "Cost Is Time" approach uses the cost estimate for the new time estimate.
- As already outlined there is no officially documented way to properly deal with the I/O calibration results - manually poking into SYS-owned tables doesn't really count
So you probably think: why bother? The cost estimate is left untouched; only the TIME column is affected. So execution plans shouldn't change, since they are still chosen based on the lowest cost estimate - and the lower the cost, the lower the new time estimate.
You'll appreciate, however, that the previous simple correlation between the cost and the time estimate no longer holds in 11.2.0.2 when resource calibration results are available. So far you could simply divide the time estimate by the SREADTIM value to arrive at the cost or, the other way around, multiply the cost by the SREADTIM value to arrive at the time estimate - or use both values to arrive at the SREADTIM value, since the time divided by the cost should give you the approximate value of SREADTIM.
The point with 11.2.0.2 and the I/O resource calibration is that the new time estimate is obviously used by the Auto-DOP feature to drive two crucial decisions:
- Is the statement a candidate for parallel execution? This is controlled via the parameter PARALLEL_MIN_TIME_THRESHOLD, which defaults to 10 seconds in 11.2.0.2
- If it is a candidate for parallel execution, what is the optimal DOP? This of course depends on a lot of different inputs, but it also seems to be based on the new time estimate - which, as just explained, arrives at a (wrong) data volume estimate in a questionable way
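For reference, a minimal sketch of the knobs involved when enabling Auto-DOP (the threshold value simply restates the default mentioned above; any real setup needs careful testing):

```sql
-- Enable Auto-DOP: statements whose estimated execution time exceeds
-- PARALLEL_MIN_TIME_THRESHOLD become candidates for parallel execution
ALTER SYSTEM SET parallel_degree_policy = AUTO;
ALTER SYSTEM SET parallel_min_time_threshold = 10;
```

Keep in mind that with calibration results present, the "estimated execution time" being compared against this threshold is the new, questionable MAX_PMBPS-based estimate, not the System Statistics time estimate.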
As a side note, Oracle at present recommends setting the value of MAX_PMBPS to 200 for Exadata environments rather than relying on the results of the actual I/O calibration - another indication that the I/O calibration results as of now are questionable.
Oracle introduced with the 11.2.0.2 patch set a new model for the estimated "Execution Time" that applies when I/O resource calibration results are available. As outlined above, the new approach seems questionable (at least), but it is used for crucial decisions regarding the new Auto-DOP feature. It will be interesting to see the further development in this area - whether, for example, the new time algorithm will be changed in upcoming releases, or the influence of the I/O calibration on the CBO calculations will be extended.
If you want to make use of the new Auto-DOP feature in 11.2.0.2 you should be aware of this relationship: the MAX_PMBPS parameter drives the new time estimation and hence the Auto-DOP calculations.