How much disk space did Snow Leopard really save?

Like apparently hundreds of thousands of others, I upgraded my machines running Mac OS X from version 10.5 (Leopard) to 10.6 (Snow Leopard) last Friday. I'm now a Snow Leopard user, and I like it just fine.

I was excited about this upgrade, because I love the notion that the people who released it care about optimizing the performance of my system. One of the optimizations I looked forward to was reclaiming over 6 GB of disk space after the upgrade (see Bertrand Serlet's announcement at 00:20:48 to 00:21:11 in the WWDC 2009 keynote video).

Lots of people in the Twittersphere were excited about the space savings, too. Before I upgraded, I checked to see what people were tweeting, just to make sure I wasn't about to walk off a cliff. Many people mentioned tremendous disk space savings that were well in excess of the 6 GB that Apple promised. Pretty exciting.

We have two Mac computers. Here's how the savings went for us:

Mac #1        10.5       10.6    Savings
-------  ---------  ---------  ---------
Total    148.73 GB  159.70 GB
Free      47.70 GB   59.71 GB   12.01 GB

Mac #2        10.5       10.6    Savings
-------  ---------  ---------  ---------
Total    185.99 GB  199.71 GB
Free      68.66 GB   83.95 GB   15.29 GB

So... wow, we saved more than twice as much space as Apple had advertised. But there's a curiosity in the numbers. Do you see it? How did my total capacity get bigger as the result of a software upgrade? The answer is that my capacity didn't really get bigger; it's just that Apple measures disk space differently in 10.6.

I knew this was coming because of this article called "Snow Leopard's New Math." Snow Leopard still uses the abbreviation "GB" to refer, now, to 10⁹ bytes, whereas, before, Leopard used the abbreviation "GB" to refer to 2³⁰ bytes. The problem, see, is that 10⁹ ≠ 2³⁰. In fact, 2³⁰ is bigger. So in Snow Leopard, Apple is dividing by a smaller unit than it used to, which results in disk capacities and file sizes looking bigger than they used to. (Here's a good article about that.)

It is misleading that Apple used the same abbreviation—"GB"—to refer to two different units of measure. However, Apple is well justified in using "GB" in Snow Leopard. IEEE 1541-2002 says the right abbreviations would have been "GiB" (gibibytes) in 10.5 and "GB" (gigabytes) in 10.6. By that standard, Snow Leopard is right, and Leopard was wrong. All's well that ends well, I suppose.

Now, back to the space savings question. How much space did I really save when I upgraded to Snow Leopard? To answer that, I need to convert one of the two columns in my analysis (labeled "10.5" and "10.6") to the other column's unit, so I can subtract. Since my mindset while watching the WWDC keynote video was 10.5-style "gigabytes" (properly gibibytes), I'll convert to GiB. Here's the answer:

Mac #1         10.5        10.6    Savings
-------  ----------  ----------  ---------
Total    148.73 GiB  148.73 GiB
Free      47.70 GiB   55.61 GiB   7.91 GiB

Mac #2         10.5        10.6    Savings
-------  ----------  ----------  ---------
Total    185.99 GiB  185.99 GiB
Free      68.66 GiB   78.18 GiB   9.52 GiB
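
The conversion is just decimal bytes divided by 2³⁰. A quick sanity check in SQL*Plus (just a sketch; the literal is Mac #1's post-upgrade free space of 59.71 decimal GB):

SQL> select round(59.71 * 1e9 / power(2, 30), 2) as free_gib from dual;

  FREE_GIB
----------
     55.61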

That's still spectacular, and I'm plenty happy with it. I have basically bought a whole bunch of performance enhancements and 17 GiB of disk space for $49 plus tax (I bought the Snow Leopard upgrade family pack). I think that's a pretty good deal.

This whole story reminded me of the old days when I used to install Oracle for a living. People would buy, say, a brand-new 100,000,000-byte disk drive and then be upset when the df utility showed considerably less than 100 "MB" of free space. Part of the explanation was that df reported in mebibytes, not millions of bytes.

It's interesting to note that in Snow Leopard, df -h now reports in Bi/Ki/Mi/Gi units, and df -H reports in B/K/M/G units (defined as IEEE 1541 defines them). Smart.

latch: cache buffers chains latch contention – a better way for finding the hot block

Here's a treat for Oracle performance professionals and geeks who are looking for more systematic ways to troubleshoot cache buffers chains (CBC) latch contention. Cache buffers chains latches are taken when a process wants to walk a cache buffer hash chain, checking whether the block with the required DBA (data block address) is in the buffer cache. If the block happens to be in the cache, then in most cases it has to be pinned before use and unpinned after use, to make sure no one else can perform an incompatible operation on that block at the same time.
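
For reference, one classic (if blunter) way to find the hot block behind CBC latch contention is the well-known v$latch_children / x$bh approach. A sketch follows; this is not necessarily the better method the full post describes, and querying x$bh requires SYSDBA access:

-- Step 1: find the cache buffers chains child latches with the most sleeps
select * from (
    select addr, gets, sleeps
    from   v$latch_children
    where  name = 'cache buffers chains'
    order by sleeps desc
)
where rownum <= 5;

-- Step 2: list the hottest buffers protected by one of those child latches
-- (&latch_addr is the ADDR value reported by step 1)
select dbarfil, dbablk, tch
from   x$bh
where  hladdr = hextoraw('&latch_addr')
order by tch desc;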

New presentation: Deriving Optimal Configurations Using 11g Database Replay

Jeremiah Wilton’s presentation shows how to use Oracle 11g Real Application Testing to quantify the effect of system and database configuration changes. As an example, he uses Real Application Testing to validate the Automatic Advisor recommendations, and uncovers some interesting results.

Check out the presentation on our whitepaper page.

Select COUNT(*) and COUNT(column) are different things!

Every now and then I see someone wondering why Oracle is “returning wrong results” for some count queries when counting using COUNT(column_name) instead of COUNT(*).

Oracle is actually returning correct results; it’s just that sometimes the people asking the questions haven’t realized that COUNT(column) is semantically different from COUNT(*).

The COUNT(*) operation counts all rows fed to it by the execution plan branch under it.

The COUNT(column) operation, on the other hand, counts all non-NULL values of that column in the rows fed to it by the execution plan branch under it.

And here’s a little example:
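
(A minimal sketch; the table name t_count and its three rows are made up for illustration:)

SQL> create table t_count (val number);

Table created.

SQL> insert into t_count values (1);
SQL> insert into t_count values (null);
SQL> insert into t_count values (3);
SQL> commit;

SQL> select count(*), count(val) from t_count;

  COUNT(*) COUNT(VAL)
---------- ----------
         3          2

COUNT(*) counts all three rows; COUNT(val) skips the row where VAL is NULL.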

Exciting seminars in the Dallas area

If you live in the Dallas area, there are two very important seminars coming up:

  1. Tanel Poder is presenting his Advanced Oracle Troubleshooting seminar on September 9-11, 2009.
  2. Jonathan Lewis is presenting a two-day seminar on cost based optimization and writing efficient SQL on October 28-29, 2009.

These two gentlemen need no introduction. Tanel Poder is an expert in Oracle internals, and Jonathan Lewis is a guru in cost based optimization and writing efficient SQL.

Tell them I sent ya :-)

Update: I just heard from Tanel that he is unable to make it to Dallas this September. He is rescheduling his seminar, and I can’t wait for it. I will update this blog with the rescheduled dates.

a public apology

I think I've mentioned this somewhere on my blog before, but last year right before I left for the Miracle Oracle Open World conference, I found myself unexpectedly responsible for a new system. Up until that point, I was working in an 'advisory' role for the project - available to answer questions, offer guidance, etc. I had no authority to direct the work; it was just assumed that the other

Knowing the trend of Deadlock occurrences from the Alert Log

Recently, my client deployed a new application and had this intermittent “Deadlock Storm” …

A trace file was sent and I was able to pinpoint the cause of the deadlock and the session that caused it.
The deadlock was a TX enqueue with mode of 4 (S – share) which could be verified by looking at the following lines of the Process State dump:

   last wait for 'enq: TX - row lock contention' blocking sess=0x7000000cb239d60 seq=7849 wait_time=2929705 seconds since wait started=3
            name|mode=54580004, usn<<16 | slot=a0028, sequence=283f2

the "enqueue and lock mode" is explained as:
mode=54580004 (see above)
5458 (hex) = TX (ascii)
0004 (hex) = mode 4 (S – share)
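
You can do that hex decoding straight from SQL*Plus, by the way; a small sketch using the values from the trace above:

SQL> select chr(to_number('54', 'XX')) || chr(to_number('58', 'XX')) as enq_type,
  2         to_number('0004', 'XXXX') as lock_mode
  3  from dual;

EN  LOCK_MODE
-- ----------
TX          4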

Alter system kill session and ORA-00031: session marked for kill

I’m sure you are all familiar with this situation:

SQL> alter system kill session '152,33';
alter system kill session '152,33'
*
ERROR at line 1:
ORA-00031: session marked for kill

The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.

So why is that?

The issue is in what this ALTER SYSTEM KILL command is actually doing. It's not killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target session's state object, which marks that the target session should end. But it's entirely up to the target session to check this bit and act on it!

So, instead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.

All the kill session command is doing is ASKING the target session to clean up and exit – via setting that bit.
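
If waiting is not an option, there are more forceful variants; a sketch using the same sid/serial# as above (both are standard syntax, but they abandon work in flight, so use with care):

SQL> alter system kill session '152,33' immediate;

SQL> -- or terminate the server process itself (the closest thing to kill -9):
SQL> alter system disconnect session '152,33' immediate;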

Optimizer cleverness

At present I'm quite busy and therefore don't have much time to spend on writing blog notes, but I couldn't resist publishing this small and simple test case.

Often you can read (mostly unqualified) rants in various places and forums about how stupid, unpredictable, etc. the Cost Based Optimizer seems to be.

So I think it's time to demonstrate how clever the optimizer sometimes can be.

Consider the following setup:

drop table t_opt_clever purge;

-- Use PCTFREE 99 so that only one row per (leaf) block
-- This can tell us how many "rows" had to be inspected
-- by checking the number of (leaf) blocks accessed
-- Unfortunately Oracle (usually) doesn't provide the information
-- how many rows have been accessed in the execution plan,
-- but only how many rows are returned by an operation
create table t_opt_clever (
       id     not null constraint pk_opt_clever primary key,
       col1   not null,
       col2   not null,
       col3   not null,
       col4   not null,
       col5   not null,
       filler
)
pctfree 99
pctused 1
as
select
       level as id
     , round(dbms_random.value(0, 200)) as col1
     , round(dbms_random.value(0, 400)) as col2
     , case
       when level <= 666
       then 'FIRST_BUCKET'
       when level <= 833
       then 'SECOND_BUCKET'
       when level <= 1000
       then 'THIRD_BUCKET'
       end as col3
     , round(dbms_random.value(0, 600)) as col4
     , round(dbms_random.value(0, 800)) as col5
     , rpad('x', 100, 'x') as filler
from
       dual
connect by
       level <= 1000;

create index idx_opt_clever1 on t_opt_clever (col5, col1, col4, col2) pctfree 99 compute statistics;

create index idx_opt_clever2 on t_opt_clever (col5, col1, col3, col4, col2) pctfree 99 compute statistics;

exec dbms_stats.gather_table_stats(null, 'T_OPT_CLEVER')

-- scale the table and index by factor 1000
exec dbms_stats.set_table_stats(null, 'T_OPT_CLEVER', numrows => 1000000, numblks => 30000)

exec dbms_stats.set_index_stats(null, 'PK_OPT_CLEVER', numrows=> 1000000, numlblks => 2000, numdist=>1000000, clstfct => 100000, indlevel => 3)

exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER1', numrows=> 1000000, numlblks => 14000, numdist=>1000000, clstfct => 1000000, indlevel => 3)

exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER2', numrows=> 1000000, numlblks => 16000, numdist=>1000000, clstfct => 1000000, indlevel => 3)

Basically this simulates a 1,000,000-row table with two suboptimal indexes for the following Top 100 query:

-- Now which index can be efficiently used by the optimizer?
select
       *
from (
       select
              *
       from
              t_opt_clever
       where
              col3 = 'FIRST_BUCKET'
       order by
              col3, col5, col1, col4, col2
     )
where
       rownum <= 100;

Now what do you think, can one of these indexes efficiently be used by the optimizer, and if yes, which one?

At first sight, neither index can be used to satisfy the requested sort order (which would avoid a costly full scan of the data and a corresponding SORT ORDER BY (STOPKEY) operation), and neither can be used efficiently to filter the data, because the filter predicate is not among the leading columns.

Let's check the result:

SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fz6vky8n5a3xq, child number 0
-------------------------------------
select * from ( select * from t_opt_clever where
col3 = 'FIRST_BUCKET' order by col3, col5, col1, col4, col2 ) where
rownum <= 100

Plan hash value: 4203008252

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.29 | 256 | 100 |
| 2 | VIEW | | 1 | 101 | 109 (0)| 100 |00:00:00.29 | 256 | 100 |
| 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 333K| 109 (0)| 100 |00:00:00.29 | 256 | 100 |
|* 4 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 101 | 8 (0)| 100 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
4 - access("COL3"='FIRST_BUCKET')
filter("COL3"='FIRST_BUCKET')

24 rows selected.

That is quite interesting: the index IDX_OPT_CLEVER2 is used and no SORT ORDER BY operation appears in the execution plan, although the index doesn't match the requested sort order. And here comes the cleverness of the optimizer: it recognizes that, due to the filter predicate on COL3, this index can actually be used to satisfy the sort order, because COL3 is not relevant for the resulting order: it will always be the constant value of the filter predicate. And the same applies to IDX_OPT_CLEVER1, by the way.

But IDX_OPT_CLEVER2 is more efficient than IDX_OPT_CLEVER1 because the filter predicate can be evaluated on the index data, eliminating some of the rows before visiting the table. Depending on the clustering factor this can make a significant difference to the cost of the operation, since random accesses to table rows potentially require visiting a different block per row.

This can be seen when forcing the usage of IDX_OPT_CLEVER1:

SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5tgmgfvyyx6z6, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever1) */ * from
t_opt_clever where col3 = 'FIRST_BUCKET' order by col3,
col5, col1, col4, col2 ) where rownum <= 100

Plan hash value: 678132971

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.20 | 310 | 54 |
| 2 | VIEW | | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
| 4 | INDEX FULL SCAN | IDX_OPT_CLEVER1 | 1 | 1000K| 8 (0)| 154 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter("COL3"='FIRST_BUCKET')

23 rows selected.

Two things can be seen here:

1. The optimizer is again smart and is able to avoid the SORT ORDER BY operation, because the index IDX_OPT_CLEVER1 can also be used to return the data in the requested order, again because COL3 is constant.

2. Using IDX_OPT_CLEVER1 is less efficient because more table rows have to be visited to apply the filter predicate.

The fact that the indexes can only be used efficiently under this special circumstance can be verified by changing the filter predicate so that COL3 can have more than a single value, which makes it no longer possible to avoid a sort operation:

-- Change the filter predicate and force index
select
       *
from (
       select /*+ index(t_opt_clever idx_opt_clever2) */
              *
       from
              t_opt_clever
       where
              col3 in ('FIRST_BUCKET', 'SECOND_BUCKET')
       order by
              col5, col1, col4, col2
     )
where
       rownum <= 100;

SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID axr6u0yvdk50f, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever2) */ * from
t_opt_clever where col3 in ('FIRST_BUCKET', 'SECOND_BUCKET') order by col3, col5, col1,
col4, col2 ) where rownum <= 100

Plan hash value: 2229390605

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.02 | 1835 | | | |
| 2 | VIEW | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | 20480 | 20480 |18432 (0)|
| 4 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 666K| 683K (1)| 833 |00:00:00.01 | 1835 | | | |
|* 5 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 666K| 16100 (1)| 833 |00:00:00.01 | 1002 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 - filter(("COL3"='FIRST_BUCKET' OR "COL3"='SECOND_BUCKET'))

25 rows selected.

Without the index hint the optimizer chooses a full table scan. Forcing the index IDX_OPT_CLEVER2, for example, shows that indeed all rows had to be processed first, and that an additional sort operation was necessary.

So it's interesting to note that the optimizer recognizes special cases where single-value predicates allow an index usage that otherwise wouldn't be possible. This is a nice move, since it allows the above query to be performed quite efficiently although the setup is suboptimal (e.g. a different index with COL3 as leading column, or an appropriate IOT, could be more suitable, depending on what else is done with the table). Under these (simulated) circumstances this optimization makes quite a difference compared to the otherwise only possible full table scan of a 30,000-block table.
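
For illustration, a sketch of such a more suitable index (the name idx_opt_clever3 is made up, following the conventions of the test case above):

-- With COL3 as the leading column, an INDEX RANGE SCAN on the
-- COL3 predicate avoids scanning the whole index, and for the
-- ORDER BY starting with COL3 the rows come back already sorted,
-- without relying on the constant-predicate special case
create index idx_opt_clever3 on t_opt_clever (col3, col5, col1, col4, col2) pctfree 99;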

By the way, the above results could be reproduced on 10.2.0.4 and 11.1.0.7 Win32 using default system statistics and an 8KB LMT MSSM tablespace.

An interview with me

Iggy Fernandez posted an interview with me (published in the NoCOUG Journal) on his blog.
If you are interested in a little bit of my history and some more general (not-so-technical) thoughts of mine, check it out here:
http://iggyfernandez.wordpress.com/2009/08/08/great-expectations-an-inte...