Search

OakieTags

Who's online

There are currently 0 users and 28 guests online.

Recent comments

Affiliations

August 2011

Trouble-shooting

How do you trouble-shoot a problem ? It’s not an easy question to answer when posed in this generic fashion; but perhaps it’s possible to help people trouble-shoot by doing some examples in front of them. (This is why I’ve got so many statspack/AWR examples – just reading a collection of different problems helps you to get into the right mental habit.)

So here’s a problem someone sent me yesterday. Since it only took a few seconds to read, and included a complete build for a test case, with results, and since it clearly displayed an Oracle bug, I took a look at it. (I’ve trimmed the test a little bit, there were a few more queries leading up to the error):


create table person (id number(2), name varchar2(10)) ;

insert into person values (1, 'Alpha') ;
insert into person values (2, 'Bravo') ;
insert into person values (3, 'Charlie') ;
insert into person values (4, 'Charles') ;
insert into person values (5, 'Delta') ;

create or replace view vtest as
select id, 'C' as letter from person where name like 'C%' ;

select p.id, p.name, v.id, v.letter
from person p
left join vtest v on v.id = p.id
order by p.id ;

The problem was that 10.2.0.4 and 11.2.0.2 gave different results – and the 11.2.0.2 result was clearly wrong. So the question was: “is there something broken with outer joins on views, or possibly ANSI outer joins?” (The ansswer to the last question is always “probably” as far as I’m concerned, but I wouldn’t turn that into a “yes” without checking first.) Here are the two results:

10.2.0.4:
========
        ID NAME               ID L
---------- ---------- ---------- -
         1 Alpha
         2 Bravo
         3 Charlie             3 C
         4 Charles             4 C
         5 Delta

11.2.0.2
========
        ID NAME               ID L
---------- ---------- ---------- -
         1 Alpha                 C
         2 Bravo                 C
         3 Charlie             3 C
         4 Charles             4 C
         5 Delta                 C

Clearly the extra ‘C’s in the letter column are wrong.

So what to do next ? Knowing that Oracle transforms ANSI SQL before evaluating an execution plan I decided to run the 10053 trace. Sometimes you get lucky and see the “unparsed SQL” in this trace file, a representation (though not necessarily 100% exact) image of the statement for which Oracle will generate a plan. I was lucky, this was the unparsed SQL (cosmetically enhanced):


SELECT
	P.ID ID,
	P.NAME NAME,
	PERSON.ID ID,
	CASE  WHEN PERSON.ROWID IS NOT NULL THEN 'C' ELSE NULL END  LETTER
FROM
	TEST_USER.PERSON P,
	TEST_USER.PERSON PERSON
WHERE
	PERSON.ID  (+) = P.ID
AND	PERSON.NAME(+) LIKE 'C%'
ORDER BY
	P.ID
;

So I ran this query, and found that the same error appeared – so it wasn’t about ANSI or views. So possibly it’s something about the CASE statement and/or the ROWID in the CASE statement, which I tested by adding three extra columns to the query:

        person.name,
        person.rowid,
        CASE  WHEN PERSON.name IS NOT NULL THEN 'C' ELSE NULL END  LETTER

With these extra columns I got the following results from the query:

        ID NAME               ID NAME       ROWID              L L
---------- ---------- ---------- ---------- ------------------ - -
         1 Alpha                                               C
         2 Bravo                                               C
         3 Charlie             3 Charlie    AAAT7gAAEAAAAIjAAC C C
         4 Charles             4 Charles    AAAT7gAAEAAAAIjAAD C C
         5 Delta                                               C

So the CASE did the right thing with the person.name column, but the wrong thing with the person.rowid column.
Time to get onto MOS (Metalink).

I searched the bug database with the key words: case rowid null
This gave me 2,887 hits, so I added the expression (with the double quotes in place) “outer join”
This gave me 110 hits, so from the “product category” I pick “Oracle Database Products”
This gave me 80 hits, and the first one on the list was:

Bug 10269193: WRONG RESULTS WITH OUTER JOIN AND CASE EXPRESSION OPTIMIZATION CONTAINING ROWID

The text matched my problem, so job done – except it’s reported as not fixed until 12.1

This isn’t a nice bug, of course, because the particular problem can be generated automatically in the transformation of ANSI outer joins to Oracle outer joins, so you can’t just change the code.

In passing, it’s taken me 31 minutes to write this note – that’s 10 minutes longer than it took to pin down the bug, but I have to say I got lucky on two counts: first, that the “unparsed SQL” was available, second that my choice of key words for MOS got me to the bug so quickly (which is where I usually find I waste most time).

A Day of Real World Performance Raffle...

If you haven't heard - I'm participating in a series of seminars entitled "Real World Performance". It is a rather unique forum - with three simultaneous presenters and three screens. I find it more interesting than a seminar delivered by a single person - all day long. We haven't had anyone fall asleep yet - there is enough moving around and back and forth on the stage to keep you engaged all day long.

To get things jump started, I'm running a raffle of sorts. I'm going to give away the coolest t-shirt you'll ever own:


and a signed copy of

How Do You Contribute?

Do you know your organization’s business? Making yourself valuable to the organization is easier if you understand how you contribute to its success. Talk to your customers, read annual reports, get involved in understanding what your organization does for a living. If you work for a corporation how do you contribute to the bottom line? If you work for a government or non-profit organization how does what you do help it achieve its goals? Once you know what makes your organization succeed you can make sure you are a contributor and increase your value. Great IT groups have become part of the core business by learning what customers need to do then building the tools help the business run better. IT is rarely a revenue-producer but recently smart organizations are leveraging handheld technology to enable direct communication with customers and users. iPads, tablets, and smart phones can be leveraged to provide rich customer experiences.

The New Order Oracle Coding Challenge 2

August 2, 2011 (Back to the Previous Post in the Series) (Forward to the Next Post in the Series) In the previous article in this series we tried to develop different methods for identifying special numbers between 1 and 1,000,000, such that the number formed by reversing the order of the digits will evenly divide into the original number.  [...]

IOT part 3 – Significantly Reducing IO

<..IOT1 – the basics
<….IOT2 – Examples and proofs
IOT4 – Boosting Buffer Cache Efficiency..>
IOT5 – Primary Key issues….>

In the previous two posts I covered the basics of Index Organized Tables (IOTs) and then created some test tables to demonstrate the benefit of IOTs that is most often covered – reducing the IO needed to get a single record by one IO, say from 5 to 4. {Whether this is a buffer get from memory or a disc IO depends on if the block is cached, of course}.

In this post I am going to show how IOTs can far more significantly reduce the IO when several related rows are required.

Below is one of my test tables, the one that is a normal heap table and has a primary key, CHHE_PK on PARE_ID and CRE_DATE:

mdw11> desc child_heap
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
 PARE_ID                                   NOT NULL NUMBER(10)
 CRE_DATE                                  NOT NULL DATE
 VC_1                                      NOT NULL VARCHAR2(100)
 DATE_1                                             DATE
 NUM_1                                              NUMBER(2)
 NUM_2                                              NUMBER(2)

--
mdw11> select count(*),count(distinct(pare_id)) from child_heap

  COUNT(*) COUNT(DISTINCT(PARE_ID))
---------- ------------------------
   1000000                     9999

As you can see, the table has 1 million records and 9,999 values for PARE_ID, there are approx 100 records per parent. The data was created to match a common situation – that of a bit of data coming in for each parent every day. See post 2 for details.

The result of this is that the data for any given parent is scattered through the table. As the data comes in for a given day, the data for the first parent is added to the end of the table, followed by all the data for all the other parents who have data that day. The next day this is repeated, so the child records for a given parent are interspersed with the child records for many other parents.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_HEAP
where PARE_ID=12

Oracle quickly works down the index to the leaf block containing the first key that matches the range. This takes, in my example, 4 block reads. Oracle now works through the index entries and, via the rowid, identifies the exact block to visit in the table for each key. For each key it has to visit a new block – because the data is scattered through the table. This is what the clustering_factor in the index statistics is measuring, how often contiguous rows in the index are for the same block. In our case, almost never.
In my diagram I do not number those table reads but in my simplistic diagram it would be 10 further reads.
If Oracle reaches the end of the leaf block before it reaches the end of the range of key values, oracle follows the pointer in the leaf block (not shown) to the next leaf block (whcih is another block read) and continues working through the keys until the range scan is completed.

In my simplified diagram I only have 6 entries per leaf block. In reality, and in my example tables, this is more like a few hundred. 247 in the case of CHHE_PK.

Now let’s consider my Index Organized Table, CHILD_IOT. It has exactly the same columns as CHILD_HEAP and the data was created in the same way. However, because it is an IOT, as the data came in it was inserted into the primary key index and is thus in an ordered state.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_IOT
where PARE_ID=12

Oracle works down the index to the leaf block where the range scan begins and now simply works along the leaf blocks. There is no need to go and visit the table as there is no table.

In my IOT diagram the leaf entries are longer and there are fewer in each leaf block, ie 5. So my scan has to visit 3 leaf blocks rather than 2. In reality the difference is more pronounced, in my example table there are actually 56 rows per leaf block, compared to the 247 in the index on the heap table. As such, my scan on the IOT will cover more leaf blocks but this is insignificant compared to the reduction in block visits caused by not having to go hunt down records scattered over the table. Even in the unlikely event of my IOT being deeper by 1 level (an extra layer of branch blocks) due to the reduces entries per leaf block, I would still be winning for range scans.

That is all nice theory and pictures. As ever, we need to back this up with some real tests. Firstly, I am using SQL*Plus and I need to set my arraysize large enough so that I do not introduce extra consistent gets through selecting small sets of rows between client and server. You will need to do the same to see similar results to me.
{I keep meaning to do a dedicated post on arraysize but H.Tonguç YIlmaz has a nice post already on it.}

set arraysize 200
set autotrace on

Now I will select all the records for PARE_ID=10, including a column not in the Primary Key, so that the table needs to be visited. I did this twice to remove the parsing overhead:

select pare_id,cre_date,vc_1
from child_heap
where pare_id =10
order by cre_date

   PARE_ID CRE_DATE  VC_1
---------- --------- -----------------------------------------------------------------------
        10 17-APR-11 LDOBKMLCYCSQYBDFIUISJWQAHNYSQOSUQJKIGCSEJHDPOFFLHHXYSMDSQNUB
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOIC
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOICOSFTSYNO
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJKZNII
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEFQQEYM
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEF
...
        10 19-JUL-11 BNOYCIDTFJHPPOYPSVAVKJSYUNVPGPHLJXUOIKYKASKHYGZNVHVFFGPVAKN
        10 25-JUL-11 HDFGAQWTYZBSVYVXTFFRDIAKRYWFUPFCNDCETHUWHSQUITHHVUEJTJ

82 rows selected.

Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   100 |  6900 |   103   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |   100 |  6900 |   103   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | CHHE_PK    |   100 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         86  consistent gets
          0  physical reads

82 rows collected and 86 consistent gets. That will be 4 consistent gets to process the index blocks and 82 for the table blocks.

Now let’s repeat that on the IOT:

select pare_id,cre_date,vc_1
from child_IOT
where pare_id =10
order by cre_date
mdw11> /
any key>

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------------
        10 17-APR-11 QJHQXTQAYEUICPNDQTYMMFZPWJSIDLBKOXYTHLEHKTVWUPKQMWUUX
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZYAS
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZY
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJE
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJ
...
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNK
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNKCN
        10 25-JUL-11 BCLLVPYMWAAQOVLILXARQZXEGAQAARPURIFKFKHROUSFORRYYXQZUAJHDBL

108 rows selected.

Execution Plan
----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |   100 |  6900 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| CHIO_PK |   100 |  6900 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads

We actually gathered more data, 108 rows compared to 82, all for 6 consistent gets compared to 86 consistent gets. That is a reduction to less than 10% of the original effort.

Now for a more extreme test. I am going to select a single row summary of data for 10 parents, flushing the cache between each run to show the impact when you have to do real IO to support those consistent gets. This is on a fairly old {4 years} laptop with a rather tired hard disc

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_heap
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1155      12031

Elapsed: 00:00:06.39

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |     7 |  1203   (0)| 00:00:18 |
|   1 |  SORT AGGREGATE              |            |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  1200 |  8400 |  1203   (0)| 00:00:18 |
|*  3 |    INDEX RANGE SCAN          | CHHE_PK    |  1200 |       |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1157  consistent gets
       1112  physical reads

--
--

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_iot
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1111      11528

Elapsed: 00:00:00.29

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     7 |    24   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     7 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |  1200 |  8400 |    24   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         25  consistent gets
         25  physical reads

The Heap took 6.39 seconds and required 1112 physical reads to support 1157 consistent gets.
The IOT took 0.29 seconds and required 25 physical reads to support 25 consistent gets.

I think we can all see that IOTs have the potential to greatly reduce physical and logical IO. Perhaps we should all be using IOTs more.

Final point. The Heap version took less physical reads than consistent gets as some blocks read into the block buffer cache held data required later in the query.

The impact of IOTs on the buffer cache will be the topic of my next post on IOTs. I think { hope:-) } that many of you will be very interested and impressed by what you could gain…

Captain America: The First Avenger…

I went to see Captain America: The First Avenger at the weekend. It’s kinda odd because there are lots of elements that make me think I should like this film, but I found it all a little dull. I actually found myself saying “Meh” as I stood up at the end of the film. That’s not a good sign.

I liked the intro to the story. All the stuff from the trailers. Skinny guy becomes superhero etc. I think that works really well and I was looking forward to him actually doing something useful. That’s where it all went a little flat really. Most of the stuff he did just didn’t seem very “superhero” to me. I know nothing about the Captain America comics or history, but he seems like a pretty lame superhero.

What didn’t help was the World War II setting. Being brought up in the UK means you are constantly bombarded films and references to WWII. Pretty much every day there is a WWII film on one of the channels. Not wanting to be disrespectful to the people that were involved at the time, but it has no relevance to my life. You might as well be talking about The Hundred Years’ War. In fact, I would go as far as to say that *most* war-related films don’t do anything for me. It’s just not a genre that really appeals. With that in mind, a lame superhero in a setting that turns me off was never going to hit home for me I guess.

A few bits of the plot seemed very “Indian Jones” to me. It felt a bit lazy in that regard. It will be interesting to see how Captain America stacks up in the 21st century as part of The Avengers film(s) when they are made. Maybe in a modern setting I will have a change of heart.

Cheers

Tim…




Linux takes a page from Solaris… pmap available on Linux.

Recently, there was a thread on an internal alias of old Sun guys.  The problem at hand was to track down a process that is consuming memory on Linux.  This is the type of problem that can be solved many ways (ps, top, etc…), but to my amazement someone mentioned that pmap could be used for Linux…. I guess I didn’t get the memo :)

About 6 months back I wrote a few entries that discussed Solaris tools for the Linux Guy in the posts:

HCC And Virtual Columns

This is just a short heads-up note to those dealing with HCC-enabled tables (so at present this applies only to Exadata customers).

As already outlined in a previous post about compression restrictions tables with HCC enabled do not support dropping columns - DROP COLUMN gets silently converted into SET UNUSED and DROP UNUSED COLUMNS throws an error to be unsupported.

I've recently come across an interesting variation of this restriction. Obviously Oracle treats virtual columns in this case the same: If you drop a virtual column of a HCC-enabled table it doesn't get dropped but is also silently turned into an unused column - which doesn't really make sense to me since dropping it doesn't require any physical modification to the underlying data structures.

Now you might wonder why this could be relevant? Well it can be important for several reasons: