March 2011

oracle.com inaccessible

Well, not really, but it is for me at the moment. A while back I could not log onto the oracle.com website with my personal account details. I kept getting an invalid username/password error. I clicked the link to reset my password and got a new password. This also did not work, so I tried my work [...]

Princep’s Fury…

Princep’s Fury is the fifth book in the Codex Alera series by Jim Butcher. The book starts a few months on from where Captain’s Fury ended.

This book seems a little calmer in comparison, mostly because I was still recovering from the onslaught of the previous book, I guess. Each book in the six-part series tells only a fraction of the whole story, but the previous four books were written in such a way that they also felt reasonably self-contained. Princep’s Fury, in comparison, feels like the author was planning for the last book more than concentrating on this one. That sounds kinda damning, but even though it drifts a little at times, it was still cool.

Cheers

Tim…




Nested Loops Join – the Smaller Table is the Driving Table, the Larger Table is the Driving Table

March 21, 2011 I occasionally see discussions about Oracle Database behavior that make me wonder… is it true, can I generate a test case that validates the statement, and, just as important, can I generate a test case that refutes it? An interesting question was posted to the OTN forums regarding apparently conflicting advice [...]

Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything?

In my recent post entitled Exadata Database Machine: The Data Sheets Are Inaccurate! Part – I, I drew attention to the fact that there is increasing Exadata-related blog content produced by folks that know what they are talking about. I think that is a good thing since it would be a disaster if I were the only one providing Exadata-related blog content.

The other day I saw Tanel Poder blogging about objects that are suitable targets for Smart Scan. Tanel has added bitmap indexes to his list. Allow me to quickly interject that the list of what can and cannot be scanned with Smart Scan is not proprietary information. There are DBA views in every running Oracle Database 11g Release 2 instance that can be queried to obtain this information. Tanel’s blog entry is not taboo.
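For instance—and treat the specific view name as my illustration, not an exhaustive source—the 11.2 dynamic performance view v$sqlfn_metadata flags which SQL functions the storage cells can evaluate during a Smart Scan:

```sql
-- Which SQL functions can be offloaded to the storage cells
-- (i.e., evaluated inside a Smart Scan)? OFFLOADABLE = 'YES' marks them.
SELECT name, offloadable
FROM   v$sqlfn_metadata
ORDER  BY offloadable DESC, name;
```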

So, while Tanel is correct, I think it is also good to simply point out that the seven core Exadata fundamentals do in fact cover this topic. I’ll quote the relevant fundamentals:

Full Scan or Index Fast Full Scan.

  • The required access method chosen by the query optimizer in order to trigger a Smart Scan.

Direct Path Reads.

  • Required buffering model for a Smart Scan. The flow of data from a Smart Scan cannot be buffered in the SGA buffer pool. Direct path reads can be performed for both serial and parallel queries. Direct path reads are buffered in process PGA (heap).
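As a side note, one way to see these two fundamentals at work is to check the session-level statistics after a scan; a minimal sketch (statistic names as they appear in 11.2 on Exadata—verify them against v$statname on your own system):

```sql
-- How much of my session's I/O was eligible for predicate offload,
-- and how much came back over the interconnect as Smart Scan results?
SELECT n.name, s.value
FROM   v$mystat s
       JOIN v$statname n ON n.statistic# = s.statistic#
WHERE  n.name IN
       ('cell physical IO bytes eligible for predicate offload',
        'cell physical IO interconnect bytes returned by smart scan');
```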

So, another way Tanel could have gone about it would have been to ask, rhetorically, why wouldn’t Exadata perform a Smart Scan on a bitmap index if the plan chooses access method full? The answer would be simple—no reason. It is an index after all and can be scanned with fast full scan.  So why am I blogging about this?

Can I Add Index Organized Tables To That List?
In a recent email exchange, Tanel asked me why Smart Scan cannot attack an index organized table (IOT). Before I go into the outcome of that email exchange I’d like to revert to a fundamental aspect of Exadata that eludes a lot of folks. It’s about the manner in which data is stored in the Exadata Storage Servers and how that relates to offload processing such as Smart Scan.

Data stored in cells is striped by Automatic Storage Management (ASM) across the cells with coarse-grained striping (granularity established by the ASM allocation unit size). With Exadata, the allocation unit size by default—and best-practice—is 4MB. Therefore, tables and indexes are scattered in 4MB chunks across all the cells’ disks.
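You can confirm the AU size of a disk group from the database grid; a minimal sketch (disk group names will obviously differ per system):

```sql
-- Allocation unit size per ASM disk group, in bytes; expect 4194304 (4MB)
-- on an Exadata system configured to best practice.
SELECT name, allocation_unit_size
FROM   v$asm_diskgroup;
```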

Smart Scan performs multiple, asynchronous 1MB reads against allocation units (thus four asynchronous 1MB reads for the four adjacent 1MB storage regions that make up a 4MB AU). As the I/O operations complete, Smart Scan performs predicate operations (filtration) on each 1MB storage region. If the data contained in a 1MB region references another portion of the database (as is the case with a traditional chained row, for instance), Smart Scan cannot completely process that storage region. The blocks that reference indirect data are sent to the database grid in standard block form (the same form as when reading an ASM disk on conventional storage). The database server then chases the indirection, because only it has the code to map the block-level indirection to an ASM AU in some cell, somewhere. Cells cannot ask other cells for data, because cells don’t know anything about each other.

Thus far in this blog post I’ve taken the recurring question of whether Smart Scan works on a certain type of object (in this case IOT) and broadened the discussion to focus on a fundamental aspect of Exadata. So what does this broadened scope have to do with Smart Scan on IOT? Well, when I read that email from Tanel I used logic based on the fundamentals and shot off an answer. Before that hasty reply to Tanel, I recalled that IOT has the concept of an overflow tablespace. The concept of an overflow tablespace—in my mind—has “indirection” written all over it. Later I became more curious about IOT, so I scanned through the code (server side) and couldn’t find any hard barriers against Smart Scan on IOT. I was stumped (trust me, that aspect of the code is not all that straightforward), so I asked the developers who own that specific part of the server. I found out my logic was faulty. I was wrong. It turns out that Smart Scan for IOT is simply not implemented. I’m not insinuating that means “not implemented yet” either. That isn’t the point of this blog entry. Neither is admitting I was wrong in my original answer to Tanel.

Does The List Of Smart Scan-Compatible Objects Keep Growing And Growing?
Neither confessing how I shot off a wrong answer to Tanel, nor the specifics of IOT Smart Scan support, is the central point of this blog entry. So just what is my agenda? Primarily, I wanted to remind folks about the fundamental aspect of Exadata regarding indirection and Smart Scan (e.g., chained rows), and secondarily, I wanted to point out that the list of objects fit for Smart Scan is limited for reasons other than feasibility. Time to market is important. I know that. If an object like IOT is not commonly used in the data warehousing use case, it is unnecessary work to implement Smart Scan support for it. But therein lies the third, hidden agenda item for this post, which is to question our continual pondering over the list of objects that support Smart Scan.

Offload processing is a good thing. I wonder, is the goal to offload more and more? Some is good; certainly more must be better in a scale-out solution. Could offload support grow to the point where Exadata nears a state of “total offload processing”? Would that be a bad thing? Well, “total offload processing” is, in fact, impossible, since cells do not contain discrete segments of data but instead the scattering of data I wrote about above. However, more can be offloaded. The question is just how far that goes and what it means in architectural terms. Humor me for another moment in this “total offload processing” train of thought.

If, over time, “everything”—or even nearly “everything”—is offloaded to the Exadata Storage Servers there may be two problems. First, offloading more and more to the cells means the query-processing responsibility in the database grid is systematically reduced. What does that do to the architecture? Second, if the goal is to pursue offloading more and more, the eventual outcome gets dangerously close to “total offload processing.” But, is that really dangerous?

So let me ask: In this hypothetical state of “total offload processing” to Exadata Storage Servers (that do not share data by the way), isn’t the result a shared-nothing MPP?  Some time back I asked myself that very question and the answer I came up with put in motion a series of events leading to a significant change in my professional career. I’ll blog about that as soon as I can.

Filed under: oracle

Upgrade Whoa

(The title’s a pun, by the way – an English form of humour that is not considered good unless it’s really bad.)

Very few people try to email me or call me with private problems – which is the way it should be, and I am grateful to my audience for realizing that this blog isn’t trying to compete with AskTom – but I do get the occasional communication and sometimes it’s an interesting oddity that’s worth a little time.

Today’s blog item is one such oddity – it was a surprise, it looked like a nasty change in behaviour, and it came complete with a description of environment, and a neatly formatted, complete, demonstration. For a discussion of the problem in Spanish you can visit the blog of John Ospino Rivas, who sent me the original email and has written his own blog post on the problem.

We start with a simple table, and then query it with a ‘select for update’ from two different sessions:


drop table tab1 purge;

create table tab1(
	id	number,
	info	varchar2(10),
	constraint tab1_pk primary key (id)
		using index (create index idx_tab1_pk on tab1(id))
);

insert into tab1 values(1,'a');
insert into tab1 values(2,'a');
insert into tab1 values(3,'a');
commit;

execute dbms_stats.gather_table_stats(user,'tab1',cascade=>true)

column id new_value m_id

set autotrace on explain

select  id
from    tab1
where   id = (
            select  min(id)
            from    tab1
        )
for update
;

set autotrace off

prompt	=============================================================
prompt  Now repeat the query in another session and watch it lock
prompt	And use a third session to check v$lock
prompt  Then delete here, commit and see what the second session does
prompt	=============================================================

accept X prompt 'Press return to delete and commit'

set verify on
delete from tab1 where id = &m_id;
commit;

The fact that the primary key index is created as a non-unique index isn’t a factor that affects this demonstration.

Given the query and the data in the table, you won’t be surprised by the result of the query from the first session (for convenience I’ve captured the selected value using the ‘column new_value’ option). Here’s the result of the query and its execution plan:


        ID
----------
         1

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  FOR UPDATE                  |             |       |       |            |          |
|*  2 |   INDEX RANGE SCAN           | IDX_TAB1_PK |     1 |     3 |     0   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE            |             |     1 |     3 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX)| IDX_TAB1_PK |     3 |     9 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"= (SELECT MIN("ID") FROM "TAB1" "TAB1"))

At this point the program issues instructions to repeat the query from a second session, then waits for you to press Return. When you run the same query from another session it’s going to see the data in read-consistent mode and try to select and lock the row where ID = 1, so the second session is going to hang waiting for the first session to commit or rollback.

Here’s the key question: what’s the second session going to return when you allow the first session to continue, delete the row it has selected, and commit? Here’s the answer if you’re running 10.2.0.3 or 11.1.0.6 (which is what I happen to have easily available):

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

        ID
----------
         2

1 row selected.

Now, this seems perfectly reasonable to me – especially since I’ve read Tom Kyte’s notes on “write consistency” and seen the “rollback and restart” mechanism that kicks in when updates have to deal with data that’s changed since the start of the update. Session 2 had a (select for) update, and when it finally got to a point where it could lock the data it found that the read-consistent version of the data didn’t match the current version of the data so it restarted the statement at a new SCN. At the new SCN the current highest value was 2.

Now here’s what happened when I ran the test under 11.2.0.2:

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

no rows selected

The upgrade produces a different answer!

At first sight (or guess) it looks as if the query has run in two parts – the first part producing the min(id) of 1 using a read-consistent query block, with the second part then using the resulting “known value” to execute the outer select (shades of “precompute_subquery”) and restarting only the second part when it discovers that the row it has been waiting for has gone away.

It doesn’t really matter whether you think the old behaviour or the new behaviour is correct – the problem is that the behaviour has changed in a way that could silently produce unexpected results. Be careful if any of your code uses select for update with subqueries.

As a defensive measure you might want to change the code to use the serializable isolation level – that way the upgraded code will crash with Oracle error ORA-08177 instead of silently giving different answers:

SQL> alter session set isolation_level = serializable;

Session altered.

SQL> get afiedt.buf
  1  select  /*+ gather_plan_statistics */
  2          id
  3  from    tab1
  4  where   id = (
  5              select  min(id)
  6              from    tab1
  7          )
  8* for update
  9  /
from    tab1
        *
ERROR at line 3:
ORA-08177: can't serialize access for this transaction

It might be a way of avoiding this specific problem, of course, but it’s not a frequently used feature (the first pages of hits on Google are mostly about SQL Server) so who knows what other anomalies this change in isolation level might introduce.

PX COORDINATOR FORCED SERIAL operation

This is just a short heads-up for those who come across an execution plan showing the PX COORDINATOR FORCED SERIAL operation. I don't have official confirmation, but according to my tests a plan with such an operation effectively means: costed for parallel execution, but executed serially. (You might remember that I recently mentioned in another post that there is also the possibility of having a plan executed in parallel but costed serially – weird, isn't it?) Why such an operation exists is not clear to me – obviously it would make much more sense to cost straight away for serial execution in such a case. Probably there is a good reason, otherwise such an operation wouldn't exist, but I think at least the costing is questionable in current versions.


LOBREAD SQL Trace entry in Oracle 11.2 (and tracing OPI calls with event 10051)

A few days ago I looked into a SQL Tracefile of some LOB access code and saw a LOBREAD entry there. This is a really welcome improvement (or should I say, bugfix of a lacking feature) for understanding resource consumption by LOB access OPI calls. Check the bottom of the output below:

*** 2011-03-17 14:34:37.242
WAIT #47112801352808: nam='SQL*Net message from client' ela= 189021 driver id=1413697536 #bytes=1 p3=0 obj#=99584 tim=1300390477242725
WAIT #0: nam='gc cr multi block request' ela= 309 file#=10 block#=20447903 class#=1 obj#=99585 tim=1300390477243368
WAIT #0: nam='cell multiblock physical read' ela= 283 cellhash#=379339958 diskhash#=787888372 bytes=32768 obj#=99585 tim=1300390477243790
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390477243865
[...snipped...]
WAIT #0: nam='SQL*Net more data to client' ela= 2 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477244205
WAIT #0: nam='SQL*Net more data to client' ela= 4 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477244221
WAIT #0: nam='gc cr multi block request' ela= 232 file#=10 block#=20447911 class#=1 obj#=99585 tim=1300390477244560
WAIT #0: nam='cell multiblock physical read' ela= 882 cellhash#=379339958 diskhash#=787888372 bytes=32768 obj#=99585 tim=1300390477245579
WAIT #0: nam='SQL*Net more data to client' ela= 16 driver id=1413697536 #bytes=2020 p3=0 obj#=99585 tim=1300390477245685
WAIT #0: nam='SQL*Net more data to client' ela= 6 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477245706
WAIT #0: nam='SQL*Net more data to client' ela= 5 driver id=1413697536 #bytes=1792 p3=0 obj#=99585 tim=1300390477245720
LOBREAD: c=1000,e=2915,p=8,cr=5,cu=0,tim=1300390477245735

In past versions of Oracle, the CPU usage figures (c=) and other stats, like the number of physical/logical reads of the LOB chunk read OPI call, were just lost – they were never reported in the tracefile. In the past only the most common OPI calls, like PARSE, EXEC, BIND, FETCH (and recently CLOSE cursor), were instrumented with SQL Tracing. But since 11.2(.0.2?) the LOBREADs are printed out too. This is good, as it reduces the amount of guesswork needed to figure out what those WAITs for cursor #0 – which is really a pseudocursor – are for.

Why cursor #0? It’s because normally, with PARSE/EXEC/BIND/FETCH, you always had to specify a cursor slot number you operated on (if you fetch from cursor #5, it means that the Oracle process went to slot #5 in the open cursor array in your session’s UGA and followed the pointers to the shared cursor’s executable parts in the library cache from there). But the LOB interface works differently – if you select a LOB column using your query (cursor), then all your application gets is a LOB LOCATOR (sort of a pointer with a LOB item ID and a consistent read/version SCN). It is then your application that must issue another OPI call (LOBREAD) to read the chunks of that LOB out of the database. And the LOB locator is independent of any cursors; it doesn’t follow the same cursor API as regular SQL statements (as it requires way different functionality compared to a regular select or update statement).
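To make the locator-then-read flow concrete, here is a minimal PL/SQL sketch (the table name t_lob comes from the trace above; the BLOB column name b is my assumption) that drives exactly these LOBREAD calls:

```sql
DECLARE
  l_lob    BLOB;
  l_buf    RAW(32767);
  l_amount INTEGER;
  l_offset INTEGER := 1;
BEGIN
  -- The fetch returns only the LOB locator, not the LOB data itself
  SELECT b INTO l_lob FROM t_lob WHERE ROWNUM = 1;
  LOOP
    l_amount := 32767;
    DBMS_LOB.read(l_lob, l_amount, l_offset, l_buf);  -- each chunk read is a LOBREAD OPI call
    l_offset := l_offset + l_amount;
  END LOOP;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    NULL;  -- dbms_lob.read signals end-of-LOB with NO_DATA_FOUND
END;
/
```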

So, whenever a wait happened in your session due to an access using a LOB locator, then there’s no specific cursor responsible for it (as far as Oracle sees internally) and that’s why a fake, pseudocursor #0 is used.

Note that on versions earlier than 11.2(.0.2?), where LOBREAD isn’t printed to the trace, you can use OPI call tracing (OPI stands for Oracle Program Interface and is the server-side counterpart to the OCI API on the client side) via event 10051. First enable SQL Trace and then event 10051 (or the other way around if you like):

SQL> @oerr 10051

ORA-10051: trace OPI calls

SQL> alter session set events '10051 trace name context forever, level 1';

Session altered.

Now run some LOB access code and check the tracefile:

*** 2011-03-17 14:37:07.178
WAIT #47112806168696: nam='SQL*Net message from client' ela= 6491763 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627178602
OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
CLOSE #47112806168696:c=0,e=45,dep=0,type=1,tim=1300390627178731
OPI CALL: type=94 argc=28 cursor=  0 name=V8 Bundled Exec
=====================
PARSING IN CURSOR #47112802701552 len=19 dep=0 uid=93 oct=3 lid=93 tim=1300390627179807 hv=1918872834 ad='271cc1480' sqlid='3wg0udjt5zb82'
select * from t_lob
END OF STMT
PARSE #47112802701552:c=1000,e=1027,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3547887701,tim=1300390627179805
EXEC #47112802701552:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3547887701,tim=1300390627179884
WAIT #47112802701552: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627179939
WAIT #47112802701552: nam='SQL*Net message from client' ela= 238812 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627418785
OPI CALL: type= 5 argc= 2 cursor= 26 name=FETCH
WAIT #47112802701552: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627418945
FETCH #47112802701552:c=0,e=93,p=0,cr=5,cu=0,mis=0,r=1,dep=0,og=1,plh=3547887701,tim=1300390627418963
WAIT #47112802701552: nam='SQL*Net message from client' ela= 257633 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627676629
OPI CALL: type=96 argc=21 cursor=  0 name=LOB/FILE operations
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627676788
[...snip...]
WAIT #0: nam='SQL*Net more data to client' ela= 2 driver id=1413697536 #bytes=1792 p3=0 obj#=99585 tim=1300390627677054
LOBREAD: c=0,e=321,p=0,cr=5,cu=0,tim=1300390627677064

Check the LOBREAD line and especially the “LOB/FILE operations” OPI call line above. Tracing OPI calls gives you some extra detail about what kind of tasks are executed in the session. The “LOB/FILE operations” call indicates that whatever lines come after it are done for this OPI call, until the next OPI call is printed out (unlike SQL trace call lines, where all the activity happens before a call line is printed – with some exceptions, of course). OPI call tracing should work even on ancient database versions…

By the way, if you are wondering, what’s the cursor number 47112801352808 in the “WAIT #47112801352808” above? Shouldn’t cursor numbers be small numbers?

Well, in 11.2.0.2 this was also changed. Before that, the X in CURSOR #X (and PARSE #X, BIND #X, EXEC #X, FETCH #X) represented the slot number in your open cursor array (controlled by open_cursors) in your session’s UGA. Now, the tracefile dumps out the actual address of that cursor. 47112801352808 in HEX is 2AD94DC9FC68 and it happens to reside in the UGA of my session.
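You can check the conversion yourself with a one-liner (a sketch; any hex converter will do just as well):

```sql
-- 47112801352808 decimal = 2AD94DC9FC68 hex
SELECT to_char(47112801352808, 'fmXXXXXXXXXXXX') AS cursor_addr FROM dual;
```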

Naturally I asked Cary Millsap about whether he had spotted this LOBREAD already and yes, Cary’s way ahead of me – he said that Method-R’s mrskew tool v2.0, which will be out soon, will support it too.

It’s hard to not end up talking about Cary’s work when talking about performance profiling and especially Oracle SQL trace, so here are a few very useful bits which you should know about:

If you want to understand the SQL trace & profiling stuff more, then the absolute must-read document is Cary’s paper on the subject, Mastering Performance with Extended SQL Trace.

Also, if you like to optimize your work like me (in other words: you’re proactively lazy ;-) and you want to avoid the boring “where-the-heck-is-this-tracefile-now” and “scp-copy-it-over-to-my-pc-for-analysis” work, then check out Cary’s MrTrace plugin (costs ~50 bucks and has a 30-day trial) for SQL Developer. I’ve ended up using it myself regularly, although I still tend to avoid GUIs.

