

oradebug peek

An odd little discovery today. The peek command of oradebug is fairly well documented on the internet, and I’ve used it from time to time to find the contents of memory. Earlier on today, though, I was running 11.1.0.6 and it didn’t do what I was expecting. I was trying to dump a log buffer block from memory so after logging on as sys and finding the location for the block I duly typed:


oradebug setmypid
peek 0x09090000 512

The response I got looked, as expected, like this:


[9090000, 9090200) = 00002201 0000E3D7 00000974 E1808044 00340035 00350035 00340035
0034002F 0022002F 0309002C 050C0AC2 010860C4 02C1023E 056F7807 08290714 ...

As you can see, the dump range is given as 512 bytes (the mixed bracketing in "[9090000, 9090200)" is the usual half-open interval notation – include the low address, exclude the high address) but the dump itself is only 15 words. My memory told me that this is what Oracle does on screen and that all I had to do was read the trace file for the full 512 bytes. Unfortunately my memory was wrong – the trace file was limited to the same 15 words.

It wasn't until I added an extra number to the command that I got the full 512 bytes I wanted in the trace file:

=== On screen ===
SQL> oradebug peek 0x09090000 512 1
[9090000, 9090200) = 00002201 0000E3D7 00000974 E1808044 00340035 00350035 00340035
0034002F 0022002F 0309002C 050C0AC2 010860C4 02C1023E 056F7807 08290714 ...

=== from the trace file ===
Processing Oradebug command 'peek 0x09090000 512 1'
[9090000, 9090200) = 00002201 0000E3D7 00000974 E1808044 00340035 00350035 ...
Dump of memory from 0x9090018 to 0x9090200
9090010                   00340035 0034002F          [5.4./.4.]
9090020 0022002F 0309002C 050C0AC2 010860C4  [/.".,........`..]
9090030 02C1023E 056F7807 08290714 056F7807  [>....xo...)..xo.]
9090040 082A0714 023DC203 0303C102 091215C2  [..*...=.........]
     ...
90901E0 03083207 02083DC2 C20303C1 C2031215  [.2...=..........]
90901F0 00012D09 00000009 00000000 0016010B  [.-..............]

*** 2011-05-20 16:35:11.156
Oradebug command 'peek 0x09090000 512 1' console output:
[9090000, 9090200) = 00002201 0000E3D7 00000974 E1808044 00340035 00350035 00340035 0034002F 0022002F 0309002C 050C0AC2 010860C4 02C1023E 056F7807 08290714 ...

Apart from repeating the on-screen dump, the trace file now reports the full dump, with both a raw list and a character-based interpretation of each 16 bytes – although it starts, unfortunately, by repeating the first 24 (0x18) bytes in the same format as the on-screen output.
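The trace-file layout (address, up to four little-endian 32-bit words, then a printable-character column) is easy to mimic once you've decoded it. Here's a rough Python sketch of the format, purely as an aid to reading those lines – the function and sample bytes are mine, not anything Oracle provides:

```python
def dump_memory(base_addr, data):
    """Render bytes in a layout similar to the Oracle trace-file dump:
    hex address, four little-endian 32-bit words per line, then a
    character column where non-printable bytes show as '.'."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        words = [
            int.from_bytes(chunk[i:i + 4], "little")
            for i in range(0, len(chunk), 4)
        ]
        hex_part = " ".join(f"{w:08X}" for w in words)
        chars = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base_addr + offset:07X} {hex_part}  [{chars}]")
    return "\n".join(lines)

# Made-up sample: 16 printable bytes starting at a plausible address.
print(dump_memory(0x9090020, bytes(range(32, 48))))
```

Note, as in the trace above, that the character column interprets each byte individually, so multi-byte values read "backwards" relative to the hex words.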

Maybe I’m wrong about how I used to peek data – maybe it’s just a variation in this version of Oracle.

ASSM bug reprise - part 2

Introduction

In the first part of this post I've explained some of the details and underlying reasons of bug 6918210. The most important part of the bug is that it can only be hit if many row migrations happen during a single transaction. However, having excessive row migrations is usually a sign of poor design, so this point probably can't be stressed enough:

If you don't have excessive row migrations the bug cannot become significant

Of course, there might be cases where you think you actually have a sound design but due to lack of information about the internal workings it might not be obvious that excessive row migrations could be caused by certain activities.

NoCOUG Journal – Ask the Oracle ACEs – Why is my database slow?

Dave Abercrombie mailed me earlier that the May 2011 NoCOUG Journal is already out and can be viewed online at http://www.nocoug.org/Journal/NoCOUG_Journal_201105.pdf.

I met Dave at Hotsos 2011 when he attended my presentation – apparently we have the same interests in statistics, linear regression, AWR and AAS, and he's got some awesome blog posts about them at http://aberdave.blogspot.com. Also, being the Journal Editor of the Northern California Oracle Users Group, he invited me to write a short article about the question:

ASSM bug reprise - part 1

This was meant to be published shortly after my latest quiz night post as an explanatory follow-up, but unfortunately I've only now managed to complete this note.

There is a more or less famous bug in ASSM (see bug 6918210 in MOS as well as Greg Rahn's and Jonathan Lewis' posts) in versions below 11.2 that so far has been classified as only showing up with a combination of larger block sizes (greater than the current default of 8KB) and excessive row migration. With such a combination it was reproducible that an UPDATE of the same data pattern residing in an ASSM tablespace caused significantly more work than doing the same in an MSSM tablespace, because apparently ASSM had problems finding suitable blocks to store the migrated rows.

Overflow

Here’s a little gem I hadn’t come across before (because I hadn’t read the upgrade manuals). Try running the following pl/sql block in 9i, and then 10g (or later):

declare
        v1      number(38);
begin
        v1 := 256*256*256*256;
        dbms_output.put_line(v1);
end;
/

In 9i the result is 4294967296; but for later versions the result is:


declare
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4

It’s not a bug, it’s expected behaviour. The expression consists of integers only, so Oracle uses INTEGER arithmetic that limits the result to roughly 9 significant figures. If you want the block to work in newer versions of Oracle you have to add a decimal point to (at least) one of the operands to make Oracle use NUMBER arithmetic that takes it up to roughly 38 significant figures.
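A toy model makes the rule easy to remember: treat a pure-integer expression as being computed in a narrow native type (limited to roughly 9 significant digits, which is consistent with the error above), while an expression containing a decimal point falls back to wide NUMBER-style arithmetic. This Python sketch is only an illustration of the rule, not Oracle's actual implementation:

```python
class NumericOverflow(Exception):
    """Stand-in for ORA-01426: numeric overflow."""

INT_MAX = 2**31 - 1  # rough model of the native-integer limit (~9 digits)

def multiply(a, b):
    """Integer * integer stays in the narrow type and may overflow;
    as soon as either operand is a float (i.e. "has a decimal point"),
    the result is computed in the wider type instead."""
    result = a * b
    if isinstance(a, int) and isinstance(b, int) and abs(result) > INT_MAX:
        raise NumericOverflow("numeric overflow")
    return result

# multiply(multiply(multiply(256, 256), 256), 256)   -> overflow, as in 10g+
# multiply(multiply(multiply(256.0, 256), 256), 256) -> 4294967296.0
```

The decimal point on just one operand is enough because, once the first partial result is "wide", every subsequent step stays wide – which is exactly why changing a single `256` to `256.0` fixes the PL/SQL block.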

Consistent Gets

There’s an interesting question on the OTN database forum at present – why does an update of 300,000 rows take a billion buffer visits? (There are 25 indexes on the table – so you might point a finger at them initially, but only one of the indexes is going to be changed by the update, so that should only account for around an extra 10 gets per row in a clean environment.)

The answer to the question hadn’t been reached by the time I wrote this note – and this note isn’t intended as a suggested cause of the issue, it’s just an example of the type of thing that could cause an apparent excess of buffer visits. Here’s a little bit of code I’ve just tested on 11.1.0.6 using an 8KB block size:

drop sequence t1_seq;
create sequence t1_seq;

drop table t1;

create table t1 as
select
	rownum id, rpad('x',10) small_vc
from
	all_objects
where
	rownum <= 11
;

execute dbms_stats.gather_table_stats(user,'t1')

select * from t1;

pause

execute snap_my_stats.start_snap
update t1 set small_vc = upper(small_vc) where id = 11;
execute snap_my_stats.end_snap

(The calls to the “snap_my_stats” package simply record the current contents of v$mystat joined to v$statname before and after the update and print the changes.)

The code simply creates a sequence and a table with 11 rows and no indexes, then updates one specific row in the table. However, where the “pause” appears, I start up 10 separate sessions to do the following:

column seqval new_value m_seq

select t1_seq.nextval seqval from dual;

update t1 set small_vc = upper(small_vc) where id = &m_seq;

pause

exit

So when I hit return on the pause for the first session, there are 10 separate active transactions on the single block in my table, one for each row except row 11. (And now you know what the sequence was for.)

Here’s a subset of the statistics from v$mystat after my update statement – remember, all I’ve done is update one row in one block using a tablescan:


Name                                                                     Value
----                                                                     -----
session logical reads                                                       45
db block gets                                                                3
db block gets from cache                                                     3
consistent gets                                                             42
consistent gets from cache                                                  42
consistent gets from cache (fastpath)                                        3
consistent gets - examination                                               39
db block changes                                                             7
consistent changes                                                          13
calls to kcmgrs                                                             26
calls to kcmgas                                                              1
calls to get snapshot scn: kcmgss                                            7
redo entries                                                                 3
redo size                                                                  764
undo change vector size                                                    236
data blocks consistent reads - undo records applied                         13
active txn count during cleanout                                            20
table scan blocks gotten                                                     1

Note the last statistic – just one block accessed by tablescan – compared with the 45 buffer visits recorded as session logical reads.
Those 45 buffer visits come from 3 current (db) block gets and 42 consistent gets.
Of the 42 consistent gets, 39 are examinations, which – in the absence of indexes and hash clusters – are visits to undo blocks.
The 39 undo visits consist of 13 visits to undo blocks to find the undo records to apply, and 26 visits (to undo segment headers) to find the 13 transaction SCNs.

What you’re seeing is one session doing (relatively speaking) a lot of work to hide the effects of other sessions which have not yet committed their transactions. (This was only a quick test, so I haven’t examined why the larger figures appear in multiples of 13 rather than multiples of 10 – the number of other transactions – and since this note is just trying to demonstrate a concept I won’t be looking into it any further.)
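The counting argument can be written down as a back-of-envelope model: for each undo record applied, the reading session visits the undo block once and the undo segment header twice (to establish the transaction SCN). This Python sketch is my own approximation of the bookkeeping, not a trace of Oracle's internals – the inputs 13 and 3 simply come from the statistics above:

```python
def consistent_read_cost(undo_records, current_gets=3, other_gets=3):
    """Toy model of the buffer visits needed to build a read-consistent
    version of one block covered by other sessions' uncommitted
    transactions. All ratios are taken from the observed statistics."""
    undo_block_visits = undo_records        # one visit per undo record applied
    seg_header_visits = 2 * undo_records    # two visits per transaction SCN
    examinations = undo_block_visits + seg_header_visits
    consistent_gets = examinations + other_gets
    return {
        "consistent gets - examination": examinations,
        "consistent gets": consistent_gets,
        "session logical reads": consistent_gets + current_gets,
    }
```

Feeding in the 13 undo records reproduces the 39 examinations, 42 consistent gets and 45 session logical reads reported above – which is the point of the model: almost all the "extra" work is undo handling, not table access.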

If you have a number of “non-interfering” transactions – i.e. transactions that don’t actually lock each other out – on a single table then you could find that they spend more time hiding each other’s work than they do doing their own work.

Footnote:

The numbers change significantly if I commit the 10 transactions (but wait until they’ve all executed, so they are all active at the same time) before I do the update to the 11th row.

The numbers changed even more surprisingly when I forgot to collect stats on the table in my initial example of the test.

Cursor_sharing

For those looking to the next upgrade – here’s an early warning from Oracle:

ANNOUNCEMENT: Deprecating the cursor_sharing = ‘SIMILAR’ setting (Doc ID 1169017.1)

“We recommend that customers discontinue setting cursor_sharing = SIMILAR due to the many problematic situations customers have experienced using it. The ability to set this will be removed in version 12 of the Oracle Database (the settings of EXACT and FORCE will remain available). Instead, we recommend the use of Adaptive Cursor Sharing in 11g.”

The rest of the note contains some interesting information about the behaviour and side effects of this option – which may also help you debug some library cache issues if you’re currently running with this value set in 11g.

Evidence

Here’s a nice example on the OTN database forum of Dom Brooks looking at the evidence.

  • The query is slow – what does the trace say?
  • There’s a “row source execution” line showing that we get 71,288 rows before a hash unique drops the count to 3,429 rows.
  • There’s a statement (upper case, bind variables as :Bn) in the trace file that has been executed 71,288 times.
  • A very large fraction of the trace file time is spent in that secondary statement.
  • There’s a user-defined function call in the original select list, before a ‘select distinct’.

Conclusion: the code should probably do a “distinct” in an inline view before calling the function, reducing the number of calls to the function from 71,288 to 3,429.
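The shape of that saving is language-independent: call the expensive function once per distinct value instead of once per row. Here's a small Python analogy – the row counts come from the trace, but the function and data are obviously invented:

```python
call_count = 0

def expensive_lookup(value):
    """Stand-in for the user-defined function in the select list;
    the counter records how many times it gets called."""
    global call_count
    call_count += 1
    return value.upper()

# 71,288 rows, but only 3,429 distinct values - as in the trace file.
rows = [f"value_{i % 3429}" for i in range(71288)]

# Original shape: apply the function to every row, then take DISTINCT.
call_count = 0
result_naive = sorted(set(expensive_lookup(r) for r in rows))
calls_naive = call_count

# Rewritten shape: take DISTINCT first (the inline view), then apply
# the function to the reduced row set.
call_count = 0
result_rewrite = sorted(expensive_lookup(r) for r in set(rows))
calls_rewrite = call_count

assert result_naive == result_rewrite   # same answer, far fewer calls
```

The rewrite is only safe because the final result is the same either way – which is exactly the check you'd have to make before pushing the "distinct" inside an inline view in the SQL.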

Footnote: There may be other efficiency steps to consider – I’m always a little suspicious of a query that uses “distinct”: possibly it’s hiding an error in logic, possibly it should be rewritten with an existence subquery somewhere, but sometimes it really is the best strategy. There are some unusual statistics names coming from autotrace on the OP’s system – I wonder if he’s installed one of Tanel Poder’s special library hacks.

More CR

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in 10.2.0.3):

drop table t1 purge;
create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);
commit;

execute dbms_stats.gather_table_stats(user,'t1')
execute snap_my_stats.start_snap

begin
	for i in 1..1000 loop
		update t1 set n1 = i where id = 1;
	end loop;
end;
/

execute snap_my_stats.end_snap

set doc off
doc

Output - 10.2.0.3 (some rows deleted)
Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                   11
user calls                                                                   6
recursive calls                                                          1,068
recursive cpu usage                                                          7
session logical reads                                                    4,041
CPU used when call started                                                   7
CPU used by this session                                                     7
DB time                                                                      6
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                          3,011
consistent gets from cache                                               3,011
consistent gets - examination                                                4
db block changes                                                         2,015
change write time                                                            4
free buffer requested                                                    1,014
switch current to new buffer                                             1,000
calls to kcmgas                                                          1,014
calls to get snapshot scn: kcmgss                                        3,009
redo entries                                                               960
redo size                                                              295,160
undo change vector size                                                111,584
no work - consistent read gets                                           1,004
table scans (short tables)                                               1,001
table scan rows gotten                                                   2,002
table scan blocks gotten                                                 1,001
buffer is pinned count                                                   1,000
execute count                                                            1,009

#

I’ve created two rows in a table, then updated one of them 1,000 times – using a table scan to do the update. I haven’t yet committed my transaction. At this point I’m going to use a second session to run the same update loop on the second row in the table:

begin
	for i in 1..1000 loop
		update t1 set n1 = i where id = 2;
	end loop;
end;
/

Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                    8
user calls                                                                   6
recursive calls                                                          1,009
recursive cpu usage                                                        170
session logical reads                                                  965,999
CPU used when call started                                                 172
CPU used by this session                                                   172
DB time                                                                    173
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                        964,969
consistent gets from cache                                             964,969
consistent gets - examination                                          961,965
db block changes                                                         3,016
consistent changes                                                   1,001,000
free buffer requested                                                    1,015
CR blocks created                                                        1,001
calls to kcmgas                                                          1,015
calls to get snapshot scn: kcmgss                                        3,008
redo entries                                                             1,936
redo size                                                              358,652
undo change vector size                                                111,608
data blocks consistent reads - undo records applied                  1,001,000
cleanouts and rollbacks - consistent read gets                           1,001
immediate (CR) block cleanout applications                               1,001
active txn count during cleanout                                         2,000
cleanout - number of ktugct calls                                        1,001
IMU CR rollbacks                                                        41,041
table scans (short tables)                                               1,001
table scan rows gotten                                                   2,002
table scan blocks gotten                                                 1,001
execute count                                                            1,006

Many of the statistics are (virtually) identical (e.g. “execute count”, “db block gets”, “free buffer requested”); some show an increase of 1,000 (often from zero) – largely because we have to worry 1,000 times about cleaning out the current block and creating a read-consistent version so that we can check whether it can be updated.

But the most noticeable changes are in the “CPU time” and “consistent gets” because of the 1,000 times we have to apply 1,000 undo records to the block as we create the read-consistent version of the block. The CPU time has gone from 7 (hundredths of a second) to 172 because of (roughly) 1,000,000 “consistent gets – examination”. As I mentioned yesterday, this matches closely to “data blocks consistent reads – undo records applied” so we know why they are happening. Watch out in your batch jobs – if you have a lot of concurrent update activity going on, a significant fraction of the workload may be the constant re-creation of CR clones.
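The arithmetic behind the blow-up is worth spelling out: every one of the second session's 1,000 updates has to rebuild a CR copy of the block, rolling back (up to) all 1,000 uncommitted changes made by the first session, so the cost grows as the product of the two figures. A trivial sketch of that cost model (my own approximation – the real counts above include some extra bookkeeping gets):

```python
def cr_rebuild_cost(my_updates, their_uncommitted_changes):
    """Toy cost model: each of my updates re-creates a CR clone of the
    block, applying one undo record per uncommitted change made by the
    other session - so total undo applications are the product."""
    return my_updates * their_uncommitted_changes

# 1,000 updates against 1,000 uncommitted changes ~ a million undo
# records applied, which is the order of magnitude of the jump in
# "consistent gets - examination" (4 -> ~962,000) seen above.
```

The asymmetry is the painful part: the first session paid a linear cost to generate its 1,000 undo records, but any concurrent reader of the block pays a cost proportional to its own activity multiplied by the outstanding undo.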

However, there is another interesting detail to watch out for – what happens if I change the update execution path from a tablescan to an indexed access path:

create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);
commit;

execute dbms_stats.gather_table_stats(user,'t1')
create index t1_i1 on t1(id);                            --  Make indexed access path available.

Then with an index hint in my update code, I get the following effects (having done the same update loop on row 1 in the first session, of course):


begin
	for i in 1..1000 loop
		update /*+ index(t1) */ t1 set n1 = i where id = 2;     -- indexed access path hinted
	end loop;
end;
/

Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                    7
user calls                                                                   6
recursive calls                                                          1,006
recursive cpu usage                                                         11
session logical reads                                                    2,036
CPU used when call started                                                  11
CPU used by this session                                                    11
DB time                                                                     11
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                          1,006
consistent gets from cache                                               1,006
consistent gets - examination                                                6
db block changes                                                         2,015
free buffer requested                                                       14
shared hash latch upgrades - no wait                                     1,000
calls to kcmgas                                                             14
calls to get snapshot scn: kcmgss                                        1,004
redo entries                                                               960
redo size                                                              295,144
undo change vector size                                                111,608
index crx upgrade (positioned)                                           1,000
index scans kdiixs1                                                      1,000
execute count                                                            1,005

The difference is astonishing – where did all the ‘create CR copy’ activity go?

I’ve pointed out before now that choosing a different execution plan for an update can have a big impact on performance – this is just another example demonstrating the point.

Consistent Reads

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):

SQL> drop table t1;

Table dropped.

SQL> create table t1 (n1 number);

Table created.

SQL> insert into t1  values(0);

1 row created.

SQL> begin
  2  for i in 1..1000 loop
  3  update t1 set n1 = i;
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

Note that I haven’t issued a commit in this session, and all I’ve got is a single row in the table (and because it’s my usual demo setup of locally managed tablespaces with uniform extents of 1MB using freelist management I know that that one row is in the first available block of the table).

How much work is a second session going to do to scan that table?

SQL> alter system flush buffer_cache;
SQL> execute snap_my_stats.start_snap
SQL> select * from t1;
SQL> set serveroutput on size 1000000 format wrapped
SQL> execute snap_my_stats.end_snap
---------------------------------
Session stats - 18-Apr 11:33:01
Interval:-  2 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
session logical reads                                                      967
consistent gets                                                            967
consistent gets from cache                                                 967
consistent gets - examination                                              964
consistent changes                                                       1,001
CR blocks created                                                            1
data blocks consistent reads - undo records applied                      1,001
IMU CR rollbacks                                                            41

The snap_my_stats package is similar in concept to Tom Kyte’s “runstats” or Tanel Poder’s “snapper” program to capture changes in values in the dynamic performance views over short time periods. In this case I’ve deleted all but a few of the larger changes, and a couple of small changes.

The figure that stands out (probably) is “session logical reads” – we’ve done 967 logical I/Os to scan a table of just one block. The reason for this is that we’ve created a read-consistent copy of that one block (“CR blocks created” = 1), and it has taken a lot of work to create that copy. We’ve had to apply 1,001 undo records (“data blocks consistent reads – undo records applied” = 1,001).

Most of those undo records come from individual accesses (which are of the cheaper “consistent gets – examination” type that only need a single get on the “cache buffers chains” latch) to undo blocks, following the “UBA (undo block address)” pointer in the relevant ITL entry of the table block, but since this is a 10g database the last few undo records come out of the “In-memory Undo” of the other session. Basically the cloning operation is something like this:

  1. Get current block
  2. Notice uncommitted transaction
  3. Clone current block – then ignore current block
  4. Get Undo block indicated by uncommitted ITL and apply undo change vector
  5. Repeat step four – 1,000 times
  6. Block is now clear of all uncommitted transactions
  7. There are no committed transactions with a commit SCN later than the start of query
  8. Display contents of block

It is an interesting point that as the first session created undo records it would pin and fill undo blocks – so would only do a few current gets (one for each block) on the undo blocks it was using. As another process reverses out the changes in a CR clone it has to get and release each undo block every time it wants a single undo record … applying undo records introduces far more latch and buffer activity than the original generation of the undo.
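The eight steps above can be sketched as a toy clone-and-rollback loop. This is a conceptual model only – "undo records" here are just stored old column values, and the real ITL/UBA handling, SCN checks and latch traffic are far richer:

```python
def build_cr_clone(current_block, undo_chain):
    """Steps 1-6 above: copy the current block (and then ignore the
    current version), then walk the undo chain applying each undo
    change vector in turn until no uncommitted change remains."""
    clone = dict(current_block)        # step 3: clone the current block
    for undo_record in undo_chain:     # steps 4-5: apply each undo vector
        clone.update(undo_record)
    return clone                       # step 6: clean of uncommitted work

# One row updated 1,000 times without a commit: the undo chain holds the
# 1,000 old values, newest first, ending with the original n1 = 0.
block = {"n1": 1000}
undo = [{"n1": i} for i in range(999, -1, -1)]
```

Running `build_cr_clone(block, undo)` walks all 1,000 records to get back to `{"n1": 0}` – which is why a "one block" tablescan can clock up nearly a thousand logical I/Os.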

Footnote

It’s worth knowing that there are three statistics relating to applying undo records:

transaction tables consistent reads - undo records applied        Estimating "old" commit SCNs during delayed block cleanout
data blocks consistent reads - undo records applied               Creating CR clones
rollback changes - undo records applied                           The result of a real "rollback;"

See this posting on comp.databases.oracle.server for a rough description of transaction table consistent reads; and this elderly posting highlighting a benefit of knowing about rollback changes.

Footnote 2

The second step in the list of actions is: “Notice uncommitted transaction”. It’s probably worth pointing out that another part of the ITL entry holds the transaction id (“xid”) which implicitly identifies the undo segment and transaction table slot in that segment that has been used to hold the transaction state. The current contents of that slot allow Oracle to determine whether or not (and when, if necessary) the transaction was committed.