redo

Redo OP Codes:

This posting was prompted by a tweet from Kamil Stawiarski in response to a question about how he’d discovered the meaning of Redo Op Codes 5.1 and 11.6 – and credited me and Julian Dyke with “the hardest part”.

Over the years I’ve accumulated (from Julian Dyke, or odd MoS notes, etc.) and let dribble out the occasional interpretation of a few op codes – typically in response to a question on the OTN database forum or the Oracle-L listserver, and sometimes as a throwaway comment in a blog post, but I’ve never published the full set of codes that I’ve acquired (or guessed) to date.

Transactions and SCNs

It’s general knowledge that the Oracle database is ACID compliant, and that SCNs or ‘system change numbers’ are at the heart of this mechanism. This blogpost dives into the details of how the Oracle engine uses these numbers.

Oracle database version 12.1.0.2.161018
Operating system version: OL 7.2, kernel: 4.1.12-61.1.14.el7uek.x86_64 (UEK4)

Redo generation
Whenever DML is executed, redo is generated in the form of ‘change vectors’. These change vectors are copied into the redo buffer as part of the transaction, during the transaction. The function that performs this action is called ‘kcrfw_copy_cv()’. This can be derived by watching the foreground process perform memory copy into the memory area of the redo buffer.

In order to do this, you first need to find the memory area of the redo buffer. This can be done by executing ‘oradebug setmypid’ and ‘oradebug ipc’ as sysdba, and examine the resulting trace file:

Truncate

The old question about truncate and undo (“does a truncate generate undo or not”) appeared on the OTN database forum over the week-end, and then devolved into “what really happens on a truncate”, and then carried on.

The quick answer to the traditional question is essentially this: the actual truncate activity typically generates very little undo (and redo) compared to a full delete of all the data because all it does is tidy up any space management blocks and update the data dictionary; the undo and redo generated is only about the metadata, not about the data itself.

Flashback logging

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Just in case

For those who don’t read Oracle-l and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from the blog. The question (paraphrased) was:

The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?

SSD

There’s never enough time to read everything that’s worth reading, so even though Guy Harrison’s blog is one of the ones worth reading I find that it’s often months since I last read it. Visiting it late last night, I found an interesting batch of articles spread over the last year about the performance of SSD – the conclusions may not be what you expect, but make sure you read all the articles or you might end up with a completely misleading impression:

Private Redo

Following a question on the Oracle Core Addenda pages, here’s a little script to tell you about the sizes of the public and private redo threads currently active in the instance. It’s a minor variation of a script I published in Appendix D (Dumping and Debugging), page 237 to show the addresses of current activity in the various log buffers:

Redo

In the comments to a recent blog posting about log file syncs, Tony Hasler has produced a stunning observation about Oracle and ACID, in particular the D (durability) bit of transactions.

The observation can best be described with a demonstration (which I have run on versions from 8.1 to 11.2) involving three sessions, one of which has to be connected with sysdba privileges.

Session 1

create table t1(n1 number);
insert into t1 values(1);
commit;

session 2 (sysdba)

oradebug setorapid NNN  -- where NNN is v$process.pid for lgwr
oradebug suspend  -- suspend lgwr processing

session 1

update t1 set n1 = 2;
commit;    -- the session hangs, waiting for log file sync

session 3

select n1 from t1;

Is session 3 going to report the value 1 or 2 ?

Since lgwr has not been able to write the redo generated by session 1 you might think that the answer would 1; but the session has already done all the work normally associated with a commit, (despite any indications you might see in the manual that the data is not committed until it is written) and the actual answer to my question is 2.

Now go back to session 2 and issue:

shutdown abort
startup

When the database starts up, is a query against t1 going to return 1 or 2 ?

It ought to return 2, because after instance recovery you ought to be able to repeat the query of session 3 and get the same answer. But you will see 1 – lgwr didn’t write the transaction’s commit record before the crash, so even though session 3 saw the result of the transaction, the transaction wasn’t recoverable. (In passing, session 3 could have been an incoming query from a remote database.)

When I first saw this behaviour my first thought was that you would have to be very unlucky to see a real problem because the window of opportunity for an error was very small, but after making a few comments about the behaviour at the NoCOUG conference earlier on today I found myself at lunchtime sitting next to someone who described a problem that they had had a few years earlier when their archive destination had become full and a complex piece of synchronisation between two databases had failed in a strange way.

If lgwr has to stop for any reason, that doesn’t stop sessions running on for a little while until the log buffer is full – and if those sessions commit (each one will hang after it commits, of course) you could end up with the results of a number of committed, but unrecoverable, transactions being captured by a remote database and then you’re in trouble if the local database crashes – especially if the local programs decided to re-run the programs to generate the missing data … and that’s the problem my lunchtime companion may have been seeing. (I think I may soon be doing a few experiments to see if I can reproduce the problem – one of the other ideas I’ve assumed to be true is that you can’t get more data into the log buffer if the log writer has initiated a log file switch that has not yet completed, maybe that’s wrong as well.)

Footnote: See comment 2- the problem is arguably one of isolation rather than durability (though I have finally come down on the side of the problem being durability).

Update (Sunday 10:30 am – GMT/UTC + 1)

There have been some interesting comments and suggestions in response to this posting, and I think there is room for some elucidation about how things work and speculation about how they could be changed.

When you modify data blocks you generate redo change vectors (for the blocks you want to modify, and for undo blocks) and these redo change vectors are combined to form a redo change record. Ignoring a small complexity introduced in 10g, the changes are applied to the blocks in memory as your session copies a redo record into the log buffer; the order (ignoring 10g again) is: copy record into buffer, apply change to undo block, apply change to target block. So changes you make are immediately applied to the current blocks (again ignoring 10g).

When you commit a transaction you update the transaction table slot that identifies your transaction to show that the transaction is committed – this is a change to an undo segment header block so it generates a redo change vector. THe handling of this vector is completely standard – your session creates a redo record from it and puts the record into the log buffer – which means it is applied to the undo block at that moment, which means everyone can now see the effects of your transaction. AFTER the record is in the redo buffer your session will post lgwr to write the log buffer to the log file and goes into a “log file sync” wait. But this means (a) your transaction is visible to everyone else before it is recoverable (durable) and (b) you’re the only session that thinks the transaction hasn’t yet “really committed”.

Ideally your session needs to do something that puts the “commit record” into log buffer without letting anyone else see the undo segment header change, and only lets them see the change after your session has been posted by lgwr (and even that leaves a window for inconsistent recovery if the instance crashes after the write has happened but before your session has been posted – see my reply to Martin Berger). A change of this type, though, would probably require a massive rewrite of the logging mechanism because it would make the commit record handling different from every other record. There’s also the side effect this might have on the “group commit” mechanism and consequent scalability.

Prompted by an observation by Sokrates about “other” sessions waiting on log file sync in 11.2, and comments from Mark Farnham about a “write guarantee”, I’ve been trying to think through the consequences of a possible strategy for doing a “two-phase” commit with some “delayed logging” on the undo segment headers. The steps are as follows:

  • Session commits and updates the undo segment header with ‘commit waiting on lgwr’ status rather than ‘committed’, and posts lgwr.
  • Any session looking at a block affected by that transaction now has to deal with indeterminate data – so waits on the transaction slot (with enq: TX in mode 4) rather than assuming a rollback or commit. This session, however, could be allowed to check where the log writer had got to and recognise that the relevant write had been completed (and this is analogous to existing code that runs when a session times out on a log file sync). Note that the underlying strategy here is one that is already in place to cater for distributed transactions caught between the PREPARE and COMMIT phases. The waiting session could also post pmon if it thought that the driving session had failed and was unable to receive the post back from lgwr (and this is analogous to the current behaviour when a process is trying to get a latch that seems to have been held too long).
  • When the original session is posted by lgwr to continue it could modify the transaction table slot without generating redo – or possibly generate it but not issue a log file sync for it – thus allowing the second session to release its TX (mode 4) and continue. (This would require some special handling during recovery for transactions that had been written to lgwr but not confirmed back to their sessions.)
  • There is a consistency problem, though, because commits are sequenced by SCN, and at present lgwr simply posts all sessions that it can see should be cleared by its most recent write – they need not resume in exactly the right order, which means transactions could go from “waiting on lgwr” to “committed” in the wrong order. This means we need another two-step approach: lgwr to write process ids to a linked list in commit (i.e. log buffer) order, with a new background process created to walk the list and post each waiter in turn, waiting (or watching) for it to acknowledge before going on to the next entry in the list.

Feel free to poke holes.

Audit Ouch!

A few days ago I was rehearsing a presentation about how to investigate how Oracle works, and came across something surprising. Here’s a simple bit of code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum				id,
	lpad(rownum,10,'0')		small_vc,
	case mod(rownum,1000)
		when 0 then 'Y' else 'N'
	end 				flag
from
	generator	v1,
	generator	v2
where	rownum <= 1000000
;

-- gather table stats here

execute snap_redo.start_snap
execute snap_my_stats.start_snap

update
	/*+ full(t1) */
	t1
set
	flag = 'N'
where
	flag = 'Y'
;

execute snap_my_stats.end_snap
execute snap_redo.end_snap

The two “snap” packages take a starting snapshot of v$mystat/v$statname and v$sysstat (the latter restricted to redo statistics only) and an ending snapshot, and report the change between snapshots. Here are the results for just the redo statistics from the update when I ran the code on 10.2.0.3:

Name                                                                     Value
----                                                                     -----
redo entries                                                             1,907
redo size                                                              439,260
redo ordering marks                                                         17
redo subscn max counts                                                      18

You may at this point be wondering why the first two figures are roughly twice the size they ought to be; you might, of course, be wondering what you’re supposed to be seeing here (in which case you need to see my presentation “The Beginners’ Guide to becoming an Expert.”

My code updates one row in every thousand and each row is in a separate block. That means I should see roughly one redo entry per row at roughly 250 bytes per redo entry. Since I’m using Oracle 10g (single instance) the total number of entries ought to be slightly less than 1,000 because the first 40 or 50 should be accumulated in the private redo and in-memory undo areas. So we have about 950 redo entries too many; where did they come from ?

The test takes about 60 seconds to run, so I repeated it taking snapshots of a couple more dynamic performance views – which didn’t actually help very much – and then ran it one more time after switching log files so that I could dump the log file and find the relevant redo entries easily after the update. I won’t bore you with an extract from the log file dump, I’ll just point out that it showed me two redo records per row – op code 11.4 followed by op code 11.5, which is: “lock row, update row” – when it should simply have been showing nothing but “update row” records.

It took me a little while to figure out what was going on, but the title of this piece gives it away. A few days earlier I had enabled audit (audit_trail = db_extended) on my laptop to demonstrate a point to a client – the redundant “lock row” records appeared as a side effect (which I hadn’t noticed at the client site because we were focussing on a completely different point – which I might write about next week).

Here’s the really important point – I had NOT enabled any audit options on this table, the only thing I had done was make it possible for Oracle to record audit records. The spare records were NOT redo records relating to the aud$ table; instead, the simple act of enabling audit introduced a completely spurious “lock row” record on every update I did.

So I changed audit_trail to ‘none’, bounced the database and repeated the test, and got back to the result that I had been expecting:

Name                                                                     Value
----                                                                     -----
redo entries                                                               961
redo size                                                              295,996
redo ordering marks                                                         13
redo subscn max counts                                                      14

As you can see, we’re back to 1,000 redo records (minus the odd 40 or 50) at about 250 bytes per entry (allowing for the size of the first “private” redo record that combined lots of changes).

Note: Having observed this problem on 10.2.0.3 with audit_trail = db_extended I repeated the tests with audit_trail=db, and then tested a couple of older and newer versions of Oracle. The phenomenon appeared only on 10.2.0.3, and it didn’t matter whether I chose db or db_extended.

Footnote: the day after I made this discovery I gave what must have been my worst ever presentation: not only had I lost three hours sleep working out the details of this problem, I’d made a couple of “quick changes” to some of my demo scripts to do some extra checks and I had failed to clean up properly afterwards so I kept getting the “wrong” results and confusing myself in my live demonstrations.

Redo

A couple of days ago I published a link to some comments I had made on OTN about differences in redo generation between 10g and earlier versions of Oracle. This raised a few questions that suggested a follow-up (or perhaps “prequel”) note might be a little helpful. So I’ve created a simple SQL script to help demonstrate the differences and show how some of the pieces hang together. The whole posting is a little lengthy, but a large fraction of it is just the continuous output from a couple of trace files from which I’ll be highlighting just a few lines.

drop table t1;

create table t1 (
	n1	number,
	n2	number,
	n3	number,
	n4	number
);

create index i1 on t1(n1);
create index i2 on t1(n2);

insert into t1 values (1,1,1,1);
commit;

alter system switch logfile;

insert into t1 values (2,2,2,2);
insert into t1 values (3,3,3,3);

commit;

execute dump_log

All I’ve done is create a simple table with two indexes. I insert a row and commit because the first insert of a row into a table results in all sorts of funny bits of recursive activity (mostly related space management) and I want to get that out of the way so that it doesn’t obscure the more typical activity. Then I “switch logfile”, do a little work, and use a little procedure I’ve written to dump the current log file. The log file switch is just a convenient way to keep the dump small – but there are strategies for being selective about the parts of the redo log file you dump.

The work I’ve done is just two inserts into the table – this means I expect to find something like the following bits of activity listed in the redo log file:

    start transaction
    modify block in table
    modify block in index 1
    modify block in index 2
    modify block in table
    modify block in index 1
    modify block in index 2
    end transaction

What I’m going to show you next is the redo log file dump from Oracle 9.2.0.8 and then the corresponding redo log file dump from 10.2.0.3.

In the 9.2.0.8 dump we’ll see the “traditional” sequence of redo generation, which follows very closely to the steps listed above and also shows that Oracle pairs a “redo change vector” for a table or index block with the matching “redo change vector” for an undo record that describes how to reverse the table (or index) change – an undo record, of course, is just a piece of information that gets stored in an undo block. In general, each pair of redo change vectors will be gathered into a single “redo record”.

In the 10.2.0.3 dump we’ll see that the same redo change vectors still exist but they’ve all been lumped into a single redo record, and their ordering in that record is completely different from the 9.2.0.8 ordering, demonstrating (or at least supporting) the point that the “private redo/in-memory undo” mechanism uses two separate buffers, one for redo change vectors related to table/index blocks (the private redo bit) and the other for redo change vectors related to undo records (the in-memory undo bit).

The key expressions to watch out for in the dump are: “REDO RECORD” and “CHANGE #” (I’ve inserted a few blank lines  into the trace file to give a little white space between records and vectors):

Redo log file dump from 9i

REDO RECORD - Thread:1 RBA: 0x000360.00000002.0010 LEN: 0x0170 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:23 AFN:2 DBA:0x00800039 SCN:0x0000.03ec99bf SEQ:  1 OP:5.2
ktudh redo: slt: 0x0016 sqn: 0x00001a69 flg: 0x0012 siz: 80 fbi: 0
            uba: 0x00800040.075d.26    pxid:  0x0000.000.00000000

CHANGE #2 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99be SEQ:  1 OP:5.1
ktudb redo: siz: 80 spc: 2708 flg: 0x0012 seq: 0x075d rec: 0x26
            xid:  0x0004.016.00001a69
ktubl redo: slt: 22 rci: 0 opc: 11.1 objn: 45648 objd: 45648 tsn: 12
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00800040.075d.25
prev ctl max cmt scn:  0x0000.03ebf084  prev tx cmt scn:  0x0000.03ebf106
KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: DRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1)

CHANGE #3 TYP:2 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.03ec99f2 SEQ:  1 OP:11.2
KTB Redo
op: 0x01  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.26
KDO Op code: IRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 03
col  1: [ 2]  c1 03
col  2: [ 2]  c1 03
col  3: [ 2]  c1 03

CHANGE #4 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:  0 OP:5.20
session number   = 9
serial  number   = 6
transaction name =

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000002.0180 LEN: 0x0108 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  1 OP:5.1
ktudb redo: siz: 96 spc: 2626 flg: 0x0022 seq: 0x075d rec: 0x27
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 38 opc: 10.22 objn: 45649 objd: 45649 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x000a.012.00002552 uba: 0x0080009c.09cb.0c
                      flg: C---    lkc:  0     scn: 0x0000.03ec99f2
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00409 block=0x02c0040a
purge leaf row
key: (10):  02 c1 03 06 02 c0 00 0a 00 01

CHANGE #2 TYP:2 CLS: 1 AFN:11 DBA:0x02c0040a SCN:0x0000.03ec99f2 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.27
Block cleanout record, scn:  0x0000.03ec99f5 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.03ec99f2
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 02 c0 00 0a 00 01

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000003.0098 LEN: 0x0108 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  2 OP:5.1
ktudb redo: siz: 96 spc: 2528 flg: 0x0022 seq: 0x075d rec: 0x28
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 39 opc: 10.22 objn: 45650 objd: 45650 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x000a.012.00002552 uba: 0x0080009c.09cb.0d
                      flg: C---    lkc:  0     scn: 0x0000.03ec99f2
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00489 block=0x02c0048a
purge leaf row
key: (10):  02 c1 03 06 02 c0 00 0a 00 01

CHANGE #2 TYP:2 CLS: 1 AFN:11 DBA:0x02c0048a SCN:0x0000.03ec99f2 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.28
Block cleanout record, scn:  0x0000.03ec99f5 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.03ec99f2
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 02 c0 00 0a 00 01

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000003.01a0 LEN: 0x00f4 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  3 OP:5.1
ktudb redo: siz: 64 spc: 2430 flg: 0x0022 seq: 0x075d rec: 0x29
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 40 opc: 11.1 objn: 45648 objd: 45648 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.26
KDO Op code: DRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2)

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.03ec99f5 SEQ:  1 OP:11.2
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.29
KDO Op code: IRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 04
col  1: [ 2]  c1 04
col  2: [ 2]  c1 04
col  3: [ 2]  c1 04

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000004.00a4 LEN: 0x00c8 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  4 OP:5.1
ktudb redo: siz: 80 spc: 2364 flg: 0x0022 seq: 0x075d rec: 0x2a
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 41 opc: 10.22 objn: 45649 objd: 45649 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.27
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00409 block=0x02c0040a
purge leaf row
key: (10):  02 c1 04 06 02 c0 00 0a 00 02

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0040a SCN:0x0000.03ec99f5 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.2a
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 02 c0 00 0a 00 02

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000004.016c LEN: 0x00c8 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  5 OP:5.1
ktudb redo: siz: 80 spc: 2282 flg: 0x0022 seq: 0x075d rec: 0x2b
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 42 opc: 10.22 objn: 45650 objd: 45650 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.28
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00489 block=0x02c0048a
purge leaf row
key: (10):  02 c1 04 06 02 c0 00 0a 00 02

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0048a SCN:0x0000.03ec99f5 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.2b
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 02 c0 00 0a 00 02

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000005.0044 LEN: 0x0070 VLD: 0x01
SCN: 0x0000.03ec99f6 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:23 AFN:2 DBA:0x00800039 SCN:0x0000.03ec99f5 SEQ:  1 OP:5.4
ktucm redo: slt: 0x0016 sqn: 0x00001a69 srt: 0 sta: 9 flg: 0x12
ktucf redo: uba: 0x00800040.075d.2b ext: 0 spc: 2200 fbi: 0

The dump from 9.2.0.8 consists of seven redo records – and most of those records consist of two redo change vectors. We’ll start by looking at the fourth redo record (starting at line 112) which is a typical “old-style” redo record. It consists of two redo change vectors.

In the chosen record, change #1 (line 115) describes how to change an undo block at address (DBA) 0×00800040 which is operating at sequence number (seq: ) 0x075d. The change is to insert a new undo record, with record number (rec: ) 0×29. If you look at lines 126 – 129 you see that this undo record is an instruction to Delete a Row Piece (DRP) from data block (bdba) 0x02c0000a; the row piece is identified as the pointed to by slot 2 of table 0 in the block’s row directory.

Change # 2 (line 131) describes the change needed to create that row piece (note the IRP – Insert Row Piece at line 135). Again we see references to the block, table number and slot number.

The next two redo records are very similar: they describe how to insert and delete a key entry for each of the two indexes that have to be maintained as the row is inserted into the table – again the redo change vector for the “undo” appears before the redo change vector for the “apply”.

The first three redo records form a similar group – the first one is about inserting/removing a table row, the second and third are about the insert/remove pairs for the two indexes – but the first record of the three is more complex, consisting of four redo change vectors rather than just two.

The first redo record happens to be the first record of the transaction – so it contains (for 9i and below) two extra redo change vectors. The standard pair for the table row appear as change #2 and change #3, but in change #1 we see the redo change vector describing a change to the undo segment header block (line 5: ktudh) as we acquire a “transaction table slot” (see my glossary, and “cleanout”) specifically slot 0×0016, which we stamp with sequence number 0x00001a69. The final change vector in this record (change #4 at line 41) is marked as a “Media Recovery Marker” – it’s actually a type of audit label – and the example here is the short form; there is a much longer version used for the first transaction of a session’s lifetime, which is also used after each log file switch that takes place in a session’s lifetime (and in 8i there was a bug that meant the full vector appeared for every transaction start – with could result in a nasty volume of excess redo).

The last redo record is the so-called “commit record”. It consists of just one change vector (line 209). This change vector describes the update to the undo segment header as we commit (ktucm) the transaction; the status (sta: ) of transaction table slot 0×0016 is set to 9 to show that the slot is now free, and the “free pool” list (ktucf) is updated to show that the last undo block we were using has some space available in it that could be used by the next transaction that acquires a slot in this undo segment.

So that’s the redo generated by 9i. The basic pattern is that a typical redo record consists of a pair of redo change vectors, the first describing an undo record and the second describing how to make the required data change.

An important scalability detail (or threat, rather) that goes with this implementation is that the session has to allocate memory in the redo log buffer and copy each redo record in turn so, for our table with two indexes, every row we insert requires us to acquire the redo allocation latch and redo copy latch three times.

Now we move on to see what shows up in the log file when we run our code on 10.2.0.3

Redo log file dump from 10g

REDO RECORD - Thread:1 RBA: 0x00034b.00000002.0010 LEN: 0x0638 VLD: 0x0d
SCN: 0x0000.00d60a85 SUBSCN:  1 12/30/2010 14:24:01

CHANGE #1 TYP:2 CLS: 1 AFN:6 DBA:0x01804d8a OBJ:70577 SCN:0x0000.00d60a81 SEQ:  5 OP:11.2
KTB Redo
op: 0x01  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.27
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 03
col  1: [ 2]  c1 03
col  2: [ 2]  c1 03
col  3: [ 2]  c1 03

CHANGE #2 TYP:0 CLS:31 AFN:2 DBA:0x00800079 OBJ:4294967295 SCN:0x0000.00d60a5f SEQ:  1 OP:5.2
ktudh redo: slt: 0x0025 sqn: 0x00003632 flg: 0x0012 siz: 104 fbi: 0
            uba: 0x008031d3.07f1.27    pxid:  0x0000.000.00000000

CHANGE #3 TYP:2 CLS: 1 AFN:6 DBA:0x01804e0a OBJ:70578 SCN:0x0000.00d60a81 SEQ:  2 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.28
Block cleanout record, scn:  0x0000.00d60a84 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.00d60a81
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #4 TYP:2 CLS: 1 AFN:6 DBA:0x01804e8a OBJ:70579 SCN:0x0000.00d60a81 SEQ:  2 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.29
Block cleanout record, scn:  0x0000.00d60a84 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.00d60a81
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #5 TYP:0 CLS: 1 AFN:6 DBA:0x01804d8a OBJ:70577 SCN:0x0000.00d60a85 SEQ:  1 OP:11.2
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2a
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 04
col  1: [ 2]  c1 04
col  2: [ 2]  c1 04
col  3: [ 2]  c1 04

CHANGE #6 TYP:0 CLS: 1 AFN:6 DBA:0x01804e0a OBJ:70578 SCN:0x0000.00d60a85 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2b
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #7 TYP:0 CLS: 1 AFN:6 DBA:0x01804e8a OBJ:70579 SCN:0x0000.00d60a85 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2c
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #8 TYP:0 CLS:31 AFN:2 DBA:0x00800079 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  1 OP:5.4
ktucm redo: slt: 0x0025 sqn: 0x00003632 srt: 0 sta: 9 flg: 0x2
ktucf redo: uba: 0x008031d3.07f1.2c ext: 38 spc: 1718 fbi: 0

CHANGE #9 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a5d SEQ:  1 OP:5.1
ktudb redo: siz: 104 spc: 2250 flg: 0x0012 seq: 0x07f1 rec: 0x27
            xid:  0x0008.025.00003632
ktubl redo: slt: 37 rci: 0 opc: 11.1 objn: 70577 objd: 70577 tsn: 5
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x008031d3.07f1.26
prev ctl max cmt scn:  0x0000.00d5fa91  prev tx cmt scn:  0x0000.00d5fb64
txn start scn:  0x0000.00d60a81  logon user: 52  prev brb: 8401354  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1)

CHANGE #10 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  1 OP:5.1
ktudb redo: siz: 96 spc: 2144 flg: 0x0022 seq: 0x07f1 rec: 0x28
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 39 opc: 10.22 objn: 70578 objd: 70578 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0005.01e.000035e3 uba: 0x008034a5.083a.1d
                      flg: C---    lkc:  0     scn: 0x0000.00d60a81
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e09 block=0x01804e0a
(kdxlpu): purge leaf row
key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #11 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  2 OP:5.1
ktudb redo: siz: 96 spc: 2046 flg: 0x0022 seq: 0x07f1 rec: 0x29
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 40 opc: 10.22 objn: 70579 objd: 70579 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0005.01e.000035e3 uba: 0x008034a5.083a.1e
                      flg: C---    lkc:  0     scn: 0x0000.00d60a81
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e89 block=0x01804e8a
(kdxlpu): purge leaf row
key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #12 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  3 OP:5.1
ktudb redo: siz: 64 spc: 1948 flg: 0x0022 seq: 0x07f1 rec: 0x2a
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 41 opc: 11.1 objn: 70577 objd: 70577 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.27
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2)

CHANGE #13 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  4 OP:5.1
ktudb redo: siz: 80 spc: 1882 flg: 0x0022 seq: 0x07f1 rec: 0x2b
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 42 opc: 10.22 objn: 70578 objd: 70578 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.28
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e09 block=0x01804e0a
(kdxlpu): purge leaf row
key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #14 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  5 OP:5.1
ktudb redo: siz: 80 spc: 1800 flg: 0x0022 seq: 0x07f1 rec: 0x2c
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 43 opc: 10.22 objn: 70579 objd: 70579 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.29
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e89 block=0x01804e8a
(kdxlpu): purge leaf row
key: (10):  02 c1 04 06 01 80 4d 8a 00 02

The entire transaction is wrapped up into a single redo record with fourteen redo change vectors. (The 9i dump showed a total of fifteen redo change vectors – we’ve lost the “audit” vector in our 10g setup.) In order, the vectors describe how to…

    insert first row into table
    acquire transaction table slot (start transaction – slightly odd order)
    insert first row into first index
    insert first row into second index
    insert second row into table
    insert second row into first index
    insert second row into second index
    release transaction table slot (commit transaction)
    create undo record showing how to remove first row from table
    create undo record showing how to remove first row from first index
    create undo record showing how to remove first row from second index
    create undo record showing how to remove second row from table
    create unto record showing how to remove second row from first index
    create undo record showing how to remove second row from second index

It is reasonable to infer from this that Oracle is generating a list of “forward” changes (i.e. the private redo) in one buffer and a list of “reverse” changes (i.e. the in-memory undo) in another buffer – and simply concatenates the two lists on the commit.

On the plus side, if we’ve only got one redo record we only have to allocate and copy into the (public) redo buffer once – which reduces contention on a single important hot spot in the Oracle database; on the other hand we have to keep a number of data blocks pinned in memory until the commit, and we have to modify all those data blocks at the same time as we copy the two private buffers into the (public) redo log buffer, and this requirement introduces a different concurrency threat.

Because of this alternative threat there is a fairly tight limit on how much change a single transaction is allowed to make using this new mechanism . For 32-bit Oracle the buffers are about 64Kb each, and for 64-bit Oracle they are about 128KB – and if a session fills either of the two buffers it has acquired then Oracle flushes the contents to the public log buffer and the session then carries on using the public log buffer in the “traditional” way (while still keeping “ownership” of the pair of private buffers that it had allocated.)

A final thought – if you issue a rollback instead of a commit, the “commit record” (change #8 in the above) disappears from the redo record, and the redo stream gets a further six redo records showing how each of the previous undo records have been applied at the user’s request; these records are followed by one last redo record which describes the release of the transaction table slot i.e. a “commit record”.