rollback internals

While researching redo log internals for V00D00 we had to face the fact, that we know shit about real transactional behavior. When I say "real", I mean – under the hood.
Even with a very simple stuff like COMMIT and ROLLBACK we were constantly amazed by the internal mechanisms.

Today let’s take ROLLBACK under the investigation. According to documentation:

The ROLLBACK statement ends the current transaction and undoes any changes made during that transaction.

Cool. But what it means? First of all, you have to realize that all changes in redo logs are in a form of REDO RECORD which has its own address, known as RBA or RS_ID.

Sample RS_ID (RBA) looks like this: 0x00000a.00008c0f.006c

  • 00000a = 10 : this is sequence#
  • 00008c0f = 35855 : this is a block number in a redo log
  • 006c = 108 : this is offset in a redo block, where this redo record begins

Each redo record consists of one or more change vectors and each change vector has it’s own op code. Check out this article by Jonathan Lewis: https://jonathanlewis.wordpress.com/2017/07/25/redo-op-codes/

If you update EMPLOYEES table and change 2 rows, you will get usually one redo record with at least 4 change vectors – 2 for actual updates (so REDO) and 2 for reversing the operation (undo)

You can learn it from redolog or archivelog dumps:

Let’s modify a few rows:

SQL> update hr.employees
  2  set salary=salary*2
  3  where department_id=100;

6 rows updated.

SQL> commit;

Commit complete.

Now I’ll make a dump of an archivelog which contains those changes:

SQL> alter system switch logfile;

System altered.

SQL> get get_arch
  1  with v_arch as
  2  (
  3   select name, rank() over (order by sequence# desc) as rn
  4   from v$archived_log
  5  )
  6  select name
  7  from v_arch
  8* where rn=1
SQL> /

NAME
--------------------------------------------------------------------------------
+DATA/SKIPER/ARCHIVELOG/2017_09_20/thread_1_seq_13.303.955209887

SQL> alter system dump logfile '+DATA/SKIPER/ARCHIVELOG/2017_09_20/thread_1_seq_13.303.955209887';

System altered.

SQL> get tracefile
  1* select value from v$diag_info where name='Default Trace File'
SQL> /

VALUE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_24204.trc

The dump file is quite big and hard to analyze at the beginning, so I wrote a small and simple (primitive even) tool in Python which will help you to analyze the dump file. The tool can be downloaded from here.

It’s tested against Python 7.5. You can use it like this:

[oracle@skiper ~]$ python logfile_dump_parser.py
parse file /path/to/a/redo/dump object_id_or_name   	 if you want to parse file (duh!)
connect to oracle user/password@ip:port/service     	 if you want to connect to db to get table names instead ids
exit to quit
VOODOO >

At the beginning you will get a prompt – you can tell the tool to parse a dumpfile, providing DATA_OBJECT_ID of a table, which you’d like to search in a file or you can connect to Oracle (cx_Oracle module for Python required) with a simple connect string and parse a dump file, using name of the table.

VOODOO > connect to oracle system/oracle@localhost:1521/skiper
VOODOO > parse file /u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_24204.trc EMPLOYEES

Redo record block: 42 offset: 16 [rs_id: 0x00000d.0000002a.0010]
	 change 1 operation UPDATE xid = 0x0004.019.000004f2
	 change 2 operation UNDO HEADER xid = 0x0004.019.000004f2
	 change 3 operation UPDATE xid = 0x0004.019.000004f2
	 change 4 operation UPDATE xid = 0x0004.019.000004f2
	 change 5 operation UPDATE xid = 0x0004.019.000004f2
	 change 6 operation UPDATE xid = 0x0004.019.000004f2
	 change 7 operation UPDATE xid = 0x0004.019.000004f2
	 change 8 operation COMMIT xid = 0x0004.019.000004f2
	 change 9 operation UNDO xid = 0x0004.019.000004f2
	 change 10 operation UNDO xid = 0x0004.019.000004f2
	 change 11 operation UNDO xid = 0x0004.019.000004f2
	 change 12 operation UNDO xid = 0x0004.019.000004f2
	 change 13 operation UNDO xid = 0x0004.019.000004f2
	 change 14 operation UNDO xid = 0x0004.019.000004f2

As you can see, this simple update changed 6 rows and generated one redo record with 14 change vectors – 6X UPDATE (redo) 6X UNDO, one UNDO HEADER modification and one COMMIT. This whole redo record is a single, atomic operation for the database.

Let’s see what will happen when we will ROLLBACK this kind of transaction instead.

First we have to repeat the operation and generate archivelog dump for our parser:

SQL> get upd
  1  update hr.employees
  2  set salary=salary*2
  3* where department_id=100
SQL> /

6 rows updated.

SQL> rollback;

Rollback complete.

SQL> alter system switch logfile;

System altered.

SQL> @get_arch

NAME
--------------------------------------------------------------------------------
+DATA/SKIPER/ARCHIVELOG/2017_09_20/thread_1_seq_29.319.955212079

SQL> alter system dump logfile '+DATA/SKIPER/ARCHIVELOG/2017_09_20/thread_1_seq_29.319.955212079';

System altered.

SQL> @tracefile

VALUE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_3601.trc

And now we can parse logfile with my little tool (note, that you have to make connection to the database only once per parsing session)

VOODOO > parse file /u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_3601.trc EMPLOYEES

Redo record block: 47 offset: 16 [rs_id: 0x00001d.0000002f.0010]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation UNDO HEADER xid = 0x0002.001.00000615
	 change 3 operation UPDATE xid = 0x0002.001.00000615
	 change 4 operation UPDATE xid = 0x0002.001.00000615
	 change 5 operation UPDATE xid = 0x0002.001.00000615
	 change 6 operation UPDATE xid = 0x0002.001.00000615
	 change 7 operation UPDATE xid = 0x0002.001.00000615
	 change 8 operation UNDO xid = 0x0002.001.00000615
	 change 9 operation UNDO xid = 0x0002.001.00000615
	 change 10 operation UNDO xid = 0x0002.001.00000615
	 change 11 operation UNDO xid = 0x0002.001.00000615
	 change 12 operation UNDO xid = 0x0002.001.00000615
	 change 13 operation UNDO xid = 0x0002.001.00000615

Redo record block: 50 offset: 228 [rs_id: 0x00001d.00000032.00e4]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0002.001.00000615

Redo record block: 50 offset: 424 [rs_id: 0x00001d.00000032.01a8]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0002.001.00000615

Redo record block: 51 offset: 124 [rs_id: 0x00001d.00000033.007c]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0002.001.00000615

Redo record block: 51 offset: 320 [rs_id: 0x00001d.00000033.0140]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0002.001.00000615

Redo record block: 52 offset: 20 [rs_id: 0x00001d.00000034.0014]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0002.001.00000615

Redo record block: 52 offset: 216 [rs_id: 0x00001d.00000034.00d8]
	 change 1 operation UPDATE xid = 0x0002.001.00000615
	 change 2 operation ROLLBACK SINGLE STATEMENT (final) xid = 0x0002.001.00000615
COMMIT [ending rollback]  for transaction 0x0002.001.00000615 [block: 52 offset: 432 rs_id: 0x00001d.00000034.01b0]

Nice! So for each and every modified row, Oracle had to generate separate redo record with 2 change vectors – one vector is an actual UPDATE built from UNDO information, which changes a row to its previous value and one rollback code (5.6 – ROLLBACK SINGLE STATEMENT (not final) or 5.11 – ROLLBACK SINGLE STATEMENT (final))

 

  • 5.6 – ROLLBACK SINGLE STATEMENT (not final) – this means that the rollback is in progress and there can be more operations like this
  • 5.11 – ROLLBACK SINGLE STATEMENT (final) – this means that the rollback is finished and there should be no more rollback records

Each complete ROLLBACK is actually finished by COMMIT (OP:5.4). This is logical because ROLLBACK is generating commands to reverse previous operations and then it can COMMIT the whole transaction.

The interesting thing is with partial rollback. Let’s UPDATE again 6 records but then let’s try to insert a row which will violate a primary key:

SQL> get upd
  1  update hr.employees
  2  set salary=salary*2
  3* where department_id=100
SQL> /

6 rows updated.

SQL> insert into hr.employees
  2  values (100,'Dupa','Blada','DBLADA','666.66.666',sysdate,'ST_CLERK',5600,null,100,100);
insert into hr.employees
*
ERROR at line 1:
ORA-00001: unique constraint (HR.EMP_EMP_ID_PK) violated

Now let’s check the parser:

V00D00 > parse file /u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_6603.trc EMPLOYEES

Redo record sequence: 32 block: 38 offset 16 [rs_id: 0x000020.00000026.0010]
	 change 1 operation UPDATE xid = 0x0005.004.0000061e
	 change 2 operation UNDO HEADER xid = 0x0005.004.0000061e
	 change 3 operation UPDATE xid = 0x0005.004.0000061e
	 change 4 operation UPDATE xid = 0x0005.004.0000061e
	 change 5 operation UPDATE xid = 0x0005.004.0000061e
	 change 6 operation UPDATE xid = 0x0005.004.0000061e
	 change 7 operation UPDATE xid = 0x0005.004.0000061e
	 change 8 operation UNDO xid = 0x0005.004.0000061e
	 change 9 operation UNDO xid = 0x0005.004.0000061e
	 change 10 operation UNDO xid = 0x0005.004.0000061e
	 change 11 operation UNDO xid = 0x0005.004.0000061e
	 change 12 operation UNDO xid = 0x0005.004.0000061e
	 change 13 operation UNDO xid = 0x0005.004.0000061e

Redo record sequence: 32 block: 41 offset 188 [rs_id: 0x000020.00000029.00bc]
	 change 1 operation OP:14.11 xid = 0x0005.004.0000061e

Redo record sequence: 32 block: 41 offset 252 [rs_id: 0x000020.00000029.00fc]
	 change 1 operation OP:13.22 xid = 0x0005.004.0000061e

Redo record sequence: 32 block: 41 offset 328 [rs_id: 0x000020.00000029.0148]
	 change 1 operation UNDO xid = 0x0005.004.0000061e
	 change 2 operation INSERT xid = 0x0005.004.0000061e

Redo record sequence: 32 block: 42 offset 172 [rs_id: 0x000020.0000002a.00ac]
	 change 1 operation DELETE xid = 0x0005.004.0000061e
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x0005.004.0000061e
COMMIT  for transaction 0x0005.004.0000061e [sequence: 33 block: 3 offset 16 [rs_id: 0x000021.00000003.0010]

Great! So we can see in here one familiar redo record, some change vectors from layer 13 (Segment management) and 14 (Extent management) and then our redo record for INSERT, followed by a redo record with a DELETE (ROLLBACK information)!

So each time you are doing something to a database, redo is being generated – even if you violate constraints! After constraint violations, Oracle is doing a partial rollback of this transaction.

But the funniest thing can be seen in this situation:

SQL> update hr.employees
  2  set employee_id=1;
update hr.employees
*
ERROR at line 1:
ORA-00001: unique constraint (HR.EMP_EMP_ID_PK) violated

We could expect, that modifying the second row to the same value as the previous one will throw this kind of exception. But redo log analysis shows something different:

V00D00 > parse file /u01/app/oracle/diag/rdbms/skiper/skiper/trace/skiper_ora_9695.trc EMPLOYEES

Redo record sequence: 34 block: 8771 offset 16 [rs_id: 0x000022.00002243.0010]
	 change 1 operation UNDO HEADER xid = 0x000a.002.00000508
	 change 2 operation UNDO xid = 0x000a.002.00000508
	 change 3 operation UPDATE xid = 0x000a.002.00000508

Redo record sequence: 34 block: 8773 offset 96 [rs_id: 0x000022.00002245.0060]
	 change 1 operation UNDO xid = 0x000a.002.00000508
	 change 2 operation UPDATE xid = 0x000a.002.00000508

Redo record sequence: 34 block: 8774 offset 100 [rs_id: 0x000022.00002246.0064]
	 change 1 operation UNDO xid = 0x000a.002.00000508
	 change 2 operation UPDATE xid = 0x000a.002.00000508

Redo record sequence: 34 block: 8775 offset 268 [rs_id: 0x000022.00002247.010c]
	 change 1 operation UPDATE xid = 0x000a.002.00000508
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x000a.002.00000508

Redo record sequence: 34 block: 8776 offset 136 [rs_id: 0x000022.00002248.0088]
	 change 1 operation UPDATE xid = 0x000a.002.00000508
	 change 2 operation ROLLBACK SINGLE STATEMENT (not final) xid = 0x000a.002.00000508

Redo record sequence: 34 block: 8777 offset 172 [rs_id: 0x000022.00002249.00ac]
	 change 1 operation UPDATE xid = 0x000a.002.00000508
	 change 2 operation ROLLBACK SINGLE STATEMENT (final) xid = 0x000a.002.00000508
COMMIT [ending rollback]  for transaction 0x000a.002.00000508 [sequence: 34 block: 8777 offset 384 [rs_id: 0x000022.00002249.0180]

We can see that Oracle did 3 modifications and 3 rollbacks after getting a signal that constraint was violated. So maybe we can assume that constraint checking is an asynchronous operation?

Stay tuned for more! Hope you will have fun with analyzing redo dumps with my simple Python tool </p />
</p></div>

    	  	<div class=