Search

Top 60 Oracle Blogs

Recent comments

Flashback logging

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Features and circumstances change, of course, and someone recently pointed out that if your production system is multi-terabyte and you’re running with a dataguard standby and some minor catastrophe forces you to switch to the standby then you don’t really want to be running without a standby for the time it would take for you to use restore and recover an old backup to create a new standby and there may be cases where you could flashback the original primary to before the catastrophe and turn it into the standby from that point onward. Sounds like a reasonable argument to me – but you might still need to think very carefully about how to minimise the impact of enabling database flashback, especially if your database is a datawarehouse, DSS, or mixed system.

Imagine you have a batch processes that revolve around loading data into an empty table with a couple of indexes – it’s a production system so you’re running with archivelog mode enabled, and then you’re told to switch on database flashback. How much impact will that have on your current loading strategies ? Here’s a little bit of code to help you on your way – I create an empty table as a clone of the view all_objects, and create one index, then I insert 1.6M rows into it. I’ve generated 4 different sets of results: flashback on or off, then either maintaining the index during loading or marking it unusable then rebuilding it after the load. Here’s the minimum code:


create table t1 segment creation immediate tablespace test_8k
as
select * from all_objects
where   rownum < 1
;

create index t1_i1 on t1(object_name, object_id) tablespace test_8k_assm_auto;
-- alter index t1_i1 unusable;

insert /*+ append */ into t1
with object_data as (
        select --+ materialize
                *
        from
                all_objects
        where
                rownum <= 50000
),
counter as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 32
)
select
        /*+ leading (ctr obj) use_nl(obj) */
        obj.*
from
        counter         ctr,
        object_data     obj
;

-- alter index t1_i1 rebuild;

Here’s a quick summary of the timing I got  before I talk about the effects (running 11.2.0.4):

Flashback off:
Maintain index in real time: 138 seconds
Rebuild index at end: 66 seconds

Flashback on:
Maintain index in real time: 214 seconds
Rebuild index at end: 112 seconds

It is very important to note that these timings do not allow you to draw any generic conclusions about optimum strategies for your systems. The only interpretation you can put on them is that different circumstances may lead to very different timings, so it’s worth looking at what you could do with your own systems to find good strategies for different cases.

Most significant, probably, is the big difference between the two options where flashback is enabled – if you’ve got to use it, how do you do damage limitation. Here are some key figures, namely the file I/O stats and the some instance activity stats, I/O stats first:


"Real-time" maintenance
---------------------------------
Tempfile Stats - 09-Mar 11:41:57
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,088      22,454      20.638        .063         296        .000     1,011      22,455        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:41:58
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3      24,802      24,802       1.000        .315      24,802        .315           0        .000          2,386      20,379        .239    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         718      22,805      31.762        .001           5        .000         713        .002            725      22,814        .002    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       8,485       8,485       1.000        .317       8,485        .317           0        .000            785       6,938        .348    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

Mark Unusable and Rebuild
---------------------------------
Tempfile Stats - 09-Mar 11:53:04
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,461      10,508       7.192        .100           1        .017       407      10,508        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:53:05
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3          17          17       1.000       5.830          17       5.830           0        .000             28          49       1.636    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         894      45,602      51.009        .001           2        .002         892        .001            721      22,811        .026    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       2,586       9,356       3.618        .313         264       3.064       2,322        .001          2,443       9,214        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

There are all sorts of interesting differences in these results due to the different way in which Oracle handles the index. For the “real-time” maintenance the session accumulates the key values and rowids as it writes the table, then sorts them, then does an cache-based bulk update to the index. For the “rebuild” strategy Oracle simply scans the table after it has been loaded, sorts the key values and indexes, then writes the index to disc using direct path writes; you might expect the total work done to be roughly the same in both cases – but it’s not.

I’ve shown 4 files: the temporary tablespace, the undo tablespace, the tablespace holding the table and the tablespace holding the index; and the first obvious difference is the number of blocks written and read and the change in average read size on the temporary tablespace. Both sessions had to spill to disc for the sort, and both did a “one-pass” sort; the difference in the number of blocks written and read appears because the “real-time” session wrote the sorted data set back to the temporary tablespace one more time than it really needed to – it merged the sorted data in a single pass but wrote the data back to the temporary tablespace before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, though.

The next most dramatic thing we see is that real-time maintenance introduced 24,800 single block reads with 20,000 blocks written to the undo tablespace (with a few thousand more that would eventually be written by dbwr – I should have included a “flush buffer_cache” in my tests), compared to virtually no activity in the “rebuild” case. The rebuild generates no undo; real-time maintenance (even starting with an empty index) generates undo because (in theory) someone might look at the index and need to see a read-consistent image of it. So it’s not surprising that we see a lot of writes to the undo tablespace – but where did the reads come from? I’ll answer question that later.

It’s probably not a surprise to see the difference in the number of blocks read from the table’s tablespace. When we rebuild the index we have to do a tablescan to acquire the data; but, again, we can ask why did we see 22,800 blocks read from the table’s tablespace when we were doing the insert with real-time maintenance. On a positive note those reads were multiblock reads, but what caused them? Again, I’ll postpone the answer.

Finally we see that the number of blocks read (reason again postponed) and written to the index’s tablespace are roughly similar. The writes differ because because the rebuild is doing direct path writes, while the real-time maintenance is done in the buffer cache, so there are some outstanding index blocks to be written. The reads are similar, though one test is exclusively single block reads and the other is doing (small) multiblock reads – which is just a little bit more efficient.  The difference in the number of reads is because the rebuild was at the default pctfree=10 while the index maintenance was a massive “insert in order” which would have packed the index leaf blocks at 100%.

To start the explanation – here are the most significant activity stats – some for the session, a couple for the instance:


"Real-time" maintenance
-----------------------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                        33,263
redo entries                                                           118,290
redo size                                                          466,628,852
redo size for direct writes                                        187,616,044
undo change vector size                                            134,282,356
flashback log write bytes                                          441,032,704

Rebuild
-------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                           156
redo entries                                                            35,055
redo size                                                          263,801,792
redo size for direct writes                                        263,407,628
undo change vector size                                                122,156
flashback log write bytes                                          278,036,480

The big clue is the “physical reads for flashback new”. When you modify a block, if it hasn’t been dumped into the flashback log recently (as defined by the hidden _flashback_barrier_interval parameter) then the original version of the block has to be written to the flashback log before the change can be applied; moreover, if a block is being “newed” (Oracle-speak for being reformatted for a new use) it will also be written to flashback log. Given the way that the undo tablespace works it’s not surprising if virtually every block you modify in the undo tablespace has to be written to the flashback log before you use it. The 33,264 blocks read for “flashback new” consists of the 24,800 blocks read from the undo tablespace when we were maintaining the index in real-time plus a further 8,460 from “somewhere” – which, probably not coincidentally, matches the number of blocks read from the index tablespace as we create the index. The odd thing is that we don’t see the 22,800 reads on the table’s tablespace (which don’t occur when flashback is off) reported as “physical reads for flashback new”; this looks like a reporting error to me.

So the volume of undo requires us to generate a lot of flashback log as well as the usual increase in the amount of redo. As a little side note, we get confirmation from these stats that the index was rebuilt using direct path writes – there’s an extra 75MB of redo for direct writes.

Summary

If you are running with flashback enabled in a system that’s doing high volume data loading remember that the “physical reads for flashback new” could be a major expense. This is particularly expensive on index maintenance, which can result in a large number single block reads of the undo tablespace. The undo costs you three times – once for the basic cost of undo (and associated redo), once for the extra reads, and once for writing the flashback log. Although you have to do tablescans to rebuild indexes, the cost of an (efficient, possibly direct path) tablescan may be much less than the penalty of the work relating to flashback.

Footnote: since you can’t (officially) load data into a table with an unusable unique index or constraint, you may want to experiment with using non-unique indexes to support unique/PK constraints and disabling the constraints while loading.