Top 60 Oracle Blogs

Recent comments

UNDO sizing

90% databases that I see for the first time have the same issue with UNDO tablespace: it’s over sized, yet still causing infamous ORA-1555 errors at times. Here is why.

UNDO is over-sized because of a dumb monitoring which treats UNDO tablespace in the same way as normal datafiles, i.e. the monitoring suggests to have some free space in UNDO. So it notifies DBAs and tells them to increase UNDO datafiles. And most DBAs are doing it straight away without thinking. This is the most common reason why databases end up with ridiculously large UNDO tablespaces.
It shouldn’t be this way. UNDO is a rollback history: there’s some really old history which can be ignored, there’s some history that might be necessary for a long running transaction, delayed block cleanout of flashback query, and finally there’s active transactions rollback history. Unless you have a special test environment, at any single point in time of a database instance nobody knows when and which UNDO data might be required. Oracle provides 3 options to handle this unknown:

  • let Oracle to decide how much undo to keep based on undo_retention and autoextensible undo data files
  • let Oracle to decide how much undo to keep based on the fixed size of UNDO tablespace
  • ask Oracle to guarantee undo_retention setting

Since datafiles are usually autoextensible, option 1 is the most common scenario. There are two possibilities with this option:
1) undo_retention is left default at 900 seconds. This effectively means Oracle doesn’t need to keep a lot of undo history since you didn’t ask for it. And no matter how big your UNDO is, this setup may easily lead to “snapshot too old” errors – Oracle doesn’t need to keep much UNDO, and is happily re-using undo space quickly enough
2) undo_retention is set to relatively high value, like few hours. This should set the minimum undo retention time, and, in theory, should guarantee you enough transactions history in the UNDO to run queries of at least undo_retention duration. However I’ve seen it more than once that setting undo_retention with autoextensible UNDO doesn’t help to avoid ORA-1555 even for queries that are much shorter than undo_retention. I don’t know how to explain this, and I don’t have a test case and I hate Oracle Support

With fixed size UNDO Oracle behaves rather differently and tries to utilize all the allocated UNDO datafiles as much as possible. It ignores undo_retention and just shows you tuned_undoretention that represents how much history you keep in your fixed size UNDO. I think this is the only reliable way to run Oracle DB in production. In fact documentation suggests to use this approach. It’s just not default.

Recently I’ve seen a database with multi terabyte UNDO. Its size was about 20% of all the datafiles in the system. This prompted me to try to write a query which would compute approximate UNDO size required to handle the load.
Here’re the ideas:

  • the minimum UNDO size should be derived from the maximum number of active undo blocks over the run time of the database. It’s most likely not enough, but gives a good starting point to understanding how much space active transactions require in extreme situations
  • the maximum UNDO size should be equal to the maximum of the sum of ‘undo change vector size’ statistic over the sliding window representing target undo_retention with retention guarantee. This value most likely is way more than it is required for a normal database operation.
  • sufficient UNDO size should be somewhere between those two numbers. I think it should be derived empirically starting from some intermediate best guess between min and max; then adjusted if ORA-1555 or inefficient undo space error appear

Here’s the script I’ve come up with:

prompt UNDO sizing report
prompt ==================
prompt Reports current undo set up, and minimum/maximum required undo tablespace size based on historical AWR data
prompt Note:
prompt   a) if your undo datafiles are not fixed size (all of them), you are doing it wrong most likely
prompt   b) awr_snapshots_count is used as a period to calculate maximum required undo size with guaranteed undo retention
prompt   c) there is no way to calculate sufficient undo size with a good precision and this report is an approximation
prompt For feedback

col inst_id                         format 99 head 'In|st'
col current_size_mb                 format 999,999,999 head 'Current|undo, MB'
col is_autoextensible               format a4 head 'Auto|ext?'
col undo_retention                  format 999.9 head 'Retention|hours'
col undo_size_min_mb                format 999,999,999 head 'Minimumal|req UNDO, MB'
col undo_size_guarantee_mb          format 999,999,999 head 'Max req|UNDO, MB'
col longest_sql                     format 999,999.9 head 'Longest|SQL, h'
col longest_sql_id                  format a13 head 'Longest|sql_id'
col max_ora1555_cnt                 format 999,999 head 'Max ORA-|1555 cnt'
col max_no_space_cnt                format 999,999 head 'Max no|space cnt'

  ua.undo_retention/3600 undo_retention,
  um.longest_sql/3600 longest_sql,
  gv$parameter p,
  ( -- how much undo is required to guarantee undo retention for awr_snapshots_count period
      max(required_undo_mb) undo_size_guarantee_mb
        sum(undo_size) over (partition by inst_id order by begin_interval_time rows &awr_snapshots_count preceding) required_undo_mb
          s.instance_number inst_id,
          round((ss.value - lag(ss.value) over (partition by s.instance_number order by s.begin_interval_time))/1024/1024) undo_size
          dba_hist_snapshot s,
          v$database d,
          dba_hist_sysstat ss,
          v$statname n
          s.dbid = d.dbid and
          s.dbid = ss.dbid and
          s.instance_number = ss.instance_number and
          s.snap_id = ss.snap_id and
          ss.stat_id = n.stat_id and
 = 'undo change vector size'
    group by inst_id
  ) u,
  ( -- minimally required undo as max active blocks
      uh.instance_number inst_id,
      max(activeblks * p.value/1024/1024) undo_size_min_mb,
      max(maxquerylen) longest_sql,
      max(maxquerysqlid) keep (dense_rank first order by maxquerylen desc) longest_sql_id,
      max(ssolderrcnt) max_ora1555_cnt,
      max(nospaceerrcnt) max_no_space_cnt
      dba_hist_snapshot s,
      v$database d,
      dba_hist_undostat uh,
      v$parameter p
      s.dbid = d.dbid and
      s.dbid = uh.dbid and
      s.instance_number = uh.instance_number and
      s.snap_id = uh.snap_id and = 'db_block_size'
    group by
  ) um,
  ( -- current undo setup
      round(sum(bytes)/1024/1024) current_size_mb,
      max(t.autoextensible) is_autoextensible,
      max(ur.value) undo_retention
      dba_data_files t, 
      gv$parameter p,
      (select inst_id, value from gv$parameter where name = 'undo_retention') ur
      t.tablespace_name = p.value and = 'undo_tablespace' and
      p.inst_id = ur.inst_id
    group by
  ) ua
  p.inst_id = u.inst_id and = 'undo_retention' and
  u.inst_id = ua.inst_id and
  ua.inst_id = um.inst_id

Example output:

Enter value for awr_snapshots_count: 5

 In      Current Auto Retention    Minimumal      Max req    Longest Longest       Max ORA-    Max no
 st     undo, MB ext?     hours req UNDO, MB     UNDO, MB     SQL, h sql_id        1555 cnt space cnt
--- ------------ ---- --------- ------------ ------------ ---------- ------------- -------- ---------
  1       65,508 YES         .3        3,678        9,213        1.7 6xxxxxxxxxxxg        0         0
  2       55,508 YES         .3        5,325       26,797        1.7 6xxxxxxxxxxxg        0         0

What does the output mean?

  • this is a 2 node RAC with autoextensible UNDO tablespaces of more than 50G each
  • undo_retention is default 900 seconds (non important rounding error from 0.25)
  • maximum active undo blocks as reported in dba_hist_undostat were few GBs per instance and these numbers are around minimally required UNDO tablespaces
  • maximum UNDO vector size that was generated for 5 consecutive AWR snapshots is 9 and 26 G respectively. Those are required UNDO tablespaces to guarantee undo_retention of 6 hours (hourly snapshots in this DB).
  • few FYI columns: the longest SQL, its sql_id and max number of ORA-1555 and space errors (should be sum perhaps)
  • based on these numbers I think I’d try to set UNDO fixed size about 20G for both nodes. This is smaller than currently allocated UNDO, and it should still be enough to run the DB without errors. This DB didn’t have any undo errors though, so technically, no need to touch it unless it’s broken or there’s not enough space for such UNDO

I tried to use Undo Advisor for this DB and the only result I got was … “Undo tablespace is OK.” </p />

    	  	<div class=