Search

OakieTags

Who's online

There are currently 0 users and 39 guests online.

Recent comments

Data Guard transport lag in OEM 12c

I have come across this phenomenon a couple of times now so I thought it was worth writing up.

Consider a scenario where you get an alert because your standby database has an apply lag. The alert is generated by OEM and when you log in and check-it has indeed an apply lag. Even worse, the apply lag increases with every refresh of the page! I tagged this as an 11.2 problem but it’s definitely not related to that version.

Here is a screenshot of this misery:

 Lag in OEM

Now there are of course a number of possible causes:

  • There is a lag
  • You are not using Real Time Apply

The first one is easy to check: look at the redo generation rate on the primary database to see if it’s any different. Maybe you are currently loading lots of data? Maybe a batch job has been initiated that goes over a lot of data… the possibilities are nearly endless.

Another, more subtle interpretation could be that you are not using Real Time Apply. How can you check? In the broker command line interface for example:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Databases:
    pri - Primary database
      Warning: ORA-16789: standby redo logs not configured

    sby - Physical standby database
      Warning: ORA-16789: standby redo logs not configured

Fast-Start Failover: DISABLED

Configuration Status:
WARNING

The warnings about missing standby redo logs show that you cannot possibly use Real Time Apply (it needs standby redo logs). The other option is in the database itself:

SQL> select dest_id,status,database_mode,recovery_mode
  2  from v$archive_dest_status
  3  where status <> 'INACTIVE';

   DEST_ID STATUS    DATABASE_MODE   RECOVERY_MODE
---------- --------- --------------- -----------------------
         1 VALID     MOUNTED-STANDBY MANAGED
        32 VALID     UNKNOWN         IDLE

Did you notice dest_id of 32? That’s a bit of an unusual one, more on that later (since you can only set log_archive_dest_x where x is {1,31}).

So indeed we have managed recovery active, but not using Real Time Apply. This is expressed in the database status:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   28 seconds
  Apply Lag:       28 seconds
  Real Time Query: OFF
  Instance(s):
    sby

A few moments later when you query the database again the lag has increased:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   3 minutes 22 seconds
  Apply Lag:       3 minutes 22 seconds
  Real Time Query: OFF
  Instance(s):
    sby

This is to be expected-the primary is still happily processing user requests. The cure is to add standby redo logs, as suggested in so many places and described in the Data Guard documentation. After the successful addition of SRLs the lag should disappear. A restart of managed recovery using the broker will show something along these lines on the standby:

2014-01-30 14:35:18.353000 +00:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (sby)
MRP0 started with pid=24, OS id=4854
MRP0: Background Managed Standby Recovery process started (sby)
2014-01-30 14:35:23.406000 +00:00
 started logmerger process
Managed Standby Recovery starting Real Time Apply
...
2014-01-30 14:37:12.595000 +00:00
Media Recovery Waiting for thread 1 sequence 20 (in transit)
2014-01-30 14:37:13.691000 +00:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 20 Reading mem 0
  Mem# 0: +DATA/sby/onlinelog/group_5.266.838218819

Two important bits of information are shown here: Managed Standby Recovery starting Real Time Apply and the fact that it is using the standby redo log. Sure enough, after the database is in sync with its primary and uses the log, the lag is gone:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    sby

And also in the OEM view:

OEM-lag-02

Slight Variation

I have also seen this problem in OEM where the transport lag was near 0 and therefore hardly visible due to the scale of the graph. The apply lag nevertheless resulted from the primary working and the current log hasn’t shipped to the standby-obviously before the implementation of standby redo logs. You saw a spike mounting in the OEM view until the next log switch on the primary when the apply lag dropped to 0 for a brief moment before increasing again.

Summary

Real Time Apply is a very very useful feature, especially when used together with the maximum availability protection mode. The real risk of not using standby redo logs – and implicitly no RT Apply – is that you lose data since the current online redo log on the primary has not been copied across. If you need to activate your standby you will be some transactions short of the primary. The larger the online redo log, the larger the gap.