Search

OakieTags

Who's online

There are currently 0 users and 26 guests online.

Recent comments

New Events for Data Guard and Synchronous Redo Transport in 12c (2)

After the baseline has been established in the first part of this series it’s time to measure the effect of the network in this part. The second test will introduce an interesting feature: Using Linux’s own Traffic Shaper/Quality of Services module I will add a delay of 100ms to the Data Guard network interface card (NIC) to slow things down a little.

WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.

I am continuing to use the 2 node RAC 12.1.0.2.170117 primary database on Oracle Linux 7 with UEK 4 and an identical RAC to host my standby database.

All database traffic still uses eth0 – my public network (192.168.100.0/24). In order not to affect my SSH session with the planned network changes I created a new out-of-band management interface (eth2) on the Data Guard standby database servers. I love KVM and virtio drivers for the NICs: a few commands in virsh later and every RAC node has a new NIC, and of course without rebooting. Adding some nmcli magic and I can configure all interfaces in 1 line!

Here’s what it looks like from connection manager (the system is Oracle Linux 7.3)

[root@rac12sec1 ~]# nmcli c show
NAME  UUID                                  TYPE            DEVICE 
eth0  3e8915f1-10ce-4da7-99b8-d2a6947f9ff7  802-3-ethernet  eth0   
eth1  0ebe1279-40e1-4faf-a0be-49588d347fcf  802-3-ethernet  eth1   
mgmt  51054729-19ae-49e7-ba18-c6b41475ec1b  802-3-ethernet  eth2   

My eth0 device maps to the public networks, e.g. all database traffic including Data Guard. The private interconnect uses eth1 with eth2 as the management interface. I have deliberately kept this simple, usually I’d request a dedicated set of interfaces to route Data Guard traffic (I blogged about that earlier in the context of RAC) here:

Please read past part I of the Adding an additional SCAN-series to get the full story.

Change protection mode

Following the scenario I outlined in part I I first have to change the protection mode to Maximum Availability. Again, please see part I of this series for an explanation about Protection Levels, my choice of protection level, and why “Maximum Availability” might or might not work for you.

Since I’m using the Data Guard Broker, I need to head over and change the mode in dgmgrl.

DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

As with many other maintenance tasks, you use the edit command to change the configuration.

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no standby databases would 
remain to support protection mode

Failed.

I seem to have gotten a bit ahead of myself – changing to the more serious protection modes requires changing the transport mode from async to sync. Let’s do that, and then change the protection mode:

DGMGRL> edit database 'NCDBA' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'NCDBB' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL> 

That looks better now, the first half of the job is done. The second half of this article details how to make the network look a lot worse.

Net Emulator

Using the Linux netem kernel module I can simulate many complex network situations – I found https://wiki.linuxfoundation.org/networking/netem to be a good reference.

I should probably say at this point that no one would use a link that has 100 ms delay between sites for (sync) redo transport. That simply doesn’t make sense. I used 100ms to have a large enough factor so that I can actually be sure that a deviation of performance figures is actually a result of my fiddling with the network latency. Large numbers make spotting outliers easier.

How do you mess with the network? That is simple when you are on Linux! On my standby cluster I use the tc command, part of the iproute package. This is the “normal” state:

[root@rac12sec1 ~]# tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec1 ~]# 

I am merely using these tools, if you want to understand them, have a look at the Linux Advanced Routing & Traffic Control HOWTO chapter 9, and refer to the link to the Linux Foundation wiki I mentioned earlier. Then take it from there.

The syntax of tc qdisc is complex at first sight:

[root@rac12sec1 ~]# tc qdisc help
Usage: tc qdisc [ add | del | replace | change | show ] dev STRING
       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]
       [ estimator INTERVAL TIME_CONSTANT ]
       [ stab [ help | STAB_OPTIONS] ]
       [ [ QDISC_KIND ] [ help | OPTIONS ] ]

       tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add  help
STAB_OPTIONS := ... try tc qdisc add stab help

The above allows us to introduce NetEM to the mix. On my Linux 7.3/UEK4 system it offers the following options:

[root@rac12sec1 ~]# tc qdisc add netem help
Usage: ... netem [ limit PACKETS ] 
                 [ delay TIME [ JITTER [CORRELATION]]]
                 [ distribution {uniform|normal|pareto|paretonormal} ]
                 [ corrupt PERCENT [CORRELATION]] 
                 [ duplicate PERCENT [CORRELATION]]
                 [ loss random PERCENT [CORRELATION]]
                 [ loss state P13 [P31 [P32 [P23 P14]]]
                 [ loss gemodel PERCENT [R [1-H [1-K]]]
                 [ ecn ]
                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]
                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]

That’s a wealth of options. I can simulate somewhat unreliable WAN links, even with variations in delay! Refer to the Linux Foundation wiki for more details on how to simulate networks more realistically.

Introducing Delay

I prefer to keep it simple for this post, so I’ll stick with the most basic setting for network delay. I am introducing this to the Data Guard network on both of my standby nodes while connected to the management interface eth2:

[root@rac12sec1 ~]# tc qdisc add dev eth0 root netem delay 100ms

[root@rac12sec1 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc show dev eth0 
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms

The effect can be seen immediately by pinging one of the standby nodes:

[oracle@rac12pri1 ~]$ ping rac12sec1
PING rac12sec1.example.com (192.168.100.20) 56(84) bytes of data.
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=1 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=2 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=3 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=4 ttl=64 time=100 ms
^C
--- rac12sec1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 100.368/100.415/100.478/0.229 ms
[oracle@rac12pri1 ~]$ 

So 100ms delay it is!

Starting the Benchmark

With the scene set, it’s time to start the benchmark. The command to start it resembles the one I used for part I, except for the name of the results file.

[oracle@loadgen1 bin]$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxavail_netem.xml -rt 00:10
Author  :	 Dominic Giles
Version :	 2.5.0.971

Results will be written to results_20_maxavail_netem.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

1:08:56 PM      20      5188    94          

What effect does this have on the database? Let’s check the Data Guard broker first:

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          10 seconds (computed 0 seconds ago)
  Average Apply Rate: 207.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 211.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

So it looks like there isn’t any lag visible with the benchmark underway, however it took a little while for it to start as visible in the first example. Here is the output from Tanel Poder’s ASHTOP script again (using Tanel Poder’s ashtop again – which requires you to have the Diagnostic Pack license as it access v$active_session_history):

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 08:10','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 08:15','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
      298     1.0   24% | oracle@rac12pri1 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      298     1.0   24% | oracle@rac12pri1 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      276      .9   23% | oracle@rac12pri2 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      274      .9   23% | oracle@rac12pri2 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
        8      .0    1% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 08:10:16 2017-02-17 08:14:59                 1
        7      .0    1% | oracle@rac12pri2 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:43 2017-02-17 08:14:51                 1
        5      .0    0% | oracle@rac12pri1 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:23 2017-02-17 08:13:48                 1
        5      .0    0% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 08:10:00 2017-02-17 08:13:24                 1
        5      .0    0% | oracle@rac12pri2 (NSV1)                                                                                   ON CPU  2017-02-17 08:12:43 2017-02-17 08:14:44                 1
        4      .0    0% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 08:11:16 2017-02-17 08:13:35                 1
        3      .0    0% | oracle@rac12pri2 (CJQ0)                                                                                   ON CPU  2017-02-17 08:12:39 2017-02-17 08:12:51                 2
        3      .0    0% | oracle@rac12pri2 (RSM0)                                                                                   ON CPU  2017-02-17 08:11:44 2017-02-17 08:14:44                 3
        2      .0    0% | oracle@rac12pri1 (LMON)                          control file sequential read             System I/O      WAITING 2017-02-17 08:11:51 2017-02-17 08:14:01                 1
        2      .0    0% | oracle@rac12pri1 (LMS1)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:11:14 2017-02-17 08:11:44                 1
        2      .0    0% | oracle@rac12pri2 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:12:14 2017-02-17 08:14:24                 1
        2      .0    0% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 08:12:59 2017-02-17 08:14:42                 1
        1      .0    0% | oracle@rac12pri1 (CKPT)                          control file sequential read             System I/O      WAITING 2017-02-17 08:13:46 2017-02-17 08:13:46                 1
        1      .0    0% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:10:28 2017-02-17 08:10:28                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          Redo Transport Close                     Other           WAITING 2017-02-17 08:12:22 2017-02-17 08:12:22                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          log file sequential read                 System I/O      WAITING 2017-02-17 08:14:16 2017-02-17 08:14:16                 1

20 rows selected.

As you can see the top waits are in wait class “other”: “SYNC Remote Write” and “Redo Transport MISC”. I have pulled an AWR diff report that shows an interesting picture. The first snapshot set represents the full network performance, while the second one shows the effect of the 100ms network delay. Only relevant information is shown.

WORKLOAD REPOSITORY RAC Compare Period Report for

Database Summary
~~~~~~~~~~~~~~~~
                            Database                   Snapshot Ids     Number of Instances    Number of Hosts    Report Total (minutes)
               ----------------------------------   -----------------   -----------------   -----------------   ---------------------
Snapshot Set            Id Name      RAC Block Si      Begin      End   In Repor    Total   In Repor    Total      DB time Elapsed ti
------------   ----------- --------- --- --------   -------- --------   -------- --------   -------- --------   ---------- ----------
First (1st)      481582958 NCDBA     YES     8192        303      304          2        2          2        2         22.1       10.2
Second (2nd)     481582958 NCDBA     YES     8192        301      302          2        2          2        2        141.9       10.6
Database Instances Included In Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                                             Elapsed              Instance Avg Activ
Set       DB Id Inst Instance  Release         Host       Startup         Begin Snap Time End Snap Time   Time (min) DB time (m Up Time (h  Sessions Platform
--- ----------- ---- --------- --------------- ---------- --------------- --------------- --------------- ---------- ---------- ---------- --------- ---------------
1st   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0        2.0        5.2       0.2 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0       20.1        5.2       2.0 Linux x86 64-bi

2nd   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:06 17-Feb-17 08:17       10.5      134.0        4.3      12.8 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:07 17-Feb-17 08:17       10.5        7.9        4.3       0.8 Linux x86 64-bi

...

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                  2.2                 13.3      514.3                      0.0                  0.2    1,600.0
                      CPU time:                  1.5                  0.5      -66.0                      0.0                  0.0        0.0
           Background CPU time:                  0.5                  0.2      -56.9                      0.0                  0.0        0.0
             Redo size (bytes):            983,488.1            322,026.1      -67.3                  4,164.1              4,192.4        0.7
         Logical read (blocks):             30,773.7              8,299.3      -73.0                    130.3                108.0      -17.1
                 Block changes:              5,665.8              1,886.0      -66.7                     24.0                 24.6        2.3
        Physical read (blocks):                 10.5                 18.3       74.4                      0.0                  0.2      500.0
       Physical write (blocks):                654.9                196.8      -69.9                      2.8                  2.6       -7.6
              Read IO requests:                 10.1                 12.9       27.3                      0.0                  0.2      325.0
             Write IO requests:                503.8                110.4      -78.1                      2.1                  1.4      -32.4
                  Read IO (MB):                  0.1                  0.1       75.0                      0.0                  0.0        0.0
                 Write IO (MB):                  5.1                  1.5      -69.9                      0.0                  0.0        0.0
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                515.8                188.2      -63.5                      2.2                  2.5       12.4
                  Parses (SQL):                 24.9                 17.7      -28.8                      0.1                  0.2      109.1
             Hard parses (SQL):                  0.1                  0.2       45.5                      0.0                  0.0        0.0
            SQL Work Area (MB):                  0.1                  0.5      253.8                      0.0                  0.0      253.8
                        Logons:                  1.3                  3.9      191.7                      0.0                  0.1      400.0
                Executes (SQL):              2,587.7                853.0      -67.0                     11.0                 11.1        1.4
                  Transactions:                236.2                 76.8      -67.5

...

Top Timed Events   First DB/Inst: NCDBA/NCDBA1 Snaps: 303-304 (Elapsed time: 612.266 sec  DB time: 1326.09 sec),  Second DB/Inst: NCDBA/NCDBA1 Snaps: 301-302 (Elapsed time: 638.691 sec  DB time: 8514.42 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
-> Summary for all instances

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 CPU time                                             N/A        919.5           N/A        69.3    log file sync                 Commit              49,010      7,976.7         162.8        93.7
 log file sync                 Commit             144,284        270.4           1.9        20.4    Redo Transport MISC           Other               11,392      1,192.0         104.6        14.0
 log file parallel write       System I/O         141,581        156.6           1.1        11.8    SYNC Remote Write             Other               11,392      1,179.1         103.5        13.8
 gc current block 2-way        Cluster             98,483         71.9           0.7         5.4    CPU time                                             N/A        328.7           N/A         3.9
 gc cr block 2-way             Cluster             71,469         47.9           0.7         3.6    gcs log flush sync            Other                4,480         56.0          12.5         0.7
 target log write size         Other               22,284         26.5           1.2         2.0    gc current block 2-way        Cluster             62,843         43.3           0.7         0.5
 gc current grant busy         Cluster             22,561         16.8           0.7         1.3    gc current block busy         Cluster                284         39.3         138.4         0.5
 control file sequential read  System I/O           8,497          6.0           0.7         0.5    log file switch (checkpoint i Configuration           75         37.1         495.1         0.4
 db file sequential read       User I/O             5,862          5.4           0.9         0.4    rdbms ipc reply               Other                   32         27.1         845.4         0.3
 log file sequential read      System I/O          14,895          4.8           0.3         0.4    gc cr block 2-way             Cluster             39,015         26.0           0.7         0.3
-log file switch (checkpoint i Configuration          212          3.6          17.2         0.3   -log file parallel write       System I/O          11,267         20.3           1.8         0.2
-gcs log flush sync            Other                  461          0.4           0.9         0.0   -gc current grant busy         Cluster             12,427          8.2           0.7         0.1
-gc current block busy         Cluster                126          0.3           2.7         0.0   -control file sequential read  System I/O           7,326          5.0           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -db file sequential read       User I/O             6,615          4.7           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -log file sequential read      System I/O             174          0.4           2.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

This doesn’t look too good, does it? The throughput is down substantially, which is visible in Swingbench itself: in the first example I managed to achieve 255 TPS while I’m averaging 90-95 in the synch + delay example. The numbers from the AWR excerpt shown above confirm this, and allows you to work out where the problem is.

Here is a print screen taken from OEM (that’s a different benchmark execution), showing the same situation. Make sure you are licensed appropriately to use this view.

oemexpress_with_latencyhttps://martincarstenbach.files.wordpress.com/2017/03/oemexpress_with_la... 600w, https://martincarstenbach.files.wordpress.com/2017/03/oemexpress_with_la... 150w" sizes="(max-width: 300px) 100vw, 300px" />

Summary Part II

So yes, there is a very noticeable affect of the network delay, and the AWR report shows the relevant wait events in 12c indicating where time is spent.

In my opinion it’s a bit unfortunate to have the new waits in the “Other” wait class, but that’s how it is.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf