Oakies Blog Aggregator

EM12c Release 4, Health Overview

As part of the projects I work on at Oracle, I'm often ensuring that customers who wish to deploy Enterprise Manager (EM12c) to large environments have the correct settings and are tuned to get the best performance out of the Oracle Management Repository database, the WebLogic tier, and the console URLs accessed by users.  This means that these large environments often receive recommendations from our experts that differ from what EM12c comes with "out of the box".

For those who aren't receiving internal Oracle training on what to look for and how to tune the EM12c tiers, there are some new features in release 4 that anyone using EM12c should check out.

EM12c Health Overview

The Health Overview is accessible via the Setup menu (right side of the console): Setup, Manage Cloud Control, Health Overview.

[Screenshot: Setup menu path to the Health Overview]

We’ll go over each of these new monitoring tools, but the Health Overview includes valuable information about the health of both the Oracle Management Repository, (OMR) and the Oracle Management Service, (OMS).

The overview page breaks down into easy-to-understand sections.  The first shows basic information and workload on the OMS:

[Screenshot: Health Overview, OMS information and workload summary]

From here you can see all pertinent, high-level information about the OMS/OMR: OMS details, the number of agents along with status counts on their availability, and whether a load balancer is used in the EM12c configuration.  Availability status for important targets is posted, along with how many administrators have been given access to the EM console.

Below this we can view the backlog graph on the right-hand side of the page:

[Screenshot: Health Overview, upload backlog graph (right-hand side)]

That right-hand graph is important: along with the upload rate, it shows whether there is a consistent backlog of XML files waiting to be uploaded, which can signal performance trouble.  A loader backlog can delay receipt of critical alerts and information about a target.  If the backlog becomes too extensive, an agent can reach a threshold on how many backlogged files it can handle and stop collecting, which is a significant issue.  If serious backlog issues are noted, it's important to consider tuning options to address them, such as adding a load balancer to help manage the workload, or adding a second OMS.

Repository Details

The next section includes connection information: the service name, the database name and database type, the port, and the job system status.  On the right is a graph showing whether any backoff requests have been sent.  These occur when the OMS is busy processing an XML upload and requests that the agent hold off on sending any more files until it has finished.

[Screenshot: Repository Details with backoff requests graph]

Notification Information

Scrolling down from the backoff graph in the Health Overview displays the notification backlog graph.  Knowing how backlogged your time-sensitive notifications are is crucial: when someone asks why a notification wasn't received in a timely manner, you can quickly assess whether it is an issue with EM12c or whether the problem resides elsewhere.

[Screenshot: notification backlog graph]

Alerts

The last section in the Health Overview covers incident management.  You can easily see if there are any metric collection errors (correlating them with the backlog data above if necessary) and access the related metric errors and related alerts.

[Screenshot: incident management section]

You can also launch the Incident Manager from the Health Overview console if you wish to get more details about all incidents currently in the queue.  This view is really there to give you a very high-level account of what incidents are currently open, along with related alerts and metric errors; use the button to launch the Incident Manager if you wish to see what the alerts are all about.

We'll dig into the deeper diagnostic data that EM12c release 4 offers for the OMR, OMS and agents in subsequent posts, so until next time!



Copyright © DBA Kevlar [EM12c Release 4, Health Overview], All Rights Reserved. 2014.

The First PASS Summit Bloggers’ Meetup

We are stoked to announce the first ever PASS Summit Bloggers Meetup!

What: PASS Summit Bloggers Meetup 2014
When: Thursday, November 6th, 5pm – 7pm
Where: Sports Restaurant & Bar, 140 4th Ave North, Suite 130, Seattle, WA 98109
How: Please comment with “COUNT ME IN” if coming — we need to know attendance numbers.

We’re excited to meet old friends, and make new ones in Seattle this year. Pythian will sponsor munchies and drinks. There will be a networking contest with some cool prizes, plus everyone will get their very own PASS Summit Bloggers Meetup shirt! Meetups are a great way to make new friends in the community, so come on out — all bloggers are welcome!

If you are planning to attend, please comment here with the phrase “COUNT ME IN”. This will help us ensure we have the attendance numbers right. Please provide your blog URL with your comment — it’s a Bloggers Meetup after all! Make sure you comment here if you are attending so that we have enough room, food, and drinks.

The location is perfect for getting ready for the appreciation event: it's just a few minutes' walk from the EMP Museum! Grab snacks and drinks before the big event and mingle with fellow bloggers. What could be better?

Of course, do not forget to blog and tweet about this year’s bloggers meetup using #Summit14 #sqlpass. See you there!

first_rows(10)

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:


begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',12);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t1_n1 on t1(id, n1);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t2_i1 on t2(x1);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

end;
/


create or replace view  v1
as
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700
;

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:


select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):
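For anyone wanting to reproduce the first two runs, switching the optimizer mode at the session level is enough; here is a minimal sketch (the exact test script isn't reproduced here, and the hints needed for the third run are omitted):

-- run 1: all_rows (the default)
alter session set optimizer_mode = all_rows;
-- execute the query and capture its plan

-- run 2: first_rows_10
alter session set optimizer_mode = first_rows_10;
-- execute the query again and capture its plan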


first_rows_10 plan
----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

all_rows plan
------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

first_rows_10 plan hinted under all_rows optimisation
---------------------------------------------------------------------------------------------- 
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
---------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

I’m not sure why the first_rows_10 plan uses “table access by rowid batched”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.
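As an aside, if you want to look at that trace file yourself, here is a minimal sketch of enabling the CBO (10053) trace around a single execution of the query; the tracefile_identifier is only there to make the trace file easy to find and isn't part of the original test:

alter session set tracefile_identifier = 'first_rows_10';
alter session set events '10053 trace name context forever, level 1';
-- execute the query here, then switch the trace off:
alter session set events '10053 trace name context off';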

In the first_rows_10 plan the optimizer has decided that building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it's made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it's a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I've re-run the experiment for the first two plans from SQL*Plus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:
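For reference, a minimal sketch of the SQL*Plus settings for those runs and one way of pulling the rowsource execution stats afterwards; the display_cursor call is an assumption about how the stats were fetched rather than a quote from the original test:

set arraysize 11
set pagesize 5
-- run the query (it already carries the gather_plan_statistics hint),
-- cancel the fetch after the first few rows, then report on the last cursor:
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));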


first_rows_10 plan
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

all_rows plan
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
-----------------------------------------------------------------------------------------------------------------------

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I would have paid a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan is much more efficient.

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in exchange for being able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting, rather than doing a (faster) full tablescan followed by a sort). This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole hash table before it can take any further steps – we don't get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.

Footnote:

It's possible that this is an example of bug 9633142 (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW), but that's reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting "_fix_control"='4887636:off' does bypass the problem. (The fix control, introduced in 11.1.0.6, has the description: "remove restriction from first K row optimization".)
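For completeness, a minimal sketch of applying that fix control at session level; the v$session_fix_control query is simply an assumed way of verifying the setting, not something taken from the bug note:

alter session set "_fix_control" = '4887636:off';

-- verify the setting for the current session:
select bugno, value, description
from   v$session_fix_control
where  bugno = 4887636
and    session_id = sys_context('userenv', 'sid');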

Interesting observations executing SLOB2 with ext4 and xfs on SSD

My lab server has two SSDs: one is connected using SATA 2 and the other using SATA 3. I'd expect the SATA 3 connected device to do its work as well as, or even better than, the one on the "old" interface. I ran SLOB on these devices to find out if there was a difference. To my great surprise the SATA 2 connected SSD performed a lot better than the SATA 3 device, as shown in the AWR report! Initially I was not entirely sure why, since the FIO results on both devices are roughly equal. You will see why when reading this post. In summary: use XFS for any concurrent writes. Or maybe ASM.

Let’s investigate

Let’s do a little I/O investigation because a) it’s cool and b) you can.

This is Oracle Linux 6.5 with UEK 3. A word of warning though: ext4 has been, let's say, a little flaky. I first found out about that in 2011 (http://martincarstenbach.wordpress.com/2011/11/04/an-interesting-problem-with-ext4-on-oracle-linux-5-5/) and Oracle has since published a MOS note: ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1). I am on a reasonably recent version so the observations in this post are probably not related to the above note. The device /dev/sda uses SATA 2, and the file system /u01/app/oracle/oradata on it is XFS formatted. The other SSD, /dev/sde, uses EXT4 but is connected via SATA 3 provided by a PCIe v2 x4 card. I wish I had PCIe v3 but my Opteron 6200 system doesn't support native SATA 6G.

SLOB

I created a 12c non-CDB database for SLOB with the IOPS tablespace on my SATA 2 SSD, the one formatted with XFS. Every test has been repeated four times. Should you want to follow along, here is my slob.conf:

[/m/kvm/oracle]$ cat ~/SLOB2/SLOB/slob.conf 

UPDATE_PCT=20
RUN_TIME=120
WORK_LOOP=0
SCALE=10000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0

# Settings for SQL*Net connectivity:
#ADMIN_SQLNET_SERVICE=slob
#SQLNET_SERVICE_BASE=slob
#SQLNET_SERVICE_MAX=2
#SYSDBA_PASSWD="change_on_install"

export UPDATE_PCT RUN_TIME WORK_LOOP SCALE WORK_UNIT LOAD_PARALLEL_DEGREE REDO_STRESS SHARED_DATA_MODULUS

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]

Notice the UPDATE_PCT: that will become important later. The SLOB configuration has been created using a call to ./setup.sh with IOPS as the target tablespace and 128 users, which is the default. My database uses the initialisation file shown below, which is more or less the same as the default (and my buffer cache is indeed 16M).

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[/m/kvm/oracle]$ cat /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initSLOB12.ora 

#set the following to paths that make sense on your system:
db_create_file_dest = '/u01/app/oracle/oradata'
control_files='/u01/app/oracle/oradata/SLOB12/controlfile/o1_mf_b52b4lvk_.ctl'

db_name = SLOB12
compatible = 12.1.0.2.0
undo_management = auto
db_block_size = 8192
db_files = 2000
processes = 500
shared_pool_size = 1G
db_cache_size = 10M     # Will force maximum physical I/O
#db_cache_size = 15G    # Sufficient for cached runs with reference scale (10,000 rows) and 128 sessions
filesystemio_options = setall
parallel_max_servers = 0
log_buffer = 134217728
cpu_count = 2
pga_aggregate_target = 1G
remote_login_passwordfile = exclusive
workarea_size_policy = auto
recyclebin = off

# The following are needed to force db file sequential read. Omit for direct path short scans.
# If you would like to investigate a blend of short scans mixed with single-block random I/O
# then comment out these three parameters :
_db_block_prefetch_limit = 0
_db_block_prefetch_quota = 0
_db_file_noncontig_mblock_read_count = 0

# Uncomment the following if suffering library cache contention (at larger numbers of sessions) or high log file sync waits
# at CPU-saturated levels.
#_cursor_obsolete_threshold=130
#_high_priority_processes=LGWR

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]

After the fourth execution of ./runit.sh 16 I moved the IOPS data file from /dev/sda to /dev/sde and ran another 4 tests.
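In 12c the data file move can be done online; a minimal sketch of what such a move looks like (the file names below are purely illustrative and not the actual paths used):

-- online datafile move, available from 12.1 onwards:
alter database move datafile
  '/u01/app/oracle/oradata/SLOB12/datafile/iops.dbf'
  to '/m/kvm/oracle/SLOB12/iops.dbf';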

Observations

I ran an AWR difference report to compare the performance of the XFS and EXT4 SSDs. The most interesting bit is right at the top. Apologies for the very wide output; that's the way the text report lays it out.
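If you have never generated one, the compare-period report comes from the standard script shipped with the database; a minimal sketch of invoking it from SQL*Plus (it prompts interactively for the two snapshot ranges, 41/42 and 49/50 in this case, and for the report format):

@?/rdbms/admin/awrddrpt.sql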

WORKLOAD REPOSITORY COMPARE PERIOD REPORT

Snapshot Set    DB Id    Instance     Inst Num Release     Cluster Host          Std Block Size
------------ ----------- ------------ -------- ----------- ------- ------------ ---------------
First (1st)   1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192
Second (2nd)  1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192

Snapshot Set  Begin Snap Id Begin Snap Time            End Snap Id End Snap Time                  Avg Active Users           Elapsed Time (min)            DB time (min)
------------ -------------- ------------------------- ------------ ------------------------- -------------------------- -------------------------- --------------------------
1st                      41 31-Oct-14 12:18:14 (Fri)           42 31-Oct-14 12:20:17 (Fri)                      15.8                       2.0                      32.1
2nd                      49 31-Oct-14 12:38:18 (Fri)           50 31-Oct-14 12:40:20 (Fri)                      15.8                       2.0                      32.3
                                                                                      ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                      %Diff:                     0.2%                       0.0%                       0.5%

Host Configuration Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     1st                  2nd                 Diff     %Diff
----------------------------------- -------------------- -------------------- -------------------- ---------
Number of CPUs:                                       24                   24                    0       0.0
Number of CPU Cores:                                  24                   24                    0       0.0
Number of CPU Sockets:                                 2                    2                    0       0.0
Physical Memory:                                64413.7M             64413.7M                   0M       0.0
Load at Start Snapshot:                             9.13                 9.57                  .44       4.8
Load at End Snapshot:                              15.38                 15.6                  .22       1.4
%User Time:                                         1.24                  .64                  -.6     -48.4
%System Time:                                        .75                  .74                 -.02      -1.3
%Idle Time:                                        97.89                 98.6                  .71       0.7
%IO Wait Time:                                     35.21                 9.06               -26.15     -74.3
Cache Sizes
~~~~~~~~~~~
                          1st (M)    2nd (M)   Diff (M)    %Diff
---------------------- ---------- ---------- ---------- --------
Memory Target
.....SGA Target
..........Buffer Cache       16.0       16.0        0.0      0.0
..........Shared Pool     1,024.0    1,024.0        0.0      0.0
..........Large Pool
..........Java Pool          16.0       16.0        0.0      0.0
..........Streams Pool
.....PGA Target           1,024.0    1,024.0        0.0      0.0
Log Buffer                  128.0      128.0        0.0      0.0
In-Memory Area

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                 15.8                 15.8        0.2                      2.5                  4.4       78.7
                      CPU time:                  0.5                  0.3      -29.8                      0.1                  0.1       28.6
           Background CPU time:                  0.1                  0.1      -40.0                      0.0                  0.0        0.0
             Redo size (bytes):          8,721,293.5          4,652,147.0      -46.7              1,379,498.8          1,311,306.6       -4.9
         Logical read (blocks):             10,403.9              5,231.1      -49.7                  1,645.6              1,474.5      -10.4
                 Block changes:              5,216.5              2,640.2      -49.4                    825.1                744.2       -9.8
        Physical read (blocks):              7,412.5              3,750.0      -49.4                  1,172.5              1,057.0       -9.8
       Physical write (blocks):              3,369.0              1,729.0      -48.7                    532.9                487.4       -8.5
              Read IO requests:              7,411.8              3,749.3      -49.4                  1,172.4              1,056.8       -9.9
             Write IO requests:              3,018.0              1,538.2      -49.0                    477.4                433.6       -9.2
                  Read IO (MB):                 57.9                 29.3      -49.4                      9.2                  8.3       -9.8
                 Write IO (MB):                 26.3                 13.5      -48.7                      4.2                  3.8       -8.4
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                  1.4                  1.4       -1.4                      0.2                  0.4       77.3
                  Parses (SQL):                  5.9                  6.3        7.5                      0.9                  1.8       91.4
             Hard parses (SQL):                  0.1                  0.3      107.1                      0.0                  0.1      300.0
            SQL Work Area (MB):                  0.6                  0.4      -25.9                      0.1                  0.1      -25.9
                        Logons:                  0.1                  0.1        0.0                      0.0                  0.0      100.0
                Executes (SQL):                 38.1                 23.7      -37.6                      6.0                  6.7       11.1
                  Transactions:                  6.3                  3.5      -43.8

                                               First               Second       Diff
                                     ---------------      ---------------     ------
     % Blocks changed per Read:                 50.1                 50.5        0.3
              Recursive Call %:                 98.1                 97.6       -0.6
    Rollback per transaction %:                  0.0                  0.0        0.0
                 Rows per Sort:                 68.3                 35.4      -32.9
    Avg DB time per Call (sec):                 11.3                 11.5        0.2
...

I copied the header from the report to show you that nothing really changed between the test executions, and that I'm going to do a lot of I/O since the buffer cache is only 16 MB.

What is striking is that the physical reads and writes are roughly half of what I saw during the first batch of executions. The culprit is found quickly. In the scenario I chose, SLOB is primarily performing (single-block) sequential reads:

Top Timed Events   First DB/Inst: SLOB12/SLOB12 Snaps: 41-42 (Elapsed time: 122.27 sec  DB time: 1925.83 sec),  Second DB/Inst: SLOB12/SLOB12 Snaps: 49-50 (Elapsed time: 122.614 sec  DB time: 1935.33 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 db file sequential read       User I/O           906,150      1,899.1           2.1        98.6    db file sequential read       User I/O           459,637      1,920.9           4.2        99.3
 db file parallel write        System I/O         116,318         60.0           0.5         3.1    db file async I/O submit      System I/O           2,975         97.2          32.7         5.0
 CPU time                                             N/A         57.7           N/A         3.0    CPU time                                             N/A         40.2           N/A         2.1
 log file parallel write       System I/O           3,485         47.4          13.6         2.5    log file parallel write       System I/O           1,881          3.2           1.7         0.2
 db file async I/O submit      System I/O           7,425          3.7           0.5         0.2    db file parallel write        System I/O           4,670          0.9           0.2         0.0
...

The IO times for the second interval (the EXT4 case) are twice what they are in the first. But why? I thought at first that the PCIe card added the extra latency, and I came up with a few other ideas that all remained just that: ideas. Ideas are nice, but proof is better. Maybe there is indeed something odd? Or maybe there is something in the way that Oracle uses the device? To find out I used FIO. FIO is my favourite I/O benchmark when it comes to measuring raw performance. If the FIO benchmark results match the Oracle results then indeed there must be something wrong with the SSD or the way it is addressed. I spent a bit of time compiling FIO and ran a set of tests.

FIO

I was really quite curious to see what was happening during the execution of FIO: was the SATA 3 SSD really slower? So I started FIO tests, using FIO 2.1.13. The test harness is the same for both devices; just the destination directory is different:

[~/fio-2.1.13]$ cat rand-read-sd*
[rand-read]
numjobs=4
rw=randread
directory=/u01/fio/data
size=5G
bs=8k
ioengine=libaio
buffered=0
iodepth=8

[rand-read]
numjobs=4
rw=randread
directory=/m/kvm/oracle
size=5G
bs=8k
ioengine=libaio
buffered=0
iodepth=8

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[~/fio-2.1.13]$

The /u01 file system sits on /dev/sda, and uses XFS. /m/kvm/oracle is on /dev/sde, using EXT4. Here is the output from the tests:

[~/fio-2.1.13]$ ./fio rand-read-sde.fio
rand-read: (g=0): rw=randread, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=8
...
fio-2.1.13
Starting 4 processes
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 3 (f=3): [r(3),_(1)] [99.2% done] [192.8MB/0KB/0KB /s] [24.7K/0/0 iops] [eta 00m:01s]
rand-read: (groupid=0, jobs=1): err= 0: pid=6124: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44174KB/s, iops=5521, runt=118686msec
    slat (usec): min=4, max=109, avg=13.06, stdev= 6.14
    clat (usec): min=128, max=43082, avg=1434.23, stdev=586.65
     lat (usec): min=140, max=43094, avg=1447.47, stdev=586.41
    clat percentiles (usec):
     |  1.00th=[  516],  5.00th=[  708], 10.00th=[  820], 20.00th=[  988],
     | 30.00th=[ 1112], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1480],
     | 70.00th=[ 1608], 80.00th=[ 1816], 90.00th=[ 2160], 95.00th=[ 2480],
     | 99.00th=[ 3184], 99.50th=[ 3472], 99.90th=[ 4192], 99.95th=[ 4576],
     | 99.99th=[ 6752]
    bw (KB  /s): min=27088, max=70928, per=24.99%, avg=44119.50, stdev=6314.88
    lat (usec) : 250=0.01%, 500=0.78%, 750=6.02%, 1000=14.12%
    lat (msec) : 2=65.54%, 4=13.37%, 10=0.15%, 20=0.01%, 50=0.01%
  cpu          : usr=1.89%, sys=11.05%, ctx=421189, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6125: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44138KB/s, iops=5517, runt=118784msec
    slat (usec): min=4, max=108, avg=13.14, stdev= 6.14
    clat (usec): min=117, max=43252, avg=1435.31, stdev=587.66
     lat (usec): min=124, max=43264, avg=1448.64, stdev=587.40
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[  700], 10.00th=[  820], 20.00th=[  988],
     | 30.00th=[ 1128], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1480],
     | 70.00th=[ 1624], 80.00th=[ 1816], 90.00th=[ 2160], 95.00th=[ 2480],
     | 99.00th=[ 3152], 99.50th=[ 3472], 99.90th=[ 4192], 99.95th=[ 4640],
     | 99.99th=[ 6688]
    bw (KB  /s): min=27504, max=69760, per=24.91%, avg=43972.22, stdev=6169.69
    lat (usec) : 250=0.07%, 500=0.92%, 750=5.84%, 1000=13.94%
    lat (msec) : 2=65.68%, 4=13.40%, 10=0.16%, 20=0.01%, 50=0.01%
  cpu          : usr=1.89%, sys=11.10%, ctx=421379, majf=0, minf=46
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6126: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44230KB/s, iops=5528, runt=118538msec
    slat (usec): min=4, max=120, avg=12.59, stdev= 5.61
    clat (usec): min=185, max=42871, avg=1433.09, stdev=581.75
     lat (usec): min=196, max=42881, avg=1445.85, stdev=581.45
    clat percentiles (usec):
     |  1.00th=[  532],  5.00th=[  716], 10.00th=[  828], 20.00th=[  988],
     | 30.00th=[ 1128], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1464],
     | 70.00th=[ 1608], 80.00th=[ 1800], 90.00th=[ 2128], 95.00th=[ 2480],
     | 99.00th=[ 3152], 99.50th=[ 3440], 99.90th=[ 4192], 99.95th=[ 4576],
     | 99.99th=[ 7008]
    bw (KB  /s): min=27600, max=65952, per=25.01%, avg=44149.65, stdev=5945.98
    lat (usec) : 250=0.01%, 500=0.63%, 750=5.90%, 1000=14.17%
    lat (msec) : 2=65.89%, 4=13.25%, 10=0.15%, 20=0.01%, 50=0.01%
  cpu          : usr=1.74%, sys=11.20%, ctx=426219, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6127: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=47359KB/s, iops=5919, runt=110704msec
    slat (usec): min=4, max=114, avg=12.83, stdev= 5.90
    clat (usec): min=188, max=42988, avg=1337.06, stdev=571.06
     lat (usec): min=198, max=43000, avg=1350.07, stdev=570.63
    clat percentiles (usec):
     |  1.00th=[  446],  5.00th=[  620], 10.00th=[  748], 20.00th=[  900],
     | 30.00th=[ 1020], 40.00th=[ 1144], 50.00th=[ 1256], 60.00th=[ 1368],
     | 70.00th=[ 1512], 80.00th=[ 1704], 90.00th=[ 2040], 95.00th=[ 2352],
     | 99.00th=[ 3024], 99.50th=[ 3280], 99.90th=[ 3984], 99.95th=[ 4384],
     | 99.99th=[ 6624]
    bw (KB  /s): min=33664, max=51008, per=26.83%, avg=47369.34, stdev=4051.34
    lat (usec) : 250=0.01%, 500=1.45%, 750=8.72%, 1000=17.95%
    lat (msec) : 2=61.12%, 4=10.65%, 10=0.09%, 20=0.01%, 50=0.01%
  cpu          : usr=1.98%, sys=11.60%, ctx=410433, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=20480MB, aggrb=176551KB/s, minb=44137KB/s, maxb=47359KB/s, mint=110704msec, maxt=118784msec

Disk stats (read/write):
  sde: ios=2619049/4, merge=0/1, ticks=3663514/6, in_queue=3662827, util=100.00%

The best tool to look at IO without tracing block devices is collectl (in my opinion). I ran the extended disk stats in parallel:

[root@ol62 queue]# collectl -sD -i 5 --iosize --dskfilt sde
waiting for 5 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sde              0      0    0    0  125340    129  123 1021    1020   219  1749      8   99
sde              0      0    0    0  119608     58  118 1017    1017   156  1666      7   88
sde         146968      0  18K    8       0      0    0    0       8    31     1      0   89
sde         165435      0  20K    8       2      0    0    6       7    31     1      0   99
sde         187097      0  23K    8       0      0    0    0       8    31     1      0  100
sde         187113      0  23K    8       0      0    0    0       8    31     1      0   99
sde         157684      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186542      0  23K    8       1      0    0    4       7    31     1      0  100
sde         186721      0  23K    8       0      0    0    0       8    31     1      0   99
sde         158416      0  19K    8       0      0    0    0       8    31     1      0  100
sde         187168      0  23K    8       0      0    0    0       8    31     1      0   99
sde         186027      0  23K    8       0      0    0    0       8    31     1      0  100
sde         158366      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186028      0  23K    8       0      0    0    0       8    31     1      0   99
sde         186416      0  23K    8       0      0    0    0       8    31     1      0  100
sde         159102      0  19K    8       0      0    0    0       8    31     1      0  100
sde         186348      0  23K    8       0      0    0    0       8    31     1      0  100
sde         186859      0  23K    8       0      0    0    0       8    31     1      0   99
sde         153996      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186043      0  23K    8       0      0    0    0       8    31     1      0  100
sde         185348      0  23K    8       0      0    0    0       8    31     1      0   99
sde         158150      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186148      0  23K    8       0      0    0    0       8    31     1      0   99
sde         185900      0  23K    8       0      0    0    0       8    31     1      0   99
sde         159353      0  19K    8       0      0    0    0       8    25     1      0   99
sde         167065      0  20K    8       0      0    0    0       8    22     0      0   86
sde              0      0    0    0       0      0    0    0       0     0     0      0    0

This matches the FIO output, and indeed the device can do around 23k IOPS with an 8k IO size. What about the XFS mount?

[~/fio-2.1.13]$ ./fio rand-read-sda.fio
rand-read: (g=0): rw=randread, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=8
...
fio-2.1.13
Starting 4 processes
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 4 (f=4): [r(4)] [100.0% done] [191.3MB/0KB/0KB /s] [24.5K/0/0 iops] [eta 00m:00s]
rand-read: (groupid=0, jobs=1): err= 0: pid=6329: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49087KB/s, iops=6135, runt=106809msec
    slat (usec): min=3, max=93, avg= 7.90, stdev= 6.40
    clat (usec): min=280, max=35113, avg=1294.26, stdev=152.91
     lat (usec): min=285, max=35130, avg=1302.35, stdev=152.60
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1192], 20.00th=[ 1240],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1672], 99.95th=[ 2224],
     | 99.99th=[ 3024]
    bw (KB  /s): min=46000, max=49536, per=25.02%, avg=49124.10, stdev=291.98
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.03%
    lat (msec) : 2=99.89%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.39%, sys=8.05%, ctx=505195, majf=0, minf=43
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6330: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49093KB/s, iops=6136, runt=106795msec
    slat (usec): min=3, max=114, avg= 7.55, stdev= 5.74
    clat (usec): min=263, max=35353, avg=1294.46, stdev=155.68
     lat (usec): min=271, max=35369, avg=1302.21, stdev=155.46
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1192], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1288], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1672], 99.95th=[ 2224],
     | 99.99th=[ 3184]
    bw (KB  /s): min=45728, max=49472, per=25.02%, avg=49123.20, stdev=300.30
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.02%
    lat (msec) : 2=99.89%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.42%, sys=7.88%, ctx=549312, majf=0, minf=46
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6331: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49080KB/s, iops=6134, runt=106824msec
    slat (usec): min=3, max=137, avg= 7.70, stdev= 5.87
    clat (usec): min=266, max=35102, avg=1294.64, stdev=147.64
     lat (usec): min=274, max=35121, avg=1302.52, stdev=147.39
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1208], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 2064], 99.95th=[ 2256],
     | 99.99th=[ 3600]
    bw (KB  /s): min=45888, max=49536, per=25.02%, avg=49110.62, stdev=295.81
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.04%
    lat (msec) : 2=99.84%, 4=0.10%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.32%, sys=8.07%, ctx=536841, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6332: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49080KB/s, iops=6134, runt=106824msec
    slat (usec): min=3, max=106, avg= 7.66, stdev= 5.93
    clat (usec): min=274, max=35268, avg=1294.67, stdev=157.16
     lat (usec): min=281, max=35286, avg=1302.53, stdev=156.91
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1208], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1944], 99.95th=[ 2224],
     | 99.99th=[ 3536]
    bw (KB  /s): min=45840, max=49536, per=25.02%, avg=49109.56, stdev=303.79
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.05%
    lat (msec) : 2=99.84%, 4=0.09%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.50%, sys=7.83%, ctx=533469, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=20480MB, aggrb=196318KB/s, minb=49079KB/s, maxb=49092KB/s, mint=106795msec, maxt=106824msec

Disk stats (read/write):
  sda: ios=2620077/60, merge=0/2, ticks=3348979/94, in_queue=3348631, util=100.00%
[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[~/fio-2.1.13]$

Here is some of the corresponding collectl output:

[root@ol62 queue]# collectl -sD -i 5 --iosize --dskfilt sda
waiting for 5 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda         196313      0  24K    8       5      0    1    9       8    31     1      0   99
sda         196681      0  24K    8       6      0    0   16       8    31     1      0  100
sda         196369      0  24K    8       7      0    1   12       8    31     1      0   99
sda         196217      0  24K    8       4      0    0   10       8    31     1      0  100
sda         196300      0  24K    8      14      0    2    6       8    31     1      0  100
sda         196699      0  24K    8       6      0    0   16       8    31     1      0   99
sda         196340      0  24K    8       3      0    0   16       8    31     1      0   99
sda         196350      0  24K    8       6      0    0   16       8    31     1      0  100
sda         196379      0  24K    8       6      0    0   16       8    31     1      0   99
sda         196560      0  24K    8       3      0    0   16       8    31     1      0   99
sda         196259      0  24K    8       8      0    1    8       8    31     1      0   99

Summary Number 1

So it would appear that the SSDs are more or less equal from FIO's point of view. If you followed closely (hurray if you did; I appreciate this is a long post!) you will have noticed the following:

  • The SLOB test used updates (20%). Furthermore, there is just one data file for the IOPS tablespace (-> there is concurrency!)
  • The FIO test was based on random reads, no writes. Its 4 jobs also worked on 4 distinct files.

I tried to avoid the effect of file system buffering by setting buffered=0, and I'm also using asynchronous IO as provided by libaio (just as Oracle does).

Theory Number 2

It would appear that XFS is better suited for concurrent writes. Let's put that theory into action. I reformatted the SATA 3 SSD with XFS and ran another test. Feast your eyes on the comparison of the AWR reports from the EXT4 test execution and from XFS on /dev/sde:

WORKLOAD REPOSITORY COMPARE PERIOD REPORT

Snapshot Set    DB Id    Instance     Inst Num Release     Cluster Host          Std Block Size
------------ ----------- ------------ -------- ----------- ------- ------------ ---------------
First (1st)   1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192
Second (2nd)  1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192

Snapshot Set  Begin Snap Id Begin Snap Time            End Snap Id End Snap Time                  Avg Active Users           Elapsed Time (min)            DB time (min)
------------ -------------- ------------------------- ------------ ------------------------- -------------------------- -------------------------- --------------------------
1st                      49 31-Oct-14 12:38:18 (Fri)           50 31-Oct-14 12:40:20 (Fri)                      15.8                       2.0                      32.3
2nd                      55 31-Oct-14 15:35:23 (Fri)           56 31-Oct-14 15:37:25 (Fri)                      15.7                       2.0                      32.1
                                                                                      ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                      %Diff:                     0.2%                      -0.5%                      -0.6%

Host Configuration Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     1st                  2nd                 Diff     %Diff
----------------------------------- -------------------- -------------------- -------------------- ---------
Number of CPUs:                                       24                   24                    0       0.0
Number of CPU Cores:                                  24                   24                    0       0.0
Number of CPU Sockets:                                 2                    2                    0       0.0
Physical Memory:                                64413.7M             64413.7M                   0M       0.0
Load at Start Snapshot:                             9.57                 1.77                 -7.8     -81.5
Load at End Snapshot:                               15.6                14.15                -1.45      -9.3
%User Time:                                          .64                 2.07                 1.43     223.4
%System Time:                                        .74                 1.48                  .75     100.0
%Idle Time:                                         98.6                 96.4                 -2.2      -2.2
%IO Wait Time:                                      9.06                53.29                44.23     488.2
Cache Sizes
~~~~~~~~~~~
                          1st (M)    2nd (M)   Diff (M)    %Diff
---------------------- ---------- ---------- ---------- --------
Memory Target
.....SGA Target
..........Buffer Cache       16.0       16.0        0.0      0.0
..........Shared Pool     1,024.0    1,024.0        0.0      0.0
..........Large Pool
..........Java Pool          16.0       16.0        0.0      0.0
..........Streams Pool
.....PGA Target           1,024.0    1,024.0        0.0      0.0
Log Buffer                  128.0      128.0        0.0      0.0
In-Memory Area

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                 15.8                 15.8        0.2                      4.4                  1.5      -66.7
                      CPU time:                  0.3                  0.9      175.8                      0.1                  0.1        0.0
           Background CPU time:                  0.1                  0.1      133.3                      0.0                  0.0      -50.0
             Redo size (bytes):          4,652,147.0         15,186,494.1      226.4              1,311,306.6          1,426,136.0        8.8
         Logical read (blocks):              5,231.1             16,957.8      224.2                  1,474.5              1,592.5        8.0
                 Block changes:              2,640.2              8,922.1      237.9                    744.2                837.9       12.6
        Physical read (blocks):              3,750.0             12,560.9      235.0                  1,057.0              1,179.6       11.6
       Physical write (blocks):              1,729.0              5,855.0      238.6                    487.4                549.8       12.8
              Read IO requests:              3,749.3             12,560.8      235.0                  1,056.8              1,179.6       11.6
             Write IO requests:              1,538.2              5,219.5      239.3                    433.6                490.2       13.0
                  Read IO (MB):                 29.3                 98.1      234.9                      8.3                  9.2       11.6
                 Write IO (MB):                 13.5                 45.7      238.6                      3.8                  4.3       12.9
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                  1.4                  1.4        2.9                      0.4                  0.1      -66.7
                  Parses (SQL):                  6.3                 10.3       62.7                      1.8                  1.0      -45.5
             Hard parses (SQL):                  0.3                  0.0      -86.2                      0.1                  0.0     -100.0
            SQL Work Area (MB):                  0.4                  0.2      -48.8                      0.1                  0.0      -48.8
                        Logons:                  0.1                  0.1        0.0                      0.0                  0.0      -50.0
                Executes (SQL):                 23.7                 58.0      144.3                      6.7                  5.4      -18.5
                  Transactions:                  3.5                 10.6      200.0

                                               First               Second       Diff
                                     ---------------      ---------------     ------
     % Blocks changed per Read:                 50.5                 52.6        2.1
              Recursive Call %:                 97.6                 99.0        1.5
    Rollback per transaction %:                  0.0                  0.0        0.0
                 Rows per Sort:                 35.4                 49.6       14.2
    Avg DB time per Call (sec):                 11.5                 11.3       -0.3

Top Timed Events   First DB/Inst: SLOB12/SLOB12 Snaps: 49-50 (Elapsed time: 122.614 sec  DB time: 1935.33 sec),  Second DB/Inst: SLOB12/SLOB12 Snaps: 55-56 (Elapsed time: 121.705 sec  DB time: 1923.77 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 db file sequential read       User I/O           459,637      1,920.9           4.2        99.3    db file sequential read       User I/O         1,528,725      1,840.4           1.2        95.7
 db file async I/O submit      System I/O           2,975         97.2          32.7         5.0    CPU time                                             N/A        110.3           N/A         5.7
 CPU time                                             N/A         40.2           N/A         2.1    db file parallel write        System I/O         114,997         34.3           0.3         1.8
 log file parallel write       System I/O           1,881          3.2           1.7         0.2    free buffer waits             Configuration        2,318         28.0          12.1         1.5
 db file parallel write        System I/O           4,670          0.9           0.2         0.0    log file parallel write       System I/O           6,897         14.3           2.1         0.7
 control file sequential read  System I/O             499          0.2           0.3         0.0    db file async I/O submit      System I/O          12,819          5.2           0.4         0.3
 oracle thread bootstrap       Other                    7          0.1          18.1         0.0    buffer busy waits             Concurrency            407          2.7           6.7         0.1
 Disk file operations I/O      User I/O                77          0.1           0.9         0.0    Data file init write          User I/O               152          0.9           5.7         0.0
 library cache: mutex X        Concurrency             22          0.1           2.7         0.0    read by other session         User I/O                60          0.7          11.9         0.0
 Data file init write          User I/O                 5          0.0           5.6         0.0    enq: HW - contention          Configuration           18          0.7          39.3         0.0
-read by other session         User I/O                18          0.0           0.6         0.0   -control file sequential read  System I/O             841          0.5           0.5         0.0
-buffer busy waits             Concurrency             82          0.0           0.0         0.0   -oracle thread bootstrap       Other                    7          0.1          17.2         0.0
-                                                     N/A          N/A           N/A         N/A   -Disk file operations I/O      User I/O               136          0.1           0.8         0.0
-                                                     N/A          N/A           N/A         N/A   -library cache: mutex X        Concurrency             21          0.0           1.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

Everything is just better: the IO times, the redo size, the number of IOPS and so on. That’s what I would have expected from the outset. Good to know: I’ll no longer use EXT4 for Oracle databases. When I discussed this with @fritshoogland, he pointed me to a post by @kevinclosson that hadn’t come to mind at the time:

http://kevinclosson.net/2012/03/06/yes-file-systems-still-need-to-support-concurrent-writes-yet-another-look-at-xfs-versus-ext4/
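
For anyone who wants to reproduce this kind of comparison: the output above is a standard AWR compare-periods report. Below is a minimal sketch of producing one from SQL*Plus, assuming the Diagnostics Pack is licensed; substitute your own DBID and snapshot IDs (49-50 and 55-56 are simply the pairs from my runs above). The interactive @?/rdbms/admin/awrddrpt.sql script produces the same report.


set pagesize 0 linesize 200

select output
from   table(dbms_workload_repository.awr_diff_report_text(
               &dbid, 1, 49, 50,    -- first period:  snapshots 49-50
               &dbid, 1, 55, 56));  -- second period: snapshots 55-56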

MySQL : What management tools do you use?

A quick question out to the world. What management tools do you use for MySQL?

We currently have:

  • MySQL Workbench : It’s OK, but I don’t really like it. It feels like half a product compared to tools I’ve used for other database engines…
  • phpMyAdmin : I’ve used this on and off for over a decade for my own website. While I’m typing this sentence, they’ve probably released 4 new versions. :) We have an installation of this which we use to access our MySQL databases should the need arise.
  • mysql Command Line : I use the command line and a variety of scripts for the vast majority of the things I do.

When I’m working with Oracle, my first port of call for any situation is to use SQL*Plus along with a variety of scripts I’ve created over the years. The performance stuff in Cloud Control (if you’ve paid for the Diagnostics and Tuning option) is the big exception to that of course.

I still consider myself a newbie MySQL administrator, but I’ve found myself spending more and more time at the command line, to the point where I rarely launch MySQL Workbench or phpMyAdmin these days. I’m wondering if that is common to other MySQL administrators, or if it is a carry over from my Oracle background…
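
For context, the sort of thing I lean on at the command line is nothing exotic. A couple of throwaway examples of the quick checks I mean, run from the mysql client as a suitably privileged user (illustrative only, not a recommendation):


-- what is running right now?
show full processlist;

-- rough space usage per schema, straight from the data dictionary
select table_schema,
       round(sum(data_length + index_length) / 1024 / 1024) as size_mb
from   information_schema.tables
group  by table_schema
order  by size_mb desc;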

Enquiring minds need to know!

Cheers

Tim…


MySQL : What management tools do you use? was first posted on October 31, 2014 at 9:35 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Index Advanced Compression vs. Bitmap Indexes (Candidate)

A good question from Robert Thorneycroft I thought warranted its own post. He asked: “I have a question regarding bitmapped indexes verses index compression. In your previous blog titled ‘So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2)’ you came to the conclusion that ‘500,000 distinct values in a 1 […]

HPC versus HDFS: Scientific versus Social

There have been rumblings from the HPC community indicating a general suspicion of and disdain for Big Data technology which would lead one to believe that whatever Google, Facebook and Twitter do with their supercomputers is not important enough to warrant seriousness—that social supercomputing is simply not worthy.  A little of this emotion seems to […]

Quiz night

Here’s a little puzzle that came up on OTN recently.  (No prizes for following the URL to find the answer) (Actually, no prizes anyway). There’s more in the original code sample than was really needed, so although I’ve done a basic cut and paste from the original I’ve also eliminated a few lines of the text:


execute dbms_random.seed(0)

create table t
as
select rownum as id,
       100+round(ln(rownum/3.25+2)) as val2,
       dbms_random.string('p',250) as pad
from dual
connect by level <= 1000
order by dbms_random.value;

begin
  dbms_stats.gather_table_stats(ownname          => user,
                                tabname          => 'T',
                                method_opt       => 'for all columns size 254'
  );
end;
/

column endpoint_value format 9999
column endpoint_number format 999999
column frequency format 999999

select endpoint_value, endpoint_number,
       endpoint_number - lag(endpoint_number,1,0)
                  OVER (ORDER BY endpoint_number) AS frequency
from user_tab_histograms
where table_name = 'T'
and column_name = 'VAL2'
order by endpoint_number
;

alter session set optimizer_mode = first_rows_100;

explain plan set statement_id = '101' for select * from t where val2 = 101;
explain plan set statement_id = '102' for select * from t where val2 = 102;
explain plan set statement_id = '103' for select * from t where val2 = 103;
explain plan set statement_id = '104' for select * from t where val2 = 104;
explain plan set statement_id = '105' for select * from t where val2 = 105;
explain plan set statement_id = '106' for select * from t where val2 = 106;

select statement_id, cardinality from plan_table where id = 0;

The purpose of the method_opt in the gather_table_stats() call is to ensure we get a frequency histogram on val2; and the query against the user_tab_histograms view should give the following result:


ENDPOINT_VALUE ENDPOINT_NUMBER FREQUENCY
-------------- --------------- ---------
           101               8         8
           102              33        25
           103             101        68
           104             286       185
           105             788       502
           106            1000       212

Given the perfect frequency histogram, the question then arises why the optimizer seems to calculate incorrect cardinalities for some of the queries; the output from the last query is as follows:


STATEMENT_ID                   CARDINALITY
------------------------------ -----------
101                                      8
102                                     25
103                                     68
104                                    100           -- expected prediction 185
105                                    100           -- expected prediction 502
106                                    100           -- expected prediction 212

I’ve disabled comments so that you can read the answer at OTN if you want to – but see if you can figure out the reason before reading it. (This reproduces on 11g and 12c – and probably on earlier versions all the way back to 9i).

I haven’t done anything extremely cunning with hidden parameters, materialized views, query rewrite, hidden function calls, virtual columns or any other very dirty tricks, by the way.

12.1.0.2 Introduction to Zone Maps Part II (Changes)

In Part I, I discussed how Zone Maps are new index-like structures, similar to Exadata Storage Indexes, that enable the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table, a zone being a range of contiguous (8M) blocks. I […]
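
For anyone who hasn’t seen the syntax yet, here is a minimal sketch of creating a basic zone map, using a hypothetical SALES table purely as an illustration (zone maps are a 12.1.0.2 feature aimed at engineered systems):


-- record the min and max of the listed columns for each zone of the table
create materialized zonemap sales_zmap
on sales (cust_id, prod_id);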

Metric Thresholds and the Power to Adapt

Metric thresholds have come a long way since I started working with OEM 10g.  I remember how frustrating it could be when a night-time ETL load forced the IO or CPU thresholds for a database to be set so high that a much lower value would have been preferable during business hours.  Explaining to the business why a notification wasn’t sent during the day, when the threshold had been set high enough to accommodate night-time batch processing, often went unaccepted.

With EM12c release 4, we now have Time-based Static thresholds and Adaptive thresholds.  Both are incredibly valuable for ensuring the administrator is aware of issues before they become a problem, even in environments with skewed workloads.

Both of these new features are available once you are logged into a target: from the target’s left-side menu, choose Monitoring, then Metric and Collection Settings.  Under the Metrics tab you will find a drop-down that can be changed from the default of Metrics with Thresholds to Time-based Static and Adaptive Thresholds, which lets you view any current setup for either of these advanced threshold management features.

adv_thresh_page2

To access the configuration, look below on the page for the Advanced Threshold Management link-

adv_thresh_page3

Time-Based Static Thresholds

The concept behind Time-based Static thresholds is that you have very specific workloads in a 24hr period and you wish to set thresholds based on that resource cycle.  This requires the administrator to be very familiar with the workload to set it correctly.  I understand this model very well: at most places where I’ve been the DBA, I was known for memorizing EXACTLY the standard flow of resource usage for any given database.

From the Metrics tab, the Time-based Static Threshold tab lets us configure, per target (host, database, cluster), the threshold values and time periods that make sense for that target; click on Register Metrics to begin.

This will take you to a Metric Selector page that helps you set up the time-based static thresholds for the target; remember, this is target specific.  You can choose to set up as many, or as few, metrics for a given target as you like.  The search option makes it easy to locate the metrics you’re after.

adaptive12

Choose which metrics you wish to set the time-based static thresholds for and click OK.

You can then set the values for each metric that was chosen for weekday or weekend, etc.

adaptive13

You will be warned that your metric thresholds will not be set until you hit the Save button.  Note: You won’t be able to click on it until you close this warning, as the Save button is BEHIND the pop-up warning.

If the default threshold change frequency of weekday day/night and weekend day/night is not adequate for the demands of the system workload, you can edit it to something more granular-

adv_thrhld5

Once you’ve chosen the frequency change, you can then set up the threshold values for the more comprehensive plan and save the changes.  That’s all there is to it, but I do recommend tweaking as necessary if any “white noise” pages result from the static settings.

Removing Time-based Static Thresholds

To remove a time-based static threshold for any metric(s), tick the Select box for each metric whose thresholds you wish to remove and click the Remove button.  You will be asked to confirm, and the metric(s) will revert to the default threshold values, or to the values set in a default monitoring template for the target type.

Adaptive Thresholds

Unlike Time-based Static Thresholds, which are based on settings configured manually, Adaptive Thresholds derive their threshold settings from a “collected” baseline.  This is more advanced than statically set thresholds, as the history of the workload captured in the baseline is taken into consideration when calculating the thresholds.  The most important thing to remember is to use a baseline whose snapshot window contains a clear example of the system’s standard workload.

There are two types of baselines, static and moving.  A static baseline is for a given snapshot of time and does not change.  A moving baseline is recollected on a regular interval and can be for anywhere from 7-31 days.

The reason to use a moving baseline over a static one is that a moving baseline incorporates changes to the workload over time, so the thresholds grow along with the system.  The drawback?  If a problem recurs on a regular interval, it may become part of the baseline and go uncaught, whereas a static baseline can be verified once and is not impacted by that type of change.
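
Conceptually, the adaptive approach is statistical: the computed thresholds end up near the high end of the metric’s observed distribution for the matching time group in the baseline. As a rough back-of-the-envelope illustration of that idea (not the actual EM algorithm; metric_history here is just a hypothetical table of collected metric values):


select time_group,
       percentile_cont(0.95) within group (order by metric_value) as warning_candidate,
       percentile_cont(0.99) within group (order by metric_value) as critical_candidate
from   metric_history
group  by time_group;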

After a baseline of performance metric data has been collected from a target, you can then access the Adaptive Thresholds configuration tab via the Advanced Threshold Management page.

From the Advanced Threshold Management page you can set up the defaults for the baseline type, the threshold change frequency, and how much accumulated baseline data the adaptive threshold values should be based on.

adaptive11

Once you choose the adaptive settings you would like to make active, click on the Save button to keep the configuration.

Now let’s add the metrics we want to configure adaptive thresholds for by clicking on Register Metrics-

adaptive14

You will be taken to a window similar to the one you saw for the Time-based Static Thresholds.  Drill down in the list and choose the metrics that could benefit from an adaptive threshold setting; once you are done choosing all the metrics you want, click OK.

Note:  Once you hit OK, there are no other settings to configure.  Cloud Control will complete the configuration from there, so ensure you have chosen the correct metrics to register for the target.

adaptive15

Advanced Reporting on Adaptive Thresholds

For any adaptive threshold that you have registered, you can click on Select (on the right side of the metric list) and view an analysis of the threshold data to see how well the adaptive thresholds are supporting the metric.

adaptvie16

You can also test out different values, preview how they would support the metric, and decide whether you want to move away from an adaptive threshold to a static one.

You can also click on Test All, which looks at previous data to show how, in theory, the adaptive thresholds would hold up in the future, based on how the data in the baseline has been analyzed for the frequency window.

For my metric, the baseline didn’t have enough history behind it to give much in the way of a response, but the screenshot gives you an idea of what you will be looking at-

adaptive18

Removing Adaptive Thresholds

If there is a metric that you no longer wish to have an adaptive threshold on, simply put a check mark in the metric’s Select box and then click on Deregister-

adaptive17

You will be asked if you want to continue; click Yes and the adaptive threshold will be removed from the target for the metric(s) checked.

Advanced threshold management offers the administrator a few more ways to gain definitive control over monitoring of targets via EM12c.  I haven’t found an environment yet that didn’t have at least one database or host that could benefit from these valuable features.
Copyright © DBA Kevlar [Metric Thresholds and the Power to Adapt], All Right Reserved. 2014.