Collecting and analysing Exadata cell metrics

Recently I have been asked to write a paper about Exadata Flash Cache and its impact on performance. This was a challenge to my liking! I won’t reproduce the paper I wrote, but I’d like to demonstrate the methods I used to get more information about what’s happening in the software.

Hardware

The Exadata Flash Cache is provided by four F20 PCIe cards in each cell. Currently the PCI Express bus is the most potent way to realise the potential of the flash disk in terms of latency and bandwidth. SSDs attached to a standard storage array will be slowed by fibre channel as the transport medium.

Each of the F20 cards holds 96 GB of raw space, totalling 384 GB of capacity per storage cell; the usable capacity is slightly less. Each F20 card is subdivided into four so-called FMODs, or solid state flash modules, which are visible to the operating system through the standard SCSI sd driver.

Cellcli can also be used to view the FMODs using the “LIST PHYSICALDISK” command. The output is slightly different from that of the spinning disks, as the FMODs are reported in the SCSI driver’s [host:bus:target:lun] notation.

Now please don’t be tempted to take the FMODs and transform them into celldisks!

How it works

The flash cache is a write-through cache and therefore won’t speed up write operations to disk. If an internal, undocumented algorithm decides that the data just written to disk is suitable for caching, it will put the data into the Exadata Smart Flash Cache (ESFC). It is quite clever in this respect and performs optimisations an “ordinary” cache cannot perform.

Smart scans are going to ignore the flash cache by default and read directly from disk, unless the segment’s storage clause specifically instructs Oracle to read from flash cache as well. Meddling with the storage clause is also referred to as object pinning, a practice which is not recommended. Pinning objects to the flash cache immediately reminded me of the bad practice in the old 8i days of pinning objects in the shared pool. It didn’t necessarily make things go faster or help (except in some special cases).
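For reference, pinning is done via the CELL_FLASH_CACHE attribute in the segment’s storage clause (valid values are NONE, DEFAULT and KEEP). A sketch, using the swingbench table from the test case below:

```
ALTER TABLE soe.order_items
  STORAGE (CELL_FLASH_CACHE KEEP);
```

Setting it back to DEFAULT returns the segment to the normal caching algorithm.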

The main statistics counter to query in relation to the smart cache is called “cell flash cache read hits”. The remainder of this paper has to do with the cell metrics, which cannot be queried directly from the database. They are more granular though, making them well suited for analysis.
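The counter is visible in the usual statistics views; a quick way to check it system-wide is:

```
SELECT name, value
FROM   v$sysstat
WHERE  name = 'cell flash cache read hits';
```

The same statistic is available per session via v$mystat joined to v$statname.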

The test case

The swingbench order entry benchmark has provided the data for the following text. The table ORDER_ITEMS contains most rows in the schema, and it is partitioned to reflect a realistic environment. Indexes have been made invisible as they may have impacted the performance statistics. Additionally cellcli’s WHERE clause isn’t very flexible and querying a number of objectIDs can be time consuming.

SOE.ORDER_ITEMS is divided into partitions with object IDs 74506 to 74570. These object IDs are required for mapping the output of cellcli to the data dictionary later.
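The partition object IDs can be looked up in the data dictionary; a sketch of the query I mean (assuming the SOE schema from the swingbench install):

```
SELECT object_id, subobject_name
FROM   dba_objects
WHERE  owner = 'SOE'
AND    object_name = 'ORDER_ITEMS'
AND    object_type = 'TABLE PARTITION'
ORDER  BY object_id;
```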

A little bit of setup has been performed to capture the cell metrics. For pre and post comparison it was necessary to capture statistics in a statspack-like way and to analyse them. Rather than using Excel I decided to store the gathered information in the database in a dedicated schema.

I gathered two different sets of information from the cells: firstly the metrics for the objectType “FLASHCACHE”, and secondly the contents of the flash cache using the fabulous “dcli” command. The distributed command line interface can be used to execute arbitrary commands on a number of hosts simultaneously and reports the results back to the caller. I am redirecting the output from STDOUT to a CSV file in my home directory and use external tables to copy this information into the so-called “RESULT” table.

The table DDL is as follows:

create table cacheContents_csv (
  cellname       varchar2(50),
  cachedKeepSize number,
  cachedSize     number,
  hitCount       number,
  missCount      number,
  objectNumber   number
)
organization external (
  type oracle_loader
  default directory testdir
  access parameters (
    records delimited by newline
    fields terminated by ';'
    missing field values are null
  )
  location ('flashcontents.csv')
)
reject limit unlimited;

CREATE TABLE cacheContents_RESULT AS
SELECT 1 AS TESTRUN#,
CELLNAME,
CACHEDKEEPSIZE,
CACHEDSIZE,
HITCOUNT,
MISSCOUNT,
OBJECTNUMBER
FROM cacheContents_csv A;

CREATE TABLE "METRICCURRENT_CSV"
(
"CELLNAME" VARCHAR2(100),
"NAME" VARCHAR2(50),
"OBJECTTYPE" VARCHAR2(50),
"METRICVALUE" VARCHAR2(100)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY TESTDIR
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
FIELDS LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"CELLNAME" (1:12) CHAR,
"NAME" (14:36) CHAR,
"OBJECTTYPE" (39:48) CHAR,
"METRICVALUE" (51:90) CHAR
)
)
location
(
'metrics.csv'
)
) REJECT LIMIT UNLIMITED;

CREATE TABLE "METRICCURRENT_RESULT"
(
"TESTRUN#"    NUMBER,
"CELLNAME"    VARCHAR2(100 BYTE),
"NAME"        VARCHAR2(50 BYTE),
"OBJECTTYPE"  VARCHAR2(50 BYTE),
"METRICVALUE" VARCHAR2(100 BYTE)
);

I additionally created a table called “metricdefinition” which allows me to match the short metric name from metriccurrent with a more human-readable description.

CREATE TABLE "METRICDEFINITION"
(
"NAME"        VARCHAR2(100 BYTE),
"DESCRIPTION" VARCHAR2(500 BYTE),
"METRICTYPE"  VARCHAR2(50 BYTE),
"OBJECTTYPE"  VARCHAR2(50 BYTE),
"UNIT"        VARCHAR2(20 BYTE)
);

The definitions, too, were loaded into the database using sqlldr.
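A minimal control file sketch for that load, assuming the definitions were exported to a “;”-separated file (the file name is illustrative):

```
LOAD DATA
INFILE 'metricdefinitions.csv'
INTO TABLE metricdefinition
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
(name, description, metrictype, objecttype, unit)
```

Invoked as “sqlldr control=metricdefinitions.ctl” with the appropriate credentials.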

First the metrics were captured:

# dcli -l root -g /tmp/martin_group 'cellcli -e "list METRICCURRENT where objecttype = \"FLASHCACHE\""' | sed -e 's/://' > /home/oracle/martin/metrics.csv

I admit it’s probably not the most elegant way to do so, but it worked for me, and I didn’t have weeks to prepare (as always…).

Now what does this command do? It executes a call to cellcli on all cells defined in /tmp/martin_group to list the current metrics for the flash cache. The result is then piped into sed, which strips the colon dcli appends to each cell name, and the output is saved to my TESTDIR directory on the first RDBMS node.
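The effect of the sed filter is easiest to see on a single sample line (the cell name and values here are made up):

```shell
# dcli prefixes each output line with "<cellname>: "; sed removes that
# first colon so the line splits cleanly on whitespace later on
echo 'exa1cel01: FC_IO_RQ_R FLASHCACHE 1,234' | sed -e 's/://'
# -> exa1cel01 FC_IO_RQ_R FLASHCACHE 1,234
```

Note that 's/://' only removes the first colon on each line, which is exactly the one trailing the cell name.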

Similarly I am using the following command to get the contents of the flash cache to see if anything has changed after executing the query.

# dcli -l root -g /tmp/martin_group 'cellcli -e "list flashcachecontent where dbUniqueName like \"TEST\" and objectNumber > 74505 and objectNumber < 74571 attributes cachedKeepSize,cachedSize,hitCount,missCount,objectNumber"' | sed -e 's/://' | awk '{print $1";"$2";"$3";"$4";"$5";"$6}' > /home/oracle/martin/flashcontents.csv

This looks a little more complex, but it isn’t. The apparent complexity is required to limit the result set to the contents of the flash cache for database TEST and the object_ids of the table partitions. There is a little more transformation involved with AWK to convert the output into a “;”-separated line.
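Again, a made-up sample line shows what the awk stage does to each record after sed has removed the colon:

```shell
# rejoin the six whitespace-separated fields with semicolons so the
# line matches the external table's "fields terminated by ';'" clause
echo 'exa1cel01 0 1048576 10 2 74510' | awk '{print $1";"$2";"$3";"$4";"$5";"$6}'
# -> exa1cel01;0;1048576;10;2;74510
```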

With this done, it was possible to transfer the information in the staging tables into the RESULT tables. This is a simple “insert into table select testrun#, * from …CSV”.
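As a sketch, the copy for the cache contents looks like this (the testrun number is supplied manually per capture):

```
INSERT INTO cachecontents_result
SELECT 2 AS testrun#, a.*
FROM   cachecontents_csv a;
```

The metrics staging table is copied the same way into metriccurrent_result.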

The analysis I did was very simple once the data had been loaded into the RESULT tables. For the flash cache metrics and contents I used the following queries:

WITH prestats AS
( SELECT * FROM metriccurrent_result WHERE testrun# = &1
),
poststats AS
( SELECT * FROM metriccurrent_result WHERE testrun# = &2
)
SELECT pre.cellname,
--pre.name,
defn.description,
pre.metricvalue pre_metricvalue,
post.metricvalue post_metricvalue,
to_number(REGEXP_SUBSTR(post.metricvalue, '(\S*)'),'999G999G999D999999', 'NLS_NUMERIC_CHARACTERS = ''.,''') -
to_number(REGEXP_SUBSTR( pre.metricvalue, '(\S*)'),'999G999G999D999999', 'NLS_NUMERIC_CHARACTERS = ''.,''') delta
FROM prestats pre,
poststats post,
metricdefinition defn
WHERE pre.cellname = post.cellname
and pre.name = post.name
and pre.objecttype = post.objecttype
and pre.name = defn.name(+)
and pre.name = 'FC_IO_BYKEEP_R'
and (to_number(REGEXP_SUBSTR(post.metricvalue, '(\S*)'),'999G999G999D999999', 'NLS_NUMERIC_CHARACTERS = ''.,''') -
to_number(REGEXP_SUBSTR( pre.metricvalue, '(\S*)'),'999G999G999D999999', 'NLS_NUMERIC_CHARACTERS = ''.,''') ) <> 0
order by 1,2;

WITH prestats AS
( SELECT * FROM cachecontents_result WHERE testrun# = &1
),
poststats AS
( SELECT * FROM cachecontents_result WHERE testrun# = &2
)
SELECT pre.cellname,
pre.objectnumber,
pre.cachedkeepsize pre_cachedkeepsize,
pre.cachedsize pre_cachedsize,
pre.hitcount pre_hitcount,
pre.misscount pre_misscount,
post.cachedkeepsize post_cachedkeepsize,
post.cachedsize post_cachedsize,
post.hitcount post_hitcount,
post.misscount post_misscount,
(post.cachedkeepsize-pre.cachedkeepsize) delta_cachedkeepsize,
(post.cachedsize-pre.cachedsize) delta_cachedsize,
(post.hitcount-pre.hitcount) delta_hitcount,
(post.misscount-pre.misscount) delta_misscount
FROM prestats pre,
poststats post
WHERE pre.cellname   = post.cellname
AND pre.objectnumber = post.objectnumber
ORDER BY 1,2;
This revealed quite a lot about the underlying workings of the flash cache. The above queries are best executed in SQL Developer; the result can then be copied and pasted into a spreadsheet application to guard against accidental overwriting.

Summary

The above certainly isn’t bulletproof or very elegant, but it is a good starting point for your own experiments. Time permitting, I would like to automate the generation of the CSV files via a scheduled job, and also the population of the result tables. There really should be some sort of unique key on the result tables to avoid duplicates. Maybe one day I’d like to write a Statspack-like version of the lot…