Oakies Blog Aggregator

fsfreeze in Linux

The fsfreeze command is used to suspend and resume access to a file system, which allows consistent snapshots to be taken of the filesystem. fsfreeze supports ext3/4, ReiserFS, JFS and XFS.

A filesystem can be frozen using the following command:

# /sbin/fsfreeze -f /data">
# /sbin/fsfreeze -f /data

Any process that now writes to this filesystem will hang. For example, the following command will be stuck in the D (uninterruptible sleep) state:

# echo">
# echo "testing" > /data/file

It can continue only after the filesystem is unfrozen with the following command:

# /sbin/fsfreeze -u /data">
# /sbin/fsfreeze -u /data

As per the fsfreeze man page, “fsfreeze is unnecessary for device-mapper devices. The device-mapper (and LVM) automatically freezes filesystem on the device when a snapshot creation is requested.”

fsfreeze is provided by the util-linux package on RHEL systems. Along with userspace support, fsfreeze also requires kernel support.

For example, in the following case, fsfreeze was used on the ext4 filesystem of an AWS CentOS node:

# fsfreeze -f /mysql">
# fsfreeze -f /mysql
fsfreeze: /mysql: freeze failed: Operation not supported

From strace we found that the ioctl call was returning EOPNOTSUPP:

fstat(3,">
fstat(3, {st_dev=makedev(253, 0), st_ino=2, st_mode=S_IFDIR|0755,
st_nlink=4, st_uid=3076, st_gid=1119, st_blksize=4096, st_blocks=8,
st_size=4096, st_atime=2014/05/20-10:58:56,
st_mtime=2014/11/17-01:39:36, st_ctime=2014/11/17-01:39:36}) = 0
ioctl(3, 0xc0045877, 0) = -1 EOPNOTSUPP (Operation not supported)

From the latest upstream kernel source:

static int ioctl_fsfreeze">
static int ioctl_fsfreeze(struct file *filp)
{
	struct super_block *sb = file_inode(filp)->i_sb;

	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	/* If filesystem doesn't support freeze feature, return. */
	if (sb->s_op->freeze_fs == NULL)
		return -EOPNOTSUPP;

	/* Freeze */
	return freeze_super(sb);
}

EOPNOTSUPP is returned when a filesystem does not support the feature.

When we tested freezing an ext4 filesystem on a CentOS system built from an AWS community AMI, fsfreeze worked fine.

This means that the issue was specific to the kernel of the original system: the AMI used to build it had a customized kernel without fsfreeze support.

Ingest a Single Table from Microsoft SQL Server Data into Hadoop

Introduction

This blog describes the best-practice approach for ingesting data from a single SQL Server table into Hadoop. The scenario is as follows:

  • Single table ingestion (no joins)
  • No partitioning
  • Complete data ingestion (trash old and replace new)
  • Data stored in Parquet format

Pre-requisites

This example has been tested using the following versions:

  • Hadoop 2.5.0-cdh5.3.0
  • Hive 0.13.1-cdh5.3.0
  • Sqoop 1.4.5-cdh5.3.0
  • Oozie client build version: 4.0.0-cdh5.3.0

Process Flow Diagram

[Image: process flow diagram]

Configuration

  • Create the following directory/file structure (one per data ingestion process). For a new ingestion program please adjust the directory/file names as per requirements. Make sure to replace the <table_name> tag with your table name.

<table_name>_ingest
+ hive-<table_name>
    create-schema.hql
+ oozie-properties
    <table_name>.properties
+ oozie-<table_name>-ingest
    + lib
        kite-data-core.jar
        kite-data-mapreduce.jar
        sqljdbc4.jar
    coordinator.xml
    impala_metadata.sh
    workflow.xml

  • The ingestion process is invoked using an oozie workflow. The workflow invokes all steps necessary for data ingestion, including pre-processing, ingestion using sqoop, and post-processing.

oozie-<table_name>-ingest
This directory stores all files that are required by the oozie workflow engine. These files should be stored in HDFS for proper functioning of oozie.

oozie-properties
This directory stores the <table_name>.properties file, which holds the oozie variables (database users, name node details, etc.) used by the oozie process at runtime.

hive-<table_name>
This directory stores a file called create-schema.hql, which contains the schema definition of the HIVE tables. This file needs to be run in HIVE only once.

  • Configure files under oozie-<table_name>-ingest
1. Download the kite-data-core.jar and kite-data-mapreduce.jar files from http://mvnrepository.com/artifact/org.kitesdk
2. Download sqljdbc4.jar from https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
3. Configure coordinator.xml. Copy and paste the following XML.

<coordinator-app name="<table_name>-ingest-coordinator" frequency="${freq}" start="${startTime}" end="${endTime}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${workflowRoot}/workflow.xml</app-path>
      <configuration>
        <property>
          <name>partition_name</name>
          <value>${coord:formatTime(coord:nominalTime(), 'YYYY-MM-dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>

4. Configure workflow.xml. This workflow has three actions:

a) mv-data-to-old – deletes old data before refreshing new
b) sqoop-ingest-<table_name> – sqoop action to fetch the table from SQL Server
c) invalidate-impala-metadata – revalidate Impala data after each refresh

Copy and paste the following XML.

<workflow-app name="<table_name>-ingest" xmlns="uri:oozie:workflow:0.2">
  <start to="mv-data-to-old"/>

  <action name="mv-data-to-old">
    <fs>
      <delete path='${sqoop_directory}/<table_name>/*.parquet' />
      <delete path='${sqoop_directory}/<table_name>/.metadata' />
    </fs>
    <ok to="sqoop-ingest-<table_name>"/>
    <error to="fail"/>
  </action>

  <action name="sqoop-ingest-<table_name>">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <arg>import</arg>
      <arg>--connect</arg>
      <arg>${db_string}</arg>
      <arg>--table</arg>
      <arg>${db_table}</arg>
      <arg>--columns</arg>
      <arg>${db_columns}</arg>
      <arg>--username</arg>
      <arg>${db_username}</arg>
      <arg>--password</arg>
      <arg>${db_password}</arg>
      <arg>--split-by</arg>
      <arg>${db_table_pk}</arg>
      <arg>--target-dir</arg>
      <arg>${sqoop_directory}/<table_name></arg>
      <arg>--as-parquetfile</arg>
      <arg>--compress</arg>
      <arg>--compression-codec</arg>
      <arg>org.apache.hadoop.io.compress.SnappyCodec</arg>
    </sqoop>
    <ok to="invalidate-impala-metadata"/>
    <error to="fail"/>
  </action>

  <action name="invalidate-impala-metadata">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>${impalaFileName}</exec>
      <file>${impalaFilePath}</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Workflow failed with error message ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>

  <end name="end"/>
</workflow-app>

5. Configure impala_metadata.sh. This file executes commands to revalidate the impala metadata after each refresh. Copy and paste the following data.

#!/bin/bash
export PYTHON_EGG_CACHE=./myeggs
impala-shell -i <impala_host> -q "invalidate metadata <hive_db_name>.<table_name>"
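
For clarity, the statement the script runs is ordinary Impala SQL. Filled in for a hypothetical database and table, it is equivalent to running the following in impala-shell:

-- Hypothetical values: database "salesdb", table "orders"
INVALIDATE METADATA salesdb.orders;
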
  • Configure files under oozie-properties. Create the file <table_name>.properties with contents as under. Edit the parameters as per requirements.

# Coordinator schedulings
freq=480
startTime=2015-04-28T14:00Z
endTime=2029-03-05T06:00Z
jobTracker=<job_tracker>
nameNode=hdfs://<name_node>
queueName=<queue_name>
rootDir=${nameNode}/user/<user_name>/oozie
workflowRoot=${rootDir}/<table_name>-ingest
oozie.use.system.libpath=true
oozie.coord.application.path=${workflowRoot}/coordinator.xml

# Sqoop settings
sqoop_directory=${nameNode}/data/sqoop

# Hive/Impala Settings
hive_db_name=<hive_db_name>
impalaFileName=impala_metadata.sh
impalaFilePath=/user/oozie/<table_name>-ingest/impala_metadata.sh

# MS SQL Server settings
db_string=jdbc:sqlserver://<sql_server_host>;databaseName=<database_name>
db_username=<database_user>
db_password=<database_password>
db_table=<table_name>
db_columns=<column_list>

  • Configure files under hive-<table_name>. Create a new file create-schema.hql with contents as under.

DROP TABLE IF EXISTS <table_name>;

CREATE EXTERNAL TABLE <table_name> (<column_definitions>)
STORED AS PARQUET
LOCATION 'hdfs://<name_node>/data/sqoop/<table_name>';
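
For illustration, here is what a filled-in create-schema.hql might look like for a hypothetical table called orders (the column definitions and the name node are made up):

-- Hypothetical create-schema.hql for a table named "orders"
DROP TABLE IF EXISTS orders;

CREATE EXTERNAL TABLE orders (
  order_id   INT,
  customer   STRING,
  order_date TIMESTAMP
)
STORED AS PARQUET
LOCATION 'hdfs://namenode.example.com/data/sqoop/orders';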

Deployment

  • Create a new directory in HDFS and copy the files

$ hadoop fs -mkdir /user/<user_name>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <local_path>/<table_name>_ingest/oozie-<table_name>-ingest/lib /user/<user_name>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <local_path>/<table_name>_ingest/oozie-<table_name>-ingest/coordinator.xml /user/<user_name>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <local_path>/<table_name>_ingest/oozie-<table_name>-ingest/impala_metadata.sh /user/<user_name>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <local_path>/<table_name>_ingest/oozie-<table_name>-ingest/workflow.xml /user/<user_name>/oozie/<table_name>-ingest

  • Create new directories in HDFS for storing the data files

$ hadoop fs -mkdir /user/SA.HadoopPipeline/oozie/<table_name>-ingest
$ hadoop fs -mkdir /data/sqoop/<table_name>
  • Now we are ready to create the table definition in HIVE. Go to URL http://<hue_host>:8888/beeswax/#query.
a. Choose an existing database on the left or create a new one.
b. Paste the contents of create-schema.hql in the Query window and click Execute.
c. You should now have an external table in HIVE pointing to the data in hdfs://<name_node>/data/sqoop/<table_name>
  • Create the Oozie job

$ oozie job -run -config /home/<user_name>/<table_name>_ingest/oozie-properties/<table_name>.properties

Validation and Error Handling

  • At this point an oozie job should have been created. To validate the oozie job creation, open URL http://<hue_host>:8888/oozie/list_oozie_coordinators. Expected output as under; in case of error, please review the logs for the recent runs.

[Screenshot: list of Oozie coordinators]

  • To validate that the oozie job is running, open URL http://<hue_host>:8888/oozie/list_oozie_workflows/. Expected output as under; in case of error, please review the logs for the recent runs.

[Screenshot: list of Oozie workflows]

  • To validate the data in HDFS, execute the following command. You should see a file with the *.metadata extension and a number of files with the *.parquet extension.

$ hadoop fs -ls /data/sqoop/<table_name>/

  • Now we are ready to select data in HIVE or Impala.
    For HIVE go to URL http://<hue_host>:8888/beeswax/#query
    For Impala go to URL http://<hue_host>:8888/impala
    Choose the newly created database on the left and execute the following SQL – select * from <table_name> limit 10
    You should see the data from the newly ingested table.
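
As a final sanity check (again with a hypothetical table name), a couple of queries along these lines confirm the ingestion end to end:

-- Hypothetical smoke test for a freshly ingested table "orders"
SELECT * FROM orders LIMIT 10;

-- This count should match the row count of the source SQL Server table
SELECT COUNT(*) FROM orders;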

Instance stats


While reading a posting by Martin Bach on a new buffering option for 12c I was prompted to take a look at another of his posts on the instance activity stats, which reminded me that the class column on v$statname is a bit flag we can dissect using the bitand() function to pick out the statistics that belong to multiple classes. I’ve got two or three little scripts that do this: one, for example, picks out all the statistics relating to RAC; another is just a cross-tab of the class values used and their breakdown by class. Originally this latter script used the “diagonal” method of decode() then sum() – but when the 11g pivot() option appeared I rewrote it as an experiment in pivoting.

This is the script as it now stands, with the output from 12.1.0.2:

select
        *
from    (
        select
                st.class,
                pwr.class_id,
                case bitand(st.class, pwr.expn)
                        when 0 then to_number(null)
                               else 1
                end     class_flag
        from
                v$statname      st,
                (select
                        level                   class_id,
                        power(2,level - 1)      expn
                from
                        dual
                connect by level <= 8
                )       pwr
        where
                bitand(class,pwr.expn) = pwr.expn
        )
pivot   (
                sum(class_flag)
        for     class_id in (
                        1 as EndUser,
                        2 as Redo,
                        3 as Enqueue,
                        4 as Cache,
                        5 as OS,
                        6 as RAC,
                        7 as SQL,
                        8 as Debug
                )
        )
order by
        class
;


     CLASS    ENDUSER       REDO    ENQUEUE      CACHE         OS        RAC        SQL      DEBUG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         1        130
         2                    68
         4                                9
         8                                         151
        16                                                     16
        32                                                                35
        33          3                                                      3
        34                     1                                           1
        40                                          53                    53
        64                                                                          130
        72                                          15                               15
       128                                                                                     565
       192                                                                            2          2

13 rows selected.

The titles given to the columns come from Martin’s blog, but the definitive set is in the Oracle documentation in the reference manual for v$statname. (I’ve changed the first class from “User” to “EndUser” because of a reserved word problem, and I abbreviated the “RAC” class for tidiness.) It’s interesting to note how many of the RAC statistics are also about the Cache layer.
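
As a quick illustration of the bit-flag arithmetic, here’s the sort of query (a simple sketch) that picks out the statistics flagged with two classes at once – in this case RAC (32) and Cache (8), i.e. the class value 40 you can see in the cross-tab above:

select  name, class
from    v$statname
where   bitand(class, 40) = 40
order by
        name
;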

Additional information on Oracle 12c big table caching

Teaching is one of the things I like doing, and currently I am investigating the Oracle 12c features around caching data in the various memory areas. Since the In-Memory (cost) option has been discussed by other, far more knowledgeable people, I would like to share some findings about big table caching here.

Some Background

In Oracle 12c you have two additional options to cache information: full database caching and big table caching. The first is great if you have a massively big machine with lots and lots of DRAM, plus a clever OS that can deal with the inevitable ccNUMA setup you will have to tackle, and maybe you don’t want to pay for the In-Memory option. This post is not about full database caching, though, but about the other possibility for caching blocks.

This other option is to dedicate an area within the buffer cache to just a few tables. That’s right: after the keep and recycle pools you can now tweak the default buffer cache. In fact you tell Oracle, by means of an initialisation parameter, how much of the default buffer cache can be used to cache full scans – anywhere from zero percent (the default) up to 90%; you must leave at least 10% for OLTP workloads. I haven’t really investigated the pre-12c caching mechanism in detail (hey, it works!) but I have read that full scans are not cached by default to avoid thrashing the cache.
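
The parameter concerned is db_big_table_cache_percent_target, which you will see queried below; assuming your version allows it to be changed on the fly, a minimal sketch to set it would be:

SQL> alter system set db_big_table_cache_percent_target = '50';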

Big Table Caching

Now what? We can cache full scans too. Interesting – let’s try it. I am using 12.1.0.2.2 on an ODA, but that shouldn’t really matter. The SGA is 24 GB in size, and my segment to be scanned (a non-partitioned table without indexes) is about 20 GB in size. I want 50% of the buffer cache allocated for big table caching.

SQL> select component, current_size/power(1024,2) size_mb
  2  from v$sga_dynamic_components where current_size <> 0;

COMPONENT                                                           SIZE_MB
---------------------------------------------------------------- ----------
shared pool                                                            2688
large pool                                                              192
java pool                                                               128
DEFAULT buffer cache                                                  20992
Shared IO Pool                                                          512

SQL> show parameter db_big_table_cache_percent_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_big_table_cache_percent_target    string      50

SQL> select * from v$bt_scan_cache;

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            0                0            1000          0

SQL> select bytes/power(1024,2) m from dba_segments
  2  where owner = 'MARTIN' and segment_name = 'T1';

         M
----------
     20864

Now let’s see if we can make use of this. 50% of 20 GB is about 10 GB usable for the scan cache. If I start a query against T1 in another session I can see the object count increase.

SQL> r
  1* select * from v$bt_scan_cache

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            1          1292363            1000          0

Subsequent queries against T1 will increase the temperature and potentially the cached information. The “object temperature” is a method Oracle uses to determine the suitability of an object to be cached. In my very basic example there is only one table which has been full-scanned so far. Later on I’ll add T2 to the mix. The temperature and other object information are reflected in the second view, v$bt_scan_obj_temps:

SQL> select * from v$bt_scan_obj_temps;

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        4000 MEM_PART         1292711          0

Session Counters!

I have developed a habit of looking into session counters when running queries to see if there is anything of interest. Using snapper with the before/after snapshot I got this:

SQL> @snapper4 all,end 5 1 163
Sampling SID 163 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
    163, MARTIN    , STAT, user calls                                                ,             3,        .06,         ,             ,          ,           ,          3 per execution
...
    163, MARTIN    , STAT, physical read total IO requests                           ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total multi block requests                  ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total bytes                                 ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, consistent gets                                           ,       2666722,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, consistent gets from cache                                ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin                                       ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin (fastpath)                            ,       1292768,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets direct                                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, physical reads                                            ,       1374849,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical reads cache                                      ,           128,       2.67,         ,             ,          ,           ,        128 per execution
    163, MARTIN    , STAT, physical reads direct                                     ,       1374721,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical read IO requests                                 ,         10752,     224.51,         ,             ,          ,           ,      1.05M bytes per request
    163, MARTIN    , STAT, physical read bytes                                       ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, data warehousing scanned objects                          ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, data warehousing scanned blocks                           ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - memory                  ,       1292715,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - disk                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, table scans (short tables)                                ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, table scan rows gotten                                    ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan disk non-IMC rows gotten                       ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan blocks gotten                                  ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
...
    163, MARTIN    , STAT, execute count                                             ,             1,        .02,         ,             ,          ,           ,          1 executions per parse
...
    163, MARTIN    , TIME, parse time elapsed                                        ,            26,      .54us,      .0%, [          ],          ,           ,
    163, MARTIN    , TIME, DB CPU                                                    ,       5682136,   118.65ms,    11.9%, [@@        ],          ,           ,
    163, MARTIN    , TIME, sql execute elapsed time                                  ,      33220099,   693.66ms,    69.4%, [#######   ],          ,           ,
    163, MARTIN    , TIME, DB time                                                   ,      33220306,   693.66ms,    69.4%, [#######   ],          ,           ,       -.32 % unaccounted time
    163, MARTIN    , WAIT, db file scattered read                                    ,         24636,   514.41us,      .1%, [          ],         1,        .02,    24.64ms average wait
    163, MARTIN    , WAIT, direct path read                                          ,      27536664,   574.98ms,    57.5%, [WWWWWW    ],     10694,      223.3,     2.57ms average wait
    163, MARTIN    , WAIT, SQL*Net message to client                                 ,             3,      .06us,      .0%, [          ],         2,        .04,      1.5us average wait
    163, MARTIN    , WAIT, SQL*Net message from client                               ,      14826292,   309.58ms,    31.0%, [WWWW      ],         2,        .04,      7.41s average wait

--  End of Stats snap 1, end=2015-05-11 10:13:13, seconds=47.9

Don’t worry about the wide output – all you need in the output are the STATISTIC and DELTA columns. This can all be a bit overwhelming at first, so let me guide you through it; statistics that are not important for the discussion have already been removed.

First of all you see the physical reads. 10752 I/O requests were issued, all of them multi-block requests (physical read total IO requests and physical read total multi block requests, repeated also in physical read IO requests). Hmm – 10 GB – isn’t that 50% of my buffer cache? It might actually be possible that 10 GB of the 20 GB table were read from disk using direct path reads, and the other 10 GB came from memory.

What is interesting is the next set of counters – “data warehousing scanned %” – which I hadn’t seen until now. In fact, if I run @TanelPoder’s statn.sql script in my session I can see there are more of them than appear in the snapper output:

SQL> @statn.sql warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          5
       238 EE          1904 data warehousing scanned blocks                                    13333340
       239 EF          1912 data warehousing scanned blocks - memory                            6462815
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                              6870525
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

Interesting! I’m quite glad Oracle gave us so many statistics that describe the way big table caching works. I have queried T1 five times in my session, scanning 13333340 blocks in total. The distribution between memory and disk is 6462815 to 6870525, almost 50:50. It looks as if you can use flash and offloading for this too (note to self: repeat the test on Exadata). That makes sense given the example just shown: the segment to be scanned is 20 GB, of which 10 GB are in the buffer cache. If the rest of the segment can be scanned using direct path reads, as in the above example, then it is more than likely that you can offload the scan as well.

I can even see whether there was space pressure on the big table cache: the “data warehousing evicted%” statistics hint at space management.

Purely from memory?

Looking at the session counters above, my table is too big to be cached entirely; I would like to see more scans served completely from memory :) So I created a table with 25% of the size of T1, called T2, using the sample clause.
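
The exact DDL isn’t shown here, but a minimal sketch of the sample-clause approach would be:

SQL> create table t2 as select * from t1 sample (25);

Initial scans against T2 showed disk scans only (policy = DISK):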

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        8000 MEM_PART         1293081          0
         4      39465       668107        5000 DISK                   0          0

But after the 9th scan (each scan increased the temperature by 1000) the situation changed:

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39465       668107        9000 MEM_ONLY          668107          0
         4      39461      2669763        8000 MEM_PART          624974          0

The smaller table now fits into memory and took over! The result is visible in the execution time:

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:05.39

...

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.25

So to prove the point I reconnected to the database and ran my test again:


SQL> select min(date_created) from t2;

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.36

SQL> @statn warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          1
       238 EE          1904 data warehousing scanned blocks                                      666966
       239 EF          1912 data warehousing scanned blocks - memory                             666966
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                                    0
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

So the popularity of the object plays a big role in working out which segment to cache as well. This is an interesting feature, worth testing in your development environments. I have no idea whether or not it requires separate licensing, so please make sure you check with Oracle before using it to avoid surprises.

Troubleshooting Target Status Availability Issues

If you’ve been using EM12c (or any of its precursors for that matter), you’ll know that it can sometimes be problematic to troubleshoot an availability issue for targets. You can see they might be up (hopefully!), down, pending, unreachable or showing a metric collection error, but understanding what’s causing that particular status (and indeed why it can sometimes be wrong) can be difficult at times.

Thankfully there are a couple of Information Publisher reports in EM12c from 12.1.0.3 onwards (a patch, patch# 16457404, is available to get the functionality in EM 12.1.0.2) to help with troubleshooting these target availability issues. The two reports are called “Target Status Diagnostics Report: Agent-based targets” and “Target Status Diagnostics Report: Repository-based targets”. The agent-based targets report is used for diagnosing issues with databases, listeners or ASM environments, while the repository-based targets report is used for clustered targets – the cluster itself, clustered databases, or clustered ASM. The reports can be useful for issues such as these:

  • Listener status down, pending, metric error, or agent unreachable
  • Database instance status down, pending, metric error, or agent unreachable
  • Database system status down, pending, metric error, or agent unreachable
  • Cluster database status down, pending, or metric error
  • Cluster ASM status down, pending, or metric error
  • ASM Instance status down, pending, metric error, or agent unreachable
  • Automatic Storage Management status down, pending, metric error, or agent unreachable
  • Agent status Agent Unreachable (Under Migration), Agent Unreachable (Cannot Write to File System), Agent Unreachable (Collections Disabled), Agent Unreachable (Disk Full), Agent Unreachable (Post Blackout), Agent Unreachable (Communication Broken), Unreachable (Agent Misconfigured), or Agent Unreachable
  • Agent status Agent Blocked (Blocked Manually), Agent Blocked (Plug-in Mismatch), or Agent Blocked (Bounce Counter Mismatch)
  • Agent status Status Pending (Target Addition in Progress), Status Pending (Post Blackout), Status Pending (Post Metric Error), or Agent status Status Pending

Let’s take a look at an environment where my single instance test database is showing a status of Agent Unreachable, and use the “Target Status Diagnostics Report: Agent-based targets” report to identify why.  Here’s a short video I produced that shows you how to do that:

Hope that helps someone with diagnosing these sorts of target availability issues!

Parallel Query

According to the Oracle Database VLDB and Partitioning Guide (10g version and 11g version):

A SELECT statement can be executed in parallel only if the following conditions are satisfied:

  • The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLEL declaration associated with them.
  • At least one table specified in the query requires one of the following:
    • A full table scan
    • An index range scan spanning multiple partitions
  • No scalar subqueries are in the SELECT list.

Note, particularly, that last restriction. I was looking at a query recently that seemed to be breaking this rule so, after examining the 10053 trace file for a while, I decided that I would construct a simplified model of the client’s query to demonstrate how the manuals can tell you the truth while being completely deceptive or (conversely) be wrong while still giving a perfectly correct impression. So here’s a query, with execution plan, from 11.2.0.4:

select
        /*+ parallel(t1 2) */
        d1.small_vc,
        t1.r1,
        t2.n21,
        t2.v21,
        t3.v31,
        (select max(v1) from ref1 where n1 = t2.n21)    ref_t2,
        (select max(v1) from ref2 where n1 = t1.r1)     ref_t1,
        t1.padding
from
        driver          d1,
        t1, t2, t3
where
        d1.n1 = 1
and     t1.n1 = d1.id
and     t1.n2 = 10
and     t1.n3 = 10
and     t2.id = t1.r2
and     t3.id = t1.r3
;

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |   100 | 15700 |  1340   (3)| 00:00:07 |        |      |            |
|   1 |  SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| REF1     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|*  3 |    INDEX UNIQUE SCAN         | R1_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|   4 |  SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|   5 |   TABLE ACCESS BY INDEX ROWID| REF2     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|*  6 |    INDEX UNIQUE SCAN         | R2_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|   7 |  PX COORDINATOR              |          |       |       |            |          |        |      |            |
|   8 |   PX SEND QC (RANDOM)        | :TQ10003 |   100 | 15700 |  1340   (3)| 00:00:07 |  Q1,03 | P->S | QC (RAND)  |
|*  9 |    HASH JOIN                 |          |   100 | 15700 |  1340   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|* 10 |     HASH JOIN                |          |   100 | 14700 |  1317   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|* 11 |      HASH JOIN               |          |   100 | 13300 |  1294   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|  12 |       BUFFER SORT            |          |       |       |            |          |  Q1,03 | PCWC |            |
|  13 |        PX RECEIVE            |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|  14 |         PX SEND BROADCAST    | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |        | S->P | BROADCAST  |
|* 15 |          TABLE ACCESS FULL   | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |        |      |            |
|  16 |       PX BLOCK ITERATOR      |          |   100 | 12000 |  1290   (3)| 00:00:07 |  Q1,03 | PCWC |            |
|* 17 |        TABLE ACCESS FULL     | T1       |   100 | 12000 |  1290   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|  18 |      BUFFER SORT             |          |       |       |            |          |  Q1,03 | PCWC |            |
|  19 |       PX RECEIVE             |          | 10000 |   136K|    23   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  20 |        PX SEND BROADCAST     | :TQ10001 | 10000 |   136K|    23   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  21 |         TABLE ACCESS FULL    | T2       | 10000 |   136K|    23   (5)| 00:00:01 |        |      |            |
|  22 |     BUFFER SORT              |          |       |       |            |          |  Q1,03 | PCWC |            |
|  23 |      PX RECEIVE              |          | 10000 |    97K|    23   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  24 |       PX SEND BROADCAST      | :TQ10002 | 10000 |    97K|    23   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  25 |        TABLE ACCESS FULL     | T3       | 10000 |    97K|    23   (5)| 00:00:01 |        |      |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=:B1)
   6 - access("N1"=:B1)
   9 - access("T3"."ID"="T1"."R3")
  10 - access("T2"."ID"="T1"."R2")
  11 - access("T1"."N1"="D1"."ID")
  15 - filter("D1"."N1"=1)
  17 - filter("T1"."N2"=10 AND "T1"."N3"=10)

Thanks to my hint the query has been given a parallel execution plan – and a check of v$pq_tqstat after running the query showed that it had indeed run in parallel. Note, however, where the PX SEND QC and PX COORDINATOR operations appear – lines 7 and 8 – and that the two scalar subqueries sit above those lines.

This means we’re running the basic select statement as a parallel query but the query co-ordinator has serialised on the scalar subqueries in the select list. Is the manual “right but deceptive” or “wrong but giving the right impression”? Serialising on (just) the scalar subqueries can have a huge impact on performance and effectively make the query behave like a serial query even though, technically, the statement has run as a parallel query.

You may recall that an example of this type of behaviour, and its side effects when the scalar subqueries executed independently as parallel queries, showed up some time ago. At the time I said I would follow up with a note about the change in behaviour in 12c; this seems to be an appropriate moment to show the 12c plan(s), first the default:


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |   100 | 19100 |  1364   (3)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR              |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10005 |   100 | 19100 |  1364   (3)| 00:00:01 |  Q1,05 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED        |          |   100 | 19100 |  1364   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  4 |     HASH JOIN OUTER          |          |   100 | 18100 |  1340   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  5 |      HASH JOIN               |          |   100 | 16400 |  1335   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  6 |       HASH JOIN OUTER        |          |   100 | 15000 |  1311   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  7 |        HASH JOIN             |          |   100 | 13300 |  1306   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|   8 |         PX RECEIVE           |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,05 | PCWP |            |
|   9 |          PX SEND BROADCAST   | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |
|  10 |           PX SELECTOR        |          |       |       |            |          |  Q1,00 | SCWC |            |
|* 11 |            TABLE ACCESS FULL | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,00 | SCWP |            |
|  12 |         PX BLOCK ITERATOR    |          |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,05 | PCWC |            |
|* 13 |          TABLE ACCESS FULL   | T1       |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|  14 |        PX RECEIVE            |          |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,05 | PCWP |            |
|  15 |         PX SEND BROADCAST    | :TQ10001 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |
|  16 |          PX SELECTOR         |          |       |       |            |          |  Q1,01 | SCWC |            |
|  17 |           VIEW               | VW_SSQ_1 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,01 | SCWC |            |
|  18 |            HASH GROUP BY     |          |  1000 | 10000 |     5  (20)| 00:00:01 |  Q1,01 | SCWC |            |
|  19 |             TABLE ACCESS FULL| REF2     |  1000 | 10000 |     4   (0)| 00:00:01 |  Q1,01 | SCWP |            |
|  20 |       PX RECEIVE             |          | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,05 | PCWP |            |
|  21 |        PX SEND BROADCAST     | :TQ10002 | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |
|  22 |         PX SELECTOR          |          |       |       |            |          |  Q1,02 | SCWC |            |
|  23 |          TABLE ACCESS FULL   | T2       | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,02 | SCWP |            |
|  24 |      PX RECEIVE              |          |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,05 | PCWP |            |
|  25 |       PX SEND BROADCAST      | :TQ10003 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,03 | S->P | BROADCAST  |
|  26 |        PX SELECTOR           |          |       |       |            |          |  Q1,03 | SCWC |            |
|  27 |         VIEW                 | VW_SSQ_2 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,03 | SCWC |            |
|  28 |          HASH GROUP BY       |          |  1000 | 10000 |     5  (20)| 00:00:01 |  Q1,03 | SCWC |            |
|  29 |           TABLE ACCESS FULL  | REF1     |  1000 | 10000 |     4   (0)| 00:00:01 |  Q1,03 | SCWP |            |
|  30 |     PX RECEIVE               |          | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,05 | PCWP |            |
|  31 |      PX SEND BROADCAST       | :TQ10004 | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,04 | S->P | BROADCAST  |
|  32 |       PX SELECTOR            |          |       |       |            |          |  Q1,04 | SCWC |            |
|  33 |        TABLE ACCESS FULL     | T3       | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,04 | SCWP |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T3"."ID"="T1"."R3")
   4 - access("ITEM_2"(+)="T2"."N21")
   5 - access("T2"."ID"="T1"."R2")
   6 - access("ITEM_1"(+)="T1"."R1")
   7 - access("T1"."N1"="D1"."ID")
  11 - filter("D1"."N1"=1)
  13 - filter("T1"."N2"=10 AND "T1"."N3"=10)

The first thing to note is the location of the PX SEND QC and PX COORDINATOR operations – right at the top of the plan: there’s no serialisation at the query coordinator. Then we spot the views at operations 17 and 27 – VW_SSQ_1 and VW_SSQ_2 (SSQ presumably standing for “scalar subquery”). The optimizer has unnested the scalar subqueries out of the select list into the join. When a scalar subquery in the select list returns no data its value is deemed to be NULL, so the joins (operations 4 and 6) have to be outer joins.

You’ll notice that there are a lot of PX SELECTOR operations – each feeding a PX SEND BROADCAST operation that reports its “IN-OUT” column as S->P (i.e. serial to parallel). Historically a serial to parallel operation started with the query coordinator doing the serial bit, but in 12c the optimizer can dictate that one of the PX slaves should take on that task (see Randolf Geist’s post here). Again my code to report v$pq_tqstat confirmed this behaviour, in a way that we shall see shortly.

This type of unnesting is a feature of 12c and in some cases will be very effective. It is a cost-based decision, though, and the optimizer can make mistakes; fortunately we can control the feature. We could simply set optimizer_features_enable back to 11.2.0.4 (perhaps through a hint) and this would take us back to the original plan, but that isn’t the best option in this case. There is a hidden parameter _optimizer_unnest_scalar_sq enabling the feature, so we could, in principle, disable just this one feature by setting that parameter to false; a more appropriate strategy, though, is simply to tell the optimizer that it should not unnest the subqueries. In my case I could put the /*+ no_unnest */ hint into both subqueries, or use the qb_name() hint to give the two subquery blocks names and then use the /*+ no_unnest(@my_qb_name) */ form at the top of the main query.
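
A sketch of that hinted form (the query block names ssq1 and ssq2 are just names I’ve picked for illustration):

select
        /*+
                parallel(t1 2)
                no_unnest(@ssq1)
                no_unnest(@ssq2)
        */
        d1.small_vc,
        t1.r1,
        t2.n21,
        t2.v21,
        t3.v31,
        (select /*+ qb_name(ssq1) */ max(v1) from ref1 where n1 = t2.n21)    ref_t2,
        (select /*+ qb_name(ssq2) */ max(v1) from ref2 where n1 = t1.r1)     ref_t1,
        t1.padding
from
        driver          d1,
        t1, t2, t3
where
        d1.n1 = 1
and     t1.n1 = d1.id
and     t1.n2 = 10
and     t1.n3 = 10
and     t2.id = t1.r2
and     t3.id = t1.r3
;

Here’s the execution plan I get whether I use the hidden parameter or the /*+ no_unnest */ mechanism: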

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |       |       |  1554 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                 |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)           | :TQ10003 |   100 | 15700 |  1354   (3)| 00:00:01 |  Q1,03 | P->S | QC (RAND)  |
|   3 |    EXPRESSION EVALUATION        |          |       |       |            |          |  Q1,03 | PCWC |            |
|*  4 |     HASH JOIN                   |          |   100 | 15700 |  1354   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN                  |          |   100 | 14700 |  1330   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|*  6 |       HASH JOIN                 |          |   100 | 13300 |  1306   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|   7 |        BUFFER SORT              |          |       |       |            |          |  Q1,03 | PCWC |            |
|   8 |         PX RECEIVE              |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|   9 |          PX SEND BROADCAST      | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL     | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |        |      |            |
|  11 |        PX BLOCK ITERATOR        |          |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,03 | PCWC |            |
|* 12 |         TABLE ACCESS FULL       | T1       |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|  13 |       BUFFER SORT               |          |       |       |            |          |  Q1,03 | PCWC |            |
|  14 |        PX RECEIVE               |          | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  15 |         PX SEND BROADCAST       | :TQ10001 | 10000 |   136K|    24   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  16 |          TABLE ACCESS FULL      | T2       | 10000 |   136K|    24   (5)| 00:00:01 |        |      |            |
|  17 |      BUFFER SORT                |          |       |       |            |          |  Q1,03 | PCWC |            |
|  18 |       PX RECEIVE                |          | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  19 |        PX SEND BROADCAST        | :TQ10002 | 10000 |    97K|    24   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  20 |         TABLE ACCESS FULL       | T3       | 10000 |    97K|    24   (5)| 00:00:01 |        |      |            |
|  21 |     SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|  22 |      TABLE ACCESS BY INDEX ROWID| REF1     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|* 23 |       INDEX UNIQUE SCAN         | R1_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|  24 |     SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|  25 |      TABLE ACCESS BY INDEX ROWID| REF2     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|* 26 |       INDEX UNIQUE SCAN         | R2_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."ID"="T1"."R3")
   5 - access("T2"."ID"="T1"."R2")
   6 - access("T1"."N1"="D1"."ID")
  10 - filter("D1"."N1"=1)
  12 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."N2"=10 AND "T1"."N3"=10))
  23 - access("N1"=:B1)
  26 - access("N1"=:B1)

Note particularly that the PX SEND QC and PX COORDINATOR operations are operations 2 and 1, and that we have a new operation, EXPRESSION EVALUATION, at operation 3. This has three child operations – the basic select starting at operation 4, and the two scalar subqueries starting at lines 21 and 24. We are running the scalar subqueries as correlated subqueries, but we don’t leave all the work to the query coordinator – each slave runs its own subqueries before forwarding the final result to the coordinator. There is a little side effect that goes with this change – the “serial to parallel” operations are now, as they always used to be, driven by the query co-ordinator, and the PX SELECTOR operations have disappeared.

And finally

Just to finish off, let’s take a look at the results from v$pq_tqstat in 12.1.0.2.
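
The figures below came from a simple query against the view – something like this sketch (the original reporting script isn’t shown in this post):

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes, waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

First, the results from the default plan with the PX SELECTOR operations. Remember that this turned into a five-table join where two of the “tables” were non-correlated aggregate queries against the reference tables.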


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P002                   200       2428          0          0           0
                                             1 P003                     0         48          0          0           0
                      Consumer               1 P000                   100       1238         59         27           0
                                             1 P001                   100       1238         41         24           0

                    1 Producer               1 P002                  2000      23830          0          0           0
                                             1 P003                     0         48          0          0           0
                      Consumer               1 P000                  1000      11939         57         26           0
                                             1 P001                  1000      11939         41         24           0

                    2 Producer               1 P002                     0         48          0          0           0
                                             1 P003                 20000     339732          0          0           0
                      Consumer               1 P000                 10000     169890         49         22           0
                                             1 P001                 10000     169890         31         21           0

                    3 Producer               1 P002                     0         48          0          0           0
                                             1 P003                  2000      23830          0          0           0
                      Consumer               1 P000                  1000      11939         58         26           0
                                             1 P001                  1000      11939         38         23           0

                    4 Producer               1 P002                     0         48          0          0           0
                                             1 P003                 20000     239986          0          0           0
                      Consumer               1 P000                 10000     120017         50         22           0
                                             1 P001                 10000     120017         34         21           0

                    5 Producer               1 P000                     1        169          0          0           0
                                             1 P001                     1        169          1          0           0
                      Consumer               1 QC                       2        338          3          0           0

As you read down the table queues you can see that in the first five table queues (0 – 4) we seem to operate parallel to parallel, but only one of the two producers (p002 and p003) produces any data at each stage. A more traditional plan would show QC as the single producer in each of these stages.

Now with scalar subquery unnesting blocked – the plan with the three-table join and EXPRESSION EVALUATION – we see the more traditional serial to parallel distribution: the producer is QC in all three of the first table queues (the full scan and broadcast of tables driver, t2, and t3).

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 QC                     200       1726          0          0           0
                      Consumer               1 P000                   100       1614         28         15           0
                                             1 P001                   100       1614         34         13           0

                    1 Producer               1 QC                   20000     339732          0          0           0
                      Consumer               1 P000                 10000     169866         19         10           0
                                             1 P001                 10000     169866         25          8           0

                    2 Producer               1 QC                   20000     239986          0          0           0
                      Consumer               1 P000                 10000     119993         23         11           0
                                             1 P001                 10000     119993         31         11           0

                    3 Producer               1 P000                     1        155          1          0           0
                                             1 P001                     1        155          0          0           0
                      Consumer               1 QC                       2        310          3          1           0

It’s an interesting point that this last set of results is identical to the set produced in 11g – you can’t tell from v$pq_tqstat whether the parallel slaves or the query co-ordinator executed the subqueries. You have to look at the output from SQL trace (or similar) and check the individual Rowsource Execution Statistics for the slaves and coordinator to see which process actually ran the subqueries.