Oakies Blog Aggregator

Oracle EMEA Customer Support Services Excellence Award 2014

Oracle announced today that I have received the Customer Support Services Excellence Award 2014 in the category ‘Customer Champion’ for the EMEA region. It is an honor to be listed there together with the excellent professionals that I proudly call colleagues.

CSS Excellence Award 2014

Blog articles at Toad World

I just wanted to point you to the blog articles I've been posting over at Toad World in case you're wondering why there aren't many articles showing up here.

So, go take a look!

Karen's Toad World Blog

Oracle Midlands Event #5 : Summary

Oracle Midlands Event #5 happened last night.

First up was Martin Widlake speaking about clustering data to improve performance. The cool and scary thing about Oracle is you often go into a session like this thinking it’s all going to be stuff you already know, then you realise how much you either didn’t know in the first place, or had forgotten. A couple of times Martin asked questions of the audience and I felt myself shrinking back in my seat and chanting the mantra, “Don’t pick me!”, in my head. :)

After food and some chatting there was a freebie prize draw. I won a RedGate t-shirt. I shall have to remember to take it to OOW and “accidentally” wear it whilst walking past the RedGate booth. :)

Next up was Ron Ekins speaking about data virtualization and instant cloning. The session started off with a general introduction to the subject and a quick look at the products available to achieve it. This session also included a live demo of Delphix. The more I hear about Delphix, the more impressed I am. Judging by the Oracle geeks they’ve drawn into the company in recent years (Kyle Hailey, Steve Karam and Tim Gorman), it seems I’m not alone in that opinion. :)

Thanks to RedGate for sponsoring the event, allowing it to remain free. Thanks to Mike Dirden for organising the event. Thanks to the speakers for travelling to Birmingham to teach us some stuff. Thanks also to the attendees, without whom this would not happen. Please keep spreading the word. :)

Cheers

Tim…

PS. The next event will be with Bjoern Rost in November. Keep an eye on the Oracle Midlands site for details, which should be appearing in the next few days.



About index range scans, disk re-reads and how your new car can go 600 miles per hour!

Despite the title, this is actually a technical post about Oracle, disk I/O and Exadata & Oracle In-Memory Database Option performance. Read on :)

If a car dealer tells you that this fancy new car on display goes 10 times (or 100 or 1000) faster than any of your previous ones, then either the salesman is lying or this new car is doing something radically different from all the old ones. You don’t just get orders of magnitude performance improvements by making small changes.

Perhaps the car bends space around it instead of moving – or perhaps it has a jet engine built on it (like the one below :-) :

Anyway, this blog entry is a prelude to my upcoming Oracle In-Memory Database Option series and here I’ll explain one of the radical differences between the old way of thinking and modern (In-Memory / Smart Scan) thinking that allow such performance improvements.

To set the scope and clarify what I mean by the “old way of thinking”: I am talking about reporting, analytics and batch workloads here – and the decades-old mantra “if you want more speed, use more indexes”.

I’m actually not going to talk about the In-Memory DB option here – but I am going to walk you through the performance numbers of one index range scan. It’s a deliberately simple and synthetic example executed on my laptop, but it should be enough to demonstrate one important point.

Let’s say we have a report that requires me to visit 20% of rows in an orders table and I’m using an index range scan to retrieve these rows (let’s not discuss whether that’s wise or not just yet). First, I’ll give you some background information about the table and index involved.

My test server’s buffer cache is currently about 650 MB:

SQL> show sga

Total System Global Area 2147483648 bytes
Fixed Size                  2926472 bytes
Variable Size             369100920 bytes
Database Buffers          687865856 bytes
Redo Buffers               13848576 bytes
In-Memory Area           1073741824 bytes

The table I am accessing is a bit less than 800 MB in size, about 100k blocks:

SQL> @seg soe.orders

    SEG_MB OWNER  SEGMENT_NAME   SEGMENT_TYPE    BLOCKS 
---------- ------ -------------  ------------- -------- 
       793 SOE    ORDERS         TABLE           101504 

I have removed some irrelevant output below; I will be using the ORD_WAREHOUSE_IX index for my demo:

SQL> @ind soe.orders
Display indexes where table or index name matches %soe.orders%...

TABLE_OWNER  TABLE_NAME  INDEX_NAME         POS# COLUMN_NAME     DSC
------------ ----------- ------------------ ---- --------------- ----
SOE          ORDERS      ORDER_PK              1 ORDER_ID
                         ORD_WAREHOUSE_IX      1 WAREHOUSE_ID
                                               2 ORDER_STATUS

INDEX_OWNER  TABLE_NAME  INDEX_NAME        IDXTYPE    UNIQ STATUS   PART TEMP  H  LFBLKS       NDK   NUM_ROWS      CLUF LAST_ANALYZED     DEGREE VISIBILIT
------------ ----------- ----------------- ---------- ---- -------- ---- ---- -- ------- --------- ---------- --------- ----------------- ------ ---------
SOE          ORDERS      ORDER_PK          NORMAL/REV YES  VALID    NO   N     3   15801   7148950    7148950   7148948 20140913 16:17:29 16     VISIBLE
             ORDERS      ORD_WAREHOUSE_IX  NORMAL     NO   VALID    NO   N     3   17860      8685    7148950   7082149 20140913 16:18:03 16     VISIBLE

I am going to do an index range scan on the WAREHOUSE_ID column:

SQL> @descxx soe.orders

Col# Column Name                    Null?      Type                      NUM_DISTINCT        Density  NUM_NULLS HISTOGRAM       NUM_BUCKETS Low Value                        High Value
---- ------------------------------ ---------- ------------------------- ------------ -------------- ---------- --------------- ----------- -------------------------------- --------------------------------
   1 ORDER_ID                       NOT NULL   NUMBER(12,0)                   7148950   .00000013988          0                           1 1                                7148950
...
   9 WAREHOUSE_ID                              NUMBER(6,0)                        999   .00100100100          0                           1 1                                999
...

Also, I enabled SQL trace and event 10298 – “ORA-10298: ksfd i/o tracing”, more about that later:

SQL> ALTER SESSION SET EVENTS '10298 trace name context forever, level 1';

Session altered.

SQL> EXEC SYS.DBMS_MONITOR.SESSION_TRACE_ENABLE(waits=>TRUE);

PL/SQL procedure successfully completed.

SQL> SET AUTOTRACE ON STAT

Ok, now we are ready to run the query! (It’s slightly formatted):

SQL> SELECT /*+ MONITOR INDEX(o, o(warehouse_id)) */ 
         SUM(order_total) 
     FROM 
         soe.orders o 
     WHERE 
         warehouse_id BETWEEN 400 AND 599;

Let’s check the basic autotrace figures:

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    1423335  consistent gets
     351950  physical reads
          0  redo size
        347  bytes sent via SQL*Net to client
        357  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

What?! We have done 351950 physical reads?! That is 351950 blocks read via physical read operations – about 2.7 GB worth of IO done just for this query! Our entire table size was under 800MB and the index size under 150MB. Shouldn’t indexes allow us to visit fewer blocks than the table size?!

Let’s dig deeper – by breaking down this IO number by execution plan line (using a SQL Monitoring report in this case):

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|      48 |      25 |       23 |     1 |     1M | 352K |   3GB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=16715356)
=============================================================================================================================================
| Id |               Operation                |       Name       | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       |
|    |                                        |                  |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |
=============================================================================================================================================
|  0 | SELECT STATEMENT                       |                  |     1 |        1 |      |       |          |                             |
|  1 |   SORT AGGREGATE                       |                  |     1 |        1 |      |       |          |                             |
|  2 |    TABLE ACCESS BY INDEX ROWID BATCHED | ORDERS           |     1 |       1M | 348K |   3GB |    96.30 | Cpu (1)                     |
|    |                                        |                  |       |          |      |       |          | db file parallel read (25)  |
|  3 |     INDEX RANGE SCAN                   | ORD_WAREHOUSE_IX |     1 |       1M | 3600 |  28MB |     3.70 | db file sequential read (1) |
=============================================================================================================================================

So, most of these IOs come from accessing the table (after fetching the relevant ROWIDs from the index). 96% of this query’s response time was also spent in that table access line. We did about 348,000 IO requests to fetch blocks from this table – over 3x more blocks than the entire table contains! So we must be re-reading some blocks from disk again and again for some reason.

Let’s confirm whether we really are getting re-reads. This is why I enabled SQL trace and event 10298: I can just post-process the tracefile and see whether IO operations with the same file# and block# combination show up.

However, using just SQL trace isn’t enough: multiblock read wait events don’t show every block read (you’d have to infer them from the starting block# and the block count), and the “db file parallel read” wait doesn’t show any block#/file# info at all in SQL Trace, as this “vector read” wait event encompasses multiple different block reads under a single wait event.

The classic single block read has the file#/block# info:

WAIT #139789045903344: nam='db file sequential read' ela= 448 file#=2 block#=1182073 blocks=1 obj#=93732 tim=156953721029

The parallel read wait events don’t have individual file#/block# info (just total number of files/blocks involved):

WAIT #139789045903344: nam='db file parallel read' ela= 7558 files=1 blocks=127 requests=127 obj#=93696 tim=156953729450

Anyway, because we had plenty of db file parallel read waits that don’t show all the detail in SQL Trace, I also enabled event 10298, which gives us the details below (only a tiny excerpt shown):

...
ksfd_osdrqfil:fob=0xce726160 bufp=0xbd2be000 blkno=1119019 nbyt=8192 flags=0x4
ksfdbio:rq=0x7f232c4edb00 fob=0xce726160 aiopend=126
ksfd_osdrqfil:fob=0xce726160 bufp=0x9e61a000 blkno=1120039 nbyt=8192 flags=0x4
ksfdbio:rq=0x7f232c4edd80 fob=0xce726160 aiopend=127
ksfdwtio:count=127 aioflags=0x500 timeout=2147483647 posted=(nil)
...
ksfdchkio:ksfdrq=0x7f232c4edb00 completed=1
ksfdchkio:ksfdrq=0x7f232c4edd80 completed=0
WAIT #139789045903344: nam='db file parallel read' ela= 6872 files=1 blocks=127 requests=127 obj#=93696 tim=156953739197

So, on Oracle 12.1.0.2 on Linux x86_64, with an xfs filesystem, async IO enabled and filesystemio_options = SETALL, the “ksfd_osdrqfil” trace entries show us the block# Oracle read from a datafile. They don’t show the file# itself, but they do show the accessed file state object address (FOB) in the SGA, and as it was always the same in this tracefile, I know that any duplicate block numbers listed in the trace are for the same datafile (and not for a block with the same block# in some other datafile). The tablespace I used for my test had a single datafile anyway.

Anyway, I wrote a simple script to summarize whether there were any disk re-reads in this tracefile (of a select statement):

$ grep ^ksfd_osdrqfil LIN121_ora_11406.trc | awk '{ print $3 }' | sort | uniq -c | sort -nr | head -20
     10 blkno=348827
     10 blkno=317708
      9 blkno=90493
      9 blkno=90476
      9 blkno=85171
      9 blkno=82023
      9 blkno=81014
      9 blkno=80954
      9 blkno=74703
      9 blkno=65222
      9 blkno=63899
      9 blkno=62977
      9 blkno=62488
      9 blkno=59663
      9 blkno=557215
      9 blkno=556581
      9 blkno=555412
      9 blkno=555357
      9 blkno=554070
      9 blkno=551593
...

Indeed! The “worst” blocks have been read in 10 times – all that for a single query execution.

I only showed the top 20 blocks here, but even when I used "head -10000" and "head -50000" above, I still saw blocks that had been read into the buffer cache 8 and 4 times respectively.

Looking at the earlier autotrace metrics, my simple index range scan query read in over 3x more blocks than the total table and index size combined (~350k blocks read while the table had only 100k blocks)! Some blocks got kicked out of the buffer cache and were re-read back into the cache later, multiple times.
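
As a side note – this is not part of the original trace analysis, just a generic sanity check you could run yourself – V$BH lets you count how many distinct blocks of the ORDERS table are actually sitting in the buffer cache at a given moment:

-- count distinct cached blocks of SOE.ORDERS (generic V$BH / DBA_OBJECTS lookup)
SELECT COUNT(DISTINCT file# || '.' || block#) AS cached_blocks
FROM   v$bh
WHERE  objd = (SELECT data_object_id
               FROM   dba_objects
               WHERE  owner = 'SOE'
               AND    object_name = 'ORDERS'
               AND    object_type = 'TABLE');

Comparing that count against the ~100k blocks in the table gives a feel for how little of the query’s “working set” actually stays cached while the range scan runs.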

Hmm, let’s think further: we are accessing only about 20% of an 800 MB table + 150 MB index, so the “working set” of datablocks used by my query should be well under my 650 MB buffer cache, right? And as I am the only user in this database, everything should nicely fit and stay in the buffer cache, right?

Actually, both of the arguments above are flawed:

  1. Accessing 20% of the rows in a table doesn’t automatically mean that we need to visit only 20% of that table’s blocks! Maybe every block of the table contains a few of the rows this index range scan needs? In that case we might need to visit all (or most) of the table’s blocks and extract only a few matching rows from each block. Nevertheless, the “working set” of required blocks for this query would then include almost all of the table’s blocks, not only 20%, and we must read all of them in at some point during the range scan. In other words, the matching rows are not tightly packed in the table blocks in correspondence with the index range scan’s driving order, but are potentially “randomly” scattered all over the table. This means that an index range scan may come back and access some data block again and again to get yet another row from it whenever the ROWID entries in the index leaf blocks point there. This is what I call buffer re-visits. (Now scroll back up and see what that index’s clustering factor is – a quick way of looking it up directly is sketched just after this list. :-)

  2. So what – all the buffer re-visits should be really fast, as the previously read block is going to be in the buffer cache, right? Well, not really. Especially when the working set of blocks read is bigger than the buffer cache. But even if it is smaller, the Oracle buffer cache isn’t managed using basic LRU replacement logic (since 8.1.6). New blocks that get read into the buffer cache are put into the middle of the “LRU” list, and they work their way up to the “hot” end only if they are touched enough times before someone manages to flush them out. So even if you are the single user of the buffer cache, there’s a chance that some recently read blocks get aged out of the buffer cache – by the same query that is still running – before they get hot enough. And this means that your next buffer re-visit may turn into a disk block re-read, which is what we saw in the tracefiles. If you combine this with the reality of production systems, where there are a thousand more users trying to do what you’re doing at the same time, it becomes clear that you’ll be able to use only a small portion of the total buffer cache for your needs. This is why people sometimes configure KEEP pools – not because the KEEP pool is somehow able to keep more blocks in memory for longer per GB of RAM, but simply to segregate the less important troublemakers from the more important… troublemakers :)
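
If you don’t have a script like the @ind one used above to hand, a plain data dictionary query shows the same clustering factor information (this is just a generic DBA_INDEXES lookup, not one of the scripts used in this post):

-- CLUSTERING_FACTOR close to the table's block count -> well-clustered rows, few buffer re-visits
-- CLUSTERING_FACTOR close to the index NUM_ROWS      -> scattered rows, lots of re-visits per range scan
SELECT index_name, clustering_factor, num_rows
FROM   dba_indexes
WHERE  owner = 'SOE'
AND    table_name = 'ORDERS';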

 

So what’s my point here – in the context of this blog post’s title?

Let’s start from Exadata – over the last years it has given many customers order(s) of magnitude better analytics, reporting and batch performance compared to their old systems – if done right, of course. In other words, instead of indexing even more and performing wide index range scans with millions of random block reads and re-reads, they ditched many indexes and started doing full table scans. Full table scans do not have the kind of scaling problems that a wide index range scan (or a “wide” nested loop join driving access to another table) has. In addition you get all the cool stuff that goes really well with full scans – multiblock reads, deep prefetching, partition-wise hash joins, partition pruning and, of course, all the throughput and Smart Scan magic on Exadata.

An untuned complex SQL statement on a complex schema with lots of non-ideal indexes may end up causing a lot of “waste IO” (I don’t have a better term) and, similarly, wasted CPU too. And often it’s not simple to actually fix the query, as it may end up needing a significant schema adjustment/redesign that would also require changing the application code in many different places (ain’t gonna happen). By defaulting reporting to full table scans, you can actually eliminate a lot of such waste, assuming that you have a high-throughput – and ideally smart – IO subsystem. (Yes, there are always exceptions and special cases.)

We had a customer who had a reporting job that ran almost 2000x faster after moving to Exadata (from 17 hours to 30 seconds or something like that). Their first reaction was: “It didn’t run!” Indeed it did run and it ran correctly. Such radical improvement came from the fact that the new system – compared to the old system – was doing multiple things radically better. It wasn’t just an incremental tweak of adding a hint or a yet another index without daring to do more significant changes.

In this post I demoed just one of the problems that plague many of the old-school Oracle DW and reporting systems. While favoring full table scans has always been counterintuitive for most Oracle shops out there, it was Exadata’s hardware, software and also the geek excitement surrounding it that allowed customers to take the leap and switch from the old mindset to the new. I expect the same from the Oracle In-Memory Database Option. More about this in a following post.

 


12c In-Memory on RAC

I started looking into In-Memory on RAC this week. Data can be distributed across RAC nodes in a couple of different ways. The default is to spread it across the available nodes in the cluster. So if you had a 2 node cluster, roughly 50% of the data in your table or partition would be loaded into the column store in each of the 2 instances.

SYS@dw1> alter table kso.skew inmemory;
 
Table altered.
 
SYS@dw1> @gen_ddl
Enter value for object_type: 
Enter value for owner: KSO
Enter value for object_name: SKEW
 
DDL
--------------------------------------------------------------------------------
 
  CREATE TABLE "KSO"."SKEW"
   (    "PK_COL" NUMBER,
        "COL1" NUMBER,
        "COL2" VARCHAR2(30),
        "COL3" DATE,
        "COL4" VARCHAR2(1),
         PRIMARY KEY ("PK_COL")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 INVISIBLE COMPUTE STATISTICS
  STORAGE(INITIAL 865075200 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 1480589312 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
  INMEMORY PRIORITY NONE MEMCOMPRESS FOR QUERY LOW
  DISTRIBUTE AUTO NO DUPLICATE                             <--- here's the RAC bit
   CACHE
 
SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 
 
OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
                                                                             ----------------
sum
 
no rows selected
 
SYS@dw1> select count(*) from kso.skew;
 
  COUNT(*)
----------
  32000004
 
SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 
 
OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            391.4        1.7              749.4
                                                                             ----------------
sum                                                                                     391.4
SYS@dw1> -- so about half the data is loaded in the local instance column store
SYS@dw1> -- let's see what's in the other instance's cache
SYS@dw1> l
  1  SELECT v.owner, v.segment_name,
  2  v.bytes/(1024*1024) orig_size_megs,
  3  v.inmemory_size/(1024*1024) in_mem_size_megs,
  4  (v.bytes - v.bytes_not_populated) / v.inmemory_size comp_ratio,
  5  v.bytes_not_populated/(1024*1024) megs_not_populated
  6  FROM v$im_segments v
  7  where owner like nvl('&owner',owner)
  8* and segment_name like nvl('&segment_name',segment_name)
SYS@dw1> l6
  6* FROM v$im_segments v
SYS@dw1> c/v$/gv$/
  6* FROM gv$im_segments v
SYS@dw1> /
Enter value for owner: 
Enter value for segment_name: 
 
OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            569.1        1.6              526.6
KSO                            SKEW                                  1,413.0            391.4        1.7              749.4
                                                                             ----------------
sum                                                                                     960.5

So in this example with a non-partitioned table, the data is automatically spread (roughly 50/50) across the 2 instances. That’s pretty cool. It allows the column store to be scaled by adding RAC nodes. For partitioned tables there are a couple of options as to how the data can be distributed: you can distribute by partition or by rowid range. The default for a partitioned table appears to be to distribute partitions to the nodes in a round-robin fashion, not ordered by number of rows or size (as I would have expected) – but then again I haven’t done much testing on it. Anyway, here’s a simple example using a partitioned table.

SYS@dw1> create table kso.skewp partition by range (col1) (partition "P1" values less than (2), partition p2 values less than (100000), partition p3 values less than (maxvalue)) as select * from kso.skew;
 
Table created.
 
SYS@dw1> select count(*) from kso.skewp partition (p1);
 
  COUNT(*)
----------
   3199971
 
SYS@dw1> select count(*) from kso.skewp partition (p2);
 
  COUNT(*)
----------
      8512
 
SYS@dw1> select count(*) from kso.skewp partition (p3);
 
  COUNT(*)
----------
  28791521
 
SYS@dw1> alter table kso.skewp inmemory;
 
Table altered.
 
SYS@dw1> select count(*) from kso.skewp;
 
  COUNT(*)
----------
  32000004
 
SYS@dw1> @inmem_part_segs
Enter value for owner: 
Enter value for segment_name: SKEWP
 
OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
KSO                            SKEWP                          P3                                    1,220.0            349.6        1.5              689.3
                                                                                                            ----------------
sum                                                                                                                    405.1
 
SYS@dw1> @inmem_part_segs_g
Enter value for owner: 
Enter value for segment_name: SKEWP
 
   INST_ID OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
---------- ------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
         1 KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
         1 KSO                            SKEWP                          P3                                    1,220.0            878.1        1.4                 .0
         2 KSO                            SKEWP                          P2                                        8.0              1.1        7.1                 .0
                                                                                                                       ----------------
sum                                                                                                                               934.7
 
SYS@dw1> -- very skewed distribution, P1 and P3 on node 1, and P2 on node 2
SYS@dw1> -- let's try again
SYS@dw1> alter table kso.skewp inmemory distribute by rowid range;
 
Table altered.
 
SYS@dw1> select count(*) from kso.skewp;
 
  COUNT(*)
----------
  32000004
 
SYS@dw1> @inmem_part_segs
Enter value for owner: 
Enter value for segment_name: SKEWP
 
set lines 200
OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
KSO                            SKEWP                          P3                                    1,220.0            605.2        1.4              358.9
                                                                                                            ----------------
sum                                                                                                                    660.6
 
 
SYS@dw1> @inmem_part_segs_g.sql
Enter value for owner: 
Enter value for segment_name: SKEWP
 
   INST_ID OWNER                          SEGMENT_NAME                   PARTITION_NAME                 ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
---------- ------------------------------ ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
         1 KSO                            SKEWP                          P1                                      128.0             55.4        2.3                 .0
         1 KSO                            SKEWP                          P3                                    1,220.0            605.2        1.4              358.9
         2 KSO                            SKEWP                          P2                                        8.0              1.1        7.1                 .0
         2 KSO                            SKEWP                          P3                                    1,220.0            263.0        1.6              793.5
                                                                                                                       ----------------
sum                                                                                                                               924.8

So the default distribution mode for partitioned tables appears to be round robin by partition. This can result in very skewed data distributions, as in this example. Using the DISTRIBUTE BY ROWID RANGE syntax we can get a more even distribution of data across the nodes. In this case it put the 2 smaller partitions on separate nodes and split the largest partition between the two (asymmetrically, by the way, so that the data ended up spread pretty much equally).

There is another option which allows all the data to be duplicated across the instances. This is a high availability option: if a node needs to be restarted, query times could be impacted until the data is reloaded. Note that queries would still execute without error, but the data would have to be processed through the normal row-oriented path. The DUPLICATE option populates the column store of both nodes with all the data for the object, in order to minimize the performance impact if a node goes down. The documentation says this feature is only available on engineered systems, by the way. Let’s have a look.

SYS@dw1> alter table kso.skew inmemory duplicate;
 
Table altered.
 
SYS@dw1> @gen_ddl
Enter value for object_type: 
Enter value for owner: KSO
Enter value for object_name: SKEW
 
DDL
--------------------------------------------------------------------------------
 
  CREATE TABLE "KSO"."SKEW"
   (    "PK_COL" NUMBER,
        "COL1" NUMBER,
        "COL2" VARCHAR2(30),
        "COL3" DATE,
        "COL4" VARCHAR2(1),
         PRIMARY KEY ("PK_COL")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 INVISIBLE COMPUTE STATISTICS
  STORAGE(INITIAL 865075200 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 1480589312 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
  INMEMORY PRIORITY NONE MEMCOMPRESS FOR QUERY LOW
  DISTRIBUTE AUTO DUPLICATE
   CACHE
 
SYS@dw1> select count(*) from kso.skew;
 
  COUNT(*)
----------
  32000004
 
SYS@dw1> @inmem_segs
Enter value for owner: 
Enter value for segment_name: 
 
OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED
------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------
KSO                            SKEW                                  1,413.0            956.4        1.5                 .0
                                                                             ----------------
sum                                                                                     956.4
 
SYS@dw1> @inmem_segs_g.sql
Enter value for owner: 
Enter value for segment_name: 
 
   INST_ID OWNER                          SEGMENT_NAME                   ORIG_SIZE_MEGS IN_MEM_SIZE_MEGS COMP_RATIO MEGS_NOT_POPULATED POPULATE_
---------- ------------------------------ ------------------------------ -------------- ---------------- ---------- ------------------ ---------
         1 KSO                            SKEW                                  1,413.0            956.4        1.5                 .0 COMPLETED
         2 KSO                            SKEW                                  1,413.0            965.4        1.5                 .0 COMPLETED
                                                                                        ----------------
sum                                                                                              1,921.9

So the DUPLICATE keyword caused the entire table to be loaded into both instances (i.e. DUPLICATEd). Note that this test was done on an Exadata (an engineered system) so it should work.

How the data is accessed across the nodes is also interesting. When accessing data spread across two nodes with a serial plan, all the work appears to be done by the server process handling the connection on the driving node (i.e. there don’t appear to be any processes on the remote node burning CPU). When accessing data with a parallel plan, on the other hand, slave processes are used on both nodes. This makes sense, because that capability already existed. In this case, though, the slaves appear to be automatically spread across the nodes, at least when the data is distributed (i.e. not duplicated). This is in contrast to the normal spreading of slaves across a cluster, which tends to keep them on the same node if possible. Anyway, here is an example. Note that this example has a very unusual SQL Monitor output.

SYS@dw1> select /*+ parallel */ avg(pk_col+col1) from kso.skewp a;
 
AVG(PK_COL+COL1)
----------------
      16588749.5
 
SYS@dw1> @rsm
Enter value for sql_id: 
Enter value for sql_exec_id: 
 
REPORT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report
 
SQL Text
------------------------------
select /*+ parallel */ avg(pk_col+col1) from kso.skewp a
 
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (141:54624)
 SQL ID              :  0qmdbnszhb2nh
 SQL Execution ID    :  16777218
 Execution Started   :  09/16/2014 22:01:27
 First Refresh Time  :  09/16/2014 22:01:26
 Last Refresh Time   :  09/16/2014 22:01:28
 Duration            :  1s
 Module/Action       :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1
 
Global Stats
===================================================================================================
| Elapsed |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Offload |
===================================================================================================
|    3.67 |    3.66 |     0.00 |        0.00 |     0.00 |     1 |  81893 |    1 | 352KB |  59.84% |
===================================================================================================
 
Parallel Execution Details (DOP=3 , Servers Allocated=3)
 Instances  : 2
 
=======================================================================================================================================================
| Instance |      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    | Application |  Other   | Buffer | Read | Read  |  Cell   | Wait Events |
|          |                |       |         | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Offload | (sample #)  |
=======================================================================================================================================================
| 1        | PX Coordinator | QC    |         |    0.01 |    0.01 |          |        0.00 |     0.00 |     13 |      |     . |    NaN% |             |
| 1        | p000           | Set 1 |       1 |    1.05 |    1.05 |          |             |          |     21 |      |     . |    NaN% |             |
| 1        | p001           | Set 1 |       2 |    1.01 |    1.01 |     0.00 |             |          |     69 |    1 | 352KB |  59.84% |             |
| 2        | p000           | Set 1 |       3 |    1.60 |    1.60 |          |             |          |  81790 |      |     . |    NaN% |             |
=======================================================================================================================================================
 
Instance Drill-Down
====================================================================================================================================
| Instance | Process Names | Elapsed |   Cpu   |    IO    | Application |  Other   | Buffer | Read | Read  |  Cell   | Wait Events |
|          |               | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Offload |             |
====================================================================================================================================
|    1     | QC p000 p001  |    2.07 |    2.06 |     0.00 |        0.00 |     0.00 |    103 |    1 | 352KB |  59.84% |             |
|    2     | p000          |    1.60 |    1.60 |          |             |          |  81790 |      |       |    NaN% |             |
====================================================================================================================================
 
SQL Plan Monitoring Details (Plan Hash Value=2545210427)
============================================================================================================================================================================
| Id |             Operation             |   Name   |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                                   |          | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
============================================================================================================================================================================
|  0 | SELECT STATEMENT                  |          |         |      |         1 |     +1 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE                  |          |       1 |      |         1 |     +1 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    PX COORDINATOR                 |          |         |      |         1 |     +1 |     4 |        3 |      |       |         |       |          |                 |
|  3 |     PX SEND QC (RANDOM)           | :TQ10000 |       1 |      |         1 |     +1 |     3 |        3 |      |       |         |       |          |                 |
|  4 |      SORT AGGREGATE               |          |       1 |      |         1 |     +1 |     3 |        3 |      |       |         |       |    33.33 | Cpu (1)         |
|  5 |       PX BLOCK ITERATOR           |          |     32M |  783 |         1 |     +1 |     3 |      32M |      |       |         |       |          |                 |
|  6 |        TABLE ACCESS INMEMORY FULL | SKEWP    |     32M |  783 |         2 |     +0 |    31 |      32M |    1 | 352KB |  60.00% |    1M |    66.67 | in memory (2)   |
============================================================================================================================================================================

Can you see why this one is so unusual? The plan shows a TABLE ACCESS INMEMORY FULL that had both in-memory access and offloading via Smart Scan (probably on different partitions). I’m not at all sure why that happened though. If you know, please enlighten me.

Migrating to ASM

On the Maximum Availability Architecture website, there’s a paper on Best Practices for Database Consolidation. It’s a great paper, as you’d expect from the MAA guys (well, apart from that awful term alert I had to add! :) ). Since database consolidation is an area I work in a lot, I thought I’d start looking at implementing as many of their recommendations as I could in Enterprise Manager, which of course is my tool of choice.

If you look at that paper and its companion paper on MAA Reference Architectures, one of the recommendations they include is to use Automatic Storage Management (ASM) to provide local mirroring to protect against disk failure. In my test environment, I hadn’t bothered using ASM when I created my test database, purely and simply because the database has no important data in it at all, so if the disk failed I could just recreate it. However, since I’m trying to implement the MAA recommendations, it made sense to start here, so I decided to migrate my test database to ASM.

There are a couple of prerequisites you need to have in place before you can start this process. I’m not going to document those in any detail since they’re already documented in the Oracle documentation and many other places, but just to ensure you know what they are, these are the prerequisites:

  1. You need to have installed the Grid Infrastructure code. Most people tend to think of that as a piece of software you need in a clustered environment, and of course that is one of its main areas of functionality. However, you also use it for ASM in standalone environments (i.e. non-clustered hardware). Installing the Grid Infrastructure also gives you an added advantage – along with ASM you now get access to Oracle Restart. Oracle Restart is a feature that was introduced in the 11.2 code (I think) to automatically restart the database, the listener, and other Oracle components after a hardware or software failure or whenever your database host restarts. More on that later.
  2. You also need to have configured some disk groups for ASM to manage (see the documentation here for details on how that is done). ASM can use disk partitions, LUNs, logical volumes or NFS files to build disk groups – a minimal creation statement is sketched just after this list. In my case, this poor little test environment didn’t have much hardware, so I created some NFS files following the instructions documented here by my good friend Tim Hall.
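
Just to illustrate what a disk group definition looks like, here’s a minimal sketch of creating a normal-redundancy (i.e. mirrored) disk group from the ASM instance. The disk group name matches the DATA group used later, but the disk paths are purely placeholders for whatever partitions, LUNs or NFS-backed files you have prepared:

-- run as SYSASM in the ASM instance; the disk paths below are placeholders
CREATE DISKGROUP data NORMAL REDUNDANCY
  DISK '/u01/asmdisks/asm_disk1.img',
       '/u01/asmdisks/asm_disk2.img';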

Once that’s all installed and configured, you are ready to move onto migrating your database to ASM. A note of caution here – this will require downtime for the database.

Migrating the database itself is done from the database home page, so from that page select “Migrate to ASM” from the “Administration” drop down menu:

asm01

On the next screen, you will be asked to provide the relevant credentials for the user that owns the Oracle software installation. I already have that defined as preferred credentials for this host, so all I need to do is check the details are correct and click “Continue”:

asm02

Next, I am asked which files I want to migrate and the degree of parallelism. I can specify to move just the database files (which in this case includes the datafiles, redo log files, control file and SPFILE) or to move the recovery related files (archive log files and backup files) as well. The degree of parallelism defines the number of RMAN channels that will be allocated for copying the files. In this example, I’ve asked for both database files and recovery related files to be moved, left the degree of parallelism at its default of one and clicked “Next”:

asm03

The diskgroups I created earlier were cunningly called “DATA” and “RECO”, so on the next screen I enter the relevant diskgroup names for the database area and fast recovery area, select the “Setup Fast Recovery Area and Migrate Recovery-related Files” radio button and click “Next”:

asm04

I left the fast recovery area at its default size, so it prompts me with a warning that the fast recovery area size is less than twice the database size. In my simple test environment, that doesn’t bother me so I click “Yes” to continue:

asm05

On the next screen, I can specify a name and description for the job that does the actual migration, and schedule a time for it to start (which defaults to immediately). In this example, I left these at the defaults and just clicked “Next”:

asm06

Finally, I can review all the information I’ve provided and click “Submit Job” to start the migration:

asm07

I will see a confirmation that the job was submitted successfully, and can click “View Status” to view the status of the migration job:

asm08

This will take me to the Job Activity page. Depending on the amount of job activity taking place in this environment, you can either search for the job or simply scroll through the list and select the job:

asm09

On the job execution page, I can select the Auto refresh period on the right hand side, and also choose “Expand All” from the “View” drop down menu so I can see all the details of the job:

asm10

All going well, you should see a “Succeeded” message for the status after a few minutes:

asm11

To validate the migration has in fact taken place, go back to the database home page, then follow the path “Administration” -> “Storage” -> “Datafiles”:

asm12

You should see that the datafile names all now start with the name of the ASM diskgroup (+DATA in my case):

asm13

While the database has now been migrated to ASM, there is one more step needed to add the database to Oracle Restart. This step is necessary because the database was created BEFORE we installed the Grid Infrastructure software. To add the database to the Oracle Restart configuration, we use the “srvctl add database” command, and we can then use the “srvctl config database” command to confirm it is now under the management of Oracle Restart:
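
As a rough sketch of what those two commands look like (the database name and Oracle home path below are placeholders – substitute the values for your own environment):

# "testdb" and the Oracle home path are placeholders - use your own values
$ srvctl add database -d testdb -o /u01/app/oracle/product/12.1.0/dbhome_1
$ srvctl config database -d testdb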

asm14
And we are done!

Jonathan Lewis explains Delphix live Sept 18th

Join me and Tim Gorman as we host a live webinar and Q&A on September 18th at 10am PST with Jonathan Lewis. Jonathan will explain from his own experience how Delphix works and what industry problems it solves.

#222222;">#444444;">#bd2028;">#bd2028;">Click here to register for our webinar.

#222222;">Screen Shot 2014-05-21 at 8.08.47 AM

#222222;">#1155cc;" href="http://pages.delphix.com/Sep18_2014J.LewisWebinar_how-dba-lives-are-made-easier-with-delphix.html" target="_blank">
#222222;">
#222222;">
#222222;">Screen Shot 2014-05-21 at 8.08.47 AM
#222222;">Screen Shot 2014-05-21 at 8.08.47 AM
#222222;">See Jonathan’s blog post on this up coming webinar
#222222;">Screen Shot 2014-05-21 at 8.08.47 AM

#555555;">Jonathan explains issues and obstacles to creating “thin clones” on typical industry hardware:

#555555;">#2970a6;" href="http://kylehailey.com/wp-content/uploads/2013/07/Screen-Shot-2013-07-15-at-10.13.05-PM.png">Screen Shot 2013-07-15 at 10.13.05 PM

#555555;">how point in time snapshots work#2970a6;" href="http://kylehailey.com/wp-content/uploads/2013/07/Screen-Shot-2013-07-15-at-10.12.07-PM.png">
Screen Shot 2013-07-15 at 10.12.07 PM

#555555;">And from point in time snapshots, he explains how clones can rapidly be made.

#555555;">#2970a6;" href="http://kylehailey.com/wp-content/uploads/2013/07/Screen-Shot-2013-07-15-at-10.12.37-PM.png">Screen Shot 2013-07-15 at 10.12.37 PM

Related blog posts from Jonathan:

  • http://jonathanlewis.wordpress.com/2013/04/04/delphix-overview/ – a write-up on his experiences with Delphix, with good technical details

Using Ansible for executing Oracle DBA tasks.

This post may look like I am jumping on the IT orchestration bandwagon like a lot of people are doing. Maybe I should say ‘except for (die-hard) Oracle DBAs’. Or maybe not – it’s up to you to decide.

Most people who are interested in IT in general will have noticed that IT orchestration has gotten a lot of attention, especially in the form of Puppet and/or Chef. I _think_ IT orchestration became important with the rise of “web scale” (scaling applications up and down by adding virtual machines to horizontally scale resource-intensive tasks), in order to provision and configure the newly added machines without manual intervention, and people are now picking it up to use for more tasks than provisioning virtual machines for web applications.

I am surprised by that. I am not surprised that people want boring tasks – changing settings in configuration files, restarting daemons, installing software with all the correct options, and so on – to be automated. Instead, I am surprised that people are only now picking this up, after it has been around for so long.

A little history.
As far as I know, IT orchestration started with cfengine, which was really a configuration engine (hence the name). Despite having a slightly different purpose (configuration vs. orchestration), this tool is the parent of all the orchestration/configuration tools which exist nowadays. cfengine started off as a study in 1993 by Mark Burgess at the University of Oslo, with the creation of the cfengine software as a result. As far as I can see, it has been available as open source software since the beginning.

Now back to what I am surprised at: with cfengine there has been a way to configure Linux/Unix systems in a structured way, and to install and configure software on them, since the mid-nineties. Yet this wasn’t widely picked up (with a few exceptions, of course). Fast forward to today, and we see it being adopted. And that is a good thing.

I created a setup with cfengine for a client a long time ago, which had the ability to install the Oracle software and different PSUs in different homes, and to remove them, by adding or removing machines to groups in a cfengine configuration file. It wasn’t picked up by the client – it’s way more fun running X to install the software, making the choices by hand, and redoing this over and over on every machine, right?

I had almost forgotten about my work with cfengine until I spoke with Alex Gorbachev at a conference, where he pointed me to Ansible. At first I didn’t do a lot with it, but lately I’ve given it a go, and I am very happy with it.

Another redo of cfengine?
From what I read, most of the configuration/orchestration engines created after cfengine were designed to get around various difficulties with cfengine. I can understand that. It took me a while to learn cfengine, and indeed it forces you to think in a different way.

The Ansible project decided to do it radically differently from all the other engines. It is different in the sense that it advertises itself as simple, agentless and powerful.

Simple.
Simple is a terrific goal. For those of you who have worked with configuration/orchestration engines, you’ll know there is a steep learning curve; it is just hard to get the basic principles into your head. To be honest, Ansible also took me a while to grasp and to get the picture correctly in my head. Yet, having worked with cfengine, Ansible’s playbooks – the scripts that do things on the targets – are a breath of fresh air. Playbooks are so clean that they can (almost) be read and understood as plain English.

Agentless.
This is where Ansible is truly different from any of the other configuration/orchestration tools: Ansible does not require any agent installation on the targets. The obvious next question then is: how can this work? Well, quite simply: Ansible uses ssh to connect to the host and executes commands via the shell. That said, it needs a little more nuance: Ansible uses python on the remote host for its normal execution. However, you can use it without python, for example to set the host up for Ansible’s normal usage mode, which requires python and the simplejson module.

This is truly important, and makes it an excellent fit for my daily work as an IT consultant.

Powerful.
Ansible is powerful in the way that you can do the configuration and orchestration tasks in a simple clean way.

Summary on the introduction.
Above was a brief personal history and some of the “marketed” features of Ansible. I think being agentless is the true “killer feature” here. All the other configuration/orchestration engines require you to set up and configure a fixed client-server connection, and to install a daemon and a central server process. In case you wondered: yes, authentication is important, and it’s simply brilliant that ssh password authentication or the public key infrastructure can be used.

Because there’s no daemon to install, you can run your playbooks everywhere. So instead of a fixed client configuration, you can create playbooks to do routine tasks and repeat them at multiple sites.

Okay, how does this work?

Installation: add EPEL and install ansible.
If you are on one of the clones of RedHat Enterprise Linux (I use Oracle Linux), you simply need to add the EPEL repository to your yum source list, and run:

# yum install ansible

First steps.
One of the first things I do is create a directory for a typical Ansible ‘project’. Here, ‘project’ means a set of tasks you want to run against a set of hosts. Next, I create a file called ‘hosts’, which is the list of hosts you want to execute tasks on. By default, Ansible looks in /etc/ansible/hosts. In this case I put a single machine in it (a test VM), but it can be a list of machines (IP addresses or hostnames).

$ cat hosts
192.168.101.2

In fact, you can create groups in the hosts file in the INI style; a small made-up example is shown below. I just put one host in for this example.
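
For illustration, a grouped hosts file might look something like this (the group names and addresses are made-up examples, not taken from my actual setup):

# hypothetical groups - adjust the names and addresses to your environment
[dbservers]
192.168.101.2
192.168.101.3

[appservers]
192.168.101.10
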
The next thing is to check if Ansible reads the file correctly. This is done in the following way:

$ ansible all -i hosts --list-hosts
    192.168.101.2

Okay, this means Ansible will operate on this one host if invoked. The next logical thing is to check whether you can actually reach the hosts (typically done when you are in a new client environment):

$ ansible all -i hosts -m ping
192.168.101.2 | FAILED => FAILED: Authentication failed.

Ping might be a bit misleading for some people. What the ping module does here (-m means module) is try to connect to the host over ssh and log in. Because I didn’t specify a user, it used the username of the current user on the local machine, which is ‘ansible’. A user ‘ansible’ typically doesn’t exist on a normal server (and is not necessary, nor should it be created), and it doesn’t exist on my test server either. So it failed, as the message said, on authentication.

My test VM is a basic install of Oracle Linux (OL) 6. This means there’s only one user: root.

So, let’s specify the user root as user:

$ ansible all -i hosts -m ping -u root
192.168.101.2 | FAILED => FAILED: Authentication failed.

The authentication failed again. And it should! What this is doing is trying to log on as root, and we haven’t given any password, nor have I put my local user’s public key in the remote authorized_keys file, so there is no way this could work. This is typically also the state you are in when you want to do stuff with a “fresh” client system. Let’s add the ‘-k’ option (ask for the ssh password) and run again:

$ ansible all -i hosts -m ping -u root -k
SSH password:
192.168.101.2 | success >> {
    "changed": false,
    "ping": "pong"
}

To walk you through the output: it now asks for a password, which I’ve filled out, then lists the host and the state: success. During this execution nothing was changed on the remote host, and the ping command resulted in a pong (much like an ICMP ping response).

With what we have learned now, we can do things like this:

$ ansible all -i hosts -u root -k -a "ifconfig"
SSH password:
192.168.101.2 | success | rc=0 >>
eth0      Link encap:Ethernet  HWaddr 00:0C:29:14:65:ED
          inet addr:192.168.39.145  Bcast:192.168.39.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65ed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6293 (6.1 KiB)  TX bytes:2594 (2.5 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:14:65:F7
          inet addr:192.168.101.2  Bcast:192.168.101.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65f7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:142146 (138.8 KiB)  TX bytes:15545 (15.1 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Does this look familiar to you Exadata DBAs? Yes, this replicates some of the functionality of dcli (although dcli is aimed at executing simple tasks on a group of hosts, whilst Ansible is aimed at enterprise configuration and orchestration).
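For example, if the hosts file contained a group such as the dbservers group sketched earlier, a dcli-like one-liner against just that group could look like this (illustrative, the group is an assumption):

$ ansible dbservers -i hosts -u root -k -a "uptime"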

One step beyond! Playbooks.
Now let’s progress to playbooks. A playbook is where Ansible’s true strength lies. It allows you to specify tasks to execute on the remote hosts, create sequences of tasks, and make decisions based on the outcome of a task for further execution. Let me show you a simple playbook and guide you through it:

---
- hosts: all
  gather_facts: no
  remote_user: root
  tasks:

  - name: upgrade all packages
    yum: name=* state=latest

  - name: install python-selinux
    yum: name=libselinux-python state=installed

  - name: add public key to authorized_key file of root
    authorized_key: user=root state=present key="{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}"

As you can see, this is a playbook with three tasks: upgrade all packages, install libselinux-python, and add my (local) public key to the authorized_keys file of root (to allow passwordless access).

Line 1 shows three dashes, which means the start of a YAML document.
Line 2 starts with a single dash, which indicates a list. There is one dash at this indentation level, so it’s a list of one. The fields of this member are hosts, gather_facts, remote_user and tasks. Tasks gets its own list (mind the indentation level, that is important). The fields are key/value pairs, with the separation indicated by the colon (:). The first field is ‘hosts’, with the value ‘all’. This means that all hosts in the hosts file are used for this playbook. I don’t think it’s hard to imagine how useful it can be to specify a group/kind of servers the playbook should run on. The next one is ‘gather_facts’. A normal playbook execution first gathers a lot of information (“facts”) from all the hosts it is going to run on; these facts can then be used during playbook execution. Setting gather_facts to ‘no’ skips that step. Next is ‘remote_user’, which indicates the user Ansible is going to log on with, so we don’t have to specify it on the command line. Then we see ‘tasks’, which introduces the list of tasks to be executed on the hosts.
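As an aside, and purely as an illustrative sketch of my own (not part of the example playbook above): with fact gathering left enabled, a task can reference gathered facts such as ansible_hostname:

---
- hosts: all
  gather_facts: yes
  remote_user: root
  tasks:

  - name: show a gathered fact
    # ansible_hostname is one of the facts collected during the gathering phase
    debug: msg="this host reports its hostname as {{ ansible_hostname }}"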

It’s easy to spot that we have three tasks. What is extremely important is the indentation of this list (it’s got a dash, so it’s a list!). Name is not mandatory, but it makes the playbook easier to read if you give the tasks useful names, and these names are shown when the playbook is executed. The first task has the name ‘upgrade all packages’. The next field shows the key ‘yum’, indicating it makes use of the yum module. This key has two values: name=*, which means ‘all packages’, and state=latest, which means we want all packages to be at the latest version. In other words, this task is the equivalent of ‘yum update’.

The second task is called ‘install python-selinux’. It makes use of the yum module again and is self explanatory: it installs the libselinux-python package. This package is necessary for Ansible to work with anything that is protected by selinux on a host that has selinux enabled.

The next task is called ‘add public key to authorized_key file of root’. It makes use of the authorized_key module. This module requires a parameter ‘key’, for which we use the lookup function to look up the local (!) public key of the user with which I execute Ansible, which is ‘ansible’. ‘state=present’ means we want this key to be present; ‘present’ is the default value, so it wasn’t strictly necessary to include it. Finally, ‘user=root’ means we want the public key to be added to the authorized_keys file of the user root.

Of course these tasks could also be executed as single tasks using the ‘ansible’ executable. To show the importance of installing the libselinux-python package on a host with selinux enabled (which is the state of selinux on a freshly installed Oracle Linux machine), let’s execute the task using the authorized_key module:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | FAILED >> {
    "failed": true,
    "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"
}

Clear, right? The host is selinux protected. Now, let’s execute the installation of the libselinux-python package as a single task, and then add our public key to the authorized_keys file of root:

$ ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=installed"
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Loaded plugins: security\nSetting up Install Process\nResolving Dependencies\n--> Running transaction check\n---> Package libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package             Arch     Version                 Repository           Size\n================================================================================\nInstalling:\n libselinux-python   x86_64   2.0.94-5.3.el6_4.1      public_ol6_latest   201 k\n\nTransaction Summary\n================================================================================\nInstall       1 Package(s)\n\nTotal download size: 201 k\nInstalled size: 653 k\nDownloading Packages:\nRunning rpm_check_debug\nRunning Transaction Test\nTransaction Test Succeeded\nRunning Transaction\n\r  Installing : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\r  Verifying  : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\nInstalled:\n  libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1                                 \n\nComplete!\n"
    ]
}

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "present",
    "unique": false,
    "user": "root"
}

Maybe your customer doesn’t want you to store your keys on their servers. It’s easy to do the reverse and remove your key from the authorized_keys file:

$ ansible all -i hosts -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "absent",
    "unique": false,
    "user": "root"
}

Please mind I didn’t specify ‘-k’ on the command line to send a password: in the previous step we added our key, so we can now access the host using our public key. Another extremely important thing is ‘changed’: it indicates whether the task actually changed something on the destination server.
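This is also where idempotence shows. If I were to run the exact same removal command again, the key is no longer on the host, so I would expect Ansible to report “changed”: false because there is nothing left to do. Note that the ‘-k’ option would be needed again, because the public key we were relying on has just been removed:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""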

I have only run single tasks until now. I changed the state of my test VM back to its state before I started changing it with Ansible (by removing the libselinux-python package using ‘ansible all -i hosts -k -u root -m yum -a “name=libselinux-python state=absent”’).

Let’s run the above described playbook:

$ ansible-playbook -i hosts -k linux_setup_example.yml
 [WARNING]: The version of gmp you have installed has a known issue regarding
timing vulnerabilities when used with pycrypto. If possible, you should update
it (ie. yum update gmp).

SSH password:

PLAY [all] ********************************************************************

TASK: [upgrade all packages] **************************************************
changed: [192.168.101.2]

TASK: [install python-selinux] ************************************************
changed: [192.168.101.2]

TASK: [add public key to authorized_key file of root] *************************
changed: [192.168.101.2]

PLAY RECAP ********************************************************************
192.168.101.2              : ok=3    changed=3    unreachable=0    failed=0

Now at this point you might think: I get it, but these are all pretty simple tasks, it’s nothing special. Well, let me show you something that demonstrates the real value of using this, even on a single machine, and even more so when you have a large group of servers to administer.

The next example is a playbook created to apply PSU3 to an Oracle 11.2.0.4 home. It’s still quite simple: it just applies PSU3 to the Oracle home, but it does so completely automatically. The point I am trying to make is that automating this is already nice for a single home, but across many homes it saves a lot of hours (read: a lot of money) and protects you from human error.

---
- hosts: all
  vars:
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/11.2.0.4/dbhome_1
    patch_dir: /u01/install
  remote_user: oracle
  tasks:

  - name: check u01 free disk space
    action: shell df -P /u01 | awk 'END { print $4 }'
    register: u01size
    failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024

  - name: check tmp free disk space
    action: shell df -P /tmp | awk 'END { print $4 }'
    register: tmpsize
    failed_when: tmpsize.stdout|int < {{ tmp_size_gb }} * 1024 * 1024

  - name: create directory for installation files
    action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall

  - name: copy opatch and psu
    copy: src=files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
    with_items:
     - p6880880_112000_Linux-x86-64.zip
     - p18522509_112040_Linux-x86-64.zip
     - ocm.rsp

  - name: install opatch in database home
    action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ oracle_home }}

  - name: unzip psu patch
    action: shell unzip -oq {{ patch_dir }}/p18522509_112040_Linux-x86-64.zip -d {{ patch_dir }}

  - name: patch conflict detection
    action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    register: conflict_detection
    failed_when: "'Prereq \"checkConflictAgainstOHWithDetail\" passed.' not in conflict_detection.stdout"

  - name: apply psu
    action: shell export ORACLE_HOME={{ oracle_home}}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch apply -silent -ocmrf {{ patch_dir }}/ocm.rsp
    register: apply_psu
    failed_when: "'Composite patch 18522509 successfully applied.' not in apply_psu.stdout"

  - name: clean up install directory
    file: path={{ patch_dir }} state=absent

Let me run you through this playbook! It starts off with the indication of a YAML document: ‘---’. Next is hosts: all again; I just put all the hosts in the hosts file and did not create groups of hosts (which would be fitting for a fixed environment, but I use this for various customers). Then vars, with a list of variables. As you can see, I can use these variables elsewhere in the playbook as {{ variable }}. Then remote_user: oracle and tasks.

The first and second task use variables, and use the ‘register’ argument to save the full response of the command into a named variable. I also use ‘failed_when’ to make the playbook stop executing when the expression after ‘failed_when’ is true. That expression references the registered variable, takes its standard output (.stdout), casts it to an integer with a filter, and compares it with a calculation based on the size variable.
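Just to illustrate the register mechanism (this debug task is my own addition, not part of the playbook above), a task placed right after the first check could print what was captured:

  - name: show the registered free space figure
    # u01size was filled by the 'register' keyword of the previous task
    debug: msg="/u01 has {{ u01size.stdout }} 1K-blocks free"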

The third task uses the file module to create a directory. The fourth task uses the copy module, which means a file or, in this case, files are copied from the machine the playbook is run from onto the destination host or hosts. Another trick is used here to process the task with a list of items: the copy line contains the variable {{ item }}, and the task is executed once for every entry in the ‘with_items’ list. I found this fine for smaller files (up to a few hundred megabytes), but too slow for bigger files; I use http (the get_url module) to speed up the transfer of larger files.
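As a sketch of that get_url alternative (the URL is made up, you would point it at your own web server hosting the patch files):

  - name: download psu from an internal web server
    # http transfer of a large zip instead of the slower copy module
    get_url: url=http://repo.example.com/patches/p18522509_112040_Linux-x86-64.zip dest={{ patch_dir }}/p18522509_112040_Linux-x86-64.zip mode=0644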

The fifth and sixth tasks execute a shell command, unzip, to extract the contents of a zip file into a specific place.

The seventh task executes a small sequence of shell commands in order to run the conflict detection option of opatch. The same trick as in the first two tasks is used: the output of the conflict detection is registered, and I check whether stdout contains the line I would look for when running it manually. The eighth task is the main task of the whole playbook, the actual patch application, and it uses the same technique as task seven. The last task simply removes the installation directory, cleaning up the files we used for this patch.
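Running this playbook works exactly like the earlier one. Assuming it is saved as apply_psu3.yml (the file name is my own choice) and the oracle user authenticates with a password, the invocation could look like this:

$ ansible-playbook -i hosts -k apply_psu3.yml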

Summary
I hope this shows what a tremendous help Ansible can be for a consultant. This kind of tool is simply mandatory if you have an environment with more than roughly ten to twenty servers to administer. And because Ansible is agentless, it can be used even if the organisation does not want to spend time on the implementation of a configuration tool.

Tagged: ansible, configuration, installation, linux, orchestration

Temporal Validity, ACLs, External Tables, SQL*Loader and more in Oracle 12c

Some more 12c articles have trickled out over the last few days.

I kind-of mentioned this next thing in a post a few weeks ago, but didn’t name names. :) While writing an article about the PDB logging clause in 12.1.0.2 I noticed it didn’t work. I raised an SR with Oracle Support and they confirmed it was a bug. I was not planning to release the article until the bug was patched, but it came up in conversation recently and I decided it was better to release the article with a big fat warning on the top saying it doesn’t work, just so others are not as confused by this as I was. I’m still not sure it is the right thing to do, but what the heck…

When the bug is patched, I will revise the article and probably promote it to the front page of the website as a “new article”. For now it is lurking in the depths of my website. :)

The 12c journey continues…

Cheers

Tim…



Learning for yourself!

This subject comes up again and again.

In a freaky coincidence, two people asked virtually the same thing of me yesterday and I answered each of them individually, but it paved the way for this post.

In the first case, the person asked me several questions about getting better at Oracle. This is part of my reply.

Q: What is the secret behind your knowledge in oracle?
A: No secret. I’ve been doing this for nearly 20 years and I’m constantly reading and trying to learn.

Q: How have you gained knowledge in oracle and able to answer our questions of your own?
A: I am not on my own. I have the manuals, MOS, and Google.

Q: There must be some tactics behind how you are able to answer our questions. Please share them with us so we can become like you in the future.
A: Every time you have a question, try and answer it for yourself by searching the manuals, MOS and the internet. When you think you have the answer, test it to make sure you are correct. The process of getting the answer for yourself will teach you a lot. You will often stumble across other totally unrelated information, which adds to your total knowledge.

In the second case, I suggested the person start trying to answer questions for themselves, to which they asked what was the benefit of searching for themselves over just asking me. This is part of my response.

1) You are learning how to find answers and research for yourself. This is a very important skill. Do you think all the people answering questions know all of this by heart? We don’t, but we know how to quickly get information. Over time you start to remember stuff, but I rarely rely on my own memory. I nearly always look for a link in the docs to prove I am not mistaken, or misremembering how things were in a previous version. It’s important you learn this skill if you want to get good at anything.

2) You might ask me a question when I am away from the keyboard, like when I’m sleeping, and you will wait hours for the answer. If you know how to find the answers for yourself you can work at your own pace.

3) Often, finding the answer for yourself is quicker than asking the question. For example, for that last question, if you had gone to Google and searched for “oracle uninstall” the first four links are:
– 2 document references.
– The OTN forum, where people link to my article as the answer.
– My article.

Chances are, you would have got the answer quicker than the time it takes to write the question to me. It would certainly be quicker than the time it took to receive the answer. :)

4) People on the internet might be wrong. You need to check everything I say anyway, so why not find the information for yourself? Even the manuals are wrong sometimes, so you always need to check. You might also be reading an old article, etc.

I know people think there is some secret to this, but there really isn’t. When I look at all the people I admire in the Oracle community, they have all put the hours in.

Cheers

Tim…

