Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

19c EM Express (aka Oracle Cloud Database Express)

Oracle has a long history of interactive tools for DBA and, as usual, the name has changed at each evolution for marketing reasons.

OEM in Oracle7 #nostalgia

SQL*DBA had a Menu mode for text terminals. You may also remember DBA Studio. Then called Oracle Enterprise Manager with its SYSMAN repository and also referred to as OEM or EM. The per-database version has been called OEM “Database Control” and then “EM Express” in 12c. The multi-database version has been called according to the marketing tag “Grid Control” in 11g, and “Cloud Control” in 12c.

I hate those names because they are wrong. A central console has nothing to do with “grid” or “cloud” and the only all-targets view is the ‘DB Load Map’ page. Something is wrong when a customer talks about “The Grid” and you don’t know if it is about the administration console (OEM) or the clusterware Grid Infrastructure (GI). Even worse with a one-database only GUI.

19c EM Express login screen when connecting to CDB port

But marketing always win. And in 19c this small single-database graphical interface, mostly used by small companies having few databases hosted in their premises, is called “Oracle Cloud Database Express”.

Remember that you need to define the port where EM Express runs. It runs with XDB, and the syntax to set is not easy to remember: underscores for package name, but no underscores for the http/https function name):

SQL> exec dbms_xdb_config.sethttpsport(5500);
PL/SQL procedure successfully completed.


As you can see in the login screenshot, there’s no way to mention that I want to connect ‘as sysdba’ so let’s try with different users to see if the role is chosen autonomously:

grant create session,sysdba to c##sysdba identified by "oracle" container=all;
grant create session,sysoper to c##sysoper identified by "oracle" container=all;
grant dba to c##dba identified by "oracle" container=all;
grant sysdba,sysoper,dba to c##all identified by "oracle" container=all;

Actually, I was able to connect with SYS but not with my own SYSDBA users. Here are the only successful connections:

C##DBA (role DBA) and SYS can connect, but not my SYSDBA custom users

It is probably not a big problem for the moment, given the very limited features that are there. No need for SYSDBA to read performance statistics and kill sessions. I’ll update this post when I have more information about this.


As you can see in the login screenshot, I can mention a container name, the default being the CDB root. However, when I try to do so I get the XDB login popup (same as when I forgot the /em in the URL) and ‘Invalid Container’.

The workaround is to open a port for each PDB and connect directly to it.


You remember how the move from the 11g dbconsole to 12c EM Express removed many items in the menus. Here is the 19c database express one:

There’s only one item in the 19.2 menu: Performance/ Performance Hub

One item only in a menu… my guess (and hope) is that this one is still work-on-progress. 19c is currently for Exadata only and Ican imagine that all installations are managed by Oracle Enterprise Manager. Or maybe SQL Developer Web will become the replacement for this console.

HTML5, ASH Analytics,…

There’s one awesome news here: end of Flash. This Performance Hub is nice and responsive. No Adobe Flex anymore, but the same idea with an HTML that contains the data (XML) and calls an online script to display it:

SQL Monitor shows the predicates on the same tab as execution statistics:

There’s a tab to go directly to the execution plan operation which is the busier:

EM Express (I’ll continue to call it like this) can be used on Data Guard as well and can monitor the recovery on the read-only CDB$ROOT:

I can kill a session but not (yet?) cancel a running SQL statement:

The activity tab is similar to the ASH Analytics where I can choose the dimensions displayed:

and I can also remove the time dimension to show three other dimensions:

zHeap: PostgreSQL with UNDO

I’m running on an Oracle Cloud Linux 7.6 VM provisioned as a sandbox so I don’t care about where it installs. For a better installation procedure, just look at Daniel Westermann script in:

Some more zheap testing - Blog dbi services

The zHeap storage engine (in development) is provided by EnterpriseDB:


I’ll also use pg_active_session_history, the ASH (Active Session History) approach for PostgreSQL, thanks to Bertrand Drouvot


In order to finish with the references, I’m running this on an Oracle Cloud compute instance (but you can run it anywhere).

Cloud Computing VM Instances - Oracle Cloud Infrastructure

Here is what I did on my OEL7 VM to get PostgreSQL with zHeap:

# Install and compile

sudo yum install -y git gcc readline-devel zlib-devel bison-devel
sudo mkdir -p /usr/local/pgsql
sudo chown $(whoami) /usr/local/pgsql
git clone
cd zheap && ./configure && make all && make install
cd contrib && make install
cd ../..

# Create a database

# Environment

export PGDATA=/var/lib/pgsql/data
echo "$PATH" | grep /usr/local/pgsql/bin ||
export PATH="$PATH:/usr/local/pgsql/bin"

# Creation of the database and start the server

pg_ctl start
ps -edf | grep postgres && psql postgres <<<"\l\conninfo\;show server_version;"

# Install pg_Sentinel extension

git clone
cd pgsentinel/src && make && make install
cat >> $PGDATA/postgresql.conf <shared_preload_libraries = 'pg_stat_statements,pgsentinel'
track_activity_query_size = 2048
pg_stat_statements.track = all
psql postgres -c "create extension pgsentinel;"

# create a demo database

psql postgres -c "create database demo;"
psql demo -c "create extension pgsentinel;"

Undo and discard workers

Here I am. Don’t worry about the user running it, that’s just me using what I already have there, but you can create a postgres user. I’m in a version 12 in development:

ps -edf | grep postgres && psql postgres <<<"\l\conninfo\;show server_version;show config_file;"

ps -edf | grep postgres && psql postgres <<<”\l\conninfo\;show server_version;show config_file;”

zHeap vs. Heap

In the past I measured the redo journaling (WAL) by PostgreSQL ( because, coming from Oracle, I was surprised by the amount of redo generated by some small updates in PostgreSQL. This overhead is due to the combination of two weaknesses: full page logging and no in-place update. The second will be partially addressed by zHeap, so let’s do the same test.

strace | awk

Here is the awk script I use to measure the volume written to disk

strace -fye trace=write,pwrite64 -s 0 pg_ctl start 2>&1 >/dev/null | awk '
/^.pid *[0-9]+. /{
pid=$2 ; sub("]","",pid)
"cat /proc/" pid "/cmdline" |& getline cmdline
sub(/pid *[0-9]+/,sprintf("%-80s ",cmdline))
/pg_wal/ || /undo/ {
/pwrite64[(].*, *[0-9]+, *[0-9]+[)]/{
sub(/, *[0-9]+[)].*/,"")
sub(/[(][0-9]+ sum[$0]=sum[$0]+bytes
/write[(].*, *[0-9]+[)]/{
sub(/[(][0-9]+ sum[$0]=sum[$0]+bytes
{ print > "/dev/stderr" }
printf "%9s%1s %6s %7s %s\n","BYTES","","COUNT","AVG","process/file"
for (i in sum){
u=" "
if (cnt[i]>1) printf "%9d%1s %6d %7d %s\n",s,u,cnt[i],sum[i]/cnt[i],i
' | sort -h

I strace the write calls (-e trace=write, pwrite64) without showing the data written (-s 0) when running the database server (pg_ctl start), tracing all child processes (-f) and showing the file names with the descriptor (-y). The awk keeps only the call, file, pid and bytes written to aggregate them. The pid is expanded with the process argv[0] for better readability.

Create zHeap table

Here is the table as in the previous blog post, but mentioning zHeap storage:

create table demoz using zheap as select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(0,0);
insert into demoz select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);

Sparse update on one column

Here is the update that I wanted to test:

update demoz set b=b+1 where mod(a,10)=1;
UPDATE 100000

And the result of my strace|awk script on these 100000 updates:

- 4403+2047=1645 8k blocks, which is 112+15=127MB of data
- 120+14=134MB of WAL
- 15+14+2=31MB of UNDO
The volume of undo is approximately the real volume of changes (I had 15MB of redo and 6MB of undo with same update on Oracle). But we still have an exaggerated volume of block changes (and with full-page logging).

I’ve created the same table in default Heap storage, and here is the write() trace for the same update:

- 16191+4559+1897=22647 8k blocks, which is 175MB of data
- 131+33=164MB of WAL

On this use case, which is quite common when we process data (call record, orders, …) and set only a flag or a date to mark them as processed, it seems that zHeap helps, but not a lot. But a real case would have many indexes on this table and updating in-place may reduce the overhead for non-updated columns. That’s for a future post.

Automatic Indexes in #Exasol

An Exasol database will automatically create, maintain and drop indexes, following the core idea to deliver great performance without requiring much administrative efforts. Like our tables, our indexes are always compressed and you don’t need to configure anything for that.

Joins between two or more tables are processed like this in Exasol: One table is full scanned (this is called the root table) and the other tables are joined using an index on their join columns.

If these indexes on the join columns are not already existing, they are automatically created during the join operation. Taking two tables t1 and t2 as an example, and a statement like

select count(*) from t1 join t2 on t1.joincol = t2.joincol;

The Exasol optimizer will compute an execution plan based on automatically gathered statistics that inform it amongst others about the table sizes. Often, the smaller table will become the root table and the other table will be joined using an index on the join column.
Let’s see an example:

create or replace table t1 as
rownum as joincol, 
'Just some text' as textcol,
mod(rownum,5) as numcol1, 
mod(rownum,1000) as numcol2,
5000 as numcol3,
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'') as time_id 
from (select 1 from dual connect by level<=1e6);

SUCCESS	CREATE	36.43		1000000	OK

create or replace table t2 as select * from t1 limit 5e5;

SUCCESS	CREATE	0.735		500000	OK

select count(*) from t1 join t2 on t1.joincol=t2.joincol;


select index_table,remarks from exa_dba_indices where index_table like 'T_';


That index got created during query execution (on the first join between t1 and t2) and subsequently supports further joins with t1 on joincol.

If DML is done on t1, this index is maintained by the system:
INSERT into t1 will add new index entries accordingly.
DELETE from t1 will mark rows as deleted until more than 25% of rows have been deleted, Then t1 is reorganized automatically and the index is automatically rebuilt.
UPDATE statements that affect less than 15% of rows will update index key entries accordingly. If more than 15% of rows are updated, the index is automatically rebuilt.

If  an index is not used to support queries for more than 5 weeks, it will be automatically dropped. That way, a useless index will not slow down DML and consume space for no good reason.

Operationally, nothing needs to be done about indexes in Exasol and that’s what most of our customers do: They just let the system take care of indexes. In earlier versions, EXA_DBA_INDICES didn’t exist even to avoid providing superfluous information.

What is a global index and why was it created that way on t1, you may ask. Like tables, indexes are also distributed across the Exasol cluster nodes. If the index part on a node points only to table rows residing on the same node, that’s a local index. A global index means that the index part on a node points to (at least some) table rows residing on another node. Means a global join leads to global indexes while local joins lead to local indexes. Profiling a join between t1 and t2 confirms that:

select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =21 and session_id=current_session;

COMPILE / EXECUTE   (null)	              (null)	           (null)
SCAN	            (null)	               T2	           (null)
JOIN	            GLOBAL	               T1	           T2(JOINCOL) => GLOBAL INDEX (JOINCOL)
GROUP BY	    GLOBAL on TEMPORARY table  tmp_subselect0	   (null)

So that was a global join using a global index. If the two tables were distributed on joincol, this leads to local joins with local indexes. Putting the distribution key on joincol for t1 will automatically convert the existing index into a local index:

alter table t1 distribute by joincol;


alter table t2 distribute by joincol;


select index_table,remarks from exa_dba_indices where index_table like 'T_';


When using prepared statements to do UPDATE or DELETE, this may automatically create an index on the filter column:

update t2 set textcol='Modified' where numcol1=:parameter1; -- Enter 1

SUCCESS	UPDATE	0.44		100004	OK

select index_table,remarks from exa_dba_indices where index_table like 'T_';


delete from t1 where numcol2=:parameter2; -- Enter 42


select index_table,remarks from exa_dba_indices where index_table like 'T_';


This results in local indexes because there is no need to refer to rows on other nodes while each node updates or deletes on his locally stored part of the tables.
I’m using DbVisualizer as a SQL Client for my demos here and it prompts for inputs of :parameter1 and :parameter2 when the statements are executed.

Another reason why indexes are automatically created is when primary or unique constraints are added to a table:

alter table t2 add constraint t2pk primary key (joincol);


select index_table,remarks from exa_dba_indices where index_table like 'T_';


This created a local index because t2 is distributed on joincol.
Please do not take this as a recommendation to add primary or unique constraints to tables in Exasol, I just mentioned it because it also leads to the creation of indexes. MERGE statements also lead to index creation because they are processed as joins, by the way.

Although customers are not required to do anything about indexes, there’s a couple of good practices related with them that make your Exasol database perform even better:

Small numeric formats are optimal for join columns and lead to small efficient indexes. It’s often beneficial to replace multi-column joins respectively joins on large formatted columns by joins on numeric surrogate keys. IDENTITY COLUMNS help to generate these surrogate keys.

Avoid using expressions on the join columns because the resulting indexes are not persisted and have to be built again during every join:

select count(*) from t1 join t2 on t1.joincol+1=t2.joincol+1; -- don't do that

Avoid having mixed data types on join columns because that can also lead to expression indexes:

create or replace table t3 as
to_char(rownum) as joincol, -- don't do that
'Just some text' as textcol,
mod(rownum,5) as numcol1, 
mod(rownum,1000) as numcol2,
5000 as numcol3,
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'') as time_id 
from (select 1 from dual connect by level<=1e6);

SUCCESS	CREATE	34.925		1000000	OK

create or replace table t4 as select * from t1 limit 100;


Above created a large table t3 using a string type for joincol and a small table t4 using a numeric type for joincol. Upon joining the two, likely t4 becomes the root table and t3 is expression indexed:

select count(*) from t3 join t4 on t3.joincol=t4.joincol;


select index_table,remarks from exa_dba_indices where index_table like 'T_';


No index listed for t3 as you see. The profile of the join between t3 and t4 shows:

select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =95 and session_id=current_session;

COMPILE / EXECUTE   (null)	               (null)	        (null)
INDEX CREATE	    EXPRESSION INDEX	        T3	        ExpressionIndex
SCAN	            on REPLICATED table	        T4	        (null)
JOIN	            (null)	                T3	        T4(JOINCOL) => ExpressionIndex
GROUP BY	    GLOBAL on TEMPORARY table	tmp_subselect0	(null)

There was an index created for the join, but expression indexes are not persisted. This was a local join (no GLOBAL indicator in the PART_INFO column for the JOIN step) because t4 was replicated across all nodes due to its small size.

Apart from following the mentioned good practices, there’s simply not much for customers to take care of related to indexes in Exasol – it just works </p />

    	  	<div class=

19c EZCONNECT and Wallet (Easy Connect and External Password File)

I like EZCONNECT because it is simple when we know the host:port, and I like External Password Files because I hate to see passwords in clear text. But the combination of the two was not easy before 19c.

Of course, you can add a wallet entry for an EZCONNECT connection string, like ‘//localhost/PDB1’ but in the wallet, you need a different connection string for each user because it associates a user and password to a service name. And you have multiple users connecting to a service.

Here is an example. I have a user DEMO with password MyDemoP455w0rd:

SQL*Plus: Release - Production on Thu Apr 4 19:19:47 2019
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
SQL> connect sys/oracle@//localhost/PDB1 as sysdba
SQL> grant create session to demo identified by MyDemoP455w0rd;
Grant succeeded.
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release - Production

I create a wallet:

mkdir -p /tmp/wallet
mkstore -wrl /tmp/wallet -create <MyWall3tP455w0rd

I add an entry for service name PDB1_DEMO connecting to PDB1 with user DEMO:

mkstore -wrl /tmp/wallet -createCredential PDB1_DEMO DEMO <MyDemoP455w0rd

I define sqlnet.ora to use it and tnsname.ora for this PDB1_DEMO entry:

echo "
" >> /tmp/wallet/sqlnet.ora
echo "
" >> /tmp/wallet/tnsnames.ora

I can connect passwordless when running sqlplus with TNS_ADMIN=/tmp/wallet where I have the sqlnet.ora and tnsnames.ora:

SQL*Plus: Release - Production on Thu Apr 4 19:19:49 2019
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
SQL> connect /@PDB1_DEMO
SQL> show user
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release - Production

Eazy Connect

I add a new entry for the EZCONNECT string:

mkstore -wrl /tmp/wallet -createCredential //localhost/PDB1 DEMO <MyDemoP455w0rd

I can connect with it:

SQL*Plus: Release - Production on Thu Apr 4 19:19:50 2019
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
SQL> connect /@//localhost/PDB1
SQL> show user
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release - Production

But what do you do when you need to connect with different users? With a tnsnames.ora you can have multiple entries for each one, like:


and then define a credential for each one. But that is not possible with EZCONNECT. Or you have to define a different server for each user — which may not be a bad idea by the way.

19c dummy parameter

Oracle 19c extends the EZCONNECT syntax as I described recently in:

19c Easy Connect

With this syntax, I can add parameters. And then, why not some dummy parameters to differentiate multiple entries connecting to the same database but with different users? Here is an example:

mkstore -wrl /tmp/wallet \
-createCredential //localhost/PDB1?MyUserTag=DEMO DEMO <MyDemoP455w0rd

This just adds a parameter that will be ignored, but helps me to differentiate multiple entries:

$ tnsping //localhost/PDB1?MyUserTag=DEMO
TNS Ping Utility for Linux: Version - Production on 04-APR-2019 19:41:49
Copyright (c) 1997, 2018, Oracle.  All rights reserved.
Used parameter files:
Used HOSTNAME adapter to resolve the alias
OK (0 msec)

Here is my connection to DEMO using the credentials in the wallet:

SQL*Plus: Release - Production on Thu Apr 4 19:19:51 2019
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
SQL> connect /@//localhost/PDB1?MyUserTag=demo
SQL> show user
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release - Production

I need an sqlnet.ora and a wallet, but no tnsnames.ora

Here are all the entries that I can use:

$ mkstore -wrl /tmp/wallet -listCredential
Oracle Secret Store Tool Release - Production
Copyright (c) 2004, 2018, Oracle and/or its affiliates. All rights reserved.
Enter wallet password:
List credential (index: connect_string username)
3: //localhost/PDB1?MyUserTag=demo DEMO
2: //localhost/PDB1 DEMO

I do not use it for applications. The host name is not a problem as I can have a DNS alias for each application, but I don’t want the listener port hardcoded there. Better a centralized tnsnames. ora or LDAP.

However, for the administration scripts like RMAN backups or duplicates, or Data Guard broker, a simple passwordless EZCONNECT is easier.

Adding JDBC driver property in SQL Developer connecting to MySQL

I suppose you get it there because this kind of error was properly indexed by Google:

Status : Failure -Test failed: The server time zone value 'CEST' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.

However, this trick works if you want to add any property to the JDBC URL string when connecting with Oracle SQL Developer, which provides no other way to add properties.

The trick is JDBC URL Injection after the port. When connecting to port 5501 I set the following in the ‘port’ field:


like this:

which finally will expand to:


And get connected probably because of few bugs on both side, so not sure it works on all versions for DB Load and Active Sessions turns out to be a nice solution for collecting, retrieving and displaying multi-dimensional time series data, i.e. the kind of data you get from sampling.

For example, in the database world we have Active Session History (ASH) which at  it’s core tracks

  1. when – timestamp
  2. who – user
  3. command – what SQL are they running
  4. state – are they runnable on CPU or are they waiting and if waiting what are they waiting for like I/O, Lock, latch, buffer space, etc

Collecting this information is pretty easy to store in a relational database as I did when creating S-ASH (Simulated ASH) and Marcin Przepiorowski built upon over the years since, or even store in flatfiles like I did with W-ASH (web enabled ASH).

On the other hand retrieving the data in a way that can be graphed is challenging. To retrieve and display the data we need to transform the data into number time series.

WIth we can store, retrieve and display data by various dimensions as time series.

Just sign up at honeycomb, then start to create a dataset. Pick any application it doesn’t matter and when you hit continue for creating a dataset, you will get a writekey. WIth that writekey you can start sending data to

I’m on a Mac using Python so I just installed with

pip install libhoney


I then connected to a PostgreSQL database in Amazon RDS and looped, running a query to collect the sampled data

       state = 'active' ; "

and putting this in a call to

import libhoney
import psycopg2
import pandas as pd
import time
from time import gmtime, strftime
libhoney.init(writekey="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", dataset="honeycomb-python-example", debug=True)
conn_string = "host="+ PGHOST +" port="+ "5432" +" dbname="+ PGDATABASE +" user=" + PGUSER  +" password="+ PGPASSWORD
sql_command = "select datname, pid, usename, application_name, client_addr, COALESCE(client_hostname,'unknown'), COALESCE(wait_event_type,'cpu'), COALESCE(wait_event,'cpu'), query from pg_stat_activity where state = 'active' ; "
print (sql_command)
builder = libhoney.Builder()
        while 1 < 2 :
                mytime=strftime("%Y-%m-%d %H:%M:%S", time.localtime())
                print "- " + mytime + " --------------------- "
                cursor = conn.cursor()
                for row in cursor:
                        query=row[8].replace('\n', ' ')
                        if group != "cpu" :
                                event= group + ":" + event
                        print '{0:10s} {1:15s} {2:15s} {3:15s} {4:40.40s}'.format(un,ip,group,event,query)
                        ev = builder.new_event()
                        ev.add_field( "hostname", ip)
                        ev.add_field( "user", un)
                        ev.add_field( "event", event)
                        ev.add_field( "sql", query)
                        #ev.created_at = mytime;

( You might notice the disconnect / connect at the end of the loop. That waste resources but for some reason querying from pg_stat_activity would return the same number of rows if I didn’t disconnect. Disconnecting it worked. For the case of a simple demo I gave up trying to figure out what was going on. This weirdness doesn’t happen for user tables)

On the dashboard page I can choose “events” under “BREAK DOWN” and “count()” under (CALCULATE PER GROUP) and I get a db load chart by wait event. I can further choose to make the graph a stacked graph:


Screen Shot 2019-04-03 at 2.38.47 PM


Now there are some limitations that make this less than a full solution. For one, zooming out cause the granularity to change from graphing a point every second to graphing points every 15, 30 or 60 seconds, yet the count will count all the points in those intervals and there is no way to normalize it by the elasped time i.e. for a granularity of 60 seconds it sums up all the points in 60 seconds and graphs that value where what I want is to take that 60 second sum and divide by 60 seconds to get the *Average* active sessions in that interval and not the sum.

But over all a fun easy demo to get started with.

I found to respond quickly to emails and they have a slack channel where folks were asking and answering questions that was responsive as well.


Ansible tips’n’tricks: testing and debugging Ansible scripts using Vagrant

At last year’s UKOUG I presented about Vagrant and how to use this great piece of software to test and debug Ansible scripts easily. Back then in December I promised a write-up, but for various reasons only now got around to finishing it.

Vagrant’s Ansible Provisioner

Vagrant offers two different Ansible provisioners: “ansible” and “ansible_local”. The “ansible” provisioner depends on a local Ansible installation, on the host. If this isn’t feasible, you can use “ansible_local” instead. As the name implies it executes code on the VM instead of on the host. This post is about the “ansible” provisioner.

Most people use Vagrant with the default VirtualBox provider, and so do I in this post.

A closer look at the Vagrantfile

It all starts with a Vagrantfile. A quick “vagrant init ” will get you one. My test image I use for deploying the Oracle database comes with all the necessary block devices and packages needed, saving me quite some time. Naturally I’ll start with that one.

$ cat -n Vagrantfile 
     1    # -*- mode: ruby -*-
     2    # vi: set ft=ruby :
     4    Vagrant.configure("2") do |config|
     6      config.ssh.private_key_path = "/path/to/ssh/key"
     8 = "ansibletestbase"
     9      config.vm.define "server1" do |server1|
    10 = "ansibletestbase"
    11        server1.vm.hostname = "server1"
    12 "private_network", ip: ""
    14        config.vm.provider "virtualbox" do |vb|
    15          vb.memory = 2048
    16          vb.cpus = 2 
    17        end 
    18      end 
    20      config.vm.provision "ansible" do |ansible|
    21        ansible.playbook = "blogpost.yml"
    22        ansible.groups = { 
    23          "oracle_si" => ["server1"],
    24          "oracle_si:vars" => { 
    25            "install_rdbms" => "true",
    26            "patch_rdbms" => "true",
    27            "create_db" => "true"
    28          }   
    29        }   
    30      end 
    32    end

Since I have decided to create my own custom image without relying on the “insecure key pair” I need to keep track of my SSH keys. This is done in line 6. Otherwise there wouldn’t be an option to connect to the system and Vagrant couldn’t bring the VM up.

Lines 8 to 18 define the VM – which image to derive it from, and how to configure it. The settings are pretty much self-explanatory so I won’t go into too much detail. Only this much:

  • I usually want a host-only network instead of just a NAT device, and I create one in line 12. The IP address maps to and address on vboxnet0 in my configuration. If you don’t have a host-only network and want one, you can create it in VirtualBox’s preferences.
  • In line 14 to 17 I set some properties of my VM. I want it to come up with 2 GB of RAM and 2 CPUs.

Integrating Ansible into the Vagrantfile

The Ansible configuration is found on lines 20 to 30. As soon as the VM comes up I want Vagrant to run the Ansible provisioner and execute my playbook named “blogpost.yml”.

Most of my playbooks rely on global variables I define in the inventory file. Vagrant will create an inventory for me when it finds an Ansible provisioner in the Vagrantfile. The inventory it creates doesn’t fit my needs though, but that is easy to change. Recent Vagrant versions allow me to create the inventory just as I need it. You see this in lines 22 to 28. The resulting inventory file is created in .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory and looks like this:

$ cat .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory 
# Generated by Vagrant

server1 ansible_host= ansible_port=2222 ansible_user='vagrant' ansible_ssh_private_key_file='/path/to/ssh/key'



That’s exactly what I’d use if I manually edited the inventory file, except that I don’t need to use “vagrant ssh-config” to figure out what the current SSH configuration is.

I define a group of hosts, and a few global variables for my playbook. This way all I need to do is change the Vagrantfile and control the execution of my playbook rather than maintaining information in 2 places (Vagrantfile and static inventory).

Ansible Playbook

The final piece of information is the actual Ansible playbook. Except for the host group I’m not going to use the inventory’s variables to keep the example simple.

$ cat -n blogpost.yml 
     1    ---
     2    - name: blogpost
     3      hosts: oracle_si
     4      vars:
     5      - oravg_pv: /dev/sdb
     6      become: yes
     7      tasks:
     8      - name: say hello
     9        debug: msg="hello from {{ ansible_hostname }}"
    11      - name: partitioning PVs for the volume group
    12        parted:
    13          device: "{{ oravg_pv }}"
    14          number: 1
    15          state: present
    16          align: optimal
    17          label: gpt

Expressed in plain English, it reads: take the block device indicated by the variable oravg_pv and create a single partition on it spanning the entire device.

As soon as I “vagrant up” the VM, it all comes together:

$ vagrant up
Bringing machine 'server1' up with 'virtualbox' provider…
==> server1: Importing base box 'ansibletestbase'…
==> server1: Matching MAC address for NAT networking…
==> server1: Setting the name of the VM: blogpost_server1_1554188252201_2080
==> server1: Clearing any previously set network interfaces…
==> server1: Preparing network interfaces based on configuration…
server1: Adapter 1: nat
server1: Adapter 2: hostonly
==> server1: Forwarding ports…
server1: 22 (guest) => 2222 (host) (adapter 1)
==> server1: Running 'pre-boot' VM customizations…
==> server1: Booting VM…
==> server1: Waiting for machine to boot. This may take a few minutes…
server1: SSH address:
server1: SSH username: vagrant
server1: SSH auth method: private key
==> server1: Machine booted and ready!
==> server1: Checking for guest additions in VM…
==> server1: Setting hostname…
==> server1: Configuring and enabling network interfaces…
server1: SSH address:
server1: SSH username: vagrant
server1: SSH auth method: private key
==> server1: Mounting shared folders…
server1: /vagrant => /home/martin/vagrant/blogpost
==> server1: Running provisioner: ansible…

Vagrant has automatically selected the compatibility mode '2.0'according to the Ansible version installed (2.7.7).

Alternatively, the compatibility mode can be specified in your Vagrantfile:

server1: Running ansible-playbook...

PLAY [blogpost] *************************************************************

TASK [Gathering Facts] ******************************************************
ok: [server1]

TASK [say hello] ************************************************************
ok: [server1] => {
"msg": "hello from server1"

TASK [partitioning PVs for the volume group] ********************************
changed: [server1]

PLAY RECAP ******************************************************************
server1 : ok=3 changed=1 unreachable=0 failed=0

Great! But I forgot to partition /dev/sd[cd] in the same way as I partition /dev/sdb! That’s a quick fix:

- name: blogpost
  hosts: oracle_si
  - oravg_pv: /dev/sdb
  - asm_disks:
      - /dev/sdc
      - /dev/sdd
  become: yes
  - name: say hello
    debug: msg="hello from {{ ansible_hostname }}"

  - name: partitioning PVs for the Oracle volume group
      device: "{{ oravg_pv }}"
      number: 1
      state: present
      align: optimal
      label: gpt

  - name: partition block devices for ASM
      device: "{{ item }}"
      number: 1
      state: present
      align: optimal
      label: gpt
    loop: "{{ asm_disks }}"

Re-running the provisioning script couldn’t be easier. Vagrant has a command for this: “vagrant provision”. This command re-runs the provisioning code against a VM. A quick “vagrant provision” later my system is configured exactly the way I want:

$ vagrant provision
==> server1: Running provisioner: ansible...
Vagrant has automatically selected the compatibility mode '2.0'
according to the Ansible version installed (2.7.7).

Alternatively, the compatibility mode can be specified in your Vagrantfile:

    server1: Running ansible-playbook...

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [say hello] ***************************************************************
ok: [server1] => {
    "msg": "hello from server1"

TASK [partitioning PVs for the Oracle volume group] ****************************
ok: [server1]

TASK [partition block devices for ASM] *****************************************
changed: [server1] => (item=/dev/sdc)
changed: [server1] => (item=/dev/sdd)

PLAY RECAP *********************************************************************
server1                    : ok=4    changed=1    unreachable=0    failed=0   

This is it! Using just a few commands I can spin up VMs, test my Ansible scripts and later on when I’m happy with them, check the code into source control.

Indexing The Oracle Autonomous Data Warehouse (Autobahn)

When the Autonomous Data Warehouse was initially released in March 2018, one of the notable restrictions was that it did not support typical Data Warehouse performance related structures such as Indexes, Partitioning or Materialized Views. The platform was deemed efficient enough to not require these features, which would unnecessarily complicate the autonomous environment. In September […]

Amazon RDS cluster dashboard with Performance Insights

Amazon RDS Performance Insights (PI) doesn’t have a single pane of glass dashboard for clusters, yet. Currently PI has a dashboard that has to be looked at for each instance in a cluster.

On the other hand one can create a simple cluster dashboard using Cloudwatch.

PI, when enabled, automatically sends three metrics to Cloudwatch every minute.

These metrics are

  1. DBLoad
  2. DBLoadCPU
  3. DBLoadNonCPU

DBLoad = DBLoadCPU + DBLoadNonCPU

These metrics are measured in units of Average Active Sessions (AAS). AAS is like the run queue on the UNIX top command except at the database level. AAS is the average number of SQL queries running concurrently in the database. In the case of DB Load AAS,  the average is over 1 minute since the metrics are reported each minute, and represents the total average # of SQL queries running concurrently. The DBLoad AAS can be broken down into those queries that are runnable on CPU, which is DBLoadCPU, and those queries that are not ready to run on the CPU because they are waiting for some resource like an I/O to complete, a lock , a latch, or some  resource that can only be accessed in single threaded mode like a latch or buffer.

These metrics can be used to look at the health of the database.

For example we can only have as many SQL running on the CPU as there are vCPUs. If DBLoadCPU goes above the # of vCPUs then we know that some of those queries that are runnable on the CPU are actually waiting for the CPU.

We also know that when DBLoadNonCPU is low or near 0 then the queries are not waiting for resources and can execute. When DBLoadNonCPU goes up significantly then that represents an opportunity to optimize. For example if queries are spending half their time waiting for IO then if we could buffer that IO we could remove the IO wait and theoretically the queries could go twice as fast, doubling throughput.

By looking at DBLoadCPU for each instance in a cluster we can see if the load is well balanced and we can see if the load goes above the maximum CPU resources of the instance which would indicate a CPU bottleneck.

By looking at the ratio or percentage of DBLoadNonCPU to total DBLoad we can see how much time is wasted waiting for resources instead of executing on CPU. By show this percentage for each instance in the cluster in one graph we can see if any particular instance is running into a bottleneck. If so we would want to look the performance insights dashboard for that instance to get more detail about what is happening.

So let’s set up a cluster dashboard using PI data.

Create a RDS database instance with PI enabled :

PI is supported on all RDS Oracle, Aurora PostgreSQL,  RDS PostgreSQL 10 , RDS SQL Server (except 2008) and on the most recent versions of  Aurora MySQL 5.6, RDS MySQL 5.6 & 5.7. See docs for details:

In the case of this blog I created an Aurora MySQL 5.6  cluster with the original writer node and 3 reader nodes. You can create the initial instance with one reader node, then after creation, go to modify instanced and add reader node.

My nodes are shard1, shard2, shard3 and shard4.

After creating or modifying an instance to support PI, navigate to Cloudwatch and create a new dashboard (or add to an existing one)

Screen Shot 2019-03-28 at 12.17.34 PM


after creating a dashboard (or modifying an existing one) add a new widget, click “Add widget” button

Screen Shot 2019-03-28 at 12.22.46 PM

and for this example chose the line graph, the first option on the left of popup:

Screen Shot 2019-03-25 at 6.00.44 PM

at the bottom of the screen enter “DBLoadCPU” into the search field

Screen Shot 2019-03-28 at 12.24.02 PM

hit return and click on “per database metrics”

Screen Shot 2019-03-28 at 12.25.17 PM


My cluster instances are shard1, shard2, shard3 and shard4 so I click those

Screen Shot 2019-03-28 at 12.26.19 PM

and click “Create Widget” in the bottom left

Screen Shot 2019-03-28 at 12.28.46 PM

I got a surprise, as each shard instance was suppose to have same load but can see  something is wrong on shard4. Will investigate that as we go.

For now there are some options on the widget that I want changed. I want the graph to start at 0 (zero) and have a maximum of 4 ,  since my instances have 2vCPUs and I want to be able to look quickly at the graph to know where I’m at without having to read the axis everytime.  My max available CPU load is 2 AAS since I have 2 vCPU. I set the max at 4 so there is some head room to be able to show load about 2.

There is pull down menu in top right of widget. Click it and choose “Edit”

Screen Shot 2019-03-28 at 12.31.25 PM

Click the third tab “Graph options” and enter 0 for min and for max enter a value above the # of vCPUs you have on your instance. I have 2 vCPUs, so I enter 4.

Screen Shot 2019-03-28 at 12.32.14 PM

click update in the bottom right.

Screen Shot 2019-03-28 at 12.34.30 PM

I can see I’m running a pretty good load as shards 1-3 are running  around 1.5 AAS on CPU i.e. our SQL are asking for about 75% of the CPU capacity of the box. ( 100% * (1.5 AAS CPU / 2 vCPU) ).

I also see shard4 has a lower DBLoadCPU. Turns out I had turned off the load to that shard this morning and forgot, so I restarted it.

Now lets add a widget to see how efficient our load is, i.e. what % of the load is waiting instead of being runnable on CPU.

Create a new widget and search for DBLoad and choose the 2 metrics DBLoadNonCPU & DBLoad for all 4 instances. We will use them in a mathematical expression.

Screen Shot 2019-03-28 at 12.40.19 PM

create the widget, then edit it,

uncheck all the metrics

then we click “Add a math expression”

add the expression 100*(DBLoadNonCPU/DBLoad)  for each instance

Screen Shot 2019-03-28 at 12.45.32 PM

Looks like

Screen Shot 2019-03-28 at 12.47.12 PM

You can see I restarted the load on shard4 because the DBLoadCPU has gone up.

Now for the new graph click on title and type “% bottleneck”

edit it and add min 0 and max 100 ( i.e. 0-100% range), now it looks like

Screen Shot 2019-03-28 at 12.49.36 PM

Be sure and click “Save dashboard” so you don’t loose you work.

Now what do we see? well now that I’ve restarted the load on shard4, we see on “DBLoadCPU”,  the DBLoadCPU is pretty evenly balanced.

On “% bottleneck”  we see it’s pretty low except for shard1. To find out what is happening we have to navigate to the PI dashboard for shard1. Looks like shard1 is spending a lot of it’s load waiting on resources.

Let’s go to the PI dashboard for shard1.


Screen Shot 2019-03-28 at 12.58.21 PM


we can see that on the left most of the load was not CPU. CPU is green. All other colors are waiting for resources.

This is the write node and other activity is going on than the reader nodes which are only selects.

On the right hand side we can see CPU load went up so the ratio of Wait load in relation to CPU load and total load went down. This is what we see in the “% bottleneck” widget we made in Cloudwatch.

Now what are those resources that we are waiting on and what changed to make CPU go up? We can see that by exploring the PI dashboard.

For a demo on how to use PI to  identify and solve performance issues see



Oracle RAC vs. SQL Server AG

As I have seen the benefit for having a post on Oracle database vs. SQL Server architecture, let’s move onto the next frontier- High Availability…or what people think is high availability architecture in the two platforms.

To RAC or Not to RAC

There is a constant rumble among Oracle DBAs- either all-in for Oracle Real Application Cluster, (RAC) or a desire to use it for the tool it was technically intended for. Oracle RAC can be very enticing- complex and feature rich, its the standard for engineered systems, such as Oracle Exadata and even the Oracle Data Appliance, (ODA). Newer implementation features, such as Oracle RAC One-Node offered even greater flexibility in the design of Oracle environments, but we need to also discuss what it isn’t- Oracle RAC is not a Disaster Recovery solution.

Disaster Recovery projects hold requirements for secondary, failover environments to be housed in a second data center or cloud and multiple databases, but that simply isn’t what Oracle RAC is. That doesn’t mean we can’t build out the solution, it just means, in our comparison to grant an Apple-to-Apple comparison, I’m going to add a second product into the scenario- Oracle Data Guard.

RAC on Azure VMs

For those trying to build out Oracle RAC on Azure, yes you can do it. You can build out a VM cluster and then use FlashGrid underneath the Oracle RAC installation. Would you want to do it? I just don’t know why. If you ask the customer questions, so far, every time, I’ve discovered a better way to reach the solution they are hoping to achieve and it didn’t involve RAC. It was quickly discovered that just because you needed something on premises, bringing in the cloud changes everything. Redundancy is built into the infrastructure, VMs can have clustering built in and I would prefer to put Oracle Data Guard on a VM over implementing all the moving parts with RAC.

So what is Oracle RAC bringing to the game? 300w, 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

Now to directly discuss Oracle RAC with Data Guard comparison to SQL Server Always-on Availability Groups: 300w, 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

Note that Oracle RAC has numerous nodes that are connected to one RAC database. This database seamlessly connects to all sessions, directed across the nodes by the Scan Listener. Transactions, including Undo and Redo is shipped between the nodes via the Global Cache.

In our architecture, we’ve added in Oracle Data Guard, which is a secondary replica that can then be sync’d with the archive logs, (similar to transaction logs in SQL Server) and can be made use of even more if implementing Active Data Guard. 300w, 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

For SQL Server Always-on Availability Groups, the architecture above has implemented a primary with two replicas, one on-premises and a secondary in a remote data center, providing full DR in case of the primary data center being unavailable. Note that there is no write to the replicas back to the primary. They are solely isolated and designed for High Availability and Disaster Recovery.

When talking scalability, Oracle RAC allows the DBA to allocate more hosts to allocate more CPU and memory to a given database. The amount of nodes that can be allocated to a unique database is quite extensive, where in SQL Server AG, each node requires storage for its unique replica that must be provisioned along with the CPU and memory resources.

The Gist

So high level, it’s good to remember the following: 300w, 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

I know I didn’t get in too deep with any of the technology here, but that wasn’t my goal. I wanted to try to give everyone, no matter which platform you specialize in, to see the other side of what is similar. Hopefully this helps.

Tags:  ,





Copyright © DBAKevlar [Oracle RAC vs. SQL Server AG], All Right Reserved. 2019.