Oakies Blog Aggregator

Using Grafana Loki to be able to search and view all logs

This post is about aggregating your log files in a single place and making them easily searchable via a convenient web interface.

You might think: wait a minute; doesn’t this exist already? Well, yes and no. Let me explain.
a) traditional log management: traditionally, a shell script run by cron searched the logs for certain strings, typically error messages or parts of error messages, and on a match an email was sent indicating that the aforementioned string was found.
b) log management via a monitoring tool: the next step was to replace the individual script on each server with a monitoring tool, which performed the same task as the shell script. In reality, quite often a default monitoring set/template was enabled instead of the specific strings that were searched for with the shell script. Sometimes this was an improvement, sometimes it meant the specific issues (and thus messages) became invisible. In my experience this is still how monitoring works in 90% of the cases: a generic, completely standard monitoring template.
c) other log gathering and indexing: there are many more products that perform this function. The first one that comes to my mind is Splunk, along with all the cloud-based options (many of those!) and a lot of tools based on Elasticsearch, like the “ELK stack”.

I think it’s clear that “a)” and “b)” are not dynamic, and in fact very static. My major concern is that they don’t allow exploratory investigation: a warning is simply raised, and any investigation means you have to log on and start browsing the available information locally. Everybody who has worked with HP OpenView or Oracle Enterprise Manager will recognise this. Yes, it’s probably all possible with these tools, but it never (ever(!)) gets implemented.

For the last category, the likes of Splunk, the ELK stack and the cloud-based log tools: the first two definitely serve a function, but they are aimed at aggregating logs of multiple servers, and are simply too much to set up on a server alongside the processes they are meant to monitor. For the cloud-based tools: it might be my conservatism, but getting a subscription and uploading logs feels awkward, especially if it’s for my own test purposes.

This is where Loki comes in. Finally, I would say, there is a tool that can function on a small scale (!) and perform log aggregation and provide a searchable database without a huge setup. This is ideal for your own test or development box, to be able to discover what is going on and have all the log files at your fingertips, without endlessly going through the filesystem performing cd, tail, grep, wc, sort, uniq, et cetera. I think lots of people recognise travelling from log file to log file.

Loki gives you a database that orders log entries based on time, and Grafana provides a web-based UI to view and query the Loki database. This is what it looks like:
(click to enlarge)
This example shows my test machine, where I decided to see when Linux started, as well as when the Oracle database instance was started.
* The first query uses the label “rdbms_alert” (indicating it came from the text-based oracle alert.log file, this is a label I added to it), and within the log lines with the label job with value “rdbms_alert”, I added a filter for the string “Starting ORACLE”, which indicates an Oracle database instance start.
* The second query uses the label “messages” (indicating it came from the linux /var/log/messages file, this label is mine too), and within the log lines with the label job with the value “messages”, I added a filter for the string “kernel: Command line”, which indicates linux startup. I additionally added a negative filter for “loki”, because loki logs the queries to the messages file, which I don’t want to see.
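For reference, in LogQL (the query language used in Grafana’s Explore view for Loki), the two queries described above look roughly like this, using the `job` labels mentioned:

```
{job="rdbms_alert"} |= "Starting ORACLE"
{job="messages"} |= "kernel: Command line" != "loki"
```

`|=` is the line-contains filter and `!=` is its negation, which is what keeps Loki’s own query logging out of the second result set.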

I hope you can see the power of having all the logs in a single place, and completely searchable.

This is just a start, a very simple proof-of-concept setup. For example, the date/time in the log lines is not used: the date/time of a log line is the time it was ingested into Loki. It is possible to have Loki interpret the timestamps in the lines themselves.
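To illustrate what interpreting the log lines’ own timestamps could look like: promtail supports pipeline stages for this. Below is a sketch of a promtail scrape config for the alert.log; the regex and timestamp format are assumptions and would need to be adjusted to the actual layout of your log lines:

```yaml
# hypothetical promtail scrape config: take the timestamp from the log line
# itself instead of using the ingestion time
scrape_configs:
  - job_name: rdbms_alert
    static_configs:
      - targets: [localhost]
        labels:
          job: rdbms_alert
          __path__: /u01/app/oracle/diag/rdbms/*/*/trace/alert_*.log  # assumed path
    pipeline_stages:
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})'
      - timestamp:
          source: time
          format: RFC3339Nano
```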

If you are interested, but are uncertain whether this is for you and would like to test it: I have a couple of Ansible scripts that can set up the combination of:
* promtail (the default loki log streaming tool)
* loki (the database)
* Grafana (the web UI)
The scripts are created on Oracle Linux 7.8.

Install git, and clone the repository using a user that has sudo rights:

git clone

Install ansible (you might have to install the EPEL repository).

In the loki-setup repo you find the setup scripts for loki, promtail and grafana.
You can execute the scripts in that order (loki, promtail and Grafana) by running the setup_loki.yml, setup_promtail.yml and setup_grafana.yml scripts.
IMPORTANT: do proofread the scripts, and validate the variables for your situation. Don’t worry: the scripts are easy to read.

After installing, you can go to http://yourmachine:3000, log in with admin and the password you set in the setup_grafana.yml script, click on the compass (explore) option, and perform your own log browsing.

If you decide you want to remove it, I have a remove script for each of the components, which will remove that specific component entirely. The same applies here too: validate the script.

How to use DBMS_PIPE to halt and continue a PLSQL database session

I posted a message on twitter saying that DBMS_PIPE is an excellent mechanism to make a session run and halt in PL/SQL. One response I got asked for an example of that. That is what this post is about.

DBMS_PIPE is an implementation of a pipe inside the Oracle database. A pipe is a mechanism that is not limited to the Oracle database; in fact, I assume the implementation is inspired by operating-system-level pipes, which can be created using ‘mknod /path/pipename p’ on unix and unix-like systems. A quick search shows Windows has got them too, sort of. Another implementation is the pipe used to redirect output from one command to the next using ‘|’. The essence of a pipe is that input to and output from the pipe are separated from each other, and that information flows from the process writing into the pipe to the process reading from it.
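The operating-system analogue is easy to try for yourself. A minimal sketch with a named pipe (mkfifo is the modern equivalent of ‘mknod … p’); the path /tmp/demo_pipe is just an example:

```shell
# create a named pipe (FIFO)
mkfifo /tmp/demo_pipe
# a background process writes into the pipe...
( echo "hello" > /tmp/demo_pipe ) &
# ...and the reader blocks until that message arrives, then continues
msg=$(cat /tmp/demo_pipe)
echo "$msg"
# clean up
rm /tmp/demo_pipe
```

The `cat` is the interesting part: it halts until something is written, which is exactly the behaviour DBMS_PIPE gives us inside the database.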

For the specific purpose this blogpost is about, DBMS_PIPE, I am not using the information that is sent through the pipe; I am mainly using the fact that a receiving session waits for information to be sent, and therefore halts execution.

When the Oracle database executes in PL/SQL mode, there is no simple way to make it pause and request user input, like you would do with the ‘pause’ or ‘accept’ commands in sqlplus in SQL mode. PL/SQL runs and provides its output when the session quits PL/SQL mode; in other words, there is no interaction with the terminal that executes the PL/SQL procedure or anonymous block. So if you want to make it stop and continue, you must use something else, for which I use DBMS_PIPE. Of course there are other ways; I would encourage people to come up with other mechanisms in a comment.

To use the DBMS_PIPE technique, you need two sessions: one which runs the PL/SQL block that you want to pause and continue (using a database pipe), and one to make it continue by sending a message to that pipe. Another thing that you might notice is the lack of explicit setup of the pipe; I just declare waiting for the pipe, and it gets created implicitly.

This is how it’s done:

1. Simple example anonymous PL/SQL block:

set serverout on
declare
 v_result number;
 v_time1 date;
 v_time2 date;
begin
  select sysdate into v_time1 from dual;
  v_result := dbms_pipe.receive_message ( 'test' );
  select sysdate into v_time2 from dual;
  dbms_output.put_line('you waited: '||(24*60*60)*(v_time2-v_time1)||' seconds');
end;
/

If you execute this, it will provide no output, and in fact be waiting for the pipe that I named ‘test’.

In order to make anonymous PL/SQL block continue, I need another session to send a message to the pipe:
2. Send empty message to the pipe:

var v_result number;
exec v_result := dbms_pipe.send_message ( 'test' );

Once you have executed that, the PL/SQL block will continue, then finish and display the output:

you waited: 3 seconds

PL/SQL procedure successfully completed.

There are a couple of other caveats you need to be aware of:
– a pipe is only visible within the instance; in the case of a cluster database it is NOT visible on the other instances in the cluster.
– there can be multiple sessions waiting on the same pipe, but a single message is consumed/received by a single session, on a first-come, first-served basis.
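One more caveat worth knowing: RECEIVE_MESSAGE takes an optional timeout (in seconds), so the waiting session does not have to block forever. A small sketch; the return value is 0 on success and 1 on timeout:

```sql
declare
  v_result number;
begin
  -- wait at most 10 seconds for a message on pipe 'test';
  -- returns 0 when a message arrived, 1 on timeout
  v_result := dbms_pipe.receive_message('test', timeout => 10);
  dbms_output.put_line('receive_message returned: '||v_result);
end;
/
```

This is useful if you want the halted session to give up gracefully instead of waiting indefinitely for the second session.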

Run your own TPC-H Benchmark with #Oracle or #Exasol

The TPC-H Benchmark is for Decision Support Systems. It is described in great detail on the TPC.ORG site, but you may find it quite an effort to generate the data and prepare the SQL for table creation and reporting.

At least I did, which is why I thought having that all ready for download and run would be helpful.

What I have prepared for Oracle and for Exasol is:

  • The data files (CSV format) for the 1 GB TPC-H
  • The DDL for the TPC-H tables
  • The loader commands to populate these tables
  • The 22 queries for the TPC-H benchmark
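To give an idea of the workload, here is query 1 of the 22 (the pricing summary report) as defined in the TPC-H specification, with the default substitution parameter of 90 days:

```sql
-- TPC-H Q1: pricing summary report
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '90' day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
```

It scans the largest table (lineitem) almost entirely and aggregates it, which is typical for the decision-support style of the whole query set.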

You can download it here:

TPC-H for Oracle          TPC-H for Exasol

The data volume is of course quite small for a production data warehouse but ideal for quick testing and self-education. I’m using it together with VirtualBox and VMs on my notebook with 16 GB memory.

See here for a demo – I’m setting up the TPC-H for both Oracle and Exasol and then I do a comparison:

Some remarks about the comparison:

I’m an Exasol employee and the outcome is very positive for Exasol.

Nevertheless, I tried to do a fair comparison. It’s just running the pure 22 SELECT statements: no tuning, no tweaking of the Exasol database or the underlying VM.

The Oracle version is quite recent (18.3) but not the most recent, same with the Exasol version (6.2), not the just released Exasol 7.0.

As you can see, the Exasol database is out of the box about 6 times faster than the Oracle database for the same workload having the same hardware resources – without any tuning.

I suppose you could get better performance from Oracle for the 22 queries with some effort, like analyzing the workload, adding indexes of the various available types, partitioning the tables, adding SQL Profiles and Optimizer Directives, etc.

The point is, that’s all not required with Exasol. I just ran the workload twice and everything was self-optimized afterwards.

You could call this an autonomous database.


Friday Philosophy – Is The Problem The Small Things?

Something has been bothering me for a while. In fact, I’d go as far as to say it’s been depressing me. It’s you. Well, many of you.

What do I mean? Well I’ll give you an example. A week or so ago I went out in the car to get some shopping. A few minutes into the journey, as I go around a gentle bend, I see there is a car coming towards me – on my side of the road. I had to brake to give it space to get back over and I see it has swerved to avoid a branch in the road. As you can see in the picture, it’s not a huge branch, it covers less than one lane. I’m past it now so I go on to the shops and get my stuff.

30 minutes later I’m coming back. And I’m thinking to myself “I bet that branch is still there.” And it is. I can see it from maybe 300 meters back. The two cars in front of me barely slow down and they swerve past it. An oncoming vehicle that *I* can see coming, let alone the two cars in front of me, has to slow down for the swervers like I did. That slight bend means you get a much better warning of the obstacle from the side of the road it is on and as it is on your side, it’s really your responsibility so slow or even briefly stop, but the people in front of me just went for it. They did not care.

I did not swerve. I slowed down. And I put on my hazard lights, and stopped about 20 meters back from the branch. I double checked that no car has appeared behind me and I got out the car. In 20 seconds (including taking the snap), I’ve moved the branch off the road with no danger at all and I’m back to my car.

I know, you would have done the same.

Only no. No, you would not have.

Some of you would like to think you would have stopped and moved the obstacle.

I suspect most of you would claim, if asked, that you would have stopped and moved the branch.

And of course all of you would have slowed to avoid inconveniencing others.

But reality shows that nearly all of you would not.

As I left the scene, I was wondering how many people would have passed that branch in the 30 minutes I knew for sure this small branch had been an obstacle on the road. I’m going to let the people going the other way off, as they would have had to do a u-turn to come back to it, so how many people would have had to swerve past it? I know that road well; it would have been, hmm, 4 or 5 cars a minute going past in one direction – certainly more than 3 cars, less than 10. So well over a hundred drivers would have seen that branch from a distance, most would have been able to safely slow and stop – and yet not one of them had. I have no idea how long the branch had been there; it was not too beaten up, so maybe not long, but it could have been a couple of hours. It was easy to avoid – especially if you swerved with little concern for any on-coming traffic…

It turns out I’m the one in a hundred.

Are you thinking “well, it’s not my job to move branches off a road!”?

So whose job is it? And if you could label it as someone’s job (let’s go for someone in the “highways agency”), how do they get to know it needs doing? I don’t know about you, but I see dozens of highways agency maintenance people on every journey I do, just cruising around looking for things that need doing. {sarcasm}.

When was the last time you saw something that needed doing in a public place and took the time to think about who should be told, try to contact them, get told to contact someone else, find out it’s not their job but are asked to ring Dave, who you do ring and he says thanks (before making a note to think about it, whilst probably muttering “this is not my job, I’ve got major roadworks to look after”). Hell, it’s easier to stop and move the branch.

Generally in life, in so many situations, I am constantly wondering why someone has not done X (or has done Y). Why don’t you reach for the jar in the shop the old lady can’t quite reach? Why don’t you hold the door? Why did you drop that litter when the bin is JUST THERE! That person  in front of you buying a parking ticket can’t find 10p in their purse to make the correct change? You have loads of 10p pieces… some in your hand already.

This is what is depressing me. Even though nearly everyone likes to think they are the nice person who will do a little for the common good, the reality is that most people won’t when it comes to it – but most people think we all should, and you tell yourselves you do the little things. You are telling yourself now, aren’t you? You are trying to think of the little things you have done for the common good. If you can think of a half dozen in the last month then you really are one of the good guys/gals. If you can only come up with a few…and actually most of them were ages ago… well, sorry but you are the problem.

The strange thing is that, having just insulted you all, as a group you lot are much more likely to be in the 1% than normal. Even though out of the general public not even 1 in 100 people would put in a little effort to move that branch, out of the people reading this, I’d say 10% would. Because I spend a lot of time in the Oracle user community, packed with people who give up their time, knowledge, even their holidays, to speak at conferences, help organise meetings, answer on forums, write blogs, answer questions on twitter, and all that stuff. Many of you reading this are active members of the User Community doing not just small things but often large things for the community. That’s why the community works.

To the rest of you, instead of liking to think you would move the branch or claiming you would (as everyone wants to be thought of as the nice guy/gal) just occasionally move the branch. Or pick that piece of litter up. Or do something small that cost you so little but it just would be nice if someone did it.

No one will thank you.

But you will know you did it. And you are becoming no longer part of the problem but part of the solution. I’m not asking you to give 10% of your salary to charity or give up an important part of your life, just do a bit of the small stuff.

If more of us do it, we will have a better world. If someone had moved that branch soon after it fell, I would not have had to  avoid some swerving dickhead, and the person I saw later would have not had to avoid people who could not even be bothered to slow down or stop briefly. And, in the worst case, that needless accident need not have happened. It really is as simple as spending 1 minute moving a branch.

Don’t be part of the problem, be part of the solution. It’s really, really, really easy.


Amazon DynamoDB Local: running NoSQL on SQLite

By Franck Pachot

DynamoDB is a cloud-native, managed, key-value proprietary database designed by AWS to handle massive throughput for large volume and high concurrency with a simple API.

  • simple API: Get, Put, Query, Scan on a table without joins, optimizer, transparent indexes,…
  • high concurrency: queries are directed to one shard with a hash function
  • massive throughput: you can just add partitions to increase the IOPS
  • large volume: it is a shared-nothing architecture where all tables are hash partitioned
  • key-value: you access any data with a key that is hashed to go to the right partition and the right range within it
  • managed: you have zero administration to do. Just define the key and the throughput you desire and use it
  • cloud-native: it was designed from the beginning to run in the AWS cloud

One problem with cloud-native solutions is that you need access to the service during the development of your application. This is not a major cost issue, because DynamoDB is available on the Free Tier (with limited throughput, but that’s sufficient for development). But users may want to develop offline, on their laptop, without a reliable internet connection. And this is possible, because Amazon provides a downloadable version of this database: DynamoDB Local.


The most important thing is that the API is the same as with the cloud version. For sure, all the partitioning stuff is missing in the local version. And I have no idea if the underlying data format is similar or not:
However, this is more for curiosity. The local version just needs a compatible API. You will not measure the performance there. And from what I’ve discovered below, I’m sure the storage is completely different in the cloud version.


[oracle@cloud DynamoDBLocal]$ cat /etc/oracle-release
Oracle Linux Server release 7.7
[oracle@cloud DynamoDBLocal]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)

I am doing this installation on OEL 7.7, which is similar to RHEL 7.7 or CentOS 7.7.

[oracle@cloud DynamoDBLocal]$ java -version
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)

I have a JRE installed

mkdir -p /var/tmp/DynamoDBLocal && cd $_

I’m installing everything in a local temporary directory.

All is documented in:

curl | tar -xvzf -

This simply downloads and extracts the DynamoDB Local distribution.


java -Djava.library.path=/var/tmp/DynamoDBLocal/DynamoDBLocal_lib -jar /var/tmp/DynamoDBLocal/DynamoDBLocal.jar -sharedDb -dbPath /var/tmp/DynamoDBLocal &

This will use a persistent file (you can run it in memory only with -inMemory instead) in the directory given by -dbPath, and -sharedDb will use the following file name:

[oracle@cloud ~]$ ls -l /var/tmp/DynamoDBLocal/shared-local-instance.db
-rw-r--r-- 1 oracle oinstall 12346368 Aug  6 12:20 /var/tmp/DynamoDBLocal/shared-local-instance.db

I’ll tell you more about this file later.

When started, it displays the port it listens on:

[oracle@cloud ~]$ pkill -f -- '-jar DynamoDBLocal.jar -sharedDb'

[oracle@cloud ~]$ java -Djava.library.path=/var/tmp/DynamoDBLocal/DynamoDBLocal_lib -jar /var/tmp/DynamoDBLocal/DynamoDBLocal.jar -sharedDb -dbPath /var/tmp/DynamoDBLocal &
[1] 33294
[oracle@cloud ~]$ Initializing DynamoDB Local with the following configuration:
Port:   8000
InMemory:       false
DbPath: /var/tmp/DynamoDBLocal
SharedDb:       true
shouldDelayTransientStatuses:   false
CorsParams:     *

Another port can be defined with -port


I use the AWS command-line interface; here is how to install it:

wget --continue
unzip -oq
sudo ./aws/install
aws configure

For the configuration, as you are local, you can put anything you want for the access key and region:

[oracle@cloud ~]$ aws configure
AWS Access Key ID [****************chot]: @FranckPachot
AWS Secret Access Key [****************chot]: @FranckPachot
Default region name [Lausanne]: Lausanne
Default output format [table]:

Because this information is not used, I’ll need to specify the endpoint “--endpoint-url http://localhost:8000” with each call.

Create table

aws dynamodb --endpoint-url http://localhost:8000 create-table \
 --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=S \
  AttributeName=MyKeySort,AttributeType=S \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

I specified some provisioned capacity, ready for my test on the Free Tier, but it is actually ignored by DynamoDB Local.

[oracle@cloud ~]$ aws dynamodb --endpoint-url http://localhost:8000 create-table \
>  --attribute-definitions \
>   AttributeName=MyKeyPart,AttributeType=S \
>   AttributeName=MyKeySort,AttributeType=S \
>  --key-schema \
>   AttributeName=MyKeyPart,KeyType=HASH \
>   AttributeName=MyKeySort,KeyType=RANGE \
>  --billing-mode PROVISIONED \
>  --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
>  --table-name Demo
|                                                                      CreateTable                                                                     |
||                                                                  TableDescription                                                                  ||
||         CreationDateTime         | ItemCount  |                      TableArn                       | TableName  | TableSizeBytes  |  TableStatus  ||
||  2020-08-06T12:42:23.669000+00:00|  0         |  arn:aws:dynamodb:ddblocal:000000000000:table/Demo  |  Demo      |  0              |  ACTIVE       ||
|||                                                               AttributeDefinitions                                                               |||
|||                              AttributeName                             |                              AttributeType                              |||
|||  MyKeyPart                                                             |  S                                                                      |||
|||  MyKeySort                                                             |  S                                                                      |||
|||                                                                     KeySchema                                                                    |||
|||                                      AttributeName                                     |                         KeyType                         |||
|||  MyKeyPart                                                                             |  HASH                                                   |||
|||  MyKeySort                                                                             |  RANGE                                                  |||
|||                                                               ProvisionedThroughput                                                              |||
|||      LastDecreaseDateTime      |      LastIncreaseDateTime       |   NumberOfDecreasesToday    |   ReadCapacityUnits   |   WriteCapacityUnits    |||
|||  1970-01-01T00:00:00+00:00     |  1970-01-01T00:00:00+00:00      |  0                          |  25                   |  25                     |||

Another difference with the cloud version is that this command returns immediately (no “CREATING” status).


I’ll put some items with Python, so let’s install it:

yum install -y python3
pip3 install boto3

boto3 is the AWS SDK for Python

Insert some items

Here is my program:

import boto3, time, datetime
from botocore.config import Config
dynamodb = boto3.resource('dynamodb',config=Config(retries={'mode':'adaptive','total_max_attempts': 10}),endpoint_url='http://localhost:8000')
table = dynamodb.Table('Demo')
n=0 ; t1=time.time()
try:
 for k in range(0,10):
  for s in range(1,k+1):
   # reconstructed: the original put_item line was lost in extraction;
   # the item layout is an assumption based on the table's key schema
   r=table.put_item(Item={'MyKeyPart':'K-%08d'%k,'MyKeySort':'S-%08d'%s})
   n=n+1
except Exception as e:
 print(e)
t2=time.time()
print(f"Last: %s\n\n===> Total: %d seconds, %d keys %d items/second\n"%(r,(t2-t1),k,n/(t2-t1)))

I just fill each collection with an increasing number of items.

[oracle@cloud DynamoDBLocal]$ python3
Last: {'ResponseMetadata': {'RequestId': '6b23dcd2-dbb0-404e-bf5d-57e7a9426c9b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.0', 'x-amz-crc32': '2745614147', 'x-amzn-requestid': '6b23dcd2-dbb0-404e-bf5d-57e7a9426c9b', 'content-length': '2', 'server': 'Jetty(8.1.12.v20130726)'}, 'RetryAttempts': 0}}

===> Total: 3 seconds, 9 keys 14 items/second

[oracle@cloud DynamoDBLocal]$

count items

[oracle@cloud DynamoDBLocal]$ aws dynamodb --endpoint-url http://localhost:8000 scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL
|              Scan              |
|   Count  |    ScannedCount     |
|  45      |  45                 |
||       ConsumedCapacity       ||
||  CapacityUnits |  TableName  ||
||  0.5           |  Demo       ||

The nice thing here is that you can see the ConsumedCapacity, which gives you an idea of how it scales. Here I read 45 items of 81 bytes each, and the total is lower than 4K. Then the cost is 0.5 RCU for eventually consistent queries.
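The capacity arithmetic is easy to check yourself. A small sketch, taking the item count and size from the scan above and applying the rule that an eventually consistent read costs 0.5 RCU per (rounded-up) 4 KB unit:

```python
import math

items, item_size = 45, 81          # from the scan above
total_bytes = items * item_size    # 3645 bytes, below one 4 KB unit
# eventually consistent reads cost half a read capacity unit per 4 KB unit
rcu = math.ceil(total_bytes / 4096) * 0.5
print(rcu)  # 0.5
```

Which matches the 0.5 CapacityUnits reported by the scan.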


You know how I’m curious. If you want to build a local NoSQL database, which storage engine would you use?

[oracle@cloud DynamoDBLocal]$ cd /var/tmp/DynamoDBLocal
[oracle@cloud DynamoDBLocal]$ file shared-local-instance.db
shared-local-instance.db: SQLite 3.x database

Yes, this NoSQL database is actually stored in a SQL database!

They use SQLite for this DynamoDB Local engine, embedded in Java.

[oracle@cloud DynamoDBLocal]$ sudo yum install sqlite
Loaded plugins: ulninfo, versionlock
Excluding 247 updates due to versionlock (use "yum versionlock status" to show them)
Package sqlite-3.7.17-8.el7_7.1.x86_64 already installed and latest version
Nothing to do

I have SQLite installed here and then can look at what is inside with my preferred data API: SQL.

[oracle@cloud DynamoDBLocal]$ sqlite3 /var/tmp/DynamoDBLocal/shared-local-instance.db
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

sqlite> .databases
seq  name             file
---  ---------------  ----------------------------------------------------------
0    main             /var/tmp/DynamoDBLocal/shared-local-instance.db

sqlite> .tables
Demo  cf    dm    sm    ss    tr    us

Here is my Demo table accompanied with some internal tables.
Let’s look at the fixed tables there (which I would call the catalog or dictionary, if DynamoDB were not a NoSQL database):

sqlite> .headers on
sqlite> .mode column
sqlite> select * from cf;

That looks like the version of the database (Config Table)

sqlite> select * from dm;
TableName   CreationDateTime  LastDecreaseDate  LastIncreaseDate  NumberOfDecreasesToday  ReadCapacityUnits  WriteCapacityUnits  TableInfo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    BillingMode  PayPerRequestDateTime
----------  ----------------  ----------------  ----------------  ----------------------  -----------------  ------------------  ---------------------------------------------------------------------------------------------                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                -----------  ---------------------
Demo        1596718271246     0                 0                 0                       25                 25                  {"Attributes":[{"AttributeName":"MyKeyPart","AttributeType":"S"},{"AttributeName":"MyKeySort","AttributeType":"S"}],"GSIList":[],"GSIDescList":[],"SQLiteIndex":{"":[{"DynamoDBAttribute":{"AttributeName":"MyKeyPart","AttributeType":"S"},"KeyType":"HASH","SQLiteColumnName":"hashKey","SQLiteDataType":"TEXT"},{"DynamoDBAttribute":{"AttributeName":"MyKeySort","AttributeType":"S"},"KeyType":"RANGE","SQLiteColumnName":"rangeKey","SQLiteDataType":"TEXT"}]},"UniqueIndexes":[{"DynamoDBAttribute":{"AttributeName":"MyKeyPart","AttributeType":"S"},"KeyType":"HASH","SQLiteColumnName":"hashKey","SQLiteDataType":"TEXT"},{"DynamoDBAttribute":{"AttributeName":"MyKeySort","AttributeType":"S"},"KeyType":"RANGE","SQLiteColumnName":"rangeKey","SQLiteDataType":"TEXT"}],"UniqueGSIIndexes":[]}  0            0

Here is the DynamoDB metadata about my table: the DynamoDB attributes, like “AttributeName” and “AttributeType”, and their mapping to the SQLite “SQLiteColumnName”, “SQLiteDataType”,…

The tables sm, ss, us, and tr are empty; they are related to Streams Metadata, Shard Metadata, Streams and Transactions, and I may have a look at them in a next post.

Now the most interesting one: my Demo table. For this one, I’ve opened it in DBeaver:
I have one SQLite table per DynamoDB table (global secondary indexes are just indexes on the table) and one SQLite row per DynamoDB item. The keys (the HASH for partitioning and the RANGE for sorting within the partition), for which I used strings, are stored as TEXT in SQLite but contain their ASCII hexadecimal codes (hashKey and rangeKey), and those are the columns of the SQLite primary key. They are also stored in an even larger binary form (hashValue, rangeValue, where hashValue is indexed), probably a hash function applied to them. And finally, the full item is stored as JSON in a BLOB. The itemSize is interesting, because that’s what counts for Capacity Units (the sum of attribute names and attribute values).
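With that layout you can peek at an item directly from SQLite. A sketch; the key column names are as described above, while the name of the JSON BLOB column is an assumption here (check with `.schema Demo`):

```sql
-- peek at one item: the hex-encoded keys, the billed size, and the JSON blob
select hex(hashKey)  as hashKey_hex,
       hex(rangeKey) as rangeKey_hex,
       itemSize,
       ObjectJSON          -- assumed column name for the item's JSON BLOB
from   Demo
limit  1;
```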

The power of SQL to verify the NoSQL database

Actually, there’s a big advantage to having this NoSQL database backed by a SQL one. During the development phase, you don’t only need a database to run your code: you also have to verify the integrity of the data, even after some race conditions. For example, I’ve inserted more items by increasing the ‘k’ loop in my Python code and letting it run for 6 hours:

[oracle@cloud aws]$ time aws dynamodb --endpoint-url http://localhost:8000 scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL
|              Scan              |
|   Count   |   ScannedCount     |
|  338498   |  338498            |
||       ConsumedCapacity       ||
||  CapacityUnits |  TableName  ||
||  128.5         |  Demo       ||

real    0m50.385s
user    0m0.743s
sys     0m0.092s

The DynamoDB scan is slow here: nearly a minute for a small table (338498 items). This API is designed for the cloud, where a huge number of disks can provide high throughput for many concurrent requests. There’s no optimization when scanning all items, as I described in a previous post: RDBMS (vs. NoSQL) scales the algorithm before the hardware. SQL databases have optimizations for full table scans, and the database file for those 338498 rows is really small:

[oracle@cloud aws]$ du -h /var/tmp/DynamoDBLocal/shared-local-instance.db
106M    /var/tmp/DynamoDBLocal/shared-local-instance.db

Counting the rows is faster from SQLite directly:

[oracle@cloud aws]$ time sqlite3 /var/tmp/DynamoDBLocal/shared-local-instance.db "select count(*) from Demo;"
338498
real    0m0.045s
user    0m0.015s
sys     0m0.029s

But be careful: SQLite is not a multi-user database. Query it only when DynamoDB Local is stopped.
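If you do want to peek at the file while experimenting, opening it read-only at least prevents you from corrupting it (though it won’t protect you from reading an inconsistent state). This is standard SQLite behaviour via a URI connection; the path here is a throwaway stand-in for shared-local-instance.db:

```python
import os
import sqlite3
import tempfile

# A throwaway database file standing in for shared-local-instance.db
path = os.path.join(tempfile.mkdtemp(), "shared-local-instance.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE Demo (hashKey TEXT PRIMARY KEY)")
rw.execute("INSERT INTO Demo VALUES ('K-00000001')")
rw.commit()
rw.close()

# mode=ro opens the file read-only: any write raises OperationalError
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT count(*) FROM Demo").fetchone()[0])  # 1

try:
    ro.execute("INSERT INTO Demo VALUES ('K-00000002')")
except sqlite3.OperationalError as e:
    print(e)  # attempt to write a readonly database
```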

And with the power of SQL it is easy to analyze the data beyond the API provided by DynamoDB:

[oracle@cloud aws]$ sqlite3 /var/tmp/DynamoDBLocal/shared-local-instance.db
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .mode column
sqlite> .header on
sqlite> .timer on

sqlite> select count(distinct hashKey),count(distinct hashValue),count(distinct rangeKey),count(distinct rangeValue) from Demo;
count(distinct hashKey)  count(distinct hashValue)  count(distinct rangeKey)  count(distinct rangeValue)
-----------------------  -------------------------  ------------------------  --------------------------
823                      823                        822                       822

CPU Time: user 0.570834 sys 0.168966

This simple query confirms that I have as many hash/range keys as values.

sqlite> select cast(hashKey as varchar),json_extract(ObjectJSON,'$.MyKeyPart')
   ...> ,count(rangeKey),count(distinct rangeKey)
   ...> from Demo group by hashKey order by count(rangeKey) desc limit 10;

cast(hashKey as varchar)  json_extract(ObjectJSON,'$.MyKeyPart')  count(rangeKey)  count(distinct rangeKey)
------------------------  --------------------------------------  ---------------  ------------------------
K-00000823                {"S":"K-00000823"}                      245              245
K-00000822                {"S":"K-00000822"}                      822              822
K-00000821                {"S":"K-00000821"}                      821              821
K-00000820                {"S":"K-00000820"}                      820              820
K-00000819                {"S":"K-00000819"}                      819              819
K-00000818                {"S":"K-00000818"}                      818              818
K-00000817                {"S":"K-00000817"}                      817              817
K-00000816                {"S":"K-00000816"}                      816              816
K-00000815                {"S":"K-00000815"}                      815              815
K-00000814                {"S":"K-00000814"}                      814              814

Run Time: real 0.297 user 0.253256 sys 0.042886

There I checked how many distinct range keys I have for the 10 hash keys (LIMIT 10) with the highest counts (ORDER BY count(rangeKey) DESC), converted the hexadecimal key into a string (CAST), and also compared it with what is in the JSON column (JSON_EXTRACT). Yes, many RDBMS databases can easily manipulate semi-structured JSON with SQL.
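The same json_extract() call works anywhere SQLite is embedded, for instance from Python’s sqlite3 module. The table and the DynamoDB-style {"S": …} wrapper below mimic the Demo items; the data itself is made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Demo (ObjectJSON BLOB)")
con.executemany(
    "INSERT INTO Demo VALUES (?)",
    [('{"MyKeyPart":{"S":"K-00000001"},"MyKeySort":{"S":"S-%08d"}}' % i,)
     for i in range(1, 4)],
)

# Pull the scalar out of the DynamoDB-style {"S": ...} wrapper
rows = con.execute("""
    SELECT json_extract(ObjectJSON, '$.MyKeyPart.S') AS MyKeyPart,
           count(*) AS n
    FROM Demo
    GROUP BY MyKeyPart
""").fetchall()
print(rows)  # [('K-00000001', 3)]
```

This needs an SQLite build with the JSON1 functions compiled in, which is the case for the library bundled with recent Python releases.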

sqlite> select
   ...>  round(timestamp_as_seconds-lag(timestamp_as_seconds)over(order by timestamp)) seconds
   ...>  ,MyKeyPart,MyKeySort,MyKeySort_First,MyKeySort_Last,timestamp
   ...> from (
   ...>  select
   ...>    MyKeyPart,MyKeySort
   ...>   ,first_value(MyKeySort)over(partition by MyKeyPart) MyKeySort_First
   ...>   ,last_value(MyKeySort)over(partition by MyKeyPart) MyKeySort_Last
   ...>   ,timestamp,timestamp_as_seconds
   ...>  from (
   ...>   select
   ...>     json_extract(ObjectJSON,'$.MyKeyPart.S') MyKeyPart,json_extract(ObjectJSON,'$.MyKeySort.S') MyKeySort
   ...>    ,json_extract(ObjectJSON,'$.timestamp.S') timestamp
   ...>    ,julianday(datetime(json_extract(ObjectJSON,'$.timestamp.S')))*24*60*60 timestamp_as_seconds
   ...>   from Demo
   ...>  )
   ...> )
   ...> where MyKeySort=MyKeySort_Last
   ...> order by timestamp desc limit 5
   ...> ;

seconds     MyKeyPart   MyKeySort   MyKeySort_First  MyKeySort_Last  timestamp
----------  ----------  ----------  ---------------  --------------  --------------------------
 16.0       K-00000823  S-00000245  S-00000001       S-00000245      2020-08-07T04:19:55.470202
 54.0       K-00000822  S-00000822  S-00000001       S-00000822      2020-08-07T04:19:39.388729
111.0       K-00000821  S-00000821  S-00000001       S-00000821      2020-08-07T04:18:45.306205
 53.0       K-00000820  S-00000820  S-00000001       S-00000820      2020-08-07T04:16:54.977931
 54.0       K-00000819  S-00000819  S-00000001       S-00000819      2020-08-07T04:16:01.003016

Run Time: real 3.367 user 2.948707 sys 0.414206

Here is how I checked the time taken by the inserts. My Python code added a timestamp, which I convert to seconds (JULIANDAY) to get the difference with the previous row (LAG). I actually did that only for the last item of each collection (LAST_VALUE).
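The julianday()/lag() combination is plain SQLite and can be tried standalone; the timestamps below are invented to mimic the pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ts TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [
    ("2020-08-07T04:16:01",),
    ("2020-08-07T04:16:55",),
    ("2020-08-07T04:18:45",),
])

# julianday() returns days, so *24*60*60 converts to seconds;
# lag() fetches the previous row's value in timestamp order
rows = con.execute("""
    SELECT ts,
           round(julianday(datetime(ts))*24*60*60
                 - lag(julianday(datetime(ts))*24*60*60)
                   OVER (ORDER BY ts)) AS seconds
    FROM t
""").fetchall()
print(rows)
```

The first row has no predecessor, so its “seconds” is NULL; the others show 54 and 110 seconds between inserts. Window functions need SQLite 3.25 or later.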

Those are just examples: you can play with and improve your SQL skills on your NoSQL data. SQLite is one of the databases with the best documentation. And it is not only about learning: during development and UAT you need to verify the quality of the data, and this often goes beyond the application API (especially when the goal is to verify that the application API is correct).

That’s all for this post. You know how to run DynamoDB locally, and can even access it with SQL for powerful queries.


Modifying Scheduler Windows

There are a few possible reasons why you might be using your database late in the evening. These include:

  • you support a 24 by 7 system and so you often need to be online late at night,
  • you are a hopeless geek and love tinkering with the database when you should be in bed,
  • you now live in a virtual world and thus many of your meetings, seminars, or presentations are being done in an unfriendly time zone.

I tick some of these boxes and thus I am often using the database on my own home machine late in the evening. It is always frustrating when the clock ticks over to 10:00 PM and suddenly the fan on my machine goes nuts, the lights go dim in my house, and I can tell that the many, many databases on my machine have suddenly decided it is time to get busy.

The reason for this is that the database has a number of tasks that get run at regular intervals in order to keep your database in tiptop condition. For example, the gathering of optimiser statistics and running the various advisors that come by default with your installation. Out of the box these tasks assume you are running a typical business day style of operation for your database, thus 10:00 PM is the default kick-off time for these tasks, and they’ll run for up to 4 hours.

Let’s take a look at the default scheduler windows that come with most recent versions of the Oracle database:

SQL> select window_name, repeat_interval, duration
  2  from dba_scheduler_windows;

WINDOW_NAME              REPEAT_INTERVAL                                                          DURATION
------------------------ ------------------------------------------------------------------------ ---------------
MONDAY_WINDOW            freq=daily;byday=MON;byhour=22;byminute=0; bysecond=0                    +000 04:00:00
TUESDAY_WINDOW           freq=daily;byday=TUE;byhour=22;byminute=0; bysecond=0                    +000 04:00:00
WEDNESDAY_WINDOW         freq=daily;byday=WED;byhour=22;byminute=0; bysecond=0                    +000 04:00:00
THURSDAY_WINDOW          freq=daily;byday=THU;byhour=22;byminute=0; bysecond=0                    +000 04:00:00
FRIDAY_WINDOW            freq=daily;byday=FRI;byhour=22;byminute=0; bysecond=0                    +000 04:00:00
SATURDAY_WINDOW          freq=daily;byday=SAT;byhour=6;byminute=0; bysecond=0                     +000 20:00:00
SUNDAY_WINDOW            freq=daily;byday=SUN;byhour=6;byminute=0; bysecond=0                     +000 20:00:00
WEEKEND_WINDOW           freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0                      +002 00:00:00
WEEKNIGHT_WINDOW         freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; bysecond=0    +000 08:00:00

If 10:00 PM is not a good time then it is easy to change these windows. I generally recommend that people not disable the windows entirely, because it is easy to get into a position where the nightly tasks you should be running are forgotten and never run again. Rest assured, when you log a support call at some stage in the future and your database has not run any of these regular tasks for the past year, that is going to make it much harder for the support technician to help you, because you have diverted so far from the default operation of the database.

If you need to change the windows, I would suggest that you simply move the start time; similarly, you can shrink the duration to a smaller time scale if needed. For me, I run under the assumption that I will not be using the database at 2:00 AM (or that if I am, I should not be, and the busy machine will hopefully force me to go to bed!).

Here is a simple anonymous block that loops through the weekday windows to move the start time and duration:

SQL> declare
  2    x sys.odcivarchar2list :=
  3          sys.odcivarchar2list('MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY');
  4  BEGIN
  5  for i in 1 .. x.count
  6  loop
  7    DBMS_SCHEDULER.disable(name => 'SYS.'||x(i)||'_WINDOW', force => TRUE);
  9    DBMS_SCHEDULER.set_attribute(
 10      name      => 'SYS.'||x(i)||'_WINDOW',
 11      attribute => 'REPEAT_INTERVAL',
 12      value     => 'FREQ=WEEKLY;BYDAY='||substr(x(i),1,3)||';BYHOUR=02;BYMINUTE=0;BYSECOND=0');
 14    DBMS_SCHEDULER.set_attribute(
 15      name      => 'SYS.'||x(i)||'_WINDOW',
 16      attribute => 'DURATION',
 17      value     =>  numtodsinterval(60, 'minute'));
 19    DBMS_SCHEDULER.enable(name=>'SYS.'||x(i)||'_WINDOW');
 20  end loop;
 21  END;
 22  /

PL/SQL procedure successfully completed.

Remember that in a multi-tenant environment, these windows may be present in each of your pluggable databases as well as in the root.

TL;DR: You can change the scheduler windows to a time that best suits your requirements but please leave them in place so that important database tasks do not get overlooked

Improve Your Remote Collaboration With P2

P2 powers internal collaboration at Automattic, and now it’s free for everyone.

As more collaboration is happening remotely and online — work yes, but increasingly also school and personal relationships — we’re all looking for better ways to work together online. Normally, teachers hand out homework to students in person, and project leaders gather colleagues around a conference table for a presentation. Suddenly all this is happening in email, and Slack, and Zoom, and Google docs, and a dozen other tools.

At Automattic, our 15 years as a fully distributed company with over 1,200 employees working from 77 countries have relied on P2: an all-in-one team website, blog, database, and social network that consolidates communications and files in one accessible, searchable spot.

It powers work across Automattic, including WooCommerce and Tumblr. And today, a beta version is available for anyone: for your newly remote work team, your homeschooling pod, your geographically scattered friends. P2 is the glue that gives your group an identity and coherence.

What’s P2?

P2 moves your team or organization away from scattered communication and siloed email inboxes. Any member of your P2, working on any kind of project together, can post regular updates. Discussions happen via comments, posted right from the front page and updated in real time, so your team can brainstorm, plan, and come to a consensus. Upload photos or charts, take a poll, embed files, and share tidbits from your day’s activities. Tag teammates to get their attention. Your P2 members can see updates on the Web, via email, or in the WordPress mobile apps. 

Keep your P2 private for confidential collaboration. Or make it public to build a community. How you use it and who has access is up to you. And as folks come and go, all conversations and files remain available on the P2, and aren’t lost in anyone’s inbox.

The beta version of P2 is free for anyone, and you can create as many P2 sites as you need. (Premium versions are in the works.)  

What can I use P2 for?

Inside Automattic, we use P2 for:

  • Companywide blog posts from teams and leadership, where everyone can ask questions via comments.
  • Virtual “watercoolers” to help teammates connect — there are P2s for anything from music to Doctor Who to long-distance running.
  • Project planning updates.
  • Sharing expertise to our broader audience. We’ve got a P2 with guidance on how to manage remote work, and WooCommerce uses P2 to organize their global community.

P2 works as an asynchronous companion to live video tools like Zoom and live chat tools like Slack. Use those when a real-time conversation gets the job done, and P2 for reflection, discussion, and memorializing decisions.

How can you use your P2?

  • Plan a trip with friends and family — share links, ticket files, and travel details. (See an example on this P2!).
  • Create a P2 for your school or PTA to share homeschooling resources and organize virtual events.
  • Manage your sports team’s schedules and share photos from games.
  • Let kids track and submit homework assignments remotely, with a space for Q&A with your students.

How can I learn more?

Visit this demo P2 to learn the P2 ropes! Check out a range of example posts and comments to see how you can:

  • Post, read, comment, like, and follow conversations. 
  • @-mention individuals and groups to get their attention. 
  • Share video, audio, documents, polls, and more.  
  • Access in-depth stats and get notifications.

Ready for your own P2?

Sign up and create your own P2.

Case and Aggregate bug

The following description of a bug appeared on the Oracle Developer Community forum a little while ago: on an upgrade from 12c to 19c a query started producing the wrong results on a simple call to the average() function. In fact it turned out to be a bug introduced somewhere between those two releases.

The owner of the thread posted a couple of zip files to build a test case, but I had to make a couple of edits and change the nls_numeric_characters to ‘,.’ in order to get past a formatting error on a call to the to_timestamp() function. I’ve stripped the example to a minimum and translated the column names from German (which was presumably the source of the nls_numeric_characters issue) to make it easier to demonstrate and play with the bug.

First the basic data – you’ll notice that I’ve tested this on several versions to find out when the bug appeared:

rem     Script:         case_aggregate_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem     Purpose:        
rem     Last tested

create table test(
        case_col        varchar2(11), 
        duration        number(*,0), 
        quarter         varchar2(6), 
        q2h_knum_b      varchar2(10)
);
insert into test values('OK',22,'1.2020','AB1234');
insert into test values('OK',39,'1.2020','AB1234');
insert into test values('OK',30,'1.2020','AB1234');
insert into test values('OK',48,'1.2020','AB1234');

execute dbms_stats.gather_table_stats(user,'test')

create or replace force view v_test as
select
        b.q2h_knum_b,
        case
                when b.case_col not like 'err%'
                        then b.duration 
        end     duration,
        case
                when b.case_col not like 'err%' 
                        then 1 
                        else 0 
        end     status_ok
from
        test b
where
        substr(b.quarter, -4) = 2020
;

break on report
compute avg of duration on report
select * from v_test;


Q2H_KNUM_B   DURATION  STATUS_OK
---------- ---------- ----------
AB1234             22          1
AB1234             39          1
AB1234             30          1
AB1234             48          1
avg             34.75

I’ve created a table, loaded some data, gathered stats, then created a view over the table. The view includes a couple of columns that use a simple case expression, and both expressions are based in the same way on the same base column (this may, or may not, be significant in what’s coming). I’ve then run off a simple query with a couple of SQL*Plus commands to report the actual content of the view with the average of the duration column – which is 34.75.

So now we run a couple of queries against the view which aggregate the data down to a single row – including the avg() of the duration – using the coalesce() function – rather than the older nvl() function – to convert any nulls to zero.

select
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg,
        coalesce(sum(status_ok), 0)     ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
;


DURATION_COUNT DURATION_MED DURATION_AVG   OK_COUNT
-------------- ------------ ------------ ----------
             4         34.5            0          4

You’ll notice that the duration_avg is reported as zero (this would be the same if I used nvl(), and would be a null if I omitted the coalesce()). This is clearly incorrect. This was the output from 19.3; 12.2 reports the average correctly as 34.75.

There are several ways in which you can modify this query to get the right average – here’s one: just put the ok_count column first in the select list:

select
        coalesce(sum(status_ok), 0)     ok_count,
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
;


  OK_COUNT DURATION_COUNT DURATION_MED DURATION_AVG
---------- -------------- ------------ ------------
         4              4         34.5        34.75

There’s no obvious reason why the error should occur, but there’s a little hint about what may be happening in the Column Projection Information from the execution plan. The basic plan is the same in both cases, so I’m only showing it once; it’s followed by two versions of the projection information (restricted to operation 1) which I’ve formatted to improve readability:

Plan hash value: 2603667166

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT GROUP BY     |      |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     1 |    20 |     2   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   2 - filter((INSTR('AB1234',"Q2H_KNUM_B")>0 AND

Column Projection Information (Operation 1 only):  (Wrong result)

Column Projection Information (Operation 1 only):  (Right result)

As you can see, to report avg() Oracle has projected sum() and count().

When we get the right result the sum() for duration appears immediately after the count().

When we get the wrong result the sum() for ok_count comes between the count() and sum() for duration.

This makes me wonder whether Oracle is somehow just losing track of the sum() for duration and therefore dividing null by the count().

This is purely conjecture, of course, and may simply be a coincidence – particularly since an earlier version gets the right result and shows exactly the same projection information.

Readers are left to experiment with other variations to see if they can spot other suggestive patterns.

Update (Aug 2020)

This is now logged as Bug 31732779 – WRONG RESULT WITH CASE STATEMENT AGGREGATION , though it’s not yet publicly visible.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down)

I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important. The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing […]

Hosting Live (Virtual!) Events: Lessons from Planning the Growth Summit

Back in January, my team at Automattic was busy planning another year of exciting in-person events — community meetups, conference keynotes, booths, and in-person demos — at large exhibit halls and hotels around the world.

Then the world changed overnight, and because of a global pandemic, our Events team — just like many of you running your own businesses — had to rethink everything about how we connect with people. 

So we went back to work. We’ve learned so much in just five months, and it culminates in the upcoming Growth Summit — our first-ever multi-day virtual conference. It’s going to be a jam-packed program full of expert advice from business leaders and entrepreneurs. We’ll also have breakout sessions with our own WordPress experts, the Automattic Happiness Engineers, taking you through everything you need to know about building a powerful, fast website that works great for SEO, eCommerce, and growing your business. 

In the lead-up to the Summit, we wanted to share everything we’ve learned so far about running virtual events, from YouTube to webinars to Facebook Live and ticket sales. There are dozens of great solutions for staying connected to and supporting your audience — here’s what’s been working for us: 

Live webinars 

In April, we launched a series of daily webinars: 30-minute live demos and Q&As direct from our Happiness Engineers, five days a week. These webinars walk people through the basics of getting started. We also launched a few topical webinars — deeper dives into specific topics: eCommerce 101, growing an audience, using the WordPress app, and podcasting, to name a few.

We chose Zoom to host these because it’s a popular platform that allows for key webinar elements like pre-registration/signups, screen sharing, and Q&A. We pulled these together quickly, so going with a familiar solution was key for us and our audience. 

To expand our reach, we also streamed to our Facebook, Instagram, and YouTube channels. This was a simple integration that Zoom offers already, and we saw our viewership grow exponentially. 

Pre-recorded vs. live instruction 

At virtual events, one question always comes up: pre-recorded or live? We prefer a combination! Live is great when possible; it gives attendees an opportunity to interact with speakers, speakers can personalize the content based on questions being asked, and attendees can interact with one another, forming connections with like-minded content creators and entrepreneurs. 

Live content also has challenges: internet connections can cut out, computers can shut down unexpectedly, and there are more opportunities for interruption (does anyone else’s dog bark the minute you get on a live video?). It also requires all participants to be online at the same time, which can be logistically challenging.

Our advice: Test, test, test! If a speaker isn’t comfortable presenting live, there is the option to do a combination — a pre-recorded session with a live Q&A in the chat. We’ve found it to work really well, and it gives attendees the same access to the presenter.

The Growth Summit 

We helped folks to get online quickly with our daily webinars and dove into deeper topics each month, and now we want to help you grow your site. Enter The Official Growth Summit, happening next week, August 11-13.

We gathered frequently asked questions over the past few months, listened to your requests for live sessions across more time zones, and found inspiration from users that we felt needed to be shared widely.  

The Growth Summit takes a deeper dive into topics and offers hands-on WordPress training for anyone looking to get online. You’ll have the opportunity to ask questions live, connect with speakers, visit our virtual Happiness Bar for 1:1 support, and connect with other attendees during the networking breaks. 

Some key highlights from the agenda:

  • Using the block editor
  • Customizing your site
  • Growing your audience
  • Improving your content ranking (SEO)
  • Creating a marketing plan 
  • Expanding from blogging to podcasting 
  • Making money with your site
  • And so much more… 

We wanted a platform for this event that would act as an immersive event experience. There are many great platforms for this, including Accelevents and Hopin. We’ll be experimenting with many of them in the coming months (Remember: test!). A few key features we looked for: 

  • Ease of self-production
  • Ability for simultaneous sessions
  • Overall user experience
  • Flow of the event — central location for agenda, speaker bios, networking, and more
  • Networking features
  • Audience engagement — polling, live chat, and more
  • Analytics
  • Registration within the platform
  • Accessibility
  • Customization
  • Speaker (virtual) green rooms

The best part? This event is being offered twice a day so that we cover all time zones. And if you can’t join us live, attendees will have access to all content from all time zones, after the event.