Oakies Blog Aggregator

Installing an #Exasol 2+0 Cluster on Hyper-V

After having installed Hyper-V, an ISO file with the Exasol software needs to be downloaded. The 2+0 cluster consisting of two data nodes and one license server needs a host machine with at least 8 GB memory and 60 GB free disk space. I do it on my Windows 10 notebook with 16 GB memory. This is for educational purposes of course and not suitable for production use.

2+0 means two active nodes and no reserve node. A reserve node can be added later, expanding the environment to a 2+1 cluster.

We’ll start by adding two virtual switches so that we can connect the network adapters of the VMs to them later.

Create Hyper-V Switches for Private and Public Network of the cluster

Click the Virtual Switch Manager in Hyper-V Manager:

Select Private and click Create Virtual Switch:

Name the Private switch edu-cluster-interconnect:

Create another Virtual Switch of type Internal and name it edu-public-network:

Go to Network and Sharing Center and click on the edu-public-network there, then on Properties:

Then select TCP/IPv4 and click Properties again:

Select Use the following IP address and type in the IP address and the Subnet mask.

The IP above can be different as long as it doesn’t conflict with other IPs used in your virtual environment. I find it convenient to let it end with the digit 1, though.

Create the License Server

In Hyper-V Manager, click New, then Virtual Machine:

Do not use Quick Create.

Give it the name ls and click Next:

It must be Generation 1:

Give it 1500 MB memory if your host has >= 16 GB memory, else give it 1100 MB. De-select Use Dynamic Memory:

Do not yet connect it to a network:

Give it a disk of 100 GB maximum size:

Select Install operating system later:

After creating ls, click on Settings:

Add a Legacy Network Adapter:

We need Legacy Adapters in the VMs because the data nodes will later need to boot from the network, and that requires Legacy Adapters. Connect it to the Virtual Switch edu-cluster-interconnect and click Apply.

Connect a normal (not legacy) Network Adapter to the switch edu-public-network:

Confirm the boot order is CD first for ls:

Now power on the virtual machine just created and configured. It should come up with this screen, where you type in install.

There’s no need for device encryption on an educational system. Tab to OK and press return:

Confirm the LS number as 10. Tab to OK and press return:

Confirm to install EXAClusterOS on first boot. Tab to OK and press return:

Enter exasol12 as the password for the maintenance user:

Check with ipconfig (in a cmd shell) the IP of the Ethernet adapter vEthernet (edu-public-network), then replace its last digit 1 with 10 for the IP of the public interface.

Check the network configuration by comparing it to the Networking tab in Hyper-V Manager.

Private must match edu-cluster-interconnect and Public must match edu-public-network.

Otherwise, click on Configure in the Advanced Network configuration to change it accordingly:

Confirm the (corrected) network configuration. Tab to OK and press return:

This triggers the installation of various components on the License Server and takes some time, depending on the hardware resources of your machine. The critical resource is CPU, so try to avoid running anything else on your machine that consumes CPU during the install.

If the License Server reboots during that process and presents the choice to install, update or boot from local disk, type in local. You may then remove the virtual CD from the CD drive of the VM.

When prompted for login on the node n0010, you may log in as the OS user maintenance with the password exasol12:

Refresh until it shows EXAoperation status: Online

This may take several minutes, depending on your hardware.

Connect to EXAoperation

Now open a Chrome browser (preferred) with the https URL of your License Server’s IP:

User Name is admin, Password is admin

Check Network Configuration

The network configuration should look like this:

Click on Edit and modify it to the above settings otherwise. You may change the Cluster Name to something else. Notice that the value None is not allowed for NTP Servers and you have to remove it. Then click Apply.

Add Monitoring Service

To be able to follow the node’s installation, it’s useful to add a monitoring service now. Click on Services at the left and then on the Add button:

Select all cluster services with the Log Priority Information to get the most verbose output, then click Add.

Add first data node

In Hyper-V Manager, create another VM with the name n11:

Specify Generation 1 on the next page.

Give it 2500 MB memory if your host has >= 16 GB, and 2200 MB otherwise, and de-select dynamic memory:

Give it a 100 GB max. sized disk and name it n11sda:

Select install OS later and finish the creation dialog. Then click on Settings. Add a legacy network adapter to n11 and connect it to the switch edu-cluster-interconnect:

Connect another normal network adapter to the switch edu-public-network.

Configure the boot order for n11 with the legacy network adapter first:

Add a second hard drive to n11:

Select New Virtual Harddisk of type VHDX, dynamically expanding up to 100 GB max., and name it n11sdb:

We need to know the MAC addresses of the data nodes upfront to enter them in EXAoperation, so the data nodes can connect to the license server to get booted over the network. The MAC addresses below can be different as long as you know they do not conflict with other machines in your virtual environment.

Select Advanced Features of the legacy network adapter and configure it with a static MAC address of 08:00:27:A9:D1:22.

Select Advanced Features of the normal network adapter and configure a static MAC address of 08:00:27:92:10:6B.

Do not yet power on the new VM. Instead, click on Add in the EXAoperation Nodes screen:

Configure the added node:

Click on Add, then click on the new node to configure it further:

Scroll down on the node’s detail page and click on Disks:

Click on Edit on the EXACluster Disk Information for node n0011 screen:

Add the device /dev/sda with the other values as in the picture:

Do this accordingly for the other three storage partitions. Make sure to reduce the size to 30 GB for the Data partition and assign /dev/sdb to the Storage partition. It should look like this in the end:

Now power on the n11 VM. It should come up booting from the network and start installing. Check that the install starts normally, without errors:

If you see errors in the log at this point, turn off the VM and check the configuration as described above, especially the disk layout. After having corrected it, power the VM on again. Make sure the state of the node shows as Unknown to install (not Unknown Active) here. Otherwise, execute the action Set install flag for the node to change its state to Unknown to install.

While the install of n11 is ongoing, add the second data node as a Hyper-V VM just like the n11 VM before. Give it the name n12 with all the same properties as n11, except these MAC addresses:

Legacy network adapter (Private Network) 08:00:27:AA:61:8D

Normal network adapter (Public Network)  08:00:27:A3:AB:46

After having created n12 in Hyper-V, go to EXAoperation and click on the n11 node:

Scroll down on the node’s detail page and click on Copy Node.

Change the node number to 12 and the MAC addresses as listed above:

Click on Copy Node again to finish the creation of n12.

Click on the Nodes screen:

The install of n11 is still ongoing and n12 is now ready to start in Hyper-V. Power it on.

It should come up and install like n11. The install ends with the nodes prompting for login and a message like Boot process finished after xxx seconds in the logservice.

The install consumes much CPU resources. Try to avoid running anything else on the machine while it is ongoing. It may take quite a long while to complete, depending on your local machine power.

It looks like this in the end:

Now select the two nodes and execute the action Set active flag for them:

Their state should now change to Running Active.

On the EXAStorage screen, click on Startup Storage Service:

Now select the two data nodes and click on Add Unused Disks:

Add a data volume

On the EXAoperation Storage screen, click on Add Volume:

Specify a Redundancy of 2, add the user admin to Allowed Users, set the Volume Size to 8 GB, add n11 and n12 to the Nodes List, set the Number of Master Nodes to 2, and select d03_storage as the Disk:

The new volume should then look like this:

Create a database

On the EXASolution screen, click on Add:

On the Add Database screen, specify 2 Active Nodes, assign the two data nodes and click Add.

The new database shows a Status of Not created. Click on the database link:

Select the Action Create and click Submit:

The Action should change to Startup, and n11 and n12 appear as Selected. Click Submit again:

The State changes to Starting for n11 and n12. Click on EXASolution in the Services menu on the left:

The new database should now show as Online.

Congratulations for going the whole way, it was a bit long at times, wasn’t it? But now you have your own Exasol cluster sandbox on Hyper-V – have fun with it!

Going forward, I find it convenient to create Hyper-V checkpoints with the database and storage service shut down, and with the cluster services on the nodes shut down as well. That way I make sure nothing can change on the disks of the data nodes when I take the checkpoint. I can then just turn off the VMs and apply these checkpoints afterwards; starting the VMs then saves the time they would otherwise need to go through the normal boot process. My next article will explain how to add a reserve node to this cluster, bringing it from 2+0 to 2+1. Hope you find this mini-series useful!

Presenting at UKOUG Techfest19 Conference in Brighton, UK

I’m very excited to be attending my 3rd UKOUG Conference, this year re-badged as Techfest19. The fact it’s being held in Brighton is a little disconcerting for a Crystal Palace fan, but really looking forward nonetheless to what has always been one of the very best Oracle conferences on the yearly calendar. I have a […]

19c High-Frequency statistics gathering and Real-Time Statistics

Those are the two exciting new features about optimizer statistics which arrived in the latest release of 12cR2: 19c. Less exciting is that we are not allowed to use them on any platform other than Exadata:

But let’s cross the fingers and hope that this will be released in the future because they solve real-life problems such as Out-of-Range queries. Here is a little example involving both of them. A table starts empty and is growing during the day. Relying only on the statistics gathered during the maintenance window will give bad estimations. And dynamic sampling may not sample the right blocks.

A little example

Here is a little example where I insert one row every second in a DEMO table and look at the statistics.


First I initialize the example by creating the table, gathering stats on it and setting the global preference AUTO_TASK_STATUS to ON. It is not obvious from the name, but in the context of DBMS_STATS, this Auto Task is the “high-frequency” one, running outside of the maintenance window, every 15 minutes by default, as opposed to the Auto Job that runs during the maintenance window every 4 hours.

spool hi-freq-stat.log
set echo on time on
whenever sqlerror exit failure
alter session set nls_date_format='dd-MON-yyyy hh24:mi:ss';
create table DEMO(d date,n number(*,2),h varchar2(2),m varchar2(2));
exec dbms_stats.gather_table_stats(user,'DEMO');
show parameter exadata
exec dbms_stats.set_global_prefs('auto_task_status','on');

This is a lab where I simulate Exadata features. High-Frequency Automatic Optimizer Statistics Collection is available only on Exadata.

Run every second

Every second I run this transaction to add one row. The D column will contain SYSDATE at the time of insert:

insert into DEMO (D, N, H, M)
values (sysdate, dbms_random.value, to_char(sysdate,'HH24'), to_char(sysdate,'MI')); -- values for N, H, M assumed; the original line was truncated
commit;
select count(*),min(d),max(d) from DEMO;

Then I query the column statistics for this D column:

exec dbms_stats.flush_database_monitoring_info;
with function d(r raw) return date as o date;
begin dbms_stats.convert_raw_value(r,o); return o; end;
select column_name,d(low_value),d(high_value),num_distinct,notes
from dba_tab_col_statistics where owner=user and table_name='DEMO'
and column_name='D'
/

As an example, here is the 10000th iteration 2 hours 46 minutes after the initialization:

The min value is from 20:22:16 and the max value is from this insert at 23:18:19.

Without those 19c features, the statistics would have stayed at 0 distinct values, and null low/high, as it was gathered when the table was empty.

However, I have two sets of statistics here (visible in the dictionary after flushing database monitoring info):

  • Real-Time Statistics (NOTES=STATS_ON_CONVENTIONAL_DML), which are updated when DML occurs and can be used as dynamic statistics without the need to sample. The number of distinct values is not known there (it is not updated on the fly because it would be expensive to check whether new values fall within the existing set of values or not). The high/low values are accurate: in this example less than 2 minutes stale (23:16:42 is the high value known at the 23:19:18 point-in-time).
  • The regular statistics (no NOTES), which are not as accurate, but not too stale either: nearly 15 minutes stale (23:02:06 is the high value known at the 23:19:18 point-in-time). And they are full statistics (gathered on the table with Auto Sample size): the number of distinct values is the one we had at 23:02:06 when the statistics were gathered by the “high-frequency” task.


To clean up the test I drop the table and set auto_task_status back to the default (off):

set termout off
drop table DEMO;
set termout on
exec dbms_stats.set_global_prefs('auto_task_status','off');

High-Frequency statistics gathering task result

I have run this every second for 10 hours with:

for i in {1..36000} ; do echo @ /tmp/sql2.sql ; done

Then I awk’d the spool to extract the distinct statistics we had during that time (regular statistics only) and the first time each was seen:

awk '
# ignore real-time stats here
/STATS_ON_CONVENTIONAL_DML/{next}
# store update time
/> commit;/{time=$1}
# stats on column inserted with SYSDATE
/^D/{if (logtime[$0]==""){logtime[$0]=time}}
END{for (i in logtime){print i,logtime[i]}}
' hi-freq-stat.log | sort -u
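The same first-seen extraction can be sketched in Python, assuming the spool keeps the shape the awk script relies on: a timestamp in the first field of each `> commit;` prompt line, followed later by `D`-prefixed statistics lines. The function name and the sample lines below are illustrative, not taken from the real spool:

```python
# Sketch of the awk logic above: record the first time each distinct
# statistics line appears, using the timestamp of the last commit seen.
def first_seen(lines):
    time = None
    logtime = {}                      # stats line -> first time it was seen
    for line in lines:
        if "STATS_ON_CONVENTIONAL_DML" in line:
            continue                  # ignore real-time stats here
        if "> commit;" in line:
            time = line.split()[0]    # store update time
        elif line.startswith("D"):    # stats on the column inserted with SYSDATE
            logtime.setdefault(line, time)
    return logtime

# Illustrative sample, not real spool content
sample = [
    "20:22:16 SQL> commit;",
    "D 20-SEP-2019 0 0",
    "20:31:30 SQL> commit;",
    "D 20-SEP-2019 523 523",
    "20:31:31 SQL> commit;",
    "D 20-SEP-2019 523 523",          # duplicate: first-seen time is kept
]
for stat, t in sorted(first_seen(sample).items()):
    print(t, stat)
```

`setdefault` gives exactly the `if (logtime[$0]=="")` behaviour of the awk version: a line that repeats keeps the timestamp of its first appearance.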

As the high value is the SYSDATE at the time of insert, this shows the staleness:

Here, I started my script at 20:22:16 with an empty table and gathered the statistics, which then showed 0 rows and null low/high values. Then one row was inserted each second, and the statistics stayed unchanged until 20:31:30, when they showed 523 rows. The high value here is from 20:31:29, when the high-frequency task ran. Those stats were used by the optimizer until 20:46:38, when the task ran again.

All those task and job executions are logged and visible from the same view:

select * from DBA_AUTO_STAT_EXECUTIONS order by OPID;

The detail for this specific table comes from DBA_OPTSTAT_OPERATION_TASKS:

select target,start_time,end_time,notes
from   DBA_OPTSTAT_OPERATION_TASKS
where  target like '%DEMO%' order by OPID desc;

We see the runs every 15 minutes from 20:22:16, then 20:31:30, then 20:46:38… until 23:02:06

Then, I was still inserting at the same rate, but the task, still running every 15 minutes, gathered statistics on this table only every 30 minutes: 23:02:06, then 23:32:12… and we see the latest ones here every 60 minutes only.

Why do you think the high-frequency task did not gather statistics on this table 15 minutes after the 23:02:06 run?

Let’s look at the numbers:

  • I insert one row per second
  • The task runs every 15 minutes

This means that when the task runs, I have inserted 900 rows. I didn’t change the default STALE_PERCENT, which is 10%. And when do those 900 rows reach the threshold of 10%? When the table has more than 9000 rows.

And now look at the log at 23:02:06 before the task has run:

The statistics show 8222 rows (from DBA_TAB_STATISTICS, but as I know the values are all unique I can also read this from the NUM_DISTINCT in DBA_TAB_COL_STATISTICS), so the 900 recorded modifications count for 11%. This is higher than 10%: the table statistics are stale, and the next run of the task re-gathered them:

A side note: the “STATS_ON_CONVENTIONAL_DML” disappeared because statistics were just gathered. But that’s for later…

Now that the table is known to have 9078 rows, when will it be stale again? 15 minutes later we will have inserted another 900 rows, but 900/9078 = 9.9%, and that’s under the 10% threshold. This is why the next run of the task did not gather statistics again. But after another 15 minutes, 1800 rows have been inserted, and that’s 19.8% staleness.
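The arithmetic behind this skip pattern is easy to check with a few lines of Python — a sketch of the reasoning above, not of anything Oracle runs internally:

```python
# Staleness check as described above: STALE_PERCENT defaults to 10%,
# one row is inserted per second, and the task runs every 15 minutes,
# so 900 modifications are recorded between two runs.
STALE_PERCENT = 0.10
ROWS_PER_RUN = 15 * 60        # 900 rows inserted between task runs

def is_stale(known_rows, pending_mods):
    return pending_mods / known_rows > STALE_PERCENT

# At 23:02:06 the statistics showed 8222 rows: 900/8222 = 11% -> stale
print(is_stale(8222, ROWS_PER_RUN))        # True

# After re-gathering, 9078 rows are known: 900/9078 = 9.9% -> not stale,
# so the next 15-minute run skips the table...
print(is_stale(9078, ROWS_PER_RUN))        # False

# ...but two intervals later, 1800 pending rows give 19.8% -> stale again
print(is_stale(9078, 2 * ROWS_PER_RUN))    # True
```

As the known row count grows while the insert rate stays constant, the intervals between re-gathers keep stretching — which is exactly the 15, then 30, then 60 minute pattern seen in the log.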

You see the picture: this high-frequency task takes care of the very volatile tables. It runs every 15 minutes (the default AUTO_TASK_INTERVAL) and spends no more than 1 hour (the default AUTO_TASK_MAX_RUN_TIME).

Real-Time statistics

And to go further, Real-Time Statistics not only count the modifications to evaluate staleness, but also store information about the low and high values.

Here is my awk script now showing the distinct information from the real-time statistics:

awk '
# keep only real-time stats here
# store update time
/> commit;/{time=$1}
# stats on column inserted with SYSDATE
/^D.*STATS_ON_CONVENTIONAL_DML/{if (logtime[$0]==""){logtime[$0]=time}}
END{for (i in logtime){print i,logtime[i]}}
' hi-freq-stat.log | sort -u

and an excerpt around the time when the high-frequency task ran only every 15 or 30 minutes:

While I was inserting every second, the staleness is only 2 minutes. This reduces even further the risk of out-of-range queries.

Basically, the high-frequency task:

  • gathers statistics in the same way as in the maintenance window job (stale tables first) but more frequently and using fewer resources

and real-time statistics:

  • add some fresher information, not from gathering on the table but inferred from the DML that was monitored.

A little remark though: there’s no magic to have the new query executions use the new statistics. The high-frequency task just calls dbms_stats with the default rolling window invalidation.

Announcement: Australia/NZ “Let’s Talk Database” Events October 2019 !!

I’m very excited to announce the next series of Oracle “Let’s Talk Database” events to be run throughout Australia and New Zealand in October 2019. I’ll be discussing two exciting topics this series, “Oracle Database 19c New Features” and “Oracle Exadata X8“. As always, these sessions run between 9am-1pm, include a networking lunch and are free, […]

Oracle Database 19c Automatic Indexing: Default Index Column Order Part II (Future Legend)

In Part I, we explored some options that Oracle might adopt when ordering the columns within an Automatic Index by default, in the absence of other factors where there is only the one SQL statement to be concerned with. A point worth making is that if all columns of an index are specified within SQL […]

Update restarts

Somewhere, I think, I’ve published a note about an anomaly that’s been bugging me since at least Oracle 10g – but if it is somewhere on the Internet it’s hiding itself very well and I can’t find it, though I have managed to find a few scripts on my laptop that make a casual reference to the side effects of the problem. [Ed: a tweet from Timur Ahkmadeev has identified a conversation in the comments on an older post that could explain why I thought I’d written something about the issue.]

Anyway, I’ve decided to create some new code and write the article (all over again, maybe). The problem is a strange overhead that can appear when you do a simple but large update driving off a tablescan.

If you just want the conclusion and don’t feel like reading the detail, it’s this: a big update may roll itself back before it gets to the end of the tablescan and restart “invisibly”, doing a “select for update” (2nd tablescan, one row at a time) followed by the update (3rd tablescan, by array) massively increasing the redo and undo generated.

Here’s the script I’ve created to demonstrate the issue. As its header suggests the results I’ll be showing come from

rem     Script:         update_restart.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem     Purpose:        
rem     Last tested 

create table t1
pctfree 80
as
with g as (
        select rownum id
        from dual
        connect by level <= 500 -- > comment to avoid wordpress format issue
)
select
        rownum id,
        lpad(rownum,10) v1
from    g,g
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

select  blocks, num_rows
from    user_tables
where   table_name = 'T1'
;

I’ve created a table with 250,000 rows (500 * 500) and gathered stats on it (the table was 3,345 blocks), as gathering stats tends to sort out any issues of block cleanout that might otherwise confuse the figures. Having done that, I’m going to update every row in the table:

alter system switch logfile;
execute snap_my_stats.start_snap

update t1 set id = id;

execute snap_my_stats.end_snap
execute dump_log

The update doesn’t actually change the column value (and in 12.2 there’s an interesting change to the Oracle code that comes into play in that case) but it will still generate redo and undo for every row in the table, and update ITL entries and do all the rest of the work we normally associate with an update.

Before the update I’ve issued a “switch logfile” and after the update I’ve called a procedure I wrote to do a symbolic dump of the current log file (“alter system dump logfile”). The calls to the snap_my_stats package allow me to see the changes in the session activity stats due to the update. A simple check of these statistics can often give you a good idea of what’s going on. So you might ask yourself what you might expect to see in the stats from this update. (As Isaac Asimov once said: The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’, but ‘That’s funny …’)

Having updated 250,000 rows by tablescan I think we might expect to see a tablescan of 250,000 rows; we might also expect to see (in the worst case) something like 250,000 redo entries although we might expect to see far fewer since Oracle can probably do array updates and create one redo change vector and one undo record per block changed rather than per row changed.
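A quick back-of-the-envelope version of that expectation, using the table size established above (a sketch of the reasoning, not of anything Oracle computes):

```python
# Table dimensions from the setup above
rows, blocks = 250_000, 3_345

# Worst case: one redo entry (forward change + undo change) per row
worst_case_redo_entries = rows
print(worst_case_redo_entries)        # 250000

# Best case with per-block array updates: roughly one redo change vector
# and one undo record per block changed, i.e. about one entry per block
rows_per_block = round(rows / blocks)
print(rows_per_block)                 # about 75 rows per block
print(blocks)                         # so only ~3,345 redo entries ideally
```

The actual figures in the snapshot sit far closer to the worst case than to the block-level ideal — the first hint that something odd happened.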

So here are some of the more interesting numbers that we actually see (though the specific results are not 100% reproducible):

Name                                                                     Value
----                                                                     -----
session logical reads                                                  270,494
db block gets                                                          263,181
consistent gets                                                          7,313
db block changes                                                       534,032
free buffer requested                                                    7,274
switch current to new buffer                                             3,534
redo entries                                                           265,128
redo size                                                           83,803,996
rollback changes - undo records applied                                  1,002
table scan rows gotten                                                 522,470
table scan blocks gotten                                                 6,881 
HSC Heap Segment Block Changes                                         264,135
Heap Segment Array Updates                                              14,135

A pair of figures that stand out as most surprising are the tablescan figures – I’ve got a table of 3,345 blocks and 250,000 rows, but my tablescans total 6,881 blocks and 522,470 rows – I seem to have scanned the table (slightly more than) twice! At the same time I’ve requested 7,274 free buffers (some of them probably for undo blocks) and “switched current to new buffer” 3,534 times – so I seem to have read every table block and then cloned it in memory.

Side note: the “switched current to new buffer” is a common feature of a tablescan update and not particularly relevant to this article. I assume the intent is to “leave behind” a read-consistent copy for any other queries that might have started before the big update.

Another really strange statistic is “rollback changes – undo records applied” – I’ve applied 1,002 undo records in some sort of rollback operation. I haven’t shown you the stats for “user rollbacks” or “transaction rollbacks” because, despite the undo records being applied, no rollbacks were reported.

What about changes? We see 534,000 “db block changes” – and maybe it’s okay for that statistic to be roughly twice the number of rows we expect to change, because for every row we change we have to generate some undo, so we may have to modify an undo block to create an undo record – but there are still 34,000 “db block changes” too many. The number of changes is fairly consistent with the number of “redo entries”, of course, allowing for a factor of 2 because redo entries often consist of a matching pair of changes, one for the “forward change” and one for the “undo block change”. Again, though, there’s a bit of an excess: the 34,000 extra “db block changes” seem to have resulted in an excess 15,000 “redo entries”. And we still have to ask: “Whatever happened to array processing?”

In fact, we do see some array processing – we did 14,135 “Heap Segment Array Updates” which, at an average of about 18 rows per array would account for our 250,000 row table – but we also did 264,135 “HSC Heap Segment Block Changes”, and here’s a (“That’s funny …”) coincidence: 264,135 – 14,135 = 250,000: the number of rows in the table. Maybe that means something, maybe not. Another funny coincidence: 264,135 (HSC Heap Segment Block Changes) + 1,002 (rollback changes – undo records applied) = 265,137 which is remarkably close to the number of “redo entries”.
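Those coincidences are quick to verify. Here is a short Python check of the accounting worked through above — the statistic values are copied from the snapshot, and the relationships are the ones suggested in the text:

```python
# Session statistics copied from the snapshot above
hsc_block_changes = 264_135   # HSC Heap Segment Block Changes
array_updates     = 14_135    # Heap Segment Array Updates
undo_applied      = 1_002     # rollback changes - undo records applied
redo_entries      = 265_128
table_rows        = 250_000

# First "That's funny ..." coincidence: the difference is the row count
assert hsc_block_changes - array_updates == table_rows

# Second: block changes plus applied undo records lands within a handful
# of the reported redo entries
assert abs((hsc_block_changes + undo_applied) - redo_entries) < 20

# And the array updates do account for the whole table
print(round(table_rows / array_updates))   # about 18 rows per array
```

Neither assertion proves anything by itself, of course – but numbers that line up this neatly are exactly what makes the restart hypothesis worth chasing through the redo log.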

If, by now, you’re not beginning to suspect that something a little odd has happened you haven’t been reading enough Agatha Christie novels, or watching enough Sherlock Holmes (or House) videos.

I don’t always … dump trace files but when I do I like to go over the top. My update generated 83MB of redo so I made sure my log files were 100MB each, and when I dumped the current log file I got a trace file that was a little over 466MB and 11M lines. Clearly I had no intention of reading it all. A critical skill in reading ANY type of trace file is knowing in advance what you ought to be looking for and working out a painless way of finding it. (Practising by slugging your way through a very small example is a good starting point.)

In this case I started by using grep to extract all the lines containing “OP:”, stripping them down to just the “OP:” followed by the redo op code, and then sorting and summarising the use of op codes. Here (with some annotations) is what I got:

grep OP or32_ora_14652.trc | sed 's/.*OP/OP/' | sed 's/ .*$//' | sort | uniq -c | sort -n

      1 OP:11.3
      2 OP:10.4
      2 OP:11.5
      3 OP:23.1
      3 OP:4.1
      4 OP:14.4
      4 OP:22.5
      5 OP:13.22
      8 OP:11.2
      8 OP:22.2
     13 OP:10.2
     25 OP:5.4
    119 OP:5.11         Mark undo as applied during rollback
    883 OP:5.6          Applying undo - NB 883 + 119 = 1002
   3776 OP:5.2          Get undo segment header
  15139 OP:11.19        Array update of table
 250000 OP:11.4         Lock table row piece
 264190 OP:5.1          Update undo block
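The summarising pipeline is generic, so it’s easy to rehearse it on a toy input before pointing it at a multi-gigabyte trace. Here’s a minimal sketch – the trace lines below are invented for illustration, but the pipeline is the one used above:

```shell
# Build a tiny fake trace extract (contents invented for illustration)
cat > /tmp/fake_trace.txt <<'EOF'
CHANGE #1 CON_ID:0 TYP:0 OP:11.19 ENC:0
CHANGE #2 CON_ID:0 TYP:0 OP:5.1 ENC:0
CHANGE #1 CON_ID:0 TYP:0 OP:11.19 ENC:0
CHANGE #2 CON_ID:0 TYP:0 OP:5.1 ENC:0
CHANGE #1 CON_ID:0 TYP:0 OP:5.2 ENC:0
EOF

# Same strip / sort / count / sort idea as used against the real trace:
# keep only the "OP:n.n" token, then count occurrences of each op code.
grep OP /tmp/fake_trace.txt |
  sed 's/.*OP/OP/' | sed 's/ .*$//' |
  sort | uniq -c | sort -n
```

On this sample the output ends with the two op codes that occur twice (OP:11.19 and OP:5.1), mirroring the shape of the real summary above.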

The interesting point here is that I seem to have locked every single row in the table separately, but I’ve also done array updates on the table, and I’ve done some rolling back. I need a more detailed check to see when each of these things happened, so let’s begin by examining a few thousand lines near the start of the trace:

head -10000 or32_ora_14652.trc | grep -n OP | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

68:CHANGE #1  OP:11.19
214:CHANGE #2  OP:5.2
217:CHANGE #3  OP:11.19
363:CHANGE #4  OP:11.19
509:CHANGE #5  OP:5.2
512:CHANGE #6  OP:11.19
658:CHANGE #7  OP:11.19
692:CHANGE #8  OP:11.19
838:CHANGE #9  OP:11.19
984:CHANGE #10  OP:11.19
1130:CHANGE #11  OP:11.19
1276:CHANGE #12  OP:11.19
1422:CHANGE #13  OP:11.19
1568:CHANGE #14  OP:11.19
1714:CHANGE #15  OP:5.2
1717:CHANGE #16  OP:11.19
1863:CHANGE #17  OP:11.19
2009:CHANGE #18  OP:11.19
2155:CHANGE #19  OP:11.19
2301:CHANGE #20  OP:11.19
2447:CHANGE #21  OP:11.19
2593:CHANGE #22  OP:11.19
2739:CHANGE #23  OP:11.19
2885:CHANGE #24  OP:5.2
2888:CHANGE #25  OP:11.19
3034:CHANGE #26  OP:11.19
3180:CHANGE #27  OP:5.1
3336:CHANGE #28  OP:5.1
3489:CHANGE #29  OP:5.1
3642:CHANGE #30  OP:5.1
3795:CHANGE #31  OP:5.1
3836:CHANGE #32  OP:5.1
3989:CHANGE #33  OP:5.1
4142:CHANGE #34  OP:5.1
4295:CHANGE #35  OP:5.1
4448:CHANGE #36  OP:5.1
4601:CHANGE #37  OP:5.1
4754:CHANGE #38  OP:5.1
4907:CHANGE #39  OP:5.1
5060:CHANGE #40  OP:5.1
5213:CHANGE #41  OP:5.1
5366:CHANGE #42  OP:5.1
5519:CHANGE #43  OP:5.1
5672:CHANGE #44  OP:5.1
5825:CHANGE #45  OP:5.1
5978:CHANGE #46  OP:5.1
6131:CHANGE #47  OP:5.1
6284:CHANGE #48  OP:5.1
6440:CHANGE #1  OP:5.1
6593:CHANGE #2  OP:11.19
6742:CHANGE #1  OP:5.1
6895:CHANGE #2  OP:11.19
7044:CHANGE #1  OP:5.1
7197:CHANGE #2  OP:11.19
7346:CHANGE #1  OP:5.1
7499:CHANGE #2  OP:11.19
7648:CHANGE #1  OP:5.1
7801:CHANGE #2  OP:11.19
7950:CHANGE #1  OP:5.2
7953:CHANGE #2  OP:5.1
8106:CHANGE #3  OP:11.19
8255:CHANGE #1  OP:5.1
8408:CHANGE #2  OP:11.19
8557:CHANGE #1  OP:5.1
8710:CHANGE #2  OP:11.19
8859:CHANGE #1  OP:5.1
9012:CHANGE #2  OP:11.19
9161:CHANGE #1  OP:5.1
9314:CHANGE #2  OP:11.19
9463:CHANGE #1  OP:5.1
9616:CHANGE #2  OP:11.19
9765:CHANGE #1  OP:5.1
9918:CHANGE #2  OP:11.19

Everything up to line 6284 (change# 48) is one big “private redo” record where we do several “11.19 (table array updates)” followed by a matching group of “(5.1 update undo block)”, before switching to the “public redo” mechanism used by long transactions where each redo record is a matched pair of “backward” and “forward” change vectors, i.e. one 5.1 with a matching 11.19.

So we start by doing array updates without doing any single row locking – so I want to know when the array updates stop. Here’s a small but continuous piece from the next grep I ran (which produced an output totalling 15,000 lines):

grep -n "OP:11.19" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 
7461225:-OP:11.19    *** Big jump

The 11.19 opcodes were appearing roughly every 300 lines of trace, except they stopped after 2,004 occurrences, then disappeared for about 7M lines before reappearing and running evenly (ca. every 300 lines) to the end of the trace file. Clearly we need to know what happened somewhere around lines 447200 and 7461225. But before looking too closely, let’s find out when the “11.4 lock row piece” and “5.6 apply undo” records start to appear.
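A quick way to make a jump like that stand out, rather than eyeballing 15,000 lines, is to difference the line numbers reported by grep -n. A minimal sketch, with invented line numbers standing in for the real grep output:

```shell
# Print the gap between consecutive matches; a huge gap flags the point
# where the op code disappears (line numbers invented for illustration).
printf '100\n400\n700\n7461225\n' |
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
```

The first gaps come out at the regular spacing; the final value is the “big jump”.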

grep -n "OP:11.4" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 

... lots of evenly spread 11.4, every 28 lines

Note where the 11.4 opcodes begin and end – they fit exactly into the gap between the two sets of 11.19 opcodes. And where do the 5.6 op codes come into play?

grep -n "OP:5.6" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//'

... a few hundred appearances of 5.6, commonly spaced about 155 lines apart

That was the entire set of “5.6 apply undo” records – a short burst of activity which finishes just before the big gap in the 11.19 opcodes. This leads to the question: “does anything different happen around the time that the 5.6 op codes start to appear?” Time to look at a very small extract from a very big grep output:

grep -n "OP" or32_ora_14652.trc | sed 's/ CON_ID.*OP/:-OP/' | sed 's/ ENC.*$//' 
294895:CHANGE #1:-OP:5.1
295048:CHANGE #2:-OP:11.19

295197:CHANGE #1:-OP:5.1
295322:CHANGE #2:-OP:11.19

295443:CHANGE #1:-OP:5.1
295596:CHANGE #2:-OP:11.19

295745:CHANGE #1:-OP:22.2

295753:CHANGE #1:-OP:22.5

295760:CHANGE #1:-OP:14.4
295766:CHANGE #2:-OP:22.2

295773:CHANGE #1:-OP:11.19
295919:CHANGE #2:-OP:5.6

295928:CHANGE #1:-OP:11.19
296046:CHANGE #2:-OP:5.6

296055:CHANGE #1:-OP:11.19
296201:CHANGE #2:-OP:5.6


I’ve inserted blank lines between redo records in this tiny extract from around the point where the 5.6 opcodes appear. The first few records show Oracle doing the update, then there’s a little bit of space management (op codes in the 22 and 14 layers), and then Oracle starts rolling back. (Note: whether updating or rolling back, Oracle is applying “11.19 table array updates”; but the “5.6 apply undo” vectors appear as change vector #2 in the redo record as we roll back, while the “5.1 create undo” vectors appear as change vector #1 as we make forward changes.)

If we count the number of 11.19 opcodes up to the point where the 11.4 opcodes appear we find that we used 1,002 to update the table, then another 1,002 to roll back the update.
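That count is easy to automate: stop at the first 11.4 and tally the 11.19s seen so far. A hedged sketch on an invented five-line trace (in the sample the op code ends each line, so the patterns are anchored with $; real trace lines carry more text after the op code):

```shell
# Invented sample: two array updates before the first "lock row piece"
cat > /tmp/fake_trace2.txt <<'EOF'
CHANGE #1 OP:11.19
CHANGE #2 OP:5.1
CHANGE #3 OP:11.19
CHANGE #4 OP:11.4
CHANGE #5 OP:11.19
EOF

# Count OP:11.19 lines appearing before the first OP:11.4 line
awk '/OP:11\.4$/ { exit } /OP:11\.19$/ { n++ } END { print n + 0 }' /tmp/fake_trace2.txt
```

On the sample this prints 2; against the real trace the same idea yields the 1,002 mentioned above.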


Without going into very fine detail examining individual redo change vectors, what we’ve seen is as follows:

  • On executing the very large update the session activity statistics showed that the number of rows accessed by tablescan was more than twice the expected number. This obvious oddity made me look more closely and notice the surprising “rollback changes – undo records applied”, which just shouldn’t have been there at all, and this led me to look more closely at the number of redo entries, which averaged more than one per row when it should have been a much smaller number thanks to Oracle’s internal array update mechanism.
  • Extracting just the redo op codes from a dump of the redo log generated during this update, we found that the work done includes: (a) locking every single row in the table separately, (b) performing a load of table array updates, (c) applying a significant number of undo records to roll back changes.
  • Digging in for just a little more detail we see (in order of activity):
    • A thousand array updates  (roughly 20,000 rows by tablescan – I did actually check a few redo records in detail, 20 rows was a common array size)
    • The thousand updates being rolled back (luckily a long time before we got near the end of the first tablescan)
    • 250,000 rows locked individually (250,000 rows by tablescan – the 2nd tablescan)
    • 13,000 array updates   (250,000 rows by tablescan – the third tablescan)

I don’t know why this happens – and from memory I think it started happening in 10g. I have an idea it relates to changes in the global SCN as the update takes place and something (possibly related to space management updates) making the update think it has to go into “write consistent” mode as its “start SCN” gets too far away from the current SCN.

The bottom line on this, though, is that there may be cases where a large, tablescan-driven, update could actually do the tablescan three times, and generate far more undo and redo than it should. In theory I could have completed the update in about 13,000 pairs of (11.19, 5.1) change vectors. In practice this simple test produced far more work.

And it gets worse …

Update – prompted by a tweet from Tanel Poder

In one of my tests I had added the /*+ monitor */ hint to see if the activity would be reported correctly. It didn’t look quite right – but what I hadn’t noticed was that the monitor report showed the statement as executing 3 times (and, of course, the tablescan executing 3 times):

Global Information
 Status              :  DONE
 Instance ID         :  1
 Session             :  TEST_USER (14:13842)
 SQL ID              :  9a166bpd7455j
 SQL Execution ID    :  16777216
 Execution Started   :  09/10/2019 09:41:54
 First Refresh Time  :  09/10/2019 09:41:54
 Last Refresh Time   :  09/10/2019 09:42:00
 Duration            :  6s
 Module/Action       :  MyModule/MyAction
 Service             :  orcl
 Program             :  sqlplus@vbgeneric (TNS V1-V3)

Global Stats
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
|    5.84 |    5.77 |     0.01 |     0.06 |     1M |   42 |  26MB |

SQL Plan Monitoring Details (Plan Hash Value=2927627013)
| Id |      Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                      |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
|  0 | UPDATE STATEMENT     |      |         |      |         4 |     +3 |     3 |        0 |      |       |          |                 |
|  1 |   UPDATE             | T1   |         |      |         6 |     +1 |     3 |        0 |      |       |   100.00 | Cpu (6)         |
|  2 |    TABLE ACCESS FULL | T1   |    250K |  432 |         4 |     +3 |     3 |     698K |   42 |  26MB |          |                 |

As you see from Tanel’s tweet, he has a little script you can use to scan through the library cache to see if any (big) updates have suffered from this problem.



I couldn’t quite manage to leave this problem alone, so I’ve run the test and dumped the redo log three more times, and there’s an interesting coincidence. Every time I ran the test, these four (annotated) redo op codes separated the updating phase from the rollback:

158766:CHANGE #1:-OP:22.2        ktfbhredo - File Space Header Redo
158774:CHANGE #1:-OP:22.5        ktfbbredo - File BitMap Block Redo:
158781:CHANGE #1:-OP:14.4        kteop redo - redo operation on extent map
158787:CHANGE #2:-OP:22.2        ktfbhredo - File Space Header Redo:

The group represents an allocation of space from a file to an object – so the question is: which object gets an extent map update? Here’s the full redo change vector for OP:14.4:

REDO RECORD - Thread:1 RBA: 0x001cdd.000028a4.00d8 LEN: 0x00b0 VLD: 0x01 CON_UID: 0
SCN: 0x0b86.0db28b4c SUBSCN: 88 09/10/2019 15:06:13
CHANGE #1 CON_ID:0 TYP:0 CLS:29 AFN:2 DBA:0x00800688 OBJ:4294967295 SCN:0x0b86.0db28b4c SEQ:10 OP:14.4 ENC:0 RBL:0 FLG:0x0000
kteop redo - redo operation on extent map
   ADD: dba:0x802380 len:128 at offset:6
  ADDRET: offset:6 ctime:0
   SETSTAT: exts:9 blks:911 lastmap:0x0 mapcnt:0
   UPDXNT: extent:6 add:TRUE

The “ADD” is adding extent id 6, of 128 blocks, at file 2 block 9088. That happens to be the undo segment which my update statement is using. Further 14.4 change vectors that appear through the trace file take the size of the undo segment up to 35 extents, but in the (small number of) tests I did it was always the first 14.4 that seemed to trigger the rollback and restart.
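For anyone who wants to check the “file 2 block 9088” decoding: for a smallfile tablespace the 32-bit DBA packs the relative file number into the top 10 bits and the block number into the low 22 bits, so the arithmetic can be sketched straight in the shell:

```shell
# Decode dba:0x802380 (from the OP:14.4 vector above) into file# and block#.
# Smallfile DBA layout: top 10 bits = relative file number, low 22 bits = block.
echo "file#:  $(( 0x802380 >> 22 ))"        # prints 2
echo "block#: $(( 0x802380 & 0x3FFFFF ))"   # prints 9088
```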

Preliminary Hypothesis

When the transaction does sufficient work that it forces its undo segment to extend, some feature of the resulting recursive transaction causes the transaction to roll back and restart.

This does fit with the observation that the number of rows updated before the rollback/restart occurs varies apparently at random. It also fits the observation that an update that doesn’t roll back and restart in 12.1 does so in 12.2, because in 12.2 it gets a bit unlucky with the “single row locking” used by the “no-change update” optimisation.

Extending in-lists

A well known “limitation” of Oracle is that you can only have 1000 elements within an IN-LIST style expression in your SQL predicates. I use the quotes because I’ve always found that if you are heading into the territory where you need more than 1000 items in an IN-LIST, then it is often indicative of something else being wrong. For example, if the code is backing a user interface, then where in the design phase did someone not go “Whoa…we are going to have 1000 select-able elements on the screen?”

In any event, you can find many tricks and techniques out there on the intertubes about how to work around this issue, such as:

My preference is for the last one because once you use a temporary table, suddenly all of the power of SQL is at your fingertips. I had an AskTOM question recently which gave me an opportunity to display this. The requester had an IN-LIST set of values, but the values could contain the standard wild-cards. Using a temporary table approach, this becomes trivial with a small adjustment to the join condition.
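The heart of the approach is simply “turn a delimited list into rows”; as an aside, the same idea is a one-liner outside the database too (sample list invented):

```shell
# A comma-separated wildcard list becomes one pattern per line
echo 'string%,more%,%qwe%' | tr ',' '\n'
```

Once the list is rows – whether via tr, XMLTABLE, or a regular expression – each pattern can drive a join, a LIKE, or anything else row-based processing allows.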

Here’s my first solution using a familiar XMLTABLE technique.

SQL> create table t ( x varchar2(50));

Table created.

SQL> insert into t values ('stringasd');

1 row created.

SQL> insert into t values ('blah');

1 row created.

SQL> insert into t values ('more data');

1 row created.

SQL> insert into t values ('blah blah');

1 row created.

SQL> insert into t values ('some stuff with qwe in it');

1 row created.

SQL> insert into t values ('more with a double match on qwe');

1 row created.

SQL> with
  2  conditions as
  3    ( select 'string%,more%,%qwe%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle;


3 rows selected.

SQL> with
  2  conditions as
  3    ( select 'string%,more%,%qwe%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle r,
 12         t
 13  where t.x like r.inlist_item;

INLIST_ITEM          X
-------------------- ----------------------------------------
string%              stringasd
more%                more data
%qwe%                some stuff with qwe in it
more%                more with a double match on qwe
%qwe%                more with a double match on qwe

5 rows selected.


XMLTABLE has its limitations in terms of handling special characters, as you can see below:

SQL> set define off
SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle r,
 12         t
 13  where t.x like r.inlist_item;
ERROR at line 1:
ORA-19112: error raised during evaluation:
XVM-01003: [XPST0003] Syntax error at '"%abc'
1   "string%","more%","%abc&xyz%"
-                    ^

but this does not rule out the temporary table technique – we just need a different mechanism to convert our list into rows. Here’s another common technique, using a regular expression and a hierarchical CONNECT BY query.

SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(w) inlist_item
  7  from   conditions c,
  8         lateral(
  9           select trim(regexp_substr(c.wildcard_list, '[^,]+',1,level)) w
 10           from   dual
 11           connect by level <= regexp_count(c.wildcard_list,'[^,]+')
 12           )
 13  )
 14  select *
 15  from   rowstyle r
 16  /


3 rows selected.

With the special characters issue fixed, it is then a simple join with the LIKE to obtain the desired result.

SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions c,
  8         lateral(
  9           select regexp_substr(c.wildcard_list, '[^,]+',1,level) column_value
 10           from   dual
 11           connect by level <= regexp_count(c.wildcard_list,'[^,]+')
 12           )
 13  )
 14  select *
 15  from   rowstyle r,
 16         t
 17  where t.x like r.inlist_item;

INLIST_ITEM          X
-------------------- ----------------------------------------
string%              stringasd
more%                more data
more%                more with a double match on qwe

3 rows selected.

So before you look at the “tricks” to allow more than 1000 items in an IN-LIST, do not be too quick to dismiss the value of converting the values to rows, and therefore opening up all the power of SQL to perform your processing.

Quiz Night

Upgrades cause surprises – here’s a pair of results from a model that I constructed more than 15 years ago and ran today on 12.2, then modified and ran again; I also ran it on a couple of earlier versions. It’s very simple: I just create a table, gather stats, then update every row.

rem     Script:         update_nochange.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem     Last tested 

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                rownum <= 1e4  -- > comment to avoid wordpress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'x')             small_vc,
--      lpad(rownum,10,'0')             small_vc,
        'Y'                             flag
from
        generator       v1
where   rownum <= 1e3   -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

execute snap_my_stats.start_snap

update t1 set small_vc = upper(small_vc);

execute snap_my_stats.end_snap

The calls to package snap_my_stats are my little routines to calculate the change in the session activity stats (v$sesstat, joined to v$statname) due to the update. Here are a few of the results for the test using the code as it stands:

Name                                    Value
----                                    -----
redo entries                               36
redo size                             111,756
undo change vector size                53,220

You’ll notice, however, that the CTAS has an option commented out to create the small_vc column using lpad(rownum,10,'0') rather than lpad(rownum,10,'x'). This is what the redo stats look like if I use '0' instead of 'x':

Name                                    Value
----                                    -----
redo entries                              909
redo size                             223,476
undo change vector size                68,256

What – they’re different?! (And it’s reproducible.)

Running the test on earlier versions, both variants of the code produce the smaller number of redo entries (and bytes) and undo – it’s only 12.2 that shows a change. (I haven’t got around to testing 18.3 and 19.3 yet.)

Tonight’s quiz:

Figure out what’s happening in 12.2 to give two different sets of undo and redo figures.

If that problem is too easy – extrapolate the test to more complex cases to see when the difference stops appearing, and see if you can find any cases where this new feature might cause existing applications to break.

I’ll supply the basic answer in 48 hours.

Update (a few hours early)

The question has been answered in the comments – it’s an optimisation introduced in 12.2 that attempts to reduce the amount of undo and redo by minimising the work done for “no change” updates to data.  In principle – but we don’t yet know the rules and limitations – if an update does not change the column values Oracle 12.2 will not “save the old values in an undo record and log the new values in a redo change vector”, it will simply lock the row, to produce a minimal redo change vector.

Unfortunately Oracle goes into “single row” mode to lock rows, while it can do “block-level – i.e. multi-row/array” processing if it uses the “change” mechanism.  Inevitably there are likely to be cases where the 12.2 optimisation actually produces a worse result in terms of volume of redo, or contention for redo latches.

If we modify the code to dump the redo generated by the two different updates we can see more clearly what Oracle is doing:

alter session set tracefile_identifier = 'UPD';

column start_scn new_value m_start_scn
select to_char(current_scn,'999999999999999999999999') start_scn from v$database;

update t1 set small_vc = upper(small_vc);

column end_scn new_value m_end_scn
select to_char(current_scn,'999999999999999999999999') end_scn from v$database;

alter system dump redo scn min &m_start_scn scn max &m_end_scn;

Then, after running the test we can dump the list of redo op codes from the trace file:

First when we do the “no-change” update (with lots of repetitions deleted):

grep -n OP orcl12c_ora_21999_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

138:CHANGE #1  OP:11.4
147:CHANGE #2  OP:5.2
150:CHANGE #3  OP:11.4
159:CHANGE #4  OP:11.4
168:CHANGE #5  OP:11.4
177:CHANGE #6  OP:11.4
2458:CHANGE #189  OP:5.1
2474:CHANGE #190  OP:5.1
2490:CHANGE #191  OP:5.1
2506:CHANGE #192  OP:5.1
2525:CHANGE #1  OP:5.1
2541:CHANGE #2  OP:11.4
2553:CHANGE #1  OP:5.1
2569:CHANGE #2  OP:11.4
27833:CHANGE #1  OP:5.1
27849:CHANGE #2  OP:11.4
27861:CHANGE #1  OP:5.1
27877:CHANGE #2  OP:11.4
27889:CHANGE #1  OP:5.4

The dump starts with a large redo record (192 change vectors) that started life in the private redo buffer, and then switches to the standard “paired change vectors” in the public redo buffer. The 11.4 vectors are “lock row piece” while the 5.1 vectors are the “generate undo”. Counting the 11.4 and 5.1 lines there are exactly 1,000 of each – every row has been individually locked.
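The counting itself is just a grep -c per op code; a sketch against a few invented listing lines:

```shell
# Count occurrences of a given op code in a "grep -n OP" style listing
# (the three sample lines are invented for illustration)
printf '138:CHANGE #1  OP:11.4\n147:CHANGE #2  OP:5.2\n150:CHANGE #3  OP:11.4\n' |
  grep -c "OP:11.4"
```

This prints 2 for the sample; against the real listing the equivalent command gave the 1,000 counted above.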

Now for the “real change” update:

grep -n OP orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

135:CHANGE #1  OP:11.19
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
430:CHANGE #4  OP:11.19
576:CHANGE #5  OP:11.19
5469:CHANGE #41  OP:5.1
5573:CHANGE #42  OP:5.1
5726:CHANGE #43  OP:5.1
5879:CHANGE #44  OP:5.1
6035:CHANGE #1  OP:5.1
6188:CHANGE #2  OP:11.19
6337:CHANGE #1  OP:5.1
6490:CHANGE #2  OP:11.19
15029:CHANGE #2  OP:11.19
15101:CHANGE #1  OP:5.1
15177:CHANGE #2  OP:11.19
15249:CHANGE #1  OP:5.4

It’s a much smaller trace file (ca. 15,249 lines compared to ca. 27,889), and the table change vectors are 11.19 (table array update) rather than 11.4 (table lock row piece). Counting the op codes we get 52 of each of the 11.19 and 5.1. If we want a little more information about those vectors we can do the following:

egrep -n -e "OP:" -e "Array Update" orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//' 

135:CHANGE #1  OP:11.19
140:Array Update of 20 rows:
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
289:Array Update of 20 rows:
430:CHANGE #4  OP:11.19
435:Array Update of 20 rows:
576:CHANGE #5  OP:11.19
581:Array Update of 20 rows:
5469:CHANGE #41  OP:5.1
5481:Array Update of 13 rows:
5573:CHANGE #42  OP:5.1
5585:Array Update of 20 rows:
5726:CHANGE #43  OP:5.1
5738:Array Update of 20 rows:
5879:CHANGE #44  OP:5.1
5891:Array Update of 20 rows:
6035:CHANGE #1  OP:5.1
6047:Array Update of 20 rows:
6188:CHANGE #2  OP:11.19
6193:Array Update of 20 rows:
6337:CHANGE #1  OP:5.1
6349:Array Update of 20 rows:
14953:CHANGE #1  OP:5.1
14965:Array Update of 9 rows:
15029:CHANGE #2  OP:11.19
15034:Array Update of 9 rows:
15101:CHANGE #1  OP:5.1
15113:Array Update of 9 rows:
15177:CHANGE #2  OP:11.19
15182:Array Update of 9 rows:
15249:CHANGE #1  OP:5.4

As you can see, the 11.19 (table change) and 5.1 (undo) change vectors both report that they are structured as array updates. In most cases the array size is 20 rows, but there are a few cases where it is smaller. In this test I found one update with an array size of 13 rows and three updates with an array size of 9 rows.
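To get the distribution of array sizes without reading the records one by one, the “Array Update of N rows” lines can be summarised directly; a sketch over invented sample lines:

```shell
# Tally the array sizes reported by "Array Update of N rows:" trace lines
# (sample lines invented). The fourth field of each line is the row count N.
printf 'Array Update of 20 rows:\nArray Update of 20 rows:\nArray Update of 13 rows:\nArray Update of 9 rows:\n' |
  awk '{ print $4 }' | sort -n | uniq -c
```

Each output line is "count size", so a skewed distribution of array sizes shows up at a glance.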


Oracle has introduced an optimisation for “no change” updates in 12.2 that tries to avoid generating quite so much undo and redo; however this may result in some cases where an “array update” change vector turns into several “single row lock” change vectors, so when you upgrade to 12.2 (or beyond) you may want to check any large update mechanism you run to see if your system has benefited or been penalised to any significant extent by this change. The key indicator will be an increase in the value of the session/system stats “redo entries” and “redo size”.

Speaker info for Perth, Australia

I have just got back from the Groundbreakers Latin America tour, and the travelling was made so much easier with the information provided by the organizers. So with the APAC tour coming up, I felt duty bound to give some hopefully useful information about my home town of Perth.

Flying in/out

There are two airport terminals in Perth. In times gone by, these were known as the Domestic and International terminals because, as the names suggest, domestic flights within Australia landed at one, and flights from overseas landed at the other. Hence locals will still refer to the terminals as “domestic” and “international”. However, due to all sorts of politics and bickering between airlines, times have changed.

The “Domestic” airport is now officially Terminal 3 and 4, and takes all Qantas and Jetstar flights (both domestic and international). The “International” airport is now Terminal 1 and 2, which takes all of the other airlines (including Tiger, Virgin for domestic) and all the major international carriers. Compounding all of this is that if your Qantas international flight (normally Terminal 4) is being operated by a partner (Emirates) then it might take off from Terminal 1.

Confused?

Working with AWR: Old Statistics in Current Execution Plans

This is a blog about a little trap, into which it is easy to fall when analysing performance problems using ASH data and execution plans from AWR.
I will illustrate it with a real example taken from a production PeopleSoft system. A number of sessions were waiting on blocking locks. I used ASH data to find statements running while holding a blocking lock. That led me to a particular SQL ID and plan hash value.

SQL_ID        SQL_PLAN_HASH_VALUE   ASH_SECS
------------- ------------------- ----------
5st32un4a2y92          2494504609      10670
652mx4tffq415          1888029394       7030
artqgxug4z0f1             8450529        580
7qxdrwcn4yzhh          3723363341        270

So, I used DBMS_XPLAN.DISPLAY_AWR to extract the execution plan from the AWR repository.

SELECT * FROM table(dbms_xplan.display_awr('5st32un4a2y92',2494504609,NULL,'ADVANCED'));

I can see the statement, the execution plan, the costs of each operation in the plan, and the value of the bind variable. This is a statement that has consumed a lot of execution time, it isn't executing that frequently, yet the cost of the statement is quite low (only 22). That, in itself, does not mean there is a problem here, but it is perhaps a suggestion of an inconsistency.

SQL_ID 5st32un4a2y92

Plan hash value: 2494504609

--------------------------------------------------------------------------------------
| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |    22 (100)|          |
|   1 |  INDEX FAST FULL SCAN| PS_CDM_LIST |     1 |     5 |    22  (10)| 00:00:01 |
--------------------------------------------------------------------------------------

Peeked Binds (identified by position):
1 - :1 (NUMBER): 17776

Familiarity with the application is an advantage here. The value of CONTENTID comes from a sequence number (albeit generated from a table, not an Oracle sequence, because this is PeopleSoft). The system has been running for a while, and Process Instances and Content IDs were well in excess of 1,000,000, so why is the bind variable only 17776? That is another alarm bell.
I can extract the current execution plan from the library cache, if the statement is in memory, using DBMS_XPLAN.DISPLAY_CURSOR().

SELECT * FROM table(dbms_xplan.display_cursor('5st32un4a2y92',NULL,'ADVANCED'));

Plan hash value: 2494504609

--------------------------------------------------------------------------------------
| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |     6 |  3178   (9)| 00:00:05 |
|*  1 |  INDEX FAST FULL SCAN| PS_CDM_LIST |     1 |     6 |  3178   (9)| 00:00:05 |
--------------------------------------------------------------------------------------

Peeked Binds (identified by position):
1 - :1 (NUMBER): 10897657

Predicate Information (identified by operation id):
1 - filter("CONTENTID"=TO_NUMBER(:1))

DBMS_XPLAN.DISPLAY_CURSOR() shows the current bind variable value. It is immediately obvious that it has moved on since the plan was captured by AWR. If the statement is not in memory, I could generate a new execution plan using EXPLAIN PLAN FOR, which reparses the statement, and then extract the plan with DBMS_XPLAN.DISPLAY(). However, this will not give me the bind variable value.

explain plan for SELECT 'X' FROM PS_CDM_LIST WHERE CONTENTID = :1


select * from table(dbms_xplan.display(null,null,'ADVANCED'));

The execution plan has the same plan hash value, so it is the same plan. However, the cost has increased significantly (from 22 to 3178). This reflects how the table has grown over time, and that is the problem. The full scan of the index was fast when the table was small, but as the table has grown over time, the scan is also taking longer. By the time I came to look at it, it was quite significant.
In this case, we created a new index on PS_CDM_LIST leading on CONTENTID to satisfy this query. The result was improved performance of this statement and a reduction in time spent holding blocking locks, which had further benefits.

What is going on here? 

This SQL statement is a regular part of the application. The SQL ID doesn't change because it uses a bind variable. The execution plan hasn't changed either, because the optimizer can't find a better option. The execution plan we obtained from AWR was the first such plan captured for that combination of SQL ID and plan hash value. AWR may – in fact is very likely to – have captured that plan again subsequently. However, it did not update the plan in AWR, and old plans are never purged.
The original cost (22) was the cost then, not now. The value of the bind variable (17776) was the value then, not now!
So how do you know you are looking at a plan with old costs and bind variables? You don't. There is nothing explicit to warn you to go and check.
However, if things are not making sense, check whether the current cost of the same plan is close to the cost stored in AWR.
If you use EXPLAIN PLAN FOR you may get a different plan, in which case you will need to use the hints in the outline section of the execution plan (not shown above) to force it to produce the same execution plan before you can compare costs.