Top 60 Oracle Blogs

Recent comments

zHeap: PostgreSQL with UNDO

I’m running on an Oracle Cloud Linux 7.6 VM provisioned as a sandbox so I don’t care about where it installs. For a better installation procedure, just look at Daniel Westermann script in:

Some more zheap testing - Blog dbi services

The zHeap storage engine (in development) is provided by EnterpriseDB:


I’ll also use pg_active_session_history, the ASH (Active Session History) approach for PostgreSQL, thanks to Bertrand Drouvot


In order to finish with the references, I’m running this on an Oracle Cloud compute instance (but you can run it anywhere).

Cloud Computing VM Instances - Oracle Cloud Infrastructure

Here is what I did on my OEL7 VM to get PostgreSQL with zHeap:

# Install and compile

sudo yum install -y git gcc readline-devel zlib-devel bison-devel
sudo mkdir -p /usr/local/pgsql
sudo chown $(whoami) /usr/local/pgsql
git clone
cd zheap && ./configure && make all && make install
cd contrib && make install
cd ../..

# Create a database

# Environment

export PGDATA=/var/lib/pgsql/data
echo "$PATH" | grep /usr/local/pgsql/bin ||
export PATH="$PATH:/usr/local/pgsql/bin"

# Creation of the database and start the server

pg_ctl start
ps -edf | grep postgres && psql postgres <<<"\l\conninfo\;show server_version;"

# Install pg_Sentinel extension

git clone
cd pgsentinel/src && make && make install
cat >> $PGDATA/postgresql.conf <shared_preload_libraries = 'pg_stat_statements,pgsentinel'
track_activity_query_size = 2048
pg_stat_statements.track = all
psql postgres -c "create extension pgsentinel;"

# create a demo database

psql postgres -c "create database demo;"
psql demo -c "create extension pgsentinel;"

Undo and discard workers

Here I am. Don’t worry about the user running it, that’s just me using what I already have there, but you can create a postgres user. I’m in a version 12 in development:

ps -edf | grep postgres && psql postgres <<<"\l\conninfo\;show server_version;show config_file;"

ps -edf | grep postgres && psql postgres <<<”\l\conninfo\;show server_version;show config_file;”

zHeap vs. Heap

In the past I measured the redo journaling (WAL) by PostgreSQL ( because, coming from Oracle, I was surprised by the amount of redo generated by some small updates in PostgreSQL. This overhead is due to the combination of two weaknesses: full page logging and no in-place update. The second will be partially addressed by zHeap, so let’s do the same test.

strace | awk

Here is the awk script I use to measure the volume written to disk

strace -fye trace=write,pwrite64 -s 0 pg_ctl start 2>&1 >/dev/null | awk '
/^.pid *[0-9]+. /{
pid=$2 ; sub("]","",pid)
"cat /proc/" pid "/cmdline" |& getline cmdline
sub(/pid *[0-9]+/,sprintf("%-80s ",cmdline))
/pg_wal/ || /undo/ {
/pwrite64[(].*, *[0-9]+, *[0-9]+[)]/{
sub(/, *[0-9]+[)].*/,"")
sub(/[(][0-9]+ sum[$0]=sum[$0]+bytes
/write[(].*, *[0-9]+[)]/{
sub(/[(][0-9]+ sum[$0]=sum[$0]+bytes
{ print > "/dev/stderr" }
printf "%9s%1s %6s %7s %s\n","BYTES","","COUNT","AVG","process/file"
for (i in sum){
u=" "
if (cnt[i]>1) printf "%9d%1s %6d %7d %s\n",s,u,cnt[i],sum[i]/cnt[i],i
' | sort -h

I strace the write calls (-e trace=write, pwrite64) without showing the data written (-s 0) when running the database server (pg_ctl start), tracing all child processes (-f) and showing the file names with the descriptor (-y). The awk keeps only the call, file, pid and bytes written to aggregate them. The pid is expanded with the process argv[0] for better readability.

Create zHeap table

Here is the table as in the previous blog post, but mentioning zHeap storage:

create table demoz using zheap as select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(0,0);
insert into demoz select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);

Sparse update on one column

Here is the update that I wanted to test:

update demoz set b=b+1 where mod(a,10)=1;
UPDATE 100000

And the result of my strace|awk script on these 100000 updates:

- 4403+2047=1645 8k blocks, which is 112+15=127MB of data
- 120+14=134MB of WAL
- 15+14+2=31MB of UNDO
The volume of undo is approximately the real volume of changes (I had 15MB of redo and 6MB of undo with same update on Oracle). But we still have an exaggerated volume of block changes (and with full-page logging).

I’ve created the same table in default Heap storage, and here is the write() trace for the same update:

- 16191+4559+1897=22647 8k blocks, which is 175MB of data
- 131+33=164MB of WAL

On this use case, which is quite common when we process data (call record, orders, …) and set only a flag or a date to mark them as processed, it seems that zHeap helps, but not a lot. But a real case would have many indexes on this table and updating in-place may reduce the overhead for non-updated columns. That’s for a future post.