Oakies Blog Aggregator

Spatial Data 1: Loading GPX data into XML data types

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

In these posts I have only shown extracts of some of the scripts I have written.  The full files are available on github.

Upload and Expand Strava Bulk Export

Strava will bulk export all your data to a zipped folder.  It contains various CSV files.  I am interested in activities.csv that contains a row for each activity with various pieces of data including the name of the data file that can be found in the /activities directory.  That file will usually be a .gpx, or it may be zipped as a .gpx.gz file.  GPX is an XML schema that contains sets of longitude/latitude coordinates and may contain other attributes.  

The first job is to upload the Strava export .zip file to somewhere accessible to the database server (in my case /vagrant) and to expand it (to /tmp/strava/).

cd /vagrant
mkdir /tmp/strava
unzip /vagrant/export_1679301.zip -d /tmp/strava

Create Strava Schema 

I need to create a new database schema to hold the various objects I will create, and I have to give it certain privileges.
connect / as sysdba
create user strava identified by strava;
grant connect, resource to strava;
grant create view to strava;
grant select_catalog_role to strava;
grant XDBADMIN to STRAVA;
grant alter session to STRAVA;
alter user strava quota unlimited on users;
alter user strava default tablespace users;

GRANT CREATE ANY DIRECTORY TO strava;
CREATE OR REPLACE DIRECTORY strava as '/tmp/strava';
CREATE OR REPLACE DIRECTORY activities as '/tmp/strava/activities';
CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';

GRANT READ, EXECUTE ON DIRECTORY exec_dir TO strava;
GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
GRANT READ ON DIRECTORY activities TO strava;


  • I need to create database directories for both the CSV files in /tmp/strava and the various GPX files in the /tmp/strava/activities sub-directory.  I will need read privilege on both directories, and also execute privilege on the strava directory so that I can use a pre-processor script (the set-up can be sanity-checked with the query after this list).
  • The exec_dir directory points to /usr/bin where the zip executables are located.  I need read and execute privilege on this so I can read directly from zipped files.
  • XDBADMIN: "Allows the grantee to register an XML schema globally, as opposed to registering it for use or access only by its owner. It also lets the grantee bypass access control list (ACL) checks when accessing Oracle XML DB Repository".
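
To confirm the set-up (a check I have added here; it is not part of the original scripts), the directories and their paths can be queried from the STRAVA schema:

-- Run as STRAVA: all three directories should be visible with the expected paths.
SELECT directory_name, directory_path
FROM   all_directories
WHERE  directory_name IN ('STRAVA', 'ACTIVITIES', 'EXEC_DIR')
ORDER BY directory_name;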

Import CSV file via an External Table

I will start by creating an external table to read the Strava activities.csv file, and then copy it into a database table.  This file is a simple comma-separated values (CSV) file.  The activity date, name and description are enclosed in double quotes.  
The first problem that I encountered was that some of the descriptions I typed into Strava contain newline characters, and the external table interprets them as the end of the record even though these characters are inside the double quotes.
4380927517,"23 Nov 2020, 18:03:54",Zwift Crash Recovery,Virtual Ride,"Zwift Crash Recovery
1. recover fit file per https://zwiftinsider.com/retrieve-lost-ride/,
2. fix corrupt .fit file with https://www.fitfiletools.com",1648,13.48,,false,Other,activities/4682540615.gpx.gz,,10.0,1648.0,1648.0,13480.2001953125,13.199999809265137,
8.179733276367188,91.0,36.20000076293945,12.600000381469727,69.5999984741211,7.099999904632568,0.40652215480804443,,,84.0,62.1943244934082,
,,,150.66201782226562,276.8444519042969,,,,,,,,,,,,158.0,1649.0,,,0.0,,1.0,,,,,,,,,,,,,,,,4907360.0,,,,,,,,,,,


As Chris Saxon points out on AskTom, it is necessary to pre-process the records to replace the newline characters with something else.  I found this awk script to process the records, so I put it into a shell script, nlfix.sh, made it executable and invoked it as a pre-processor in the external table definition.
#nlfix.sh
/usr/bin/gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' $*


  • Note the full path for gawk is specified.

A database directory is needed for the location of the pre-processor scripts, and it is necessary to grant read and execute privileges on it.  I simply put the pre-processor in the same directory as the CSV file so I could use the same strava directory I created earlier.
GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
Now I can define an external table that will read the activities.csv file. 
CREATE TABLE strava.activities_ext
(Activity_ID NUMBER
,Activity_Date DATE
,Activity_Name VARCHAR2(100)
,Activity_Type VARCHAR2(15)
,Activity_Description VARCHAR2(200)
,Elapsed_Time NUMBER
,Distance_km NUMBER
…)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY strava
ACCESS PARAMETERS
(RECORDS DELIMITED BY newline
SKIP 1
DISABLE_DIRECTORY_LINK_CHECK
PREPROCESSOR strava:'nlfix.sh'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' RTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
NULLIF = BLANKS
(Activity_ID,Activity_Date date "DD Mon yyyy,HH24:mi:ss"
,Activity_Name,Activity_Type,Activity_Description
,Elapsed_Time,Distance_km
…))
LOCATION ('activities.csv')
) REJECT LIMIT 5
/

Import Activities

Now I can simply copy from the external table to a regular table.  I have omitted a lot of columns that Strava does not populate (at least not in my export) but that appear in the CSV file.
rem 1b_create_activities_ext.sql
spool 1b_create_activities_ext

CREATE TABLE strava.activities AS
select ACTIVITY_ID,ACTIVITY_DATE,ACTIVITY_NAME,ACTIVITY_TYPE,ACTIVITY_DESCRIPTION,
ELAPSED_TIME,DISTANCE_KM,RELATIVE_EFFORT,COMMUTE_CHAR,ACTIVITY_GEAR,
FILENAME,
ATHLETE_WEIGHT,BIKE_WEIGHT,ELAPSED_TIME2,MOVING_TIME,DISTANCE_M,MAX_SPEED,AVERAGE_SPEED,
ELEVATION_GAIN,ELEVATION_LOSS,ELEVATION_LOW,ELEVATION_HIGH,MAX_GRADE,AVERAGE_GRADE,
--AVERAGE_POSITIVE_GRADE,AVERAGE_NEGATIVE_GRADE,
MAX_CADENCE,AVERAGE_CADENCE,
--MAX_HEART_RATE,
AVERAGE_HEART_RATE,
--MAX_WATTS,
AVERAGE_WATTS,CALORIES,
--MAX_TEMPERATURE,AVERAGE_TEMPERATURE,
RELATIVE_EFFORT2,
TOTAL_WORK,
--NUMBER_OF_RUNS,
--UPHILL_TIME,DOWNHILL_TIME,OTHER_TIME,
PERCEIVED_EXERTION,
--TYPE,
--START_TIME,
WEIGHTED_AVERAGE_POWER,POWER_COUNT,
PREFER_PERCEIVED_EXERTION,PERCEIVED_RELATIVE_EFFORT,
COMMUTE,
--TOTAL_WEIGHT_LIFTED,
FROM_UPLOAD,
GRADE_ADJUSTED_DISTANCE,
--WEATHER_OBSERVATION_TIME,WEATHER_CONDITION,
--WEATHER_TEMPERATURE,APPARENT_TEMPERATURE,
--DEWPOINT,HUMIDITY,WEATHER_PRESSURE,
--WIND_SPEED,WIND_GUST,WIND_BEARING,
--PRECIPITATION_INTENSITY,
--SUNRISE_TIME,SUNSET_TIME,MOON_PHASE,
BIKE
--GEAR,
--PRECIPITATION_PROBABILITY,PRECIPITATION_TYPE,
--CLOUD_COVER,WEATHER_VISIBILITY,UV_INDEX,WEATHER_OZONE,
--JUMP_COUNT,TOTAL_GRIT,AVG_FLOW,
--FLAGGED
FROM strava.activities_ext
/

ALTER TABLE activities ADD CONSTRAINT activities_pk PRIMARY KEY (activity_id);

ALTER TABLE activities ADD (gpx XMLTYPE) XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW);
ALTER TABLE activities ADD (geom mdsys.sdo_geometry);
ALTER TABLE activities ADD (geom_27700 mdsys.sdo_geometry);
ALTER TABLE activities ADD (mbr mdsys.sdo_geometry);
ALTER TABLE activities ADD (xmlns VARCHAR2(128));
ALTER TABLE activities ADD (num_pts INTEGER DEFAULT 0);

spool off

  • I have specified a primary key on activity_id and made a number of other columns not nullable.
  • I have added a new XMLTYPE column GPX into which I will load the GPS data in the .gpx files.  

FIT files

Some applications, such as Garmin and Rouvy, generate compressed .fit files, and Strava exports them again (apparently when it can't convert them, although it can convert the .fit files from Zwift to .gpx).  These are binary files, and since I only have a few of them, I converted them to .gpx files using GPSBabel on my laptop and then re-uploaded the .gpx files.
for %i in (*.fit.gz) do "C:\Program Files\GnuWin\bin\gzip" -fd %i
for %i in (*.fit) do "C:\Program Files (x86)\GPSBabel\GPSBabel.exe" -i garmin_fit -f "%i" -o gpx -F "%~ni".gpx
I then update the file name in the activities table.
UPDATE activities
SET filename = REPLACE(filename,'.fit.gz','.gpx')
WHERE filename like '%.fit.gz'
/

Compress GPX files (optional)

Some of the GPX files in the Strava export are compressed and some are not.  There is no obvious reason why.  To minimise the space used, I can gzip the GPX files.
gzip -9v /tmp/strava/activities/*.gpx
If I do compress any .gpx files, then I also need to update the file names in the activities table.
UPDATE activities
SET filename = filename||'.gz'
WHERE filename like '%.gpx'
/

Load the GPX files into the XML data type

The next stage is to load each of the GPX files into the activities table.  
create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;

----------------------------------------------------------------------------------------------------
function getClobDocument
(p_directory IN VARCHAR2
,p_filename IN VARCHAR2
,p_charset IN VARCHAR2 DEFAULT NULL
) return CLOB deterministic
is
l_module VARCHAR2(64);
l_action VARCHAR2(64);

v_filename VARCHAR2(128);
v_directory VARCHAR2(128);
v_file bfile;
v_unzipped blob := empty_blob();

v_Content CLOB := ' ';
v_src_offset number := 1 ;
v_dst_offset number := 1 ;
v_charset_id number := 0;
v_lang_ctx number := DBMS_LOB.default_lang_ctx;
v_warning number;

e_22288 EXCEPTION; --file or LOB operation FILEOPEN failed
PRAGMA EXCEPTION_INIT(e_22288, -22288);
BEGIN
dbms_application_info.read_module(module_name=>l_module
,action_name=>l_action);
dbms_application_info.set_module(module_name=>k_module
,action_name=>'getClobDocument');

IF p_charset IS NOT NULL THEN
v_charset_id := NLS_CHARSET_ID(p_charset);
END IF;

v_filename := REGEXP_SUBSTR(p_filename,'[^\/]+',1,2);
v_directory := REGEXP_SUBSTR(p_filename,'[^\/]+',1,1);

IF v_directory IS NOT NULL and v_filename IS NULL THEN /*if only one parameters then it is actually a filename*/
v_filename := v_directory;
v_directory := '';
END IF;

IF p_directory IS NOT NULL THEN
v_directory := p_directory;
END IF;

v_File := bfilename(UPPER(v_directory),v_filename);

BEGIN
DBMS_LOB.fileopen(v_File, DBMS_LOB.file_readonly);
exception
when VALUE_ERROR OR e_22288 then
dbms_output.put_line('Can''t open:'||v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
v_content := '';
dbms_application_info.set_module(module_name=>l_module
,action_name=>l_action);
return v_content;
END;

IF v_filename LIKE '%.gz' THEN
v_unzipped := utl_compress.lz_uncompress(v_file);
dbms_lob.converttoclob(
dest_lob => v_content,
src_blob => v_unzipped,
amount => DBMS_LOB.LOBMAXSIZE,
dest_offset => v_dst_offset,
src_offset => v_src_offset,
blob_csid => dbms_lob.default_csid,
lang_context => v_lang_ctx,
warning => v_warning);
ELSE --ELSIF v_filename LIKE '%.g__' THEN
DBMS_LOB.LOADCLOBFROMFILE(v_Content,
Src_bfile => v_File,
amount => DBMS_LOB.LOBMAXSIZE,
src_offset => v_src_offset,
dest_offset => v_dst_offset,
bfile_csid => v_charset_id,
lang_context => v_lang_ctx,
warning => v_warning);
END IF;

dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
DBMS_LOB.fileclose(v_File);

dbms_application_info.set_module(module_name=>l_module
,action_name=>l_action);

return v_Content;
exception when others then
dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
DBMS_LOB.fileclose(v_File);
dbms_application_info.set_module(module_name=>l_module
,action_name=>l_action);
raise;
end getClobDocument;
----------------------------------------------------------------------------------------------------

END strava_pkg;
/

I can simply query the contents of the uncompressed GPX file in SQL by calling the function.  In this case, the zipped .gpx file is 65KB but decompresses to 1.2MB.
Set long 1000 lines 200 pages 99 serveroutput on
Column filename format a30
Column gpx format a100
select activity_id, filename
, getClobDocument('',filename) gpx
from activities
where filename like '%.gpx%'
And activity_id = 4468006769
order by 1
/


ACTIVITY_ID FILENAME GPX
----------- ------------------------------ ----------------------------------------------------------------------------------------------------
4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="UTF-8"?>
ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">




... (remainder of GPX output not reproduced here) ...

activities/4468006769.gpx.gz - 1286238
Elapsed: 00:00:00.14
I can load the .gpx files into the GPX column of the activities table with a simple update statement.  The CLOB returned from the function is converted to XMLTYPE with the XMLTYPE() constructor.
UPDATE activities
SET gpx = XMLTYPE(getClobDocument('ACTIVITIES',filename))
WHERE filename like '%.gpx%'
/
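As a quick sanity check (a query I have added; it is not in the original scripts), I can see how many activities now have a GPX document attached:

SELECT COUNT(*) total_activities
,      COUNT(CASE WHEN gpx IS NOT NULL THEN 1 END) activities_with_gpx
FROM   strava.activities
/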
I can now query back the same GPX from the database.
Set long 1100 lines 200 pages 99 serveroutput on
select activity_id, filename, gpx
from activities
where filename like '%.gpx%'
And activity_id = 4468006769
order by 1
/


ACTIVITY_ID FILENAME GPX
----------- ------------------------------ ----------------------------------------------------------------------------------------------------
4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="US-ASCII"?>
ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">




... (remainder of GPX output not reproduced here) ...

Observations About the Scalability of Data Loads in Snowflake

In the last few weeks, I have been running a number of tests based on the TPC-DS benchmark against Snowflake. One of the first things I did was, of course, to create the TPC-DS schema and populate it. The aim of this blog post is to share some observations related to the population step.

The data I loaded was the same I used for this blog post (i.e. it is a 1 TB TPC-DS schema). The only difference was that I had to split the input files. This was necessary because Snowflake cannot parallelize the load with a single file. Therefore, for optimal performance, I had to split the large input files (the largest exceeds 400 GB). For example, by limiting the number of lines per input file to 1 million, the CATALOG_RETURNS table, instead of a single 22 GB input file, needs 144 input files of about 150 MB each. In addition, I also created compressed (gzip) versions of each input file. Note that all input files were stored in AWS S3.
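
Before loading, it is worth checking that the split files are actually visible from Snowflake. This is a sketch of mine, reusing the external stage (@tpcds1t) and file pattern that appear in the COPY statement further down:

-- List the staged input files for one table; PATTERN filters on the file name.
LIST @tpcds1t/catalog_returns/ PATTERN = '.*\.data';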

The file format I used for the loads was the following:

CREATE OR REPLACE FILE FORMAT tpcds
TYPE = CSV
COMPRESSION = AUTO
FIELD_DELIMITER = '|'
ENCODING='iso-8859-1'
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE

The load itself was carried out with the COPY statement. For example:

COPY INTO catalog_returns
FROM @tpcds1t/catalog_returns/
FILE_FORMAT = tpcds
PATTERN = '.*\.data'

To decide which virtual warehouse size to use for the full load, I ran a number of loads of the CATALOG_RETURNS table with sizes going from X-Small up to and including 2X-Large. Note that I did not test with 3X-Large and 4X-Large because I was not able to use them with my account. The following chart summarizes what I observed:

As you can see, except for 2X-Large (probably because of the limited size of the input data), as the size of the virtual warehouse increased (remember, for each “step” the number of credits doubles), the load time decreased by a factor of 1.5-1.9, which, in my opinion, is rather good. In addition, with all virtual warehouse sizes, the load time with compressed input files decreased by a factor of 1.5-1.6 compared to uncompressed input files.
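
For completeness, switching between test runs only requires resizing the virtual warehouse. A minimal sketch, assuming a warehouse named LOAD_WH (the name is mine, not from the original tests):

-- Resize the virtual warehouse before re-running the COPY, e.g. from Medium to Large.
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE';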

The last test I ran was loading the full 1 TB of data with a virtual warehouse of size 2X-Large. The data was stored in compressed (gzip) input files. Every input file contained at most 2.5 million rows. The load took 8 minutes. The following screenshot provides details:

In summary, the loads I tested scaled well. In addition, I consider being able to load 1 TB of data in less than 10 minutes to be very good performance.

Terraform tips’n’tricks: getting the latest Oracle Linux 8 image OCID programatically

This post is a direct follow up to the previous one where I shared how I used a Terraform data source to fetch the latest Oracle-provided Oracle Linux 7 cloud image identifier. This time around I’d like to fetch the latest Oracle Cloud ID (OCID) for Oracle Linux 8. It’s a different approach, and instead of a single article covering both Oracle Linux versions I decided to use the more search-engine-friendly method of splitting the topics.

Terraform versions

I’m still using Terraform 0.14.5/OCI provider 4.10.0.

What’s the latest Oracle Linux 8 OCID?

Referring back to the documentation the latest Oracle-provided Oracle Linux 8 image at the time of writing is Oracle-Linux-8.3-2021.01.12-0. As my home region is eu-frankfurt-1 and I want to create an always-free VM I need to go with this OCID:

ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa3qkhrcj4xo5beoffschi2ofagkufpc6bp2mnkcglpsq4pivxp2dq

Getting the OCID via the Terraform data source

So now let’s try to get the same OCID via Terraform’s data source. Here’s the code snippet I used in Cloud Shell:

# using Instance Principal for authentication
provider "oci" {
    auth = "InstancePrincipal"
    region = var.oci_region

}

data "oci_core_images" "ol8_latest" {

        compartment_id = var.compartment_ocid

        operating_system = "Oracle Linux"
        operating_system_version = "8"
        shape = "VM.Standard.E2.1.Micro"
}


output "latest_ol8_image" {
        value = data.oci_core_images.ol8_latest.images.0.id
} 

Note that unlike with Oracle Linux 7 you don’t specify the “dot” release. Doing so will raise an error with Terraform 0.14.5/OCI provider 4.10.0.

Let’s run the code:

 $ terraform apply

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

latest_ol8_image = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa3qkhrcj4xo5beoffschi2ofagkufpc6bp2mnkcglpsq4pivxp2dq"  

A quick comparison reveals that my code fetched the intended OCID; QED.

Summary

I really like not having to specify the dot release with Oracle’s provided OL8 image as it’s more future-proof. As soon as Oracle releases Oracle Linux 8.4 in their cloud, I don’t need to change a single line of code to use it.

On the other hand, if my code requires a different (more specific) image, some trickery is needed. I’ll write about a possible solution in another post.

Blockchain tables are here in 19c!

Slow down there… Take a breath. Because if you’ve read the title of this post and you’re already logging on to your 19c database, I need you to STOP.  Before you go anywhere, please read this post first.

OK… let’s proceed, now that you’re not going to end up with a table you can’t drop!

There’s been a lot of interest in blockchain technology and rather than having to go out and purchase a completely bespoke solution, there has naturally been a lot of interest in being able to simply utilise blockchain technology in the same place where your other business critical data is located, namely, an Oracle database!

That excitement has perhaps turned to disappointment as you’ve read that blockchain was planned for 20c, and since that release got shelved in the crazy year that 2020 was, it is now available in 21c, which might seem an eternity away for many customers still on earlier releases.

Compounding that is the tough decision of whether to forgo blockchain in order to stay on the 19c release, which has the longest support timeframe, or to jump to 21c to get blockchain but then perhaps be looking at a more frequent upgrade cycle. For some customers, regular upgrades are no drama, but for others, it can be tough to get approval for upgrades except when absolutely necessary.

But here is some good news! Blockchain tables have made their way back into 19c, so you can have blockchain technology and still be on the long-term support release!

As alluded to on the blockchain blog, release update 19.10 now contains support for blockchain tables, and the 19c documentation has been updated to reflect this.

Don’t panic if you have upgraded to 19.10 and you are seeing errors when trying to create a blockchain table. Here was my experience when I first tried to do it after upgrading to 19.10:


SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jan 30 22:34:49 2021 Version 19.10.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Sat Jan 23 2021 13:17:46 +08:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.10.0.0.0

SQL> create blockchain table bc1 (d date, amt number)
  2  no drop until 0 days idle
  3  no delete until 31 days after insert
  4  hashing using "sha2_512" version v1;
create blockchain table bc1 (d date, amt number)
       *
ERROR at line 1:
ORA-00901: invalid CREATE command

The first thing you’ll need to do is set compatible to 19.10 and restart the database.
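
A minimal sketch of that change, assuming an spfile is in use (remember that compatible is a static parameter, and raising it is a one-way operation):

ALTER SYSTEM SET COMPATIBLE = '19.10.0' SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP

After you’ve restarted, you’ll see this: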


SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
compatible                           string      19.10.0


SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jan 30 22:34:49 2021 Version 19.10.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Sat Jan 23 2021 13:17:46 +08:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.10.0.0.0

SQL> create blockchain table bc1 (d date, amt number)
  2  no drop until 0 days idle
  3  no delete until 31 days after insert
  4  hashing using "sha2_512" version v1;
create blockchain table bc1 (d date, amt number)
       *
ERROR at line 1:
ORA-00901: invalid CREATE command

Yeah, not our finest hour.  We forgot a file in the patch release. So you also need to apply patch 32431413 as shown below.


[oracle@db192 32431413]$ opatch apply
Oracle Interim Patch Installer version 12.2.0.1.23
Copyright (c) 2021, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/19.2.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/19.2.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.23
OUI version       : 12.2.0.7.0
Log file location : /u01/app/oracle/product/19.2.0/dbhome_1/cfgtoollogs/opatch/opatch2021-01-30_22-35-53PM_1.log

Verifying environment and performing prerequisite checks...

--------------------------------------------------------------------------------
Start OOP by Prereq process.
Launch OOP...

Oracle Interim Patch Installer version 12.2.0.1.23
Copyright (c) 2021, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/19.2.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/19.2.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.23
OUI version       : 12.2.0.7.0
Log file location : /u01/app/oracle/product/19.2.0/dbhome_1/cfgtoollogs/opatch/opatch2021-01-30_22-36-42PM_1.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   32431413  

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/19.2.0/dbhome_1')


Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying interim patch '32431413' to OH '/u01/app/oracle/product/19.2.0/dbhome_1'

Patching component oracle.rdbms, 19.0.0.0.0...

Patching component oracle.rdbms.rsf, 19.0.0.0.0...
Patch 32431413 successfully applied.
Log file location: /u01/app/oracle/product/19.2.0/dbhome_1/cfgtoollogs/opatch/opatch2021-01-30_22-36-42PM_1.log

OPatch succeeded.
[oracle@db192 32431413]$ 

Finally, you’ll be good to go! Blockchain in 19c!


SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jan 30 22:39:03 2021 Version 19.10.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 4294963952 bytes
Fixed Size        9143024 bytes
Variable Size    1392508928 bytes
Database Buffers   2885681152 bytes
Redo Buffers        7630848 bytes
Database mounted.
Database opened.

SQL> create blockchain table bc1 (d date, amt number)
  2  no drop until 0 days idle
  3  no delete until 31 days after insert
  4  hashing using "sha2_512" version v1;

Table created.

Did you make it this far? Then please go back and read this post, so you know how to control blockchain tables in your development environment.

Footnote: Note that this is not a general comment that every time you apply a release update, you should change the compatible parameter. Mike Dietrich covers that nicely here.

Oracle Database 19c Automatic Indexing: Invisible Indexes Oddity (Wild Eyed Boy From Freecloud)

There have been a couple of “oddities” in relation to both Oracle Autonomous Databases and Automatic Indexing behaviour that I’ve seen frequently enough now (on Oracle 19.5.0.0.0) to make it worth a quick blog article. The following is a simple test case that highlights both these issues. I’ll begin with a basic table, that has […]

data_default

Here’s a quirky little detail – probably totally irrelevant to virtually everyone – that came up in a question on the Oracle Developer Forum a couple of days ago. It concerns the way Oracle stores and records default values for columns, and it also does a hat-tip to the “commas at the start/end of the line” argument. Here’s a little script to create two identical tables:

create table t1 (
        t1 timestamp default '01-Jan-2021 12:00:00'
,       t2 timestamp default '01-Jan-2021 12:00:00'
,       t3 timestamp default '01-Jan-2021 12:00:00'
)
;


create table t2 (
        t1 timestamp default '01-Jan-2021 12:00:00',
        t2 timestamp default '01-Jan-2021 12:00:00',
        t3 timestamp default '01-Jan-2021 12:00:00')
;

Here’s a query to check that we’ve set the defaults correctly, followed by the output:

break on table_name skip 1
set linesize 180

spool default_length.lst

select
        table_name, column_name, default_length, data_default
from
        user_tab_columns
where
        table_name in ('T1', 'T2')
order by
        table_name,
        column_name
;

TABLE_NAME           COLUMN_NAME          DEFAULT_LENGTH DATA_DEFAULT
-------------------- -------------------- -------------- ----------------------------------
T1                   T1                               23 '01-Jan-2021 12:00:00'
                     T2                               23 '01-Jan-2021 12:00:00'
                     T3                               23 '01-Jan-2021 12:00:00'

T2                   T1                               22 '01-Jan-2021 12:00:00'
                     T2                               22 '01-Jan-2021 12:00:00'
                     T3                               22 '01-Jan-2021 12:00:00'

It would appear that we have the same default values set for the columns – but for table t1 the length of the default values is 23, while for table t2 it’s only 22. How strange; how do we investigate what’s going on?

A check of the view user_tab_columns tells us that data_default is a long column so we can’t dump() it, and we can’t substr() it. We could dump the relevant block from sys.col$, but rather than do that I’ll write a little PL/SQL block that reads the long into a PL/SQL varchar2() and outputs the last byte:

declare
        v1 varchar2(32);
begin
        for r in (
                select  table_name, column_name, default_length, data_default
                from    user_tab_columns
                where   table_name in ('T1','T2')
        ) loop
                v1 := r.data_default;
                dbms_output.put_line(
                        r.table_name || ' ' ||
                        r.column_name || ' ' ||
                        r.default_length || ' ' ||
                        ascii(substr(v1,r.default_length))
                );
        end loop;
end;
/

T1 T1 23 10
T1 T2 23 10
T1 T3 23 10
T2 T1 22 39
T2 T2 22 39
T2 T3 22 39

The last character of data_default for the t1 table (with length 23) is chr(10) – the line-feed, while the last character for the t2 table (with length 22) is chr(39) – the single-quote character.

The text stored in the data_default column is literally the text you supplied to Oracle (it’s not an expression that is stored and evaluated at table creation time); and the text that’s stored seems to be all the text that Oracle sees up to the point where a new token tells it to stop, and in the case of t1 that’s the comma after the line-feed (if you’re running on Windows you might see the length as 24 since DOS uses “carriage return – line feed” compared to the UNIX line-feed only).

Here’s another variant, just to emphasise the point, showing another table declaration and the associated output from the PL/SQL:

create table t3 (
        t1 timestamp default '01-Jan-2021 12:00:00'    
,       t2 timestamp default '01-Jan-2021 12:00:00'     ,       t3 timestamp default '01-Jan-2021 12:00:00'
)
;


T3 T1 23 10
T3 T2 27 32
T3 T3 23 10

In this case there are 5 spaces between the declaration of column t2 and the comma that separates it from the declaration of column t3. As you can see, the default length is longer and the last stored byte is chr(32) – the space character.

Lagniappe

You could criticise me for not including a format string as part of my definition of the default value, so it should have been something like: to_timestamp('01-Jan-2021 12:00:00','dd-mon-yyyy hh24:mi:ss'). There is, however, one drawback to this – the expression is now 62 characters long (at least), which means the default value won’t be cached in the dictionary cache (v$rowcache) – and this might introduce a parsing overhead that you would prefer to avoid.
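
As a quick illustration of that drawback, here is a sketch I have added (table t4 is hypothetical, not part of the original note):

create table t4 (
        t1 timestamp default to_timestamp('01-Jan-2021 12:00:00','dd-mon-yyyy hh24:mi:ss')
);

select column_name, default_length
from   user_tab_columns
where  table_name = 'T4'
;

-- default_length now reports the full stored expression - 60-plus characters
-- rather than the 22 or 23 we saw for the simple string literals above.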

Pivot upgrade

I’ve hardly ever touched the pivot/unpivot feature in SQL, but a recent comment by Jason Bucata on a note I’d written about Java names and the effects of newer versions of Oracle allowing longer object and column names prompted me to look at a script I wrote several years ago for 11g.

As Jason pointed out, it’s another case where the output from a script might suffer some cosmetic changes because of an upgrade. Here’s the script to generate some data and run a query:

rem
rem     Script:         pivot_upgrade.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2021
rem
rem     Last tested 
rem             19,3,0,0
rem             12.2.0.1
rem             11.2.0.4
rem

set linesize 144
set trimspool on

create table t1
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                rownum <= 10000
)
select
        rownum                  id,
        rownum                  n1,
        rpad(chr(65 + mod(rownum,3)), 35, 
             chr(65 + mod(rownum,3))
        )                       state,
        mod(rownum,4)           flag,
        lpad(rownum,10,'0')     small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 10000
/

select  *
from
        (
        select
                flag, state, n1
        from
                t1
        )       piv
        pivot   (
                        avg(n1)
                 for    state in (
                                'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
                                'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB',
                                'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC'
                        )
        )
order by
        flag
;

I’ve hijacked (cloned and hacked) a script I wrote for another little test so don’t read too much into the data that I’ve created and how I’ve created it. All that matters is that I have a column with three distinct values and I want a report that summarises the data across the page according to the value of those three columns.

To be awkward (and demonstrate the point of the blog note), the values in the columns are all 35-character strings – created using rpad(), but reported in the pivot() using the literal string value.

Here’s the result of the query from 12c (in my case 12.2.0.1) onwards:

      FLAG 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC'
---------- ------------------------------------- ------------------------------------- -------------------------------------
         0                                  5004                                  5002                                  5000
         1                                  5001                                  4999                                  4997
         2                                  4998                                  5002                                  5000
         3                                  5001                                  4999                                  5003

You’ll notice that the pivoted column headings include the single-quote marks, plus the 35 defining characters. Compare this with the result from 11.2.0.4:

      FLAG 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAA 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBB 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
---------- ------------------------------ ------------------------------ ------------------------------
         0                           5004                           5002                           5000
         1                           5001                           4999                           4997
         2                           4998                           5002                           5000
         3                           5001                           4999                           5003

Including the initial single-quote mark, the headings are exactly 30 characters long – the historical limit under Oracle’s naming conventions.

So if you’re still using 11g, an upgrade to a more recent version of Oracle could end up forcing you to do a few little adjustments to some of your code simply to ensure column lengths (and subsequent line lengths) don’t change.
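
One possible adjustment (a sketch of mine, not from the original note) is to give the pivoted columns explicit aliases, so that the headings no longer depend on the literal values or on the version’s name-length limit:

select  *
from
        (
        select
                flag, state, n1
        from
                t1
        )       piv
        pivot   (
                        avg(n1)
                 for    state in (
                                'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as state_a,
                                'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' as state_b,
                                'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC' as state_c
                        )
        )
order by
        flag
;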

Video : JSON_TRANSFORM in Oracle Database 21c

In today’s video we demonstrate the JSON_TRANSFORM function, introduced in Oracle database 21c.

This video is based on the following article, which has a lot more information and examples than the video does.

I’ve written a bunch of stuff on JSON and ORDS, and there’s a YouTube playlist.

The star of today’s video is Tom Kyte. Here are two funny (to me) stories about my interactions with Tom.

On my first visit to OpenWorld in 2006, Tom walked up to me and introduced himself. I was blown away he even knew who I was. It was a massive ego trip. On my next visit to OpenWorld in 2007, I walked up to Tom and said casually, “Hi Tom”. He looked at me, looked down at my conference badge, looked at me again and said, “Hi Tim”. Needless to say, my ego was placed firmly back where it belonged. I still laugh when I think about it now.

At a later OpenWorld I was sitting at a table and someone was fanboying me. Tom came along and asked if he could sit at the same table. I said sure. My fanboy turned and said in a rather snarky tone, “And who are you?” Tom replied, “Hi. I’m Tom Kyte”. Fanboy’s face immediately dropped and at that exact moment in time I became invisible, and fanboy began to fanboy Tom. It was priceless!

Good times!

Cheers

Tim…


Blockchain tables in 21c? A word of warning

As I write this, the moment you do a browser search for “blockchain”, the last thing you will get a match on is the technology elements. Instead, my results are flooded with hedge funds, short selling, Gamestop and bitcoin. A friend has been sending me messages along the lines of “Is it true Satoshi Nakamoto lives in a mountain cave surrounded by computers and wild yaks?”. Yes indeed, the 2021 world looks like it is following in 2020’s footsteps of being a crazy place.

Because of this, I have little doubt that with the availability of blockchain tables in the Oracle Database, people are very keen to jump into the technology and start tinkering. Which brings me to this:

STOP…RIGHT…NOW

That’s right. Please do not create a blockchain table until you have read just a little further. It might save you a great deal of heartache.

Blockchain tables first arrived in the 20c preview release, and we call these things preview releases for a reason! It is to give you a glimpse of functionality and also to get feedback from the community in order to firm up the functionality before full production availability. In the first 20c release, when you created a blockchain table, the syntax would look like this:


SQL> create blockchain table bc1 (d date, amt number)
  2      no drop until 16 days idle
  3      no delete until 31 days after insert
  4      hashing using "sha2_512" version v1;

Table created.

Notice that 16 days? That was the minimum setting. Once you created a blockchain table and put even just a single row into it, that table was going to be there for 16 days. If you wanted to drop that table, then you were in for a lot of pain and effort. I did a video on that, which you can see below.

We took your feedback and adjusted that minimum threshold down to zero days for the production implementation of blockchain tables. Thus if you created a blockchain table with a NO DROP UNTIL 0 DAYS clause, then you could drop that table as soon as you no longer needed it. Perfect for development and prototyping.
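
For example (a sketch of mine that mirrors the syntax above; it is not taken from the original post):

-- With a zero-day NO DROP clause the table should be droppable as soon as it is
-- no longer needed, even though rows are still covered by the NO DELETE clause.
create blockchain table bc0 (d date, amt number)
    no drop until 0 days idle
    no delete until 31 days after insert
    hashing using "sha2_512" version v1;

drop table bc0 purge;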

However, this does not prohibit anyone from still specifying a non-zero value, and the moment you do that, the table is there to stay!


SQL> create blockchain table bc1 (d date, amt number)
  2      no drop until 1 days idle
  3      no delete until 31 days after insert
  4      hashing using "sha2_512" version v1;

Table created.

SQL> insert into bc1 values (sysdate,1);

1 row created.

SQL> drop table bc1 purge;
drop table bc1 purge
           *
ERROR at line 1:
ORA-05723: drop blockchain table BC1 not allowed

For a true production blockchain table, that inability to meddle with the table and its data is exactly what you want. But for non-production environments, if you run a continuous integration methodology, or even just drop/recreate tables as part of unit tests, then not being able to drop a table, or even the schema that owns it, or even the tablespace it sits in, might cause you a lot of frustration.

Hence the reason for this blog post. Before you create a single blockchain table in your database, give some serious consideration to the following parameter:


SQL> show parameter max_no_drop

NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------
blockchain_table_max_no_drop         integer     

It sets the upper limit on how long a NO DROP clause can be, and defaults to null. Setting it to zero will ensure that no-one gets to lump a blockchain table into your schema that cannot be dropped.


SQL> alter system set blockchain_table_max_no_drop = 0 scope=both;

System altered.

SQL> create blockchain table bc2 (d date, amt number)
  2      no drop until 1 days idle
  3      no delete until 31 days after insert
  4      hashing using "sha2_512" version v1;
create blockchain table bc2 (d date, amt number)
*
ERROR at line 1:
ORA-05747: maximum retention time too high, should be less than or equal to 0 days

Note that any blockchain tables you created before altering the setting are unchanged – you will need to wait for them to expire before you can drop them.
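
If you want to check what is already in place before changing anything, there is a dictionary view for blockchain tables (a sketch of mine; I am assuming the USER_BLOCKCHAIN_TABLES view that documents the retention clauses):

-- Lists each blockchain table owned by the current user, including its
-- NO DROP and NO DELETE retention settings.
select * from user_blockchain_tables;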

Blockchain tables are robust and tamperproof, but sometimes in your development area, a little bit of tampering is exactly what you need.

Footnote: The sacrifices I make for my readers. I’m now waiting until tomorrow so I can drop that BC1 table.

Google Cloud SQL Insights: ASH, plans and statement tagging

By Franck Pachot

Looking at database performance has always been necessary to optimize the response time or throughput, but when it comes to public cloud where you are charged by resource usage, performance tuning is critical for cost optimization. When looking at host metrics, you see only the symptoms and blindly guess at some solutions: add more vCPU if CPU usage is high, more memory if I/O wait is high. And this can be done automatically with auto-scaling. However, scaling up or out doesn’t solve your problem but only adapts the resources to the usage. In order to know if you actually need those resources or not, you need to drill down on their usage: which queries are responsible for the high CPU or I/O usage, or suffer from pressure on those resources, or are just waiting on something else like background checkpoint, WAL write, application lock,… And look at the execution plan because this is where you can scale down the resource usage by factors of 10 or 100.

Oracle Database has been instrumented for a long time and has proven how efficient the ASH approach is: sampling the active sessions and displaying Average Active Sessions on the wait-class dimension, over a time axis. The same approach is implemented in AWS, where Amazon RDS can provide Performance Insights. And here is the Google Cloud version of it that has been recently released: Cloud SQL Insights.

Database load

Here is an example:
The PostgreSQL wait events are limited, but the main ones are displayed here: CPU in green, I/O in blue, and application contention on many sessions in orange. As you can see here, the retention is 7 days (good for troubleshooting recent issues, but not enough to compare with the previous monthly report, for example) and the sample frequency seems to be quite low: 1 minute.

Queries

When you have Lock Wait, there is something to do: identify the query that is waiting. This query consumes elapsed time (the user is waiting) without doing any work, and no scaling can help here.
We have a list of the queries that were active during these load samples:
The most important metrics are there, in the order of importance for me:

  • “Times called” is the number of executions during the observed time range. This is very important because you need to understand whether this number is expected (given the number of concurrent users) or a mistake in design (row-by-row calls) or ORM mapping (the N+1 problem).
  • “Avg rows returned” is the number of rows per execution. This is very important to estimate the efficiency of the application design. There is a latency for each call, so you should try to process many rows on each call. If the “Times called” is high, I check the “Avg rows returned”. If it is low, near 1, then there is probably row-by-row access to the database, which is the least efficient design. And the load in the database is only the emerged tip of the iceberg in this case: context switches and network latency are not counted in the “Avg execution time”. And you can’t imagine how many times I’ve seen a high load, with a high number of executions, and 0 rows returned on average. Removing unnecessary work is the first tuning step: do not scale up the instance to run more queries doing nothing. Even if you have an infinite amount of cloud credits, this is not good for the earth.
  • Finally, “Avg execution time” is the part of the response time that the query is responsible for within the database. It is important as a start for a performance analysis, but it requires more information. You must look at the query to understand how it is involved in the response time: called once per user interaction, or many times? Run asynchronously in the background, or with a user waiting on it? And if the time is a significant part that must be improved, you should look at the execution plan to understand what you can improve.

The essentials are there but, of course, I miss a few details. For Lock Wait, I want to know the blocking transaction. For queries, I want to see more than just the top-level call (when a query is executed from a procedure – which is the right design for security, agility and performance). And what I miss the most here is the number of buffers read, from the shared buffers or from files, because the execution time can vary with memory or CPU pressure. The real metric for query efficiency is the number of buffers read, which shows the amount of data read and processed.
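
As a workaround, the buffer counters can still be read directly from PostgreSQL. This is a sketch of mine, assuming the pg_stat_statements extension is enabled on the instance; it is not part of the Insights console:

-- Top statements by buffer access (shared buffer hits + reads from disk),
-- i.e. the amount of data read and processed, independent of CPU or memory pressure.
select calls,
       rows,
       shared_blks_hit,
       shared_blks_read,
       left(query, 60) as query_start
from   pg_stat_statements
order by shared_blks_hit + shared_blks_read desc
limit 10;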

Query details

For a query there are some additional details and, when you are lucky and it has been captured, you can see the execution plan in a nice graphical way:
I would have liked to show a complex plan, but unfortunately none of the complex queries I ran had their plan captured.

Query Tags

There’s something really interesting for end-to-end performance tuning. You can tag your queries with specific comments and they can be displayed, aggregated, and filtered:

The tags are defined in the SQL Commenter format for MVC applications: Action, Route, Controller, Framework, Application, DB Driver. Those can be injected by the ORM with sqlcommenter, or you can add them in comments yourself, of course. For example, for the screenshot above I ran pgbench with the following file:


\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
/* db_driver='LIBPQ', application='pgbench', framework='--file', controller='TPC-B