Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

Copy Data into Table 

Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:

  • Grant CONNECT and RESOURCE to SOE, and quota on the DATA tablespace.
  • Grant EXECUTE on DBMS_CLOUD to SOE so that it can run the copy command.
  • Grant READ and WRITE access on the DATA_PUMP_DIR database directory – the log and bad files created by this process are written there.
connect admin/Password2020!@gofaster1b_tp
CREATE USER soe IDENTIFIED BY Password2020;
GRANT CONNECT, RESOURCE TO soe;
GRANT EXECUTE ON DBMS_CLOUD TO soe;
GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
ALTER USER soe QUOTA UNLIMITED ON data;

I am now going to switch to user SOE and create my table.

connect soe/Password2020@gofaster1b_tp
DROP TABLE soe.ash_hist PURGE;
CREATE TABLE soe.ASH_HIST
( SNAP_ID NUMBER,
DBID NUMBER,
INSTANCE_NUMBER NUMBER,
SAMPLE_ID NUMBER,
SAMPLE_TIME TIMESTAMP (3),
-- SAMPLE_TIME_UTC TIMESTAMP (3),
-- USECS_PER_ROW NUMBER,
SESSION_ID NUMBER,
SESSION_SERIAL# NUMBER,
SESSION_TYPE VARCHAR2(10),
FLAGS NUMBER,
USER_ID NUMBER,
-----------------------------------------
SQL_ID VARCHAR2(13),
IS_SQLID_CURRENT VARCHAR2(1),
SQL_CHILD_NUMBER NUMBER,
SQL_OPCODE NUMBER,
SQL_OPNAME VARCHAR2(64),
FORCE_MATCHING_SIGNATURE NUMBER,
TOP_LEVEL_SQL_ID VARCHAR2(13),
TOP_LEVEL_SQL_OPCODE NUMBER,
SQL_PLAN_HASH_VALUE NUMBER,
SQL_FULL_PLAN_HASH_VALUE NUMBER,
-----------------------------------------
SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
SQL_PLAN_LINE_ID NUMBER,
SQL_PLAN_OPERATION VARCHAR2(64),
SQL_PLAN_OPTIONS VARCHAR2(64),
SQL_EXEC_ID NUMBER,
SQL_EXEC_START DATE,
PLSQL_ENTRY_OBJECT_ID NUMBER,
PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
PLSQL_OBJECT_ID NUMBER,
PLSQL_SUBPROGRAM_ID NUMBER,
-----------------------------------------
QC_INSTANCE_ID NUMBER,
QC_SESSION_ID NUMBER,
QC_SESSION_SERIAL# NUMBER,
PX_FLAGS NUMBER,
EVENT VARCHAR2(64),
EVENT_ID NUMBER,
SEQ# NUMBER,
P1TEXT VARCHAR2(64),
P1 NUMBER,
P2TEXT VARCHAR2(64),
-----------------------------------------
P2 NUMBER,
P3TEXT VARCHAR2(64),
P3 NUMBER,
WAIT_CLASS VARCHAR2(64),
WAIT_CLASS_ID NUMBER,
WAIT_TIME NUMBER,
SESSION_STATE VARCHAR2(7),
TIME_WAITED NUMBER,
BLOCKING_SESSION_STATUS VARCHAR2(11),
BLOCKING_SESSION NUMBER,
-----------------------------------------
BLOCKING_SESSION_SERIAL# NUMBER,
BLOCKING_INST_ID NUMBER,
BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
CURRENT_OBJ# NUMBER,
CURRENT_FILE# NUMBER,
CURRENT_BLOCK# NUMBER,
CURRENT_ROW# NUMBER,
TOP_LEVEL_CALL# NUMBER,
TOP_LEVEL_CALL_NAME VARCHAR2(64),
CONSUMER_GROUP_ID NUMBER,
-----------------------------------------
XID RAW(8),
REMOTE_INSTANCE# NUMBER,
TIME_MODEL NUMBER,
IN_CONNECTION_MGMT VARCHAR2(1),
IN_PARSE VARCHAR2(1),
IN_HARD_PARSE VARCHAR2(1),
IN_SQL_EXECUTION VARCHAR2(1),
IN_PLSQL_EXECUTION VARCHAR2(1),
IN_PLSQL_RPC VARCHAR2(1),
IN_PLSQL_COMPILATION VARCHAR2(1),
-----------------------------------------
IN_JAVA_EXECUTION VARCHAR2(1),
IN_BIND VARCHAR2(1),
IN_CURSOR_CLOSE VARCHAR2(1),
IN_SEQUENCE_LOAD VARCHAR2(1),
IN_INMEMORY_QUERY VARCHAR2(1),
IN_INMEMORY_POPULATE VARCHAR2(1),
IN_INMEMORY_PREPOPULATE VARCHAR2(1),
IN_INMEMORY_REPOPULATE VARCHAR2(1),
IN_INMEMORY_TREPOPULATE VARCHAR2(1),
-- IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
CAPTURE_OVERHEAD VARCHAR2(1),
-----------------------------------------
REPLAY_OVERHEAD VARCHAR2(1),
IS_CAPTURED VARCHAR2(1),
IS_REPLAYED VARCHAR2(1),
-- IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
SERVICE_HASH NUMBER,
PROGRAM VARCHAR2(64),
MODULE VARCHAR2(64),
ACTION VARCHAR2(64),
CLIENT_ID VARCHAR2(64),
MACHINE VARCHAR2(64),
PORT NUMBER,
-----------------------------------------
ECID VARCHAR2(64),
DBREPLAY_FILE_ID NUMBER,
DBREPLAY_CALL_COUNTER NUMBER,
TM_DELTA_TIME NUMBER,
TM_DELTA_CPU_TIME NUMBER,
TM_DELTA_DB_TIME NUMBER,
DELTA_TIME NUMBER,
DELTA_READ_IO_REQUESTS NUMBER,
DELTA_WRITE_IO_REQUESTS NUMBER,
DELTA_READ_IO_BYTES NUMBER,
-----------------------------------------
DELTA_WRITE_IO_BYTES NUMBER,
DELTA_INTERCONNECT_IO_BYTES NUMBER,
PGA_ALLOCATED NUMBER,
TEMP_SPACE_ALLOCATED NUMBER,
DBOP_NAME VARCHAR2(64),
DBOP_EXEC_ID NUMBER,
CON_DBID NUMBER,
CON_ID NUMBER,
-----------------------------------------
CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
)
COMPRESS FOR QUERY LOW
/

As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
Credentials are specific to the database user, so I have to create an additional credential for the same cloud user, owned by SOE.

ALTER SESSION SET nls_date_format='hh24:mi:ss dd.mm.yyyy';
set serveroutput on timi on
BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL (
credential_name => 'SOE_BUCKET',
username=> 'oraclecloud1@go-faster.co.uk',
password=> 'K7xfi-mG<1Z:dq#88;1m'
);
END;
/
column owner format a10
column credential_name format a20
column comments format a80
column username format a40
SELECT * FROM dba_credentials;

OWNER CREDENTIAL_NAME USERNAME WINDOWS_DOMAIN
---------- -------------------- ---------------------------------------- ------------------------------
COMMENTS ENABL
-------------------------------------------------------------------------------- -----
ADMIN MY_BUCKET oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

SOE SOE_BUCKET oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE, described in the previous post, but it doesn't take a column list. The field names must match the column names. It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
TRUNCATE TABLE soe.ash_hist;
DECLARE
l_operation_id NUMBER;
BEGIN
DBMS_CLOUD.COPY_DATA(
table_name =>'ASH_HIST',
credential_name =>'SOE_BUCKET',
file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
schema_name => 'SOE',
format => json_object('blankasnull' value 'true'
,'compression' value 'gzip'
,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
,'delimiter' value '<,>'
,'ignoreblanklines' value 'true'
,'rejectlimit' value '10'
,'removequotes' value 'true'
,'trimspaces' value 'lrtrim'
),
field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
operation_id=>l_operation_id
);
dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
EXCEPTION WHEN OTHERS THEN
dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
RAISE;
END;
/

The data copy takes slightly longer than the query on the external table.
Operation ID:31 finished successfully

PL/SQL procedure successfully completed.

Elapsed: 00:02:01.11
The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
set lines 120
column type format a10
column file_uri_list format a64
column start_time format a32
column update_time format a32
column owner_name format a10
column table_name format a10
column partition_name format a10
column subpartition_name format a10
column logfile_table format a15
column badfile_table format a15
column tempext_table format a30
select * from user_load_operations where id = &operation_id;

ID TYPE SID SERIAL# START_TIME UPDATE_TIME STATUS
---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST ROWS_LOADED
---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
LOGFILE_TABLE BADFILE_TABLE TEMPEXT_TABLE
--------------- --------------- ------------------------------
31 COPY 19965 44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
SOE ASH_HIST https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu 1409305
/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
COPY$31_LOG COPY$31_BAD COPY$Y2R021UKPJ5F75JCMSKL

An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
select * from COPY$31_LOG;

RECORD
------------------------------------------------------------------------------------------------------------------------
LOG file opened at 05/07/20 16:03:21

Total Number of Files=1

Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucke...

Log File: COPY$31_105537.log

LOG file opened at 05/07/20 16:03:21

Total Number of Files=1

Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucke...

Log File: COPY$31_105537.log

LOG file opened at 05/07/20 16:03:21

KUP-05014: Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.

Bad File: COPY$31_105537.bad

Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL
Record format DELIMITED BY
Data in file has same endianness as the platform
Rows with all null fields are accepted
Table level NULLIF (Field = BLANKS)
Fields in Data Source:

SNAP_ID CHAR (255)
Terminated by "<,>"
Trim whitespace from left and right
DBID CHAR (255)
Terminated by "<,>"
Trim whitespace from left and right
INSTANCE_NUMBER CHAR (255)
Terminated by "<,>"
Trim whitespace from left and right
SAMPLE_ID CHAR (255)
Terminated by "<,>"
Trim whitespace from left and right
SAMPLE_TIME CHAR (255)
Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff
Terminated by "<,>"
Trim whitespace from left and right

CON_ID CHAR (255)
Terminated by "<,>"
Trim whitespace from left and right

Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL
Date conversion cache disabled due to overflow (default size: 1000)

365 rows selected.
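The bad file can be inspected through its external table in the same way. Since the load completed without rejecting any records, a query like this (using the bad-file table name reported in USER_LOAD_OPERATIONS) should count zero records:

```sql
-- The bad-file external table reads the (empty) bad file from DATA_PUMP_DIR
select count(*) from COPY$31_BAD;
```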
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in the Autonomous Database, so Oracle provides the LIST_FILES function in DBMS_CLOUD so that we can see what files are in a directory.
set pages 99 lines 150
column object_name format a32
column created format a32
column last_modified format a32
column checksum format a20
SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

OBJECT_NAME BYTES CHECKSUM CREATED LAST_MODIFIED
-------------------------------- ---------- -------------------- -------------------------------- --------------------------------

COPY$31_dflt.log 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_dflt.bad 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_105537.log 13591 07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00

Statistics are automatically collected on the table because the copy was performed in direct-path mode.  We can see that the number of rows in the statistics corresponds to the number of rows loaded by the COPY_DATA procedure.
set pages 99 lines 140
column owner format a10
column im_stat_update_time format a30
select *
from all_tab_statistics
where table_name = 'ASH_HIST';

OWNER TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS
---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
IM_BLOCK_COUNT IM_STAT_UPDATE_TIME SCAN_RATE SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STALE_S SCOPE
-------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
SOE ASH_HIST TABLE 1409305 19426 0
0 0 486 0 0
1409305 15:16:14 07.05.2020 YES NO NO SHARED

I can confirm that the data is compressed because the compression type of every sampled row is 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
WITH x AS (
select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
from soe.ash_hist sample (.1))
select ctype, count(*) from x group by ctype;

CTYPE COUNT(*)
---------- ----------
8 14097
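For readability, the numeric type code can be decoded to a name. This is a sketch that assumes the constant values documented for DBMS_COMPRESSION (1 = none, 2 = advanced row, 4 = QUERY HIGH, 8 = QUERY LOW, 16 = ARCHIVE HIGH, 32 = ARCHIVE LOW):

```sql
WITH x AS (
select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
from soe.ash_hist sample (.1))
select ctype
,      case ctype -- constant values assumed from the DBMS_COMPRESSION documentation
         when 1  then 'No Compression'
         when 2  then 'Advanced Row Compression'
         when 4  then 'HCC QUERY HIGH'
         when 8  then 'HCC QUERY LOW'
         when 16 then 'HCC ARCHIVE HIGH'
         when 32 then 'HCC ARCHIVE LOW'
         else 'Other'
       end compression_name
,      count(*)
from x
group by ctype
order by ctype;
```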
I can find this SQL statement in the Performance Hub.
INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL

Therefore, the data was inserted from the temporary external table into the permanent table in direct-path mode and in parallel.
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.

I can drill in further to the exact SQL statement.

When I query the table, I get exactly the same data as I did previously with the external table.

set autotrace on timi on lines 180 trimspool on
break on report
compute sum of ash_secs on report
column min(sample_time) format a22
column max(sample_time) format a22
select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
from soe.ash_hist
group by event
order by ash_Secs desc
;

EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
---------------------------------------------------------------- ---------- ---------------------- ----------------------
10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

latch free 10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
----------
sum 14093050

86 rows selected.

Elapsed: 00:00:00.62

I can see that the execution plan is now a single serial full scan of the table.
Execution Plan
----------------------------------------------------------
Plan hash value: 1336681691

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 84 | 1428 | 1848 (9)| 00:00:01 |
| 1 | SORT ORDER BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
| 2 | HASH GROUP BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| ASH_HIST | 1409K| 22M| 1753 (4)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
11 recursive calls
13 db block gets
19255 consistent gets
19247 physical reads
2436 redo size
5428 bytes sent via SQL*Net to client
602 bytes received via SQL*Net from client
7 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
86 rows processed