Little things worth knowing: parallel Data Pump export in table mode

I haven’t used Data Pump in a little while, but recently needed this very useful utility again to export a single table. I know it is possible to export data in parallel using expdp, but I can never recall the syntax off the top of my head when I need it. This post describes a potential approach to exporting a table in parallel; in the next post I will demonstrate an interesting case where using parallelism didn’t speed up the export. All of this was tested on 12.2 and 18.4.0; the examples I am sharing come from my 18.4.0 single-instance database (without ASM) running on Linux.

The setup

My lab environment is a bit limited when it comes to storage, so I’ll have to make do with small-ish tables. The basic principles should still apply to larger segments, though. Please note that my tables aren’t partitioned, in order to keep the individual segment size as large as possible.

My data is once more based on Swingbench’s order entry schema. The table I’ll use in my first example is the well-known ORDER_ITEMS table:

$ sql soe

SQLcl: Release 17.3.0 Production on Wed Dec 12 18:12:00 2018

Copyright (c) 1982, 2018, Oracle. All rights reserved.

Password? (*******?) *********
Last Successful login time: Wed Dec 12 2018 18:12:01 +01:00

Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0

SQL> info soe.order_items
TABLE: ORDER_ITEMS
LAST ANALYZED: 2018-12-11 22:19:40.0
ROWS: 68442528
SAMPLE SIZE: 68442528
INMEMORY: DISABLED
COMMENTS:

Columns
NAME                 DATA TYPE           NULL   DEFAULT   COMMENTS
*LINE_ITEM_ID        NUMBER(3,0)         Yes
*ORDER_ID            NUMBER(12,0)        Yes
PRODUCT_ID           NUMBER(6,0)         Yes
UNIT_PRICE           NUMBER(8,2)         Yes
QUANTITY             NUMBER(8,0)         Yes
DISPATCH_DATE        DATE                Yes
RETURN_DATE          DATE                Yes
GIFT_WRAP            VARCHAR2(20 BYTE)   Yes
CONDITION            VARCHAR2(20 BYTE)   Yes
SUPPLIER_ID          NUMBER(6,0)         Yes
ESTIMATED_DELIVERY   DATE                Yes

Indexes
INDEX_NAME            UNIQUENESS   STATUS   FUNCIDX_STATUS   COLUMNS
SOE.ITEM_ORDER_IX     NONUNIQUE    VALID                     ORDER_ID
SOE.ORDER_ITEMS_PK    UNIQUE       VALID                     ORDER_ID, LINE_ITEM_ID
SOE.ITEM_PRODUCT_IX   NONUNIQUE    VALID                     PRODUCT_ID
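
By the way, expdp writes its files via a directory object inside the database, so one has to exist before any of the commands below will work. A minimal sketch of that setup, assuming a suitably privileged account; the name and path simply match the dump file locations shown later:

SQL> CREATE DIRECTORY exptest AS '/u01/data_pump/ORCL';
SQL> GRANT READ, WRITE ON DIRECTORY exptest TO martin;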

Running an export without any parallelism

I’ll use this example as the baseline: exporting the table without any parallelism. The example is quite basic, and you will probably have to adjust it to suit your needs. I am assuming basic familiarity with the Data Pump toolset; if not, head over to the documentation and read up on the concepts. The non-parallel expdp call is shown here:

$ expdp martin directory=exptest logfile=exp_order_items_noparallel.log \
> dumpfile=exp_order_items_noparallel.dmp tables=soe.order_items

[...]

Export: Release 18.0.0.0.0 - Production on Wed Dec 12 19:00:38 2018
Version 18.4.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Starting "MARTIN"."SYS_EXPORT_TABLE_01": martin
directory=exptest logfile=exp_order_items_noparallel.log
dumpfile=exp_order_items_noparallel.dmp tables=soe.order_items
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
. . exported "SOE"."ORDER_ITEMS" 3.601 GB 68442528 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
***************************************************************************
Dump file set for MARTIN.SYS_EXPORT_TABLE_01 is:
/u01/data_pump/ORCL/exp_order_items_noparallel.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_01" successfully completed at Wed Dec 12 19:01:22 2018 elapsed 0 00:00:43

So it takes about 43 seconds for expdp to create the file.
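
As an aside, the same call can be driven from a parameter file rather than the command line, which I find handy for anything I want to re-run later. A minimal sketch with a made-up file name:

$ cat exp_order_items.par
directory=exptest
logfile=exp_order_items_noparallel.log
dumpfile=exp_order_items_noparallel.dmp
tables=soe.order_items

$ expdp martin parfile=exp_order_items.par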

Adding parallelism

The next attempt I made was to introduce parallelism. For expdp to actually make use of parallelism, you need multiple dump files to write to. This is accomplished by using the %U placeholder in the file name, as shown here:

$ expdp martin directory=exptest logfile=exp_order_items.log \
> dumpfile=exp_order_items_%U.dmp parallel=4 tables=soe.order_items

Export: Release 18.0.0.0.0 - Production on Wed Dec 12 19:56:42 2018
Version 18.4.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Starting "MARTIN"."SYS_EXPORT_TABLE_01": martin directory=exptest
logfile=exp_order_items.log dumpfile=exp_order_items_%U.dmp parallel=4
tables=soe.order_items
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
. . exported "SOE"."ORDER_ITEMS" 3.601 GB 68442528 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded

Dump file set for MARTIN.SYS_EXPORT_TABLE_01 is:
/u01/data_pump/ORCL/exp_order_items_01.dmp
/u01/data_pump/ORCL/exp_order_items_02.dmp
/u01/data_pump/ORCL/exp_order_items_03.dmp
/u01/data_pump/ORCL/exp_order_items_04.dmp
/u01/data_pump/ORCL/exp_order_items_05.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_01" successfully completed at Wed Dec 12 19:57:02 2018 elapsed 0 00:00:20

So this isn’t too bad: using parallel 4, I managed to cut the time it takes to export the table in half. If you read the Data Pump white paper, you’ll see that the worker processes don’t just unload data; they also have to create the metadata for the objects being exported. I suppose this is the reason why there are 5 dump files rather than 4. You can see they aren’t all the same size:

$ ls -lh /u01/data_pump/ORCL/exp_order_items_0[12345].dmp
-rw-r----- 1 oracle oinstall 1.2G Dec 12 19:57 /u01/data_pump/ORCL/exp_order_items_01.dmp
-rw-r----- 1 oracle oinstall 16K Dec 12 19:56 /u01/data_pump/ORCL/exp_order_items_02.dmp
-rw-r----- 1 oracle oinstall 192K Dec 12 19:57 /u01/data_pump/ORCL/exp_order_items_03.dmp
-rw-r----- 1 oracle oinstall 1.3G Dec 12 19:57 /u01/data_pump/ORCL/exp_order_items_04.dmp
-rw-r----- 1 oracle oinstall 1.3G Dec 12 19:57 /u01/data_pump/ORCL/exp_order_items_05.dmp
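
Besides looking at the dump files on disk, you can check on running jobs from inside the database. A quick sketch using the dictionary views (this requires access to the DBA views and wasn’t part of the original run):

SQL> -- list active Data Pump jobs and their degree of parallelism
SQL> SELECT owner_name, job_name, operation, job_mode, state, degree
  2  FROM dba_datapump_jobs;

SQL> -- map Data Pump sessions to database sessions
SQL> SELECT d.job_name, s.sid, s.serial#, s.program
  2  FROM dba_datapump_sessions d JOIN v$session s ON s.saddr = d.saddr;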

What I really like about the Data Pump API is the ability to attach to a job and see what it’s doing. While the export was running, I had a quick look at it:

$ expdp martin attach=SYS_EXPORT_TABLE_01

Export: Release 18.0.0.0.0 - Production on Wed Dec 12 19:56:54 2018
Version 18.4.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates. All rights reserved.
[...]
Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

Job: SYS_EXPORT_TABLE_01
Owner: MARTIN
Operation: EXPORT
Creator Privs: TRUE
GUID: 7CD567F0F8D057DDE0530164A8C0392F
Start Time: Wednesday, 12 December, 2018 19:56:43
Mode: TABLE
Instance: ORCL
Max Parallelism: 4
Timezone: +02:00
Timezone version: 31
Endianness: LITTLE
NLS character set: AL32UTF8
NLS NCHAR character set: AL16UTF16
EXPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND martin directory=exptest
logfile=exp_order_items.log dumpfile=exp_order_items_%U.dmp
parallel=4 tables=soe.order_items
TRACE 0
State: EXECUTING
Bytes Processed: 0
Current Parallelism: 4
Job Error Count: 0
Job heartbeat: 5
Dump File: /u01/data_pump/ORCL/exp_order_items_%u.dmp
Dump File: /u01/data_pump/ORCL/exp_order_items_01.dmp
bytes written: 8,192
Dump File: /u01/data_pump/ORCL/exp_order_items_02.dmp
bytes written: 16,384
Dump File: /u01/data_pump/ORCL/exp_order_items_03.dmp
bytes written: 16,384
Dump File: /u01/data_pump/ORCL/exp_order_items_04.dmp
bytes written: 4,096
Dump File: /u01/data_pump/ORCL/exp_order_items_05.dmp
bytes written: 4,096

Worker 1 Status:
Instance ID: 1
Instance name: ORCL
Host name: server1
Object start time: Wednesday, 12 December, 2018 19:56:46
Object status at: Wednesday, 12 December, 2018 19:56:47
Process Name: DW00
State: WORK WAITING

Worker 2 Status:
Instance ID: 1
Instance name: ORCL
Host name: server1
Object start time: Wednesday, 00 Sat, 0000 0:00:00
Object status at: Wednesday, 12 December, 2018 15
Process Name: DW01
State: WORK WAITING

Worker 3 Status:
Instance ID: 1
Instance name: ORCL
Host name: server1
Object start time: Wednesday, 12 December, 2018 19:56:47
Object status at: Wednesday, 12 December, 2018 19:56:48
Process Name: DW02
State: EXECUTING
Object Schema: SOE
Object Name: ORDER_ITEMS
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 1
Completed Rows: 33,417,535
Worker Parallelism: 3

Worker 4 Status:
Instance ID: 1
Instance name: ORCL
Host name: server1
Access method: direct_path
Object start time: Wednesday, 12 December, 2018 19:56:45
Object status at: Wednesday, 12 December, 2018 19:56:49
Process Name: DW03
State: WORK WAITING

The output is quite interesting. It confirms that the job runs with a maximum parallelism of 4, and it also shows that only worker 3 is actually unloading table data. It does so in parallel, with a worker parallelism of 3. If you recall the ls command from earlier, there were 3 files of roughly 1.3 GB each; these were almost certainly written by the third worker and its parallel query processes.
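
The same attach mechanism is also exposed through the DBMS_DATAPUMP PL/SQL API. A rough sketch, assuming the job from above is still running (error handling omitted):

DECLARE
  h NUMBER;
BEGIN
  -- attach to the running job by name and owner
  h := DBMS_DATAPUMP.ATTACH(
         job_name  => 'SYS_EXPORT_TABLE_01',
         job_owner => 'MARTIN');

  -- the handle can now be passed to DBMS_DATAPUMP.GET_STATUS to
  -- retrieve the same information the expdp client displays
  DBMS_DATAPUMP.DETACH(handle => h);
END;
/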

Summary

I have been able to export a table in parallel using Data Pump, and by doing so I cut the time it took to export the table in half. This is a great feature which I am certainly going to use regularly.