Search

Top 60 Oracle Blogs

Recent comments

Retrofitting Partitioning into Existing Applications: Example 3. Workflow: Separate Active and Inactive Rows, and Partial Indexing.

This post is part of a series about the partitioning of database objects.

Workflow

Workflow is an example of a case where you have a roughly constant volume of active data and an increasing quality of historical inactive data that builds up until such time as it is archived.  Workflow requests are created, worked and closed.  

The PeopleSoft workflow table has four statuses.

INSTSTATUS Description
0 Available
1 Selected
2 Worked
3 Cancelled

Over time the majority of rows in the table end up with status 2 as they are worked and closed, and a few end up being cancelled. These rows are now inactive. All the workflow activity focuses on statuses 0 and 1. Partitioning can be used to separate the active rows from the inactive. I chose to range partition the worklist table by status, creating a partition of active worklist rows where the status is less than 2, and a partition of inactive rows where status is greater than or equal to 2. I could also have used list partitioning to create the same effect.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">CREATE TABLE PSWORKLIST (
BUSPROCNAME VARCHAR2(30) NOT NULL,
ACTIVITYNAME VARCHAR2(30) NOT NULL,
EVENTNAME VARCHAR2(30) NOT NULL,
WORKLISTNAME VARCHAR2(30) NOT NULL,

OPRID VARCHAR2(30) NOT NULL,

INSTSTATUS SMALLINT NOT NULL,

)
PARTITION BY RANGE (INSTSTATUS)
(
PARTITION WL_OPEN VALUES LESS THAN (2) PCTFREE 20,
PARTITION WL_CLOSED VALUES LESS THAN (MAXVALUE) PCTFREE 0
)
ENABLE ROW MOVEMENT
/

Operators query their worklist queue by their operator ID and the open request status; therefore there is an index to support this query. This index can be locally partitioned, i.e. on INSTSTATUS. 

The optimizer prunes the partition containing closed worklist requests because it knows the open requests can't be found there, and only queries the open partition. 
The open partition remains small because as worklist rows are updated to the closed status they are moved to the closed partition. Therefore, row movement must be enabled on the table. Thus, queries for open worklist requests remain small more efficient. 
There is an additional overhead of moving the rows between partitions as the status is updated to closed, but this is outweighed by the savings of only looking for open records in the open partition. 
Additional free space is specified on the open partition because that is where all the application update activity occurs. Conversely, no free space is required for the closed partition because after the rows move there, they are not updated until they are purged. 
From Oracle 12c, it is also possible to partially index a partitioned table. You can choose to build specific partitions in a local index by marking indexing on or off on the table partitions. In this example, it is only necessary to index the open workflow records. The application will never query the closed ones by operator ID, so indexing can be disabled on the closed partition.  Thus saving space and index management overhead.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">ALTER TABLE PSWORKLIST MODIFY PARTITION WL_OPEN INDEXING ON;
ALTER TABLE PSWORKLIST MODIFY PARTITION WL_CLOSED INDEXING OFF;

CREATE INDEX PSBPSWORKLIST ON PSWORKLIST (OPRID, INSTSTATUS)
LOCAL
INDEXING PARTIAL
/

Here is my worklist table with two partitions, and some sample data.  You can see over 90% of the rows are in the closed partition.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT table_name, partition_name, num_rows, blocks
FROM dba_tab_statistics
WHERE table_name = 'PSWORKLIST'
ORDER BY partition_name nulls first
/
TABLE_NAME PARTITION_NAME NUM_ROWS BLOCKS
------------------ ------------------------------ ---------- ----------
PSWORKLIST 100000 2711
PSWORKLIST WL_CLOSED 90742 2430
PSWORKLIST WL_OPEN 9258 281

There is a notional entry in DBA_IND_STATISTICS for the index on the closed partition, but it says that it holds no rows and consumes no blocks. The index-level statistics for index PSBPSWORKLIST are an estimate of the total for the index if all partitions were indexed (although in fact, the B-tree level would actually still have been 1 if I had built the index in my test case).

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT index_name, partition_name, num_rows, blevel, leaf_blocks
FROM dba_ind_statistics
WHERE table_name = 'PSWORKLIST'
ORDER BY index_name, partition_name nulls first
/
INDEX_NAME PARTITION_NAME NUM_ROWS BLEVEL LEAF_BLOCKS
------------------ ------------------------------ ---------- ---------- -----------
PSBPSWORKLIST 100000 2 318
WL_CLOSED 0 0 0
WL_OPEN 9258 1 27

PS_PSWORKLIST 100000 2 814

The index segment for the closed partition does not physically exist, so it is not reported in DBA_SEGMENTS.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT segment_type, segment_name, partition_name, blocks
FROM dba_segments
WHERE segment_name like 'PS_PSWORKLIST'
ORDER BY segment_name
/
SEGMENT_TYPE SEGMENT_NAME PARTITION_NAME BLOCKS
------------------ ------------------------------ ------------------------------ ----------
INDEX PARTITION PSBPSWORKLIST WL_OPEN 128
INDEX PS_PSWORKLIST 896

When active working requests are queried, the index may be used.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(1);

Plan hash value: 3105966310
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 1953 | 11 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE | | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
|* 3 | INDEX RANGE SCAN | PSBPSWORKLIST | 9 | | 1 (0)| 00:00:01 | 1 | 1 |
----------------------------------------------------------------------------------------------------------------------------

However, the optimizer may still judge that it is easier to full scan the small table partition.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(0,1);

Plan hash value: 1913856494
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 1953 | 86 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE INLIST| | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
|* 2 | TABLE ACCESS FULL | PSWORKLIST | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
-----------------------------------------------------------------------------------------------------

A query on closed requests can only full scan the unindexed partition.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus = 2;

Plan hash value: 597831193
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 86 | 18662 | 718 (1)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
|* 2 | TABLE ACCESS FULL | PSWORKLIST | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------------

A query across both partitions may choose to use the index where it is available and full scan where it is not.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">SELECT * FROM psworklist WHERE oprid = 'OPRID042';

Plan hash value: 3927567812
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 21917 | 730 (1)| 00:00:01 | | |
| 1 | VIEW | VW_TE_2 | 101 | 58580 | 730 (1)| 00:00:01 | | |
| 2 | UNION-ALL | | | | | | | |
| 3 | PARTITION RANGE SINGLE | | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
|* 5 | INDEX RANGE SCAN | PSBPSWORKLIST | 10 | | 2 (0)| 00:00:01 | 1 | 1 |
| 6 | PARTITION RANGE SINGLE | | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
|* 7 | TABLE ACCESS FULL | PSWORKLIST | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
------------------------------------------------------------------------------------------------------------------------------

In this case, I cannot partition the unique index because the partitioning column does not appear in it. So that must remain a global non-partitioned index.

#eeeeee; font-family: "courier new"overflow: auto; line-height: 95%; width: 95%;">CREATE UNIQUE  INDEX PS_PSWORKLIST ON PSWORKLIST (BUSPROCNAME,
ACTIVITYNAME,
EVENTNAME,
WORKLISTNAME,
INSTANCEID)
/

Conclusion

  • Make sure you understand what your application is doing. 
  • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data. 
  • If you are not getting partition elimination, you probably should not be partitioning. 
  • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, but if it keeps transactions apart it can also avoid read consistency. 
  • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
  • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
  • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.