Oakies Blog Aggregator

Oracle 12c – global partial index

By Franck Pachot

We have an incredible number of possibilities with Oracle. Yes, an index can be global (indexing many partitions without having to be partitioned itself on the same key) and partial (skipping some of the table partitions where we don't need indexing). In the previous post of this series of small examples on recent features, I partitioned a table of covid-19 cases per day and per country, using range partitioning on the date, by interval. The index on the country code (GEOID) was not very efficient for data ingested day by day, because each country's rows are scattered across the whole table. I then reorganized the old partitions to cluster them on countries.

My global index on country code is defined as:


SQL> create index covid_geoid on covid(geoid);

Index created.

This is efficient, thanks to clustering, except for the new rows that keep arriving in time order. Those go to the current partition, which is small (the idea in that post was to keep a short time range for the current partition and larger ones for the old data, using ALTER TABLE … MERGE PARTITIONS … ONLINE to merge the newly aged partition into the others). For this current partition only, it is preferable to full scan it, and even better to avoid maintaining the index entries for it at all, as this will accelerate data ingestion.

I think that partial indexing is well known for local indexes, as this is like marking some index partitions as unusable. But here I’m showing it on a global index.
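To picture the local-index analogy, here is a minimal sketch (with a hypothetical local index on DATEREP, which is not part of this demo): with INDEXING PARTIAL, the index partitions that map to INDEXING OFF table partitions are simply created UNUSABLE, with no segment to maintain.

create index covid_daterep on covid(daterep) local indexing partial;

select partition_name, status from user_ind_partitions
 where index_name='COVID_DATEREP';

With a global index there are no index partitions to mark unusable, which is what makes what follows interesting.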

Splitting partitions

In order to continue from the previous post, where I merged all partitions, I'll split them again, and this can be an online operation in 12cR2:


SQL> alter table covid split partition oldmerged at (date '2020-04-01') into (partition old, partition new) online;

Table altered.

SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

I have two partitions, “old” and “new”, and a global index. I also cleaned up the orphaned index entries to get clean execution plans - and it has to be done at some point anyway.

Here is my query, using the index:


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                              PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________________________
Plan hash value: 2816502185

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                               |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                              |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  2400 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

This goes to all partitions, as the ROWID in a global index carries the partition information through the data object id. We see that with Pstart/Pstop=ROWID.
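We can decode that data object id from the ROWIDs ourselves - a quick sketch with DBMS_ROWID, just for demonstration:

select dbms_rowid.rowid_object(rowid) as data_object_id, count(*)
 from covid
 group by dbms_rowid.rowid_object(rowid);

Each distinct data object id maps to one partition segment, which is why a global index entry is sufficient to locate the row in the right partition without any pruning.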

Partial indexing

Now I want to set my global index on countries to be a partial index:


SQL> alter index covid_geoid indexing partial;

Index altered.

This doesn't change anything for the moment. The indexing of each partition depends on the partition's INDEXING attribute, which is INDEXING ON by default.

I set the “new” partition to not maintain indexes (INDEXING OFF), for this partition only.


SQL> alter table covid modify partition new indexing off;

Table altered.

This means that partial indexes will not reference the “new” partition, whether they are local (then there is no index partition maintained for it) or global (then there are no index entries for this partition).

And that’s all. Now there will be no overhead in maintaining this index when ingesting new data in this partition.
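The attribute is visible in the dictionary - a quick check, which should show INDEXING ON for the “old” partition and OFF for the “new” one:

select partition_name, indexing from user_tab_partitions
 where table_name='COVID';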

Table Expansion

And then, the optimizer has a transformation to split the execution plan in two branches: one for the index access and one without. This transformation was introduced in 11g for unusable local partitions and is now used even with global indexes.
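Conceptually, the transformed statement is a UNION ALL of the same query over disjoint ranges of the partition key - roughly like this hand-written equivalent (a sketch of the idea, not the exact internal rewrite):

select trunc(daterep,'mon'), max(cases) from (
 -- indexed partition: index access is possible here
 select daterep, cases from covid where geoid='US' and daterep <  date '2020-04-01'
 union all
 -- non-indexed partition: full scan only
 select daterep, cases from covid where geoid='US' and daterep >= date '2020-04-01'
) group by trunc(daterep,'mon') order by 1;

Here is the real plan, with a hint to force the index: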


SQL> explain plan for select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 1031592504

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   321 |  7062 |    35   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     4   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|   9 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  4320 |     4   (0)| 00:00:01 | ROWID | ROWID |
|* 10 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"=TO_DATE(' 2020-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "COVID"."DATEREP"

The TABLE ACCESS BY GLOBAL INDEX ROWID is for partition 1 as mentioned by Pstart/Pstop, which is the “old” one with INDEXING ON. The TABLE ACCESS FULL is for partition 2, the “new” one, that has INDEXING OFF. The optimizer uses predicates on the partition key to select the branch safely.

But this plan also has an additional branch, and this TBL$OR$IDX$PART$NUM again, because I have interval partitioning. With interval partitioning, there is no known Pstop, so the plan has to handle the case where a new partition has been created (with indexing on). The third branch can then access by index ROWID the partitions that are not hardcoded in this plan.

Let’s remove interval partitioning just to get the plan easier to read:


SQL> alter table covid set interval();

Table altered.


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 3529087922

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   161 |  3542 |    33   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     6   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"

Here it is clear: access by index to the partition 1 and full table scan for partition 2. This is exactly what I wanted because I know the clustering factor on the new partition is not very good until I reorganize it (move or merge as I did in the previous post).

All these features help to manage the lifecycle of data. That's a completely different approach from purpose-built databases where you have one database service for fast ingest with simple queries on recent data (NoSQL folks may think about DynamoDB for that), then stream data to a relational database for more OLTP queries (RDS to continue with the AWS analogy), and finally move old data into a database dedicated to analytics (that could be Redshift then). With Oracle, which has always been a multi-purpose database, the goal is to avoid duplication and replication and manage data in-place for all usages. Through the 40 years of this database engine, many approaches have been implemented to cluster data: CLUSTER and IOT can sort (or hash) data as soon as it is inserted, in order to put it at its optimal place for future queries. But the agility of heap tables finally wins. Now, with the ease of in-database data movement (partitioning and online operations) and the improvement of full scans (multiblock reads, direct-path reads, storage indexes), we can get the best of both: heap tables with few indexes for fast ingest of current data, reorganized regularly to be clustered, with additional indexes.

I mentioned NoSQL and I mentioned fast ingest. Actually, there's a feature called Fast Ingest for IoT (lowercase 'o' there) that goes further with this idea. Instead of inserting into a persistent segment and reorganizing later, rows are buffered in a 'memoptimized rowstore' before going to the heap segment in bulk. But that's an Exadata feature and I like to think about Oracle as a multiplatform database.

This article, Oracle 12c – global partial index, appeared first on Blog dbi services.

Oracle 12c – reorg and split table with clustering

By Franck Pachot

In this series of small examples on recent features, I imported, in a previous post, the statistics of covid-19 cases per day and per country. This is typical of data that comes as a time series ordered by date, because this is how it is generated day after day, but which you will probably want to query on another dimension, like per country.

If you want to ingest data faster, you keep it in the order of arrival, and insert it in heap table blocks. If you want to optimize for the future queries on the other dimension, you may load it in a table with a specialized organization where each row has its place: an Index Organized Table, a Hash Cluster, a partitioned table, or a combination of those. With Oracle we are used to storing data without the need to reorganize it. It is a multi-purpose database. But in 12c we have many features that make this reorganization easier, like partitioning, online move and online split. We can then think about a two-phase lifecycle for some operational tables that are used later for analytics:

  • Fast ingest and query on a short time window: we insert data as it flows in, with conventional inserts, into a conventional heap table. Queries on recent data are fast as the rows are colocated as they arrived.
  • Optimal query on history: regularly, we physically reorganize the latest ingested rows to be clustered on another dimension, because we will query a large time range on this other dimension.

Partitioning is the way to do those operations. We can have a weekly partition for the current week. When the week is over new rows will go to a new partition (11g PARTITION BY RANGE … INTERVAL) and we can optionally merge the old partition with the one containing old data, per month or year for example, to get larger time ranges for the past data. This merge is easy (18c MERGE PARTITIONS … ONLINE). And while doing that we can reorganize rows to be clustered together. This is what I’m doing in this post.
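As a sketch of this lifecycle setup (with a hypothetical table and dates, just to illustrate the syntax - not the table used in this post):

create table measures ( t date, v number )
 partition by range (t) interval (numtodsinterval(7,'day'))
 ( partition p0 values less than (date '2020-01-01') );

New weekly partitions are then created automatically as rows arrive, and the older ones can later be merged into larger ranges.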

Partitioning

From the table I have created in the previous post, I create an index on GEOID (as the goal is to query by country) and I partition the table by range on DATEREP:


SQL> create index covid_geoid on covid(geoid);

Index created.

SQL> alter table covid modify partition by range(daterep) interval (numToYMinterval(1,'year')) ( partition old values less than (date '2020-01-01') , partition new values less than (date '2021-01-01') ) online;

Table altered.

This is an online operation in 12cR2. So I have two partitions, one for “old” data and one for “new” data.

I query all dates for one specific country:


SQL> select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /
   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

This reads rows scattered through the whole table because they were inserted day after day.

This is visible in the execution plan: the optimizer does not use the index but a full table scan:


SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                       PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________
SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 4091160977

-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |    55 (100)|      7 |00:00:00.01 |     180 |
|   1 |  SORT ORDER BY       |       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   2 |   PARTITION RANGE ALL|       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   3 |    HASH GROUP BY     |       |      2 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|*  4 |     TABLE ACCESS FULL| COVID |      2 |    105 |    53   (0)|    160 |00:00:00.01 |     180 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("GEOID"='US')

This has read 180 blocks, with multiblock reads.

I force the access by index in order to compare the cost:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |     125 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |     125 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

Even if the number of blocks is a bit smaller (125), they are single-block reads, and so the cost is higher: 95 for the index access versus 55 for the full table scan. Using hints and comparing the cost is how I often try to understand the optimizer choice, and here the reason is clear: because the rows are scattered, the clustering factor of the index access is really bad.

I said that I want to merge the partitions. And maybe reorg with an online table move. But now, for this second phase of the lifecycle, I want to cluster rows on the country dimension rather than on arrival date.

Attribute clustering

This preference can be declared on the table with 12c Attribute Clustering:


SQL> alter table covid add clustering by linear order (continentexp, countriesandterritories);

Table altered.

You see that I can mention multiple columns and I don't need to use the GEOID column that I will use to query. This is not an index. This is just a preference to cluster rows and, if they are clustered on the country name, they will also be clustered on continent, country code, geoid,… I have chosen those columns for clarity when reading the DDL:


SQL> exec dbms_metadata.set_transform_param(DBMS_METADATA.SESSION_TRANSFORM,'SEGMENT_ATTRIBUTES',false);

PL/SQL procedure successfully completed.

SQL> ddl covid

  CREATE TABLE "COVID"
   (    "DATEREP" DATE,
        "N_DAY" NUMBER,
        "N_MONTH" NUMBER,
        "N_YEAR" NUMBER,
        "CASES" NUMBER,
        "DEATHS" NUMBER,
        "COUNTRIESANDTERRITORIES" VARCHAR2(50),
        "GEOID" VARCHAR2(10),
        "COUNTRYTERRITORYCODE" VARCHAR2(3),
        "POPDATA2018" NUMBER,
        "CONTINENTEXP" VARCHAR2(10)
   )
 CLUSTERING
 BY LINEAR ORDER ("COVID"."CONTINENTEXP",
  "COVID"."COUNTRIESANDTERRITORIES")
   YES ON LOAD  YES ON DATA MOVEMENT
 WITHOUT MATERIALIZED ZONEMAP
  PARTITION BY RANGE ("DATEREP") INTERVAL (NUMTOYMINTERVAL(1,'YEAR'))
 (PARTITION "OLD"  VALUES LESS THAN (TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
 PARTITION "NEW"  VALUES LESS THAN (TO_DATE(' 2021-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ) ;

  CREATE INDEX "COVID_GEOID" ON "COVID" ("GEOID")
  ;

As you can see the default is YES for ON LOAD which means that direct-path inserts will cluster rows, and ON DATA MOVEMENT is also YES which is why merging partitions will also cluster rows.

I’ve done that afterward here but this is something you can do at table creation. You mention on which attributes you want to cluster. You mention when: direct-path inserts (YES ON LOAD) and/or table reorganization (YES ON DATA MOVEMENT). This is defined at table level. Beyond those defaults, the table reorganizations (ALTER TABLE … MOVE, ALTER TABLE … MERGE PARTITIONS) can explicitly DISALLOW CLUSTERING or ALLOW CLUSTERING.

Move Partition

When I have ingested some data and think that it would be better to cluster them, maybe at the time this partition is completed and new inserts go to a higher interval, I can reorganize it with a simple ALTER TABLE … MOVE:


SQL> alter table covid move partition new online allow clustering;

Table altered.

This will cluster rows together on the clustering attributes. I mentioned ALLOW CLUSTERING to show the syntax but it is the default (YES ON DATA MOVEMENT) anyway here.

At that point, you may also want to compress the old partitions with basic compression (the compression that does not require an additional option but is possible only with bulk load or data movement). However, be careful: the combination of online operation and basic compression requires the Advanced Compression Option. More info in a previous post on “Segment Maintenance Online Compress” feature usage.
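For example, something like this (a sketch: the offline move with basic compression needs no extra option, the online variant requires the Advanced Compression Option):

alter table covid move partition old compress;

alter table covid move partition old online compress;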

Merge Partition

As my goal is to cluster data on a different dimension than the time one, I may want to have larger partitions for the past ones. Something like the current partition holding a week of data at maximum, but the past partitions being on quarter or yearly ranges. That can be done with partition merging, which is an online operation in 18c (and note that I have a global index here and an online operation does not invalidate indexes):


SQL> alter table covid merge partitions old,new into partition oldmerged online allow clustering;

Table altered.

This is a row movement and clustering on data movement is enabled. Again I mentioned ALLOW CLUSTERING just to show the syntax.

Let's see the number of buffers read now with index access. The statistics of the index (clustering factor) have not been updated, so the optimizer may not choose the index access yet (until dbms_stats runs on stale tables). I'm forcing it with a hint:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |       8 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |       8 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')
       filter(TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1)

The cost has not changed (because of the statistics) but the number of buffers read is minimal: only the 8 buffers where all my rows for this country are clustered. Remember that I clustered on the country name but use the GEOID here in my predicate. That doesn’t matter as long as the rows are together.

Asynchronous global index maintenance

Note the strange predicate on TBL$OR$IDX$PART$NUM(,0,8,0,”COVID”.ROWID)=1 that results from another 12c feature where global indexes are maintained usable during the partition maintenance (which is required for an online operation) but optimized to be cleaned-out asynchronously later. This is visible from DBA_INDEXES:


SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 YES

Orphaned entries mean that some entries in the global index may still reference the segments dropped by my MOVE or MERGE, and queries have to ignore them.

Those ranges of ROWIDs are determined from the segments concerned, stored in the dictionary:


SQL> select * from sys.index_orphaned_entry$;
   INDEXOBJ#    TABPARTDOBJ#    HIDDEN
____________ _______________ _________
       79972           79970 O
       79972           79971 O
       79972           79980 O
       79972           79973 O

HIDDEN='O' means Orphaned, and the ROWIDs addressing these partitions are filtered out from the dirty index entries by the predicate filter(TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1) above.

This maintenance of the dirty index will be done during the maintenance window but I can do it immediately to finish my reorganization correctly:


SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 NO

No orphaned index entries anymore. Note that I could also have called the DBMS_PART.CLEANUP_GIDX procedure to do the same.
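That call is simply (a sketch of the equivalent cleanup):

exec dbms_part.cleanup_gidx(user,'COVID');

It cleans up the orphaned entries of the global indexes on the table, like the COALESCE CLEANUP above.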

This is fine for the query, but as the statistics were not updated, the optimizer doesn't know yet how clustered my table is. In order to complete my reorganization and have queries benefit from this immediately, I gather the statistics:


SQL> exec dbms_stats.gather_table_stats(user,'COVID',options=>'gather auto');

PL/SQL procedure successfully completed.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:38:40                          369 NO

GATHER AUTO gathers only the stale ones and, as soon as I did my MOVE or MERGE, the index was marked as stale (note that the ALTER INDEX COALESCE does not mark it as stale by itself).
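The staleness flag can be checked in the dictionary - a quick sketch:

select index_name, stale_stats from user_ind_statistics
 where table_name='COVID';

STALE_STATS goes back to NO once the statistics have been gathered.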

And now my query will use this optimal index without the need for any hint:


SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |     7 (100)|      7 |00:00:00.01 |       5 |
|   1 |  SORT ORDER BY                               |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   2 |   HASH GROUP BY                              |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    160 |     5   (0)|    160 |00:00:00.01 |       5 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    160 |     2   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

and, thanks to the coalesce cleanup, there’s no predicate on orphan ROWIDs anymore.

With this pattern, you may realize that my global index on countries is useful only for past data, not for the recent data that has not been clustered yet. Then, we can even avoid maintaining the index for this partition. We will see that in the next post: it is called partial indexing.

With this pattern, we can even question the need to maintain an index for the old partitions. As all my rows for GEOID='US' are packed in a few contiguous blocks, why not just store the ranges of ROWIDs rather than the full list of them? This is called Zone Maps. But this is only available on Exadata and I like to think about Oracle as a multiplatform database.
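For reference, the declaration is as simple as this (a sketch, requiring Exadata, with a hypothetical zone map name):

create materialized zonemap covid_zmap on covid (geoid);

The zone map then stores, per zone of contiguous blocks, the minimum and maximum GEOID, so a scan can skip all zones that cannot contain 'US'.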

Those many features came in the recent releases thanks to the development of the Autonomous Database. When the DBA is a cloud provider, whether it is automated or not, all maintenance must be done online without stopping the application. Those features are the bricks to build automatic lifecycle management and performance optimization.

This article, Oracle 12c – reorg and split table with clustering, appeared first on Blog dbi services.

Oracle on Azure- Sizing vs. Optimizing

As the flood gates open up on Oracle for Azure IaaS, working in an efficient manner has become a necessity.  We’re building out partners to help, but there are ways to empower our customers and those doing this work to make us all successful-  hopefully this post will assist.

After I posted the Estimate Tool for Sizing Oracle Workloads on Azure, I realized more guidance around AWR reports would be beneficial.  These tips will provide help to any migration, not just one to Azure, so read and reap the rewards!

Separate and Conquer

A sizing estimate isn't the same thing as an optimization exercise.  Where optimizing a database workload is about the 'what if', a sizing estimate is about 'what it takes now', and the two shouldn't be confused. They can be combined into a single offering to a customer, but it's important to know that sizing takes the full workload calculations and sizes out the vCPU, memory and IO/throughput resources required in the cloud, vs. just taking what's already been allocated on-prem and using those [often] inaccurate infrastructure numbers.

Why are the infrastructure numbers often wrong?  For one, the host may be sized to support the database for years, and it can be difficult to know what resources will be required 1, 2 or 5 years down the road.  As new objects, code and processes are added to a database over time, what was sized no longer matches what is needed.  The second issue is how we architect for the cloud.  One of the biggest benefits is that the cloud is flexible and there's no need to size a server for years down the road.  You're not purchasing hardware and you can scale as needed.  If you lift and shift the infrastructure numbers, the values don't represent what is being used or what is needed now.

An optimization exercise digs into the top database processing consuming those resources and identifies what steps can be taken to run it more efficiently.  It identifies waste and incorrect configurations that need to be addressed.

If a customer or account team requests a sizing estimate to move a database to the cloud, and understands that this is a different ask than an optimization exercise, then the next step is to ensure that the Oracle specialist has the right information to perform the task accurately.

Simplify

AWR data is our bread and butter.  The report is a monstrous buffet of data and it’s important that we remove extra complexity and ensure the best data is sent to accomplish the request.

  1.  The report should cover the workload that we need to shift-  not a subset or a sample.
    1. 8 days are retained in the AWR by default
    2. Try to get at least a one week report, but if the customer has increased the retention time on their AWR, a larger window report is always beneficial.
    3. If there is month-end or year-end processing, discuss with the customer team whether it needs to be captured, or how impactful it would be not to include it.
    4. Don’t simply accept a variety of AWR reports as you would for optimization.  We’re looking at workload calculations, not details on processes.
  2. Ensure you have all the databases included to be migrated.
    1. The only time this isn’t required is if the customer wants to just estimate percentages for non-production, but all production databases should be included if the customer has requested a size for the cloud.
  3. Run the correct AWR report for the database type:
    1. All command line reports are available from the following directory on the database server, ($ORACLE_HOME/rdbms/admin)
    2. Single instance is the awrrpt.sql
    3. If a RAC database, use the awrgrpt.sql, (global) not the awrrpti.sql, (instance level) - see the example after this list.
  4.  Don’t send multiple reports from short intervals hoping to trap processes-  this doesn’t help with a sizing estimate.  I will reiterate- short intervals are for an optimization exercise, (and there are better ways to trap that kind of information, too.)
  5. Name the reports with clear identifiers for the database specialist.
    1. Customer identifier
    2. database name
    3. database use, (prod, dev, test)
    4. snapshots-  beginning and end
  6. Use the correct report format-  where I used to say I didn't care, as of 12c some additional data was added into the HTML version of the report for some Oracle features.  Consider using the HTML report over the TEXT format due to this.
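For reference, the reports named in item 3 are generated from SQL*Plus on the database server like this (a sketch - each script prompts for the report format, snapshot range and file name):

SQL> @?/rdbms/admin/awrrpt.sql

SQL> @?/rdbms/admin/awrgrpt.sql

The first is the single-instance report, the second the RAC global report.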

Don’t Boil the Ocean

The goal of the sizing estimate is to size the workload.  Although I do think it's a good idea to inspect the database usage (OLTP/OLAP), version, settings and parameters, don't take on a full resiliency review of the database if you're new to this type of work-  it takes considerable time to learn how to perform Oracle database optimization.

Collect the sizing information from the AWR and place it into the Excel sizing estimate worksheet first.  Don't get distracted by the usage or other data and overthink the process.  Verify the entries from the AWR and remember:  database version and features can impact where the values reside in the AWR report, IOPS and DB CPU being two values that can be located in different places on different reports.  Take the time to verify you have the correct numbers entered.

If a customer sends you multiple reports for a single database, an hour report when a week of data was requested or any other scenario where the result will be inaccurate information for you to work from, let all stakeholders know immediately and truthfully stress that this will make it impossible for you to size effectively.  The biggest mistake when performing a sizing estimate is to assume anything.  I don’t expect the customer or other technical specialists to know how to do the work that I’m doing, but I do expect them to respect my decision when I state I don’t have what I need to perform my tasks accurately to completion.  Keep in mind- inaccurate data for the sizing estimate is just as worthless as the on-prem infrastructure numbers and we’re simply wasting everyone’s time, so getting the right data is worth it.




Copying a SQL Plan Baseline from one database to another

Hopefully this post saves you a few minutes looking the procedure up. I know it’ll save me some time ;) In this rather lengthy article I’d like to cover how I copied a SQL Plan Baseline from one database to another. If you find this procedure useful, please ensure your system is appropriately licensed for it and test it first!

My Setup

My source database is named ORA19NCDB, patched to 19.7.0 running on Oracle Linux 7x/UEK 5. As I do so often, I’m using Dominic Giles’s Swingbench as the source for this experiment. This is the query in question:

SELECT oi.order_id,
    SUM(oi.unit_price * oi.quantity)
FROM
    soe.orders         o,
    soe.order_items    oi
WHERE
        o.order_id = oi.order_id
    and o.order_date = DATE '2012-04-01'
    and o.delivery_type = 'Collection'
GROUP BY
    oi.order_id
ORDER BY
    2 DESC; 

I would like to make sure the execution plan for this statement is available to my destination database. Like the source, it’s a 19.7.0 database running on Linux.

Create the SQL Plan Baseline

In the first step I need to create the SQL Plan Baseline, for which I need the SQL ID and plan hash value. One way to get these is to run the query and check the cursor cache:

SQL> @/tmp/query

  ORDER_ID SUM(OI.UNIT_PRICE*OI.QUANTITY)
---------- ------------------------------
   1138103                          30648

...

    244488                           3696

32 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  cbynz8srbydr7, child number 0
-------------------------------------
SELECT oi.order_id,     SUM(oi.unit_price * oi.quantity) FROM
soe.orders         o,     soe.order_items    oi WHERE
o.order_id = oi.order_id     and o.order_date = DATE '2012-04-01'
and o.delivery_type = 'Collection' GROUP BY     oi.order_id ORDER BY
 2 DESC

Plan hash value: 1126116122

---------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   |        |       |       |          |
|   1 |  SORT ORDER BY                          |                   |      1 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                         |                   |      1 |  1160K|  1160K| 1429K (0)|
|   3 |    NESTED LOOPS                         |                   |      1 |       |       |          |
|   4 |     NESTED LOOPS                        |                   |      3 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS            |      1 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | ORD_ORDER_DATE_IX |      1 |       |       |          |
|*  7 |      INDEX RANGE SCAN                   | ITEM_ORDER_IX     |      3 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID         | ORDER_ITEMS       |      3 |       |       |          |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("O"."DELIVERY_TYPE"='Collection')
   6 - access("O"."ORDER_DATE"=TO_DATE(' 2012-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("O"."ORDER_ID"="OI"."ORDER_ID")

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level


37 rows selected. 

With the SQL ID and PHV identified I can create a SQL Plan Baseline for this query. I have a little script to do that for me:

SQL> get create_baseline
  1  pro Create a SQL Plan Baseline based on SQL ID and Plan Hash Value
  2  var count number
  3  exec :count := dbms_spm.load_plans_from_cursor_cache(sql_id=>'&sqlid',plan_hash_value=>&phv)
  4* select :count || ' SQL Plan Baselines captured' as result from dual;

SQL> start create_baseline
Create a SQL Plan Baseline based on SQL ID and Plan Hash Value
Enter value for sqlid: cbynz8srbydr7
Enter value for phv: 1126116122

PL/SQL procedure successfully completed.


RESULT
--------------------------------------------------------------------
1 SQL Plan Baselines captured

SQL>  

The baseline should have been captured, as per the “result” column. It’s simple enough to verify.

SQL> select sql_handle, plan_name, origin, enabled, accepted, fixed 
  2  from dba_sql_plan_baselines where created > systimestamp - interval '5' minute;

SQL_HANDLE                     PLAN_NAME                      ORIGIN                         ENA ACC FIX
------------------------------ ------------------------------ ------------------------------ --- --- ---
SQL_6f18f4fc0f67407e           SQL_PLAN_6y67nzh7qfh3ya116ef60 MANUAL-LOAD-FROM-CURSOR-CACHE  YES YES NO

A quick check should be enough to show the baseline is used:

SQL> @/tmp/query

  ORDER_ID SUM(OI.UNIT_PRICE*OI.QUANTITY)
---------- ------------------------------
   1138103                          30648

...

    244488                           3696

32 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
SQL_ID  cbynz8srbydr7, child number 0
-------------------------------------
SELECT oi.order_id,     SUM(oi.unit_price * oi.quantity) FROM
soe.orders         o,     soe.order_items    oi WHERE
o.order_id = oi.order_id     and o.order_date = DATE '2012-04-01'
and o.delivery_type = 'Collection' GROUP BY     oi.order_id ORDER BY
 2 DESC

Plan hash value: 1126116122

---------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   |        |       |       |          |
|   1 |  SORT ORDER BY                          |                   |      1 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                         |                   |      1 |  1160K|  1160K| 1425K (0)|
|   3 |    NESTED LOOPS                         |                   |      1 |       |       |          |
|   4 |     NESTED LOOPS                        |                   |      3 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS            |      1 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | ORD_ORDER_DATE_IX |      1 |       |       |          |
|*  7 |      INDEX RANGE SCAN                   | ITEM_ORDER_IX     |      3 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID         | ORDER_ITEMS       |      3 |       |       |          |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("O"."DELIVERY_TYPE"='Collection')
   6 - access("O"."ORDER_DATE"=TO_DATE(' 2012-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("O"."ORDER_ID"="OI"."ORDER_ID")

Note
-----
   - SQL plan baseline SQL_PLAN_6y67nzh7qfh3ya116ef60 used for this statement
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level


38 rows selected. 

This looks good! I can see the SQL Plan Baseline used as per the “Note” section.

Transferring the SQL Plan Baseline

Now the fun begins. I need to transfer the SQL Plan Baseline from my source system to the target. The procedure is well documented, and I really like the 11.2 documentation. There is of course more out there, but this is the most concise piece of information in my opinion.

To copy the SQL Plan Baseline I need to first put it into a staging table and then export it from the source to import it into the destination database. I have chosen to store the baseline in the SOE schema, but that’s of course up to you. This little script can help save some time:

set serveroutput on
prompt prepare a SQL Plan Baseline for transport
prompt
accept v_table_name  prompt 'enter the name of the staging table to be created: ' 
accept v_table_owner prompt 'enter the schema name where the staging table is to be created: ' 
accept v_sql_handle  prompt 'which SQL handle should be exported? ' 
accept v_plan_name prompt 'enter the corresponding plan name: ' 

declare
  v_packed_baselines number;
begin
  dbms_spm.create_stgtab_baseline( 
    table_name => '&v_table_name', 
    table_owner => '&v_table_owner');

  v_packed_baselines := dbms_spm.pack_stgtab_baseline(
    table_name => '&v_table_name',
    table_owner => '&v_table_owner',
    sql_handle => '&v_sql_handle',
    plan_name => '&v_plan_name');
  dbms_output.put_line(v_packed_baselines || ' baselines have been staged in &v_table_owner..&v_table_name');
end;
/ 

set serveroutput off

This procedure should confirm that a single SQL Plan Baseline has been staged:

SQL> start prep_transport
prepare a SQL Plan Baseline for transport

enter the name of the staging table to be created: blogpost
enter the schema name where the staging table is to be created: soe
which SQL handle should be exported? SQL_6f18f4fc0f67407e
enter the corresponding plan name: SQL_PLAN_6y67nzh7qfh3ya116ef60
1 baselines have been staged in soe.blogpost

PL/SQL procedure successfully completed. 

Next, export the table and import it into the destination database. Here is the output from my system:

$ expdp directory=the_export_dir tables=soe.blogpost dumpfile=soe_blogpost.dmp logfile=soe_blogpost.log

Export: Release 19.0.0.0.0 - Production on Tue Jun 2 17:54:13 2020
Version 19.7.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Username: martin  
Password: 

Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Starting "MARTIN"."SYS_EXPORT_TABLE_01":  martin/******** directory=the_export_dir tables=soe.blogpost dumpfile=soe_blogpost.dmp logfile=soe_blogpost.log 
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "SOE"."BLOGPOST"                            51.74 KB       9 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MARTIN.SYS_EXPORT_TABLE_01 is:
  /u01/app/oracle/admin/ORA19NCDB/dpdump/soe_blogpost.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_01" successfully completed at Tue Jun 2 17:54:39 2020 elapsed 0 00:00:22 

The import is equally simple:

$ impdp directory=the_import_dir logfile=soe_blogpost.log dumpfile=soe_blogpost.dmp

Import: Release 19.0.0.0.0 - Production on Tue Jun 2 16:59:39 2020
Version 19.7.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Username: martin
Password: 

Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Master table "MARTIN"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "MARTIN"."SYS_IMPORT_FULL_01":  martin/******** directory=the_import_dir logfile=soe_blogpost.log dumpfile=soe_blogpost.dmp 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SOE"."BLOGPOST"                            51.74 KB       9 rows
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Job "MARTIN"."SYS_IMPORT_FULL_01" successfully completed at Tue Jun 2 17:00:22 2020 elapsed 0 00:00:26 

Making the SQL Plan Baseline available and using it

After the import completed successfully, it's time to unpack the SQL Plan Baseline. The following code snippet did that for me:

var num_unpacked number

begin
  :num_unpacked := dbms_spm.unpack_stgtab_baseline(
    table_name => '&table_name',
    table_owner => '&table_owner');
end;
/

print :num_unpacked 

After which you can see it in the database:

SQL> select sql_handle, plan_name, origin, enabled, accepted, fixed
  2   from dba_sql_plan_baselines;

SQL_HANDLE                     PLAN_NAME                      ORIGIN                        ENA ACC FIX
------------------------------ ------------------------------ ----------------------------- --- --- ---
SQL_6f18f4fc0f67407e           SQL_PLAN_6y67nzh7qfh3ya116ef60 MANUAL-LOAD-FROM-CURSOR-CACHE YES YES NO

SQL>  

Any execution of SQL ID cbynz8srbydr7 will now use the imported SQL Plan Baseline.

SQL> l
  1  SELECT oi.order_id,
  2     SUM(oi.unit_price * oi.quantity)
  3  FROM
  4     soe.orders     o,
  5     soe.order_items    oi
  6  WHERE
  7         o.order_id = oi.order_id
  8     and o.order_date = DATE '2012-04-01'
  9     and o.delivery_type = 'Collection'
 10  GROUP BY
 11     oi.order_id
 12  ORDER BY
 13*     2 DESC
SQL> /

[...]

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID  cbynz8srbydr7, child number 0

SELECT oi.order_id,     SUM(oi.unit_price * oi.quantity) FROM
soe.orders         o,     soe.order_items    oi WHERE
o.order_id = oi.order_id     and o.order_date = DATE '2012-04-01'
and o.delivery_type = 'Collection' GROUP BY     oi.order_id ORDER BY
 2 DESC

Plan hash value: 1126116122

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY                          |                   |     1 |    74 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY                         |                   |     1 |    74 |     3  (67)| 00:00:01 |
|   3 |    NESTED LOOPS                         |                   |     1 |    74 |     1   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                        |                   |     1 |    74 |     1   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS            |     1 |    35 |     1   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | ORD_ORDER_DATE_IX |     1 |       |     1   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                   | ITEM_ORDER_IX     |     1 |       |     0   (0)|          |
|   8 |     TABLE ACCESS BY INDEX ROWID         | ORDER_ITEMS       |     1 |    39 |     0   (0)|          |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("O"."DELIVERY_TYPE"='Collection')
   6 - access("O"."ORDER_DATE"=TO_DATE(' 2012-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("O"."ORDER_ID"="OI"."ORDER_ID")

Note
-----
   - SQL plan baseline SQL_PLAN_6y67nzh7qfh3ya116ef60 used for this statement


35 rows selected.

This fact is also reflected in v$sql:

SQL> select sql_id, child_number, plan_hash_value, sql_plan_baseline, executions from v$sql where sql_id = 'cbynz8srbydr7';

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE SQL_PLAN_BASELINE              EXECUTIONS
------------- ------------ --------------- ------------------------------ ----------
cbynz8srbydr7            0      1126116122 SQL_PLAN_6y67nzh7qfh3ya116ef60          2 

Summary

It isn’t too hard to transfer a SQL Plan Baseline from one host to another, as this post hopes to demonstrate.

Oracle 12c – peak detection with MATCH_RECOGNIZE

By Franck Pachot

This post is part of a series of small examples of recent features. I'm running this in the Oracle 20c preview in the Oracle Cloud. I'll show a very basic example of Row Pattern Recognition (the MATCH_RECOGNIZE clause in a SELECT, which Oracle documents as the "row pattern matching in native SQL" feature). You may be afraid of those names. Of course, because SQL is a declarative language, there is a small learning curve to get beyond this abstraction. Understanding procedurally how it works may help. But when you understand the declarative nature it is really powerful. This post is here to start simple, on a simple table with a time series, where I just want to detect peaks (the points where the value goes up and then down).

Historically, a SELECT statement operated on single rows (JOIN, WHERE, SELECT) within a set, or on an aggregation of rows (GROUP BY, HAVING) to provide a summary. Analytic functions can operate on windows of rows (PARTITION BY, ORDER BY, ROWS BETWEEN,…) where you keep the detail rows and compare them to the aggregated values of the group. A row can then look at its neighbours, and when you need to go further, the SQL MODEL clause can build the equivalent of spreadsheet cells to reference other rows and columns. As in a spreadsheet, you can also PIVOT to move row detail to columns or vice versa. All that can be done in SQL, which means that you don’t code how to do it but just define the result you want. However, there’s something that is easy to do in a spreadsheet application like Excel but not easy to code with analytic functions: looking at a chart, such as a line graph, to detect some behaviour. That’s something we can code in SQL with MATCH_RECOGNIZE.

For example, from the “COVID” table I have imported in the previous post I want to see each peak of covid-19 cases in Switzerland:
[Chart: daily covid-19 cases in Switzerland as a line graph, with each peak labelled]

I did this manually in Excel: showing all labels but keeping only those that are at a peak, whether it is a small one or a high one. There’s one value per day in this time series but I’m not interested in the intermediate values. Only peaks. This could have been done straight from the .csv at http://opendata.ecdc.europa.eu/covid19/casedistribution/csv/ through an external table, but I had already imported it into an Oracle table for the previous post (Oracle 18c – select from a flat file).

Ok, let’s show the result directly. Here is a small SQL statement that shows me exactly those peaks, each match being numbered:


SQL> select countriesandterritories "Country","Peak date","Peak cases","match#"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15  where countriesandterritories='Switzerland';

       Country    Peak date    Peak cases    match#
______________ ____________ _____________ _________
Switzerland    26-FEB-20                1         1
Switzerland    28-FEB-20                7         2
Switzerland    07-MAR-20              122         3
Switzerland    09-MAR-20               68         4
Switzerland    14-MAR-20              267         5
Switzerland    16-MAR-20              841         6
Switzerland    18-MAR-20              450         7
Switzerland    22-MAR-20             1237         8
Switzerland    24-MAR-20             1044         9
Switzerland    28-MAR-20             1390        10
Switzerland    31-MAR-20             1138        11
Switzerland    03-APR-20             1124        12
Switzerland    08-APR-20              590        13
Switzerland    10-APR-20              785        14
Switzerland    16-APR-20              583        15
Switzerland    18-APR-20              346        16
Switzerland    20-APR-20              336        17
Switzerland    24-APR-20              228        18
Switzerland    26-APR-20              216        19
Switzerland    01-MAY-20              179        20
Switzerland    09-MAY-20               81        21
Switzerland    11-MAY-20               54        22
Switzerland    17-MAY-20               58        23
Switzerland    21-MAY-20               40        24
Switzerland    24-MAY-20               18        25
Switzerland    27-MAY-20               15        26
Switzerland    29-MAY-20               35        27
Switzerland    06-JUN-20               23        28


28 rows selected.

Doing that with analytic functions or MODEL clause is possible, but not easy.
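
For comparison, here is a sketch of what a lag()/lead() rewrite might look like for this minimal peak definition (same COVID table; it returns the same peak rows, though without the match numbering):

select daterep, cases
from (
  select daterep, cases,
         lag(cases)  over (order by daterep) prev_cases,
         lead(cases) over (order by daterep) next_cases
  from   covid
  where  countriesandterritories='Switzerland'
)
where cases > prev_cases and cases > next_cases;

This works because, for this simple two-state pattern, a peak is just a row strictly higher than both neighbours. As soon as the pattern grows beyond that, the analytic version gets painful while the MATCH_RECOGNIZE pattern stays readable.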

So let’s explain the clauses in this simple example.

Define

I’ll need to define what a peak is. For that, I need to define two primary patterns. The value I’m looking for, which is the one you see on the graph, is the column “CASES”: the number of covid-19 cases for the day and country. How do you detect peaks visually? Like when hiking in mountains: it goes up and, when you continue, it goes down. Here are those two primary patterns:


 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))

“GoingUp” matches a row where “cases” value is higher than the preceding row and “GoingDown” matches a row where “cases” is lower than the preceding one. The sense of “preceding one”, of course, depends on an order, like with analytic functions. We will see it below.

Pattern

A peak is when a row matches GoingDown just after matching GoingUp. That’s simple, but you can imagine crazier things that a data scientist would want to recognize. MATCH_RECOGNIZE defines patterns in a similar way to regular expressions: mentioning the primary patterns in a sequence, with some modifiers. Mine is as simple as they come:


 10   pattern (GoingUp+ GoingDown+)

This means: one or more GoingUp followed by one or more GoingDown. This is exactly what I did in the graph above: ignore the intermediate points. Each primary pattern compares a row only with the preceding one, and the sequence of those row-by-row comparisons is what gets matched against the pattern.
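
As with regular expressions, the quantifiers can be tuned. For example, a sketch of a stricter definition that only reports a peak when it is followed by at least two consecutive declines, filtering out one-day dips:

 pattern (GoingUp+ GoingDown{2,})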

Partition by

As mentioned, I follow the rows in order. For a time series this is simple: the key here is the country, so I partition by continent and country, and the order (the x-axis) is the date. I’m looking at the peaks per country when the value (“cases”) is ordered by date (“daterep”):


  2  from covid
...
  4   partition by continentexp, countriesandterritories order by daterep
...
 15* where countriesandterritories='Switzerland';

I selected only my country here, with a standard where clause, to keep things simple.
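
As a side note, the filter could also be pushed into an inline view, so that MATCH_RECOGNIZE processes only the rows of one country (a sketch; the optimizer may well do this pushdown by itself since the predicate is on a partition key):

select "Peak date","Peak cases","match#"
from (select * from covid where countriesandterritories='Switzerland')
match_recognize (
 partition by continentexp, countriesandterritories order by daterep
 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases"
 one row per match
 pattern (GoingUp+ GoingDown+)
 define
  GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
  GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
);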

Measures

Each time a pattern is recognized, I want to display only one row (“ONE ROW PER MATCH”) with some measures for it. Of course, I must be able to access the point I’m interested in: its x-axis date and y-axis value. I can reference points within the matching window by using the pattern variables. The peak is the last row in the “GoingUp” primary pattern, so last(GoingUp.dateRep) and last(GoingUp.cases) are my points:


  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match

Those measures are accessible in the SELECT clause of my SQL statement. I added the match_number() to identify the points.
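
Nothing stops you from computing more measures on the same pattern variables. For example, a sketch adding the height of the climb into each peak (the difference between the peak and the first value recorded on the way up):

 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases",
  last(GoingUp.cases) - first(GoingUp.cases) as "Climb"
 one row per match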

Here is the final query, with the partition, measures, pattern and define clauses within the MATCH_RECOGNIZE():


select countriesandterritories "Country","Peak date","Peak cases","match#"
from covid
match_recognize (
 partition by continentexp, countriesandterritories order by daterep
 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases"
 one row per match
 pattern (GoingUp+ GoingDown+)
 define
  GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
  GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
)
where countriesandterritories='Switzerland';

The full syntax offers much more and, of course, it is all documented: https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8982

Debug mode

In order to understand how it works (and to debug), we can display “all rows” (ALL ROWS PER MATCH instead of ONE ROW PER MATCH in line 9) and add the raw row columns (DATEREP and CASES in line 1). In addition to the match_number() I have added the classifier() measure:


  1  select countriesandterritories "Country","Peak date","Peak cases","match#",daterep,cases,"classifier"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#", classifier() as "classifier",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   all rows per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15* where countriesandterritories='Switzerland';

“all rows per match” shows every row on which pattern matching is tested, and classifier() shows which primary pattern the row matched.

Here are the rows around the 10th match. You must keep in mind that rows are processed in order and for each row, it looks ahead to recognize a pattern.


       Country    Peak date    Peak cases    match#      DATEREP    CASES    classifier
______________ ____________ _____________ _________ ____________ ________ _____________
...
Switzerland    24-MAR-20             1044         9 24-MAR-20        1044 GOINGUP
Switzerland    24-MAR-20             1044         9 25-MAR-20         774 GOINGDOWN
Switzerland    26-MAR-20              925        10 26-MAR-20         925 GOINGUP
Switzerland    27-MAR-20             1000        10 27-MAR-20        1000 GOINGUP
Switzerland    28-MAR-20             1390        10 28-MAR-20        1390 GOINGUP
Switzerland    28-MAR-20             1390        10 29-MAR-20        1048 GOINGDOWN
Switzerland    30-MAR-20             1122        11 30-MAR-20        1122 GOINGUP
Switzerland    31-MAR-20             1138        11 31-MAR-20        1138 GOINGUP              
Switzerland    31-MAR-20             1138        11 01-APR-20         696 GOINGDOWN  
Switzerland    02-APR-20              962        12 02-APR-20         962 GOINGUP
Switzerland    03-APR-20             1124        12 03-APR-20        1124 GOINGUP
Switzerland    03-APR-20             1124        12 04-APR-20        1033 GOINGDOWN

You see here how we came to output the 10th match (28-MAR-20, 1390 cases). After the peak of 24-MAR-20 we were going down the next day, 25-MAR-20 (look at the graph); that row still belongs to the 9th match because of the regular expression “GoingDown+”. Then the count goes up from 26-MAR-20 to 28-MAR-20, which matches GoingUp+, followed by a “GoingDown” on 29-MAR-20, which means the 10th match has been recognized. It would continue through all further “GoingDown” rows, but there’s only one here, as the next value is higher (1122 > 1048), so the 10th match is closed on 29-MAR-20. This is where the ONE ROW PER MATCH row is returned, when processing the row from 29-MAR-20, with the values from the last row classified as GOINGUP, as defined in the measures: 28-MAR-20 and 1390. And then the pattern matching continues from the next row, where a new GoingUp is detected…
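
Where matching resumes after each match is governed by the AFTER MATCH SKIP clause. I didn’t specify one, so the default applies; written out explicitly, the relevant fragment would look like this:

 one row per match
 after match skip past last row
 pattern (GoingUp+ GoingDown+)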

If you want to go further, there are good examples from Lucas Jellema: https://technology.amis.nl/?s=match_recognize
And about its implementation in other SQL engines, read Markus Winand: https://modern-sql.com/feature/match_recognize

And I’ll probably have more blog posts here in this series about recent features interesting for BI and DWH…


Video : Podman : Generate and Play Kubernetes YAML Files

Today’s video demonstrates Podman’s ability to generate and play Kubernetes YAML files.

This is based on the following article.

You might want to check out these as well.

The star of today’s video is Max McDonald, son of Connor McDonald.

Cheers

Tim…



VirtualBox 6.1.10

VirtualBox 6.1.10 has been released.

The downloads and changelog are in the usual places.

I’ve installed it on Windows 10, macOS Catalina and Oracle Linux 7 hosts and there wasn’t any drama. My Vagrant boxes seem to be running fine.

Cheers

Tim…



What is a serverless database?

By Franck Pachot

.
After reading the https://cloudwars.co/oracle/oracle-deal-8×8-larry-ellison-picks-amazons-pocket-again/ article, I am writing down some thoughts about how a database can be serverless and elastic. Of course, a database needs a server to process its data: serverless doesn’t mean that there are no servers.

Serverless as not waiting for server provisioning

The first idea behind “serverless” is about provisioning. In the past, when a developer required a new database to start a new project, she had to wait for a server to be installed. In 1996 my first development on Oracle Database started like this: we asked Sun for a server and OS, and asked Oracle for the database software, all for free for a few months, in order to start our prototype. Today this would be a Cloud Free Tier access. At that time we had to wait to receive, unbox, and install all this. I learned a lot there about installing an OS, configuring the network, setting up disk mirroring… This was an awesome experience for a junior starting in IT. Interestingly, I think that today a junior can learn the same concepts with a Cloud Foundation training and certification. Not much has really changed except the unboxing and cabling. The big difference is that today we do not have to wait weeks for it and can set up the same infrastructure in 10 minutes.

That was my first DevOps experience: we wanted to develop our application without waiting for the IT department. But it was not serverless at all.

A few years later I was starting a new datawarehouse for a mobile telco in Africa. Again, weeks to months were required to order and install a server for it. And we didn’t wait: we started the first version of the datawarehouse on a spare PC we had. This was maybe my first serverless experience: the server provisioning was taken out of the critical path of the project planning. Of course, a PC is not a server, and reliability and performance were not there. But we were lucky, and when the server arrived we already had good feedback from this first version.

We need serverless, but we need real servers behind it. Today, this is possible: you don’t need to wait, and you can provision a new database in the public or private cloud, or simply in a VM, without waiting. And all the security, reliability and performance are there. With Oracle, it is a bit more difficult if you can’t do it in their public cloud, because licensing does not count vCPUs and you often need specific hardware for it, like in the old days. Appliances like ODA can help. Public Cloud or Cloud@Customer definitely helps.

Serverless as not taking responsibility for server administration

Serverless is not only about running on virtual servers with easy provisioning. If you are serverless, you don’t want to manage those virtual machines. You start and connect to a compute instance. You define its shape (CPU, RAM) but you don’t want to know where it runs physically. Of course, you want to define the region for legal, performance or cost reasons, but not which data center, which rack,… That’s the second step of serverless: you don’t manage the physical servers. In the Oracle Cloud, you run a Compute Instance where you can install a database; in AWS, the equivalent is an EC2 instance.

But even if you don’t own the responsibility for the servers, this is not yet “serverless”, because you still pay for them. If your CFO still sees a bill for a compute instance, you are not serverless.

Serverless as not paying for the server

AWS has a true serverless and elastic database offer: Amazon Aurora Serverless. You don’t have to start or stop the servers: capacity comes up automatically when you connect, more activity adds more servers, and with no connections it stops. You don’t pay for database servers sitting there running; you pay only for what the application is actually using.

Azure has also a Serverless SQL Server: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-serverless

Those are, as far as I know, the only true serverless databases so far. If we need to stop and start the compute services ourselves, even with some level of auto-scaling, we can call that on-demand, but not serverless.

All AWS RDS services including Aurora can be started and stopped on demand. They can scale up or down with minimal downtime, especially in Multi-AZ because the standby can be scaled and activated. Redshift cannot be stopped because it uses local storage. But you can take a snapshot and terminate the instance, and restore it later.

On the Oracle side, the Autonomous Database can be stopped and started. Then again, we can say that we don’t pay when we don’t use the database, but we cannot say that we don’t pay when we don’t use the application, because the database is up even if the application is not used. However, you can scale without the need to stop and start. And there’s also some level of auto-scaling where additional application usage is really billed on CPU usage metrics: you pay for n OCPUs when the instance is up, you can use up to n*3 sessions on CPU, and what goes above the provisioned OCPUs is billed in a truly serverless fashion. Maybe the future will go further. The technology allows it: multitenant allows PDB-level CPU caging where the capacity can be changed online (by setting CPU_COUNT), and AWR gathers the CPU load with many metrics that could be used for billing.
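
As a sketch of that building block (assuming a multitenant setup; MYPDB is a made-up name), the PDB-level CPU cage is just an online parameter change:

-- connect to the PDB and cage it to 2 CPUs, online
alter session set container = MYPDB;
alter system set cpu_count = 2;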

Serverless

The name is funny because serverless programs run on servers. And the rush to run without servers is paradoxical. When I started programming, it was on very small computers (ZX-81, Apple //e, IBM PC-XT) and I was really proud when I started to do real stuff running on real servers, with a schema on *the* company database. Actually, what is called serverless today shows, in my opinion, the full power of servers: you don’t need to buy a computer for a project, you just use some pooled compute power.

The cloud wars use strange marketing terms, but really good technology and concepts are coming.


Analytic cost error

Here’s a surprising costing error that was raised on the Oracle Developer Forum a few days ago. There’s a glitch in the cost attributed to sorting when an analytic over() clause – with its corresponding “window sort” operation – makes a “sort order by” operation redundant. Here’s a script to generate the data set I’ll use for a demonstration, together with a template for a few queries I’ll be running against the data.


rem
rem     Script:         window_sort_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                    id,
        cast(lpad(rownum,30,'0') as varchar2(30)) vc30,
        cast(lpad(rownum,65,'0') as varchar2(65)) vc65,
        lpad('x',100,'x')                         padding
from
        generator
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

select
--      row_number() over (order by vc65) rn1,
--      row_number() over (order by vc30) rn2,
--      vc30,
--      vc65,
        id
from
        t1
-- order by
--      vc65
--      vc30
/


I’m (optionally) using the row_number() analytic function over the entire data set and, for each row_number() I include in the select list, Oracle will have to sort the data; I’ve also got an (optional) order by on the two columns that appear in the row_number() functions, and that may introduce some sorting as well. Here are a few examples of the queries I might run:


prompt  ===========================================
prompt  Select vc30, order by vc30
prompt  ===========================================

explain plan for
select 
        vc30,
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ==========================================
prompt  Select row_number over vc30 - no ordering
prompt  ==========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ===========================================
prompt  Select row_number over vc30 - order by vc65
prompt  ===========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

In the first query we select and sort vc30 so (approximately) we sort 10,000 rows x 30(-ish) bytes for 300K of sort space. In the second query we generate the row_number() based on sorting vc30 – the size of the output is much smaller (it’s only 10,000 numbers between 1 and 10,000) but to generate those numbers we do have to select and sort vc30, so the workload (predicted and actual) will probably be similar to that of the first query. In the final query we have to select and sort vc30 to generate the row_number() but we also have to select (without reporting) and sort vc65 in order to report the results in the right order – so we should expect the workload to be roughly 3 times the size (approximately 10,000 * (30 + 65) bytes). Here, from 12.2.0.1, are the execution plans (with a little cosmetic tidying):

===========================================
Select vc30, order by vc30
===========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  SORT ORDER BY     |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

==========================================
Select row_number over vc30 - no ordering
==========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

===========================================
Select row_number over vc30 - order by vc65
===========================================
-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  SORT ORDER BY      |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

As expected, the execution plans are roughly consistent with the estimates I gave for volume of data – the agreement between the query with order by vc30 and the query with over(order by vc30), and the increased load of ordering by vc65 when selecting the row_number(over vc30) is good.

So let’s see what the plan looks like when we select row_number(over vc30) and then order by vc30. If the optimizer is smart it will recognise that it’s possible to adopt a strategy that allows it to take advantage of the sorting from the over() clause to avoid a separate sort order by:


explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|    42   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|    42   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|    42   (5)|
----------------------------------------------------------------

The plan shows us that Oracle has used the 10gR2 “order by elimination” feature to bypass the need for a “sort order by” operation because it knows the data will be arriving in the right order from the “Window Sort” operation. Unfortunately it also shows us that Oracle has lost the cost of doing the Window Sort!

Let’s try another experiment – let’s generate two different row_number() columns, with and without ordering:


prompt  =====================================
prompt  Select Both row_numbers - no ordering
prompt  =====================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ======================================
prompt  Select Both row_numbers order by vc30
prompt  ======================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ============================================
prompt  Select Both row_numbers order by vc65
prompt  ============================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

You’ll notice that I’ve included a request for the projection information in the plans for these examples so that you can see what columns are passed up from each operation to its parent. Again, though, we’ll start by focusing on just the costs:


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC65"[VARCHAR2,65], "VC30"[VARCHAR2,30],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC30")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC65")[22]
   2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22],
       "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The first query – without an “order by” clause – reports a cost of 463; add an order by clause and the cost drops to 253 (and the “order by” clause doesn’t appear as a sort order by operation in the plan). The cost differential between the ordered and “unordered” plans, by the way, is 210 (and from there down to the base tablescan is another 211) – and here’s another way to see that number (+/- 1) appearing:


explain plan for
select
        vc65,
        id
from
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   986K|       |   253   (3)|
|   1 |  SORT ORDER BY     |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
------------------------------------------------------------------------

The cost of the query with no “order by” clause is basically the cost of a table scan plus two sorts of (vc65, vc30, plus a few bits). When you add in an “order by” clause the optimizer discards the “order by” clause and then subtracts one of the sort costs as well.
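
Putting the arithmetic in one place (rounded, in optimizer cost units, using the plan costs above):

  tablescan                                42
  + window sort for over(order by vc65)  ~210
  + window sort for over(order by vc30)  ~211
  = cost with no "order by"               463

  463 - ~210 (one sort cost subtracted)  ~253 = cost with "order by"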

CBO trace file

Every time I say something about 10053 (CBO) trace files I feel compelled to remind everyone that I rarely look at them, and then it’s usually because I think I know there’s a bug and where I’ll find it in the trace. That’s exactly the case here.

I’m expecting to see two differences in the trace files between the “no order” query and a query where I’ve added in an “order by” clause. One difference is that one trace file will have an “OBYE” (eliminate order by) comment which won’t be in the other trace; the second is that one trace file will have a “cost for SORT” calculation which won’t be in the other.

So here are the relevant bits – first from the query without the order by clause:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE bypassed: no order by to eliminate.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 463.384707  card: 10000.000000  bytes: 1010000.000000
***********************

And from one of the plans with an order by:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE performed.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 252.512458  card: 10000.000000  bytes: 1010000.000000
***********************

As you can see, the first (no order) trace file has two sort calculations under WiF sort (Window Function?), while the second (order by) trace file reports “OBYE performed” and then loses one of its WiF sorts.

Note: If I had ordered by ID I would have seen two calculations of cost under the heading of WiF sort and a third calculation below that with the heading ORDER BY sort. Unfortunately when I ran the test to check this I also found that the OBYE report said: “OBYE performed” when it wasn’t relevant and there wasn’t an “order by” available for elimination.

Interestingly, I tested to see if I could change the optimizer’s behaviour by adding the no_eliminate_oby(@sel$1) hint to the “order by” queries, but the GENERAL PLANS section didn’t change, even though the trace file reported “OBYE: OBYE bypassed: hinted”, and the “Hint Report” from the Oracle 19c execution plan acknowledged the hint as legal and used.
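
For reference, the hinted version of the test looked like this (a sketch; sel$1 is the default query block name for this statement):

explain plan for
select  /*+ no_eliminate_oby(@sel$1) */
        row_number() over (order by vc30) rn2,
        id
from
        t1
order by
        vc30
/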

Summary

If you have an “order by” clause in a query block that includes analytic functions, and the optimizer decides that it can eliminate the “order by” and rely on the side effect of an analytic over() clause, you may find that the cost of the query block is reduced by the cost of one of the Window Sort operations. (Technically this might lead to cases where the optimizer then makes some poor choices about the overall shape of the execution plan – though such cases might be very rare given that this costing error doesn’t affect the cardinality estimates.)

Lagniappe

In the last set of tests I added in the projection information as a simple example of a case where it can help you understand a little more of what the plan is supposed to achieve. If you examine the last two query plans carefully (select both row_number() values, order by vc30 / vc65 respectively), Operation 2 of the first plan reports:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

while operation 2 of the second plan reports:

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

It’s not until you look at these two lines that the plans show any differences – operations 1 and 2 simply say “Window Sort” without giving any clue about which window sort is for which over() clause. The projection information, though, tells you which way around the over() clauses operated – in the first query the over(order by vc65) is applied to the result of the tablescan first, while in the second query it’s the over(order by vc30) that is applied first.

Lagniappe 2

There’s another little oddity you might spot when you look at the projection information and think about the WiF sort costs from the unordered query. The columns passed from operation 3 to operation 2 are:

 3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The columns passed from operation 2 to operation 1 are one of:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

Operation 2 sorts the data from operation 3, and operation 1 sorts the data from operation 2 – but the columns arriving from operation 2 have an extra row_number()[22] added to them. So when you look in the trace file at the two WiF Sort calculations, why do they both show:

Blocks to Sort: 150 Row size: 122 Total Rows: 10000

Shouldn’t one of them show a Row size that’s (at least) 22 longer than the other?


The lunchtime nuisance…

We’ve all been there… You’re trying to save some changes to the database, and no matter what you try, your application just hangs. After some to-and-fro with the DBA (or perhaps you are the DBA), you stumble across that uncommitted transaction that “Joe from marketing” commenced at 11:55am. You pop around to Joe’s desk, and there it is on screen… and Joe has popped out to lunch for an hour… grrrrr!
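
(For the hunting part, a minimal sketch of the kind of query a DBA might use, assuming access to v$session:)

-- sessions that are currently blocking someone, and how long they have been idle
select sid, serial#, username, status, last_call_et idle_seconds
from   v$session
where  sid in (select blocking_session from v$session where blocking_session is not null);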

[Screenshot: the uncommitted transaction sitting open on Joe's screen]

Of course, you don’t really see this on their screen, because their screen is locked, and unless they are one of those people who like to attach their Windows password on a post-it note next to their monitor, you are out of luck.

(Apologies to anyone really called Joe who works in marketing for the clash of fiction with reality.)

Luckily, more modern applications are typically stateless, and thus the days of transactions being started and held directly from the customer’s PC are largely a thing of the past. However, even then, the problem can simply shift to the application server, where a browser session has long since crashed or been shut down while the accompanying application server process happily sits there forever with an open transaction, waiting for a customer response that never comes.

Help is at hand with a nice new parameter in 20c. If a session is idle, but it is blocking others, the database can now turf that session (“turf” is an Aussie colloquialism for “kill”) so that others can continue their database work unimpeded.

Here’s an example of that in action. I’ll set the idle block limit to 1 minute.


SQL> alter system set max_idle_blocker_time = 1;

System altered.
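
The value is specified in minutes (the default of 0 means no limit), and you can confirm the setting with a quick check:

SQL> show parameter max_idle_blocker_time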

Now session 1 will be our “lunchtime nuisance”.


SQL> delete from t;

1 row deleted.

Session 2 would normally wait forever to get access to this row, but after waiting for a little while … look what happens:


SQL> select * from t for update ;

         X
----------
         1

Elapsed: 00:00:57.42

When I go back to session 1, you can see that the database has given him his marching orders!


SQL> select * from t;
select * from t
              *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 3638
Session ID: 65 Serial number: 16907

Take that, Joe!