Oakies Blog Aggregator

Linux Scripting, Part III

In the previous blog posts, we learned how to set up the first part of a standard shell script: how to set variables interactively, including how to pass them in as part of the script execution. In this next step, we'll use those variables to build out Azure resources. If you're working on-premises, you can use this type of scripting with SQL Server 2019 on Linux, but you will need to use CLI commands and SQLCMD. I will cover that in later posts; honestly, though, the cloud makes deployment quicker for any business, and with the amount of revenue riding on getting to market faster, it should be the first choice of any DBA with vision.

When I started at Microsoft close to a year ago, along with the subsequent journey to Azure, it was challenging to locate the Azure commands for BASH (which is still incorrectly referred to as "the CLI"). I could locate the PowerShell commands immediately, but as Microsoft matures its foothold in Linux, it is realizing what those of us who were adept in other shell languages already know: there's one shell that's winning the war, and that's BASH. We can love the shell we started with, the one that has served us well, but BASH has been around so long and is so robust that, instead of re-inventing the wheel unnecessarily, we can use what is already out there. I now locate the AZ commands to be used with BASH faster than the PowerShell-specific commands. I am glad Microsoft will continue to support the PowerShell Azure commands, but I highly recommend learning the standard AZ commands, using them in PowerShell, and learning BASH for the future.

OK, off my soapbox and let's get to the learning…

DBMS_JOB – the joy of transactions

This is a follow-up to yesterday's post on DBMS_JOB and is critical if you're upgrading to 19c soon. Mike Dietrich wrote a nice piece last week on the "under the covers" migration of the now deprecated DBMS_JOB package to the new Scheduler architecture. You should check it out before reading on here.

Mike's post concerned mainly what would happen during the upgrade (spoiler: the DBMS_JOB jobs become scheduler jobs but you can maintain them using the old API without drama), but immediately Twitter was abuzz with a couple of concerns that I wanted to address:

1) What about new jobs submitted via the old API after the upgrade?

For comparison, here’s what you see currently in 18c – DBMS_JOB is quite separate from the scheduler.


SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate,'sysdate+1');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- --------------------------------------------------
       181 begin dbms_session.sleep(60); end;

SQL> select job_name, job_action from user_scheduler_jobs;

no rows selected


Now here’s the same in 19c


SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate,'sysdate+1');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- --------------------------------------------------
        22 begin dbms_session.sleep(60); end;

SQL> select job_name, job_action from user_scheduler_jobs;

JOB_NAME             JOB_ACTION
-------------------- ------------------------------------------------------------
DBMS_JOB$_22         begin dbms_session.sleep(60); end;

SQL> select * from user_scheduler_jobs
  2  @pr
==============================
JOB_NAME                      : DBMS_JOB$_22
JOB_SUBNAME                   :
JOB_STYLE                     : REGULAR
JOB_CREATOR                   : MCDONAC
CLIENT_ID                     :
GLOBAL_UID                    :
PROGRAM_OWNER                 :
PROGRAM_NAME                  :
JOB_TYPE                      : PLSQL_BLOCK
JOB_ACTION                    : begin dbms_session.sleep(60); end;
NUMBER_OF_ARGUMENTS           : 0
SCHEDULE_OWNER                :
SCHEDULE_NAME                 :
SCHEDULE_TYPE                 : PLSQL
START_DATE                    : 26-MAY-19 07.12.47.000000 PM +08:00
REPEAT_INTERVAL               : sysdate+1
EVENT_QUEUE_OWNER             :
EVENT_QUEUE_NAME              :
EVENT_QUEUE_AGENT             :
EVENT_CONDITION               :
EVENT_RULE                    :
FILE_WATCHER_OWNER            :
FILE_WATCHER_NAME             :
END_DATE                      :
JOB_CLASS                     : DEFAULT_JOB_CLASS
ENABLED                       : TRUE
AUTO_DROP                     : TRUE
RESTART_ON_RECOVERY           : FALSE
RESTART_ON_FAILURE            : FALSE
STATE                         : SCHEDULED
JOB_PRIORITY                  : 3
RUN_COUNT                     : 0
UPTIME_RUN_COUNT              :
MAX_RUNS                      :
FAILURE_COUNT                 : 0
UPTIME_FAILURE_COUNT          :
MAX_FAILURES                  :
RETRY_COUNT                   : 0
LAST_START_DATE               :
LAST_RUN_DURATION             :
NEXT_RUN_DATE                 : 27-MAY-19 07.12.47.000000 PM -07:00
SCHEDULE_LIMIT                :
MAX_RUN_DURATION              :
LOGGING_LEVEL                 : OFF
STORE_OUTPUT                  : TRUE
STOP_ON_WINDOW_CLOSE          : FALSE
INSTANCE_STICKINESS           : TRUE
RAISE_EVENTS                  :
SYSTEM                        : FALSE
JOB_WEIGHT                    : 1
NLS_ENV                       : NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA'...
SOURCE                        :
NUMBER_OF_DESTINATIONS        : 1
DESTINATION_OWNER             :
DESTINATION                   :
CREDENTIAL_OWNER              :
CREDENTIAL_NAME               :
INSTANCE_ID                   :
DEFERRED_DROP                 : FALSE
ALLOW_RUNS_IN_RESTRICTED_MODE : FALSE
COMMENTS                      :
FLAGS                         : 9007199254872624
RESTARTABLE                   : FALSE
HAS_CONSTRAINTS               : FALSE
CONNECT_CREDENTIAL_OWNER      :
CONNECT_CREDENTIAL_NAME       :
FAIL_ON_SCRIPT_ERROR          : FALSE

PL/SQL procedure successfully completed.

You can see that it will be enabled by default and is classed as a regular job. Even if you submit a one-off job, it will still be classed as regular not lightweight.


SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate);
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- --------------------------------------------------
        25 begin dbms_session.sleep(60); end;

SQL> select * from user_scheduler_jobs
  2  @pr
==============================
JOB_NAME                      : DBMS_JOB$_25
JOB_SUBNAME                   :
JOB_STYLE                     : REGULAR
JOB_CREATOR                   : MCDONAC
CLIENT_ID                     :
GLOBAL_UID                    :
PROGRAM_OWNER                 :
PROGRAM_NAME                  :
JOB_TYPE                      : PLSQL_BLOCK
JOB_ACTION                    : begin dbms_session.sleep(60); end;
NUMBER_OF_ARGUMENTS           : 0
SCHEDULE_OWNER                :
SCHEDULE_NAME                 :
SCHEDULE_TYPE                 : ONCE
START_DATE                    : 26-MAY-19 08.37.09.000000 PM +08:00
REPEAT_INTERVAL               :
EVENT_QUEUE_OWNER             :
EVENT_QUEUE_NAME              :
EVENT_QUEUE_AGENT             :
EVENT_CONDITION               :
EVENT_RULE                    :
FILE_WATCHER_OWNER            :
FILE_WATCHER_NAME             :
END_DATE                      :
JOB_CLASS                     : DEFAULT_JOB_CLASS
ENABLED                       : TRUE
AUTO_DROP                     : TRUE
RESTART_ON_RECOVERY           : FALSE
RESTART_ON_FAILURE            : FALSE
STATE                         : SCHEDULED
JOB_PRIORITY                  : 3
RUN_COUNT                     : 0
UPTIME_RUN_COUNT              :
MAX_RUNS                      :
FAILURE_COUNT                 : 0
UPTIME_FAILURE_COUNT          :
MAX_FAILURES                  :
RETRY_COUNT                   : 0
LAST_START_DATE               :
LAST_RUN_DURATION             :
NEXT_RUN_DATE                 : 27-MAY-19 11.37.09.268652 AM +08:00
SCHEDULE_LIMIT                :
MAX_RUN_DURATION              :
LOGGING_LEVEL                 : OFF
STORE_OUTPUT                  : TRUE
STOP_ON_WINDOW_CLOSE          : FALSE
INSTANCE_STICKINESS           : TRUE
RAISE_EVENTS                  :
SYSTEM                        : FALSE
JOB_WEIGHT                    : 1
NLS_ENV                       : NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA' NLS_CURRENC
NLS_NUMERIC_CHARACTERS='.,' NLS_CALENDAR='GREGORIAN' NLS_DATE_FORMAT='DD-MON-RR' NLS_DATE_L
NLS_TIME_FORMAT='HH.MI.SSXFF AM' NLS_TIMESTAMP_FORMAT='DD-MON-RR HH.MI.SSXFF AM' NLS_TIME_T
NLS_TIMESTAMP_TZ_FORMAT='DD-MON-RR HH.MI.SSXFF AM TZR' NLS_DUAL_CURRENCY='$' NLS_COMP='BINA
NLS_NCHAR_CONV_EXCP='FALSE'
SOURCE                        :
NUMBER_OF_DESTINATIONS        : 1
DESTINATION_OWNER             :
DESTINATION                   :
CREDENTIAL_OWNER              :
CREDENTIAL_NAME               :
INSTANCE_ID                   :
DEFERRED_DROP                 : FALSE
ALLOW_RUNS_IN_RESTRICTED_MODE : FALSE
COMMENTS                      :
FLAGS                         : 9007199254880304
RESTARTABLE                   : FALSE
HAS_CONSTRAINTS               : FALSE
CONNECT_CREDENTIAL_OWNER      :
CONNECT_CREDENTIAL_NAME       :
FAIL_ON_SCRIPT_ERROR          : FALSE

PL/SQL procedure successfully completed.

Important Note: There is one critical thing you need to be aware of with this change. DBMS_JOB is an "old-school" public API, so pretty much anyone and everyone had access to it. Anyone familiar with DBMS_SCHEDULER will know that the components within it are true database objects, which can be protected with privileges. So when you upgrade to 19c, to ensure that you do not get any nasty surprises, users of DBMS_JOB will need the CREATE JOB privilege, otherwise their previous ability to submit jobs will disappear. For example:


SQL> conn scott/tiger@db192_pdb1
Connected.

SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate);
  5  end;
  6  /
declare
*
ERROR at line 1:
ORA-27486: insufficient privileges
ORA-06512: at "SYS.DBMS_ISCHED", line 9387
ORA-06512: at "SYS.DBMS_ISCHED", line 9376
ORA-06512: at "SYS.DBMS_ISCHED", line 175
ORA-06512: at "SYS.DBMS_ISCHED", line 9302
ORA-06512: at "SYS.DBMS_IJOB", line 196
ORA-06512: at "SYS.DBMS_JOB", line 168
ORA-06512: at line 4
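A minimal sketch of the fix (assuming the SCOTT account from the example above, and a suitably privileged session for the grant) is simply to give the user the missing privilege and resubmit:

-- run as a suitably privileged user (sketch)
grant create job to scott;

-- then, reconnected as SCOTT, the old API works again
declare
  j int;
begin
  dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate);
end;
/
commit;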

2) What about the cherished transactional nature of DBMS_JOB?

If the old style jobs are now scheduler jobs, do we lose the transactional element of DBMS_JOB? Nope. Even though we will create a paired scheduler entry, DBMS_JOB is still transactional (which I love!).


SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'begin dbms_session.sleep(60); end;',sysdate,'sysdate+1');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- --------------------------------------------------
        21 begin dbms_session.sleep(60); end;

SQL> select job_name, job_action from user_scheduler_jobs;

JOB_NAME             JOB_ACTION
-------------------- ------------------------------------------------------------
DBMS_JOB$_21         begin dbms_session.sleep(60); end;

SQL> roll;
Rollback complete.
SQL> select job, what from user_jobs;

no rows selected

SQL> select job_name, job_action from user_scheduler_jobs;

no rows selected

SQL>

If you love that transactional capability too, then make sure to visit the database ideas page to vote up this idea. I have no issue with DBMS_SCHEDULER doing commits by default, but it would be cool if (say) for lightweight jobs we had an option to choose whether we commit or not.
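For contrast, here is a minimal sketch (the job name is just a placeholder) of the non-transactional behaviour of the scheduler API: the job survives a rollback because DBMS_SCHEDULER.CREATE_JOB commits as part of its work.

begin
  dbms_scheduler.create_job(
    job_name        => 'DEMO_JOB',      -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin dbms_session.sleep(60); end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY',
    enabled         => true);
end;
/

rollback;

select job_name from user_scheduler_jobs;   -- DEMO_JOB is still there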

Re-partitioning 2

Last week I wrote a note about turning a range-partitioned table into a range/list composite partitioned table using features included in 12.2 of Oracle. But my example was really just an outline of the method and bypassed a number of the little extra problems you’re likely to see in a real-world system, so in this note I’m going to bring in an issue that you might run into – and which I’ve seen appearing a number of times: ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION.

It's often the case that a system has a partitioned table that's been around for a long time, and over its lifetime it may have had (real or virtual) columns added, made invisible, dropped, or marked unused. As a result you may find that the apparent definition of the table is not the same as the real definition of the table – and that's why Oracle has given us (in 12c) the option to "create table for exchange".

You might like to read a MoS note giving you one example of a problem with creating an exchange table prior to this new feature: ORA-14097 At Exchange Partition After Adding Column With Default Value (Doc ID 1334763.1). I've created a little model by cloning the code from that note.


rem
rem     Script:         pt_exchange_problem.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table mtab (pcol number)
partition by list (pcol) (
        partition p1 values (1),
        partition p2 values (2)
);

alter table mtab add col2 number default 0 not null;

prompt  ========================================
prompt  Traditional creation method => ORA-14097
prompt  ========================================

create table mtab_p2 as select * from mtab where 1=0;
alter table mtab exchange partition P2 with table mtab_p2;

prompt  ===================
prompt  Create for exchange
prompt  ===================

drop table mtab_p2 purge;
create table mtab_p2 for exchange with table mtab;
alter table mtab exchange partition P2 with table mtab_p2;



Here's the output from running this on an instance of 18.3


Table created.

Table altered.

========================================
Traditional creation method => ORA-14097
========================================

Table created.

alter table mtab exchange partition P2 with table mtab_p2
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

===================
Create for exchange
===================

Table dropped.


Table created.


Table altered.

So we don't have to worry about problems creating an exchange table in Oracle 12c or later. But we do still have a problem if we're trying to convert our range-partitioned table into a range/list composite partitioned table using the "double-exchange" method. In my simple example I used a "create table" statement to create an empty table that we could exchange into; but without another special version of a "create table" command I won't be able to create a composite partitioned table that is compatible with the simple table that I want to use as my intermediate table.

Here’s the solution to that problem – first in a thumbnail sketch:

  • create a table for exchange (call it table C)
  • alter table C modify to change it to a composite partitioned table with one subpartition per partition
  • create a table for exchange (call it table E)
  • Use table E to exchange partitions from the original table to the (now-partitioned) table C
  • Split each partition of table C into the specific subpartitions required

And now some code to work through the details – first the code to create and populate the partitioned table.


rem
rem     Script:         pt_comp_from_pt_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

drop table t purge;
drop table pt_range purge;
drop table pt_range_list purge;

-- @@setup

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
	rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

Then some code to create the beginnings of the target composite partitioned table. We create a simple heap table "for exchange", then modify it to be a composite partitioned table with a named starting partition (and its high_value), a template defining a single subpartition and, as a variant on the example from last week, interval partitioning.


prompt	==========================================
prompt	First nice feature - "create for exchange"
prompt	==========================================

create table pt_range_list for exchange with table pt_range;

prompt	============================================
prompt	Now alter the table to composite partitioned
prompt	============================================

alter table pt_range_list modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_def      values(default)
)
(
	partition p200 values less than (200)
)
;

If you want to do the conversion from range partitioning to interval partitioning you will have to check very carefully that your original table will be able to convert safely – which means you'll need to check that the "high_value" values for the partitions are properly spaced to match the interval you've defined and (as a special requirement for the conversion) that there are no omissions from the current list of high values. If your original table doesn't match these requirements exactly you may end up trying to exchange data into a partition where it doesn't belong; for example, if my original table had partitions with high values of 200, 600 and 800 then there may be values in the 200-399 range currently stored in the original "600" range partition which shouldn't go into the new "600" interval partition. You may find you have to split (and/or merge) a few partitions in your range-partitioned table before you can do the main conversion.
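As a quick sanity check before attempting the conversion (a sketch, not part of the original script), you could list the current partition boundaries:

select  partition_name, high_value
from    user_tab_partitions
where   table_name = 'PT_RANGE'
order by
        partition_position
;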

Now we create the table that we'll actually use for the exchange and go through each exchange in turn. Because I've got an explicitly named starting partition the first exchange takes only two steps – exchange out, exchange in. But because I'm using interval partitioning in the composite partitioned table I'm doing a "lock partition" before the second exchange on all the other partitions as this will bring the required target partition into existence. I'm also using the "[sub]partition for()" syntax to identify the pairs of [sub]partitions – this isn't necessary for the original range-partitioned table, of course, but it's the only way I can identify the generated subpartitions that will appear in the composite partitioned table.


create table t for exchange with table pt_range;

prompt	=======================================================================
prompt	Double exchange to move a partition to become a composite subpartition
prompt	Could drive this programatically by picking one row from each partition
prompt	=======================================================================

alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition for (399) with table t;
lock  table pt_range_list partition for (399) in exclusive mode;
alter table pt_range_list exchange subpartition for (399,'0') with table t;

alter table pt_range exchange partition for (599) with table t;
lock  table pt_range_list partition for (599) in exclusive mode;
alter table pt_range_list exchange subpartition for (599,'0') with table t;

prompt	=====================================
prompt	Show that we've got the data in place
prompt	=====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

Now that the data is in the target table we can split each default subpartition into the four subpartitions that we want for each partition. To cater for the future, though, I've first modified the subpartition template so that each new partition will have four subpartitions (though the naming convention won't be applied, of course; Oracle will generate system names for all new partitions and subpartitions).


prompt  ================================================
prompt  Change the subpartition template to what we want
prompt  ================================================

alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  ====================================================
prompt  Second nice feature - multiple splits in one command
prompt  Again, first split is fixed name.
prompt  We could do this online after allowing the users in
prompt  ====================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition for (399,'0')
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition for (599,'0')
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Finally a little demonstration that we can’t add an explicitly named partition to the interval partitioned table; then we insert a row to generate the partition and show that it has 4 subpartitions.

Finishing off we rename everything (though that’s a fairly pointless exercise).


prompt  ==============================================================
prompt  Could try adding a partition to show it uses the new template
prompt  But that's not allowed for interval partitions: "ORA-14760:"
prompt  ADD PARTITION is not permitted on Interval partitioned objects
prompt  So insert a value that would go into the next (800) partition
prompt  ==============================================================

alter table pt_range_list add partition p800 values less than (800);

insert into pt_range_list (
        id, grp, small_vc, padding
)
values ( 
        799, '0', lpad(799,10,'0'), rpad('x',100,'x')
)
;

commit;

prompt  ===================================================
prompt  Template naming is not used for the subpartitions,
prompt  so we have to use the "subpartition for()" strategy 
prompt  ===================================================

alter table pt_range_list rename subpartition for (799,'0') to p800_p_0;
alter table pt_range_list rename subpartition for (799,'1') to p800_p_1;
alter table pt_range_list rename subpartition for (799,'2') to p800_p_2;
alter table pt_range_list rename subpartition for (799,'3') to p800_p_def;

prompt  ==============================================
prompt  Might as well clean up the partition names too
prompt  ==============================================

alter table pt_range_list rename partition for (399) to p400;
alter table pt_range_list rename partition for (599) to p600;
alter table pt_range_list rename partition for (799) to p800;

prompt  =======================================
prompt  Finish off by listing the subpartitions 
prompt  =======================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

It's worth pointing out that you could do the exchanges (and the splitting and renaming at the same time) through some sort of simple PL/SQL loop – looping through the named partitions in the original table and using a row from the first exchange to drive the lock and second exchange (and splitting and renaming). For example, something like the following, which doesn't have any of the error-trapping and defensive mechanisms you'd want to use on a production system:



declare
        m_pt_val number;
begin
        for r in (select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position) 
        loop
                execute immediate
                        'alter table pt_range exchange partition ' || r.partition_name ||
                        ' with table t';
        
                select id into m_pt_val from t where rownum = 1;
        
                execute immediate 
                        'lock table pt_range_list partition for (' || m_pt_val || ') in exclusive mode';
        
                execute immediate
                        'alter table pt_range_list exchange subpartition  for (' || m_pt_val || ',0)' ||
                        ' with table t';
        
        end loop;
end;
/

If you do go for a programmed loop you have to be really careful to consider what could go wrong at each step of the loop and how your program is going to report (and possibly attempt to recover) the situation. This is definitely a case where you don’t want code with “when others then null” appearing anywhere, and don’t be tempted to include code to truncate the exchange table.

 

PostgreSQL: measuring query activity (WAL size generated, shared buffer reads, filesystem reads,…)

When I want to know if my application scales, I need to understand the work done by my queries. There's no need to run a huge amount of data through many concurrent threads: if I can get the relevant statistics behind a single unit test, then I can infer how it will scale. For example, reading millions of pages to fetch a few rows will cause shared buffer contention. Or generating dozens of megabytes of WAL for a small update will wait on disk, penalize the backup RTO, and widen the replication gap.

I'll show some examples. From psql I'll collect the statistics (which are cumulative since the start of the instance) before:

select *,pg_current_wal_lsn() from pg_stat_database where datname=current_database() \gset

and calculate the difference to show the delta:

select  blks_hit-:blks_hit "blk hit",
        blks_read-:blks_read "blk read",
        tup_inserted-:tup_inserted "ins",
        tup_updated-:tup_updated "upd",
        tup_deleted-:tup_deleted "del",
        tup_returned-:tup_returned "tup ret",
        tup_fetched-:tup_fetched "tup fch",
        xact_commit-:xact_commit "commit",
        xact_rollback-:xact_rollback "rbk",
        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),:'pg_current_wal_lsn')) "WAL",
        pg_size_pretty(temp_bytes-:temp_bytes) "temp"
from    pg_stat_database where datname=current_database();

Most of the statistics come from pg_stat_database. The WAL size is calculated from the latest WAL write pointer exposed with pg_current_wal_lsn(), with the size computed by pg_wal_lsn_diff(). I use \gset to capture them as psql substitution variables before the test, and use them to compute the difference after.
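Putting it together, the workflow is simply: snapshot, run the statement under test, report the deltas. A minimal sketch (the middle statement is just a placeholder):

-- snapshot the cumulative statistics into psql variables
select *,pg_current_wal_lsn() from pg_stat_database where datname=current_database() \gset

-- run the statement you want to measure here (placeholder)
select 1;

-- report the deltas since the snapshot (the WAL size is usually the most interesting)
select blks_hit-:blks_hit "blk hit", blks_read-:blks_read "blk read",
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),:'pg_current_wal_lsn')) "WAL"
from   pg_stat_database where datname=current_database();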

Better with examples, I create a DEMO database and a DEMO table:

demo=# create table DEMO(n int primary key,flag char,text varchar(1000));
CREATE TABLE

Insert 10,000 rows:

I’ve run the following insert:

insert into DEMO
select generate_series, 'N',lpad('x',1000,'x') from generate_series(1,10000);
INSERT 0 10000

I ran my statistics queries before and after in order to get the deltas, which I describe below.

I've inserted about 10MB of data (10000 rows with a 1000-byte text column and some additional small columns). All those new rows must be logged by the WAL, which is the reason for the 11MB of redo information. The pages where the rows were written had to be read from disk in order to update them, which is the reason for the 1504 block reads (that's about 11MB in 8k blocks). We also see 30000 additional block hits in the buffer cache, and this is the index maintenance: those 10000 rows were inserted one by one, updating the B*Tree index, which may have a height of 3, and that's 30000 pages touched. Fortunately, they stayed in the shared buffers, which is why there's no block read each time.

This matches the size of my table:

select relkind,relname,relpages,pg_size_pretty(relpages::bigint*8*1024) from pg_class natural join (select oid relnamespace,nspname from pg_namespace) nsp where nspname='public';

11MB of data, nearly 1500 pages — that’s what I got from the statistics.

Count the rows

Now, I’ll check how many blocks have to be read when counting all rows:

select count(*) from DEMO ;

All table blocks were read, fortunately from the shared buffers, and the reason is in the explain plan: a full table scan. Actually, I would expect an Index Only Scan for this query because all rows can be counted from the index. But PostgreSQL's MVCC implementation versions only the table tuples, not the index entries, so it has to go to the table unless a recent vacuum has updated the visibility map.

Vacuum

Let’s vacuum to update the visibility map:

vacuum DEMO;

Many block reads, roughly a full table scan, with minimal updates here because there are no dead tuples to remove.

Count from vacuumed table

Now, counting the rows again:

select count(*) from DEMO ;

Finally I have a real Index Only Scan, accessing only 45 buffers to count the 10000 index entries. That's the most important point with PostgreSQL performance: MVCC allows mixing read and write workloads, thanks to non-blocking reads, but ideally a vacuum should occur between massive write and massive read use-cases. Again, there's no need to run a massive pgbench concurrent workload to observe this. Small queries are sufficient as long as we look at the right statistics.

Update

I update the ‘flag’ column to set half of the rows to ‘Y’:

update DEMO set flag='Y' where n%2=0;

This is the operation where PostgreSQL MVCC is least efficient. I changed only one byte per row, but all rows were copied to new blocks. As I'm touching half the rows, they fit in half the blocks; this accounts for the 743 blocks read from disk. And the old versions had to be marked as such… in the end, nearly 30000 buffer accesses to update 5000 flags. The worst part is the redo generation: 16MB, as the new tuples must be logged, as well as the old versions being marked. And PostgreSQL must do full-page logging even when few rows/columns are modified. More about this:

Full page logging in Postgres and Oracle - Blog dbi services

In that previous blog I was using strace to get the size of WAL written. Now, using the delta of the WAL write pointer, I see the same figures.

Delete

Now, deleting the non-flagged rows:

delete from DEMO where flag='N';

Deleting 5000 rows here has to scan all blocks to find them (that's the 10000 tuples returned), which is about 1500 buffers accessed, and then, for the 5000 found, mark them as deleted, which is 5000 additional buffer accesses.

Why?

This post is here mainly to show the simple query I use to get SQL execution statistics, including WAL writes, which are probably the most useful but are unfortunately missing from pg_stat_database. I'm also advocating for small, fully understood test cases rather than general benchmarks that are difficult to analyze. It is, in my opinion, the best way to understand how things work, both for educational purposes and to guarantee scalable applications.

DBMS_JOB – watching for failures

I had a friend point this one out to me recently. They use DBMS_JOB to allow some "fire and forget" style functionality for users, and in their case the jobs are "best efforts": if they fail, it is not a big deal.

So whilst this may sound counter-intuitive, if you rely on jobs submitted via DBMS_JOB being allowed to fail, then please read on.

By default, if a job fails 16 times in a row, it will be marked as broken by the database. Here's a simple example of that, where the anonymous block will obviously fail because we are dividing by zero each time. I'll set the job to run every 5 seconds, so that within a couple of minutes we'll have 16 failures. First I'll run this on 11.2.0.4.


SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4 - 64bit Production

SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

[wait for a bit]

SQL> select job, what, failures from user_jobs;

       JOB WHAT                                                 FAILURES
---------- -------------------------------------------------- ----------
        23 declare x int; begin x := 1/0; end;                         8

[wait for a bit]

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                         9 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        16 Y

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        16 Y

You can see the "breaking" of the job in action here. We got to 16 failures, the database decided "enough is enough", and the job will no longer run until some sort of intervention is performed by an administrator.

Now I’ll run that demo again in 12.2


SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

[wait for a bit]

SQL> select job, what, failures from user_jobs;

       JOB WHAT                                                 FAILURES
---------- -------------------------------------------------- ----------
        23 declare x int; begin x := 1/0; end;                         8

[wait for a bit]

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        13 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        19 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        25 N

You can see that in 12.2 (and I've tested in 18 and 19) failures can continue past 16 unabated. If you're being hit by this, patches are available, so please get in touch with Support. A quick workaround until you can apply the patches is to use another job to monitor the others. Here's a small anonymous block, which you could run in each container as SYSDBA and schedule (say) every few minutes:


SQL> begin
  2    for i in ( select job
  3               from dba_jobs
  4               where  failures > 16
  5               and    job not in ( sys_context('userenv','fg_job_id'),sys_context('userenv','bg_job_id'))
  6               and    job not in ( select job from dba_jobs_running )
  7               )
  8    loop
  9      dbms_ijob.broken(i.job,true);
 10    end loop;
 11    commit;
 12  end;
 13  /

PL/SQL procedure successfully completed.
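To schedule that monitoring block itself you could wrap it in a scheduler job; a minimal sketch (the job name and the five-minute interval are just placeholders):

begin
  dbms_scheduler.create_job(
    job_name        => 'MARK_FAILED_JOBS_BROKEN',   -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'{begin
                            for i in ( select job
                                       from   dba_jobs
                                       where  failures > 16
                                       and    job not in ( sys_context('userenv','fg_job_id'),
                                                           sys_context('userenv','bg_job_id'))
                                       and    job not in ( select job from dba_jobs_running ) )
                            loop
                              dbms_ijob.broken(i.job,true);
                            end loop;
                            commit;
                          end;}',
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
    enabled         => true);
end;
/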

Ideally, you probably want to be moving to DBMS_SCHEDULER where possible, which has greater flexibility and control over error handling, amongst many other things.
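For example, a scheduler job can be told how many failures to tolerate before it is disabled, via the MAX_FAILURES attribute. A minimal sketch (names and intervals are just placeholders):

begin
  dbms_scheduler.create_job(
    job_name        => 'FLAKY_JOB',     -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'declare x int; begin x := 1/0; end;',
    repeat_interval => 'FREQ=SECONDLY;INTERVAL=5',
    enabled         => true);

  -- stop scheduling the job once it has failed 16 times
  dbms_scheduler.set_attribute('FLAKY_JOB','max_failures',16);
end;
/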

Re-partitioning – 18

In yesterday's note on the options for converting a range-partitioned table into a composite range/list partitioned table I mentioned that you could do this online with a single command in 18c, so here's some code to demonstrate that claim:


rem
rem     Script:         pt_comp_from_pt_18.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum - 1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

alter table pt_range modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
(
        partition p200 values less than (200)
)
-- online
;

execute dbms_stats.gather_table_stats(null, 'pt_range', granularity=>'all')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE'
order by
        partition_name, subpartition_name
;

Run this (with or without the online option) and you’ll (probably) see the Oracle error “ORA-00604: error occurred at recursive SQL level 1” with one of two underlying errors:

    ORA-01950: no privileges on tablespace 'SYSTEM'
    ORA-01536: space quota exceeded for tablespace 'SYSTEM'

So what’s gone wrong – it ought to work.
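For reference, a sketch of how the trace used in the next step might be switched on and off for the session (not part of the original test):

alter session set events '10046 trace name context forever, level 12';

-- re-run the failing "alter table ... modify" command, then:

alter session set events '10046 trace name context off';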

After enabling a 10046 trace I repeated the “alter table” command then scanned the trace file for the text “err=1950” (that being the error I’d received on my first attempt) and scanned backwards for the “PARSING IN CURSOR” line with a matching cursor id:


ERROR #139721552722200:err=1950 tim=26541227462

PARSING IN CURSOR #139721552722200 len=182 dep=1 uid=104 oct=1 lid=0 tim=26541224560 hv=2451601965 ad='7f1377267890' sqlid='0wsjfgk920yjd'
create table  "TEST_USER"."SYS_RMTAB$$_H124028"  ( src_rowid rowid not null , tgt_rowid rowid not null) 
    segment creation immediate nologging 
    tablespace  "SYSTEM"  
    rowid_mapping_table
END OF STMT

The code is trying to create a “rowid_mapping_table” in the system tablespace and I have no quota for the tablespace. (The 124028 in the table name relates to the object_id of the table I was trying to modify, by the way.)

The source of the error offered a big clue about a possible workaround so I gave myself a quota (unlimited) on the system tablespace (alter user test_user quota unlimited on system) and that made it possible for the restructuring to take place. It's not really an appropriate workaround for a production system though – especially if you're using the online option and the table is subject to a lot of change.  (Note – this "rowid_mapping_table" and a "journal" table are created even if you haven't selected the online option.)
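In SQL, the workaround (and the closest thing to taking it away again, discussed in the footnotes) might look like this sketch, run by a suitably privileged user:

alter user test_user quota unlimited on system;

-- re-run the "alter table pt_range modify ..." command as test_user, then:

alter user test_user quota 0 on system;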

Footnotes

  • The problem has been fixed in 19c (tested on LiveSQL)  and is listed on MoS as Bug 27580976 : INTERNAL RECURSIVE MAPPING TABLE MISTAKENLY PLACE IN SYSTEM FOR ONLINE OPS. There are no patches for 18c at present.
  • After I'd finished testing the quota workaround I tried to deprive myself of the quota on the system tablespace. I may have missed something in the manuals but it looks like the only way to do this is to give myself a zero quota (or, as I have done occasionally in the past, drop user cascade) because there's no option for "quota denied" or "revoke quota". This is why you may get one of two different messages after the ORA-00604. If you've never had a quota on the system tablespace you'll get the "ORA-01950: no privileges" message; if you've had a quota at some time in the past and then had it set to zero you'll get the "ORA-01536: space quota exceeded" message.

 

Re-partitioning

I wrote a short note a little while ago demonstrating how flexible Oracle 12.2 can be about physically rebuilding a table online to introduce or change the partitioning while discarding data, and so on. But what do you do (as a recent question on ODC asked) if you want to upgrade a customer's database to meet the requirements of a new release of your application by changing a partitioned table into a composite partitioned table, and you don't have enough room to do an online rebuild (which could require two copies of the data to exist at the same time)?

If you’ve got the down time (and not necessarily a lot is needed) you can fall back on “traditional methods” with some 12c enhancements. Let’s start with a range partitioned table:


rem
rem     Script:         pt_comp_from_pt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600
;

commit;

So we’ve got a range-partitioned table with three partitions and some data in each partition. Let’s pretend we want to change this to range/list with the grp column as the subpartition key, allowing explicit use of values 0,1,2 and a bucket subpartition for anything else. First we create an empty version of the table with a suitable subpartition template, and a simple heap table to be used as an exchange medium:


create table pt_range_list (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id)
subpartition by list (grp)
subpartition template (
        subpartition p_def      values(default)
)
(
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

prompt  ===============================================
prompt  First nice 12.2 feature - "create for exchange"
prompt  ===============================================

create table t for exchange with table pt_range;

You'll notice that our subpartition template identifies just a single subpartition that takes default values – i.e. anything for which no explicit subpartition has been identified. This means we have a one to one correspondence between the data segments of the original table and the copy table. So now we go through a tedious loop (which we could code up with a PL/SQL "execute immediate" approach) to do a double-exchange for each partition in turn. (Any PL/SQL code is left largely as an exercise to the interested reader, though a rough sketch follows below.)
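One possible shape for that loop, relying on the "…_P_DEF" subpartition names that the single-subpartition template produces (a sketch only, with none of the error handling you'd want in production):

begin
  for r in ( select partition_name
             from   user_tab_partitions
             where  table_name = 'PT_RANGE'
             order by partition_position )
  loop
    execute immediate
      'alter table pt_range exchange partition ' || r.partition_name ||
      ' with table t';

    execute immediate
      'alter table pt_range_list exchange subpartition ' || r.partition_name ||
      '_p_def with table t';
  end loop;
end;
/

The explicit statements it would generate are exactly the six that follow.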


alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition p400 with table t;
alter table pt_range_list exchange subpartition p400_p_def with table t;

alter table pt_range exchange partition p600 with table t;
alter table pt_range_list exchange subpartition p600_p_def with table t;

prompt  =====================================
prompt  Show that we've got the data in place
prompt  =====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;


PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_DEF                    200

P400                   P400_P_DEF                    200

P600                   P600_P_DEF                    200


3 rows selected.

We now have to split the newly arrived subpartitions into the 4 pieces we want – but before we do that let’s make sure that any new partitions automatically have the correct subpartitions by changing the subpartition template:


alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  =========================================================
prompt  Second nice 12.2 feature - multiple splits in one command
prompt  We could do this online after allowing the users back on.
prompt  =========================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition p400_p_def
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition p600_p_def
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Now, just to check that everything is behaving, let’s add a new partition, and check to see what partitions and subpartitions we end up with:


alter table pt_range_list add partition p800 values less than (800);

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_0                      100
                       P200_P_1                      100
                       P200_P_2                        0
                       P200_P_DEF                      0

P400                   P400_P_0                      100
                       P400_P_1                      100
                       P400_P_2                        0
                       P400_P_DEF                      0

P600                   P600_P_0                      100
                       P600_P_1                      100
                       P600_P_2                        0
                       P600_P_DEF                      0

P800                   P800_P_0                        0
                       P800_P_1                        0
                       P800_P_2                        0
                       P800_P_DEF                      0


16 rows selected.

And as a final note – if we decide we want to put it all back we could merge four subpartitions down to one subpartition with a single command – then loop through every partition in turn:


alter table pt_range_list
        merge subpartitions  p200_p_0, p200_p_1, p200_p_2, p200_p_def
        into  subpartition  p200_p_def
;

And now I feel like I’m turning into Tim Hall – writing potentially useful demonstrations instead of trying to baffle people with rocket science. But I hope to get over that quite soon. Of course I have left out some important “real-world” details – particularly how you choose to handle indexes while doing the double-exchange. My view would be to take the local indexes with you on the exchange, bypass the global indexes on the exchange out, and be choosy about which global indexes to maintain on the exchange back in; but it all depends on how much downtime you have, how many indexes there are, and the state they’re likely to start or end in.
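In terms of syntax, the relevant clauses on the exchange commands might look like this sketch (whether you include indexes and maintain global indexes depends on the constraints just mentioned):

-- exchange out of the original table, carrying the local indexes, ignoring global indexes
alter table pt_range exchange partition p200 with table t
        including indexes without validation;

-- exchange back into the new table, maintaining whichever global indexes you choose to keep
alter table pt_range_list exchange subpartition p200_p_def with table t
        including indexes without validation
        update global indexes;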

As ever it’s possible to start with a simple idea like this, then discover there are real-world complications that have to be dealt with. So there’s another article in the pipeline to handle a slightly more complex case. I’ll also be publishing a short note about the easy way of getting the job done from 18c onwards – if you’ve got the spare resources.

 

Indexing Null Values - Part 2

In the previous post I demonstrated that Oracle has some problems making efficient use of B*Tree indexes if an IS NULL condition is followed by IN / OR predicates also covered by the same index - the predicates following the IS NULL are not used to navigate the index structure efficiently but are applied as filters on all index entries identified by the IS NULL.

In this part I'll show what results I got when repeating the same exercise using Bitmap indexes - after all they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same but very similar) and an index on the single expression that gets searched for via IS NULL - results are again from 18.3.0:

SQL> create table null_index as select * from dba_tables;

Table created.

SQL> insert /*+ append */ into null_index select a.* from null_index a, (select /*+ no_merge cardinality(100) */ rownum as dup from dual connect by level <= 100);

214500 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(null, 'NULL_INDEX', method_opt => 'for all columns size auto for columns size 254 owner')

PL/SQL procedure successfully completed.

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> set serveroutput off pagesize 5000 arraysize 500
SQL>
SQL> set autotrace traceonly
SQL>
SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1297049223

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |    19 |  5852 |  2342   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | NULL_INDEX     |    19 |  5852 |  2342   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS        |                |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE         | NULL_INDEX_IDX |       |       |            |          |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
3 - access("PCT_FREE" IS NULL)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2192 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

So indeed the Bitmap index was successfully used to identify the PCT_FREE IS NULL rows but the efficiency suffers from the same problem and to the same degree as the corresponding B*Tree index plan - too many rows have to be filtered on table level:

Plan hash value: 1297049223

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |      1 |        | 2342 (100) |    101 |00:00:00.01 |    2192 |    30 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | NULL_INDEX     |      1 |     19 |  2342   (1)|    101 |00:00:00.01 |    2192 |    30 |
|   2 |   BITMAP CONVERSION TO ROWIDS        |                |      1 |        |            |  13433 |00:00:00.01 |       3 |    30 |
|*  3 |    BITMAP INDEX SINGLE VALUE         | NULL_INDEX_IDX |      1 |        |            |      3 |00:00:00.01 |       3 |    30 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
3 - access("PCT_FREE" IS NULL)

Some interesting points to mention: the 13,000+ rows are identified in the Bitmap index using just three index row entries / bitmap fragments, which is the special efficiency of Bitmap indexes, where a single index row entry can cover many, many table rows. It's also interesting to see that the costing is pretty different from the B*Tree index costing (2342 vs. 1028, in this case closer to the reality of 2,200 consistent gets, but we'll see in a moment how this can change) - and no cardinality estimate gets mentioned at the Bitmap index level - the B*Tree index plan showed the spot-on 13,433 estimated rows.

So, reproducing the B*Tree test case, let's add the OWNER column to the Bitmap index in an attempt to increase the efficiency. Note that I drop the previous index to prevent Oracle from making "proper" use of Bitmap indexes, as we'll see in a moment:

SQL> drop index null_index_idx;

Index dropped.

SQL> create bitmap index null_index_idx2 on null_index (pct_free, owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1751956722

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |    19 |  5852 |  2343   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | NULL_INDEX      |    19 |  5852 |  2343   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS        |                 |       |       |            |          |
|*  3 |    BITMAP INDEX RANGE SCAN           | NULL_INDEX_IDX2 |       |       |            |          |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
3 - access("PCT_FREE" IS NULL)
filter("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
105 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 1751956722

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2343 (100)| 101 |00:00:00.01 | 105 | 30 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 2343 (1)| 101 |00:00:00.01 | 105 | 30 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | 101 |00:00:00.01 | 4 | 30 |
|* 3 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX2 | 1 | | | 1 |00:00:00.01 | 4 | 30 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
3 - access("PCT_FREE" IS NULL)
filter(("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')))

So now we end up with a "Bitmap index range scan" operation, which in reality looks pretty efficient - just 105 consistent gets, so assuming 101 consistent gets for accessing the 101 table rows it required only 4 consistent gets at index level. But then look at the cost estimate: 2343, which is even greater than the cost estimate of the previous plan. The "Predicate Information" section looks pretty odd, too: an access predicate only for PCT_FREE IS NULL, a filter at index level repeating all the predicates including PCT_FREE IS NULL, and - most significantly - the predicates on OWNER repeated at table level.

Clearly what the optimizer assumes in terms of costing and required predicates doesn't correspond to what happens at runtime. The runtime behaviour looks pretty efficient, but judging by the predicates at index level it still isn't the optimal strategy we would like to see: why the additional filter instead of just an access predicate? We can see that echoed in the rowsource statistics: only a single Bitmap index fragment gets produced by the "Bitmap index range scan", but it requires 4 consistent gets at index level, so three of them get "filtered" after the access.

The costing seems to assume that only the PCT_FREE IS NULL rows are identified on index level, which clearly isn't the case at runtime...
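If you want to dig deeper into where the 2343 figure comes from, a CBO trace is one option. The following is just a sketch using the classic 10053 event - remember the trace is only produced on a hard parse, hence the unique comment added to the statement:

-- sketch: enable a CBO (10053) trace, re-optimize the statement, then disable the trace
alter session set events '10053 trace name context forever, level 1';

select /* costing_check */ * from null_index
where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

alter session set events '10053 trace name context off';

-- the trace file location can be found in V$DIAG_INFO ("Default Trace File")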

Of course this is not the proper use of Bitmap indexes - typically you don't create a multi-column Bitmap index, but instead exploit their real power, which is Oracle's ability to combine several single-column Bitmap indexes for efficient access.

Before doing so, just for the sake of completeness, let's repeat the combined index - in the column order that turned out to be most efficient in the B*Tree case - as a Bitmap index:

SQL> drop index null_index_idx2;

Index dropped.

SQL> create bitmap index null_index_idx3 on null_index (owner, pct_free);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1022155563

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 83 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 83 (0)| 00:00:01 |
| 3 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 4 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX3 | | | | |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("PCT_FREE" IS NULL)
4 - access("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
207 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 1022155563

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 83 (100)| 101 |00:00:00.01 | 207 | 30 |
| 1 | INLIST ITERATOR | | 1 | | | 101 |00:00:00.01 | 207 | 30 |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 2 | 19 | 83 (0)| 101 |00:00:00.01 | 207 | 30 |
| 3 | BITMAP CONVERSION TO ROWIDS | | 2 | | | 303 |00:00:00.01 | 5 | 30 |
|* 4 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX3 | 2 | | | 2 |00:00:00.01 | 5 | 30 |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("PCT_FREE" IS NULL)
4 - access(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))

While we now see the desired "INLIST ITERATOR" again, this plan looks odd for several reasons, in particular because it has a much lower cost estimate (83) yet in reality, with 207 consistent gets, it is less efficient than the previous one (cost estimate 2343 but only 105 consistent gets). Why is this so? The "Predicate Information" section shows why: only the predicate on OWNER is evaluated at index level (303 rows identified there), so rows need to be filtered at table level - which again looks like an implementation limitation and pretty unnecessary, since the PCT_FREE IS NULL predicate could surely be handled at index level instead.

So finally let's see how things turn out when using Bitmap indexes the way they are designed - by creating multiple ones and let Oracle combine them:

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> create bitmap index null_index_idx4 on null_index (owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 704944303

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 8 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 8 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 3 | BITMAP AND | | | | | |
|* 4 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | | | | |
| 5 | BITMAP OR | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | | | | |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("PCT_FREE" IS NULL)
6 - access("OWNER"='AUDSYS')
7 - access("OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
108 consistent gets
60 physical reads
0 redo size
33646 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 704944303

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 101 |00:00:00.01 | 108 | 60 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 8 (0)| 101 |00:00:00.01 | 108 | 60 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | 101 |00:00:00.01 | 7 | 60 |
| 3 | BITMAP AND | | 1 | | | 1 |00:00:00.01 | 7 | 60 |
|* 4 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | 1 | | | 3 |00:00:00.01 | 3 | 30 |
| 5 | BITMAP OR | | 1 | | | 1 |00:00:00.01 | 4 | 30 |
|* 6 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | 1 | | | 1 |00:00:00.01 | 2 | 30 |
|* 7 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | 1 | | | 1 |00:00:00.01 | 2 | 0 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("PCT_FREE" IS NULL)
6 - access("OWNER"='AUDSYS')
7 - access("OWNER"='CBO_TEST')

So now we see access predicates only, and Oracle making efficient use of the indexes by combining them. Nevertheless I find the range of costing amazing: this plan is assigned a cost of 8, yet at runtime (108 consistent gets) it is actually slightly less efficient than the plan above, which was assigned a cost of 2343 but required just 105 consistent gets. Clearly the costing of Bitmap indexes is still - even in version 18.3 - full of surprises.

Summary

Repeating the same exercise as previously using Bitmap indexes shows several things:

- Oracle isn't necessarily good at costing and using multi-column Bitmap indexes properly
- The costing of Bitmap indexes is still questionable (the most important figure, the "Clustering Factor", is still meaningless for Bitmap indexes - see the dictionary query sketched below)
- For proper handling, use Bitmap indexes the way they are supposed to be used: create separate single-column Bitmap indexes and let Oracle combine them
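As a footnote to the clustering factor point, here is a minimal dictionary query - just a sketch against the indexes created in this test case - showing the figures the optimizer has to work with:

select index_name, index_type, distinct_keys, leaf_blocks, clustering_factor
from   user_indexes
where  table_name = 'NULL_INDEX'
order  by index_name;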

Danger – Hints

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query, in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query to show the output but restricted the result set to a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it's been enhanced: the fully qualified form of the hint is or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we've got so far. So let's assume we do that before shipping the code to production.
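So the statement we ship would presumably look like this, with the fully qualified hint copied from the Outline Data:

select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;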

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete ? What do you think will happen when we run the query ? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with two branches, one for :choice = 1, one for :choice = 2 and both of them show zero starts – and we don’t have any part of the plan to handle :choice = 3. The query returns no rows – and if you check the table creation code you’ll see it should have returned 1000 rows. An incorrect (historically adequate) hint has given us wrong results.
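A quick way to spot the damage before it reaches production is to compare the hinted query with an unhinted baseline - with :choice set to 3 the unhinted version should return the 1,000 rows mentioned above. A sketch of that cross-check:

select  count(data)
from    address
where  ( :choice = 1 and street    = :val )
or     ( :choice = 2 and suburb    = :val )
or     ( :choice = 3 and post_code = :val )
;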

If we want the full hint for this new query we need to specify the 3rd predicate, by adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice = 3) ran, showing A-Rows = 1000 selected by the tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work and then test extremely carefully – especially if you’re modifying old code that already contains some hints.

 

Being generous to the optimizer

In a perfect world, the optimizer would reach out from the server room and say to us: "Hey, let's grab a coffee and have a chat about that query of yours". Because ultimately, that is the task we are bestowing on the optimizer – to know what our intent was in terms of running a query in a way that meets the performance needs of our applications. It generally does a pretty good job even without the coffee, but if we can keep that caffeine hit in mind, we can do our bit as SQL developers to give the optimizer as much assistance as we can.

Here’s such an example. Let’s assume users of your application can perform searches for street addresses. They nominate what kind of search they want to do (street name or suburb), and then provide a value to search on.


Here’s some base tables to support the application.


SQL> create table address ( street int, suburb int, data char(100));

Table created.

SQL> insert into address
  2  select mod(rownum,1000), mod(rownum,10), rownum from dual connect by level <= 100000;
SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats('','ADDRESS');

PL/SQL procedure successfully completed.

SQL> create index ix1 on address ( street );

Index created.

SQL> create index ix2 on address ( suburb );

Index created.

To run that search from the application, probably the most straightforward SQL that handles the requirement is:


SQL> select data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  or    ( :choice = 2 and suburb = :val );

on the assumption here that the application sends “1” for street search type, and “2” for a suburb search.

Let’s give that a run for a nominated street and see what the optimizer makes of this SQL.


SQL> variable choice number = 1
SQL> variable val number = 6
SQL> set feedback only
SQL> select data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  or    ( :choice = 2 and suburb = :val );

100 rows selected.

SQL> set feedback on
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  cwwcr79bfx5zz, child number 0
-------------------------------------
select data from   address where ( :choice = 1 and street = :val ) or
 ( :choice = 2 and suburb = :val )

Plan hash value: 3645838471

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |       |       |   445 (100)|          |
|*  1 |  TABLE ACCESS FULL| ADDRESS |   101 | 10908 |   445   (1)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(((:CHOICE=2 AND "SUBURB"=:VAL) OR ("STREET"=:VAL AND
              :CHOICE=1)))

Notice that even though we have indexes on both STREET and SUBURB it opted for a full table scan. The driving metric behind this decision is that the selectivity on the SUBURB index is very poor (only 10 distinct values). Of course, this means that users who perform searches on STREET are also going to be punished with this table scan.

In an ideal world, the optimizer would be clever enough to dive into the SQL, notice that the same bind variable is being used in each predicate, realise that the conditions on that bind variable are mutually exclusive, and thus know that only one of the OR conditions can ever be true and optimize the query with that knowledge. But as I mentioned, we never get that chance to "grab a coffee" with the optimizer.

However, as developers, we can assist by restructuring our query to give that little bit more information to the optimizer. I've rewritten the query as a UNION ALL query to separate the two predicates into their own SQL statements.


SQL> set feedback only
SQL> select data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  union all
  5  select data
  6  from   address
  7  where    ( :choice = 2 and suburb = :val );

100 rows selected.

SQL> set feedback on
SQL>
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID  3x1nc068ntchg, child number 0
-------------------------------------
select data from   address where ( :choice = 1 and street = :val )
union all select data from   address where    ( :choice = 2 and suburb
= :val )

Plan hash value: 1995695946

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |       |       |   545 (100)|          |
|   1 |  UNION-ALL                            |         |       |       |            |          |
|*  2 |   FILTER                              |         |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS |   100 | 10500 |   101   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | IX1     |   100 |       |     1   (0)| 00:00:01 |
|*  5 |   FILTER                              |         |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | ADDRESS | 10000 |  1015K|   444   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:CHOICE=1)
   4 - access("STREET"=:VAL)
   5 - filter(:CHOICE=2)
   6 - filter("SUBURB"=:VAL)

After running that SQL, the plan might look worse, but the key elements here are lines 2 and 5. Notice that these are FILTER steps, which means that if they evaluate to false, the underlying steps will not be run. In effect, the optimizer now knows that it will only need to run one "half" of the UNION ALL based on the outcome of the two FILTER steps.

That all might sound good in theory but we need to prove this hypothesis. I’ll run the first query and check the statistics:


SQL> set autotrace traceonly stat
SQL> select data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  or    ( :choice = 2 and suburb = :val );

100 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1578  consistent gets
          0  physical reads
          0  redo size
      11184  bytes sent via SQL*Net to client
        624  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

and now compare that to the second query.


SQL> select data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  union all
  5  select data
  6  from   address
  7  where    ( :choice = 2 and suburb = :val );

100 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
      11184  bytes sent via SQL*Net to client
        624  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Nice stuff. Note that this does not stop the full table scan when searching for SUBURB, but what has been achieved is much better search performance for STREET lookups.

Using the GATHER_PLAN_STATISTICS hint, it is also possible to see the benefit of the FILTER steps by looking at the Starts/A-Rows columns.


SQL> select /*+ gather_plan_statistics */ data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  union all
  5  select data
  6  from   address
  7  where    ( :choice = 2 and suburb = :val );

100 rows selected.

SQL> set lines 200
SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID  ch0h5t76r7d2m, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ data from   address where (
:choice = 1 and street = :val ) union all select data from   address
where    ( :choice = 2 and suburb = :val )

Plan hash value: 1995695946

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |      1 |        |    100 |00:00:00.01 |     103 |
|   1 |  UNION-ALL                            |         |      1 |        |    100 |00:00:00.01 |     103 |
|*  2 |   FILTER                              |         |      1 |        |    100 |00:00:00.01 |     103 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS |      1 |    100 |    100 |00:00:00.01 |     103 |
|*  4 |     INDEX RANGE SCAN                  | IX1     |      1 |    100 |    100 |00:00:00.01 |       3 |
|*  5 |   FILTER                              |         |      1 |        |      0 |00:00:00.01 |       0 |
|*  6 |    TABLE ACCESS FULL                  | ADDRESS |      0 |  10000 |      0 |00:00:00.01 |       0 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:CHOICE=1)
   4 - access("STREET"=:VAL)
   5 - filter(:CHOICE=2)
   6 - filter("SUBURB"=:VAL)

Notice that we never commenced (started) the full scan for the SUBURB lookup, because the overarching FILTER step returned nothing.
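Conversely, if you flip the bind to nominate a suburb search, you would expect the index branch to report zero starts and the full scan branch to do the work. A sketch of that check (same query, same technique, just a different value for :choice):

SQL> variable choice number = 2
SQL> select /*+ gather_plan_statistics */ data
  2  from   address
  3  where ( :choice = 1 and street = :val )
  4  union all
  5  select data
  6  from   address
  7  where    ( :choice = 2 and suburb = :val );

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));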

So when you discover that you know something about a query that the optimizer does not, look for ways to pass that information on to the optimizer so that it (and your users) can benefit from the knowledge you have as a developer of your applications.