Oakies Blog Aggregator

What’s new with Oracle database 12.1.0.2.190416 versus 12.1.0.2.190716

A couple of undocumented spare parameters have been changed into named undocumented parameters, which is quite normal to see.
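For reference, parameter-name lists like the ones below can be produced with a simple dictionary query run as SYS in each Oracle home and then diffed. This is only an illustrative sketch of the approach, not necessarily the tooling used for this comparison:

set pagesize 0 linesize 100 trimspool on
spool parameters_12.1.0.2.190716.txt
select ksppinm from sys.x$ksppi order by ksppinm;
spool off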

With the Oracle database version 12.1.0.2.190416 patched to 12.1.0.2.190716 on Linux, the following things have changed:

parameters unique in version 12.1.0.2.190416 versus 12.1.0.2.190716

NAME
--------------------------------------------------
_ninth_spare_parameter
_one-hundred-and-forty-sixth_spare_parameter
_tenth_spare_parameter

parameters unique in version 12.1.0.2.190716 versus 12.1.0.2.190416

NAME
--------------------------------------------------
_bug20684983_buffer_deadlock_cancel_count
_bug22690648_gcr_cpu_min_hard_limit
_gcs_disable_imc_preallocation

parameter values changed isdefault between 12.1.0.2.190416 versus 12.1.0.2.190716

parameter values unique to 12.1.0.2.190416 versus 12.1.0.2.190716

parameter values unique to 12.1.0.2.190716 versus 12.1.0.2.190416

waitevents unique in version 12.1.0.2.190416 versus 12.1.0.2.190716

waitevents unique in version 12.1.0.2.190716 versus 12.1.0.2.190416

waitevents changed parameter description between 12.1.0.2.190416 versus 12.1.0.2.190716

x$ tables unique to 12.1.0.2.190416 versus 12.1.0.2.190716

x$ tables unique to 12.1.0.2.190716 versus 12.1.0.2.190416

x$ tables columns unique to 12.1.0.2.190416 versus 12.1.0.2.190716

x$ tables columns unique to 12.1.0.2.190716 versus 12.1.0.2.190416

v$ tables unique to 12.1.0.2.190416 versus 12.1.0.2.190716

v$ tables unique to 12.1.0.2.190716 versus 12.1.0.2.190416

v$ tables columns unique to 12.1.0.2.190416 versus 12.1.0.2.190716

v$ tables columns unique to 12.1.0.2.190716 versus 12.1.0.2.190416

gv$ tables unique to 12.1.0.2.190416 versus 12.1.0.2.190716

gv$ tables unique to 12.1.0.2.190716 versus 12.1.0.2.190416

gv$ tables columns unique to 12.1.0.2.190416 versus 12.1.0.2.190716

gv$ tables columns unique to 12.1.0.2.190716 versus 12.1.0.2.190416

sysstat statistics unique to 12.1.0.2.190416 versus 12.1.0.2.190716

sysstat statistics unique to 12.1.0.2.190716 versus 12.1.0.2.190416

sys_time_model statistics unique to 12.1.0.2.190416 versus 12.1.0.2.190716

sys_time_model statistics unique to 12.1.0.2.190716 versus 12.1.0.2.190416

dba tables unique to 12.1.0.2.190416 versus 12.1.0.2.190716

dba tables unique to 12.1.0.2.190716 versus 12.1.0.2.190416

dba tables columns unique to 12.1.0.2.190716 versus 12.1.0.2.190416

dba tables columns unique to 12.1.0.2.190416 versus 12.1.0.2.190716

cdb tables unique to 12.1.0.2.190416 versus 12.1.0.2.190716

cdb tables unique to 12.1.0.2.190716 versus 12.1.0.2.190416

cdb tables column unique to 12.1.0.2.190416 versus 12.1.0.2.190716

cdb tables column unique to 12.1.0.2.190716 versus 12.1.0.2.190416

And here are the differences in symbols (C functions).

The most notable things to see are:
– almost no functions have been removed; normally a larger number of functions vanish.
– quite a lot of the added functions are related to krsr*, kernel redo standby/dataguard remote file server.
– versions 12.2 up to 19 gained a huge number of functions related to little cms, but it seems version 12.1 did not get this addition.

code symbol names unique in version 12.1.0.2.190416 versus 12.1.0.2.190716

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
free_ex_data                                                 free_ex_data                                                 ??
set_dh                                                       set_dh                                                       ??

code symbol names unique in version 12.1.0.2.190716 versus 12.1.0.2.190416

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
R_SSL_clear_options                                          R_SSL_clear_options                                          ??
R_SSL_clear_options_by_type                                  R_SSL_clear_options_by_type                                  ??
free_ex_data_arg                                             free_ex_data_arg                                             ??
kcl_try_cancel                                               (kcl)_try_cancel                                             kernel cache lock manager/buffer cache ??
kcvfdb_set_dict_check                                        (kcv)fdb_set_dict_check                                      kernel cache recovery ??
kewmcdmvbrwm                                                 (kewm)cdmvbrwm                                               kernel event AWR metrics ??
kewmcomprmdif                                                (kewm)comprmdif                                              kernel event AWR metrics ??
kewmgetendftm                                                (kewm)getendftm                                              kernel event AWR metrics ??
kfdvaOpRslvFail                                              (kfdva)OpRslvFail                                            kernel automatic storage management disk virtual ATB ??
kfdvaPopRslvFail                                             (kfdva)PopRslvFail                                           kernel automatic storage management disk virtual ATB ??
kfdvaVopRslvFail                                             (kfdva)VopRslvFail                                           kernel automatic storage management disk virtual ATB ??
kfgbCheckInterrupt                                           (kfgb)CheckInterrupt                                         kernel automatic storage management diskgroups background ??
kfis_sageonly_anygroup_all                                   (kfis)_sageonly_anygroup_all                                 kernel automatic storage management intelligent storage interfaces ??
kfnrclFindRmt                                                (kfn)rclFindRmt                                              kernel automatic storage management networking subsystem ??
kgfnConnect3                                                 (kgf)nConnect3                                               kernel generic ASM ??
kgfnGetCSSMisscount                                          (kgf)nGetCSSMisscount                                        kernel generic ASM ??
kgfpInitComplete3                                            (kgf)pInitComplete3                                          kernel generic ASM ??
kjblimcestimate                                              (kjbl)imcestimate                                            kernel lock management global cache service lock table ??
krbCreateXml                                                 (krb)CreateXml                                               kernel redo backup/restore ??
krbi_create_xml                                              krbi_create_xml                                              kernel redo backup/restore dbms_backup_restore package DBMS_BACKUP_RESTORE.KRBI_CREATE_XML CREATETEMPXMLFILE
krcdct_int                                                   (krc)dct_int                                                 kernel redo block change tracking ??
krcpsckp_canc                                                (krc)psckp_canc                                              kernel redo block change tracking ??
krsr_chk_sna                                                 (krsr)_chk_sna                                               kernel redo standby/dataguard (?) remote file server ??
krsr_copy_rcv_info                                           (krsr)_copy_rcv_info                                         kernel redo standby/dataguard (?) remote file server ??
krsr_copy_thread_info                                        (krsr)_copy_thread_info                                      kernel redo standby/dataguard (?) remote file server ??
krsr_get_rta_info                                            (krsr)_get_rta_info                                          kernel redo standby/dataguard (?) remote file server ??
krsr_pic_release                                             (krsr)_pic_release                                           kernel redo standby/dataguard (?) remote file server ??
krsr_reset_sna                                               (krsr)_reset_sna                                             kernel redo standby/dataguard (?) remote file server ??
krsr_rta_adjusted_numblks                                    (krsr)_rta_adjusted_numblks                                  kernel redo standby/dataguard (?) remote file server ??
krsr_sna_io_finish                                           (krsr)_sna_io_finish                                         kernel redo standby/dataguard (?) remote file server ??
krsr_snldt_get_inlist                                        (krsr)_snldt_get_inlist                                      kernel redo standby/dataguard (?) remote file server ??
krsr_snldt_get_simap                                         (krsr)_snldt_get_simap                                       kernel redo standby/dataguard (?) remote file server ??
krsr_update_lamport                                          (krsr)_update_lamport                                        kernel redo standby/dataguard (?) remote file server ??
ksfd_estimate_fobpools                                       (ksfd)_estimate_fobpools                                     kernel service (VOS) functions disk IO ??
ksfd_iorm_throttled                                          (ksfd_io)rm_throttled                                        kernel service (VOS) functions disk IO perform IO ??
r_ex_data_clear                                              r_ex_data_clear                                              ??
r_ex_data_update                                             r_ex_data_update                                             ??
r_ssl_ctx_ex_data_clear                                      r_ssl_ctx_ex_data_clear                                      ??
r_ssl_ctx_get_dh_uses                                        r_ssl_ctx_get_dh_uses                                        ??
r_ssl_ctx_set_dh_uses                                        r_ssl_ctx_set_dh_uses                                        ??
r_ssl_ec_cert_algs_are_equal                                 r_ssl_ec_cert_algs_are_equal                                 ??
r_ssl_get_dh_uses                                            r_ssl_get_dh_uses                                            ??
r_ssl_set_dh_uses                                            r_ssl_set_dh_uses                                            ??
ri_ssl3_base                                                 ri_ssl3_base                                                 ??
ri_ssl3_ctx_dh_tmp                                           ri_ssl3_ctx_dh_tmp                                           ??
ri_ssl3_dh_tmp                                               ri_ssl3_dh_tmp                                               ??
ri_ssl_cert_dup_params                                       ri_ssl_cert_dup_params                                       ??
ri_ssl_cipher_ctx_cipher_size                                ri_ssl_cipher_ctx_cipher_size                                ??
ri_ssl_cipher_ctx_is_aead                                    ri_ssl_cipher_ctx_is_aead                                    ??

Trace Files

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the trace file generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in the user’s schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, these are the privileges you could grant to a user from SYS so that they could create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace = true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace = false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL, then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seem like a likely guess), which happens to mean all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text “ id=1 ” (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. (A stand-alone sketch of the grouping idea appears just below.) There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.
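For anyone who hasn’t met the “tabibitosan” trick before, here’s the grouping idea in isolation – a throwaway sketch using a small collection of numbers rather than the trace data: each value minus its row_number() gives a constant, and that constant identifies a run of consecutive values.

select
        column_value                                              n,
        column_value - row_number() over (order by column_value)  grp
from
        table(sys.odcinumberlist(1, 2, 3, 7, 8, 10, 11, 12))
order by
        column_value
;

In the output, the rows 1,2,3 share one grp value, 7,8 share another, and 10,11,12 a third – which is exactly how the line_number arithmetic in the main query collects the lines of a single plan.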

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.
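If you’re not sure where to start, a quick profile of the current trace file on those columns gives a useful overview before you zoom in on a particular file or function – just a simple sketch against the view created above:

select
        section_name, component_name, file_name, function_name,
        count(*)        line_count
from
        my_trace_file
group by
        section_name, component_name, file_name, function_name
order by
        count(*) desc
;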

 

Midlands Microsoft 365 and Azure User Group – October 2019


On Tuesday evening I went to the second event of the Midlands Microsoft 365 and Azure User Group. It was co-organised by Urfaan Azhar and Lee Thatcher from Pure Technology Group, and Adrian Newton from my company.

First up was Matt Fooks speaking about security in M365 and Azure. He did an overview and demos of a number of security features, giving an idea of their scope, how easy/hard they were to configure and importantly what licenses they were covered by. Some of this was a bit over my head, but for me it’s about understanding possibilities and what’s available, even if I wouldn’t have a clue what to do with it. Other people in the company do that stuff. At one point I leaned over to one of my colleagues and said, “Could we use that for…”, and he came back with, “Yeah. We’ve done a POC with that. Look!”, then showing me an example of it working with one of our services.

The definition of proof

One of the pieces of advice that I often see on the ‘net is that undo space is somehow this incredibly precious thing, and as a consequence, one should always keep the amount of uncommitted changes in the database to a small size.

Personally I think that is baloney (Ed: in reality, as an Australian I have a slightly more powerful choice of term, but let’s keep things PG-rated).

_cursor_obsolete_threshold

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea?

The answer is PDBs – specifically the much larger number of PDBs a single CBD can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical company running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.
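For what it’s worth, the change itself is trivial – a purely hypothetical example follows; as with any underscore parameter you should only set it under guidance from Oracle Support (which, for non-CDB systems, the MOS note above effectively gives you):

alter system set "_cursor_obsolete_threshold" = 1024 scope = spfile;
-- then restart the instance for the new value to take effect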

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation on his work. It addresses the following question:

If you have an application that repeatedly issues parse calls (explicitly or implicitly) but doesn’t take advantage of the session cursor cache, it has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0?

Here’s the complete code for the test:


create table t1 as
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple: a query hinted to return that one row by a primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.
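The snap_my_stats package itself isn’t reproduced here, but the idea is easy to sketch: remember the values in v$mystat (joined to v$statname) at the start and report the deltas at the end. The following is only a rough stand-in for the package, not the author’s actual code, and assumes select privileges on v_$mystat and v_$statname:

create global temporary table my_stats_snap
on commit preserve rows
as
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
and     1 = 0
;

-- "start_snap": remember the current statistic values
insert into my_stats_snap
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
;

-- ... run the test ...

-- "end_snap": report what has changed since the snapshot
select  sn.name, ms.value - s.value delta
from    v$mystat ms, v$statname sn, my_stats_snap s
where   ms.statistic# = sn.statistic#
and     sn.name = s.name
and     ms.value - s.value > 0
order by
        delta desc
;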

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8191 you will have to follow 512 segment pointers, and 16 pointers per segment (totalling 8708 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One peripheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array (i.e. will searching for child cursor 0 be the cheapest one or the most expensive one?).

And the results, limited to just the second and third parts, with just a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is zero; this is all about searching for the right child cursor.

You might wonder about the 2,048-ish parse count (total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why the parse count is doubled, although the “alter session” doesn’t then report as an “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors running before the loop for the first 1,024 child cursors. Again this made no significant difference to the results – the low-numbered child cursors take less CPU to find than the high-numbered child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the roughly 5 seconds of extra CPU time appearing in this test corresponds to an extra 5 milliseconds per execution spent walking an extra 7,000 steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.
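If you want to check whether an application is getting any benefit from the session cursor cache, the session_cached_cursors parameter and the “session cursor cache” statistics are the first things to look at. A couple of illustrative commands (the value shown is an arbitrary example, not a recommendation):

-- is this session's cursor cache being used at all?
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
and     sn.name like 'session cursor cache%'
;

-- the parameter can be adjusted at session level for a quick test
alter session set session_cached_cursors = 100;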

Footnote 1

The odd thing about this “cursor obsolete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explanations of the different reasons captured in this view, MOS Doc ID 296377.1 “Troubleshooting: High Version Count Issues” is a useful reference.
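As a starting point, something like the following (the sql_id is hypothetical) lists the sharing-failure flags for every child of a parent cursor – each ‘Y’ column tells you why that child could not share an earlier one:

select  *
from    v$sql_shared_cursor
where   sql_id = '7h35uxf5uhmm1'        -- hypothetical sql_id
order by
        child_number
;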

A refreshing look at PIVOT

We had an AskTOM question come in recently where our customer was very excited about the PIVOT operator in SQL, which lets you transpose rows to columns. This is a very common requirement in applications that want to take data that has been modelled in “pure” relational form and present it in a more “human-digestible” form. There are plenty of posts and examples out there about PIVOT, but if you haven’t seen one, here’s a trivial example just so that you are up to speed for the rest of this post.


SQL> select deptno, sum(sal) tot_sal
  2  from   scott.emp
  3  group by deptno
  4  order by 1;

    DEPTNO    TOT_SAL
---------- ----------
        10       8750
        20      10875
        30       9400

3 rows selected.

--
-- which we can flip into a single row with three columns
--

SQL>
SQL> select *
  2  from   (select deptno, sal
  3          from   scott.emp)
  4  pivot  (sum(sal) as tot_sal for (deptno) in (10 as dept10, 20 as dept20, 30 as dept30));

DEPT10_TOT_SAL DEPT20_TOT_SAL DEPT30_TOT_SAL
-------------- -------------- --------------
          8750          10875           9400

More on PIVOT here in the docs.

Like any form of data aggregation, there is a cost in doing a PIVOT, and our customer wanted to offset that cost by putting the results into a materialized view. And as the saying goes… “That is when the fight started” :)

For the purpose of demonstration, let’s assume we run a medical practice and we capture information about doctors’ offices and their patients. Here is my rudimentary data model with some sample data:


SQL> create table patient
  2   ( region      int,
  3     office       int,
  4     patient      int,
  5     some_date date );

Table created.

SQL>
SQL> alter table patient add primary key ( region, office, patient );

Table altered.

SQL>
SQL> insert into patient values (1,1,1,sysdate);

1 row created.

SQL> insert into patient values (1,1,2,sysdate);

1 row created.

SQL> insert into patient values (1,1,3,sysdate);

1 row created.

SQL>
SQL> create table patient_attrib
  2   ( region      int,
  3     office       int,
  4     patient      int,
  5     property      varchar2(10),
  6     val           number);

Table created.

SQL>
SQL>
SQL> alter table patient_attrib add primary key ( region, office, patient, property );

Table altered.

SQL> alter table patient_attrib add constraint patient_attrib_fk
  2  foreign key ( region,office,patient) references patient (region,office,patient);

Table altered.

SQL>
SQL> insert into patient_attrib values (1,1,2,'weight',60);

1 row created.

SQL> insert into patient_attrib values (1,1,2,'height',1);

1 row created.

SQL> insert into patient_attrib values (1,1,2,'bp',2);

1 row created.

SQL> insert into patient_attrib values (1,1,2,'heart',3);

1 row created.

SQL> insert into patient_attrib values (1,1,2,'chol',4);

1 row created.

SQL> insert into patient_attrib values (1,1,2,'fatpct',5);

1 row created.

SQL>
SQL> insert into patient_attrib values (1,1,3,'weight',61);

1 row created.

SQL> insert into patient_attrib values (1,1,3,'height',1.1);

1 row created.

SQL> insert into patient_attrib values (1,1,3,'bp',2.1);

1 row created.

SQL> insert into patient_attrib values (1,1,3,'heart',3.1);

1 row created.

SQL> insert into patient_attrib values (1,1,3,'chol',4.1);

1 row created.

SQL> insert into patient_attrib values (1,1,3,'fatpct',5.1);

1 row created.

We have patients and various measurements about those patients. For reporting purposes, we want to output the patient records in a pivoted style, and hence the PIVOT operator is a natural fit:


SQL> select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_val, fatpct_val
  2  from
  3  (
  4  select h.*, hs.property, hs.val
  5  from   patient h,
  6         patient_attrib hs
  7  where  h.region = hs.region
  8  and    h.office  = hs.office
  9  and    h.patient = hs.patient
 10  )
 11  pivot ( sum(val) as val for ( property ) in ('weight' as weight, 'height' as height, 
 12       'bp' as bp, 'heart' as heart, 'chol' as chol, 'fatpct' as fatpct));

    REGION     OFFICE    PATIENT SOME_DATE WEIGHT_VAL HEIGHT_VAL     BP_VAL  HEART_VAL   CHOL_VAL FATPCT_VAL
---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
         1          1          2 02-OCT-19         60          1          2          3          4          5
         1          1          3 02-OCT-19         61        1.1        2.1        3.1        4.1        5.1

2 rows selected.

As mentioned, as the patient volume increases, we’ll seek to make that output come from a fast refreshable materialized view. To support that, we’ll throw in some materialized view logs in the normal way:


SQL> create materialized view log on patient
  2    with sequence, rowid (region,office,patient,some_date) including new values
  3  /

Materialized view log created.

SQL>
SQL> create materialized view log on patient_attrib
  2    with sequence, rowid (region,office,patient,property, val) including new values
  3  /

Materialized view log created.

Now I’ll take my existing PIVOT query and use that as the source for my materialized view:


SQL> create materialized view MV
  2  refresh fast
  3  -- on commit
  4  as
  5  select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val, fatpct_val
  6  from
  7  (
  8  select h.*, hs.property, hs.val
  9  from   patient h,
 10         patient_attrib hs
 11  where  h.region = hs.region
 12  and    h.office  = hs.office
 13  and    h.patient = hs.patient
 14  )
 15  pivot ( sum(val) as val for ( property ) in ('weight' as weight, 'height' as height, 
 16          'bp' as bp, 'heart' as heart, 'chol' as chol, 'fatpct' as fatpct));
create materialized view MV
*
ERROR at line 1:
ORA-12015: cannot create a fast refresh materialized view from a complex query

Unfortunately for us, PIVOT is a no-go for a fast refresh materialized view. Even if I try to utilize the precreated table “trick” which sometimes can work around this issue, we’re still stuck.


SQL> create table MV as
  2  select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val, fatpct_val
  3  from
  4  (
  5  select h.*, hs.property, hs.val
  6  from   patient h,
  7         patient_attrib hs
  8  where  h.region = hs.region
  9  and    h.office  = hs.office
 10  and    h.patient = hs.patient
 11  )
 12  pivot ( sum(val) as val for ( property ) in ('weight' as weight, 'height' as height, 
 13          'bp' as bp, 'heart' as heart, 'chol' as chol, 'fatpct' as fatpct));

Table created.

SQL> create materialized view MV
  2  on prebuilt table
  3  refresh fast on commit
  4  as
  5  select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val, fatpct_val
  6  from
  7  (
  8  select h.*, hs.property, hs.val
  9  from   patient h,
 10         patient_attrib hs
 11  where  h.region = hs.region
 12  and    h.office  = hs.office
 13  and    h.patient = hs.patient
 14  )
 15  pivot ( sum(val) as val for ( property ) in ('weight' as weight, 'height' as height, 
 16     'bp' as bp, 'heart' as heart, 'chol' as chol, 'fatpct' as fatpct));
create materialized view MV
          *
ERROR at line 1:
ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view

But all is not lost. Before PIVOT arrived in 11g, SQL practitioners had other mechanisms for achieving the same result, albeit with a more verbose and unwieldy syntax, via DECODE:



SQL> select h.region, h.office, h.patient, h.some_date,
  2         sum(decode(hs.property, 'weight', hs.val, 0)) weight_val,
  3         sum(decode(hs.property, 'height', hs.val, 0)) height_val,
  4         sum(decode(hs.property, 'bp', hs.val, 0)) bp_val,
  5         sum(decode(hs.property, 'heart', hs.val, 0)) heart_val,
  6         sum(decode(hs.property, 'chol', hs.val, 0)) chol_Val,
  7         sum(decode(hs.property, 'fatpct', hs.val, 0)) fatpct_val
  8  from   patient h,
  9         patient_attrib hs
 10  where  h.region = hs.region
 11  and    h.office  = hs.office
 12  and    h.patient = hs.patient
 13  group by h.region, h.office, h.patient, h.some_date;

    REGION     OFFICE    PATIENT SOME_DATE WEIGHT_VAL HEIGHT_VAL     BP_VAL  HEART_VAL   CHOL_VAL FATPCT_VAL
---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
         1          1          2 02-OCT-19         60          1          2          3          4          5
         1          1          3 02-OCT-19         61        1.1        2.1        3.1        4.1        5.1

2 rows selected.

Armed with this, and knowing that we need some additional aggregates for fast refresh on commit materialized views, we can achieve our desired result.



SQL> create materialized view MV
  2  refresh fast
  3  on commit
  4  as
  5  select h.region, h.office, h.patient, h.some_date,
  6         count(*) c,
  7         count(decode(hs.property, 'weight', hs.val, 0)) weight_cnt,
  8         count(decode(hs.property, 'height', hs.val, 0)) height_cnt,
  9         count(decode(hs.property, 'bp', hs.val, 0)) bp_cnt,
 10         count(decode(hs.property, 'heart', hs.val, 0)) heart_cnt,
 11         count(decode(hs.property, 'chol', hs.val, 0)) chol_cnt,
 12         count(decode(hs.property, 'fatpct', hs.val, 0)) fatpct_cnt,
 13         sum(decode(hs.property, 'weight', hs.val, 0)) weight_val,
 14         sum(decode(hs.property, 'height', hs.val, 0)) height_val,
 15         sum(decode(hs.property, 'bp', hs.val, 0)) bp_val,
 16         sum(decode(hs.property, 'heart', hs.val, 0)) heart_val,
 17         sum(decode(hs.property, 'chol', hs.val, 0)) chol_Val,
 18         sum(decode(hs.property, 'fatpct', hs.val, 0)) fatpct_val
 19  from   patient h,
 20         patient_attrib hs
 21  where  h.region = hs.region
 22  and    h.office  = hs.office
 23  and    h.patient = hs.patient
 24  group by h.region, h.office, h.patient, h.some_date;

Materialized view created.

Now we can test out the refresh capabilities of the view with some standard DML:



SQL> select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val,fatpct_val
  2  from mv order by 1,2,3;

    REGION     OFFICE    PATIENT SOME_DATE WEIGHT_VAL HEIGHT_VAL     BP_VAL  HEART_VAL   CHOL_VAL FATPCT_VAL
---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
         1          1          2 02-OCT-19         60          1          2          3          4          5
         1          1          3 02-OCT-19         61        1.1        2.1        3.1        4.1        5.1

2 rows selected.

SQL> insert into patient values (1,1,4,sysdate);

1 row created.

SQL>
SQL> insert into patient_attrib values (1,1,4,'weight',62);

1 row created.

SQL> insert into patient_attrib values (1,1,4,'height',1.2);

1 row created.

SQL> insert into patient_attrib values (1,1,4,'bp',2.2);

1 row created.

SQL> insert into patient_attrib values (1,1,4,'heart',3.2);

1 row created.

SQL> insert into patient_attrib values (1,1,4,'chol',4.2);

1 row created.

SQL> insert into patient_attrib values (1,1,4,'fatpct',5.2);

1 row created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val,fatpct_val
  2  from mv order by 1,2,3;

    REGION     OFFICE    PATIENT SOME_DATE WEIGHT_VAL HEIGHT_VAL     BP_VAL  HEART_VAL   CHOL_VAL FATPCT_VAL
---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
         1          1          2 02-OCT-19         60          1          2          3          4          5
         1          1          3 02-OCT-19         61        1.1        2.1        3.1        4.1        5.1
         1          1          4 02-OCT-19         62        1.2        2.2        3.2        4.2        5.2

3 rows selected.

SQL>
SQL> update patient_attrib
  2  set val = 65
  3  where patient = 3
  4  and property = 'weight';

1 row updated.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select region, office, patient, some_date, weight_val, height_val, bp_val, heart_val, chol_Val,fatpct_val
  2  from mv order by 1,2,3;

    REGION     OFFICE    PATIENT SOME_DATE WEIGHT_VAL HEIGHT_VAL     BP_VAL  HEART_VAL   CHOL_VAL FATPCT_VAL
---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
         1          1          2 02-OCT-19         60          1          2          3          4          5
         1          1          3 02-OCT-19         65        1.1        2.1        3.1        4.1        5.1
         1          1          4 02-OCT-19         62        1.2        2.2        3.2        4.2        5.2

3 rows selected.

And, like any materialized view, we can get a report on all of the capabilities available to us via DBMS_MVIEW:



SQL> @?/rdbms/admin/utlxmv.sql

Table created.

SQL> delete mv_capabilities_table;

0 rows deleted.

SQL> EXEC dbms_mview.explain_mview('MV');

PL/SQL procedure successfully completed.

SQL> select * from mv_capabilities_table
  2  @pr
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : PCT
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 1
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_COMPLETE
POSSIBLE                      : Y
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 1002
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_FAST
POSSIBLE                      : Y
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 2003
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REWRITE
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 3004
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : PCT_TABLE
POSSIBLE                      : N
RELATED_TEXT                  : PATIENT
RELATED_NUM                   : 840
MSGNO                         : 2068
MSGTXT                        : relation is not a partitioned table
SEQ                           : 4005
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : PCT_TABLE
POSSIBLE                      : N
RELATED_TEXT                  : PATIENT_ATTRIB
RELATED_NUM                   : 858
MSGNO                         : 2068
MSGTXT                        : relation is not a partitioned table
SEQ                           : 4006
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_FAST_AFTER_INSERT
POSSIBLE                      : Y
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 5007
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_FAST_AFTER_ONETAB_DML
POSSIBLE                      : Y
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 6008
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_FAST_AFTER_ANY_DML
POSSIBLE                      : Y
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         :
MSGTXT                        :
SEQ                           : 7009
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REFRESH_FAST_PCT
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         : 2157
MSGTXT                        : PCT is not possible on any of the detail tables in the materialized view
SEQ                           : 8010
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REWRITE_FULL_TEXT_MATCH
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         : 2159
MSGTXT                        : query rewrite is disabled on the materialized view
SEQ                           : 9011
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REWRITE_PARTIAL_TEXT_MATCH
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         : 2159
MSGTXT                        : query rewrite is disabled on the materialized view
SEQ                           : 10012
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REWRITE_GENERAL
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         : 2159
MSGTXT                        : query rewrite is disabled on the materialized view
SEQ                           : 11013
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : REWRITE_PCT
POSSIBLE                      : N
RELATED_TEXT                  :
RELATED_NUM                   :
MSGNO                         : 2158
MSGTXT                        : general rewrite is not possible or PCT is not possible on any of the detail tables
SEQ                           : 12014
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : PCT_TABLE_REWRITE
POSSIBLE                      : N
RELATED_TEXT                  : PATIENT
RELATED_NUM                   : 840
MSGNO                         : 2068
MSGTXT                        : relation is not a partitioned table
SEQ                           : 13015
==============================
STATEMENT_ID                  :
MVOWNER                       : MCDONAC
MVNAME                        : MV
CAPABILITY_NAME               : PCT_TABLE_REWRITE
POSSIBLE                      : N
RELATED_TEXT                  : PATIENT_ATTRIB
RELATED_NUM                   : 858
MSGNO                         : 2068
MSGTXT                        : relation is not a partitioned table
SEQ                           : 13016

PL/SQL procedure successfully completed.

SQL>
SQL>

ORA-01950 Error on a Sequence - Error on Primary Key Index

Yesterday I posted a blog about an ORA-01950 error on tablespace USERS that appeared to involve a sequence - ORA-01950 Error on a Sequence. I attributed the error to the sequence because that's where Oracle was pointing....[Read More]

Posted by Pete On 01/10/19 At 01:12 PM

How to enlarge an #Exasol database by adding a node

Adding a cluster node will not only increase the available storage capacity but also the total compute power of your cluster. This scale-out is quite a common operation for Exasol customers.

My example shows how to change an existing 2+1 cluster into a 3+0 cluster. Before you can enlarge the database with an active node, this node has to be a reserve node first. See here how to add a reserve to a 2+0 cluster. Of course you can add another reserve node to change from 3+0 to 3+1 afterwards. See here if you wonder why you may want to have a reserve node at all.

Initial state – reserve node is present

I start with a 2+1 cluster – 2 active nodes and 1 reserve node:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge1.png]

For later comparison, let’s look at the distribution of rows of one of my tables:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge2.png]

The rows are roughly evenly distributed across the two active nodes.
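
For reference, a distribution check like the one in the screenshot can be expressed as plain SQL. This is only a sketch: the table name is made up, and IPROC() returning the node a row resides on is my assumption about the Exasol function behind the screenshot.

-- Sketch only: hypothetical table name; IPROC() as per-row node id is assumed
SELECT IPROC()  AS node_id,
       COUNT(*) AS row_cnt
FROM   retail.sales
GROUP  BY IPROC()
ORDER  BY 1;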

Before you continue, it would be a good idea to take a backup on a remote archive volume now – just in case.

Shutdown database before volume modification

A data volume used by a database cannot be modified while that database is up, so shut it down first:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge3.png]

After going to the Storage branch in EXAoperation, click on the data volume:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge4.png]

Then click on Edit:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge5.png]

Decrease volume redundancy to 1

Change the redundancy from 2 to 1, then click Apply:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge6.png]

Why is the redundancy reduced from 2 to 1 here? Let’s try to explain that. Initially, I had 2 active nodes with a volume using redundancy 2:

[Image: https://uhesse.files.wordpress.com/2019/09/cluster.png]

A and B are master segments while A’ and B’ are mirrored segments. If I could add a node to this volume keeping the existing segments, it would look like this:

[Image: https://uhesse.files.wordpress.com/2019/09/badidea.png]

Of course this would be a bad idea. The redundancy is reduced to 1 before the new node is added to the volume:

[Image: https://uhesse.files.wordpress.com/2019/09/redundancy1.png]

At first there are only distributed master segments with no mirrors. Then the redundancy is increased to 2 again:

[Image: https://uhesse.files.wordpress.com/2019/09/redundancy2.png]

This way, every master segment can be mirrored on a neighbor node. That’s why the redundancy needs to be reduced to 1.

Add new node to volume

After having decreased the volume redundancy to 1, click Edit on the volume detail page again, add n13 as a new master node to the volume, and click Apply:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge7.png]

Increase redundancy to 2

Now click Edit again and increase the redundancy to 2:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge8.png]

The state of the volume now shows as RECOVERING – don’t worry, it just means that the mirrored segments are being created.

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge9.png]

Enlarge the database

Now click on the database link on the EXASolution screen:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge10.png]

Select the Action Enlarge and click Submit:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge11.png]

Enter 1 and click Apply:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge12.png]

The database detail page looks like this now:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge13.png]

Reorganize

Technically, this is a 3+0 cluster now – but the third node doesn’t contain any data yet. If we look at the same table as before, we see that no rows are on the new node:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge2.png]

To change that, a REORGANIZE needs to be done – either on the database layer, the schema layer or the table layer. The easiest to perform is REORGANIZE DATABASE:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge14.png]

It took me about 10 minutes on my tiny database. That command re-distributes every table across all cluster nodes and can be time-consuming with high data volumes. While a table is being reorganized, it is locked against DML. You can monitor the ongoing reorganization by selecting from EXA_DBA_PROFILE_RUNNING in another session – see the sketch below.
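
For completeness, here is roughly what that looks like as SQL rather than screenshots. A minimal sketch: REORGANIZE DATABASE and EXA_DBA_PROFILE_RUNNING come from the text above; the schema- and table-level variants in the comment are my assumption about how the narrower scopes are spelled.

-- Redistribute all tables across the enlarged cluster
-- (narrower scopes assumed: REORGANIZE SCHEMA my_schema; REORGANIZE TABLE my_table;)
REORGANIZE DATABASE;

-- In a second session: watch the progress of the ongoing reorganization
SELECT * FROM EXA_DBA_PROFILE_RUNNING;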

Final state

Let’s check the distribution of the previous table again:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge15.png]

As you can see above, there are now rows on the added node. EXAoperation also confirms that the new node is no longer empty:

[Image: https://uhesse.files.wordpress.com/2019/09/enlarge16.png]

On a larger database you would see that the volume usage per node is lower than before and that every node holds roughly the same amount of data. For fail safety, you could add another reserve node now.

Summary of steps

  1. Add a reserve node (if not yet existing)
  2. Take a backup on a remote archive volume
  3. Shutdown database
  4. Decrease volume redundancy to 1
  5. Add former reserve node as new master node to the volume
  6. Increase redundancy to 2
  7. Enlarge database by 1 active node
  8. Reorganize
  9. Add another reserve node (optional)

ORA-01950 Error on a Sequence

UPDATE: I have updated information for this post and, rather than make this one much longer, I created a new post - please see ORA-01950 Error on a Sequence - Error on Primary Key Index. Wow, it's been a while....[Read More]

Posted by Pete On 30/09/19 At 01:42 PM

Negative Offload

At the Trivadis Performance Days 2019 I did a presentation on using execution plans to understand what a query was doing. One of the examples I showed was a plan from an Exadata system (using 11.2.0.4) that needed to go faster. The plan was from the SQL Monitor report and all I want to show you is one line that’s reporting a tablescan. To fit the screen comfortably I’ve removed a number of columns from the output.

The report had been generated while the statement was still running (hence the “->” at the left-hand edge) and the query had scanned 166 segments (with no partition elimination) of a table with 4,500 data segments (450 range partitions and 10 hash sub-partitions – note the design error, by the way: the number of hash (sub)partitions in Oracle should always be a power of 2).


SQL Plan Monitoring Details (Plan Hash Value=3764612084)  
============================================================================================================================================
| Id   |           Operation            | Name  | Read  | Read  | Write | Write |   Cell   | Mem  | Activity |       Activity Detail       |  
|      |                                |       | Reqs  | Bytes | Reqs  | Bytes | Offload  |      |   (%)    |         (# samples)         |   
============================================================================================================================================
| -> 5 |      TABLE ACCESS STORAGE FULL | TXN   |  972K | 235GB |       |       | -203.03% |   7M |    63.43 | Cpu (1303)                  | 
|      |                                |       |       |       |       |       |          |      |          | cell smart table scan (175) | 
============================================================================================================================================

In the presentation I pointed out that for a “cell smart table scan” (note the Activity Detail column) this line was using a surprisingly large amount of CPU.

We had been told that the table was using hybrid columnar compression (HCC) and had been given some figures that showed the compression factor was slightly better than 4. I had also pointed out that the typical size of a read request was 256KB. (Compare Read Reqs with Read Bytes)

To explain the excessive CPU I claimed that we were seeing “double decompression” – the cell was decompressing (uncompressing) compression units (CUs), finding that the resulting decompressed data was larger than the 1MB unit that Exadata allows and sending the original compressed CU to the database server where it was decompressed again – and the server side decompression was burning up the CPU.

This claim is (almost certainly) true – but the justification I gave for the claim was at best incomplete (though, to be brutally honest, I have to admit that I’d made a mistake): I pointed out that the Cell Offload was negative 200% and that this was what told us about the double decompression. While double decompression was probably happening, the implication I had made was that a negative offload automatically indicated double decompression – and that was an incorrect assumption on my part. Fortunately Maurice Müller caught up with me after the session was over, pointed out the error, and then emailed me a link to a relevant article by Ahmed Aangour.

The Cell Offload is a measure of the difference between the volume of data read and the volume of data returned to the server. If the cell reads 256KB from disc, but the column and row selection means the cell returns 128KB, the Cell Offload would be 50%; if the cell returns 64KB the Cell Offload would be 75% (100 * (1 – 64KB/256KB)). But what if you select all the rows and columns from a compressed table? The volume of data after decompression would be larger than the compressed volume the cell had read from disc – and in this case we knew that we were reading 256KB at a time and the compression factor was slightly greater than 4, so the uncompressed data would probably be around 1MB, giving us a Cell Offload of 100 * (1 – 1024KB / 256KB) = negative 300%.

Key Point: Any time that decompression, combined with the row and column selection, produces more data than the volume of data read from disc the Cell Offload will go negative. A negative Cell Offload is not inherently a problem (though it might hint at a suboptimal use of compression).
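
If you want a rough feel for the figure outside the SQL Monitor report, V$SQL exposes the underlying byte counts. The query below is only a sketch: the assumption that the percentage is roughly 100 * (1 – interconnect bytes / offload-eligible bytes) is mine, not a statement of how SQL Monitor actually computes it, and &sql_id is just a placeholder.

-- Back-of-envelope offload figure for one statement (assumed formula)
SELECT sql_id,
       io_cell_offload_eligible_bytes / POWER(1024, 3) AS eligible_gb,
       io_interconnect_bytes          / POWER(1024, 3) AS interconnect_gb,
       ROUND(100 * (1 - io_interconnect_bytes /
                        NULLIF(io_cell_offload_eligible_bytes, 0)), 2) AS cell_offload_pct
FROM   v$sql
WHERE  sql_id = '&sql_id';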

Follow-up Analysis

Despite the error in my initial understanding, the claim that we were seeing double decompression was still (almost certainly) true – but we need to be a little more sophisticated in the analysis. The clue is in the arithmetic a few lines further up the page. We can see that we are basically reading 256KB chunks of the table, and we know that 256KB will expand to roughly 1MB, so we ought to see a Cell Offload of about -300%; but the Cell Offload is -200%. This suggests fairly strongly that on some of the reads the decompressed data is slightly less than 1MB, which allows the cell to return the decompressed data to the database server, while some of the time the decompressed data is greater than 1MB, forcing the cell to send the original (compressed) CU to the database server.

We may even be able to work the arithmetic backwards to estimate the number of times double decompression appeared. Assume that two-thirds of the time the cell decompressed the data and successfully sent (just less than) 1MB back to the database server, while one-third of the time it found the result was too large and sent the 256KB of compressed data back to the server instead. Working with the 972,000 read requests reported, here's what drops out of the arithmetic (there's a one-line check of the numbers after the list):

  • Total data read: 972,000 * 256KB = 243,000 MB
  • Data sent to db server:  648,000 * 1MB + 324,000 * 256KB = 729,000 MB
  • Cell Offload = 100 * (1 – 729/243) = -200%   Q.E.D.
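
And the one-line check of those numbers – pure arithmetic, nothing Exadata-specific:

-- 648,000 requests returning ~1MB each plus 324,000 returning the 256KB CU,
-- against 972,000 * 256KB read from disk: should come out at -200
SELECT ROUND(100 * (1 - (648000 * 1024 + 324000 * 256) / (972000 * 256)))
         AS cell_offload_pct
FROM   dual;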

Of course it would be nice to avoid guessing – and if we were able to check the session activity stats (v$sesstat) while the query was running (or after it had completed) we could pick up several numbers that confirmed our suspicion. For 11.2.0.4, for example, we would keep an eye on:

	cell CUs sent uncompressed
	cell CUs processed for uncompressed
	EHCC {class} CUs Decompressed

Differences between these stats allow you to work out the number of compression units that failed the 1MB test on the cell server and were sent to the database server to be decompressed. There is actually another statistic named “cell CUs sent compressed” which would make life easy for us, but I’ve not seen it populated in my tests – so maybe it doesn’t mean what it seems to say.
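
A minimal sketch of how you might pull those numbers for the current session – just the standard v$mystat / v$statname join, with LIKE patterns that cover the statistics listed above:

-- Current-session values of the CU-related statistics
SELECT sn.name, ms.value
FROM   v$mystat   ms,
       v$statname sn
WHERE  sn.statistic# = ms.statistic#
AND    (   sn.name LIKE 'cell CUs%'
        OR sn.name LIKE 'EHCC%CUs Decompressed')
ORDER  BY sn.name;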

Here’s an example from an 11.2.0.4 system that I presented a few years ago showing some sample numbers.

cell CUs sent uncompressed              5,601
cell CUs processed for uncompressed     5,601

EHCC CUs Decompressed                  17,903
EHCC Query High CUs Decompressed       12,302 

This reveals an annoying feature of 11g (continued in 12.1) that results in double counting of the statistics, confusing the issue when you’re trying to analyze what’s going on. In this case the table consisted of 12,302 compression units, and the query was engineered to cause the performance problem to appear. The first two statistics show us how many CUs were decompressed successfully (we’ll see a change appearing there in 12.1). We then see that all 12,302 of the table’s “query high” compression units were decompressed – but the “total” of all CUs decompressed was 17,903.

It’s not a coincidence that 12,302 + 5,601 = 17,903; there’s some double counting going on. I don’t know how many of the statistics are affected in this way, but Oracle has counted the CUs that passed decompression once as they were processed at the cell server and again as they arrived at the database server. In this example we can infer that 12,302 – 5,601 = 6,701 compression units failed decompression at the cell server and were sent to the database server in compressed form to be decompressed again.

Here are a couple of sets of figures from some similar tests run on 12.1.0.2 – one with a table compressed to query high, the other compressed to query low. There is one critical difference from the 11g figures, but the same double-counting seems to have happened. In both cases the “EHCC Query [Low|High] CUs Decompressed” statistic shows the correct number of CUs in each table. Note, though, that “cell CUs processed for uncompressed” in 12.1 appears to report the number of attempted decompressions rather than 11g’s number of successful decompressions.


=========================

cell CUs sent uncompressed                     19,561	-- successful decompressions at cell server
cell CUs processed for uncompressed            19,564	=> 3 failures

EHCC CUs Decompressed                          39,125	=  2 * 19,561 successes + 3 db server decompression
EHCC Query High CUs Decompressed               19,564

=========================

cell CUs sent uncompressed                     80,037	-- successful decompressions at cell server
cell CUs processed for uncompressed            82,178	=> 2,141 failures

EHCC CUs Decompressed                         162,215	=  2 * 80,037 successes + 2,141 db server decompressions
EHCC Query Low CUs Decompressed                82,178

=========================

I’ve annotated the figures to explain the arithmetic.

There has been some significant renaming and separation of statistics in 12.2, as described in this post by Roger MacNicol, and the problems of double-counting should have disappeared. I haven’t yet tested my old models in the latest versions of Oracle, though, so can’t show you any figures to demonstrate the change.

Takeaways

There are 4 key points to note in this posting.

  • Hash (sub)partitioning should be based on powers of 2, otherwise some partitions will be twice the size of others.
  • There is a 1MB limit on the “data packet” sent between the cell server and database server in Exadata.
  • If you select a large fraction of the rows and columns from an HCC compressed table you may end up decompressing a lot of your data twice if the decompressed data for a read request is larger than the 1MB unit (and the cost will be highly visible at the database server as CPU usage).
  • The Cell Offload figure for a tablescan (in particular) will go negative if the volume of data sent from the cell server to the database server is larger than the volume of data read from disk – even if double decompression hasn’t been happening.

A little corollary to the third point: if you are writing to a staging table with the expectation of doing an unfiltered tablescan (or a select *), then you probably don’t want to use hybrid columnar compression on the table, as you will probably end up using a lot of CPU at the database server to compress it and then even more CPU on the database server for the double decompression when you read it back. It’s only if you really need to minimise disk usage and have lots of CPU capacity to spare that you have a case for using hybrid columnar compression for the table (and Oracle In-Memory features may also change the degree of desirability).
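
To make the corollary concrete, here is a hedged sketch – the table names are made up for illustration, and HCC itself is only available on Exadata and other supported Oracle storage:

-- Staging table that will be scanned in full: skip HCC
CREATE TABLE stage_txn NOCOMPRESS
AS SELECT * FROM txn WHERE 1 = 0;

-- Rarely-read archive copy where the disk saving matters: HCC may be worth it
CREATE TABLE txn_archive COMPRESS FOR QUERY HIGH
AS SELECT * FROM txn WHERE 1 = 0;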

Footnote

I haven’t said anything about accessing table data by index when the table is subject to HCC compression. I haven’t tested the mechanism in recent versions of Oracle, but it used to be the case that the cell server would supply the whole compression unit (CU) to the database server, which would decompress it to construct the relevant row. One side effect of this was that the same CU could be decompressed (with a high CPU load) many times in the course of a single query.