Multitenant, PDB, ‘save state’, services and standby databases

Creating – and using – your own services has always been the recommendation. You can connect to a database without a service name, through the instance SID, but this is not what you should do. Each database registers its db_unique_name as a service, and you can use it to connect, but it is always better to create your own application service(s). In multitenant, each PDB registers its name as a service, but the recommendation is still there: create your own services, and connect with your services.
I’ll show in this blog post what happens if you use the PDB name as a service and the standby database registers to the same listener as the primary database. Of course, you can work around the non-unique service names by registering to different listeners. But this just hides the problem. The main reason to use services is to be independent of physical attributes, so being forced to assign a specific TCP/IP port is no better than using an instance SID.
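
For illustration, here is roughly what the client side looks like when you follow that recommendation (the host name and the pdb1_app service name are placeholders, not from this environment):

PDB1_APP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = pdb1_app))
  )

or, in EZConnect style, sqlplus user/password@//myhost:1521/pdb1_app. The client references only the logical service name, so where that service runs can change without touching the client configuration.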

I have the primary (CDB1) and standby (CDB2) databases registered to the default local listener:

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 03-FEB-2018 23:11:23
 
Copyright (c) 1991, 2016, Oracle. All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date 02-FEB-2018 09:32:30
Uptime 1 days 13 hr. 38 min. 52 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/VM122/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=VM122)(PORT=5501))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "627f7512a0452fd4e0537a38a8c055c0" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_CFG" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB1_DGB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_DGMGRL" has 1 instance(s).
Instance "CDB1", status UNKNOWN, has 1 handler(s) for this service...
Service "CDB2" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2XDB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2_DGB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2_DGMGRL" has 1 instance(s).
Instance "CDB2", status UNKNOWN, has 1 handler(s) for this service...
Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
The command completed successfully

Look at service ‘pdb1’, which is the name of my PDB. Connecting to //localhost:1521/PDB1 can connect you randomly to CDB1 (the primary database) or CDB2 (the standby database).

Here is an example, connecting several times to the PDB1 service:

[oracle@VM122 ~]$ for i in {1..5} ; do sqlplus -L -s sys/oracle@//localhost/pdb1 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1

I was connected at random to CDB1 or CDB2.

As an administrator, you know the instance names and you can connect to the one you want with: //localhost:1521/PDB1/CDB1 or //localhost:1521/PDB1/CDB2:

[oracle@VM122 ~]$ for i in {1..3} ; do sqlplus -L -s sys/oracle@//localhost/pdb1/CDB1 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
[oracle@VM122 ~]$ for i in {1..3} ; do sqlplus -L -s sys/oracle@//localhost/pdb1/CDB2 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2

Of course, this is not what you want. And you must not start or stop the default services. For the application, the best you can do is to create your own service. And if you want to be able to connect to the Active Data Guard standby, which is opened read-only, then you can create a ‘read-write’ service and a ‘read-only’ service that you start depending on the role.

Create and Start a read-write service on the primary

This example supposes that you have only the Oracle Database software installed. If you are in RAC, with the resources managed by Grid Infrastructure, or simply with Oracle Restart, creating a service is easy with srvctl: you attach it to a PDB with ‘-pdb’, and with ‘-role’ it starts automatically on the primary or on the standby, as sketched below.
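
For illustration only, the Grid Infrastructure variant might look like the following (service names from this post; verify the flags with ‘srvctl add service -help’ for your version):

srvctl add service -db CDB1 -service pdb1_RW -pdb PDB1 -role PRIMARY
srvctl add service -db CDB1 -service pdb1_RO -pdb PDB1 -role PHYSICAL_STANDBY
srvctl start service -db CDB1 -service pdb1_RW

Without Grid Infrastructure or Oracle Restart, you use dbms_service: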

SQL> connect /@CDB1 as sysdba
Connected.
 
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.create_service(service_name=>'pdb1_RW',network_name=>'pdb1_RW');
PL/SQL procedure successfully completed.
 
SQL> exec dbms_service.start_service(service_name=>'pdb1_RW');
PL/SQL procedure successfully completed.
 
SQL> alter session set container=cdb$root;
Session altered.

The service is created and stored in SERVICE$, visible through DBA_SERVICES:

SQL> select name,name_hash,network_name,creation_date,pdb from cdb_services order by con_id,service_id;
NAME NAME_HASH NETWORK_NAME CREATION_DATE PDB
---- --------- ------------ ------------- ---
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 PDB1
pdb1 1888881990 pdb1 11-JAN-18 PDB1

Save state

I have created and started the PDB1_RW service. However, if I restart the database, the service will not start automatically. How do you ensure that the PDB1 pluggable database starts automatically when you open the CDB? You ‘save state’ when it is opened. It is the same for the services you create. You need to ‘save state’ when they are opened.


SQL> alter pluggable database all save state;
Pluggable database ALL altered.

The information is stored in PDB_SVC_STATE$, and I’m not aware of a dictionary view on it:

SQL> select name,name_hash,network_name,creation_date,con_id from v$active_services order by con_id,service_id;
 
NAME NAME_HASH NETWORK_NAME CREATION_DATE CON_ID
---- --------- ------------ ------------- ------
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 4
pdb1 1888881990 pdb1 11-JAN-18 4
 
SQL> select * from containers(pdb_svc_state$);
 
INST_ID INST_NAME PDB_GUID PDB_UID SVC_HASH SPARE1 SPARE2 SPARE3 SPARE4 SPARE5 SPARE6 CON_ID
------- --------- -------- ------- -------- ------ ------ ------ ------ ------ ------ ------
1 CDB1 627F7512A0452FD4E0537A38A8C055C0 2872139986 3128030313 1

The service name is not in this table; you have to join with v$services using (name_hash):

SQL> select name,name_hash,network_name,creation_date,con_id from v$active_services order by con_id,service_id;
 
NAME NAME_HASH NETWORK_NAME CREATION_DATE CON_ID
---- --------- ------------ ------------- ------
SYS$BACKGROUND 165959219 26-JAN-17 1
SYS$USERS 3427055676 26-JAN-17 1
CDB1_CFG 1053205690 CDB1_CFG 24-JAN-18 1
CDB1_DGB 184049617 CDB1_DGB 24-JAN-18 1
CDB1XDB 1202503288 CDB1XDB 11-JAN-18 1
CDB1 1837598021 CDB1 11-JAN-18 1
pdb1 1888881990 pdb1 11-JAN-18 4
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 4

So, in addition to storing the PDB state in PDBSTATE$, visible with dba_pdb_saved_states, the service state is also stored. Note that they are at different levels. PDBSTATE$ is a data link: stored on CDB$ROOT only (because the data must be read before opening the PDB), but PDB_SVC_STATE$ is a local table in the PDB, as the services can be started only when the PDB is opened.
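
The saved PDB state, on the other hand, does have a documented view; a quick check looks like this:

SQL> select con_name, instance_name, state from dba_pdb_saved_states;

To my knowledge there is no equivalent view over PDB_SVC_STATE$, hence the join on name_hash shown above.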

This new service is immediately registered on CDB1:

Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

Create and Start a read-only service for the standby

If you try to do the same on the standby for a PDB1_RO service, you cannot because service information has to be stored in the dictionary:

SQL> exec dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO');
 
Error starting at line : 56 File @ /media/sf_share/122/blogs/pdb_svc_standby.sql
In command -
BEGIN dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO'); END;
Error report -
ORA-16000: database or pluggable database open for read-only access

So, the read-only service has to be created on the primary:

SQL> connect /@CDB1 as sysdba
Connected.
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO');
 
SQL> select name,name_hash,network_name,creation_date,pdb from cdb_services order by con_id,service_id;
NAME NAME_HASH NETWORK_NAME CREATION_DATE PDB
---- --------- ------------ ------------- ---
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 PDB1
pdb1_RO 1562179816 pdb1_RO 03-FEB-18 PDB1
pdb1 1888881990 pdb1 11-JAN-18 PDB1

The SERVICE$ dictionary table is replicated to the standby, so I can start the service on the standby:

SQL> connect /@CDB2 as sysdba
Connected.
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.start_service(service_name=>'pdb1_RO');
PL/SQL procedure successfully completed.

Here is what is registered to the listener:

Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RO" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

Now, PDB1_RO connects to the standby and PDB1_RW to the primary. Perfect.

No ‘save state’ on the standby

At this point, you would like to have PDB1_RO started when PDB1 is opened on the standby, but ‘save state’ is impossible on a read-only database:

SQL> alter session set container=cdb$root;
Session altered.
 
SQL> alter pluggable database all save state;
 
Error starting at line : 84 File @ /media/sf_share/122/blogs/pdb_svc_standby.sql
In command -
alter pluggable database all save state
Error report -
ORA-16000: database or pluggable database open for read-only access

You can’t manage the state (open the PDB, start the services) in the standby database.

The primary ‘save state’ is replicated to the standby

For the moment, everything is ok with my services:

Service "pdb1_RO" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...

If I restart the primary CDB1, everything is ok again because I saved the state of the PDB and the service. But what happens when the standby CDB2 restarts?


SQL> connect /@CDB2 as sysdba
Connected.
SQL> startup force;
...
SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
4 PDB1 MOUNTED

The PDB is not opened: the saved state for the PDB is not read on the standby.
However, when I open the PDB, it seems that the saved state for the service is applied, and this one is replicated from the primary:

SQL> alter pluggable database PDB1 open;
Pluggable database altered.
SQL> host lsnrctl status
...
Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
The command completed successfully

My PDB1_RW is registered for both, connections will connect at random to the primary or the standby, and then the transactions will fail half of the time. It would be the same in case of a switchover. This is not correct.

Save state instances=()

What I would like is the possibility to save state for a specific DB_UNIQUE_NAME, as with PDB ‘spfile’ parameters. But this is not possible. What is possible is to mention an instance, but you can use it only for the instance where you save the state (or you get ORA-65110: Invalid instance name specified) and, anyway, this would not be correct after a switchover.
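
For reference, here is a sketch of that instance-qualified syntax, with the instance name from this environment (remember, it is only accepted for the instance where you run it):

SQL> alter pluggable database PDB1 save state instances=('CDB1');

Even when accepted, the saved state follows the instance name and not the database role, which is why it cannot help across a switchover.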

So what?

Be careful with services and ensure that the services used by the application are registered only for the correct instance. Be sure that this persists when the instances are restarted. For this you must link a service name to a database role. This cannot be done correctly with ‘save state’. You can use startup triggers (sketched below), or better, Grid Infrastructure service resources.
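
As an illustration of the startup-trigger approach, here is a minimal sketch. The trigger name is mine, it re-uses the pdb1_RW/pdb1_RO services from this post, and it assumes that a startup trigger created inside the PDB fires when that PDB is opened:

create or replace trigger manage_role_services
after startup on database
declare
  v_role varchar2(30);
begin
  -- start the service matching the current Data Guard role
  select database_role into v_role from v$database;
  if v_role = 'PRIMARY' then
    dbms_service.start_service('pdb1_RW');
  else
    dbms_service.start_service('pdb1_RO');
  end if;
end;
/

Starting a service writes nothing to the dictionary, which is why this also works on an Active Data Guard standby opened read-only, as we did manually above.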

Do not connect to the default service with the PDB name: you cannot remove it and cannot stop it, so you may have the same name for different instances in a Data Guard configuration. You can register the standby instances to different local listeners to avoid the confusion, but you may still register to the same SCAN listener.

Create your own services, start them depending on the database role, and do not use ‘save state’ in a physical standby configuration.

 

The article Multitenant, PDB, ‘save state’, services and standby databases appeared first on Blog dbi services.

A look into Oracle redo, part 2: the discovery of the KCRFA structure

This is the second post in a series of blogposts on Oracle database redo internals. If you landed on this blogpost without having read the first one, here is a link to it: https://fritshoogland.wordpress.com/2018/01/29/a-look-into-oracle-redo-part-1-redo-allocation-latches/ The first blogpost contains all the versions used and a synopsis of the purpose of this series of blogposts.

In the first part, I showed how the principal access to the public redo strands is controlled by redo allocation latches, and showed a snippet of trace information of memory accesses of a foreground session when using the first public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf3bc(shared pool|permanent memor,duration 1,cls perm+16114620 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf3d8(shared pool|permanent memor,duration 1,cls perm+16114648 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf430(shared pool|permanent memor,duration 1,cls perm+16114736 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3d0(shared pool|permanent memor,duration 1,cls perm+16114640 ):R:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11312:0x0000000076fdf428(shared pool|permanent memor,duration 1,cls perm+16114728 ):W:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3c8(shared pool|permanent memor,duration 1,cls perm+16114632 ):R:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11322:0x0000000076fdf420(shared pool|permanent memor,duration 1,cls perm+16114720 ):W:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11327:0x0000000076fdf3c0(shared pool|permanent memor,duration 1,cls perm+16114624 ):R:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11332:0x0000000076fdf418(shared pool|permanent memor,duration 1,cls perm+16114712 ):W:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11337:0x0000000076fdf3b8(shared pool|permanent memor,duration 1,cls perm+16114616 ):R:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11342:0x0000000076fdf410(shared pool|permanent memor,duration 1,cls perm+16114704 ):W:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11347:0x0000000076fdf3b0(shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf408(shared pool|permanent memor,duration 1,cls perm+16114696 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:8:0x7270/29296()
kcrfw_redo_gen_ext+11312:0x0000000076fdf400(shared pool|permanent memor,duration 1,cls perm+16114688 ):W:8:0x7270/29296()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3a0(shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf398(shared pool|permanent memor,duration 1,cls perm+16114584 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf3f0(shared pool|permanent memor,duration 1,cls perm+16114672 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf390(shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11342:0x0000000076fdf3e8(shared pool|permanent memor,duration 1,cls perm+16114664 ):W:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11347:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11352:0x0000000076fdf3e0(shared pool|permanent memor,duration 1,cls perm+16114656 ):W:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11366:0x0000000076fdf438(shared pool|permanent memor,duration 1,cls perm+16114744 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x7270/29296()
kcrfw_redo_gen_ext+11386:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):W:4:0x7271/29297()
kcrfw_redo_gen_ext+11392:0x0000000076fdf468(shared pool|permanent memor,duration 1,cls perm+16114792 ):R:8:0xe0000000/3758096384(Redo Buffers(pgsz:2048k)|+134217728 )

And a snippet of trace information of memory accesses of a foreground session when using the second public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf4e4(shared pool|permanent memor,duration 1,cls perm+16114916 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf500(shared pool|permanent memor,duration 1,cls perm+16114944 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf558(shared pool|permanent memor,duration 1,cls perm+16115032 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4f8(shared pool|permanent memor,duration 1,cls perm+16114936 ):R:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11312:0x0000000076fdf550(shared pool|permanent memor,duration 1,cls perm+16115024 ):W:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4f0(shared pool|permanent memor,duration 1,cls perm+16114928 ):R:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11322:0x0000000076fdf548(shared pool|permanent memor,duration 1,cls perm+16115016 ):W:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11327:0x0000000076fdf4e8(shared pool|permanent memor,duration 1,cls perm+16114920 ):R:8:0x800/2048()
kcrfw_redo_gen_ext+11332:0x0000000076fdf540(shared pool|permanent memor,duration 1,cls perm+16115008 ):W:8:0x800/2048()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4e0(shared pool|permanent memor,duration 1,cls perm+16114912 ):R:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11342:0x0000000076fdf538(shared pool|permanent memor,duration 1,cls perm+16115000 ):W:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4d8(shared pool|permanent memor,duration 1,cls perm+16114904 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf530(shared pool|permanent memor,duration 1,cls perm+16114992 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:8:0xe6/230()
kcrfw_redo_gen_ext+11312:0x0000000076fdf528(shared pool|permanent memor,duration 1,cls perm+16114984 ):W:8:0xe6/230()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4c8(shared pool|permanent memor,duration 1,cls perm+16114888 ):R:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf4c0(shared pool|permanent memor,duration 1,cls perm+16114880 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf518(shared pool|permanent memor,duration 1,cls perm+16114968 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4b8(shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11342:0x0000000076fdf510(shared pool|permanent memor,duration 1,cls perm+16114960 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4b0(shared pool|permanent memor,duration 1,cls perm+16114864 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11352:0x0000000076fdf508(shared pool|permanent memor,duration 1,cls perm+16114952 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11366:0x0000000076fdf560(shared pool|permanent memor,duration 1,cls perm+16115040 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0xe6/230()
kcrfw_redo_gen_ext+11386:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):W:4:0xe7/231()
kcrfw_redo_gen_ext+11392:0x0000000076fdf590(shared pool|permanent memor,duration 1,cls perm+16115088 ):R:8:0xe0200000/3760193536(Redo Buffers(pgsz:2048k)|+136314880 )

If you pay really close attention, you see that the memory locations moved from 0x76fdf3?? / 0x76fdf4?? to 0x76fdf4?? / 0x76fdf5??, all in permanent memory in the shared pool. Would this mean information for each strand is kept in a struct per strand? Let’s take a memory access at a specific offset in the kcrfw_redo_gen_ext function, take the access at exactly the same offset in kcrfw_redo_gen_ext for the other strand (which means it’s probably doing the same thing), and calculate the difference between the memory addresses:

strand 0
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8
strand 1
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8

Now take the memory addresses, and calculate the difference: 0x76fdf520 – 0x76fdf3f8 = 0x128 = 296

Okay, so would the size of the probable struct per strand be 296 bytes? Another question: where would this potential struct start?

Often, the location of structs is fetched from another location, just like we saw earlier for the redo allocation latch, especially if they are more dynamically created than fixed SGA structures, which are not dynamic by nature. Upon close examination of similarities between both pina traces, I found usage of a pointer in the kcrfsg_ struct at offset +0:

kcrfw_redo_gen_ext+14362:0x0000000060016228(fixed sga|var:kcrfsg_+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )

The difference between the memory address I took and the pointer address in kcrfsg_ at offset 0 for the first public redo strand is:
0x76fdf3f8 – 0x76fdf388 = 0x70 = 112
The difference between the memory address I took for the second public redo strand and the pointer in kcrfsg_ at offset 0 is:
0x76fdf520 – 0x76fdf388 = 0x198 = 408
Then subtract the difference calculated earlier between the memory locations of the first and second public redo strands:
408 – 296 = 112
Bingo!
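
So the per-strand struct address seems to follow a simple formula: the pointer at kcrfsg_+0, plus the strand number times 296. Purely to illustrate the arithmetic in SQL:

SQL> select to_char(to_number('76fdf388','xxxxxxxxxx') + 1*296,'fmxxxxxxxxxx') strand_1_struct from dual;

This returns 76fdf4b0, the start of the presumed struct for the second public redo strand; it is consistent with both traces, since 0x76fdf4e4 sits at the same offset (+0x34) from 0x76fdf4b0 as 0x76fdf3bc does from 0x76fdf388.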

It would be really helpful to have some kind of explanation about this structure… I searched through My Oracle Support (very seldom does the regular knowledge base provide internals information; the best chance for internals is the bug database, the rest is… well, let me say: not that helpful. I also hate that other tabs are enabled if nothing is found), oraus.msg, oradebug dumplist and oradebug doc component for any tracing or dumps that could give more information, but nothing seemed to point to any explanation. I dumped the shared pool, but I found the same information (the memory location being permanent memory) as I obtained from x$ksmsp. Also, since the pinatrace annotation takes care of any x$ view that has its source address in the SGA, it seems there is no way other than to puzzle out all the individual locations in what I assume is a structure that holds per-strand metadata.

I decided to go through the x$ views one more time, and found an x$ view that I already knew of, which points to both ‘strands’ and ‘kcrf’: x$kcrfstrand. However, the reason it is not annotated is that this x$ view has its addresses in the PGA:

SQL> select addr from x$kcrfstrand;

ADDR
----------------
00007F14CE31AE00
00007F14CE31AE00

This means that when querying this x$ view, the results are gathered from wherever they come from (the source can be file headers, like redo (x$kccle) and controlfile (x$kcccf) for example; in this case we will see that the actual source of x$kcrfstrand is the SGA), stored temporarily in the PGA, potentially manipulated, and then shown from the PGA, as we can see from the addresses (the addr field in x$kcrfstrand).

However, what I can do to investigate where x$kcrfstrand takes its information from is run the pina (memory access) trace on the execution of ‘select * from x$kcrfstrand’, and try to see where it takes its data from. This is how the relevant part looks:

qerfxFetch+1563:0x00007ffee590a638():W:8:0x10ea8781/283805569()
kcrf_strand_ftcb+2:0x00007ffee590a630():W:8:0x7ffee590b130/140732749885744()
kcrf_strand_ftcb+10:0x00007ffee590a620():W:8:0x7f7780569f40/140151230996288()
kcrf_strand_ftcb+23:0x00000000600169b8(Fixed Size(pgsz:2048k)|+92600 fixed sga|var:kcrf_max_strands_+0 ):R:4:0x2/2()
kcrf_strand_ftcb+29:0x00000000600169c8(Fixed Size(pgsz:2048k)|+92616 fixed sga|var:kcrf_actv_pvt_strands_+0 ):R:4:0/0()
kcrf_strand_ftcb+35:0x00007f7780448558():R:4:0/0()
kcrf_strand_ftcb+52:0x0000000060016228(Fixed Size(pgsz:2048k)|+90664 fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrf_strand_ftcb+64:0x00007ffee590a618():W:8:0xcf7cb25/217565989()
_intel_fast_memcpy+7:0x00000000157f62a0():R:8:0x137fff/1277951()
__intel_ssse3_rep_memcpy+160:0x00000000157b9488():R:4:0x8000/32768()
__intel_ssse3_rep_memcpy+184:0x0000000076fdf4a0(shared pool|permanent memor,duration 1,cls perm+16114848 ):R:16:0x00000000200040000/8590196736(??)
__intel_ssse3_rep_memcpy+3438:0x00007f7780448670():W:16:0x0000ec00000000000/4151755906482176()
...
__intel_ssse3_rep_memcpy+9254:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:16:0xfe56e00000fe56e00000/-2305841916834480128()
__intel_ssse3_rep_memcpy+9258:0x00007f7780448560():W:16:0xfe56e00000fe56e00000/-2305841916834480128()

Going from top to bottom through this snippet:
– qerfxFetch is the query execute rowsource function to query x$ tables; kcrf_strand_ftcb is a helper function (kernel cache recovery file strand fetch copy buffer is my guess) to fetch the kcrf strand data from the shared pool structure; the other functions are memory copy functions.
– Interestingly, the kcrf_strand_ftcb function reads offset 0 of kcrfsg_, just as we have seen before. This is quite logical if you think about it: this struct is dynamically allocated in the shared pool, which means it doesn’t have a fixed address, so the pointer needs to be looked up every time something must be read from the struct.
– The next functions are memcpy functions; I’ve shown the relevant ones. The first one (_intel_fast_memcpy) probably is a helper function that switches to an Intel-optimised memcpy, __intel_ssse3_rep_memcpy, which takes advantage of Intel’s SIMD optimisations (SSSE3: Supplemental Streaming SIMD Extensions 3).
– The memcpy functions at lines 11 and 12 show it’s reading 16 bytes from 0x76fdf4a0, and then writing 16 bytes to 0x7f7780448670.
– 0x76fdf4a0 fits the per-strand arithmetic: kcrfsg_+0 = 0x76fdf388, plus 296 = 0x76fdf4b0 for the second strand’s struct, and the 16-byte copy starting at 0x76fdf4a0 ends exactly at that boundary (0x76fdf4a0 + 16 = 0x76fdf4b0).
– 0x7f7780448670 is quite probably an address in the process’ PGA, which is used for the intermediate representation of the struct’s data before displaying it in the x$ view.

Even more interesting, this confirms that x$kcrfstrand is the externalisation of the struct that is used together with kcrfsg_, and the calculations that were made previously are correct; kcrfsg_+0 = 0x76fdf388, and that address with 296 added is 0x76fdf4b0, which is the same as 0x76fdf4a0 + 16 (if you look into the above example, the memcpy function copies 16 bytes at a time).

I think it’s pretty safe to assume this kcrf-related struct in the shared pool is what is referred to as ‘KCRFA’ in x$kcrfstrand. Of course we can only guess what the ‘A’ in ‘kernel cache recovery file A’ means; I would say ‘auxiliary’, but it could also very well be the A in ‘strAnd’. It is a structure that clearly was needed when Oracle changed the concept of a single log buffer into multiple public redo strands, because identical information for multiple strands needs to be stored at the same time, and the information in kcrfsg_ is singular in nature.

Please mind that, because of the intermediate representation of the x$ table in the PGA, I could not use the trick I showed in my ‘drilldown into the logwriter inner working and communication’, which is to query a single column in the x$ view and see which memory addresses in the struct are touched; the entire struct is copied into the PGA and processed before it is used to show the contents, even if only a single column of a single row is queried.

Also consider that, since the struct data is copied into the process’ PGA in order to make this information human readable, not all offsets (in x$kqfco) of the PGA representation might be applicable to the SGA struct. In fact, I think I have proven that the KCRFA per-strand metadata width is 296 bytes. Now look at the column offsets from x$kqfco for x$kcrfstrand:

SQL> select c.kqfconam, c.kqfcooff, c.kqfcosiz
from x$kqfco c, x$kqfta t
where t.indx = c.kqfcotab
and t.kqftanam = 'X$KCRFSTRAND'
order by c.kqfcooff;

KQFCONAM                         KQFCOOFF   KQFCOSIZ
------------------------------ ---------- ----------
ADDR                                    0          8
INDX                                    0          4
INST_ID                                 0          4
CON_ID                                  0          2
STRAND_HDR_SCN_KCRFA_CLN                0          8
HIGHEST_SCN_KCRFA_CLN                   0          8
PNEXT_BUF_KCRFA_CLN                    32          8
NEXT_BUF_NUM_KCRFA_CLN                 40          4
BYTES_IN_BUF_KCRFA_CLN                 48          8
AVAILABLE_BUFS_KCRFA_CLN               56          4
MEM_BUFS_AVAILABLE_KCRFA_CLN           60          4
STRAND_HEADER_BNO_KCRFA_CLN            72          4
FLAG_KCRFA_CLN                         76          1
PVT_STRAND_STATE_KCRFA_CLN             77          1
HIGHEST_SUBSCN_KCRFA_CLN               78          2
LWN_BUF_NUM_KCRFA_CLN                  80          4
STRAND_NUM_ORDINAL_KCRFA_CLN           84          4
STRAND_INFO_PTR_KCRFA_CLN              88          8
ACTION_KCRFA                          184          4
LOG_RESIDUE_KCRFA                     188          1
LAST_BUF_WRITTEN_KCRFA                216          4
LAST_BUF_GATHERED_KCRFA               220          4
CONS_LOW_BUFNO_KCRFA                  224          4
FIRST_BUF_KCRFA                       240          8
LAST_BUF_KCRFA                        248          8
TOTAL_BUFS_KCRFA                      256          4
STRAND_SIZE_KCRFA                     260          4
LOWBUF_KCRFA                          272          4
HIBUF_KCRFA                           276          4
WRAPHIBUF_KCRFA                       280          4
LOG_SEQ_KCRFA                         284          4
PTR_KCRF_PVT_STRAND                   304          8
INDEX_KCRF_PVT_STRAND                 312          4
SPACE_KCRF_PVT_STRAND                 316          4
TXN_KCRF_PVT_STRAND                   320          2

This shows that the representation of the KCRFA struct in the PGA apparently has a width of 322 bytes, while we know the KCRFA struct itself is 296 bytes. I am not sure why this is; the most obvious reason would be that some data is already processed in the PGA to make it easier to show. This might also very well be the reason the data needs to be copied to the PGA first, instead of being displayed directly from its shared pool location.

UNDO sizing

90% of the databases that I see for the first time have the same issue with the UNDO tablespace: it is oversized, yet still causes the infamous ORA-1555 errors at times. Here is why.

UNDO is oversized because of dumb monitoring which treats the UNDO tablespace the same way as normal datafiles, i.e. the monitoring suggests having some free space in UNDO. So it notifies DBAs and tells them to increase the UNDO datafiles. And most DBAs do it straight away without thinking. This is the most common reason why databases end up with ridiculously large UNDO tablespaces.
It shouldn’t be this way. UNDO is rollback history: there’s some really old history which can be ignored, there’s some history that might be necessary for a long-running transaction, delayed block cleanout or a flashback query, and finally there’s the rollback history of active transactions. Unless you have a special test environment, at any single point in time of a database instance nobody knows when and which UNDO data might be required. Oracle provides 3 options to handle this unknown:

  • let Oracle decide how much undo to keep based on undo_retention and autoextensible undo datafiles
  • let Oracle decide how much undo to keep based on the fixed size of the UNDO tablespace
  • ask Oracle to guarantee the undo_retention setting

Since datafiles are usually autoextensible, option 1 is the most common scenario. There are two possibilities with this option:
1) undo_retention is left at its default of 900 seconds. This effectively means Oracle doesn’t need to keep a lot of undo history, since you didn’t ask for it. And no matter how big your UNDO is, this setup may easily lead to “snapshot too old” errors – Oracle doesn’t need to keep much UNDO, and is happily re-using undo space quickly enough
2) undo_retention is set to a relatively high value, like a few hours. This sets the minimum undo retention time and, in theory, should guarantee you enough transaction history in the UNDO to run queries of at least undo_retention duration. However, I’ve seen more than once that setting undo_retention with autoextensible UNDO doesn’t help to avoid ORA-1555, even for queries that are much shorter than undo_retention. I don’t know how to explain this, I don’t have a test case, and I hate Oracle Support

With fixed-size UNDO, Oracle behaves rather differently and tries to utilize all of the allocated UNDO datafiles as much as possible. It ignores undo_retention and just shows you tuned_undoretention, which represents how much history you keep in your fixed-size UNDO. I think this is the only reliable way to run an Oracle DB in production. In fact, the documentation suggests using this approach. It’s just not the default.
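
To see what retention Oracle has actually tuned, and to switch to the fixed-size behaviour, something like the following can be used (the datafile name is a placeholder; repeat the alter for every UNDO datafile):

select begin_time, tuned_undoretention, maxquerylen, ssolderrcnt, nospaceerrcnt
from   v$undostat
order by begin_time;

alter database datafile '/u01/oradata/DB1/undotbs01.dbf' autoextend off;

Once all UNDO datafiles are non-autoextensible, tuned_undoretention is driven by the space available in the tablespace rather than by undo_retention.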

Recently I’ve seen a database with a multi-terabyte UNDO; its size was about 20% of all the datafiles in the system. This prompted me to try to write a query which would compute the approximate UNDO size required to handle the load.
Here are the ideas:

  • the minimum UNDO size should be derived from the maximum number of active undo blocks over the run time of the database. It’s most likely not enough, but it gives a good starting point to understand how much space active transactions require in extreme situations
  • the maximum UNDO size should be equal to the maximum of the sum of the ‘undo change vector size’ statistic over a sliding window representing the target undo_retention with retention guarantee. This value is most likely way more than is required for normal database operation
  • a sufficient UNDO size should be somewhere between those two numbers. I think it should be derived empirically, starting from some intermediate best guess between min and max, then adjusted if ORA-1555 or undo space errors appear

Here’s the script I’ve come up with:

prompt
prompt UNDO sizing report
prompt ==================
prompt 
prompt Reports current undo set up, and minimum/maximum required undo tablespace size based on historical AWR data
prompt
prompt Note:
prompt   a) if your undo datafiles are not fixed size (all of them), you are doing it wrong most likely
prompt   b) awr_snapshots_count is used as a period to calculate maximum required undo size with guaranteed undo retention
prompt   c) there is no way to calculate sufficient undo size with a good precision and this report is an approximation
prompt 
prompt For feedback mailto:timur.akhmadeev@gmail.com
prompt

col inst_id                         format 99 head 'In|st'
col current_size_mb                 format 999,999,999 head 'Current|undo, MB'
col is_autoextensible               format a4 head 'Auto|ext?'
col undo_retention                  format 999.9 head 'Retention|hours'
col undo_size_min_mb                format 999,999,999 head 'Minimal|req UNDO, MB'
col undo_size_guarantee_mb          format 999,999,999 head 'Max req|UNDO, MB'
col longest_sql                     format 999,999.9 head 'Longest|SQL, h'
col longest_sql_id                  format a13 head 'Longest|sql_id'
col max_ora1555_cnt                 format 999,999 head 'Max ORA-|1555 cnt'
col max_no_space_cnt                format 999,999 head 'Max no|space cnt'

select
  ua.inst_id,
  ua.current_size_mb,
  ua.is_autoextensible,
  ua.undo_retention/3600 undo_retention,
  um.undo_size_min_mb,
  u.undo_size_guarantee_mb,
  um.longest_sql/3600 longest_sql,
  um.longest_sql_id,
  um.max_ora1555_cnt,
  um.max_no_space_cnt
from
  gv$parameter p,
  ( -- how much undo is required to guarantee undo retention for awr_snapshots_count period
    select 
      inst_id,
      max(required_undo_mb) undo_size_guarantee_mb
    from
    (
      select
        inst_id,
        sum(undo_size) over (partition by inst_id order by begin_interval_time rows &awr_snapshots_count preceding) required_undo_mb
      from
      (     
        select
          s.instance_number inst_id,
          s.begin_interval_time,
          round((ss.value - lag(ss.value) over (partition by s.instance_number order by s.begin_interval_time))/1024/1024) undo_size
        from 
          dba_hist_snapshot s,
          v$database d,
          dba_hist_sysstat ss,
          v$statname n
        where
          s.dbid = d.dbid and
          s.dbid = ss.dbid and
          s.instance_number = ss.instance_number and
          s.snap_id = ss.snap_id and
          ss.stat_id = n.stat_id and
          n.name = 'undo change vector size'
      )
    )
    group by inst_id
  ) u,
  ( -- minimally required undo as max active blocks
    select
      uh.instance_number inst_id,
      max(activeblks * p.value/1024/1024) undo_size_min_mb,
      max(maxquerylen) longest_sql,
      max(maxquerysqlid) keep (dense_rank first order by maxquerylen desc) longest_sql_id,
      max(ssolderrcnt) max_ora1555_cnt,
      max(nospaceerrcnt) max_no_space_cnt
    from
      dba_hist_snapshot s,
      v$database d,
      dba_hist_undostat uh,
      v$parameter p
    where
      s.dbid = d.dbid and
      s.dbid = uh.dbid and
      s.instance_number = uh.instance_number and
      s.snap_id = uh.snap_id and
      p.name = 'db_block_size'
    group by
      uh.instance_number
  ) um,
  ( -- current undo setup
    select
      p.inst_id,
      round(sum(bytes)/1024/1024) current_size_mb,
      max(t.autoextensible) is_autoextensible,
      max(ur.value) undo_retention
    from 
      dba_data_files t, 
      gv$parameter p,
      (select inst_id, value from gv$parameter where name = 'undo_retention') ur
    where
      t.tablespace_name = p.value and
      p.name = 'undo_tablespace' and
      p.inst_id = ur.inst_id
    group by
      p.inst_id
  ) ua
where
  p.inst_id = u.inst_id and
  p.name = 'undo_retention' and
  u.inst_id = ua.inst_id and
  ua.inst_id = um.inst_id
;

Example output:

Enter value for awr_snapshots_count: 5

 In      Current Auto Retention      Minimal      Max req    Longest Longest       Max ORA-    Max no
 st     undo, MB ext?     hours req UNDO, MB     UNDO, MB     SQL, h sql_id        1555 cnt space cnt
--- ------------ ---- --------- ------------ ------------ ---------- ------------- -------- ---------
  1       65,508 YES         .3        3,678        9,213        1.7 6xxxxxxxxxxxg        0         0
  2       55,508 YES         .3        5,325       26,797        1.7 6xxxxxxxxxxxg        0         0

What does the output mean?

  • this is a 2-node RAC with autoextensible UNDO tablespaces of more than 50G each
  • undo_retention is the default 900 seconds (the .3 hours shown is just a rounding of 0.25)
  • the maximum active undo blocks reported in dba_hist_undostat were a few GBs per instance, and these numbers are around the minimally required UNDO tablespace sizes
  • the maximum undo generated over 5 consecutive AWR snapshots is 9 and 26 GB respectively. Those are the UNDO tablespace sizes required to guarantee an undo_retention of 6 hours (hourly snapshots in this DB)
  • a few FYI columns: the longest SQL, its sql_id, and the max number of ORA-1555 and ‘no space’ errors (should perhaps be a sum)
  • based on these numbers I think I’d try a fixed UNDO size of about 20G for both nodes. This is smaller than the currently allocated UNDO, and it should still be enough to run the DB without errors. This DB didn’t have any undo errors though, so technically there’s no need to touch it unless it’s broken or there’s not enough space for such an UNDO

I tried to use the Undo Advisor for this DB, and the only result I got was… “Undo tablespace is OK.”

NULL’s vs NOT NULL’s and Performance

When it comes to giving the cost based optimizer the best possible chance to make the “right” decisions, many DBAs are diligent in keeping statistics up to date, using histograms where appropriate, and creating more indexes (or removing surplus indexes).

However, one often neglected area is that the null-ness of columns also impacts optimizer decisions. NULL and NOT NULL do more than just act as constraints; they also add to (or detract from) the value of indexes on those columns. Here’s an example of how the null-ness of a column impacts optimizer decisions. I have a table T which is a copy of DBA_OBJECTS, indexed on OBJECT_ID.


SQL> create table t as
  2  select * from dba_objects
  3  where object_id is not null;

Table created.

SQL>
SQL> create index IX on T ( object_id );

Index created.

Now I want to perform a standard pagination style query, namely, get the first 5 rows in order of OBJECT_ID


SQL>
SQL> set autotrace traceonly
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     5 |  2405 |       |  2755   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 78750 |    36M|       |  2755   (1)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      | 78750 |     9M|    14M|  2755   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | T    | 78750 |     9M|       |   428   (1)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         52  recursive calls
        130  db block gets
       1591  consistent gets
          0  physical reads
      25420  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

I scanned the entire table and burned around 1500 consistent gets. Now I’ll let the database know what I already know, namely that the OBJECT_ID column contains no nulls. Then I’ll repeat the same query.


SQL>
SQL> alter table T modify object_id not null;

Table altered.

SQL>
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3114946973

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     5 |  2405 |     3   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |      |       |       |            |          |
|   2 |   VIEW                        |      |     5 |  2405 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    | 78750 |     9M|     3   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX   |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         85  recursive calls
        132  db block gets
         72  consistent gets
          1  physical reads
      27192  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

Woo hoo! 20 times more efficient. The extra information we provided to the database allowed more execution paths to be considered by the optimizer. (Remember that entirely null keys are not stored in a B-tree index, so without the NOT NULL constraint the optimizer could not trust the index to contain every row of the table.)

That’s all well and good. But how do we know which columns might be missing an appropriate NOT NULL constraint?

Well, the following routine might help.

The PL/SQL procedure below deduces a ‘number of rows that are null’ count for all indexed columns of all tables in the schema (passed as parameter P_SCHEMA), although you can pass a table name to restrict the procedure to just that table. For any column that contains no nulls, the appropriate ‘alter table … modify ( column_name not null)’ command will be output. (Make sure you set serveroutput on.)

Notes:

  • The client for whom I wrote this script had no NOT NULL constraints on any table, so the procedure only takes a rough stab at ALL_CONSTRAINTS for existing constraints.
  • It tries to keep things sensible – avoiding complex data types, tables that are auto-generated or used for queues etc – but there’s no guarantee it won’t either miss a table, or pick up one it should not.
  • This is a brute force approach – it works by scanning every table in the schema, so use your discretion as to when would be a suitable time to run this routine. But it will only scan each table once to determine the null count for all candidate columns.
  • (As with any diagnosis script) you should not apply its recommendations without some careful thought first.

12.2 version


create or replace 
procedure check_indexed_columns_for_null(
                 p_schema varchar2, 
                 p_table_name varchar2 default null) is
  cursor x is
   select 
     table_name,
     column_name,
     count(*) over ( partition by table_name ) as colcount
     from
     (
       select 
         table_name,
         column_name,
         min(existing_constraint)
       from 
       (
       select  
           a.table_name, 
           a.column_name, 
           ( select  count(*)
             from    all_constraints x,
                     all_cons_columns cc
             where   x.owner = c.owner
             and     x.table_name = c.table_name
             and     cc.owner      = x.owner
             and     cc.constraint_name = x.constraint_name
             and     
            (
               ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                   or 
               ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
             )
            ) existing_constraint
       from 
         all_ind_columns a,
         all_tables c,
         all_tab_columns ccol
       where a.index_owner = p_schema
       and a.table_name = nvl(upper(p_table_name),a.table_name)
       and c.table_name = a.table_name
       and c.owner      = a.table_owner
       and c.owner      = ccol.owner
       and c.table_name = ccol.table_name
       and a.column_name = ccol.column_name
       and c.secondary = 'N'
       and c.temporary = 'N'
       and c.nested    = 'NO'
       and c.external  = 'NO'
       and ccol.data_type_owner is null
       and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
       and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = p_schema )
       )
       group by 
         table_name,
         column_name
       having min(existing_constraint) = 0
     );

  str0 varchar2(32767); 
  str1 varchar2(32767); 
  str2 varchar2(32767); 
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  -- dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin
for i in x loop
  if prev != i.table_name then
    str0 := 'declare ';
    str1 := 'begin select '; str2 := ' into ';
    str3 := ' '; cnt := 1;
  end if;
  if cnt = i.colcount then 
    trailer := ' ';
  else
    trailer := ','||chr(10);
  end if;
  str0 := str0 || 'v'||ltrim(cnt)||' number;';
  str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
  str2 := str2 || 'v'||ltrim(cnt)||trailer;
  str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
    'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
    ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
  if cnt = i.colcount then
    str2 := str2 ||' from '||p_schema||'.'||i.table_name||';';
    str3 := str3 ||' end;';
    do_sql(str0||' '||str1||' '||str2||' '||str3);
  end if;
  prev := i.table_name;
  cnt := cnt + 1;
end loop;
end;
/
sho err

set serverout on
exec check_indexed_columns_for_null('SCOTT')

11.2 version

(Needs a direct CREATE TABLE grant to the owner of the procedure)


create or replace
procedure check_indexed_columns_for_null(
                 p_schema varchar2,
                 p_table_name varchar2 default null) is
  cursor c_template is
    select table_name, column_name, 0 colcount
    from   all_tab_columns;

  type rowlist is table of c_template%rowtype;  
  r    rowlist;
  rc   sys_refcursor;

  str0 varchar2(32767);
  str1 varchar2(32767);
  str2 varchar2(32767);
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin

  begin
    execute immediate 'drop table tmp$cons purge';
  exception when others then null;
  end;

  execute immediate 'create table tmp$cons as  select owner, table_name, constraint_name, constraint_type, to_lob(search_condition) search_condition_vc '||
                    'from all_constraints';

  open rc for
    q'{select
      table_name,
      column_name,
      count(*) over ( partition by table_name ) as colcount
      from
      (
        select
          table_name,
          column_name,
          min(existing_constraint)
        from
        (
        select
            a.table_name,
            a.column_name,
            ( select  count(*)
              from    ( select owner, table_name, constraint_name, constraint_type, cast(search_condition_vc as varchar2(4000)) search_condition_vc 
                        from tmp$cons ) x,
                      all_cons_columns cc
              where   x.owner = c.owner
              and     x.table_name = c.table_name
              and     cc.owner      = x.owner
              and     cc.constraint_name = x.constraint_name
              and
             (
                ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                    or
                ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
              )
             ) existing_constraint
        from
          all_ind_columns a,
          all_tables c,
          all_tab_columns ccol
        where a.index_owner = :p_schema
        and a.index_owner = :p_schema
        and a.table_name = nvl(upper(:p_table_name),a.table_name)
        and c.table_name = a.table_name
        and c.owner      = a.table_owner
        and c.owner      = ccol.owner
        and c.table_name = ccol.table_name
        and a.column_name = ccol.column_name
        and c.secondary = 'N'
        and c.temporary = 'N'
        and c.nested    = 'NO'
        and (c.owner,c.table_name) not in ( select owner, table_name from all_external_tables where owner = :p_schema )
        and ccol.data_type_owner is null
        and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
        and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = :p_schema )
        )
        group by
          table_name,
          column_name
        having min(existing_constraint) = 0
      )
      }' using p_schema,p_schema,p_table_name,p_schema,p_schema;

  fetch rc bulk collect into r;
  close rc;
  
  for i in 1 .. r.count loop
    if prev != r(i).table_name then
      str0 := 'declare ';
      str1 := 'begin select '; str2 := ' into ';
      str3 := ' '; cnt := 1;
    end if;
    if cnt = r(i).colcount then
      trailer := ' ';
    else
      trailer := ','||chr(10);
    end if;
    str0 := str0 || 'v'||ltrim(cnt)||' number;';
    str1 := str1 || 'sum(decode('||r(i).column_name||',null,1,0))'||trailer;
    str2 := str2 || 'v'||ltrim(cnt)||trailer;
    str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
      'dbms_output.put_line(''alter table '||p_schema||'.'||r(i).table_name||
      ' modify ('||r(i).column_name||' not null);''); end if;'||chr(10);
    if cnt = r(i).colcount then
      str2 := str2 ||' from '||p_schema||'.'||r(i).table_name||';';
      str3 := str3 ||' end;';
      do_sql(str0||' '||str1||' '||str2||' '||str3);
    end if;
    prev := r(i).table_name;
    cnt := cnt + 1;
  end loop;
end;
/

 

If you want to see the underlying table scan queries that are being run, simply comment back in the “dbms_output.put_line” in the DO_SQL subroutine.
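For illustration, this is the shape of the anonymous block that gets built and executed for each table. It is a reconstruction from the string-building logic above, reformatted for readability; the schema SCOTT and a table EMP with indexed columns EMPNO and DEPTNO are purely hypothetical names:

declare
  v1 number;
  v2 number;
begin
  -- one counter per candidate column; decode(col,null,1,0) counts the NULLs,
  -- so all of a table's columns are checked with a single scan
  select sum(decode(EMPNO,null,1,0)),
         sum(decode(DEPTNO,null,1,0))
  into   v1, v2
  from   SCOTT.EMP;
  -- a zero count means the column never contains NULL, so emit the suggestion
  if v1 = 0 then dbms_output.put_line('alter table SCOTT.EMP modify (EMPNO not null);'); end if;
  if v2 = 0 then dbms_output.put_line('alter table SCOTT.EMP modify (DEPTNO not null);'); end if;
end;
/

Batching all of a table's candidate columns into one SELECT means each table is scanned only once, no matter how many columns are being checked.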

Enjoy!

Friday Philosophy – Condoning Bad Behaviour

I used to work with a man called Nick(*). Nick was friendly enough, he was good at programming and he had very few annoying personal habits. Nick was easy to work with.

[Image: temper-tantrum.jpg. Caption: I WON'T take my share of the Christmas Cover!]

When he finally turned up.

You see, Nick would sometimes turn up around 9am like everyone else. But more often he would get in just before 10am. And then it started to go past 10am, more like 10:15, 10:30… soon it was closer to 11am. He used to stay quite late to make up the time and he completed the programming work he was allocated. But it was a pain in the backside for everyone else. People who worked with him would be waiting for him to turn up, and he would sometimes amble into a meeting after it had started.

Then I found myself managing Nick, and about the first thing I did was have a little chat about his timekeeping. Nice, friendly Nick did not like this; he could not see the problem, he stayed late to do his work, the company was getting its “pound of flesh” as he put it. Why did it matter? So I explained the impact on the rest of the team and that core hours were clearly stated: 10:00-12:00 & 14:00-16:00. During those hours we all knew everyone was around and we could collaborate; it's called teamwork.

Nick was having none of this – “If this was a problem, how come Sarah never raised it as an issue?”. And there was the reason that this was not just a small problem but a big problem. Yes, Sarah was his prior boss and she had not said anything to him about it. “You are just trying to show who is boss!”. Yes, yes I am, and being your boss is partly to tell you when you are doing things wrong, so stop it.

Nick’s prior boss had made the decision to condone bad behaviour, to let Nick come in later and later without intervening. Sometimes condoning bad behaviour is an active thing, like laughing at sexist/racist jokes, but usually it is a passive thing. If someone is doing something wrong and, as their manager, you do not challenge it then you are accepting it, you are condoning it. And once you have let it slip a few times, challenging it is harder. In Nick’s case it had resulted in the occasional late arrival becoming common, an accepted situation and a much more significant issue. It was also now a harder behaviour to challenge.

This situation is of course not limited to the manager/subordinate relationship; sometimes our friends or relatives behave badly and we have the choice to accept it or challenge it. I think that helps us immediately understand why we condone bad behaviour: to challenge it causes confrontation. And very few of us like confrontation.

In another situation, there was a guy who would suddenly go off the deep end for no good reason. Something would annoy him and he would start shouting and getting angry, way beyond what was reasonable. Now, to challenge that kind of bad behaviour you know it is going to be hard work. Thankfully, my boss at the time did, and explained to me at length and very forcefully that I needed to be more mature and less of a dick.

I think we can all agree that we should not condone bad behaviour, but we can be reluctant to challenge it because of the conflict involved.

Of course, a particularly difficult situation is when it is your boss (or parent!) who is behaving badly!

[Image: tantrum2.jpg. Caption: You will respect my authority!]

Also, at what point do you challenge the behaviour? Probably not at the first incident, especially if it is minor, like turning up to work late. After all, it might be a one-off, or they may have reasons for the behaviour (one person I managed was turning up late as they were having a hell of a time at home; they needed some slack). Something more serious, such as socially unacceptable behaviour, you need to question right away. You also can't challenge every small thing you perceive as wrong, or you will just annoy everyone and become regarded as a control freak or moral bore.

You also need to consider the impact of challenging them. If it is over something that would embarrass or offend them, it could sour your relationship with them and the rest of the team. Catching someone out lying can be tricky to deal with (I once had someone ask me for holiday on short notice as a relative was ill. But his new girlfriend also reported to me and she was honest about the “urgent need” for the holiday…). I think the most common decision made when the bad behaviour is one that the other person will be embarrassed or in denial over is to let it lie or challenge it “if it happens again”. Only, just like with Nick and his late arrivals, each time you delay addressing the bad behaviour it will get harder to do so.

I can't claim that I always handled challenging bad behaviour as well as I should have; I was by no means a perfect boss or friend (or relative). I think it is one of the hardest parts of being a manager, especially if you are averse to confrontation. But overall, I've suffered more in the long run by not challenging bad behaviour than I have by trying to handle it.

As to how you handle it, that’s a whole different topic…

(* Nick was not his real name, I changed it to protect the innocent… It was Dave)

PeopleSoft and Invalid Views in the Oracle Database

I was listening to the section on Invalid Views in PSADMIN Podcast #117 (@19:00). Essentially, when you drop and recreate a view that is referenced by a second view, the status of the second view in the database goes invalid. This is not a huge problem, because as soon as you query the second view it is recompiled. However, you would like to know whether a change to a view prevents any dependent views from compiling, although you would expect to have teased these errors out before migration to production.
The PeopleSoft solution to this is to include all the dependent views in the Application Designer project. However, as pointed out in the podcast, you are then releasing code, possibly unintentionally releasing code changes, and certainly updating last-change dates on record definitions, when really you just need to recompile the database objects. PeopleSoft does this because it is a platform-generic solution, but really this is using the PeopleSoft Application Designer to solve a database administration issue.
A similar problem also occurs in the Oracle database with dependent PL/SQL procedures and packages, where you sometimes get referential loops. Oracle provides a procedure, DBMS_UTILITY.COMPILE_SCHEMA, that recompiles all invalid objects in a schema; any remaining compilation errors can then be queried from the ALL_ERRORS view. I think this is a much safer option.

Here is a very simple (non-PeopleSoft) example.

drop table t purge;
drop view a;
drop view b;

create table t (a number);
insert into t values (1);
create view a as select a from t;
create view b as select a from a;

column object_name format a12
select object_type, object_name, status
from user_objects where object_name IN('T','A','B')
/

OBJECT_TYPE             OBJECT_NAME  STATUS
----------------------- ------------ -------
TABLE                   T            VALID
VIEW                    B            VALID
VIEW                    A            VALID

Dropping and recreating view A renders view B invalid.

drop view a;
create view a as select a from t;

select object_type, object_name, status
from user_objects
where object_name IN('T','A','B');

OBJECT_TYPE             OBJECT_NAME  STATUS
----------------------- ------------ -------
TABLE                   T            VALID
VIEW                    B            INVALID
VIEW                    A            VALID

Just querying B makes it valid again.

select * from b;
select object_type, object_name, status
from user_objects where object_name IN('T','A','B');

OBJECT_TYPE             OBJECT_NAME  STATUS
----------------------- ------------ -------
TABLE                   T            VALID
VIEW                    B            VALID
VIEW                    A            VALID

Let's make B invalid again by rebuilding A, but this time I will change the name of the column in view A from A to T so that view B cannot compile without an error.  I can recompile every invalid object in the schema by calling DBMS_UTILITY.COMPILE_SCHEMA.  However, B remains invalid because there is an error.

drop view a;
create view a (t) as select a from t;

EXEC DBMS_UTILITY.COMPILE_SCHEMA(schema => 'SCOTT');
select object_type, object_name, status
from user_objects where object_name IN('T','A','B');

OBJECT_TYPE             OBJECT_NAME  STATUS
----------------------- ------------ -------
TABLE                   T            VALID
VIEW                    B            INVALID
VIEW                    A            VALID

I can query the errors from USER_ERRORS.  So now I have recompiled all invalid objects and have a report of the exceptions that I can work on fixing.

select * from user_errors;

NAME
---------------------------------------------------
TYPE           SEQUENCE       LINE   POSITION
------------ ---------- ---------- ----------
TEXT
---------------------------------------------------
ATTRIBUTE MESSAGE_NUMBER
--------- --------------
B
VIEW                  1          0          0
ORA-00904: "A": invalid identifier
ERROR                  0

N.B.: if you use CREATE OR REPLACE VIEW instead of dropping and recreating the view, the dependent views are not left invalid unless there is an error. Unfortunately, Application Designer always drops and recreates views.
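A quick way to verify that, reusing the T/A/B objects from above (just a sketch; the exact invalidation behaviour depends on the fine-grained dependency tracking available from Oracle 11g onwards):

REM restore view A to its original shape and revalidate B
create or replace view a as select a from t;
select * from b;

REM now replace A in place rather than dropping it
create or replace view a as select a from t;

select object_type, object_name, status
from user_objects where object_name IN('T','A','B');

This time B should still be reported as VALID, whereas the drop-and-recreate earlier left it INVALID.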

Conferences 2017

Last year I went to a few conferences. At some point I thought I should record some of the sessions to let more people see them as well.
So I took a cheap action camera and recorded several presentations. The video quality is (mostly) not great due to the lighting, but it is still enough to get an idea of what it was like in the room.
Here are links to the videos. Enjoy!

Hatem Mahmoud – Memory Access Tracing/Profiling

Jonathan Lewis – Just Don’t Do It

Panel Discussion at POUG2017

Marcin Przepiorowski – dNFS for DBAs

Neil Chandler – Why Has My Plan Changed

Timur Akhmadeev – Common Pitfalls in Complex Apps Performance Troubleshooting (same presentation in Russian at Russian OUG meetup in Moscow)

Neil Chandler – Histograms are Evil like Chocolate is Evil (part of the same presentation recorded in Poland as well)

Kamil Stawiarski – VOODOO: the black magic of cheap cross database replication

Roger MacNicol – Table Access Full

Mike Dietrich – Ensure Performance Stability When Upgrading Oracle Database

Adam Bolinski – High Performance Oracle Environment on Not Only NFS

Oracle Solaris 11.4 Public Beta Released

Yesterday the Oracle Solaris 11.4 Beta was released to the public. You can download it from OTN and have a go with it. Please read the documentation to learn all about the new and improved features. And remember that you can always contact me in case you need training on the most advanced Operating System […]

Creating synthetic and random test data from an existing table

We have all been there at some point. Either we need to run a test but can't bring production data outside of the production network, or we need to produce a test case for a support organization that is not allowed to view production data. What to do?
