Oakies Blog Aggregator

Data Modeling, Dates and DAX

Presenting data in a format that eases visualization is a requirement for any BI product.  Power BI provides much of this with Data Analysis Expressions (DAX).  As a DBA, I admit to cringing every time someone remarked how similar it is to functions in Excel or other non-database platforms.  I'm a DBA, and I naturally see data at a much larger, more complex level.  I love the simplicity of DAX, which let me acquire basic skills with it in just a day, but considering Power BI's ability to pull from multiple data sources, including SQL Server, Oracle, PostgreSQL and even JSON files, the comparison to Excel left me feeling, well, 'meh.'

18c: Order by in WITH clause is not preserved

For a previous post I ran on 18c a script of mine that gets the V$MYSTAT delta values between two queries. This script (new version available on GitHub) generates the queries to store some values and subtract them on the next execution. But I had to fix it for 18c, because I relied on an ORDER BY in a CTE, and that ordering is lost in 18c.
The idea was to get the statistic names in a Common Table Expression (CTE):

with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)

and query it from different parts of the UNION ALL which generates the script:

select 'select ' from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
union all

The rownum in the WITH clause is used later to add the comma for all rows except the first one – using decode(n,1,' ',',').

But this relied on two assumptions:

  1. the WITH keeps the rows ordered
  2. the UNION ALL keeps the rows ordered

Of course, it would be better to add a number on each part and an ORDER BY at the end of the query. But for better readability, I didn’t.
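To illustrate, a more robust version of the generator (just a sketch of the idea, not the actual script) would tag each UNION ALL branch with a part number and sort on it at the end, so the final order never depends on how the CTE is materialized:

with stats as (
 select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)
select 'select ' line, 1 part, 0 n from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"', 2, n from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"', 3, n from stats
order by part, n;

The extra PART and N columns are only there to drive the ORDER BY, which is the one documented way to guarantee the order of the result.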

However, the first assumption has been wrong since 12.2, where the optimizer introduced In-Memory Cursor Duration Temp Tables. In the query above, the CTE is materialized because it is used multiple times, and this optimization keeps it in memory. But it seems that this structure does not keep the rows ordered.

Here is an example with a 4-row table:

SQL> create table DEMO as select rownum id from xmltable('1 to 4');
Table DEMO created.
 
SQL> select * from DEMO order by id;
 
        ID
----------
         1
         2
         3
         4

When I put the same query in a WITH clause, with its ORDER BY, and query it from the two branches of a UNION ALL, the rows are not ordered anymore:

SQL> with cte as ( select * from DEMO order by id)
select rownum,id from cte
union all
select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          4
         2          3
         3          2
         4          1
         1          4
         2          3
         3          2
         4          1

The execution plan shows the materialization of the CTE result and mentions that it is an In-Memory Cursor Duration Temp Table (CURSOR DURATION MEMORY):

SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 0
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                           |       |       |     4 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION                |                           |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6640_23D74B |       |       |            |          |
|   3 |    SORT ORDER BY                          |                           |     4 |    12 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                     | DEMO                      |     4 |    12 |     3   (0)| 00:00:01 |
|   5 |  UNION-ALL                                |                           |       |       |            |          |
|   6 |   COUNT                                   |                           |       |       |            |          |
|   7 |    VIEW                                   |                           |     4 |    52 |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6640_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
|   9 |   COUNT                                   |                           |       |       |            |          |
|  10 |    VIEW                                   |                           |     4 |    52 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6640_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

/*+ inline */

If the CTE is not materialized, the rows are ordered, but the table is read twice:


SQL> with cte as ( select /*+ inline */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          1
         2          2
         3          3
         4          4
         1          1
         2          2
         3          3
         4          4
 
8 rows selected.
 
SQL>
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0dad0vwg3k0th, child number 0
-------------------------------------
with cte as ( select /*+ inline */ * from DEMO order by id) select
rownum,id from cte union all select rownum,id from cte
 
Plan hash value: 2913170750
 
-------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |       |       |     8 (100)|          |
|   1 |  UNION-ALL              |      |       |       |            |          |
|   2 |   COUNT                 |      |       |       |            |          |
|   3 |    VIEW                 |      |     4 |    52 |     4  (25)| 00:00:01 |
|   4 |     SORT ORDER BY       |      |     4 |    12 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL  | DEMO |     4 |    12 |     3   (0)| 00:00:01 |
|   6 |   COUNT                 |      |       |       |            |          |
|   7 |    VIEW                 |      |     4 |    52 |     4  (25)| 00:00:01 |
|   8 |     SORT ORDER BY       |      |     4 |    12 |     4  (25)| 00:00:01 |
|   9 |      TABLE ACCESS FULL  | DEMO |     4 |    12 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - "ID"[NUMBER,22], ROWNUM[8]
3 - "ID"[NUMBER,22]
4 - (#keys=1) "DEMO"."ID"[NUMBER,22]
5 - (rowset=256) "DEMO"."ID"[NUMBER,22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - (#keys=1) "DEMO"."ID"[NUMBER,22]
9 - (rowset=256) "DEMO"."ID"[NUMBER,22]

"_in_memory_cdt"=off

If the CTE is materialized, but the new feature that keeps it in memory is disabled, the rows are ordered (though probably by chance – there is no guarantee):


SQL> alter session set "_in_memory_cdt"=off;
 
Session altered.
 
SQL>
SQL> with cte as ( select /*+ materialize */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          1
         2          2
         3          3
         4          4
         1          1
         2          2
         3          3
         4          4
 
8 rows selected.
 
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 1
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |       |       |     4 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D664D_23D74B |       |       |            |          |
|   3 |    SORT ORDER BY           |                           |     4 |    12 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL      | DEMO                      |     4 |    12 |     3   (0)| 00:00:01 |
|   5 |  UNION-ALL                 |                           |       |       |            |          |
|   6 |   COUNT                    |                           |       |       |            |          |
|   7 |    VIEW                    |                           |     4 |    52 |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D664D_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
|   9 |   COUNT                    |                           |       |       |            |          |
|  10 |    VIEW                    |                           |     4 |    52 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D664D_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

So what?

We should never rely on the preservation of row order except where it is explicitly documented (like rownum over a subquery with an ORDER BY). This In-Memory Cursor Duration Temporary table also works in parallel and on RAC, so we can understand why the rows are not read in the same order as they were stored. This is always a problem when we migrate: when something worked by chance in one version and stops working in a newer one, people usually blame the migration rather than recognize an old bug in the application, just because it worked before.
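For reference, the documented-safe pattern looks like this: rownum is assigned as the rows are fetched from the ordered subquery, so the numbering is guaranteed to follow the ORDER BY, whatever the execution plan:

SQL> select rownum, id from (select * from DEMO order by id);

Here rownum 1 is guaranteed to go with the lowest ID.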

 


#Exasol Fail-Safety explained

The building blocks of an Exasol cluster are commodity Intel servers, e.g. a Dell PowerEdge R740 with 96 GB RAM, 12 x 1.2 TB SAS hot-plug hard drives and 2 x 10Gb Ethernet cards for the private network. That is sufficient to deliver outstanding performance combined with high availability. The picture below shows a 4+1 cluster, one of our most popular configurations:


Exasol 4+1 Cluster: Shared Nothing Architecture

Each active node hosts one database instance that works on its part of the database (A, B, C, D) in an MPP way. The instances communicate over the private network. Optionally, the private network can be separated into a database network and a storage network; in that case, the instances communicate over the database network. Notice that the instances access their part of the database directly on their local hard drives – they do not need the private network or the storage network for that. The reserve node becomes relevant only if one of the active nodes fails. The local hard drives are set up in RAID 1 pairs, so single disk failures can be tolerated without losing database availability. Not shown is the license node, which is required to boot the cluster initially; after that, it is no longer required to keep the cluster running.

If data volumes with redundancy 2 are in use – which is the most common case – then each node holds a copy of the data operated on by a neighbor node:


Exasol 4+1 Cluster: Redundancy 2

If a Master-Segment like A is modified, the Slave-Segment (A') is synchronized accordingly over the private network (or the storage network, if one is configured).

Availability comes at a price: the raw disk capacity is reduced by half because of the RAID 1 mirroring, and again by half because of redundancy 2, so you are left with approximately 1/4 of your raw disk capacity (the Linux OS and the database software also require a small amount of disk space). But since this runs on commodity hardware – no storage servers, no SAN, no SSDs required – it is actually a very competitive price.
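To put numbers on that (using the illustrative hardware above and counting only the four active nodes): 4 x 12 x 1.2 TB = 57.6 TB of raw disk, halved by RAID 1 to 28.8 TB, halved again by redundancy 2 to roughly 14.4 TB usable for data – about 1/4, as stated.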

Now what if one node fails?


Exasol 4+1 Cluster: Node failure

ExaClusterOS – Exasol's clusterware – will detect the node failure within seconds and shut down all remaining database instances in order to preserve a consistent state of the database. It then restarts them on the three still-available nodes and also on the reserve node, which now becomes an active node too. The database itself becomes available again, with node n15 immediately working with segment B'.

The downtime caused by the node failure is typically below 30 seconds. The restart of the database starts a timer called Restore Delay, which defaults to 10 minutes. If the failed node becomes available again within that time, we just re-synchronize the segments (A' and B in the example), which is fast. The instance on n15 then works with segment B as a Master-Segment until the database is manually restarted; after that restart, n15 becomes a reserve node again and n12 is active with an instance running on it.

If the failed node doesn't come back within the Restore Delay:


Exasol 4+1 Cluster: Restore Delay is over

We will then create new segments on node n15: A’ is copied from n11 and B is copied from n13. This activity is time-consuming and puts a significant load on the private network, which is why configuring a dedicated storage network may be beneficial to avoid a drop in performance during that period. A new reserve node should now be added to the cluster, replacing the crashed n12.

Complex materialized views and fast refresh

Just a quick discovery that came across the AskTOM “desk” recently. We have an outstanding bug in some instances of fast refresh materialized views when the definition of the materialized view references a standard view.

Here’s a simple demo of the issue – I’ll use a simplified version of the EMP and DEPT tables, linked by a foreign key in the usual way:


SQL> create table dept(dept_id number(10) primary key, dname varchar2(20));
 
Table created.
 
SQL> create table emp(empid number(20) primary key, ename varchar2(20), sal number(10,2), dept_id number(10) references dept(dept_id));
 
Table created.
 
SQL> insert into dept values(10,'IT');
 
1 row created.
 
SQL> insert into dept values(20,'HR');
 
1 row created.
 
SQL> insert into dept values(30,'MAT');
 
1 row created.
 
SQL> insert into emp values(1,'MIKE',20000,10);
 
1 row created.
 
SQL> insert into emp values(2,'JOHN',30000,20);
 
1 row created.
 
SQL> insert into emp values(3,'SUE',20000,20);
 
1 row created.
 
SQL> insert into emp values(4,'TOM',40000,30);
 
1 row created.
 
SQL>
SQL> commit;
 
Commit complete.

One of the cool things with materialized views is that even with complicated SQL definitions (such as joins), the materialized view can still be fast refreshable as long as the materialized view logs and database constraints are correctly defined.


SQL> create materialized view log on dept
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL>
SQL> create materialized view log on emp
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  8  from emp a, dept b
  9  where a.dept_id=b.dept_id;
 
Materialized view created.
 

Note: See the documentation for DBMS_MVIEW.EXPLAIN_MVIEW for how to check on the refresh characteristics of a materialized view (or potential materialized view).
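For example, a quick check could look like this (a sketch – the MV_CAPABILITIES_TABLE must first be created in your schema by running $ORACLE_HOME/rdbms/admin/utlxmv.sql):

SQL> exec dbms_mview.explain_mview('MV');
SQL> select capability_name, possible, msgtxt
  2  from mv_capabilities_table
  3  where capability_name like 'REFRESH_FAST%';

The POSSIBLE column shows Y or N for each refresh capability, and MSGTXT explains any restriction found.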

Now I'll repeat the same experiment, but I'll wrap the SQL that joins EMP and DEPT within a standard view called VW. Since a view is just stored SQL text, and the previous usage of the same SQL worked fine, we'd expect no difference in functionality. However, the results do not meet that expectation.


SQL> drop materialized view mv ;
 
Materialized view dropped.
 
SQL>
SQL> create or replace view vw
  2  as
  3  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  4  from emp a, dept b
  5  where a.dept_id=b.dept_id;
 
View created.
 
SQL>
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select * from vw;
select * from vw
              *
ERROR at line 7:
ORA-12015: cannot create a fast refresh materialized view from a complex query

This is not a flaw in our DDL – it is a bug in the database that will be fixed in due course. So if you have standard views being used within your materialized view definitions, and you are getting unexpected restrictions on whether the materialized views can be fast refreshed, try the simple workaround of in-lining the view text directly. You might have hit this bug.

Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song)

In Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of the index if all columns are referenced via equality predicates. If the leading column has a high number of distinct values, it might result in less necessary data within index branches as less data […]

I Got Here On My Own

Click bait worthy title, eh?

18c: some optimization about redo size

Some years ago, at the time of the 12.1 release, I published in the SOUG newsletter some tests showing the amount of redo generated by different operations on a 10000-row table. I had run them on 12.2 without seeing any difference, and now I have run them on 18.1.
I get the statistics from V$MYSTAT using a script that displays them as columns, with each value being the difference from the previous run (a sketch of the idea follows). I ran the same tests as in the article, and most of the statistics were in the same ballpark.
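The real script is on GitHub; a minimal sketch of the technique (hypothetical, for a single statistic) uses a SQL*Plus substitution variable to remember the previous value:

SQL> define previous = 0
SQL> column value new_value previous
SQL> select value, value - &previous delta
  2  from v$mystat join v$statname using(statistic#)
  3  where name = 'redo size';

Each execution displays the delta against the previous run and stores the current value into the substitution variable for the next one.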

12.2

In 12.2 I get the same numbers. That surprised me, because 12.2 has an optimization when updating a column to the same value. But this optimization may not occur in all situations. This reduction of redo generation has been analyzed by Maris Elsins with redo log dumps and by Sayan Malakshinov on triggers, and the origin of the optimization has been exposed by Bryn Llewellyn. All info and links are in the following Twitter conversation:

Here are the numbers I get in 18c, which are very similar to those from 12.1:


SQL> --- update with different values (the original ones were all lowercase)
SQL>
SQL> update TEST_TABLE set a=upper(a),b=upper(b),c=upper(c),d=upper(d);
10000 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
         190,485       32,226,368              13,288,940           95,258
 
1 row selected.
 
SQL>
SQL> --- update with same values
SQL>
SQL> update TEST_TABLE set a=upper(a),b=upper(b),c=upper(c),d=upper(d);
10000 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
          20,346        4,594,528               1,844,012           10,085
 
1 row selected.

The second update set the 4 columns to the same values. For about 1MB of data (10000 rows with avg_row_len=100), we have 1.8MB of undo and 4.4MB of redo (which covers the new data and the undo). I have 4 indexes there, but Oracle does not update index entries when the old and new values are the same.

The first update changes all values, and then, in addition to the changes in the table block, the indexes must be updated.

So, here, in my test, it seems that the 12.2 optimization referenced in the tweet above did not occur, because the redo generated for the table blocks still logs everything even when the old and new values are the same. I can check from a redo log dump that I have the same value in undo and redo:

REDO RECORD - Thread:1 RBA: 0x000008.00002444.0010 LEN: 0x01c8 VLD: 0x01 CON_UID: 1008806272
SCN: 0x00000000002cb8a4 SUBSCN:3047 07/03/2018 12:23:22
CHANGE #1 CON_ID:4 TYP:0 CLS:36 AFN:14 DBA:0x02405a20 OBJ:4294967295 SCN:0x00000000002cb8a4 SEQ:34 OP:5.1 ENC:0 RBL:0 FLG:0x0000
ktudb redo: siz: 184 spc: 2020 flg: 0x0022 seq: 0x0147 rec: 0x22
xid: 0x000a.009.000002bd
ktubu redo: slt: 9 rci: 33 opc: 11.1 objn: 77968 objd: 77978 tsn: 0
Undo type: Regular undo Undo type: Last buffer split: No
Tablespace Undo: No
0x00000000
KDO undo record:
KTB Redo
op: 0x02 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C uba: 0x02405a20.0147.21
KDO Op code: URP row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0040a994 hdba: 0x0040a7d8
itli: 3 ispac: 0 maxfr: 4863
tabn: 0 slot: 3(0x3) flag: 0x2c lock: 0 ckix: 0
ncol: 4 nnew: 4 size: 0
Vector content:
col 0: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 1: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 2: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 3: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
CHANGE #2 CON_ID:4 TYP:0 CLS:1 AFN:12 DBA:0x0040a994 OBJ:77978 SCN:0x00000000002cb8a4 SEQ:3 OP:11.5 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C uba: 0x02405a20.0147.22
KDO Op code: URP row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0040a994 hdba: 0x0040a7d8
itli: 3 ispac: 0 maxfr: 4863
tabn: 0 slot: 3(0x3) flag: 0x2c lock: 3 ckix: 0
ncol: 4 nnew: 4 size: 0
Vector content:
col 0: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 1: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 2: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 3: [24]
31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58

The redo record has the old and new values even when they are the same. I hope we will get more documentation about the 12.2 redo optimization, so that it becomes predictable and tunable.

18.1

So, I didn't see the 12.2 optimization I expected here. However, it seems that we have a new one for deletes in 18c.

In 12.2, the delete of all 10000 rows, without indexes, generates 2MB of undo and 3.5MB of redo:

SQL> --- delete all rows
SQL>
SQL> delete from TEST_TABLE;
10000 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
          20,690        3,670,476               2,053,292           10,145
 
1 row selected.

The same in 18.1 generates only 1MB of undo and redo:

SQL> --- delete all rows
SQL>
SQL> delete from TEST_TABLE;
10000 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
             872        1,187,120               1,116,812              293
 
1 row selected.

So in 12.2 I had one block change and one redo record per row deleted. In 18.1 it seems that I have one redo record per block when all its rows are deleted. Still in the same Twitter conversation, Tanel Poder had the idea to do a sparse delete, leaving one row in each block:

SQL> delete from TEST_TABLE where dbms_rowid.rowid_row_number(rowid)!=42;
9849 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
          20,431        3,660,204               2,102,584           10,011

Now, as there are no blocks that are fully emptied, I'm back to one redo entry per row deleted.

Update 4-JUL-18

I got a comment from Jonathan Lewis that the delete optimization occurs when using mod() to filter one row per block instead of dbms_rowid. Actually, it seems that this optimization is related more to the way the rows are accessed.
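For reference, a variant along those lines (my guess at the intended predicate – with avg_row_len=100 and 8k blocks, a modulus around 100 leaves roughly one row per block) would be:

SQL> delete from TEST_TABLE where mod(id,100)!=42;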

 


Standard Edition–different optimizer but still cool

One cool technique the optimizer can employ is the BITMAP CONVERSION TO ROWIDS method, which takes advantage of B-tree indexes in a way we would normally associate with a bitmap index. This can be particularly useful with multiple predicates on individually indexed columns, because it lets us establish the rows of interest before having to visit the heap blocks. Here's an example of that in action, even when the indexes in question are Text indexes.

Enterprise Edition plan


SQL> create table MY_TAB ( col1 varchar2(50), col2 varchar2(50));

Table created.

SQL>
SQL> insert into MY_TAB
  2  select dbms_random.string('x', 30), dbms_random.string('x', 30)
  3  from dual
  4  connect by level <= 100000;

100000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> create index MY_TAB_IX1 on MY_TAB(col1) indextype is ctxsys.context;

Index created.

SQL> create index MY_TAB_IX2 on MY_TAB(col2) indextype is ctxsys.context;

Index created.

SQL>
SQL> exec  dbms_stats.gather_table_stats('','MY_TAB');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4174159475

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |     1 |    62 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| MY_TAB      |     1 |    62 |     2   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |             |       |       |            |          |
|   3 |    BITMAP OR                        |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   5 |      SORT ORDER BY                  |             |       |       |            |          |
|*  6 |       DOMAIN INDEX                  | MY_TAB_IX1  |       |       |     1   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   8 |      SORT ORDER BY                  |             |       |       |            |          |
|*  9 |       DOMAIN INDEX                  | MY_TAB_IX2  |       |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   9 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

But a quick trip to the licensing guide states the following about the optimizer in Standard Edition:

The following methods are not available in SE:

Bitmapped index, bitmapped join index, and bitmap plan conversions

which makes sense given that you do not have access to bitmap indexes in Standard Edition.

But all is not lost. Even though the bitmap conversion optimization is not available, the optimizer can still come up with alternative, intelligent mechanisms to extract the required rows. Whereas you might think that an OR condition on two different columns would force a full table scan, here is the same example as above in Standard Edition.

Standard Edition plan


SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1568130183

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     2 |   108 |     2   (0)| 00:00:01 |
|   1 |  VIEW                         | VW_ORE_A5827389 |     2 |   108 |     2   (0)| 00:00:01 |
|   2 |   UNION-ALL                   |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  4 |     DOMAIN INDEX              | MY_TAB_IX1      |       |       |     1   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  6 |     DOMAIN INDEX              | MY_TAB_IX2      |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   5 - filter(LNNVL("CTXSYS"."CONTAINS"("COL1",'string1')>0))
   6 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

SQL> set autotrace off

A UNION ALL suggests the two separate table access paths could potentially return duplicate rows. But notice the nice touch at line 5 of the plan – when probing the rows via MY_TAB_IX2, the optimizer threw in an additional filter, LNNVL("CTXSYS"."CONTAINS"("COL1",'string1')>0), to remove the rows already returned by the "partnering" half of the UNION ALL. In this way, we avoid a potentially expensive sort to remove duplicate results.
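This OR-expansion is essentially what you would get by writing the query by hand as a UNION ALL – a sketch of the equivalent manual rewrite:

SQL> select * from MY_TAB where contains(col1, 'string1') > 0
  2  union all
  3  select * from MY_TAB where contains(col2, 'string2') > 0
  4  and lnnvl(contains(col1, 'string1') > 0);

LNNVL returns TRUE when its argument is FALSE or UNKNOWN, so the second branch keeps exactly the rows that the first branch could not have returned.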

So you might see the occasional difference between optimizer plans in Enterprise Edition versus Standard Edition – but both should do a fine job of executing your queries.

(Both tests done on 12.2)

Announcement: New Europe Seminar in Brussels, Belgium 27-28 September 2018

Due to popular demand, I'll be running another of my acclaimed seminars in Europe later in the year, this time in Brussels, Belgium on 27-28 September 2018. This is a must-attend seminar, of benefit not only to DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high […]

Event Sourcing: CQN is not a replacement for CDC

We are in an era where software architects want to stream transactions out of the database and distribute them, as events, to multiple microservices. Don't ask why, but that's the trend: store eventually consistent copies of data in different physical components, rather than simply using logical views in the same database, where the data is ACIDly stored, processed and protected. Because it was decided that this segregation, in CQRS (Command Query Responsibility Segregation), will be physical, on different systems, the need for logical replication and change data capture is rising under a new name: Event Sourcing.

When we want to replicate changes without adding overhead to the database, the solution is Change Data Capture from the redo stream. The redo contains all the physical changes and, with dictionary information and a little supplemental logging, we can mine it to extract the logical changes. Currently there are commercial products (Oracle GoldenGate, Attunity, Dbvisit replicate) and some open-source ones based on LogMiner (StreamSets, Debezium). LogMiner is available in all Oracle Database editions without any option. In Enterprise Edition, a more efficient solution used to be possible with Streams, but now you have to pay for GoldenGate to use Streams. Unfortunately, sometimes you pay for software updates only to get features removed and sold in additional products.
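To give an idea of the LogMiner approach, here is a minimal sketch (assuming a privileged session, and with your own redo log file path – the one shown is illustrative):

SQL> exec dbms_logmnr.add_logfile('/u01/oradata/CDB1/redo01.log', dbms_logmnr.new);
SQL> exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);
SQL> select operation, sql_redo from v$logmnr_contents where table_name = 'DEMO';
SQL> exec dbms_logmnr.end_logmnr;

SQL_REDO contains each logical change as a SQL statement, ready to be shipped as an event.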

Oracle has another feature that can help replicate changes: Database Change Notification, now known as Continuous Query Notification (CQN) or Object Change Notification (OCN). This feature was implemented to refresh caches: you have a query that loads the cache, and you want to be notified when changes occur so that you can refresh it. In theory, then, it could be used to stream out changes. However, CQN was not built for frequent changes, but rather for nearly static or slowly changing data. Sometimes we have to test by ourselves, though, so here is my test using CQN with a lot of changes on the underlying table, just to show how it increases the load on the database and slows down the changes.

I create a DEMO table with one million rows:

17:21:56 SQL> whenever sqlerror exit failure;
17:21:56 SQL> create table DEMO (ID constraint DEMO_ID primary key) as select rownum from xmltable('1 to 1000000');
 
Table DEMO created.
 

And a table to hold the notifications. As always when I want to start with an example, I get it from oracle-base:

17:21:58 SQL> -- from Tim Hall https://oracle-base.com/articles/10g/dbms_change_notification_10gR2
17:21:58 SQL> CREATE TABLE notifications (
  2    id                NUMBER,
  3    message           VARCHAR2(4000),
  4    notification_date DATE
  5  );
 
Table NOTIFICATIONS created.
 
17:21:58 SQL> CREATE SEQUENCE notifications_seq;
 
Sequence NOTIFICATIONS_SEQ created.

The callback function:

17:21:58 SQL> CREATE OR REPLACE PROCEDURE callback (ntfnds IN SYS.chnf$_desc) IS
  2    l_regid          NUMBER;
  3    l_table_name     VARCHAR2(60);
  4    l_event_type     NUMBER;
  5    l_numtables      NUMBER;
  6    l_operation_type NUMBER;
  7    l_numrows        NUMBER;
  8    l_row_id         VARCHAR2(20);
  9    l_operation      VARCHAR2(20);
 10    l_message        VARCHAR2(4000) := NULL;
 11  BEGIN
 12    l_regid      := ntfnds.registration_id;
 13    l_numtables  := ntfnds.numtables;
 14    l_event_type := ntfnds.event_type;
 15    IF l_event_type = DBMS_CHANGE_NOTIFICATION.EVENT_OBJCHANGE THEN
 16      FOR i IN 1 .. l_numtables LOOP
 17        l_table_name     := ntfnds.table_desc_array(i).table_name;
 18        l_operation_type := ntfnds.table_desc_array(i).Opflags;
 19        IF (BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.ALL_ROWS) = 0) THEN
 20          l_numrows := ntfnds.table_desc_array(i).numrows;
 21        ELSE
 22          l_numrows := 0; /* ROWID INFO NOT AVAILABLE */
 23        END IF;
 24        CASE
 25          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.INSERTOP) != 0 THEN
 26            l_operation := 'Records Inserted';
 27          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.UPDATEOP) != 0 THEN
 28            l_operation := 'Records Updated';
 29          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.DELETEOP) != 0 THEN
 30            l_operation := 'Records Deleted';
 31          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.ALTEROP) != 0 THEN
 32            l_operation := 'Table Altered';
 33          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.DROPOP) != 0 THEN
 34            l_operation := 'Table Dropped';
 35          WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.UNKNOWNOP) != 0 THEN
 36            l_operation := 'Unknown Operation';
 37          ELSE
 38            l_operation := '?';
 39        END CASE;
 40        l_message := 'Table (' || l_table_name || ') - ' || l_operation || '. Rows=' || l_numrows;
 41        INSERT INTO notifications (id, message, notification_date)
 42        VALUES (notifications_seq.NEXTVAL, l_message, SYSDATE);
 43        COMMIT;
 44      END LOOP;
 45    END IF;
 46  END;
 47  /
 
Procedure CALLBACK compiled
 
17:21:58 SQL> -- thanks Tim

and the CQN registration:

17:21:58 SQL> -- register on DEMO;
17:21:58 SQL>
17:21:58 SQL> DECLARE
  2    reginfo  CQ_NOTIFICATION$_REG_INFO;
  3    v_cursor SYS_REFCURSOR;
  4    regid    NUMBER;
  5  BEGIN
  6    reginfo := cq_notification$_reg_info ( 'callback', DBMS_CHANGE_NOTIFICATION.QOS_ROWIDS, 0, 0, 0);
  7    regid := sys.DBMS_CHANGE_NOTIFICATION.new_reg_start(reginfo);
  8    OPEN v_cursor FOR
  9      SELECT dbms_cq_notification.CQ_NOTIFICATION_QUERYID, demo.* from DEMO;
 10    CLOSE v_cursor;
 11    sys.DBMS_CHANGE_NOTIFICATION.reg_end;
 12  END;
 13  /
 
PL/SQL procedure successfully completed.

Now I delete 1 million rows, committing after each delete:


17:21:58 SQL> exec dbms_workload_repository.create_snapshot;
 
PL/SQL procedure successfully completed.
 
17:22:02 SQL>
17:22:02 SQL> -- 1000000 deletes
17:22:02 SQL>
17:22:02 SQL> exec for i in 1..1000000 loop delete from DEMO WHERE id=i; commit; end loop;
 
PL/SQL procedure successfully completed.
 
17:39:23 SQL>
17:39:23 SQL> exec dbms_workload_repository.create_snapshot;

Here are the notifications captured:

17:39:41 SQL> select count(*) from notifications;
COUNT(*)
--------
942741
 
17:39:54 SQL> select * from notifications fetch first 10 rows only;
 
 ID MESSAGE                                     NOTIFICATION_DATE
--- ------------------------------------------- -----------------
135 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
138 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
140 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
142 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
145 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
147 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
149 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
152 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
154 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
156 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18

The DML took a long time, and SQL Monitoring shows that 64% of the time was spent waiting on 'Wait for EMON to process ntfns' – the notification process:
CaptureCQN

The execution of the delete itself (cdq5w65zk18r1 DELETE FROM DEMO WHERE ID=:B1) is only a small part of the database time. And we have additional load on the database:
CaptureCQN01

The following activity, related to the Continuous Query Notification queuing of messages, is what slows down the modifications during the delete (from 17:22 to 17:38):

59p1yadp2g6mb call DBMS_AQADM_SYS.REGISTER_DRIVER ( )
gzf71xphapf1b select /*+ INDEX(TAB AQ$_AQ_SRVNTFN_TABLE_1_I) */ tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay, tab.expiration ,tab.retry_count, tab.exception_qschema, tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no, tab.sender_name, tab.sender_address, tab.sender_protocol, tab.dequeue_msgid, tab.user_prop, tab.user_data from "SYS"."AQ_SRVNTFN_TABLE_1" tab where q_name = :1 and (state = :2 ) order by q_name, state, enq_time, step_no, chain_no, local_order_no for update skip locked
61cgh171qq5m6 delete /*+ CACHE_CB("AQ_SRVNTFN_TABLE_1") */ from "SYS"."AQ_SRVNTFN_TABLE_1" where rowid = :1
ccrv58ajb7pxg begin callback(ntfnds => :1); end;
cdq5w65zk18r1 DELETE FROM DEMO WHERE ID=:B1

And at the end (17:38), when the modifications are committed, my callback function runs to process the messages:
CaptureCQN02
The main query is the insert from the callback function:
8z4m5tw9uh02d INSERT INTO NOTIFICATIONS (ID, MESSAGE, NOTIFICATION_DATE) VALUES (NOTIFICATIONS_SEQ.NEXTVAL, :B1 , SYSDATE)
The callback function could send the changes to another system rather than inserting them here, but then you can question the availability and, anyway, this would still have a high overhead in context switches and network roundtrips.

In summary, for 1 million rows deleted, here are the queries that were each executed about 1 million times:

                                Rows per   Elapsed
  Executions  Rows Processed        Exec  Time (s)  %CPU   %IO  SQL Id
------------  --------------  ----------  --------  ----  ----  -------------
   1,000,000       1,000,000         1.0     123.4  55.2   3.2  cdq5w65zk18r1
   Module: java@VM188 (TNS V1-V3)
   DELETE FROM DEMO WHERE ID=:B1

     999,753         999,753         1.0     261.5  88.6    .7  dw9yv631knnqd
   insert into "SYS"."AQ_SRVNTFN_TABLE_1" (q_name, msgid, corrid, priority, state, delay, expiration, time_manager_info, local_order_no, chain_no, enq_time, step_no, enq_uid, enq_tid, retry_count, exception_qschema, exception_queue, recipient_key, dequeue_msgid, user_data, sender_name, sender_address, sender_protoc

     978,351         978,351         1.0     212.5  64.3     0  61cgh171qq5m6
   Module: DBMS_SCHEDULER
   delete /*+ CACHE_CB("AQ_SRVNTFN_TABLE_1") */ from "SYS"."AQ_SRVNTFN_TABLE_1" where rowid = :1

     978,248         942,657         1.0     971.6    20    .7  8z4m5tw9uh02d
   Module: DBMS_SCHEDULER
   INSERT INTO NOTIFICATIONS (ID, MESSAGE, NOTIFICATION_DATE) VALUES (NOTIFICATIONS_SEQ.NEXTVAL, :B1 , SYSDATE)

     978,167         942,559         1.0   1,178.7  33.1    .5  ccrv58ajb7pxg
   Module: DBMS_SCHEDULER
   begin callback(ntfnds => :1); end;

     977,984         977,809         1.0      73.9  96.5     0  brq600g3299zp
   Module: DBMS_SCHEDULER
   SELECT INSTANCE_NUMBER FROM SYS.V$INSTANCE

     933,845         978,350         1.0     446.9  51.4    .7  gzf71xphapf1b
   Module: DBMS_SCHEDULER
   select /*+ INDEX(TAB AQ$_AQ_SRVNTFN_TABLE_1_I) */ tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay, tab.expiration, tab.retry_count, tab.exception_qschema, tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no, tab.sender_name

This is a huge overhead. And all of this generated 8 million redo entries.

In summary, just forget about CQN as a way to stream changes. This feature is aimed at cache refresh for rarely changing data. What we call 'event sourcing' today has existed for a long time in the database, in the form of redo logs. When a user executes some DML, Oracle generates the redo records first, stores them, and applies them to update the current version of the table rows. And the redo log preserves the atomicity of transactions (the 'A' in ACID). Better to use that if the changes need to be propagated to other systems.

 
