Oakies Blog Aggregator

Nested MVs

A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately they had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means delete every row from every MV then re-insert every row.  The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space for handling backup and recovery was getting a bit extreme.

The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first-level MVs to one of the second-level MVs.

No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command.  Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented on SE).

So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log 
on
req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected, 
        row_ct, req_ct, basis_ct, req, basis, 
        maxlnid, excess, numsel, area, available, kk_id, 
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand 
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from 
        req_line
group by 
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log 
on
jpl_req_group_numlines
with rowid 
(
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;


-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected, 
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis, 
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from 
        jpl_req_group_numlines
group by 
        eventid, selected, area
;


-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid, 
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq, 
        area, selected, available, maxlnid ,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by 
        eventid, area, available, selected
;

Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different refresh calls – dbms_mview.refresh_all_mviews(), dbms_mview.refresh_dependent(), and dbms_mview.refresh() – specifying the ‘F’ (fast) refresh method, atomic refresh, and nested refresh. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.

I’ve included a query to report the current state of the materialized views before and after the calls, and set a two second sleep before the refresh so that changes in “last refresh” time will appear. The final queries are just to check that the expected volume of data has been transferred to the materialized views.
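For reference, the plain dbms_mview.refresh() variant (not shown in the script below) would look something like the following. This is just a sketch using the documented parameter names; the script itself sticks to refresh_dependent().

begin
        dbms_mview.refresh(
                list           => 'JPL_REQ_GROUP_NUMLINES,JPL_REQ_NUMSEL,JPL_REQ_BASIS',
                method         => 'FFF',        -- one 'F' (fast) per MV in the list
                atomic_refresh => true,
                nested         => true
        );
end;
/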


-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate 'insert into req_line values( :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx)' 
                using i,i,i,i,i,i,i,i,i,i,i;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
ORDER by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;

Both of the earlier versions of Oracle are happy with this code and refresh all three materialized views without failing. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update:


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH               VALID               COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES                   NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH               VALID               COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH               VALID               FAST     19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH               VALID               FAST     19-jan 14:03:04

The deadlock graph from the trace file, with a little extra surrounding information, looks like this:


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X        40306      26      14           X  40306


*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400,  flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq  from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----

So far I haven’t been able to spot whether I’m doing something wrong or something prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.

CDB Views and Query Optimizer Cardinality Estimations

Today I faced a performance problem caused by a bad cardinality estimate involving a CDB view in a 12.1.0.2 multitenant environment. While solving the problem I made a number of observations that I’ll try to summarize in this blog post.

First of all, when checking the execution plan of a query that had already been running for more than two hours, I noticed that neither the referenced CDB view nor any of its underlying objects appeared in the execution plan. The following query (and its execution plan) executed while connected to the CDB illustrates the point (I also added the 12.2.0.1 output to show you the difference in that area):

12.1.0.2

SQL> EXPLAIN PLAN FOR SELECT * FROM cdb_tables;

SQL> SELECT * FROM table(dbms_xplan.display);

Plan hash value: 1439328272

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 20000 |    16M|     1 (100)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 | 20000 |    16M|     1 (100)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION LIST ALL|          | 20000 |    16M|     1 (100)| 00:00:01 |     1 |   254 |  Q1,00 | PCWC |            |
|   4 |     FIXED TABLE FULL    | X$CDBVW$ | 20000 |    16M|     1 (100)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------

12.2.0.1

SQL> EXPLAIN PLAN FOR SELECT * FROM cdb_tables;

SQL> SELECT * FROM table(dbms_xplan.display);

Plan hash value: 1043806087

-------------------------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            | 20000 |    28M|     1 (100)| 00:00:01 |       |       |
|   1 |  PARTITION LIST ALL|            | 20000 |    28M|     1 (100)| 00:00:01 |     1 |     3 |
|   2 |   CONTAINERS FULL  | DBA_TABLES | 20000 |    28M|     1 (100)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------

As you can see, the 12.1.0.2 execution plan doesn’t reference any object related to the CDB_TABLES view or any of its underlying tables. Instead, it uses the generic fixed table X$CDBVW$. Simply put, X$CDBVW$ is a fixed table that gives access to data stored in PDBs. To know more, I advise you to read Laurent Leturgez’s blog post entitled Oracle Database 12c CDB$VIEW function.
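If you want to check this for yourself, the view definition is visible in the data dictionary. This is a quick sanity-check query of my own (not from the original post); in 12.1 the text should show the CDB$VIEW() wrapper, while in 12.2 it should show the CONTAINERS() clause (consistent with the plans above):

select text
from   dba_views
where  owner = 'SYS'
and    view_name = 'CDB_TABLES'
;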

In the real query, the CDB view was joined to a number of V$ views. Unfortunately, the query optimizer selected the wrong join method (no surprise, it was a nested loops join instead of a hash join…) and the performance was abysmal. When I saw that the cardinality estimations were wrong, I checked whether the involved objects had statistics. But, because of its particular behavior, the fixed table X$CDBVW$ had no statistics. And, by the way, statistics on it can’t be gathered. If you try, you get the following error:

12.1.0.2

SQL> exec dbms_stats.gather_table_stats('SYS','X$CDBVW$')
BEGIN dbms_stats.gather_table_stats('SYS','X$CDBVW$'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "SYS"."X$CDBVW$", analyzing the table is not supported
ORA-06512: at "SYS.DBMS_STATS", line 35464
ORA-06512: at line 1

12.2.0.1

SQL> exec dbms_stats.gather_table_stats('SYS','X$CDBVW$')
BEGIN dbms_stats.gather_table_stats('SYS','X$CDBVW$'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "SYS"."X$CDBVW$", insufficient privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 36873
ORA-06512: at "SYS.DBMS_STATS", line 36496
ORA-06512: at "SYS.DBMS_STATS", line 36716
ORA-06512: at line 1

As Laurent mentioned in his blog post, the query optimizer uses a default value instead. However, what I noticed is that the estimate wasn’t 10000 rows as he mentioned. In my case it was 30000 rows. The difference is probably due to the version: he wrote his blog post when only 12.1.0.1 was available, but my customer is using 12.1.0.2. So, I did a couple of tests in my own test environment and found out that, as of and including 12.1.0.2, the estimated number of rows increases proportionally as the number of open PDBs increases. The following example illustrates this:

12.1.0.2 / 12.2.0.1

SQL> SELECT con_id, name, open_mode FROM v$pdbs WHERE open_mode LIKE 'READ%';

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         2 PDB$SEED                       READ ONLY

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'seed' FOR SELECT * FROM cdb_tables;

SQL> ALTER PLUGGABLE DATABASE test1 OPEN;

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'test1' FOR SELECT * FROM cdb_tables;

SQL> ALTER PLUGGABLE DATABASE test2 OPEN;

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'test2' FOR SELECT * FROM cdb_tables;

SQL> ALTER PLUGGABLE DATABASE test3 OPEN;

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'test3' FOR SELECT * FROM cdb_tables;

SQL> SELECT con_id, name, open_mode FROM v$pdbs WHERE open_mode LIKE 'READ%';

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         2 PDB$SEED                       READ ONLY
         4 TEST1                          READ WRITE
         5 TEST2                          READ WRITE
         6 TEST3                          READ WRITE

SQL> SELECT statement_id, cardinality FROM plan_table WHERE id = 0;

STATEMENT_ID                   CARDINALITY
------------------------------ -----------
seed                                 20000
test1                                30000
test2                                40000
test3                                50000

Finally, in the real query, because of a join condition based on the CON_ID column, the query optimizer incorrectly adjusted the number of rows returned through the fixed table X$CDBVW$. That led me to do a few tests on the selectivity estimates for the CON_ID column. As the following example illustrates, the query optimizer uses a default selectivity of 1% for equality predicates and 5% for range predicates.

12.1.0.2

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'all' FOR SELECT * FROM cdb_tables;

Explained.

SQL> SELECT * FROM table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1439328272

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 50000 |    40M|     2 (100)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 | 50000 |    40M|     2 (100)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION LIST ALL|          | 50000 |    40M|     2 (100)| 00:00:01 |     1 |   254 |  Q1,00 | PCWC |            |
|   4 |     FIXED TABLE FULL    | X$CDBVW$ | 50000 |    40M|     2 (100)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'eq' FOR SELECT * FROM cdb_tables WHERE con_id = 0;

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'gt' FOR SELECT * FROM cdb_tables WHERE con_id > 0;

SQL> SELECT statement_id, cardinality FROM plan_table WHERE id = 0;

STATEMENT_ID                   CARDINALITY
------------------------------ -----------
all                                  50000
eq                                     500
gt                                    2500

12.2.0.1

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'all' FOR SELECT * FROM cdb_tables;

Explained.

SQL> SELECT * FROM table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1281079049

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            | 50000 |    42M|     1 (100)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |            |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000   | 50000 |    42M|     1 (100)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION LIST ALL|            | 50000 |    42M|     1 (100)| 00:00:01 |     1 |     3 |  Q1,00 | PCWC |            |
|   4 |     CONTAINERS FULL     | DBA_TABLES | 50000 |    42M|     1 (100)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'eq' FOR SELECT * FROM cdb_tables WHERE con_id = 0;

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'gt' FOR SELECT * FROM cdb_tables WHERE con_id > 0;

SQL> SELECT statement_id, cardinality FROM plan_table WHERE id = 0;

STATEMENT_ID                   CARDINALITY
------------------------------ -----------
all                                  50000
eq                                     500
gt                                    2500

It goes without saying that such estimates are way off. Good estimates should consider the number of open PDBs….

In summary, if you see wrong estimates related to CDB views, don’t be surprised. In fact, the query optimizer bases its estimations on a number of default values.

Column Stats

I’ve made several comments in the past about the need for being selective when gathering object statistics, with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.
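As a reminder of what that looks like in practice, here’s a sketch of setting such a preference for the demo table created below (the column names are the demo’s own; the method_opt string is the bit I keep forgetting):

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'METHOD_OPT',
                pvalue  => 'for all columns size 1 for columns size 254 o1 o2 o3'
        );
end;
/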

Typically I might expect to use the standard 254 buckets when gathering histograms, with an occasional variation to increase the bucket count; but for the purposes of this note I’m going to demonstrate with a much lower value. So here’s a table creation statement (running 12.1.0.2 – so it will gather basic stats on the create) and two variations of a call to gather stats with a specific method_opt – followed by a question:

create table t1
as
select
        object_type o1,
        object_type o2,
        object_type o3,
        object_id,
        object_name
from
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format problem
;

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1';


The big question is this: which columns will have histograms after each of the gather_table_stats() calls?

method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'

The problem I have is simple – to me both options look as if they will create histograms on all three named columns but the first option is the one that I type in “intuitively” if I don’t stop to think about it carefully. The first option, alas, will only gather a histogram on column o3 – the second option is the one that creates three histograms.

The manuals are a little unclear and ambiguous about how to construct a slightly complicated method_opt; there’s a fragment of text with the usual mix of square brackets, italics and ellipses to indicate optional and repeated clauses (interestingly the only clue about multiple columns is that comma separation seems to be required – despite one of the examples above working without commas) but there’s no explanation of when a “size” clause should go before a column list and when it should go after it.

So here are a few more method_opt clauses – can you work out in advance which columns would have histograms if you used them, and how many buckets each histogram would have? There are a couple that may surprise you:


for columns o1 size 12, o2 size 13, o3 size 14

for columns o1 size 15 o2 size 16 o3 size 17

for columns size 18 o1 size 19 o2 size 20 o3

for columns size 21 o1 o2 size 22 o3

for columns o1 size 12, o2 size 12, o3 size 13, object_id size 13 object_name size 14

for columns size 22 o1 o2 for columns size 23 o3 object_id for columns size 24  object_name

Bottom line – to me – is to check very carefully that the method_opt is going to do what I want it to do; and for production systems I tend to use the final form that repeats the “for columns {size clause} {column list}”.
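For example, with the demo columns above, the form I’d aim for in production would look something like this (a sketch only):

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt => 'for all columns size 1 for columns size 15 o1 o2 for columns size 20 o3'
        );
end;
/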

GDPR ‘Murica!

Just over a year ago, an alarm of emails, posts and projects arose in Europe surrounding the General Data Protection Regulation, also known by the acronym GDPR.  It was as if someone had poked the sleeping bear of IT and woken it – and boy, was it grumpy.

Suddenly EU technologists were learning all about advanced system security, how to encrypt and mask data, and multi-tier authentication, along with creating procedures for when a user requested to be forgotten.  Projects and money were being allocated to take on this demand – which, considering the initiative was passed back in 2014, you’d think would have happened earlier than a year ago, but hey, procrastination isn’t just for us ADHDers.

Now GDPR is everywhere you look in American news, articles and blogs, with the May 25th, 2018 EU deadline quickly approaching.  The realization that we’re more globally connected than we’ve ever acknowledged, and that we’re held accountable to GDPR, has many scrambling to become “GDPR compliant”.  Far beyond anyone saying “let’s make America great again”, we have to recognize the regulations of our European customers.

As relational databases are the most common location for data in businesses today, the DBA, developer and application support staff are going to feel the most pressure from this looming deadline.  It’s also a law that is legally binding on any company doing business with customers in the EU and must be adhered to.

So what are the main areas of concern for database technology under GDPR?

Security breaches of electronic data increased over 40% in just the last year.  This is expected to keep increasing and, with the introduction of the cloud, advanced security methods must be embraced by everyone.  Per the GDPR, a security breach is:

the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to, personal data transmitted, stored, or otherwise processed.

The EU is also taking an unprecedented step in what it classifies as personal data, which includes much more than the data we commonly associate with identity theft.

Companies that do business with any country that is part of the EU will be held accountable, so there isn’t a choice about compliance just because your company only has customers in one or two countries.  This includes companies that use a third party that uses data from the EU, as well.  There is a six-degrees-of-separation effect that makes a high percentage of companies liable for compliance.

There are specific rights granted to every EU citizen as part of GDPR:

  • The right to be informed
  • The right to access
  • The right to rectification
  • The right to be forgotten
  • The right to restrict processing
  • The right to data portability
  • The right to object
  • The right to not be part of automated profiling or decisions based on data

As a citizen of the EU, you have the right to know how your data is being used.  It must be clearly disclosed, i.e. with full transparency.  You have the right to access your own data and to view how it’s being used.  I’m unsure what format or interface will be offered, but I’m foreseeing something similar to credit reporting – except it will be individual data reporting.  Also, similar to credit reporting, if the data that’s out there is incorrect or incomplete, you’ll have the ability to request that it be corrected.

If you decide you want to be forgotten, you have the right to request that your data be removed.  The company must provide a valid reason for storing the data in the first place, and if it can’t, then the data must be deleted.  Even if the company does get to store it, you can also request that it not be processed, especially if the data is incomplete or incorrect.  You’ll have the opportunity to require a company to wait to process your data until a data-completion request is finalized, ensuring that incomplete or incorrect data doesn’t propagate through other systems.

Just as with medical data under HIPAA, you can make requests about your data and ask that it be used in another system for credit and banking.  This could cut down on unnecessary copies and may build the initiative for a personal ID, outside of the Social Security Number, which is both inefficient and leaves us more open to identity theft.

You can object to your data being passed on to other companies, for uses such as marketing campaigns, which could cut down on spam calls, marketing mailings, etc.  Think about the number of trees that could be saved by that second one.

The next is in the best interest of the future of machine learning and AI – you have the right to remove your data from any automated decision or profiling performed on it.  This could have a serious impact on automated advertising through machine learning, and for those considering breaking this rule, I’d reconsider.  Any organization that breaches any of these requirements could suffer fines of up to 20 million euros or 4% of global turnover.  That’s even more money in the US, so don’t take this lightly.

So, whatever you do, if you haven’t started working on your GDPR initiative, start.  You’re behind, and there’s a lot to cover in a short four months to reach compliance.

Copyright © DBA Kevlar [GDPR 'Murica!], All Right Reserved. 2018.

The post GDPR ‘Murica! appeared first on DBA Kevlar.

Histogram Hassle

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious).  The client had a query with a predicate like the following:

    t4.columnA = :b1
and t6.columnB = :b2

The optimizer was choosing to drive the query through an indexed access path into t6, which returned ca. 1,000,000 rows before joining (two tables later) to t4, at which point all but a couple of rows were eliminated – typical execution time was in the order of tens of minutes. A /*+ leading(t4) */ hint to start on t4 with an index that returned two rows reduced the response time to the classic “sub-second”.

The problem had arisen because the optimizer had estimated a cardinality of 2 rows for the index on t6, and the reason for this was that, on average, that was the correct number. There were 2,000,000 rows in the table with 1,000,000 distinct values. It was just very unlucky that one of the values appeared 1,000,000 times and that was the value the users always wanted to query – and there was no histogram on the column to tell the optimizer that there was a massive skew in the data distribution.

Problem solved – all I had to do was set a table preference for this table to add a histogram to this column and gather stats. Since there were so many distinct values and so much “non-popular” data in the table the optimizer should end up with a hybrid histogram that would highlight this value. I left instructions for the required test and waited for the email telling me that my suggestion was brilliant and the results were fantastic… I got an email telling me it hadn’t worked.

Here’s a model of the situation – I’ve created a table with 2 million rows and a column where every other row contains the same value but otherwise contains the rownum. Because the client code was using a varchar2() column I’ve done the same here, converting the numbers to character strings left-padded with zeros. There are a few rows (about 20) where the column value is higher than the very popular value.


rem
rem     Script:         histogram_problem_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 2e4
)
select
        rownum  as id,
        case
                when mod(rownum,2) = 0
                        then '999960'
                        else lpad(rownum,6,'0')
        end     as bad_col
from
        generator       v1,
        generator       v2
where
        rownum <= 2e6
;

Having created the data I'm going to create a histogram on the bad_col – specifying 254 buckets – then query user_tab_histograms for the resulting histogram (from which I’ll delete a huge chunk of boring rows in the middle):


begin

        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );

end;
/

select
        column_name, histogram, sample_size
from
        user_tab_columns
where
        table_name = 'T1'
;

column end_av format a12

select
        endpoint_number         end_pt,
        to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') end_val,
        endpoint_actual_value   end_av,
        endpoint_repeat_count   end_rpt
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;

COLUMN_NAME          HISTOGRAM             Sample
-------------------- --------------- ------------
BAD_COL              HYBRID                 5,513
ID                   NONE               2,000,000

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
         1  303030303031001f0fe211e0800000 000001                1
        12  3030383938311550648a5e3d200000 008981                1
        23  303135323034f8f5cbccd2b4a00000 015205                1
        33  3032333035311c91ae91eb54000000 023051                1
        44  303239373236f60586ef3a0ae00000 029727                1
...
      2685  3938343731391ba0f38234fde00000 984719                1
      2695  39393235303309023378c0a1400000 992503                1
      2704  3939373537370c2db4ae83e2000000 997577                1
      5513  393939393938f86f9b35437a800000 999999                1

254 rows selected.

So we have a hybrid histogram, we’ve sampled 5,513 rows to build the histogram, we have 254 buckets in the histogram report, and the final row in the histogram is end point 5513 (matching the sample size). The first row of the histogram shows us the (real) low value in the column and the last row of the histogram reports the (real) high value. But there’s something very odd about the histogram – we know that ‘999960’ is the one popular value, occurring 50% of the time in the data, but it doesn’t appear in the histogram at all.
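The value certainly is in the data, though. A quick aggregate against the base table (my own sanity check, not part of the original test) confirms the 50% skew; with the data as created above it should return exactly one row – ‘999960’ with a count of 1,000,000:

select
        bad_col, count(*)
from
        t1
group by
        bad_col
having
        count(*) > 10
order by
        count(*) desc
;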

Looking more closely we see that every bucket covers a range of about 11 (sometimes 9 or 10) rows from the sample, and the highest value in each bucket appears just once; but the last bucket covers 2,809 rows from the sample with the highest value in the bucket appearing just once. We expect a hybrid histogram to have buckets which (at least initially) are all roughly the same size – i.e. “sample size”/”number of buckets” – with some buckets being larger by something like the amount that appears in their repeat count, so it doesn’t seem right that we have an enormous bucket with a repeat count of just 1. Something is broken.

The problem is that the sample didn’t find the low and high values for the column – although the initial full tablescan did, of course – so Oracle has “injected” the low and high values into the histogram fiddling with the contents of the first and last buckets. At the bottom end of the histogram this hasn’t really caused any problems (in our case), but at the top end it has taken the big bucket for our very popular ‘999960’ and apparently simply replaced the value with the high value of ‘999999’ and a repeat count of 1.

As an indication of the truth of this claim, here are the last few rows of the histogram if I repeat the experiment but, before gathering the histogram, delete the rows where bad_col is greater than ‘999960’. (Oracle’s sample is random, of course, and has changed slightly for this run.)

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2641  3938373731371650183cf7a0a00000 987717                1
      2652  3939353032310e65c1acf984a00000 995021                1
      2661  393938393433125319cc9f5ba00000 998943                1
      5426  393939393630078c23b063cf600000 999960             2764

Similarly, if I inserted a few hundred rows with a higher value than my popular value (in this case I thought 500 rows would be a fairly safe bet as the sample was about one in 360 rows) I got a histogram which kept a bucket for the popular value, so the problem of that bucket being hacked to the high value was less significant:


    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2718  393736313130fe68d8cfd6e4000000 976111                1
      2729  393836373630ebfe9c2b7b94c00000 986761                1
      2740  39393330323515efa3c99771600000 993025                1
      5495  393939393630078c23b063cf600000 999960             2747
      5497  393939393938f86f9b35437a800000 999999                1

Bottom line, then: if you have an important popular value in a column and there aren’t very many rows with a higher value, you may find that Oracle loses sight of the popular value as it fudges the column’s high value into the final bucket.

Workaround

I did consider writing a bit of PL/SQL for the client to fake a realistic frequency histogram, but decided that that wouldn’t be particularly friendly to future DBAs who might have to cope with changes. Luckily the site doesn’t gather stats using the automatic scheduler job and only rarely updates stats anyway, so I suggested we create a histogram on the column using an estimate_percent of 100. This took about 8 minutes to run – for reasons that I will go into in a moment – after which I suggested we lock stats on the table and document the fact that when stats are collected on this table it’s got to be a two-pass job – the normal gather with its auto_sample_size to start with, then a 100% sample for this column to gather the histogram:


begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt       => 'for columns bad_col size 254',
                estimate_percent => 100,
                cascade          => false
        );
end;
/

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
       125  39363839393911e01d15b75c600000 968999                0
       126  393834373530e98510b6f19a000000 984751                0
       253  393939393630078c23b063cf600000 999960                0
       254  393939393938f86f9b35437a800000 999999                0

129 rows selected.

This took a lot longer, of course, and produced an old-style height-balanced histogram. Part of the time came from the increased volume of data that had to be processed, part of it came from a surprise (which also appeared, in a different guise, in the code that created the original hybrid histogram).

I had specifically chosen the method_opt to gather stats for nothing but the single column. In fact, whether I forced the “legacy” (height-balanced) code or the modern (hybrid) code, I got a full tablescan that did some processing of EVERY column in the table and then threw most of the results away. Here are fragments of the SQL – old version first:


select /*+
            no_parallel(t) no_parallel_index(t) dbms_stats
            cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
            xmlindex_sel_idx_tbl no_substrb_pad
       */
       count(*),
       count("ID"), sum(sys_op_opnsize("ID")),
       count("BAD_COL"), sum(sys_op_opnsize("BAD_COL"))
       ...
from
       "TEST_USER"."T1" t

select /*+
           full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
           xmlindex_sel_idx_tbl no_substrb_pad
       */
       to_char(count("ID")),
       to_char(count("BAD_COL")),
       substrb(dump(min("BAD_COL"),16,0,64),1,240),
       substrb(dump(max("BAD_COL"),16,0,64),1,240),
       ...
       count(rowidtochar(rowid))
from
       "TEST_USER"."T1" t  /* ACL,TOPN,NIL,NIL,RWID,U,U254U*/

The new code only used the substrb() functions on the bad_col, but all other columns in the table were subject to the to_char(count()).
The old code applied the count() and sys_op_opnsize() to every column in the table.

This initial scan was a bit expensive – and disappointing – for the client since their table had 290 columns (which means intra-block chaining as a minimum) and had been updated so much that 45% of the rows in the table needed “continued fetches”. I can’t think why every column had to be processed like this, but if they hadn’t been, that would have saved a lot of CPU and I/O since the client’s critical column was very near the start of the table.

Finally

This problem with the popular value going missing is a known issue, for which there is a bug number, but there is further work going on in the same area which means this particular detail is being rolled into another bug fix. More news when it becomes available.

Bear in mind that this problem also appears for Top-N (aka Top-Frequency) histograms – where both the lowest and highest buckets may be replaced with a bucket that reports the low-value and high-value for the column with a repeat-count of 1.

Update (Jan 2018)

This is now fixed under bug number “25994960: CARDINALITY MISESTIMATE FROM HYBRID HISTOGRAM” with a patch (of the same number) for 12.1.0.2


Spectre and Meltdown on Oracle Public Cloud UEK – LIO

In the last post I published the strange results I had when testing physical I/O with the latest Spectre and Meltdown patches. Here are the logical I/O results, using SLOB cached reads.

Logical reads

I’ve run some SLOB cached reads with the latest patches, as well as with only KPTI disabled, and with KPTI, IBRS and IBPB all disabled.
I am on the Oracle Public Cloud DBaaS with 4 OCPUs.

DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 670,001.2
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 671,145.4
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 672,464.0
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 685,706.7 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 689,291.3 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 689,386.4 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 699,301.3 nopti noibrs noibpb
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 704,773.3 nopti noibrs noibpb
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 704,908.2 nopti noibrs noibpb

This is what I expected: when disabling the mitigation for Meltdown (PTI), and some of the Spectre mitigations (IBRS and IBPB), I get slightly better performance – about 5%. This is with only one SLOB session.

However, with 2 sessions I have something completely different:

DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,235,637.8 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,237,689.6 nopti
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,243,464.3 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,247,257.4 nopti
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,247,257.4 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,251,485.1
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,253,477.0
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,271,986.7

This is not a saturation situation. My VM shape is 4 OCPUs, which is supposed to be the equivalent of 4 hyperthreaded cores.
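To double-check what the instance itself sees for this shape, here is a quick query of my own (not part of the SLOB runs):

select stat_name, value
from   v$osstat
where  stat_name in ('NUM_CPUS', 'NUM_CPU_CORES', 'NUM_CPU_SOCKETS')
;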

And this figure is even worse with 4 sessions (all cores used) and more:

DB Time(s) : 4.0 DB CPU(s) : 4.0 Logical read (blocks) : 2,268,272.3 nopti noibrs noibpb
DB Time(s) : 4.0 DB CPU(s) : 4.0 Logical read (blocks): 2,415,044.8


DB Time(s) : 6.0 DB CPU(s) : 6.0 Logical read (blocks) : 3,353,985.7 nopti noibrs noibpb
DB Time(s) : 6.0 DB CPU(s) : 6.0 Logical read (blocks): 3,540,736.5


DB Time(s) : 8.0 DB CPU(s) : 7.9 Logical read (blocks) : 4,365,752.3 nopti noibrs noibpb
DB Time(s) : 8.0 DB CPU(s) : 7.9 Logical read (blocks): 4,519,340.7

The graph from those numbers is here:

[Figure: CaptureOPCLIO001]

If you compare with the Oracle PaaS I tested last year (https://blog.dbi-services.com/oracle-public-cloud-liops-with-4-ocpu-in-paas/), which was on Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz, you can also see a nice improvement here on Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz.

This test was on 4.1.12-112.14.10.el7uek.x86_64 and Oracle Linux has now released a new update: 4.1.12-112.14.11.el7uek

 

This article Spectre and Meltdown on Oracle Public Cloud UEK – LIO appeared first on Blog dbi services.

Spectre/Meltdown on Oracle Public Cloud UEK – PIO

The Spectre and Meltdown mitigations are now in the latest Oracle UEK kernel, after updating it with ‘yum update’:

[opc@PTI ~]$ rpm -q --changelog kernel-uek | awk '/CVE-2017-5715|CVE-2017-5753|CVE-2017-5754/{print $NF}' | sort | uniq -c
     43 {CVE-2017-5715}
     16 {CVE-2017-5753}
     71 {CVE-2017-5754}

As I did on the previous post on AWS, I’ve run quick tests on the Oracle Public Cloud.

Physical reads

I’ve run some SLOB I/O reads with the patches, as well as with only KPTI disabled, and with KPTI, IBRS and IBPB all disabled.

And I was quite surprised by the result:


DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 23,335.6 nopti
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 23,420.3 nopti
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 24,857.6
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 25,332.1


DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 39,857.7 nopti
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,088.4 nopti
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,627.0
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,707.5


DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 47,491.4 nopti
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 47,491.4 nopti
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 49,438.2
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 49,764.5


DB Time(s) : 8.0 DB CPU(s) : 1.2 Read IO requests : 54,227.9 nopti
DB Time(s) : 8.0 DB CPU(s) : 1.2 Read IO requests : 54,582.9 nopti
DB Time(s) : 8.0 DB CPU(s) : 1.3 Read IO requests : 57,288.6
DB Time(s) : 8.0 DB CPU(s) : 1.4 Read IO requests : 57,057.2

Yes. In all the tests that I’ve done, the IOPS is higher with KPTI enabled than when booting the kernel with the nopti option. Here is a graph with those numbers:

[Figure: CaptureOPCPIO001]

I did those tests on the Oracle Cloud because I know that we have very fast I/O here – in hundreds of microseconds, probably all cached in the storage:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                          Total Wait       Avg   % DB  Wait
Event                              Waits  Time (sec)      Wait   time  Class
------------------------------ --------- ----------- --------- ------ --------
db file parallel read            196,921        288.8    1.47ms   48.0 User I/O
db file sequential read          581,073        216.3  372.31us   36.0 User I/O
DB CPU                                          210.5             35.0
 
% of Total Waits
----------------------------------------------- Waits
Total 1ms
Event Waits <8us <16us <32us <64us <128u <256u <=512 Event to 32m <512 <1ms <2ms <4ms <8ms <16ms <=32m
------------------------- ------ ----- ----- ----- ----- ----- ----- ----- ----- ------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
db file parallel read 196.9K .0 1.0 99.0 db file parallel read 194.9K 1.0 15.4 74.7 8.5 .3 .1 .0 .0
db file sequential read 581.2K 17.3 69.5 13.3 db file sequential read 77.2K 86.7 10.7 2.3 .2 .1 .0 .0 .0
 

So what?

I expected to get higher IOPS when disabling the page table isolation, because of the overhead it adds to context switches. But it is the opposite here. Maybe this is because I have a very small SGA (my goal being to have only physical reads). Note also that, as far as I know, only my guest OS has been patched for Meltdown and Spectre. We will see if the numbers are different after the next Oracle Cloud maintenance.
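For reference, this is how I would check how small the buffer cache actually is on this instance (my own query, not part of the test):

select component, round(current_size/1024/1024) size_mb
from   v$sga_dynamic_components
where  component like '%buffer cache%'
;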

 

This article Spectre/Meltdown on Oracle Public Cloud UEK – PIO appeared first on Blog dbi services.

Clone a table

Sometimes doing a CREATE TABLE AS SELECT is all we need to copy the data from an existing table.  But what if we want more than that?  What if we really want to clone that table to match the original as closely as possible?  We had a question along these lines on AskTOM today.  A standard CTAS copies the NOT NULL attributes and the data types, but not really much else.  We know that Data Pump will take care of it, but that is more complex than a simple CTAS.
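To see just how little a plain CTAS carries across, you can compare the generated DDL of a straight copy with the original. This is a sketch of my own, reusing the scott.emp table from the demo below:

create table emp_ctas as select * from scott.emp;

select dbms_metadata.get_ddl('TABLE','EMP_CTAS',user) from dual;

-- the DDL comes back with only the column definitions (and any explicit
-- NOT NULL constraints): no primary key, no unique constraint, no
-- defaults, no compression setting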

So here is a simple routine to wrap the Data Pump calls so that the CTAS can be achieved with just as simple a command.  A database link pointing back to the same database is all we need.

Note:  The true innovation in this blog post came from Laurent’s excellent idea here.  I am just adding a small wrapper to make the process a little more palatable.  So all credit to Laurent here please.


SQL> create table emp as select * from scott.emp;

Table created.

SQL> create sequence seq start with 8000;

Sequence created.

SQL> alter table emp modify empno default seq.nextval;

Table altered.

SQL> alter table emp add primary key ( empno );

Table altered.

SQL> alter table emp add unique ( ename );

Table altered.

SQL> alter table emp compress;

Table altered.

SQL> alter table emp enable row movement;

Table altered.

And here is the routine to clone it, whilst keeping all of those additional bits of metadata.


SQL> create or replace
  2  procedure clone_tab(p_source varchar2,p_target varchar2) is
  3    n number;
  4    g varchar2(30);
  5    j varchar2(30);
  6  begin
  7    select global_name into g from global_name;
  8    begin
  9      execute immediate 'alter session close database link tmp$1';
 10    exception
 11      when others then null;
 12    end;
 13
 14    begin
 15      execute immediate 'drop database link tmp$1';
 16    exception
 17      when others then null;
 18    end;
 19
 20    execute immediate 'create database link tmp$1 using '''||g||'''';
 21
 22    if p_target like '%.%' or p_source like '%.%' then
 23      raise_application_error(-20000,'No schema prefix allowed');
 24    end if;
 25
 26    n := dbms_datapump.open('IMPORT','TABLE','TMP$1');
 27    dbms_datapump.metadata_filter(n,'NAME_LIST',''''||upper(p_source)||'''');
 28    dbms_datapump.metadata_remap(n,'REMAP_TABLE',upper(p_source),upper(p_target));
 29    dbms_datapump.start_job(n);
 30    dbms_datapump.wait_for_job(n, j);
 31  end;
 32  /

Procedure created.

SQL>
SQL> set serverout on
SQL> exec clone_tab('emp','emp2');

PL/SQL procedure successfully completed.

SQL>
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual
  2
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual;

DBMS_METADATA.GET_DDL('TABLE','EMP2',USER)
---------------------------------------------------------------------------------------

  CREATE TABLE "MCDONAC"."EMP2"
   (    "EMPNO" NUMBER(4,0) DEFAULT "MCDONAC"."SEQ"."NEXTVAL" NOT NULL ENABLE,
        "ENAME" VARCHAR2(10),
        "JOB" VARCHAR2(9),
        "MGR" NUMBER(4,0),
        "HIREDATE" DATE,
        "SAL" NUMBER(7,2),
        "COMM" NUMBER(7,2),
        "DEPTNO" NUMBER(2,0),
         UNIQUE ("ENAME")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE,
         PRIMARY KEY ("EMPNO")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 COMPRESS BASIC LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE ROW MOVEMENT

SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select count(*) from emp2;

  COUNT(*)
----------
        14

Secret Hacking Session: Oracle Background Process Communication, Exotic Wait Events and Some Tracing too

Update: I unexpectedly ended up falling ill and decided to reschedule this hacking session to January 24, 10am PST. No need to re-register if you already have done so. Sorry for the inconvenience. I will upload the video to Youtube after the event.

Since I’m running my Advanced Oracle Troubleshooting Training at the end of this month, I’ll do one of my “secret” hacking sessions next week, too – for promotion and noise-making reasons! ;-)

Secret Hacking Session with Tanel Poder: Oracle Background Process Communication, Exotic Wait Events and Some Tracing too

In this session we will look into some internals of Oracle background process communication and also some special types of wait events that most people aren’t aware of. We will use some exotic tracing for internals research and fun and some of this stuff is actually useful in real life too! I’m not going to reveal everything upfront, as this is a secret internals hacking session after all ;-)

We will use various techniques to research what the “reliable message” wait event is about and how reliable background process communication is orchestrated in Oracle.

This is a hacking session, not formal structured training, so I’ll just do free form demos and talk (probably no slides, just hacking stuff on the command line). I will later upload the video to my Youtube channel too – https://youtube.com/TanelPoder

Oh and it’s free!

Date & Time: Wed 24 Jan 10am PST

Location: GotoWebinar

See you soon!

(I said there would probably be no slides, but maybe I’ll still show one or two ;-)


NB! I am running one more Advanced Oracle Troubleshooting training in 2018! You can attend the live online training and download the personal video recordings too. Part 1 starts on 29th January 2018 – sign up here!

ASSM tangle

Here’s a follow-on from Tuesday’s (serious) note about a bug in 12.1.0.2 that introduces random slowdown on large-scale inserts. The threat in this note, while real and potentially a nuisance, is much less likely to become visible because it depends on you doing something that you probably shouldn’t be doing.

There have always been problems with ASSM and large-scale deletes – in particular, when should Oracle mark a block as having free space after a delete? If your session does it immediately then other sessions will start trying to use free space that isn’t really there until you commit; if your session doesn’t do it immediately, when can it happen – you won’t want it done on commit, but that means the segment could “lose” a lot of free space if something doesn’t come along in a timely fashion to tidy up.

But here’s a quirky problem that takes things one step further. What happens if you try to delete a load of data and fail and your session rolls back? If we start with yesterday’s script (running on 11.2.0.4 or 12.2.0.1) we can create a table with 1M rows in it and the following space usage:


Unformatted                   :            0 /              0
Freespace 1 (  0 -  25% free) :            0 /              0
Freespace 2 ( 25 -  50% free) :            1 /          8,192
Freespace 3 ( 50 -  75% free) :            0 /              0
Freespace 4 ( 75 - 100% free) :           67 /        548,864
Full                          :       41,666 /    341,327,872

You will recall that each “Full” block actually had the basic 10% free space, plus a couple of hundred extra bytes which Oracle had to “forget about” because the incoming rows were always 290 bytes long. Let’s take this table and delete the first 100,000 rows, then emulate a session error and roll back, and then check the space usage:


delete from t1 where rownum <= 100000;
rollback;

-- generate space usage report

Unformatted                   :            0 /              0
Freespace 1 (  0 -  25% free) :        4,167 /     34,136,064
Freespace 2 ( 25 -  50% free) :            1 /          8,192
Freespace 3 ( 50 -  75% free) :            0 /              0
Freespace 4 ( 75 - 100% free) :           67 /        548,864
Full                          :       37,499 /    307,191,808

We have 4,167 blocks which were full, and we know they are effectively full for the purposes of our data, but they’re now declared as having some free space. When Oracle rolled back the delete it wasn’t running code that would attempt to discover that the block was going to go over the limit; it simply calculated the byte change from re-inserting the row, added it to the total free space (tosp) and produced a number that hadn’t reached the limit set by pctfree – so it flagged the block accordingly. (Remember my comment in the earlier article that Oracle doesn’t generate undo for the state changes on the Level 1 bitmap blocks – this is, at least in part, a consequence of that strategy.)
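For reference, the space usage reports above can be produced with a call like the following. This is a sketch of my own (the reporting code isn’t shown in the original script); the fs1 to fs4 figures map to the “Freespace 1” to “Freespace 4” lines:

declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;
        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;
begin
        dbms_space.space_usage(
                user, 'T1', 'TABLE',
                m_unformatted_blocks, m_unformatted_bytes,
                m_fs1_blocks, m_fs1_bytes,
                m_fs2_blocks, m_fs2_bytes,
                m_fs3_blocks, m_fs3_bytes,
                m_fs4_blocks, m_fs4_bytes,
                m_full_blocks, m_full_bytes
        );

        dbms_output.put_line('Unformatted                   : ' || m_unformatted_blocks || ' / ' || m_unformatted_bytes);
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || m_fs1_blocks         || ' / ' || m_fs1_bytes);
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || m_fs2_blocks         || ' / ' || m_fs2_bytes);
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || m_fs3_blocks         || ' / ' || m_fs3_bytes);
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || m_fs4_blocks         || ' / ' || m_fs4_bytes);
        dbms_output.put_line('Full                          : ' || m_full_blocks        || ' / ' || m_full_bytes);
end;
/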

Guideline

If you’re going to do large-scale deletes in an ASSM environment, make sure they don’t fail or subsequent inserts may take a long time.