Oakies Blog Aggregator

Better to be safe than sorry…

I’ve always been worried about taking a script that is fine to run in my non-production environments (in particular a DROP script) and accidentally running it in a Production environment, shortly followed by the typing up of a fresh resume to look for a new job once the mistake is discovered :)

A standard “least privilege required” model for Production access is a good step along the road to protecting the database. But ultimately, at one time or another, an administrator with appropriate privileges, good intentions, but (say) bad cut-and-paste skills might inadvertently run a drop command against a database they did not want to. (A well publicised example of this happened quite recently: https://www.theregister.co.uk/2017/02/01/gitlab_data_loss/)

So it would be nice to have a ‘double check’ facility to guard against a DROP being run on your production system inadvertently. That way, even if you did run that drop script against Production, you would be “saved from yourself”. Here’s one possible approach you might want to consider. We’ll use a context variable that has to be set in order for DROP commands to be processed.


--
-- Our context
--
SQL> create context DDL_CHECK using ddl_checker;

Context created.

--
-- A routine to set it
--
SQL> create or replace
  2  procedure ddl_checker(p_val varchar2) is
  3  begin
  4    dbms_session.set_context('DDL_CHECK','ENABLED',p_val);
  5  end;
  6  /

Procedure created.

--
-- If we are on the Production database, DROP DDLs only proceed if the context variable is set
--
SQL> create or replace
  2  trigger ddl_checker_before
  3  before drop on scott.schema
  4  begin
  5    if sys.database_name like '%PROD%' and sys_context('DDL_CHECK','ENABLED') is null
  6    then
  7      raise_application_error(-20000,'DDL confirmation context is not set');
  8    end if;
  9  end;
 10  /

Trigger created.

--
-- And after every DROP, the context is reset for the next use so it cannot be left on inadvertently
--
SQL> create or replace
  2  trigger ddl_checker_after
  3  after drop on scott.schema
  4  begin
  5    ddl_checker(null);
  6  end;
  7  /

Trigger created.

SQL>
SQL>
SQL> grant execute on ddl_checker to scott;

Grant succeeded.

SQL>

So now we can see what happens when SCOTT attempts some DDL in Production by mistake.


SQL> conn scott/tiger
Connected.

SQL> drop table t purge;
drop table t purge
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20000: DDL confirmation context is not set
ORA-06512: at line 4

--
-- We must explicitly express our intention to perform a DDL
--
SQL> exec admin.ddl_checker('true')

PL/SQL procedure successfully completed.

--
-- and only then are we allowed to run a DROP command
--
SQL> drop table t purge;

Table dropped.

--
-- and in doing so, we don't have any lingering access
--
SQL> drop table t1 purge;
drop table t1 purge
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20000: DDL confirmation context is not set
ORA-06512: at line 4


SQL>

I stress – this is not about securing administrator access or replacing your existing processes for auditing or locking down the privileges on your production systems. This is just about an extra check to help your over-worked and/or over-tired administrators avoid a small mistake that becomes a catastrophic error.

plsql promises - promise complete solution

Finally, I have completed the last 2 methods. promise.all and promise.race are finished, so the plsql promises library is now feature complete. The promise.all method takes a list of promises, built with the help of the promises_ninja package, and once the entire list of promises is fulfilled, the top promise is set to fulfilled. Here is an example:

-- Test all method
declare
test_promise promise := promise();
a_promise1 promise;
a_promise2 promise;
a_promise3 promise;
all_promises promises_ninja.promise_list_type;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  a_promise3 := promise('p_multiplier', 30);
  promises_ninja.build_promise_list(all_promises, a_promise3);
  test_promise.all_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
end;
/

The output is the list of promises with their fulfilled values, so you can loop through the output and display the individual values like this:

declare
test_promise promise := promise();
a_promise1 promise;
a_promise2 promise;
a_promise3 promise;
all_promises promises_ninja.promise_list_type;
promise_list_o promises_list_obj;
promise_val promise;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  a_promise3 := promise('p_multiplier', 30);
  promises_ninja.build_promise_list(all_promises, a_promise3);
  test_promise.all_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
  if test_promise.val.getObject(promise_list_o) = dbms_types.success then
    -- We converted the object successfully
    for i in 1..promise_list_o.promise_list.count loop
      if promise_list_o.promise_list(i).getObject(promise_val) = dbms_types.success then
        -- We have an individual promise output value
        dbms_output.put_line('Value of promise is: ' || promise_val.getanyvalue);
      end if;
    end loop;
  end if;
end;
/

The same code and technique can be used for the race method. The only difference is that the output of the race call is a normal promise and not a list of promises, as only the first promise to complete is returned.
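
For completeness, here is a sketch of what a race call could look like. I am assuming the race method is exposed on the promise type as race_p (by analogy with all_p) and that the winning promise is unwrapped the same way as a single list element above – check the library source for the exact method name:

-- Test race method (sketch: race_p is assumed by analogy with all_p)
declare
  test_promise promise := promise();
  a_promise1 promise;
  a_promise2 promise;
  all_promises promises_ninja.promise_list_type;
  winner promise;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  -- start the race across the list of promises
  test_promise.race_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
  -- the result is a single promise (the first one to settle), not a list
  if test_promise.val.getObject(winner) = dbms_types.success then
    dbms_output.put_line('Value of winning promise is: ' || winner.getanyvalue);
  end if;
end;
/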

New Events for Data Guard and Synchronous Redo Transport in 12c (1)

I may have said it before but I consider presenting and teaching a great way to expand one’s knowledge: first of all it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.

Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, most likely depending on the network latency between systems, but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better way to find out than to try! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.

Since their initial setup I upgraded the cluster to 12.1.0.2.170117 for Clusterware and RDBMS.

I am planning to share my findings in two posts: the first one (the one you are reading) presents the baseline; the second one, to follow, shows how performance can be impacted by poor network performance. I guess the second one will be a bit longer …

Background

Data Guard allows the administrator to define the databases in a Data Guard configuration to be in one of three possible modes. The first one is Maximum Performance, the second Maximum Availability and finally Maximum Protection. All three modes are covered in the official documentation set and MOS. Without saying it very scientifically:

  • Maximum Performance uses asynchronous log shipping and is the default. The performance of the standby database (or rather lack thereof) should not impact the performance of the primary. At a cost: it is not guaranteed that redo is actually written to the standby when the commit command returns.
  • Maximum Protection allows you to enforce a Zero Data Loss (ZDL) strategy but also at a cost: The primary will shut down if it can’t transmit redo information to the standby. That is how you prevent data loss!
  • Maximum Availability is the middle ground and I’ll focus on that protection mode here.

Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!
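
For reference, raising a broker-managed configuration to Maximum Availability is essentially a two-step change: switch redo transport to SYNC, then raise the protection mode. A minimal sketch against my NCDBA/NCDBB configuration (make sure standby redo logs are in place first, and test this outside production before relying on it):

DGMGRL> edit database 'NCDBA' set property LogXptMode='SYNC';
DGMGRL> edit database 'NCDBB' set property LogXptMode='SYNC';
DGMGRL> edit configuration set protection mode as MaxAvailability;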

Background: New Events

Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat: have a look at the output from my 12c database:

select name, wait_class from v$event_name
where name in (
'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.

Compare this to the pre-12c events (taken from an 11.2.0.3 database, since they are mostly gone in 12c):

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.

For the sake of completeness here are those events that remained in 12c:

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other

Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events with the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. If you wonder where I pulled these events from – I used a white paper named “Best Practices for Synchronous Redo Transport” as reference.
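
Since the wait class no longer narrows things down, one workaround is to filter v$event_histogram by the event names themselves. A minimal sketch along those lines (the full event list is the same as in the query above):

select event, wait_time_milli, wait_count
from   v$event_histogram
where  event in ('SYNC Remote Write', 'ASYNC Remote Write', 'Redo Transport Open')
order  by event, wait_time_milli;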

Testing: The Environment

As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:

DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'

The configuration is currently set to “Maximum Performance”, databases ship redo asynchronously.

All systems are patched to the January 2017 Proactive Bundle Patch:

[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.

Testing part I: The Baseline

I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VM (the NIC uses virtio drivers), no change to SDU or send/receive buffers.

$ date
Fri Feb 17 13:58:58 UTC 2017
$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255

On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored; it captures all apply activity since MRP0 was started):

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
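
As an aside, if you want a more meaningful apply figure than the broker’s long-running average, the standby exposes both an active and an average apply rate in v$recovery_progress. A small sketch, to be run on the standby while managed recovery is active:

select item, units, sofar
from   v$recovery_progress
where  item in ('Active Apply Rate', 'Average Apply Rate')
order  by item;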

There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop – which requires you to have the Diagnostic Pack license as it accesses v$active_session_history) I noticed the following:

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                                                                   ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                                                                   ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                                                                   ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                                                                   ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)                          LGWR all worker groups                   Other           WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                                                                   ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)                          log file sequential read                 System I/O      WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                                                                   ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1

The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a 6 hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.

Summary Part I

In this part I tried to lay the foundation for the next one, where I’m using my lab environment to simulate an artificially high network latency. I have to admit again that my setup is somewhat over-simplified: using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

DDL triggers – interesting results

This question came in on AskTom, yielding a very interesting result when it comes to DDL triggers. To set the scene, I’ll first create a table called T which is just a copy of SCOTT.EMP


SQL> create table scott.t as select * from scott.emp;

Table created.

SQL> desc scott.t
 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- -----------------
 EMPNO                                                             NOT NULL NUMBER(4)
 ENAME                                                                      VARCHAR2(10)
 JOB                                                                        VARCHAR2(9)
 MGR                                                                        NUMBER(4)
 HIREDATE                                                                   DATE
 SAL                                                                        NUMBER(7,2)
 COMM                                                                       NUMBER(7,2)
 DEPTNO                                                                     NUMBER(2)

Now let’s say our requirement is to capture anything that relates to columns for tables in the SCOTT schema. So if we create a table, then we want to see all of those columns, and also, if we alter an existing table, then we want to see what the table looks like as a result after the columns have been added or dropped. So let’s build a simple DDL trigger to do that. First, I’ll create a procedure which will list all of the columns for any table name that we pass it.  We’ll work exclusively in SCOTT for this example.


SQL> create or replace
  2  procedure create_audit_trigger(p_tab varchar2) is
  3  begin
  4    for i in ( select column_name from dba_tab_columns
  5               where table_name = p_tab
  6               and owner = 'SCOTT'
  7             )
  8    loop
  9      dbms_output.put_line(i.column_name);
 10    end loop;
 11  end;
 12  /

Procedure created.

And we’ll give it a quick test to make sure it works.


SQL> set serverout on
SQL> exec create_audit_trigger('EMP')
EMPNO
ENAME
JOB
MGR
HIREDATE
SAL
COMM
DEPTNO

PL/SQL procedure successfully completed.

We’ll now call that procedure from a DDL trigger that fires any time we perform an ALTER or CREATE operation on a table.


SQL> create or replace
  2  TRIGGER audit_trigger_update
  3  AFTER ALTER OR CREATE ON scott.SCHEMA
  4  WHEN (ORA_DICT_OBJ_TYPE = 'TABLE')
  5  BEGIN
  6    create_audit_trigger(ORA_DICT_OBJ_NAME);
  7  END;
  8  /

Trigger created.

SQL> conn scott/tiger
Connected.

Now we’ll create a table and see if it works.


SQL> set serverout on
SQL> create table t1 ( x int, y int );
X
Y

Table created.

And we’ll drop a column to test that


SQL> alter table t1 drop column y;
X

So far so good. Now let us add a column to the table.


SQL> alter table t add x int;
EMPNO
ENAME
JOB
MGR
HIREDATE
SAL
COMM
DEPTNO  -- ???? no X

Table altered.

This is an unexpected result. At the time the trigger queries the data dictionary for the list of columns, the new column does not yet appear in the definition. Yet now that the statement has completed, if we perform a describe on the table, we can see that the new column X is indeed there.


SQL> desc t
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- --------------
 EMPNO                                                                   NOT NULL NUMBER(4)
 ENAME                                                                            VARCHAR2(10)
 JOB                                                                              VARCHAR2(9)
 MGR                                                                              NUMBER(4)
 HIREDATE                                                                         DATE
 SAL                                                                              NUMBER(7,2)
 COMM                                                                             NUMBER(7,2)
 DEPTNO                                                                           NUMBER(2)
 X                                                                                NUMBER(38)

The only workaround I can currently think of is to do the work asynchronously, via DBMS_JOB. Rather than using DBMS_OUTPUT, I’ll log the column names in a table so we can view them after the fact:


SQL> create table col_debug ( ts timestamp, col_name varchar2(50));

Table created.

SQL> create or replace
  2  procedure create_audit_trigger(p_tab varchar2) is
  3    l_now timestamp := systimestamp;
  4  begin
  5    for i in ( select column_name from dba_tab_columns
  6               where table_name = p_tab
  7               and owner = 'SCOTT'
  8             )
  9    loop
 10      insert into col_debug values (l_now,i.column_name);
 11    end loop;
 12    commit;
 13  end;
 14  /

Procedure created.

SQL> create or replace
  2  TRIGGER audit_trigger_update
  3  AFTER ALTER OR CREATE ON scott.SCHEMA
  4  WHEN (ORA_DICT_OBJ_TYPE = 'TABLE')
  5  declare
  6    j int;
  7  BEGIN
  8    dbms_job.submit(j,'create_audit_trigger('''||ORA_DICT_OBJ_NAME||''');');
  9  END;
 10  /

Trigger created.

Now I repeat the addition of a column, and then check out the debugging table.


SQL> alter table t add x1 int;

Table altered.

SQL> select * from col_debug;

TS                                                                          COL_NAME
--------------------------------------------------------------------------- ---------------
24-FEB-17 03.02.36.553000 PM                                                EMPNO
24-FEB-17 03.02.36.553000 PM                                                ENAME
24-FEB-17 03.02.36.553000 PM                                                JOB
24-FEB-17 03.02.36.553000 PM                                                MGR
24-FEB-17 03.02.36.553000 PM                                                HIREDATE
24-FEB-17 03.02.36.553000 PM                                                SAL
24-FEB-17 03.02.36.553000 PM                                                COMM
24-FEB-17 03.02.36.553000 PM                                                DEPTNO
24-FEB-17 03.02.36.553000 PM                                                X
24-FEB-17 03.02.36.553000 PM                                                X1

10 rows selected.

So the trigger submits a job rather than doing the work itself.

Why does an ADD column operation behave differently to DROP and CREATE? That remains a mystery :)

Delphix- Cannot Provision from Part of Timeflow

This was received by one of our Delphix AWS Trial customers, and he wasn’t sure how to address it. If any others experience it, this is why it occurs and how you can correct it.

You’re logged into your Delphix Administration Console and you note there is a fault displayed in the upper right hand corner of the console. Upon expanding it, you see the following warning for the databases on the Linux Target and the Sources they pull updates from:

 

[Screenshot: fault warning for the databases in the Delphix Admin Console]

It’s only a warning, and the reason it’s only a warning is that it doesn’t stop the user from performing a snapshot and provisioning. It does, however, impact the timeflow, which could be a consideration if a developer needed to go back to a point in time during this lost period.

How do you address the error? It’s one that’s well documented by Delphix, so simply proceed to the following link, which describes the issue and how to investigate and resolve it.

Straightforward, right? Not so much in my case. When the user attempted to ssh into the Linux target, he received an IO error:

$ ssh delphix@xx.xxx.xxx.x2

$ ssh: connect to host xx.xxx.xxx.x2 port 22: Operation timed out

I asked the user to then log into the Amazon EC2 dashboard and click on Instances. The following was displayed:

[Screenshot: AWS EC2 Instances dashboard]

Oh-oh….

By highlighting the instance, the following is then displayed at the lower part of the dashboard, showing that there is an issue with this instance:

[Screenshot: instance details showing a status check issue]

Amazon is quickly on top of this once I refresh the instances, and all is running once more. Once this is corrected and all instances show a green check mark for their status, I’m able to SSH into the console without issue:

$ ssh delphix@xx.xxx.xxx.x2

Last login: Tue Feb 21 18:05:28 2017 from landsharkengine.delphix.local

[delphix@linuxtarget ~]$

Does this resolve the issue in the Delphix Admin Console? It depends… The linked documentation states that the problem is commonly due to a problem with the network or the source system, but in this case it was the target system that suffered the issue in AWS.

As this is a trial and not a non-production system that is actually in use, we will skip recovering the logs to the target system and proceed with taking a new snapshot. The document also describes excellent ways to avoid experiencing this type of outage in the future. Again, this is just a trial, so we won’t put these into practice for an environment that we can easily drop and recreate in less than an hour.

Tips For Real Delphix Environments

There are a few things to remember when you’re working with the CLI:

-If you want to inspect any admin information (such as these logs, as shown in the document), you’ll need to be logged in as the delphix_admin@DOMAIN.

So if you don’t see the “paths” as you work through the commands, it’s because you may have logged in as sysadmin instead of the delphix_admin.

-If you’ve marked the fault as “Resolved”, there’s no way to resolve the timeflow issue, so you’ll receive the following:

ip-10-0-1-10 timeflow oracle log> list timeflow='TimeFlow' missing=true

No such Timeflow ''TimeFlow''.

-If the databases are down, it’s going to be difficult for Delphix to do anything with the target database. Consider updating the target to auto restart after an outage. To do so, click on the database in question, click on Configuration, and set Auto VDB Restart to “On”.

[Screenshot: Auto VDB Restart setting on the database Configuration tab]

Here are a few more tidbits to make you more of an expert with Delphix. Want to try out the free trial with AWS? All you need is a free Amazon account, and it’s only about 38 cents an hour to play around with a great copy of our Delphix Engine, including a deployed Source and Target.

Just click on this link to get started!



Copyright © DBA Kevlar [Delphix- Cannot Provision from Part of Timeflow], All Right Reserved. 2017.

The post Delphix- Cannot Provision from Part of Timeflow appeared first on DBA Kevlar.

Fourteenth Anniversary For PeteFinnigan.com Limited And New Website

Wow, has it really been fourteen years since I started PeteFinnigan.com Limited? - Time has gone so fast and business is getting better and better. We have great customers, great Oracle Security trainings and consulting projects meeting new people and....[Read More]

Posted by Pete On 23/02/17 At 06:33 PM

Wading Through Database Platforms

Working at a heterogeneous company, I get a lot of questions about which database platforms are important, why, and whether we should invest in them. When looking at some of the lists and some of the marketing, it can be a bit overwhelming.

Now I’m not going to promise to take all of the confusion out of the myriad of database platforms out there, but I’m hoping to help with what’s relevant.

Ranking

So let’s take a look at the list of top database engines in the industry per db-engines.com. The reason I use this site as my main “go-to” for information on database popularity has a lot to do with its method for collecting information. There are other ways of calculating popularity, but if you go by dollar amount or by number of installations or instances, there is a bias that skews the results; I find db-engines more reliable and more telling of trends.

[Screenshot: db-engines.com ranking of database engines]

Categories of Database Platforms

The first thing you’ll notice is that, currently, RDBMS (relational databases) rule the world. Features, scalability and maturity have given relational databases a strong base, and that continues today. There are a number of databases that fall into the “big data” arena, including Cassandra, Hive, Solr and others, that bridge different platforms. I still agree with the way db-engines has classified them, and it should be noted that they’ve looked at the architecture and not the use of the database.

RDBMS

Oracle has a large lead in the field but, as you can see, it’s decreasing on a consistent basis. Most of us in the Oracle realm are hoping that the cloud shift will start to show the success that Oracle is betting on. Just as they made the switch from (I)nternet to (C)loud, we have faith in this giant. In contrast, MySQL is consistently strengthening its #2 spot. It is still viewed as “open source” by many, and with the power behind its advanced features, including clustering in version 5.7, folks are finding it a strong alternative to Oracle Database while still staying with the power of the Oracle company.

SQL Server is #3, and with their Azure cloud offering, Microsoft is currently the second strongest cloud behind Amazon. I’ve worked in SQL Server as long as Oracle, having started with MSSQL 6.0-7.0 back in 2000. I was very passionate about database technology and didn’t feel a strong kinship to any one platform, so I was happy to work in Oracle, SQL Server, Sybase or whatever else came along in my early years of database administration. Microsoft has invested heavily in business intelligence and data visualization with the 2016 release, and I’m enthralled with some of the newest features.

Document Stores

MongoDB is the only document store in the top 20. It’s a love/hate relationship for those who use it, and many who came from the RDBMS world have a negative view of the platform. There are a number of web developers who are loyal to it and have propagated it out to the technical world.

A document store is quite the “anti-RDBMS”, considering it violates so many of the rules of an RDBMS, with different columns per row and no schemas. Amazon attempts to have its own version of every database, and for MongoDB, that’s DynamoDB. Other competitors include:

Wide Column Stores

Cassandra and HBase, both from Apache, are the top choices here. These have similarities to DynamoDB and other NoSQL databases. They are highly scalable, with the ability to span bare-metal hardware, and they require significant technical knowledge of hardware, software and development to architect an environment that functions seamlessly and efficiently. Having a strong architect is essential, and having a strong, collaborative team is important with this type of technology.

Key-Value Stores

Rounding out the top ten in the list is Redis, which is a key-value store. Redis is an in-memory data store that supports a wide range of development languages and has a strong open source community. One of the interesting features in this database engine is its ability to house geospatial indexes and use radius queries with them. I worked with spatial data for four years when I first became a DBA and was always fascinated with this type of data. Redis has something called GEORADIUS that makes working with this data much easier.

Search Engines

When I was at PASS Summit, Elasticsearch was something that was mentioned often. I also noticed that it was one of the platforms mentioned in Brent Ozar’s salary data, too. It’s pushed for AWS, but I’m seeing it more commonly on Azure, and it has numerous followers due to its ability to do deep log analysis and then build custom dashboards with Kibana.

I was also surprised, as I think a few others will be, that Splunk is listed. It is a search engine that offers insight into log data. As logs can consume considerable storage in a short time, they’ve wisely branched into a cloud offering. Just as tracing and logs are key for database administrators, the value in log data can benefit anyone in Information Technology.

Not Listed

Now there are a few others that fall outside of what’s in the top categories and database platforms that I’d like to mention.

Snowflake (#169) and Redshift (#69) are starting to gain momentum. Why? With migration to the cloud, less is more. The ability to virtualize your data warehouses is a great way to make the shift easier, but also to lessen your cloud footprint, which means less storage usage.

This is key to what Delphix is educating customers on as well, as we also virtualize all of your non-production databases, covering many of the platforms seen in the db-engines list, making it easier to migrate to the cloud and lessen the storage footprint. This is an area that technology should be emphasizing more. As with most monthly subscriptions, the smaller bill looks great at first, but when you add it up over the long haul you realize how much you’re paying for the cloud, and it’s essential to recognize the opportunities to save on storage, compute and resources.

Control

The second area I want to focus on is control. Control over our data, control over what we can do with our data, and control over support when something goes wrong with our data. We choose database platforms to have some control over what features and product support we have. The dynamic has shifted as many companies now look to open source to gain control over what they produce, by building the features and the product more on their own. They feel limited by large technology companies and want more control over capability and direction.

No matter what you decide, no matter what your favorite database platform is or what you use in your company, databases make the world go round. The world continues to spin, and databases continue to be the mechanism that delivers the data we use every day to answer questions, build trends and analysis, and make the decisions that the world relies on. We database administrators and developers are the lucky ones who get to work in this exciting arena of technology.



Copyright © DBA Kevlar [Wading Through Database Platforms], All Right Reserved. 2017.

The post Wading Through Database Platforms appeared first on DBA Kevlar.

AWS Quirks and Fake News to the Delphix Admin Console

We’ve been working hard to create an incredible new trial version of Delphix that uses AWS and is built with the open source product Terraform. Terraform is a tool that anyone can use to build, version and manage a product effectively and seamlessly in a number of clouds. We are currently using it to deploy to AWS, but there is a bright future for these types of open source products, and I’m really impressed with how easy it’s made it to deploy compute instances, the Delphix Engine and the supporting architecture on AWS EC2. If you’d like to read up on Terraform, check out their website.

The Delphix Admin Console and Faults

After building out the Delphix environment with the Engine and a Linux source/target, the first step for many is to log into the Delphix Admin console. You can view any faults during the build in the upper right corner under Faults. One error that I’ve noticed comes up after a successful build is the following:

[Screenshot: unsupported instance type fault in the Delphix Admin Console]

AWS Console to the Rescue

By logging into your AWS EC2 console, you can view the instances that are being used. As you’ll note, the error says that the Delphix Engine is using an unsupported instance type, m4.large. Yet in our EC2 console we can see from the Delphix Engine (last in the list, with the name ending in “DE”) that no, actually it isn’t.

[Screenshot: EC2 console instance list showing the Delphix Engine instance type]

It’s actually an m4.xlarge instance type. What’s even more interesting is that the Linux Target (LT) and Linux Source (LS) are both m4.large instance types, yet no warning was issued for either of these instances as unsupported.

AWS EC2 Supported Instance Types

You can find out which instance types are supported for AWS EC2 at the following link. On that page, we can also see that both the m4.large and the m4.xlarge instance types ARE supported.

[Screenshot: AWS documentation listing supported EC2 instance types, including m4.large and m4.xlarge]

Knowing that the instance type is in fact supported means I can safely ignore the fault and proceed to work through the trial without worry.

If you’re planning on deploying a production Delphix Engine on AWS, inspect the following document to ensure you build it with the proper configuration.

Nothing to see here, and I thought I’d better let everyone know before someone lumps Amazon in with CNN…

Long Parsing and PGA limits

Recently I’ve seen some not-so-smart optimizer behavior: one query took a long time to parse and ended with an error after hitting PGA_AGGREGATE_LIMIT within a few minutes; another query just parsed for ages while using a reasonable (under 2G :)) amount of PGA, and could still hit PGA_AGGREGATE_LIMIT, but after far more time – up to an hour.


Both cases were similar and involved queries accessing views whose code is generated by an application using lots of IN lists and other OR conditions. They are both really ugly SQL statements with a text length of around 100K. When Oracle tried to parse them it took a lot of time, and the parse attempt either failed with ORA-4036 fairly soon or hung for a long time and then failed. Strangely, the incident trace file generated for ORA-4036 doesn’t include a PGA heap breakdown, and you have to manually enable a PGA heap dump on error to get an idea of what is taking up memory. Here’s what I found in there:

[Embedded tweet: PGA heap dump breakdown, showing kxs-heap-c at around 10G]
Note the kxs-heap-c of 10G. I don’t know how to interpret the numbers here, as they don’t add up, and I assume qkkele & kkoutlCreatePh are the culprits.
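
For reference, the heap dump on error mentioned above can be enabled with the classic event syntax. This is a sketch only: I am assuming heapdump level 1 maps to the PGA heap on this release, so verify the level bits for your version before relying on it.

-- dump the PGA heap whenever ORA-4036 is raised in this session (assumed: level 1 = PGA heap)
alter session set events '4036 trace name heapdump level 1';

-- switch the event actions off again once the trace has been captured
alter session set events '4036 trace name context off';
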
A short stack trace of the process while parsing the 2nd query mostly looked like this:

SQL> oradebug short_stack
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kghfnd_in_free_lists()+717<-kghprmalo()+1583<-kghalp()+1246<-qksshMemAllocPerm()+150
 <-kkoabr()+810<-kkobifs()+1411<-kkofbp()+3483<-kkofbp()+2985<-kkofbp()+6206<-kkobmp()+10639<-kkoBitmapAccPath()+212
 <-kkotap()+2019<-kkoiqb()+8672<-kkooqb()+528<-kkoOrExpand()+1105<-kkoOrExpand()+4559<-kkoOrExpand()+4559<-kkoqbc()+7123
 <-apakkoqb()+182<-apaqbdDescendents()+488<-apadrv()+6244<-opitca()+2106<-kksFullTypeCheck()+79<-rpiswu2()+1780<-kksSetBindType()+2076<-kksfbc()+10522<-opiexe()+2536<-...

The closest hit in a MOS search was Doc ID 23539020.8 (fixed in 12.2). It is a perfect match except for one major difference: the document mentions possible ORA-4031, i.e. a shared pool allocation error, while I experienced issues with PGA allocations. Nevertheless I tried the suggested workaround of disabling OR expansion and it worked: parse time dropped to acceptable numbers. I haven’t looked closely enough at why OR expansion needed so many resources. I think the CBO tried to cost table access too many times due to the large IN lists – the 10053 trace dumped table access path information non-stop; this is just an idea though.
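
In case you need it, here is a sketch of the two obvious ways to switch OR expansion off. The hidden parameter is of course unsupported without Oracle Support’s blessing, and the table and column names in the hint example are made up for illustration:

-- session-wide, via the (unsupported) hidden parameter
alter session set "_no_or_expansion" = true;

-- or per query block, via the documented NO_EXPAND hint
-- (big_view, col1 and col2 are hypothetical names)
select /*+ NO_EXPAND */ count(*)
from   big_view
where  col1 in (1, 2, 3) or col2 = 42;
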
The CBO definitely needs some way to control parse time in recent releases. I’d appreciate a parameter which would set a target parse time; if it is crossed, the optimizer should either stop and use the best plan found so far, or profile the current parse call, try to eliminate the most time-consuming transformation, and start again with that transformation/feature disabled. The latter would require some sophisticated code and would be preferred, but even a simple target would be better than what we have now.

Another minor oddity while investigating this issue: the SQL parsed correctly in under 10 seconds when I used a SYS connection and set CURRENT_SCHEMA, even though there were no major differences in the CBO environment between the SYS and application user sessions that could explain why it parsed fast as SYS. Yet another reason to avoid SYS connections and use real user sessions.
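
For what it’s worth, comparing the optimizer environment of two sessions can be done straight from the dictionary. A small sketch; :sys_sid and :app_sid are placeholders for the two SIDs being compared:

select nvl(a.name, b.name) as parameter,
       a.value as sys_value,
       b.value as app_value
from   (select name, value from v$ses_optimizer_env where sid = :sys_sid) a
       full outer join
       (select name, value from v$ses_optimizer_env where sid = :app_sid) b
       on (a.name = b.name)
where  decode(a.value, b.value, 0, 1) = 1;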

Filed under: CBO, Oracle, Performance Tagged: 12.1.0.2, CBQT, OR expansion, PGA

Duplicate constraints are impossible right ?

Here’s a very simple example of a table called PARENT being a (surprise surprise) parent in a referential integrity relationship to a (drum roll for my choice of name) CHILD table


SQL> create table parent ( p int, constraint PAR_PK primary key (p) );

Table created.

SQL> create table child ( c int,
  2        p int
  3      );

Table created.

SQL> alter table child add constraint fk1 foreign key ( p ) references parent ( p );

Table altered.

That is all as we would expect, and similarly, if I inadvertently try to add the same foreign key constraint, I’ll get an error


SQL> alter table child add constraint fk2 foreign key ( p ) references parent ( p );
alter table child add constraint fk2 foreign key ( p ) references parent ( p )
                                     *
ERROR at line 1:
ORA-02275: such a referential constraint already exists in the table

So you might be wondering, how on earth did I get myself into the following calamity on my database:


SQL> select c.constraint_name, cc.column_name, c.r_constraint_name
  2  from   user_constraints c,
  3         user_cons_columns cc
  4  where  c.table_name = 'CHILD'
  5  and    c.constraint_type = 'R'
  6  and    c.constraint_name = cc.constraint_name;

CONSTRAINT_NAME                COLUMN_NAME                    R_CONSTRAINT_NAME
------------------------------ ------------------------------ --------------------
FK1                            P                              PAR_PK
FK2                            P                              PAR_PK

Yes – that’s TWO foreign key constraints implementing the identical check. How did I bypass the duplicate check we saw above?

It’s just a little idiosyncrasy in all versions of Oracle since the inception of foreign key constraints. If you specify the constraints as part of the table creation DDL, the check for duplicates is skipped.


SQL> create table child ( c int,
  2    p int,
  3    constraint fk1 foreign key ( p ) references parent ( p ) ,
  4    constraint fk2 foreign key ( p ) references parent ( p )
  5  );

Table created.

It is worth looking for and rectifying this condition on your databases, because it can lead to confusing errors. For example, when you Data Pump export and import such a schema, the import will (correctly) fail on the second constraint, but you’ll be scratching your head when you look at the imported result and see that the constraint is there!
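
Here is one way to hunt for them: a sketch that groups the foreign keys in the current schema by referencing table, column list and referenced constraint, and reports anything defined more than once.

select table_name, r_constraint_name, cols, count(*) as copies
from  (
        select c.constraint_name, c.table_name, c.r_constraint_name,
               listagg(cc.column_name, ',')
                 within group (order by cc.position) as cols
        from   user_constraints  c,
               user_cons_columns cc
        where  c.constraint_type = 'R'
        and    c.constraint_name = cc.constraint_name
        group  by c.constraint_name, c.table_name, c.r_constraint_name
      )
group  by table_name, r_constraint_name, cols
having count(*) > 1;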

This is fixed in 12c onwards.


SQL> create table child ( c int,
  2    p int,
  3    constraint fk1 foreign key ( p ) references parent ( p ) ,
  4    constraint fk2 foreign key ( p ) references parent ( p )
  5  );
  constraint fk2 foreign key ( p ) references parent ( p )
                 *
ERROR at line 4:
ORA-02274: duplicate referential constraint specifications