I’ve always been worried about taking a script that is fine to run in my non-production environments (in particular a DROP script) and accidentally running it in a Production environment, shortly followed by typing up a fresh resume to look for a new job once the mistake is discovered.
The standard “least privilege required” model for Production access is a good step along the road to protecting the database. But ultimately, at one time or another, an administrator with appropriate privileges, good intentions, but (say) bad cut-and-paste skills might inadvertently run a drop command against a database they did not want to. (A well publicised example of this happened quite recently: https://www.theregister.co.uk/2017/02/01/gitlab_data_loss/)
So it would be nice to have a ‘double check’ facility to guard against a DROP being run on a production system inadvertently. That way, even if you did run that drop script against Production, you would be “saved from yourself”. Here’s one possible approach you might want to consider: we’ll use a context variable that has to be set in order for DROP commands to be processed.
--
-- Our context
--
SQL> create context DDL_CHECK using ddl_checker;

Context created.

--
-- A routine to set it
--
SQL> create or replace
  2  procedure ddl_checker(p_val varchar2) is
  3  begin
  4    dbms_session.set_context('DDL_CHECK','ENABLED',p_val);
  5  end;
  6  /

Procedure created.

--
-- If we are on the Production database, DROP DDLs only proceed if the context variable is set
--
SQL> create or replace
  2  trigger ddl_checker_before
  3  before drop on scott.schema
  4  begin
  5    if sys.database_name like '%PROD%' and sys_context('DDL_CHECK','ENABLED') is null
  6    then
  7      raise_application_error(-20000,'DDL confirmation context is not set');
  8    end if;
  9  end;
 10  /

Trigger created.

--
-- And after every DROP, the context is reset for the next use so it cannot be left on inadvertently
--
SQL> create or replace
  2  trigger ddl_checker_after
  3  after drop on scott.schema
  4  begin
  5    ddl_checker(null);
  6  end;
  7  /

Trigger created.

SQL> grant execute on ddl_checker to scott;

Grant succeeded.
So now we can see what happens when SCOTT attempts some DDL in Production by mistake.
SQL> conn scott/tiger
Connected.

SQL> drop table t purge;
drop table t purge
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20000: DDL confirmation context is not set
ORA-06512: at line 4

--
-- We must explicitly express our intention to perform a DDL
--
SQL> exec admin.ddl_checker('true')

PL/SQL procedure successfully completed.

--
-- and only then are we allowed to run a DROP command
--
SQL> drop table t purge;

Table dropped.

--
-- and in doing so, we don't have any lingering access
--
SQL> drop table t1 purge;
drop table t1 purge
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20000: DDL confirmation context is not set
ORA-06512: at line 4
I stress – this is not about securing administrator access or replacing your existing processes for auditing or locking down the privileges on your production systems. This is just about that extra check to help stop your over-worked and/or over-tired administrators from committing a small mistake that becomes a catastrophic error.
Finally, I got the last 2 methods completed: promise.all and promise.race are finished, so the plsql promises library is now feature complete. The promise.all method takes a list of promises, built with the help of the promises_ninja package, and once the entire list of promises is fulfilled, the top promise is set to fulfilled. Here is an example:
-- Test all method
declare
  test_promise   promise := promise();
  a_promise1     promise;
  a_promise2     promise;
  a_promise3     promise;
  all_promises   promises_ninja.promise_list_type;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  a_promise3 := promise('p_multiplier', 30);
  promises_ninja.build_promise_list(all_promises, a_promise3);
  test_promise.all_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
end;
/
The output is the list of the promises with the fulfilled values. So you can loop through the output and display individual values like this:
declare
  test_promise     promise := promise();
  a_promise1       promise;
  a_promise2       promise;
  a_promise3       promise;
  all_promises     promises_ninja.promise_list_type;
  promise_list_o   promises_list_obj;
  promise_val      promise;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  a_promise3 := promise('p_multiplier', 30);
  promises_ninja.build_promise_list(all_promises, a_promise3);
  test_promise.all_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
  if test_promise.val.getObject(promise_list_o) = dbms_types.success then
    -- We converted the object successfully
    for i in 1..promise_list_o.promise_list.count loop
      if promise_list_o.promise_list(i).getObject(promise_val) = dbms_types.success then
        -- We have an individual promise output value
        dbms_output.put_line('Value of promise is: ' || promise_val.getanyvalue);
      end if;
    end loop;
  end if;
end;
/
The same code and technique can be used for the race method; the only difference is that the output of the race call is a normal promise and not a list of promises, since only the first promise to complete is returned.
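For illustration, here is a minimal sketch of the race case. I am assuming, by analogy with all_p above, that the method is named race_p and that the result can be fetched back into a single promise object – check the package specification for the exact names.

-- Test race method (sketch; race_p and the single-promise result handling
-- are assumptions based on the all_p example above)
declare
  test_promise   promise := promise();
  a_promise1     promise;
  a_promise2     promise;
  all_promises   promises_ninja.promise_list_type;
  promise_val    promise;
begin
  a_promise1 := promise('p_multiplier', 10);
  promises_ninja.build_promise_list(all_promises, a_promise1);
  a_promise2 := promise('p_multiplier', 20);
  promises_ninja.build_promise_list(all_promises, a_promise2);
  -- race resolves as soon as the first promise in the list completes
  test_promise.race_p(promises_ninja.convert_promise_list(all_promises));
  dbms_lock.sleep(5);
  test_promise.check_and_set_value();
  dbms_output.put_line('State is: ' || test_promise.state);
  -- the output is a single promise, not a list of promises
  if test_promise.val.getObject(promise_val) = dbms_types.success then
    dbms_output.put_line('Value of first settled promise: ' || promise_val.getanyvalue);
  end if;
end;
/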
I may have said it before but I consider presenting and teaching a great way to expand one’s knowledge: first of all it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.
Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, depending most likely on the network latency between systems but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better than trying! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.
Since their initial setup I upgraded the cluster to 12.1.0.2.170117 for Clusterware and RDBMS.
I am planning to share my findings in two posts: the first one (the one you are reading) will present the baseline, the second one to follow shows how performance can be impacted by poor network performance. I guess the second one will be a bit longer …
Data Guard allows the administrator to define the databases in a Data Guard configuration to be in one of three possible modes. The first one is Maximum Performance, the second Maximum Availability and finally Maximum Protection. All three modes are covered in the official documentation set and MOS. Without saying it very scientifically: Maximum Performance ships redo asynchronously and never holds the primary back; Maximum Availability ships redo synchronously but lets the primary carry on should the standby become unavailable; Maximum Protection ships redo synchronously and will shut the primary down rather than allow an unprotected transaction.
Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!
Background: New Events
Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat – have a look at the output from my 12c database:
select name, wait_class from v$event_name
where name in (
  'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
  'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
  'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
  'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
  'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.
Compare this to the pre-12c events (taken from an 11.2 database since they are mostly gone in 12c):
select name, wait_class from v$event_name
where name in (
  'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
  'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
  'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.
For the sake of completeness here are those events that remained in 12c:
select name, wait_class from v$event_name
where name in (
  'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
  'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
  'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other
Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events in the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. If you wonder where I pulled these events from – I used a white paper named “Best Practices for Synchronous Redo Transport” as a reference.
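Since the wait class is no help for filtering, one workaround is simply to filter v$event_histogram on the event names themselves; a quick sketch using a subset of the names from the query above (v$event_histogram and its columns are standard):

-- histogram of the redo transport related events, by event name rather than wait class
select eh.event, eh.wait_time_milli, eh.wait_count
from   v$event_histogram eh
where  eh.event in ('SYNC Remote Write','ASYNC Remote Write',
                    'Redo Transport Open','Redo Transport Ping','Remote SYNC Ping')
order  by eh.event, eh.wait_time_milli;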
Testing: The Environment
As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:
DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'
The configuration is currently set to “Maximum Performance”; databases ship redo asynchronously.
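If you prefer SQL over the broker, the protection mode can also be confirmed on the database itself; a quick sketch using the standard v$database columns (with the configuration above both should report MAXIMUM PERFORMANCE):

SQL> select protection_mode, protection_level from v$database;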
All systems are patched to the January 2017 Proactive Bundle Patch:
[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.
Testing part I: The Baseline
I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VM (the NIC uses virtio drivers), no change to SDU or send/receive buffers.
$ date
Fri Feb 17 13:58:58 UTC 2017

$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        126.96.36.1991

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255
On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored; it captures all apply since MRP0 was started)
DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
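If you want a less misleading figure than the broker’s cumulative average, one option is to query v$recovery_progress on the apply instance; a sketch (the view and the item names are standard, though the exact set of items varies by release):

SQL> select item, units, sofar
  2  from   v$recovery_progress
  3  where  item in ('Active Apply Rate','Average Apply Rate','Apply Time per Log');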
There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop – which requires you to have the Diagnostic Pack license as it accesses v$active_session_history) I noticed the following:
SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                   EVENT                     WAIT_CLASS  SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------- ------------------------- ----------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)   log file parallel write   System I/O  WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)   log file parallel write   System I/O  WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                         ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)   target log write size     Other       WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                         ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                         ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)   log file parallel write   System I/O  WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                         ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                         ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                         ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                         ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                         ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)   LGWR all worker groups    Other       WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)   db file parallel write    System I/O  WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                         ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)   log file sequential read  System I/O  WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                         ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                         ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                         ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)   target log write size     Other       WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1
The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a 6 hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.
Summary Part I
In this part I tried to lay the foundation for the next one, where I’m using my lab environment to simulate an artificially high network latency. I have to admit again that my setup is somewhat over-simplified: using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.
This question came in on AskTom, yielding a very interesting result when it comes to DDL triggers. To set the scene, I’ll first create a table called T which is just a copy of SCOTT.EMP.
SQL> create table scott.t as select * from scott.emp;

Table created.

SQL> desc scott.t
 Name                          Null?    Type
 ----------------------------- -------- -----------------
 EMPNO                         NOT NULL NUMBER(4)
 ENAME                                  VARCHAR2(10)
 JOB                                    VARCHAR2(9)
 MGR                                    NUMBER(4)
 HIREDATE                               DATE
 SAL                                    NUMBER(7,2)
 COMM                                   NUMBER(7,2)
 DEPTNO                                 NUMBER(2)
Now let’s say our requirement is to capture anything that relates to columns for tables in the SCOTT schema. So if we create a table, then we want to see all of those columns, and also, if we alter an existing table, then we want to see what the table looks like as a result after the columns have been added or dropped. So let’s build a simple DDL trigger to do that. First, I’ll create a procedure which will list all of the columns for any table name that we pass it. We’ll work exclusively in SCOTT for this example.
SQL> create or replace
  2  procedure create_audit_trigger(p_tab varchar2) is
  3  begin
  4    for i in ( select column_name from dba_tab_columns
  5               where table_name = p_tab
  6               and owner = 'SCOTT'
  7             )
  8    loop
  9      dbms_output.put_line(i.column_name);
 10    end loop;
 11  end;
 12  /

Procedure created.
And we’ll give it a quick test to make sure it works.
SQL> set serverout on
SQL> exec create_audit_trigger('EMP')
EMPNO
ENAME
JOB
MGR
HIREDATE
SAL
COMM
DEPTNO

PL/SQL procedure successfully completed.
We’ll now call that procedure from a DDL trigger that fires any time we perform an ALTER or CREATE DDL operation on a table.
SQL> create or replace
  2  TRIGGER audit_trigger_update
  3  AFTER ALTER OR CREATE ON scott.SCHEMA
  4  WHEN (ORA_DICT_OBJ_TYPE = 'TABLE')
  5  BEGIN
  6    create_audit_trigger(ORA_DICT_OBJ_NAME);
  7  END;
  8  /

Trigger created.

SQL> conn scott/tiger
Connected.
Now we’ll create a table and see if it works.
SQL> set serverout on
SQL> create table t1 ( x int, y int );
X
Y

Table created.
And we’ll drop a column to test that
SQL> alter table t1 drop column y;
X
So far so good. Now let us add a column to the table.
SQL> alter table t add x int;
EMPNO
ENAME
JOB
MGR
HIREDATE
SAL
COMM
DEPTNO
-- ???? no X

Table altered.
This is an unexpected result. At the time the trigger queries the data dictionary to obtain the list of columns, the new column does not yet appear in the definition. Now that the statement has completed, if we perform a describe on the table, we can see that the new column X is indeed there.
SQL> desc t
 Name                          Null?    Type
 ----------------------------- -------- -----------------
 EMPNO                         NOT NULL NUMBER(4)
 ENAME                                  VARCHAR2(10)
 JOB                                    VARCHAR2(9)
 MGR                                    NUMBER(4)
 HIREDATE                               DATE
 SAL                                    NUMBER(7,2)
 COMM                                   NUMBER(7,2)
 DEPTNO                                 NUMBER(2)
 X                                      NUMBER(38)
The only workaround I can currently think of is to do the work asynchronously, via DBMS_JOB. Rather than DBMS_OUTPUT, I’ll log the column names in a table so we can view them after the fact:
SQL> create table col_debug ( ts timestamp, col_name varchar2(50));

Table created.

SQL> create or replace
  2  procedure create_audit_trigger(p_tab varchar2) is
  3    l_now timestamp := systimestamp;
  4  begin
  5    for i in ( select column_name from dba_tab_columns
  6               where table_name = p_tab
  7               and owner = 'SCOTT'
  8             )
  9    loop
 10      insert into col_debug values (l_now,i.column_name);
 11    end loop;
 12    commit;
 13  end;
 14  /

Procedure created.

SQL> create or replace
  2  TRIGGER audit_trigger_update
  3  AFTER ALTER OR CREATE ON scott.SCHEMA
  4  WHEN (ORA_DICT_OBJ_TYPE = 'TABLE')
  5  declare
  6    j int;
  7  BEGIN
  8    dbms_job.submit(j,'create_audit_trigger('''||ORA_DICT_OBJ_NAME||''');');
  9  END;
 10  /

Trigger created.
Now I repeat the addition of a column, and then check the debugging table.
SQL> alter table t add x1 int;

Table altered.

SQL> select * from col_debug;

TS                             COL_NAME
------------------------------ ---------------
24-FEB-17 03.02.36.553000 PM   EMPNO
24-FEB-17 03.02.36.553000 PM   ENAME
24-FEB-17 03.02.36.553000 PM   JOB
24-FEB-17 03.02.36.553000 PM   MGR
24-FEB-17 03.02.36.553000 PM   HIREDATE
24-FEB-17 03.02.36.553000 PM   SAL
24-FEB-17 03.02.36.553000 PM   COMM
24-FEB-17 03.02.36.553000 PM   DEPTNO
24-FEB-17 03.02.36.553000 PM   X
24-FEB-17 03.02.36.553000 PM   X1

10 rows selected.
So the trigger submits a job rather than doing the work itself. Because DBMS_JOB only runs the job once the triggering DDL has completed (and committed), the data dictionary is fully up to date by the time the procedure queries it.

Why does an ADD column operation behave differently to DROP and CREATE? That remains a mystery.
This was received by one of our Delphix AWS Trial customers and he wasn’t sure how to address it. If any others experience it, this is why it occurs and how you can correct it.
You’re logged into your Delphix Administration Console and you notice there is a fault displayed in the upper right-hand corner of the console. Upon expanding it, you see the following warning for the databases on the Linux Target, relating to the Sources they pull updates from:
[Screenshot: fault warning in the Delphix Administration Console]
It’s only a warning, and the reason it’s only a warning is that it doesn’t stop the user from performing a snapshot or provisioning. It does, however, impact the TimeFlow, which could be a consideration if a developer needed to go back in time to a point inside that lost window.
How do you address the error? It’s one that’s well documented by Delphix, so simply proceed to the following link, which describes the issue and how to investigate and resolve it.
Straightforward, right? Not so much in my case. When the user attempted to ssh into the Linux Target, he received an error:
$ ssh email@example.com
$ ssh: connect to host xx.xxx.xxx.x2 port 22: Operation timed out
I asked the user to then log into the Amazon EC2 dashboard and click on Instances. The following was displayed:
[Screenshot: EC2 dashboard showing the list of instances]
By highlighting the instance, the following is displayed at the lower part of the dashboard, showing that there is an issue with this instance:
[Screenshot: EC2 instance status checks reporting a problem with the instance]
Amazon was quickly on top of this; after I refreshed the instances, all was running once more. Once this is corrected and all instances show a green check mark for their status, I’m able to SSH into the console without issue:
$ ssh firstname.lastname@example.org
Last login: Tue Feb 21 18:05:28 2017 from landsharkengine.delphix.local
[delphix@linuxtarget ~]$
Does this resolve the issue in the Delphix Admin Console? It depends. The documentation linked above states that the problem is commonly due to a problem with the network or the source system, but in this case it was the target system that suffered the issue in AWS.
As this is a trial and not a production system in active use, we will skip recovering the logs to the target system and proceed with taking a new snapshot. The document also describes excellent ways to avoid this type of outage in the future. Again, this is just a trial, so we won’t put those into practice for an environment that we can easily drop and recreate in less than an hour.
There are a few things to remember when you’re working with the CLI:
-If you want to inspect any admin information (such as these logs, as shown in the document) you’ll need to be logged in as delphix_admin@DOMAIN.
So if you don’t see the “paths” as you work through the commands, it’s because you may have logged in as sysadmin instead of the delphix_admin.
-If you’ve marked the fault as “Resolved”, there’s no way to resolve the timeflow issue, so you’ll receive the following:
ip-10-0-1-10 timeflow oracle log> list timeflow='TimeFlow' missing=true
No such Timeflow ''TimeFlow''.
-If the databases are down, it’s going to be difficult for Delphix to do anything with the target database. Consider updating the target to auto restart on an outage. To do so, click on the database in question, click on Configuration and change it to “On” for the Auto VDB Restart.
[Screenshot: Auto VDB Restart setting on the database Configuration tab]
Here are a few more tidbits to make you more of an expert with Delphix. Want to try out the free trial with AWS? All you need is a free Amazon account and it’s only about 38 cents an hour to play around with a great copy of our Delphix Engine, including a deployed Source and Target.
Just click on this link to get started!
Wow, has it really been fourteen years since I started PeteFinnigan.com Limited? Time has gone so fast and business is getting better and better. We have great customers, great Oracle Security training and consulting projects, meeting new people and… [Read More]
Posted by Pete On 23/02/17 At 06:33 PM
Working for a heterogeneous company, I get a lot of questions about which database platforms are important, why, and whether we should invest in them. When looking at some of the lists and some of the marketing, it can be a bit overwhelming.
Now I’m not going to promise to take all of the confusion out of the myriad of database platforms out there, but I’m hoping to help with what’s relevant.
So let’s take a look at the list of top database engines in the industry per db-engines.com. The reason I use this site as my main “go-to” for information on database popularity has a lot to do with its method for collecting that information. There are other ways of calculating popularity, but I’ve found that if you go by dollar amount or by number of installations or instances, there is a bias that skews the results; I find db-engines more reliable and more telling of trends.
[Screenshot: db-engines.com ranking of the top database engines]
The first thing you’ll notice is that [currently] RDBMS (relational databases) rule the world. Features, scalability and maturity have given relational databases a strong base, and that continues today. There are a number of databases that fall into the “big data” arena, including Cassandra, Hive, Solr and others, that bridge different platforms. I still agree with the way that db-engines has classified them, and it should be noted that they’ve looked at the architecture and not the use of the database.
Oracle has had a large lead in the field, but as you can see, it’s decreasing on a consistent basis. Most of us in the Oracle realm are hoping that the cloud shift will start to show the success the company is betting on. Just as they made the switch from (I)nternet to (C)loud, we have faith in this giant. In contrast, MySQL is consistently strengthening its #2 spot. It is still viewed as “open source” by many, and with the power behind its advanced features, including clustering in version 5.7, folks are finding it a strong alternative to Oracle while still staying with the power of the Oracle company.
SQL Server is #3 and with their Azure cloud offering, they are currently the second strongest cloud behind Amazon. I’ve worked in SQL Server as long as Oracle, having started with MSSQL 6.0-7.0 back in 2000. I was very passionate about database technology and didn’t feel a strong kinship to any one platform, so was happy to work in Oracle, SQL Server, Sybase or others that came along in my early years of database administration. Microsoft has invested heavily in business intelligence and data visualization with 2016 and I’m enthralled with some of the newest features.
MongoDB is the only document store in the top 20. It’s a love/hate relationship: many who came from the RDBMS world have a negative view of the platform, while a number of web developers are loyal to it and have propagated it out to the technical world.
A document store is quite the “anti-RDBMS”, considering it violates so many of the rules of an RDBMS, with different columns per row and no schemas. Amazon attempts to have their own version of any database and for MongoDB, it’s DynamoDB. Other competitors include:
Cassandra and HBase, both via Apache, are the top choices. These have similarities to DynamoDB and other NoSQL databases. They are highly scalable, able to span bare-metal hardware, and require significant technical knowledge of hardware, software and development to architect an environment that functions seamlessly and efficiently. Having a strong architect is essential, and having a strong, collaborative team is important for this type of technology.
Rounding out the top ten in the list is Redis, a key-value store. Redis is an in-memory data store that supports a wide range of development languages and has a strong open source community. One of the interesting features in this database engine is its ability to house geospatial indexes and use radius queries with them. I worked with spatial data for four years when I first became a DBA and was always fascinated with this type of data. Redis has a command called GEORADIUS that makes working with this data much easier.
When I was at PASS Summit, Elasticsearch was something that was mentioned often. I also noticed that it was one of the platforms mentioned in Brent Ozar’s salary data, too. It’s pushed for AWS, but I’m seeing it more commonly on Azure, and it has numerous followers due to its ability to do deep log analysis and then build custom dashboards with Kibana.
I was also surprised, as I think a few will be, that Splunk is listed. It is a search engine that offers insight into log data. As logs can consume considerable storage in a short time, they’ve wisely branched into a cloud offering. Just as traces and logs are key for database administrators, the value in log data can benefit anyone in Information Technology.
Now there are a few other database platforms, outside the top of the list, that I’d like to mention.
Snowflake (#169) and Redshift (#69) are starting to gain momentum. Why? With migration to the cloud, less is more. The ability to virtualize your data warehouses is a great way to make the shift easier, but also to lessen your cloud footprint, which means less storage usage.
This is a key part of what Delphix is educating customers on as well: we virtualize non-production copies of many of the platforms in the db-engines list, making it easier to migrate to the cloud and lessening the storage footprint. This is an area that technology should be emphasizing more. As with most monthly subscriptions, the smaller bill looks great at first, but when you add it up over the long haul you realize how much you’re paying for the cloud, so it’s essential to recognize the opportunities to save on storage, compute and resources.
The second area I want to focus on is control: control over our data, control over what we can do with our data, and control over support when something goes wrong with our data. We choose database platforms to have some control over what features and product support we have. The dynamic has shifted as many companies now look to open source to regain control over what they produce, building features and the product more on their own. They feel limited by large technology companies and want more control over how much they can do and over the direction of the product.
No matter what you decide, no matter what your favorite database platform is or what you use in your company, databases make the world go round. The world continues to spin, and databases continue to be the mechanism that delivers the data we use every day to answer questions, build trends and analysis, and make the decisions the world relies on. We database administrators and developers are the ones lucky enough to work in this exciting arena of technology.
We’ve been working hard to create an incredible new trial version of Delphix that runs on AWS and is built with the open source product Terraform. Terraform is a tool that anyone can use to build, version and manage infrastructure effectively and seamlessly across a number of clouds. We are currently using it to deploy to AWS, but there is a bright future for these types of open source products, and I’m really impressed with how easy it has made deploying compute instances, the Delphix Engine and the supporting architecture on AWS EC2. If you’d like to read up on Terraform, check out their website.
After building out the Delphix environment with the Engine and a Linux source/target, the first step for many is to log into the Delphix Admin console. You can view any faults raised during the build at the upper right corner under Faults. One error that I’ve noticed comes up after a successful build is the following:
[Screenshot: fault warning reporting an unsupported instance type for the Delphix Engine]
By logging into your AWS EC2 console, you can view the instances that are being used. As you’ll note, the error says that the Delphix Engine is using an unsupported instance type, m4.large. Yet in our EC2 console we can see for the Delphix Engine (last in the list, with the name ending in “DE”) that, actually, it isn’t.
[Screenshot: EC2 instance list showing the Delphix Engine instance type]
It’s actually an m4.xlarge instance type. What’s even more interesting is that the Linux Target (LT) and Linux Source (LS) are both m4.large instance types, yet no warning was issued for either of these instances as unsupported.
You can find out which instance types are supported for AWS EC2 at the following link. On that page, we can also see that both the m4.large and the m4.xlarge instance types ARE supported.
[Screenshot: documentation page listing the supported AWS EC2 instance types]
Having validated that the instance type is supported, I can safely ignore the warning and proceed to work through the trial without worry.
If you’re planning on deploying a production Delphix Engine on AWS, inspect the following document to ensure you build it with the proper configuration.
Nothing to see here, and I thought I’d better let everyone know before someone lumps Amazon in with CNN…
Recently I’ve seen some not-so-smart optimizer behavior: one query took a long time to parse and ended with an error, hitting PGA_AGGREGATE_LIMIT within a few minutes; another query just parsed for ages while using a reasonable (under 2G :)) amount of PGA, and could still hit PGA_AGGREGATE_LIMIT, but after way more time – up to an hour.
Both cases were similar and involved queries accessing views whose code is generated by an application using lots of IN lists and other OR conditions. They are both really ugly SQLs with a text length of around 100K. When Oracle tried to parse them it took a lot of time, and the parse attempt either failed with ORA-4036 fairly quickly or hung for a long time and then failed. Strangely, the incident trace file generated for ORA-4036 doesn’t include a PGA heap breakdown, so you have to manually enable a PGA heap dump on error to get an idea of what is taking up the memory. Here’s what I found in there:
Am I reading this right: there's 7816MB (4921+2895) free? what is "10GB free" then? pic.twitter.com/y8UKXHvj0Q
— Timur Akhmadeev (@tmmdv) January 24, 2017
Note kxs-heap-c at 10G. I don’t know how to interpret the numbers here as they don’t add up, and I assume qkkele & kkoutlCreatePh are the culprits.
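As an aside, since the ORA-4036 incident trace doesn’t produce this breakdown on its own, here is a sketch of how such a dump can be requested on error. The error-trap event syntax is standard; treat the heap dump level as an assumption and verify which level your release needs for PGA detail:

-- dump the PGA heap whenever ORA-4036 is raised in this session
alter session set events '4036 trace name heapdump level 1';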
A short stack trace of the process while parsing the 2nd query mostly looked like this:
SQL> oradebug short_stack ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kghfnd_in_free_lists()+717<-kghprmalo()+1583<-kghalp()+1246<-qksshMemAllocPerm()+150 <-kkoabr()+810<-kkobifs()+1411<-kkofbp()+3483<-kkofbp()+2985<-kkofbp()+6206<-kkobmp()+10639<-kkoBitmapAccPath()+212 <-kkotap()+2019<-kkoiqb()+8672<-kkooqb()+528<-kkoOrExpand()+1105<-kkoOrExpand()+4559<-kkoOrExpand()+4559<-kkoqbc()+7123 <-apakkoqb()+182<-apaqbdDescendents()+488<-apadrv()+6244<-opitca()+2106<-kksFullTypeCheck()+79<-rpiswu2()+1780<-kksSetBindType()+2076<-kksfbc()+10522<-opiexe()+2536<-...
The closest hit in a MOS search was Doc ID 23539020.8 (fixed in 12.2). It is a perfect match except for one major difference: the document mentions possible ORA-4031, i.e. a shared pool allocation error, while I experienced issues with PGA allocations. Nevertheless I tried the suggested workaround of disabling OR expansion and it worked: parse time dropped to acceptable numbers. I haven’t looked closely enough at why OR expansion needed so many resources. I think the CBO tried to cost table access too many times due to the large IN lists, as the 10053 trace dumped table access path information non-stop; this is just an idea though.
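For completeness, a sketch of the usual ways to disable OR expansion for a test: the NO_EXPAND hint per query block, or the hidden parameter _no_or_expansion at session level (an underscore parameter, so check with Oracle Support before relying on it). some_view and the bind names are just stand-ins:

-- per statement: prevent OR expansion for this query block
select /*+ no_expand */ count(*)
from   some_view v
where  v.col1 = :b1 or v.col2 = :b2;

-- or for the whole session (hidden parameter; verify with Oracle Support first)
alter session set "_no_or_expansion" = true;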
The CBO definitely needs some way to control parse time in recent releases. I’d appreciate a parameter which would set a target parse time; if it’s crossed, then either stop and use the best plan found so far, or profile the current parse call, try to eliminate the most time-consuming transformation, and start again with that transformation/feature disabled. The latter would require some sophisticated code and would be preferred, but even a simple target would be better than what we have now.
Another minor strange thing I noticed while investigating this issue: the SQL parsed correctly in under 10s when I used a SYS connection and set CURRENT_SCHEMA, even though there were no major differences in the CBO environment between SYS and application user sessions that could explain why it parsed quickly as SYS. Yet another reason to avoid SYS connections and use real user sessions.
Here’s a very simple example of a table called PARENT being (surprise, surprise) a parent in a referential integrity relationship to a (drum roll for my choice of name) CHILD table.
SQL> create table parent ( p int, constraint PAR_PK primary key (p) );

Table created.

SQL> create table child ( c int,
  2    p int
  3  );

Table created.

SQL> alter table child add constraint fk1 foreign key ( p ) references parent ( p );

Table altered.
That is all as we would expect, and similarly, if I inadvertently try to add the same foreign key constraint, I’ll get an error
SQL> alter table child add constraint fk2 foreign key ( p ) references parent ( p );
alter table child add constraint fk2 foreign key ( p ) references parent ( p )
*
ERROR at line 1:
ORA-02275: such a referential constraint already exists in the table
So you might be wondering, how on earth did I get myself into the following calamity on my database:
SQL> select c.constraint_name, cc.column_name, c.r_constraint_name
  2  from   user_constraints c,
  3         user_cons_columns cc
  4  where  c.table_name = 'CHILD'
  5  and    c.constraint_type = 'R'
  6  and    c.constraint_name = cc.constraint_name;

CONSTRAINT_NAME                COLUMN_NAME                    R_CONSTRAINT_NAME
------------------------------ ------------------------------ --------------------
FK1                            P                              PAR_PK
FK2                            P                              PAR_PK
Yes – that’s TWO foreign key constraints implementing the identical check. How did I bypass the duplicate check we saw above?
It’s just a little idiosyncrasy in all versions of Oracle since the inception of foreign key constraints. If you specify the constraints as part of the table creation DDL, the check for duplicates is skipped.
SQL> create table child ( c int,
  2    p int,
  3    constraint fk1 foreign key ( p ) references parent ( p ) ,
  4    constraint fk2 foreign key ( p ) references parent ( p )
  5  );

Table created.
It is worth looking for and rectifying this condition on your databases, because it can lead to confusing errors. For example, when you do a datapump export/import of such a schema, the import will (correctly) fail on the second constraint, but you’ll be scratching your head when you go look at the imported result and see that the constraint is there!
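If you want to hunt for existing duplicates, a rough sketch is to group the foreign key constraints by table, column and referenced constraint; this simplistic version assumes single-column foreign keys:

SQL> select c.table_name,
  2         cc.column_name,
  3         c.r_constraint_name,
  4         count(*) as fk_count
  5  from   user_constraints  c,
  6         user_cons_columns cc
  7  where  c.constraint_type = 'R'
  8  and    c.constraint_name = cc.constraint_name
  9  group  by c.table_name, cc.column_name, c.r_constraint_name
 10  having count(*) > 1;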
This is fixed in 12c onwards.
SQL> create table child ( c int,
  2    p int,
  3    constraint fk1 foreign key ( p ) references parent ( p ) ,
  4    constraint fk2 foreign key ( p ) references parent ( p )
  5  );
  constraint fk2 foreign key ( p ) references parent ( p )
  *
ERROR at line 4:
ORA-02274: duplicate referential constraint specifications