We’ve been working hard to create an incredible new trial version of Delphix that uses AWS, which is built with the open source product Terraform. Terraform is a tool that anyone can use to build, version and manage a product effectively and seamlessly in a number of clouds. We are currently using it to implement to AWS, but there is a bright future for these types of open source products and I’m really impressed with how easy its made it to deploy compute instances, the Delphix Engine and supporting architecture on AWS EC2. If you’d like to read up on Terraform, check out their website.
After building out the Delphix environment with the Engine and the a Linux source/target, the first step for many is to log into the Delphix Admin console. You can view any faults during the build at the upper right corner under Faults. One error that I’ve noticed comes up in after a successful build is the following:
https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 246w" sizes="(max-width: 393px) 100vw, 393px" data-recalc-dims="1" />
By logging into your AWS EC2 console, you can view the instances that are being used. As you’ll note, the error says that the Delphix Engine is using an unsupported instance type m4.large. Yes in our EC2 console, we can see the Delphix Engine, (last in the list and with the name ending in “DE”) that no, actually it isn’t.
https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 300w" sizes="(max-width: 600px) 100vw, 600px" data-recalc-dims="1" />
It’s actually a m4.xlarge instance type. What’s even more interesting, is that the Linux Target, (LT) and Linux Source, (LS) are both m4.large instance types, yet no warning was issued for either of these instances as unsupported.
You can locate what types of instance types are supported for AWS EC2 with the following link. At this page, we can also see that both the m4.large and the m4.xlarge instance type IS SUPPORTED.
https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 300w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 768w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1413w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1200w" sizes="(max-width: 600px) 100vw, 600px" data-recalc-dims="1" />
Knowing that we’ve validated that the instance type is supported means that I can safely ignore it and proceed to work through the trial without worry.
If you’re planning on deploying a production Delphix Engine on AWS, inspect the following document to ensure you build it with the proper configuration.
Nothing to see here and thought I better let everyone know before someone lumps Amazon with CNN…
Recently I’ve seen not so smart optimizer behavior: one query took long time to parse, and ended with an error hitting PGA_AGGREGATE_LIMIT in few minutes; another query was just parsed for ages while using reasonable (under 2G :)) amount of PGA and still could hit PGA_AGGREGATE_LIMIT but after way more time – up to an hour.
Both cases were similar and involved queries which were accessing views; and those views’ code is generated by an application using lots of IN LISTs and other OR conditions. They both are really ugly SQLs with text length ~100K. When Oracle tried to parse them it took a lot of time and parse attempt had either failed with ORA-4036 soon or hanged for a long time and then failed. Strangely incident trace file generated for ORA-4036 doesn’t include PGA heaps breakdown and you have to manually enable PGA heapdump on error to get an idea what is taking up memory. Here’s what I’ve found in there:
Am I reading this right: there's 7816MB (4921+2895) free? what is "10GB free" then? pic.twitter.com/y8UKXHvj0Q
— Timur Akhmadeev (@tmmdv) January 24, 2017
Note kxs-heap-c of 10G. I don’t know how to interpret the numbers here as they don’t add up, and assume qkkele & kkoutlCreatePh as the culprit.
Short stack trace of the process while parsing 2nd query was mostly looking like this:
SQL> oradebug short_stack ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kghfnd_in_free_lists()+717<-kghprmalo()+1583<-kghalp()+1246<-qksshMemAllocPerm()+150 <-kkoabr()+810<-kkobifs()+1411<-kkofbp()+3483<-kkofbp()+2985<-kkofbp()+6206<-kkobmp()+10639<-kkoBitmapAccPath()+212 <-kkotap()+2019<-kkoiqb()+8672<-kkooqb()+528<-kkoOrExpand()+1105<-kkoOrExpand()+4559<-kkoOrExpand()+4559<-kkoqbc()+7123 <-apakkoqb()+182<-apaqbdDescendents()+488<-apadrv()+6244<-opitca()+2106<-kksFullTypeCheck()+79<-rpiswu2()+1780<-kksSetBindType()+2076<-kksfbc()+10522<-opiexe()+2536<-...
The closest hit in MOS search was Doc ID 23539020.8 (Fixed in 12.2). It is a perfect match except for one major difference: the document mentions possible ORA-4031, i.e. shared pool allocation error, while I’ve experienced issues in PGA allocations. Nevertheless I’ve tried suggested workaround of disabling OR expansion and it worked: parse time has dropped to acceptable numbers. I haven’t looked close enough why OR expansion needed so much resources. I think CBO tried to cost table access too many times due to large IN LISTS – as 10053 continuously dumped table access path information non stop; this is just an idea though.
CBO definitely needs some way to control parse time in recent releases. I’d appreciate a parameter which would set a target parse time, and if it’s crossed then either stop and use the best plan found so far or profile current parse call, try to eliminate the most time consuming transformation, and start again with this transformation/feature disabled. The latter would require some sophisticated code, and would be preferred; but even a simple target would be better than what we have now.
Another minor strange thing while investigating this issue was: SQL parsed correctly in under 10s when I’ve used SYS connection and set CURRENT_SCHEMA, even though there was no major differences in CBO environment between SYS and application user sessions that could’ve explained why it parsed fast in SYS. Yet another reason to avoid SYS connections and use real user sessions.
Here’s a very simple example of a table called PARENT being a (surprise surprise) parent in a referential integrity relationship to a (drum roll for my choice of name) CHILD table
SQL> create table parent ( p int, constraint PAR_PK primary key (p) ); Table created. SQL> create table child ( c int, 2 p int 3 ); Table created. SQL> alter table child add constraint fk1 foreign key ( p ) references parent ( p ); Table altered.
That is all as we would expect, and similarly, if I inadvertently try to add the same foreign key constraint, I’ll get an error
SQL> alter table child add constraint fk2 foreign key ( p ) references parent ( p ); alter table child add constraint fk2 foreign key ( p ) references parent ( p ) * ERROR at line 1: ORA-02275: such a referential constraint already exists in the table
So you might be wondering, how on earth did I get myself into the following calamity on my database:
SQL> select c.constraint_name, cc.column_name, c.r_constraint_name 2 from user_constraints c, 3 user_cons_columns cc 4 where c.table_name = 'CHILD' 5 and c.constraint_type = 'R' 6 and c.constraint_name = cc.constraint_name; CONSTRAINT_NAME COLUMN_NAME R_CONSTRAINT_NAME ------------------------------ ------------------------------ -------------------- FK1 P PAR_PK FK2 P PAR_PK
Yes – thats TWO foreign key constraints implementing the identical check. How did I bypass the duplicate check we saw above ?
It’s just a little idiosyncrasy in all versions of Oracle since the inception of foreign key constraints. If you specify the constraints as part of the table creation DDL, the check for duplicates is skipped.
SQL> create table child ( c int, 2 p int, 3 constraint fk1 foreign key ( p ) references parent ( p ) , 4 constraint fk2 foreign key ( p ) references parent ( p ) 5 ); Table created.
It is worth looking for and rectifying this condition on your databases, because it can lead to confusing errors, for example, when you do a datapump of such a schema – the import will (correctly) fail on the second constraint, but you’ll be scratching your head when you go look at the imported result and see that the constraint is there !
This is fixed in 12c onwards.
SQL> create table child ( c int, 2 p int, 3 constraint fk1 foreign key ( p ) references parent ( p ) , 4 constraint fk2 foreign key ( p ) references parent ( p ) 5 ); constraint fk2 foreign key ( p ) references parent ( p ) * ERROR at line 4: ORA-02274: duplicate referential constraint specifications
A couple of people have requested that I explain how to install the entire random_ninja and testdata_ninja packages manually. I have created a gist here with the full and complete order of the files: Full install list. So download random_ninja zipfile and testdata_ninja zipfile and follow the order of the gist to install all the code.
Once you have installed the packages you have the full random library available, which you can read in full detail about here: Post #1, Post #2 and Post #3. For testdata package you can see a quick demo here and here.
Release 3 of random_ninja is coming up and this is the list of new functionality that will be fully supported (HINT: The code is already in the github source):
Also there are 3 new locale supported for names, addresses and more. These countries are Denmark, China and Dhubai. Performance improvements for Markov text generation are coming as well as new SWIFT and FIX financial random generation.
Colleague Jeff Smith published an interesting post the other day about his “rules and regulations” for blogging, but the overriding theme (Ed: – this is my opinion, I’m not speaking for Jeff) was that the “what” he blogs about was – anything he’s passionate about, and the “when” was – whenever felt inspired to do so.
That got me thinking about blogging in general. I think it is safe to say
(Because not to do so means you’re in the wrong career)
(I got news for you…you’re not the only guy/girl running Oracle, or Microsoft, or SAP, or Java, or …)
(Don’t blame me…blame Netscape , Facebook and cat videos)
So with those things in mind, to not be blogging about the things in your career that you are passionate about … is … well…. sortta just plain rude. The script you wrote, the problem you solved, the frustration you had with the way something worked – all of these are items of potential benefit for someone out there in the community. Moreover, the act of putting it to “paper” might assist you if the topic at hand is one of a seemingly unsolvable problem. And of course, the more people that are blogging, the greater the chance that you will receive a reciprocating benefit, namely, someone else’s blog post will assist you with some element of your working day.
You might be thinking – “Oh..I dont have necessary software to blog” That doesn’t really cut it anymore since all of the blogging platforms have reasonable editors just sitting there right in the browser. I know this because I’m typing this entry into such an editor right now.
Or you might be thinking – “Oh..I dont have the time to write one” Yeah, that doesn’t really cut it either If you haven’t got the time to write it down, just do it video instead. And before you tell me that “Oh..I dont have necessary equipment” let me share with you what I use for a lot of my “away from home” filming:
In terms of a parts list, we have
Rest assured, Steven Spielberg I’m definitely not, but I’m not trying to be. And what’s more, I don’t think the viewing audience is interested tremendously in cinematic wonder – they’re interested in content that might help them have an easier and more productive time at work. I’m just sharing what interests me, on the off chance that it will benefit you.
So whether it is blogging or video or any other medium, don’t be shy. Jump on board and try your hand. You might surprise yourself at how easy it is to reach out to the greater community in your IT world.
Give it a go !
Here’s one of those little improvements in 12c (including 12.1) that will probably end up being described as “little known features” in about 3 years time. Arguably it’s one of those little things that no-one should care about because it’s not the sort of thing you should do on a production system, but that doesn’t mean it won’t be seen in the wild.
Rather than simply state the feature I’m going to demonstrate it, starting with a little code to build a couple of tables with referential integrity:
create table parent ( id number(4), name varchar2(10), constraint par_pk primary key (id) ) ; create table child( id_p number(4) constraint chi_fk_par references parent on delete cascade, id number(4), name varchar2(10), constraint chi_pk primary key (id_p, id) ) ; insert into parent values (1,'Smith'); insert into parent values (2,'Jones'); insert into child values(1,1,'Sally'); insert into child values(1,2,'Simon'); insert into child values(2,1,'Jack'); insert into child values(2,2,'Jill'); commit;
There’s one important detail in this code that isn’t taking the default and isn’t used very frequently – it’s the option on the foreign key to take the action “on delete cascade”. If you delete a row from the parent table then Oracle will automatically delete any referenced rows from the child table first thus avoiding the error ORA-02292: integrity constraint (TEST_USER.CHI_FK_PAR) violated – child record found. (Conveniently I have a suitable index on the child table that will bypass the problem of a mode 4 (or, where child rows already exist, mode 5) TM lock being taken on the child as the parent row is deleted.)
And here’s the demonstration of the new feature – working in 12.1 onwards:
truncate table parent; truncate table parent cascade;
The first command will raise Oracle error ORA-02266: unique/primary keys in table referenced by enabled foreign keys, but the second command will truncate the parent and child tables “simultaneously”: but only if the referential integrity constraint is set to “on delete cascade”. If the referential integrity constraint is left to its default action then the second command will raise error: ORA-14705: unique or primary keys referenced by enabled foreign keys in table “TEST_USER”.”CHILD”
This feature (and several broadly similar features) also works with matching partitions of equi-partitioned (or ref partitioned) tables – and that’s a context where the requirement is much more likely to appear than with non-partitioned tables.
In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:
When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.
I have always wanted to test that in a quiet moment…
I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.
NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!
Now let’s get to it.
Step 1: Check the status of the configuration
In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).
The following commands are somewhat sensitive to availability of the network – you should protect your sessions against any type of network failure! I am using screen (1) for that purpose, there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…
[oracle@rac12sec1 ~]$ dgmgrl DGMGRL for Linux: Version 220.127.116.11.0 - 64bit Production Copyright (c) 2000, 2013, Oracle. All rights reserved. Welcome to DGMGRL, type "help" for information. DGMGRL> connect sys@ncdbb Password: Connected as SYSDBA. DGMGRL> show configuration Configuration - ractest Protection Mode: MaxPerformance Members: NCDBA - Primary database NCDBB - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS (status updated 55 seconds ago) DGMGRL> validate database 'NCDBB' ...
The command to check for switchover readiness is new to 12c as well and called “validate database”. I don’t have screen output from the situation at this point-just take my word that I was ready :) Don’t switch over if you have any concerns or doubts the operation might not succeed! “Validate database” does not relieve you from your duties to check for switchover readiness – follow your procedures.
Step 2: Switch Over
Finally, the big moment has come! It takes just one line to perform the switchover:
DGMGRL> switchover to 'NCDBB' Performing switchover NOW, please wait... New primary database "NCDBB" is opening... Oracle Clusterware is restarting database "NCDBA" ... Switchover succeeded, new primary is "NCDBB" DGMGRL> DGMGRL> show database 'NCDBA'; Database - NCDBA Role: PHYSICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 0 seconds (computed 1 second ago) Average Apply Rate: 9.00 KByte/s Real Time Query: ON Instance(s): NCDBA1 NCDBA2 (apply instance) Database Status: SUCCESS DGMGRL> show database 'NCDBB'; Database - NCDBB Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): NCDBB1 NCDBB2 Database Status: SUCCESS DGMGRL>
Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.
If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?
Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:
DGMGRL> validate database verbose 'NCDBA'; Database Role: Physical standby database Primary Database: NCDBB Ready for Switchover: Yes Ready for Failover: Yes (Primary Running) Capacity Information: Database Instances Threads NCDBB 2 2 NCDBA 2 2 Temporary Tablespace File Information: NCDBB TEMP Files: 1 NCDBA TEMP Files: 1 Flashback Database Status: NCDBB: On NCDBA: Off Data file Online Move in Progress: NCDBB: No NCDBA: No Standby Apply-Related Information: Apply State: Running Apply Lag: 0 seconds (computed 1 second ago) Apply Delay: 0 minutes Transport-Related Information: Transport On: Yes Gap Status: No Gap Transport Lag: 0 seconds (computed 1 second ago) Transport Status: Success Log Files Cleared: NCDBB Standby Redo Log Files: Cleared NCDBA Online Redo Log Files: Cleared NCDBA Standby Redo Log Files: Available Current Log File Groups Configuration: Thread # Online Redo Log Groups Standby Redo Log Groups Status (NCDBB) (NCDBA) 1 2 3 Sufficient SRLs 2 2 3 Sufficient SRLs Future Log File Groups Configuration: Thread # Online Redo Log Groups Standby Redo Log Groups Status (NCDBA) (NCDBB) 1 2 3 Sufficient SRLs 2 2 3 Sufficient SRLs Current Configuration Log File Sizes: Thread # Smallest Online Redo Smallest Standby Redo Log File Size Log File Size (NCDBB) (NCDBA) 1 50 MBytes 50 MBytes 2 50 MBytes 50 MBytes Future Configuration Log File Sizes: Thread # Smallest Online Redo Smallest Standby Redo Log File Size Log File Size (NCDBA) (NCDBB) 1 50 MBytes 50 MBytes 2 50 MBytes 50 MBytes Apply-Related Property Settings: Property NCDBB Value NCDBA Value DelayMins 0 0 ApplyParallel AUTO AUTO Transport-Related Property Settings: Property NCDBB Value NCDBA Value LogXptMode ASYNC ASYNC RedoRoutes
Dependency DelayMins 0 0 Binding optional optional MaxFailure 0 0 MaxConnections 1 1 ReopenSecs 300 300 NetTimeout 30 30 RedoCompression DISABLE DISABLE LogShipping ON ON Automatic Diagnostic Repository Errors: Error NCDBB NCDBA No logging operation NO NO Control file corruptions NO NO SRL Group Unavailable NO NO System data file missing NO NO System data file corrupted NO NO System data file offline NO NO User data file missing NO NO User data file corrupted NO NO User data file offline NO NO Block Corruptions found NO NO DGMGRL>
Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (eg online datafile move!).
Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about this Interesting observation about standby redo logs in Data Guard
After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.
In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.
To me, ‘cloud computing’ is renting a compute resource to perform a task. In order to use that compute resource, you need to instruct it to do something, which is typically done via the network. If the task the compute resource needs to fulfil is being an application server or being a client or both in the case of an application server that uses an Oracle database, the network latency between the client of the database and the database server is a critical property.
I think so far everybody is with me. If we zoom in to the network, it becomes more difficult, and *very* easy to make wrong assumptions. Let me explain. A network, but really any connection between processing and a resource, has two DIFFERENT properties that I see getting mixed up consistently. These are:
* Latency: the time it takes for a signal or (network) packet to travel from the client to the server, or the time it takes to travel from the client to the server and back.
* Bandwidth: the amount of data that can be transported from the client to the server in a certain time.
How do you determine the latency of a network? Probably the most people respond with ‘use ping’. This is how that looks like:
[user@oid1 ~]$ ping -c 3 lsh1 PING lsh1 (x.x.x.x) 56(84) bytes of data. 64 bytes from lsh1: icmp_seq=1 ttl=62 time=680 ms 64 bytes from lsh1: icmp_seq=2 ttl=62 time=0.304 ms 64 bytes from lsh1: icmp_seq=3 ttl=62 time=0.286 ms
The question I often ask myself is: what is that we see actually? How does this work?
In order to answer that question, the tcpdump tool can answer that question. Using tcpdump, you can capture the network packets on which the ping utility based the above outcome. The ‘-ttt’ option calculates the time between each arrived packet:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host lsh1 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1 > lsh1: ICMP echo request, id 35879, seq 1, length 64 00:00:00.680289 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 1, length 64 00:00:00.319614 IP oid1 > lsh1: ICMP echo request, id 35879, seq 2, length 64 00:00:00.000287 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 2, length 64 00:00:01.000180 IP oid1 > lsh1: ICMP echo request, id 35879, seq 3, length 64 00:00:00.000269 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 3, length 64
So, ping works by sending a packet (ICMP echo request) requesting a reply (ICMP echo reply) from the remote server, and measure the time it takes to get that reply. Great, quite simple, isn’t it? However, the biggest issue I see this is using a protocol that is not used for sending regular data (!). Most application servers I encounter send data using TCP (transmission control protocol), the traffic ping sends are sent using a protocol called ICMP (internet control message protocol). Especially in the cloud, which means (probably) a lot of the infrastructure is shared, ICMP might be given different priority than TCP traffic, which you quite probably are using when the application on your cloud virtual machine is running. For those of you who haven’t looked into the network side of the IT landscape, you can priorise protocols and even specific ports, throttle traffic and you can even terminate it. In fact, a sensible protected (virtual) machine in the cloud will not respond to ICMP echo requests in order to protected it from attacks.
So, what would be a more sensible approach then? A better way would be to use the same protocol and port number that your application is going to use. This can be done using a tool called hping. Using that tool, you can craft your own packet with the protocol and flags you want. In the case of Oracle database traffic that would be the TCP protocol, port 1521 (it can be any port number, 1521 is the default port). This is how you can do that. In order to mimic starting a connection, the S (SYN) flag is set (-S), one packet is send (-c 1) to port 1521 (-p 1521).
[user@oid1 ~]$ sudo hping -S -c 1 -p 1521 db01-vip
What this does is best investigated with tcpdump once again. The server this is executed against can respond in two ways (three actually). When you send this to TCP port 1521 where a listener (or any other daemon that listens on that port) is listening, this is the response:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01-vip tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.kjtsiteserver > db01-vip.ncube-lm: Flags [S], seq 1436552830, win 512, length 0 00:00:00.001229 IP db01-vip.ncube-lm > oid1.kjtsiteserver: Flags [S.], seq 2397022511, ack 1436552831, win 14600, options [mss 1460], length 0 00:00:00.000023 IP oid1.kjtsiteserver > db01-vip.ncube-lm: Flags [R], seq 1436552831, win 0, length 0
This is a variation of the classic TCP three way handshake:
1. A TCP packet is sent with the SYN flag set to indicate starting a (client to server) connection.
2. A TCP packet is sent back with SYN flag set to indicate starting a (server to client) connection, and the first packet is acknowledged.
3. This is where the variation is, normally an acknowledgement would be sent of the second packet to establish a two way connection, but in order to stop the communication a packet is sent with the RST (reset) flag set.
However, this is if a process is listening on the port. This is how that looks like when there is no process listening on port 1521:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.vsamredirector > db01.ncube-lm: Flags [S], seq 1975471906, win 512, length 0 00:00:00.001118 IP db01.ncube-lm > oid1.vsamredirector: Flags [R.], seq 0, ack 1975471907, win 0, length 0
This means that if a connection is initiated to a port on which no process is listening (port status ‘closed’), there is communication between the client and the server. This is why firewalls were invented!
1. A TCP packet is sent with the SYN flag set to indicate starting a connection.
2. A TCP packet is sent back to with the RST (reset) flag set to indicate no connection is possible.
The third option, when port 1521 is firewalled on the server, simply means only the first packet (from client to server with the SYN flag set) is sent and no response is coming back.
Okay, let’s pick up the performance aspect again. This hping command:
[user@oid1 ~]$ sudo hping -S -c 1 -p 1521 db01-vip HPING db01-vip (eth0 x.x.x.x): S set, 40 headers + 0 data bytes len=44 ip=db01-vip ttl=57 DF id=0 sport=1521 flags=SA seq=0 win=14600 rtt=1.2 ms
Says the roundtrip time is 1.2ms. If we look at the network packets and timing:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01-vip tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.mmcal > db01-vip.ncube-lm: Flags [S], seq 1289836562, win 512, length 0 00:00:00.001113 IP db01-vip.ncube-lm > oid1.mmcal: Flags [S.], seq 2504750542, ack 1289836563, win 14600, options [mss 1460], length 0 00:00:00.000016 IP oid1.mmcal > db01-vip.ncube-lm: Flags [R], seq 1289836563, win 0, length 0
It becomes apparent that the 1.2ms time hping reports is the time it takes for the remote server to send back the SYN+ACK package in the TCP three way handshake.
So does that mean that if we take a number of measurements (let’s say 100, or 1000) to have a statistically significant number of measurements we can establish my TCP roundtrip time and then know how fast my connection will be (outside of all the other variables inherent to the internet and potential noisy neighbours to name a few)?
Oracle provides a way to generate and measure SQL-Net traffic in My Oracle Support note: Measuring Network Capacity using oratcptest (Doc ID 2064368.1). This note provides a jar file which contains server and client software, and is aimed at dataguard, but is useful to measure SQL-Net network latency. I have looked at the packets oratcptest generates, and they mimic SQL-Net quite well.
Let’s see if we can redo the test above to measure pure network latency. First on the database server side, setup the server:
[user@db01m ~]$ java -jar oratcptest.jar -server db01 -port=1521
And then on the client side run the client using the same oratcptest jar file:
java -jar oratcptest.jar db01 -mode=sync -length=0 -duration=1s -interval=1s -port=1521
The important bits are -mode=sync (client packet must be acknowledged before sending another packet) and -length=0 (network traffic contains no payload). This is the result:
[Requesting a test] Message payload = 0 bytes Payload content type = RANDOM Delay between messages = NO Number of connections = 1 Socket send buffer = (system default) Transport mode = SYNC Disk write = NO Statistics interval = 1 second Test duration = 1 second Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (07:34:42) The server is ready. Throughput Latency (07:34:43) 0.017 Mbytes/s 0.670 ms (07:34:43) Test finished. Socket send buffer = 11700 bytes Avg. throughput = 0.017 Mbytes/s Avg. latency = 0.670 ms
If you look at the hping roundtrip time (1.2ms) and the oratcptest roundtrip time (0.7ms) clearly this is different! If you just look at the numbers (1.2 versus 0.7) it might seem like the oratcptest time is only measuring client to server traffic instead of the whole roundtrip? For this too it’s good to use tcpdump once again and look what oratcptest actually is doing:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
00:00:00.000000 IP oid1.63602 > db01.ncube-lm: Flags [S], seq 2408800085, win 17920, options [mss 8960,sackOK,TS val 3861246405 ecr 0,nop,wscale 7], length 0
00:00:00.001160 IP db01.ncube-lm > oid1.63602: Flags [S.], seq 2178995555, ack 2408800086, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
00:00:00.000015 IP oid1.63602 > db01.ncube-lm: Flags [.], ack 1, win 140, length 0
00:00:00.023175 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 1:145, ack 1, win 140, length 144
00:00:00.000520 IP db01.ncube-lm > oid1.63602: Flags [.], ack 145, win 123, length 0
00:00:00.000951 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 1:145, ack 145, win 123, length 144
00:00:00.000008 IP oid1.63602 > db01.ncube-lm: Flags [.], ack 145, win 149, length 0
00:00:00.018839 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 145:157, ack 145, win 149, length 12
00:00:00.000563 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 145:149, ack 157, win 123, length 4
00:00:00.000358 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 157:169, ack 149, win 149, length 12
00:00:00.000486 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 149:153, ack 169, win 123, length 4
00:00:00.000100 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 169:181, ack 153, win 149, length 12
00:00:00.000494 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 153:157, ack 181, win 123, length 4
...
00:00:00.000192 IP oid1.63586 > db01.ncube-lm: Flags [P.], seq 18181:18193, ack 6157, win 149, length 12
00:00:00.000447 IP db01.ncube-lm > oid1.63586: Flags [P.], seq 6157:6161, ack 18193, win 123, length 4
00:00:00.006696 IP oid1.63586 > db01.ncube-lm: Flags [F.], seq 18193, ack 6161, win 149, length 0
00:00:00.000995 IP db01.ncube-lm > oid1.63586: Flags [F.], seq 6161, ack 18194, win 123, length 0
00:00:00.000012 IP oid1.63586 > db01.ncube-lm: Flags [.], ack 6162, win 149, length 0
If you look at rows 4, 5 and 6 you see the typical TCP three-way handshake. What is nice to see is that the response from the server on line 5 (the actual roundtrip) took 1.1ms, which is what we measured with hping! At lines 7-10 we see a packet sent from the client to the server which is ACK’ed, and a packet sent from the server to the client which is ACK’ed. If you add the ‘-A’ flag to tcpdump you can get the packet contents printed as characters, which shows the client telling the server how it wants to perform the test and the server responding with the requested settings. This is all preparation for the test.
Starting from line 11, there is a strictly repeating sequence: the client sends a packet of length 12 that ACKs the previously received packet, and the server responds with a packet of length 4 that ACKs its previously received packet. This is the actual performance test! It means the settings ‘-duration=1s -interval=1s’ do not result in a single packet being sent; the client continuously sends packets for the duration of 1 second. Another flag shows up here as well: P, or PSH (push). This flag tells the kernel/TCP stack that all data to transmit has been provided from userland and must be sent immediately, and it instructs the receiving side to process the data immediately, so it reaches the receiving userland application as soon as possible too.
Lines 20-22 show how the connection is closed by sending packets with the FIN flag, which is done both from the client to the server and from the server to the client, and because this is TCP, these need to be ACK’ed, which is why you see a trailing packet with only the ACK flag set, acknowledging the FIN packet.
The conclusion so far is that for realistic latency figures you should not use a different protocol (so while ICMP (ping) does give a latency indication, it should really only be used as an indicator), and that you should measure the actual work, not meta-transactions like the TCP three-way handshake. Probably because of the PSH flag, the actual minimal latency for SQL-Net traffic is lower than what ping and hping showed.
Wait a minute…did you notice the ‘actual minimal latency’? So far we only have been sending empty packets, which means we measured how fast a packet can travel from client to server and back. In reality, you probably want to send actual data back and forth, don’t you? That is something that we actually have not measured yet!
Let’s do actual Oracle transactions. For the sake of testing network latency, we can use Swingbench to execute SQL. This is how that is done:
[user@oid1 bin]$ cd ~/sw/swingbench/bin
[user@oid1 bin]$ ./charbench -c ../configs/stresstest.xml -u soe -p soe -uc 1 -rt 00:01
Author  :        Dominic Giles
Version :        18.104.22.1681

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS
8:22:56 AM      1       14450   775
Please mind I am using 1 user (-uc 1) and a testing time of 1 minute (-rt 00:01), which should be longer when you are doing real testing. As a reminder, I am using 1 session because I want to understand the latency, not the bandwidth! In order to understand if the network traffic looks the same as oratcptest.jar, I can use tcpdump once again. Here is a snippet of the traffic:
...
00:00:00.000106 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 5839:5852, ack 5986, win 272, length 13
00:00:00.000491 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 5986:6001, ack 5852, win 330, length 15
00:00:00.000234 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 5852:6003, ack 6001, win 272, length 151
00:00:00.000562 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6001:6077, ack 6003, win 330, length 76
00:00:00.000098 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 6003:6016, ack 6077, win 272, length 13
00:00:00.000484 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6077:6092, ack 6016, win 330, length 15
00:00:00.000238 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 6016:6159, ack 6092, win 272, length 143
00:00:00.000591 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6092:6425, ack 6159, win 330, length 333
...
The important bit is that this shows the same single-packet traffic from client to server and back as oratcptest generated, but now with varying packet sizes (which is logical: different SQL statements are sent to the database). The PSH bit is set as well, just as it was in the oratcptest traffic.
Let’s assume this is a real-life workload. In order to measure and calculate differences in performance between different networks, we need the average packet length. This can be done with a tool called tcpstat (this link provides the EL6 version). In my case I have only one application using a database on this server, so I can just filter on port 1521 to measure my SQL-Net traffic:
[user@oid1 ~]$ sudo tcpstat -i eth0 -o "Packet/s=%p\tmin size: %m\tavg size: %a\tmax size: %M\tstddev: %d\n" -f 'port 1521'
Packet/s=2526.40    min size: 53    avg size: 227.76    max size: 1436    stddev: 289.21
Packet/s=2531.40    min size: 53    avg size: 229.79    max size: 1432    stddev: 291.22
Packet/s=2634.20    min size: 53    avg size: 229.59    max size: 1432    stddev: 293.38
Packet/s=2550.00    min size: 53    avg size: 234.11    max size: 1435    stddev: 296.77
Packet/s=2486.80    min size: 53    avg size: 232.24    max size: 1436    stddev: 293.16
In case you wondered why tcpstat reports a minimum length of 53 while tcpdump (a little earlier in the article) reported 13: tcpstat reports the full packet length including the protocol headers, while tcpdump in this case reports only the payload length.
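As a quick sanity check, the two figures reconcile. This is a sketch, assuming plain 20-byte IP and 20-byte TCP headers (no TCP options on the established connection, which matches the capture above where the server did not negotiate TCP timestamps):

```shell
# Reconcile tcpdump's payload length with tcpstat's packet size.
# Assumption: no TCP options on the established connection, so the
# overhead is 20 bytes (IP header) + 20 bytes (TCP header) = 40 bytes.
payload=13     # smallest payload tcpdump showed
ip_hdr=20
tcp_hdr=20
echo $(( payload + ip_hdr + tcp_hdr ))
```

The result, 53, matches the minimum packet size tcpstat reported.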
Now we can execute oratcptest.jar again, but with a payload size that matches the average size we measured; I have taken 250 bytes as the payload size:
[user@oid1 ~]$ java -jar oratcptest.jar db01 -mode=sync -length=250 -duration=1s -interval=1s -port=1521
[Requesting a test]
        Message payload        = 250 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 1 second
        Test duration          = 1 second
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(09:39:47) The server is ready.
                    Throughput             Latency
(09:39:48)      0.365 Mbytes/s            0.685 ms
(09:39:48) Test finished.
           Socket send buffer = 11700 bytes
              Avg. throughput = 0.365 Mbytes/s
                 Avg. latency = 0.685 ms
As you can see, there is only a modest increase in average latency, from 0.670ms to 0.685ms.
In order to test the impact of network latency, let’s move the oratcptest client to the server itself, to get the lowest possible latency. This is actually very easy, because the oratcptest.jar file contains both the client and the server, so all I need to do is log on to the server where I started oratcptest.jar in server mode, and run it in client mode:
[user@db01m ~]$ java -jar oratcptest.jar db01 -mode=sync -length=250 -duration=1s -interval=1s -port=1521
[Requesting a test]
        Message payload        = 250 bytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 1 second
        Test duration          = 1 second
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(14:49:29) The server is ready.
                    Throughput             Latency
(14:49:30)     12.221 Mbytes/s            0.020 ms
(14:49:30) Test finished.
           Socket send buffer = 26010 bytes
              Avg. throughput = 11.970 Mbytes/s
                 Avg. latency = 0.021 ms
Wow! The roundtrip latency dropped from 0.685ms to 0.021ms! Another test with oratcptest.jar over a true local network connection (with Linux virtualised using Xen/OVM) shows a latency of 0.161ms.
These are the different network latency figures measured with oratcptest using a payload size that equals my average network payload size:
– Local only RTT: 0.021ms
– Local network RTT: 0.161ms
– Different networks RTT: 0.685ms
If I take swingbench and execute the ‘stresstest’ run locally, on a machine directly connected via the local network, and across different networks (think cloud), and measure TPS (transactions per second), I get the following figures:
– Local only TPS: 2356
– Local network TPS: 1567
– Different networks TPS: 854
Do these figures make sense?
– Local only: time not in network transit per second: 1000 - (0.021 x 2356) = 950.524ms; approximate average time spent per query: 950.524 / 2356 = 0.40ms
– Local network: 1000 - (0.161 x 1567) = 747.713ms; 747.713 / 1567 = 0.48ms
– Different networks: 1000 - (0.685 x 854) = 415.010ms; 415.010 / 854 = 0.49ms
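The arithmetic above can be sketched in a few lines of shell, using the RTT and TPS figures measured earlier:

```shell
# For each (RTT in ms, TPS) pair: the time per second not spent in
# network transit is 1000 - RTT*TPS; dividing that by TPS gives the
# approximate processing time per transaction.
for pair in "0.021 2356" "0.161 1567" "0.685 854"; do
  set -- $pair
  awk -v rtt="$1" -v tps="$2" \
    'BEGIN { t = 1000 - rtt * tps; printf "RTT %.3f ms, TPS %d -> %.2f ms per transaction\n", rtt, tps, t / tps }'
done
```

Running this prints roughly 0.40, 0.48 and 0.49 ms per transaction for the three scenarios, which is the same conclusion as the manual calculation.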
It seems that this swingbench test spends roughly 0.40-0.50ms on processing; the difference in transactions per second appears to be caused mainly by the difference in network latency.
When dealing with a RANGE partitioned table, the defined partitions dictate all of the data that can be placed into the table. For example, if I have a SALES table as per below
SQL> create table SALES
  2  ( cal_year date,
  3    txn_id int,
 ...
 ...
 24  )
 25  partition by range ( cal_year )
 26  (
 27    partition p_low values less than ( date '2000-01-01' ),
 28    partition p2000 values less than ( date '2001-01-01' ),
 ...
 ...
 34    partition p2016 values less than ( date '2017-01-01' )
 35  );

Table created.
then the existing partitions define a natural upper bound on the value of CAL_YEAR that I can insert into the table. For example, if I attempt to add a row for the year 2018, I get the familiar ORA-14400 that has called out many a DBA at the stroke of midnight on New Year's Eve
SQL> insert into SALES
  2  values ( date '2018-01-01', .... );
insert into SALES
*
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition
As many will know, the resolution to this is either to perform a maintenance task to ensure that there are sufficient partitions defined, or to use the INTERVAL partitioning method, which became available in 11g.
SQL> create table SALES
  2  ( cal_year date,
  3    txn_id int,
 ...
 ...
 23  )
 24  partition by range ( cal_year )
 25  INTERVAL( NUMTOYMINTERVAL(1,'YEAR'))
 26  (
 27    partition p_low values less than ( date '2000-01-01' )
 28  );

Table created.
And I can observe partitions being created as required as data is added to the table
SQL> select PARTITION_NAME, HIGH_VALUE
  2  from   user_tab_partitions
  3  where  table_name = 'SALES';

PARTITION_NAME            HIGH_VALUE
------------------------- --------------------------------
P00                       TIMESTAMP' 2000-01-01 00:00:00'

SQL> insert into SALES
  2  values ( to_date('12-DEC-2011'),....);

SQL> select PARTITION_NAME, HIGH_VALUE
  2  from   user_tab_partitions
  3  where  table_name = 'SALES';

PARTITION_NAME            HIGH_VALUE
------------------------- --------------------------------
P00                       TIMESTAMP' 2000-01-01 00:00:00'
SYS_P362                  TIMESTAMP' 2012-01-01 00:00:00'
But this isn’t a post about how interval partitioning is defined, because it’s a topic that is now well understood and well detailed in the documentation and on many blogs.
I wanted to touch on something more subtle that you might encounter when using interval partitioned tables. Let me do a query on the SALES table, which has been recreated (as INTERVAL partitioned) but is empty. Here is the execution plan when I query the table.
SQL> select * from SALES;   -- empty

-------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Pstart| Pstop |
-------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |       |       |
|   1 |  PARTITION RANGE ALL|       |     1 |     1 |1048575|
|   2 |   TABLE ACCESS FULL | SALES |     1 |     1 |1048575|
-------------------------------------------------------------
Wow! One million partitions! That might seem odd, because we know that our table has been defined with only a single partition, and even that might not be instantiated yet, depending on our choice of the "deferred_segment_creation" parameter on the database. But the explanation is relatively simple. The moment we define a table as interval partitioned, we in effect know "in advance" the definition of every single interval that will ever follow. The starting point for the intervals is known from the initial partition definition in the DDL, and the size/length of the interval maps out every possible future partition.
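That "known in advance" mapping boils down to a simple offset calculation. The sketch below is purely illustrative arithmetic, not Oracle's internal algorithm: with a transition point of 2000-01-01 and a one-year interval, a row for 12-DEC-2011 lands in the 12th interval, whose upper bound is 2012-01-01 (matching the SYS_P362 partition we saw created above):

```shell
# Illustrative only: which yearly interval (counting from the transition
# point 2000-01-01) does a row for the year 2011 fall into?
start_year=2000
row_year=2011
echo $(( row_year - start_year + 1 ))
```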
The maximum number of partitions per table is 1048575, which is what is reflected in the execution plan.
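That limit is not an arbitrary number: 1048575 is 1024K - 1 (that is, 2^20 - 1), Oracle's documented maximum number of partitions per table. A one-line check:

```shell
# Oracle's maximum partition count per table is 1024K - 1.
echo $(( 1024 * 1024 - 1 ))
```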
You’ll see similar information when you create an index on such a table. If the index is local, and hence follows the same partitioning scheme as the underlying table, then it too potentially has 1048575 partitions, not yet in use but known in advance. So if you look at the PARTITION_COUNT column for such an index, you’ll see that the database reports a (very) high partition count
SQL> create index sales_ix on sales ( some_col ) local;

Index created.

SQL> select TABLE_NAME, INDEX_NAME, PARTITION_COUNT from user_part_indexes;

TABLE_NAME                     INDEX_NAME                     PARTITION_COUNT
------------------------------ ------------------------------ ---------------
SALES                          SALES_IX                               1048575

1 row selected.
So if you see anything suggesting one million partitions, double check to see if you really have that many.
When tearing down an AWS Delphix Trial, we run the following command with Terraform:

terraform destroy
I’ve mentioned before that every time I execute this command, I suddenly feel like I’m in control of the Death Star in Star Wars:
As this runs outside of the AWS EC2 web interface, you may see some odd information in your dashboard. In our example, we’ve run “terraform destroy” and the tear down was successful:
[Screenshot: successful "terraform destroy" output]
So you may go to your volumes and verify that yes, no volumes exist:
[Screenshot: AWS EC2 console showing no volumes]
The instances view, however, may still show the three instances that were created as part of the trial (the Delphix Engine, source and target):
[Screenshot: AWS EC2 instances list still showing the three trial instances]
These are simply “ghosts of instances past.” The tear down was completely successful; there’s simply a delay before the instance names are removed from the dashboard. Notice that they are no longer listed with a public DNS or IP address. This is a clear indication that these instances aren’t running, don’t exist and, more importantly, aren’t being charged for.
Just one more [little] thing to be aware of…