A couple of people have requested that I explain how to install the entire random_ninja and testdata_ninja packages manually. I have created a gist here with the full and complete order of the files: Full install list. So download random_ninja zipfile and testdata_ninja zipfile and follow the order of the gist to install all the code.
Once you have installed the packages you have the full random library available, which you can read in full detail about here: Post #1, Post #2 and Post #3. For testdata package you can see a quick demo here and here.
Release 3 of random_ninja is coming up and this is the list of new functionality that will be fully supported (HINT: The code is already in the github source):
Also there are 3 new locale supported for names, addresses and more. These countries are Denmark, China and Dhubai. Performance improvements for Markov text generation are coming as well as new SWIFT and FIX financial random generation.
Colleague Jeff Smith published an interesting post the other day about his “rules and regulations” for blogging, but the overriding theme (Ed: – this is my opinion, I’m not speaking for Jeff) was that the “what” he blogs about was – anything he’s passionate about, and the “when” was – whenever felt inspired to do so.
That got me thinking about blogging in general. I think it is safe to say
(Because not to do so means you’re in the wrong career)
(I got news for you…you’re not the only guy/girl running Oracle, or Microsoft, or SAP, or Java, or …)
(Don’t blame me…blame Netscape , Facebook and cat videos)
So with those things in mind, to not be blogging about the things in your career that you are passionate about … is … well…. sortta just plain rude. The script you wrote, the problem you solved, the frustration you had with the way something worked – all of these are items of potential benefit for someone out there in the community. Moreover, the act of putting it to “paper” might assist you if the topic at hand is one of a seemingly unsolvable problem. And of course, the more people that are blogging, the greater the chance that you will receive a reciprocating benefit, namely, someone else’s blog post will assist you with some element of your working day.
You might be thinking – “Oh..I dont have necessary software to blog” That doesn’t really cut it anymore since all of the blogging platforms have reasonable editors just sitting there right in the browser. I know this because I’m typing this entry into such an editor right now.
Or you might be thinking – “Oh..I dont have the time to write one” Yeah, that doesn’t really cut it either If you haven’t got the time to write it down, just do it video instead. And before you tell me that “Oh..I dont have necessary equipment” let me share with you what I use for a lot of my “away from home” filming:
In terms of a parts list, we have
Rest assured, Steven Spielberg I’m definitely not, but I’m not trying to be. And what’s more, I don’t think the viewing audience is interested tremendously in cinematic wonder – they’re interested in content that might help them have an easier and more productive time at work. I’m just sharing what interests me, on the off chance that it will benefit you.
So whether it is blogging or video or any other medium, don’t be shy. Jump on board and try your hand. You might surprise yourself at how easy it is to reach out to the greater community in your IT world.
Give it a go !
Here’s one of those little improvements in 12c (including 12.1) that will probably end up being described as “little known features” in about 3 years time. Arguably it’s one of those little things that no-one should care about because it’s not the sort of thing you should do on a production system, but that doesn’t mean it won’t be seen in the wild.
Rather than simply state the feature I’m going to demonstrate it, starting with a little code to build a couple of tables with referential integrity:
create table parent ( id number(4), name varchar2(10), constraint par_pk primary key (id) ) ; create table child( id_p number(4) constraint chi_fk_par references parent on delete cascade, id number(4), name varchar2(10), constraint chi_pk primary key (id_p, id) ) ; insert into parent values (1,'Smith'); insert into parent values (2,'Jones'); insert into child values(1,1,'Sally'); insert into child values(1,2,'Simon'); insert into child values(2,1,'Jack'); insert into child values(2,2,'Jill'); commit;
There’s one important detail in this code that isn’t taking the default and isn’t used very frequently – it’s the option on the foreign key to take the action “on delete cascade”. If you delete a row from the parent table then Oracle will automatically delete any referenced rows from the child table first thus avoiding the error ORA-02292: integrity constraint (TEST_USER.CHI_FK_PAR) violated – child record found. (Conveniently I have a suitable index on the child table that will bypass the problem of a mode 4 (or, where child rows already exist, mode 5) TM lock being taken on the child as the parent row is deleted.)
And here’s the demonstration of the new feature – working in 12.1 onwards:
truncate table parent; truncate table parent cascade;
The first command will raise Oracle error ORA-02266: unique/primary keys in table referenced by enabled foreign keys, but the second command will truncate the parent and child tables “simultaneously”: but only if the referential integrity constraint is set to “on delete cascade”. If the referential integrity constraint is left to its default action then the second command will raise error: ORA-14705: unique or primary keys referenced by enabled foreign keys in table “TEST_USER”.”CHILD”
This feature (and several broadly similar features) also works with matching partitions of equi-partitioned (or ref partitioned) tables – and that’s a context where the requirement is much more likely to appear than with non-partitioned tables.
In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:
When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.
I have always wanted to test that in a quiet moment…
I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.
NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!
Now let’s get to it.
Step 1: Check the status of the configuration
In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).
The following commands are somewhat sensitive to availability of the network – you should protect your sessions against any type of network failure! I am using screen (1) for that purpose, there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…
[oracle@rac12sec1 ~]$ dgmgrl DGMGRL for Linux: Version 184.108.40.206.0 - 64bit Production Copyright (c) 2000, 2013, Oracle. All rights reserved. Welcome to DGMGRL, type "help" for information. DGMGRL> connect sys@ncdbb Password: Connected as SYSDBA. DGMGRL> show configuration Configuration - ractest Protection Mode: MaxPerformance Members: NCDBA - Primary database NCDBB - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS (status updated 55 seconds ago) DGMGRL> validate database 'NCDBB' ...
The command to check for switchover readiness is new to 12c as well and called “validate database”. I don’t have screen output from the situation at this point-just take my word that I was ready :) Don’t switch over if you have any concerns or doubts the operation might not succeed! “Validate database” does not relieve you from your duties to check for switchover readiness – follow your procedures.
Step 2: Switch Over
Finally, the big moment has come! It takes just one line to perform the switchover:
DGMGRL> switchover to 'NCDBB' Performing switchover NOW, please wait... New primary database "NCDBB" is opening... Oracle Clusterware is restarting database "NCDBA" ... Switchover succeeded, new primary is "NCDBB" DGMGRL> DGMGRL> show database 'NCDBA'; Database - NCDBA Role: PHYSICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 0 seconds (computed 1 second ago) Average Apply Rate: 9.00 KByte/s Real Time Query: ON Instance(s): NCDBA1 NCDBA2 (apply instance) Database Status: SUCCESS DGMGRL> show database 'NCDBB'; Database - NCDBB Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): NCDBB1 NCDBB2 Database Status: SUCCESS DGMGRL>
Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.
If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?
Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:
DGMGRL> validate database verbose 'NCDBA'; Database Role: Physical standby database Primary Database: NCDBB Ready for Switchover: Yes Ready for Failover: Yes (Primary Running) Capacity Information: Database Instances Threads NCDBB 2 2 NCDBA 2 2 Temporary Tablespace File Information: NCDBB TEMP Files: 1 NCDBA TEMP Files: 1 Flashback Database Status: NCDBB: On NCDBA: Off Data file Online Move in Progress: NCDBB: No NCDBA: No Standby Apply-Related Information: Apply State: Running Apply Lag: 0 seconds (computed 1 second ago) Apply Delay: 0 minutes Transport-Related Information: Transport On: Yes Gap Status: No Gap Transport Lag: 0 seconds (computed 1 second ago) Transport Status: Success Log Files Cleared: NCDBB Standby Redo Log Files: Cleared NCDBA Online Redo Log Files: Cleared NCDBA Standby Redo Log Files: Available Current Log File Groups Configuration: Thread # Online Redo Log Groups Standby Redo Log Groups Status (NCDBB) (NCDBA) 1 2 3 Sufficient SRLs 2 2 3 Sufficient SRLs Future Log File Groups Configuration: Thread # Online Redo Log Groups Standby Redo Log Groups Status (NCDBA) (NCDBB) 1 2 3 Sufficient SRLs 2 2 3 Sufficient SRLs Current Configuration Log File Sizes: Thread # Smallest Online Redo Smallest Standby Redo Log File Size Log File Size (NCDBB) (NCDBA) 1 50 MBytes 50 MBytes 2 50 MBytes 50 MBytes Future Configuration Log File Sizes: Thread # Smallest Online Redo Smallest Standby Redo Log File Size Log File Size (NCDBA) (NCDBB) 1 50 MBytes 50 MBytes 2 50 MBytes 50 MBytes Apply-Related Property Settings: Property NCDBB Value NCDBA Value DelayMins 0 0 ApplyParallel AUTO AUTO Transport-Related Property Settings: Property NCDBB Value NCDBA Value LogXptMode ASYNC ASYNC RedoRoutes
Dependency DelayMins 0 0 Binding optional optional MaxFailure 0 0 MaxConnections 1 1 ReopenSecs 300 300 NetTimeout 30 30 RedoCompression DISABLE DISABLE LogShipping ON ON Automatic Diagnostic Repository Errors: Error NCDBB NCDBA No logging operation NO NO Control file corruptions NO NO SRL Group Unavailable NO NO System data file missing NO NO System data file corrupted NO NO System data file offline NO NO User data file missing NO NO User data file corrupted NO NO User data file offline NO NO Block Corruptions found NO NO DGMGRL>
Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (eg online datafile move!).
Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about this Interesting observation about standby redo logs in Data Guard
After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.
In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.
To me, ‘cloud computing’ is renting a compute resource to perform a task. In order to use that compute resource, you need to instruct it to do something, which is typically done via the network. If the task the compute resource needs to fulfil is being an application server or being a client or both in the case of an application server that uses an Oracle database, the network latency between the client of the database and the database server is a critical property.
I think so far everybody is with me. If we zoom in to the network, it becomes more difficult, and *very* easy to make wrong assumptions. Let me explain. A network, but really any connection between processing and a resource, has two DIFFERENT properties that I see getting mixed up consistently. These are:
* Latency: the time it takes for a signal or (network) packet to travel from the client to the server, or the time it takes to travel from the client to the server and back.
* Bandwidth: the amount of data that can be transported from the client to the server in a certain time.
How do you determine the latency of a network? Probably the most people respond with ‘use ping’. This is how that looks like:
[user@oid1 ~]$ ping -c 3 lsh1 PING lsh1 (x.x.x.x) 56(84) bytes of data. 64 bytes from lsh1: icmp_seq=1 ttl=62 time=680 ms 64 bytes from lsh1: icmp_seq=2 ttl=62 time=0.304 ms 64 bytes from lsh1: icmp_seq=3 ttl=62 time=0.286 ms
The question I often ask myself is: what is that we see actually? How does this work?
In order to answer that question, the tcpdump tool can answer that question. Using tcpdump, you can capture the network packets on which the ping utility based the above outcome. The ‘-ttt’ option calculates the time between each arrived packet:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host lsh1 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1 > lsh1: ICMP echo request, id 35879, seq 1, length 64 00:00:00.680289 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 1, length 64 00:00:00.319614 IP oid1 > lsh1: ICMP echo request, id 35879, seq 2, length 64 00:00:00.000287 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 2, length 64 00:00:01.000180 IP oid1 > lsh1: ICMP echo request, id 35879, seq 3, length 64 00:00:00.000269 IP lsh1 > oid1: ICMP echo reply, id 35879, seq 3, length 64
So, ping works by sending a packet (ICMP echo request) requesting a reply (ICMP echo reply) from the remote server, and measure the time it takes to get that reply. Great, quite simple, isn’t it? However, the biggest issue I see this is using a protocol that is not used for sending regular data (!). Most application servers I encounter send data using TCP (transmission control protocol), the traffic ping sends are sent using a protocol called ICMP (internet control message protocol). Especially in the cloud, which means (probably) a lot of the infrastructure is shared, ICMP might be given different priority than TCP traffic, which you quite probably are using when the application on your cloud virtual machine is running. For those of you who haven’t looked into the network side of the IT landscape, you can priorise protocols and even specific ports, throttle traffic and you can even terminate it. In fact, a sensible protected (virtual) machine in the cloud will not respond to ICMP echo requests in order to protected it from attacks.
So, what would be a more sensible approach then? A better way would be to use the same protocol and port number that your application is going to use. This can be done using a tool called hping. Using that tool, you can craft your own packet with the protocol and flags you want. In the case of Oracle database traffic that would be the TCP protocol, port 1521 (it can be any port number, 1521 is the default port). This is how you can do that. In order to mimic starting a connection, the S (SYN) flag is set (-S), one packet is send (-c 1) to port 1521 (-p 1521).
[user@oid1 ~]$ sudo hping -S -c 1 -p 1521 db01-vip
What this does is best investigated with tcpdump once again. The server this is executed against can respond in two ways (three actually). When you send this to TCP port 1521 where a listener (or any other daemon that listens on that port) is listening, this is the response:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01-vip tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.kjtsiteserver > db01-vip.ncube-lm: Flags [S], seq 1436552830, win 512, length 0 00:00:00.001229 IP db01-vip.ncube-lm > oid1.kjtsiteserver: Flags [S.], seq 2397022511, ack 1436552831, win 14600, options [mss 1460], length 0 00:00:00.000023 IP oid1.kjtsiteserver > db01-vip.ncube-lm: Flags [R], seq 1436552831, win 0, length 0
This is a variation of the classic TCP three way handshake:
1. A TCP packet is sent with the SYN flag set to indicate starting a (client to server) connection.
2. A TCP packet is sent back with SYN flag set to indicate starting a (server to client) connection, and the first packet is acknowledged.
3. This is where the variation is, normally an acknowledgement would be sent of the second packet to establish a two way connection, but in order to stop the communication a packet is sent with the RST (reset) flag set.
However, this is if a process is listening on the port. This is how that looks like when there is no process listening on port 1521:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.vsamredirector > db01.ncube-lm: Flags [S], seq 1975471906, win 512, length 0 00:00:00.001118 IP db01.ncube-lm > oid1.vsamredirector: Flags [R.], seq 0, ack 1975471907, win 0, length 0
This means that if a connection is initiated to a port on which no process is listening (port status ‘closed’), there is communication between the client and the server. This is why firewalls were invented!
1. A TCP packet is sent with the SYN flag set to indicate starting a connection.
2. A TCP packet is sent back to with the RST (reset) flag set to indicate no connection is possible.
The third option, when port 1521 is firewalled on the server, simply means only the first packet (from client to server with the SYN flag set) is sent and no response is coming back.
Okay, let’s pick up the performance aspect again. This hping command:
[user@oid1 ~]$ sudo hping -S -c 1 -p 1521 db01-vip HPING db01-vip (eth0 x.x.x.x): S set, 40 headers + 0 data bytes len=44 ip=db01-vip ttl=57 DF id=0 sport=1521 flags=SA seq=0 win=14600 rtt=1.2 ms
Says the roundtrip time is 1.2ms. If we look at the network packets and timing:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01-vip tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.mmcal > db01-vip.ncube-lm: Flags [S], seq 1289836562, win 512, length 0 00:00:00.001113 IP db01-vip.ncube-lm > oid1.mmcal: Flags [S.], seq 2504750542, ack 1289836563, win 14600, options [mss 1460], length 0 00:00:00.000016 IP oid1.mmcal > db01-vip.ncube-lm: Flags [R], seq 1289836563, win 0, length 0
It becomes apparent that the 1.2ms time hping reports is the time it takes for the remote server to send back the SYN+ACK package in the TCP three way handshake.
So does that mean that if we take a number of measurements (let’s say 100, or 1000) to have a statistically significant number of measurements we can establish my TCP roundtrip time and then know how fast my connection will be (outside of all the other variables inherent to the internet and potential noisy neighbours to name a few)?
Oracle provides a way to generate and measure SQL-Net traffic in My Oracle Support note: Measuring Network Capacity using oratcptest (Doc ID 2064368.1). This note provides a jar file which contains server and client software, and is aimed at dataguard, but is useful to measure SQL-Net network latency. I have looked at the packets oratcptest generates, and they mimic SQL-Net quite well.
Let’s see if we can redo the test above to measure pure network latency. First on the database server side, setup the server:
[user@db01m ~]$ java -jar oratcptest.jar -server db01 -port=1521
And then on the client side run the client using the same oratcptest jar file:
java -jar oratcptest.jar db01 -mode=sync -length=0 -duration=1s -interval=1s -port=1521
The important bits are -mode=sync (client packet must be acknowledged before sending another packet) and -length=0 (network traffic contains no payload). This is the result:
[Requesting a test] Message payload = 0 bytes Payload content type = RANDOM Delay between messages = NO Number of connections = 1 Socket send buffer = (system default) Transport mode = SYNC Disk write = NO Statistics interval = 1 second Test duration = 1 second Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (07:34:42) The server is ready. Throughput Latency (07:34:43) 0.017 Mbytes/s 0.670 ms (07:34:43) Test finished. Socket send buffer = 11700 bytes Avg. throughput = 0.017 Mbytes/s Avg. latency = 0.670 ms
If you look at the hping roundtrip time (1.2ms) and the oratcptest roundtrip time (0.7ms) clearly this is different! If you just look at the numbers (1.2 versus 0.7) it might seem like the oratcptest time is only measuring client to server traffic instead of the whole roundtrip? For this too it’s good to use tcpdump once again and look what oratcptest actually is doing:
[user@oid1 ~]$ sudo tcpdump -ttt -i any host db01 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 00:00:00.000000 IP oid1.63602 > db01.ncube-lm: Flags [S], seq 2408800085, win 17920, options [mss 8960,sackOK,TS val 3861246405 ecr 0,nop,wscale 7], length 0 00:00:00.001160 IP db01.ncube-lm > oid1.63602: Flags [S.], seq 2178995555, ack 2408800086, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0 00:00:00.000015 IP oid1.63602 > db01.ncube-lm: Flags [.], ack 1, win 140, length 0 00:00:00.023175 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 1:145, ack 1, win 140, length 144 00:00:00.000520 IP db01.ncube-lm > oid1.63602: Flags [.], ack 145, win 123, length 0 00:00:00.000951 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 1:145, ack 145, win 123, length 144 00:00:00.000008 IP oid1.63602 > db01.ncube-lm: Flags [.], ack 145, win 149, length 0 00:00:00.018839 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 145:157, ack 145, win 149, length 12 00:00:00.000563 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 145:149, ack 157, win 123, length 4 00:00:00.000358 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 157:169, ack 149, win 149, length 12 00:00:00.000486 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 149:153, ack 169, win 123, length 4 00:00:00.000100 IP oid1.63602 > db01.ncube-lm: Flags [P.], seq 169:181, ack 153, win 149, length 12 00:00:00.000494 IP db01.ncube-lm > oid1.63602: Flags [P.], seq 153:157, ack 181, win 123, length 4 ... 00:00:00.000192 IP oid1.63586 > db01.ncube-lm: Flags [P.], seq 18181:18193, ack 6157, win 149, length 12 00:00:00.000447 IP db01.ncube-lm > oid1.63586: Flags [P.], seq 6157:6161, ack 18193, win 123, length 4 00:00:00.006696 IP oid1.63586 > db01.ncube-lm: Flags [F.], seq 18193, ack 6161, win 149, length 0 00:00:00.000995 IP db01.ncube-lm > oid1.63586: Flags [F.], seq 6161, ack 18194, win 123, length 0 00:00:00.000012 IP oid1.63586 > db01.ncube-lm: Flags [.], ack 6162, win 149, length 0
If you look at rows 4, 5 and 6 you see the typical TCP three-way handshake. What is nice to see, is that the actual response or roundtrip time for the packet from the server on line 5 actually took 1.1ms, which is what we have measured with hping! At lines 7-10 we see there is a packet send from the client to the server which is ACK’ed and a packet send from the server to the client which is ACK’ed. If you add the ‘-A’ flag to tcpdump you can get the values in the packet printed as characters, which shows the client telling the server how it wants to perform the test and the server responding with the requested settings. This is all a preparation for the test.
Starting from line 11, there is a strict repeating sequence of the client sending a packet of length 12, ACK’ing the previous received packet, and then the server responding with a packet of length 4 ACK’ing its previous received packet. This is the actual performance test! This means that the setting ‘-duration=1s -interval=1s’ does not mean it sends one packet, it actually means it’s continuously sending packets for the duration of 1 second. Also another flag is showing: the P or PSH (push) flag. This flag means the kernel/tcpip-stack understands all data to transmit is provided from ‘userland’, and now must be sent immediately, and instructs the receiving side to process it immediately in order to bring it to the receiving userland application as soon as possible too.
Lines 20-22 show how the connection is closed by sending a packet with a FIN flag, which is done for both the client to the server and the server to the client, and because it’s TCP, these need to be ACK’ed, which is why you see a trailing packet without a flag set, only ACK’ing the FIN packet.
The conclusion so far is that for real usable latency calculations you should not use a different protocol (so whilst ICMP (ping) does give an latency indication it should really only be used as an indicator), and that you should measure doing the actual work, not meta-transactions like the TCP three way handshake. Probably because of the PSH flag, the actual minimal latency for SQL-Net traffic is lower than ping and hping showed.
Wait a minute…did you notice the ‘actual minimal latency’? So far we only have been sending empty packets, which means we measured how fast a packet can travel from client to server and back. In reality, you probably want to send actual data back and forth, don’t you? That is something that we actually have not measured yet!
Let’s do actual Oracle transactions. For the sake of testing network latency, we can use Swingbench to execute SQL. This is how that is done:
[user@oid1 bin]$ cd ~/sw/swingbench/bin [user@oid1 bin]$ ./charbench -c ../configs/stresstest.xml -u soe -p soe -uc 1 -rt 00:01 Author : Dominic Giles Version : 220.127.116.111 Results will be written to results.xml. Hit Return to Terminate Run... Time Users TPM TPS 8:22:56 AM 1 14450 775
Please mind I am using 1 user (-uc 1) and a testing time of 1 minute (-rt 00:01), which should be longer when you are doing real testing. As a reminder, I am using 1 session because I want to understand the latency, not the bandwidth! In order to understand if the network traffic looks the same as oratcptest.jar, I can use tcpdump once again. Here is a snippet of the traffic:
... 00:00:00.000106 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 5839:5852, ack 5986, win 272, length 13 00:00:00.000491 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 5986:6001, ack 5852, win 330, length 15 00:00:00.000234 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 5852:6003, ack 6001, win 272, length 151 00:00:00.000562 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6001:6077, ack 6003, win 330, length 76 00:00:00.000098 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 6003:6016, ack 6077, win 272, length 13 00:00:00.000484 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6077:6092, ack 6016, win 330, length 15 00:00:00.000238 IP oid1.50553 > db01-vip.ncube-lm: Flags [P.], seq 6016:6159, ack 6092, win 272, length 143 00:00:00.000591 IP db01-vip.ncube-lm > oid1.50553: Flags [P.], seq 6092:6425, ack 6159, win 330, length 333 ...
The important bit is this shows the same single packet traffic client to server and back as we saw oratcptest generated, however now with varying packet size (which is logical, different SQL statements are sent to the database), the PSH bit is set, which also is the same as oratcptest generated.
Let’s assume this is a real-life workload. In order to measure and calculate differences in performance between different networks, we need the average packet length. This can be done with a tool called tcpstat (this link provides the EL6 version). In my case I have only one application using a database on this server, so I can just filter on port 1521 to measure my SQL-Net traffic:
[user@oid1 ~]$ sudo tcpstat -i eth0 -o "Packet/s=%p\tmin size: %m\tavg size: %a\tmax size: %M\tstddev: %d\n" -f 'port 1521' Packet/s=2526.40 min size: 53 avg size: 227.76 max size: 1436 stddev: 289.21 Packet/s=2531.40 min size: 53 avg size: 229.79 max size: 1432 stddev: 291.22 Packet/s=2634.20 min size: 53 avg size: 229.59 max size: 1432 stddev: 293.38 Packet/s=2550.00 min size: 53 avg size: 234.11 max size: 1435 stddev: 296.77 Packet/s=2486.80 min size: 53 avg size: 232.24 max size: 1436 stddev: 293.16
In case you wondered why tcpstat reports a minimum length of 53 and tcpdump (a little up in the article) of 13; tcpstat reports full packet length including packet, protocol and frame headers, tcpdump in this case reports the payload length.
Now we can execute oratcptest.jar again, but with a payload size set that matches the average size that we measured, I have taken 250 as payload size:
[user@oid1 ~]$ java -jar oratcptest.jar db01 -mode=sync -length=250 -duration=1s -interval=1s -port=1521 [Requesting a test] Message payload = 250 bytes Payload content type = RANDOM Delay between messages = NO Number of connections = 1 Socket send buffer = (system default) Transport mode = SYNC Disk write = NO Statistics interval = 1 second Test duration = 1 second Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (09:39:47) The server is ready. Throughput Latency (09:39:48) 0.365 Mbytes/s 0.685 ms (09:39:48) Test finished. Socket send buffer = 11700 bytes Avg. throughput = 0.365 Mbytes/s Avg. latency = 0.685 ms
As you can see, there is a real modest increase in average latency going from 0.670ms to 0.685ms.
In order to test the impact of network latency let’s move the oratcptest client to the server, to get the lowest possible latency. Actually, this is very easy, because the oratcptest.jar file contains both the client and the server, so all I need to do is logon to the server where I started the oratcptest.jar file in server mode, and run it in client mode:
[user@db01m ~]$ java -jar oratcptest.jar db01 -mode=sync -length=250 -duration=1s -interval=1s -port=1521 [Requesting a test] Message payload = 250 bytes Payload content type = RANDOM Delay between messages = NO Number of connections = 1 Socket send buffer = (system default) Transport mode = SYNC Disk write = NO Statistics interval = 1 second Test duration = 1 second Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (14:49:29) The server is ready. Throughput Latency (14:49:30) 12.221 Mbytes/s 0.020 ms (14:49:30) Test finished. Socket send buffer = 26010 bytes Avg. throughput = 11.970 Mbytes/s Avg. latency = 0.021 ms
Wow! The roundtrip latency dropped from 0.685ms to 0.021ms! Another test using oratcptest.jar using a true local network connection (with Linux being virtualised using Xen/OVM) shows a latency of 0.161ms.
These are the different network latency figures measured with oratcptest using a payload size that equals my average network payload size:
– Local only RTT: 0.021
– Local network RTT: 0.161
– Different networks RTT: 0.685
If I take swingbench and execute the ‘stresstest’ run local, on a machine directly connected via the local network and across different networks (think cloud), and now measure TPS (transactions per second), I get the following figures:
– Local only TPS: 2356
– Local network TPS: 1567
– Different networks TPS: 854
Do these figures make sense?
– Local only: Time not in network transit per second: 1000-(0.021*2356)=950.524; approximate average time spend on query: 950.523/2356=0.40ms
– Local network: 1000-(0.161*1567)=747.713/1567=0.48ms
– Different networks: 1000-(0.685*854)=415.010/854=0.49ms
It seems that this swingbench test spends roughly 0.40-0.50ms on processing, the difference in transactions per second seem to be mainly caused by the difference in network latency.
When dealing with a RANGE partitioned table, the defined partitions dictate all of the data that can be placed into the table. For example, if I have a SALES table as per below
SQL> create table SALES 2 ( cal_year date, 3 txn_id int, ... ... 24 ) 25 partition by range ( cal_year ) 26 ( 27 partition p_low values less than ( date '2000-01-01' ), 28 partition p2000 values less than ( date '2001-01-01' ), ... ... 34 partition p2016 values less than ( date '2017-01-01' ) 35 ); Table created.
then the existing partitions define a natural upper bound on the value of CAL_YEAR that I can insert into the table. For example, if I attempt to add a row for the year 2018, I get the familiar ORA-14400 that has called out many a DBA at the stroke of midnight on New Years Eve
SQL> insert into SALES 2 values ( date '2018-01-01', .... ); insert into SALES * ERROR at line 1: ORA-14400: inserted partition key does not map to any partition
As many will know, the resolution to this is either a maintenance task to ensure that there are sufficient partitions defined, or to use the INTERVAL partitioning method, which came available in 11g.
SQL> create table SALES 2 ( cal_year date, 3 txn_id int, ... ... 23 ) 24 partition by range ( cal_year ) 25 INTERVAL( NUMTOYMINTERVAL(1,'YEAR')) 26 ( 27 partition p_low values less than ( date '2000-01-01' ), 28 ); Table created.
And I can observe partitions being created as required as data is added to the table
SQL> select PARTITION_NAME, HIGH_VALUE 2 from user_tab_partitions 3 where table_name = 'SALES'; PARTITION_NAME HIGH_VALUE ------------------------- -------------------------------- P00 TIMESTAMP' 2000-01-01 00:00:00' SQL> insert into SALES 2 values ( to_date('12-DEC-2011'),....); SQL> select PARTITION_NAME, HIGH_VALUE 2 from user_tab_partitions 3 where table_name = 'SALES'; PARTITION_NAME HIGH_VALUE ------------------------- -------------------------------- P00 TIMESTAMP' 2000-01-01 00:00:00' SYS_P362 TIMESTAMP' 2012-01-01 00:00:00'
But this isn’t a post about how interval partitioning is defined, because it’s a topic that is now well understood and well detailed in the documentation and on many blogs.
I wanted to touch on a something more subtle that you might encounter when using interval partitioned tables. Let me do a query on the SALES table, which has been recreated (as INTERVAL partitioned) but is empty. Here is the execution plan when I query the table.
SQL> select * from SALES; --empty ------------------------------------------------------------- | Id | Operation | Name | Rows | Pstart| Pstop | ------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | | 1 | PARTITION RANGE ALL| | 1 | 1 |1048575| | 2 | TABLE ACCESS FULL | SALES | 1 | 1 |1048575| -------------------------------------------------------------
Wow! One million partitions ! That might seem odd, because we know that our table has been defined only with a single partition, and even that might not be instantiated yet depending on our choice of “deferred_segment_creation” parameter on the database. But the explanation is relatively simple. The moment we define a table as interval partitioned, we in effect know “in advance” the definition of every single interval that will ever follow. The starting point for the intervals is known due to the initial partition definition in the DDL, and the size/length of the interval maps out every possible future partition.
The maximum number of partitions is 1048575, which is then reflected in the execution plan.
You’ll see similar information when you create an index on such a table. If the index is local, and hence follows the same partitioning scheme as the underlying table, then it too has potentially 1048575 partitions all not yet in use, but known in advance. So if you look at the PARTITION_COUNT column for such an index, you’ll also see that the database will state that it has a (very) high partition count
SQL> create index sales_ix on sales ( some_col ) local; Index created. SQL> select TABLE_NAME,INDEX_NAME,PARTITION_COUNT from user_part_indexes; TABLE_NAME INDEX_NAME PARTITION_COUNT ------------------------------ ------------------------------ --------------- SALES SALES_IX 1048575 1 row selected.
So if you see anything suggesting one million partitions, double check to see if you really have that many.
When tearing down an AWS Delphix Trial, we run the following command with Terraform:
I’ve mentioned before that every time I execute this command, I suddenly feel like I’m in control of the Death Star in Star Wars:
As this runs outside of the AWS EC2 web interface, you may see some odd information in your dashboard. In our example, we’ve run “terraform destroy” and the tear down was successful:
https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 300w" sizes="(max-width: 490px) 100vw, 490px" data-recalc-dims="1" />
So you may go to your volumes and after verifying that yes, no volumes exist:
https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 300w, https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 768w, https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1425w, https://i1.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1200w" sizes="(max-width: 600px) 100vw, 600px" data-recalc-dims="1" />
The instances may still show the three instances that were created as part of the trial, (delphix engine, source and target.)
https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 300w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 768w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1429w, https://i2.wp.com/dbakevlar.com/wp-content/uploads/2017/01/Screen-Shot-2... 1200w" sizes="(max-width: 600px) 100vw, 600px" data-recalc-dims="1" />
These are simply “ghosts of instances past.” The tear down was completely successful and there’s simply a delay before the instance names are removed from the dashboard. Notice that they no longer are listed with a public DNS or IP address. This is a clear indication that these aren’t currently running, exist or more importantly, being charged for.
Just one more [little] thing to be aware of…
Now, for most of us, we’re living in a mobile world, which means as our laptop travels, our office moves and our IP address changes. This can be a bit troubling for those that are working in the cloud and our configuration to our cloud relies on locating us via our IP Address being the same as it was in our previous location.
What happens if you’re IP Address changes from what you have in your configuration file, (in Terraform’s case, your terraform.tfvars file) for the Delphix AWS Trial? I set my IP Address purposefully incorrect in the file to demonstrate what would happen after I run the terraform apply command:
https://i0.wp.com/dbakevlar.com/wp-content/uploads/2017/02/Screen-Shot-2... 300w, https://i0.wp.com/dbakevlar.com/wp-content/uploads/2017/02/Screen-Shot-2... 768w" sizes="(max-width: 467px) 100vw, 467px" data-recalc-dims="1" />
It’s not the most descriptive error, but that I/O timeout should tell you right away that terraform can’t connect back to your machine.
Now, we’ll tell you to capture your current IP address and update the IP address in the TFVARS file that resides in the Delphix_demo folder, but I know some of you are wondering why we didn’t just build out the product to adjust for an IP address change.
The truth is, you can set a static IP Address for your laptop OR just alias your laptop with the IP Address you wish to have. There are a number of different ways to address this, but looking into the most common, let’s dig into how we would update the IP address vs. updating the file.
You can go to System Preferences or Control Panel, (depending on which OS you’re on) and click on Network and configure your TCP/IP setting to manual and type in an IP Address there. The preference is commonly to choose a non-competitive IP address, (often the one that was dynamically set will do as your manual one to retain) and choose to save the settings. Restart the PC and you can then add that to your configuration files. Is that faster than just updating the TFVARS file- nope.
The second way to do this is to create an Alias IP address to deter from the challenge of each location/WiFi move having it automatically assigned.
Just as above, we often will use the IP address that was given dynamically and just choose to keep this as the one you’ll keep each time. If you’re unsure of what your IP is, there are a couple ways to collect this information:
Open up a browser and type in “What is my IP Address“
or from a command prompt, with “en0” being your WiFi connection, gather your IP Address one of two ways:
$ dig +short myip.opendns.com @resolver1.opendns.com
$ ipconfig getifaddr en0
Then set the IP address and cycle the your WiFi connection:
$ sudo ipconfig set en0 INFORM
$ sudo ifconfig en0 down $ sudo ifconfig en0 up
You can also click on your WiFi icon and reset it as well. Don’t be surprised if this takes a while to reconnect. Renewing and releasing of IP addresses can take a bit across the network and the time required must be recognized.
Depending on which OS you’re on. Using the IP Address from your tfvars file, set it as an alias with the following command:
$ sudo ifconfig en0 alias
If you need to unset it later on:
sudo ifconfig en0 -alias
I found this to be an adequate option- the alias was always there, (like any other alias, it just forwards everything onto the address that you’re recognized at in the file.) but it may add time to the build, (still gathering data to confirm this.) With this addition, I shouldn’t have to update my configuration file, (for the AWS Trial, that means setting it in our terraform.tfvars in the YOUR_IP parameter.)
The browser commands to gather your IP Address work the same way, but if you want to change it via the command line, the commands are different for Windows PC’s:
netsh interface ipv4 show config
You’ll see your IP Address in the configuration. If you want to change it, then you need to run the following:
netsh interface ipv4 set address name="Wi-Fi" static
netsh interface ipv4 show config
You’ll see that the IP Address for your Wi-Fi has updated to the new address. If you want to set it to DHCP, (dynamic) again, run the following:
netsh interface ipv4 set address name="Wi-Fi" source=dhcp
Now you can go wherever you darn well please, set an alias and run whatever terraform commands you wish. All communication will just complete without any error due to a challenging new IP address.
Ain’t that just jiffy? OK, it may be quicker to just gather the IP address and update the tfvars file, but just in case you wanted to know what could be done and why we may not have built it into the AWS Trial, here it is!
One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”. that makes certain types of merge join much more efficient. Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set) happens to be unique with sequential values though there’s no index or constraint in place:
select t1.v1, t2.v1 from t1, t2 where t2.id between t1.id - 1 and t1.id + 2 ;
This query returns nearly 40,000 rows. Except for the values at the extreme ends of the range each of the 10,000 rows in t2 will join to 4 rows in t1 thanks to the simple sequential nature of the data. In 12.2 the query, with rowsource execution stats enabled, completed in 1.48 seconds. In 18.104.22.168 the query, with rowsource execution stats OFF, took a little over 14 seconds. (With rowsource execution stats enabled it took 22.214.171.124 a little over 1 minute to return the first 5% of the data – I didn’t bother to wait for the rest, though the rate would have improved over time.)
Here are the two execution plans – spot the critical difference:
126.96.36.199 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 25M| 715M| 1058 (96)| 00:00:01 | | 1 | MERGE JOIN | | 25M| 715M| 1058 (96)| 00:00:01 | | 2 | SORT JOIN | | 10000 | 146K| 29 (11)| 00:00:01 | | 3 | TABLE ACCESS FULL | T1 | 10000 | 146K| 27 (4)| 00:00:01 | |* 4 | FILTER | | | | | | |* 5 | SORT JOIN | | 10000 | 146K| 29 (11)| 00:00:01 | | 6 | TABLE ACCESS FULL| T2 | 10000 | 146K| 27 (4)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("T2"."ID"<="T1"."ID"+2) -- > had to add GT here to stop WordPress spoiling the format 5 - access("T2"."ID">="T1"."ID"-1) filter("T2"."ID">="T1"."ID"-1) 188.8.131.52 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 40000 | 1171K| 54 (12)| 00:00:01 | | 1 | MERGE JOIN | | 40000 | 1171K| 54 (12)| 00:00:01 | | 2 | SORT JOIN | | 10000 | 146K| 27 (12)| 00:00:01 | | 3 | TABLE ACCESS FULL| T1 | 10000 | 146K| 25 (4)| 00:00:01 | |* 4 | SORT JOIN | | 10000 | 146K| 27 (12)| 00:00:01 | | 5 | TABLE ACCESS FULL| T2 | 10000 | 146K| 25 (4)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T2"."ID">="T1"."ID"-1) filter("T2"."ID"<="T1"."ID"+2 AND "T2"."ID">="T1"."ID"-1)
Notice how operation 4, the FILTER, that appeared in 12.1 has disappeared in 12.2 and the filter predicate that it used to hold is now part of the filter predicate of the SORT JOIN that has been promoted to operation 4 in the new plan.
As a reminder – the MERGE JOIN operates as follows: for each row returned by the SORT JOIN at operation 2 it calls operation 4. In 12.1 this example will then call operation 5 so the SORT JOIN there happens 10,000 times. It’s important to know, though, that the name of the operation is misleading; what’s really happening is that Oracle is “probing a sorted result set in local memory” 10,000 times – it’s only on the first probe that it finds it has to call operation 6 to read and move the data into local memory in sorted order.
So in 12.1 operation 5 probes (accesses) the in-memory data set starting at the point where t2.id >= t1.id – 1; I believe there’s an optimisation here because Oracle will recall where it started the probe last time and resume searching from that point; having found the first point in the in-memory set where the access predicate it true Oracle will walk through the list passing each row back to the FILTER operation as long as the access predicate is still true, and it will be true right up until the end of the list. As each row arrives at the FILTER operation Oracle checks to see if the filter predicate there is true and passes the row up to the MERGE JOIN operation if it is. We know that on each cycle the FILTER operation will start returning false after receiving 4 rows from SORT JOIN operation – Oracle doesn’t. On average the SORT JOIN operation will send 5,000 rows to the FILTER operation (for a total of 50,000,000 values passed and discarded).
In 12.2, and for the special case here where the join predicate uses constants to define the range, Oracle has re-engineered the code to eliminate the FILTER operation and to test both parts of the between clause in the same subroutine it uses to probe and scan the rowsource. In 12.2 the SORT JOIN operation will pass 4 rows up to the MERGE JOIN operation and stop scanning on the fifth row it reaches. In my examples that’s an enormous (CPU) saving in subroutine calls and redundant tests.
This “band-join” mechanism only applies when the range is defined by constants (whether literal or bind variable). It doesn’t work with predicates like (e.g.):
where t2.id between t1.id - t1.step_back and t1.id + t1.step_forward
The astonishing difference in performance due to enabling rowsource execution statistics is basically due to the number of subroutine calls eliminated – I believe (subject to a hidden parameter that controls a “sampling frequency”) that Oracle will call the O/S clock twice each time it calls the second SORT JOIN operation from the FILTER operation to acquire the next row. In 12.1 we’re doing roughly 50M redundant calls to that SORT JOIN.
The dramatic difference in performance even when rowsource execution statistics isn’t enabled is probably something you won’t see very often in a production system – after all, I engineered a fairly extreme data set and query for the purposes of demonstration. Note, however, the band join does seemt to introduce a change in cost, so it’s possible that on the upgrade you may find a few cases where the optimizer will switch from a nested loop join to a merge join using a band-join.
Little did I know this building that captured my visual attention and imagination so many times walking to work over the last 6 months would play a historic roll in the current political climate.
Here is a picture of the US District Court House from recent articles
And here are some of my iPhone shots over the last few months with some Instagram filtering mixed in