A recent posting on the OTN database forum described a problem with an insert (as select) statement that sometimes ran extremely slowly: nothing interesting yet, there could be plenty of boring reasons for that to happen. The same SQL statement (by SQL_ID) might take 6 hours to insert 300K rows one night while taking just a few minutes to insert 900K another night (still nothing terribly interesting).
After the baseline has been established in the first part of this series it’s time to measure the effect of the network in this part. The second test will introduce an interesting feature: Using Linux’s own Traffic Shaper/Quality of Services module I will add a delay of 100ms to the Data Guard network interface card (NIC) to slow things down a little.
WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.
I am continuing to use the 2 node RAC 22.214.171.124.170117 primary database on Oracle Linux 7 with UEK 4 and an identical RAC to host my standby database.
I may have said it before but I consider presenting and teaching a great way to expand one’s knowledge: first of all it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.
Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, depending most likely on the network latency between systems but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better than trying! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.
Since their initial setup I upgraded the cluster to 126.96.36.199.170117 for Clusterware and RDBMS.
Recently I’ve seen not so smart optimizer behavior: one query took long time to parse, and ended with an error hitting PGA_AGGREGATE_LIMIT in few minutes; another query was just parsed for ages while using reasonable (under 2G :)) amount of PGA and still could hit PGA_AGGREGATE_LIMIT but after way more time – up to an hour.
Both cases were similar and involved queries which were accessing views; and those views’ code is generated by an application using lots of IN LISTs and other OR conditions. They both are really ugly SQLs with text length ~100K. When Oracle tried to parse them it took a lot of time and parse attempt had either failed with ORA-4036 soon or hanged for a long time and then failed. Strangely incident trace file generated for ORA-4036 doesn’t include PGA heaps breakdown and you have to manually enable PGA heapdump on error to get an idea what is taking up memory. Here’s what I’ve found in there:
To me, ‘cloud computing’ is renting a compute resource to perform a task. In order to use that compute resource, you need to instruct it to do something, which is typically done via the network. If the task the compute resource needs to fulfil is being an application server or being a client or both in the case of an application server that uses an Oracle database, the network latency between the client of the database and the database server is a critical property.
One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”. that makes certain types of merge join much more efficient. Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set) happens to be unique with sequential values though there’s no index or constraint in place:
There’s a thread running on OTN at present about deleting huge volumes of duplicated data from a table (to reduce it from 1.1 billion to about 22 million rows). The thread isn’t what I’m going to talk about, though, other than quoting some numbers from it to explain what this post is about.
In the last part of this installment I'll have a brief look at the network performance measured in the Oracle DBaaS environment, in particular the network interface that gets used as private interconnect in case of RAC configuration. The network performance could also be relevant when evaluating how to transfer data to the cloud database.
I've used the freely available "iperf" tool to measure the network bandwidth and got the following results:
[root@test12102rac2 ~]# iperf3 -c 10.196.49.126
Connecting to host 10.196.49.126, port 5201
[ 4] local 10.196.49.130 port 41647 connected to 10.196.49.126 port 5201
Recently I repeated the I/O related tests on a 188.8.131.52 instance for curiosity and was surprised by the fact that I consistently got significantly better results as on 184.108.40.206 and 220.127.116.11.
Now you're probably aware that the version 12.2 so far is "cloud-only", so I can't tell / test whether the version 12.2 is generically providing that increased performance or whether Oracle has optimized the underlying stack, so that previous versions in general could also benefit from better performance if they ran on the same platform. Repeated tests with versions 18.104.22.168 and 22.214.171.124 confirmed the performance figures reported in the previous installment of this series, so as of the time of writing it's only the version 12.2 that provides the improved I/O performance.
Note that as of the time of writing only a single instance configuration was supported with version 12.2, so I wasn't able to run the tests in RAC configuration.
In this second part of this installment I'll focus on the performance figures related to I/O encountered when the corresponding tests were performed on the platform.
When running with minimum sized buffer cache, direct and asynchronous I/O enabled, the following average read-only IOPS figures were measured over a period of several days (this is the test described in part three of the "performance consistency" series) .
First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) with either four or eight sessions: