One of the most irritating features of solving problems for clients is that the models I build to confirm my diagnosis and test my solutions often highlight further anomalies, or make me ask questions that might produce some useful answers to future problems.
Recently I had cause to ask myself if Oracle would push a filter subquery into the second tablescan of a hash join – changing a plan from this:
filter hash join table access full t1 table access full t2 table access by rowid t3 index range scan t3_i1
hash join table access full t1 filter table access full t2 table access by rowid t3 index range scan t3_i1
or, perhaps more likely, to this:
Here’s a simple data set – I’m only interested in three of the columns in the work that follows, but it’s a data set that I use for a number of different models:
I need to check if at least one record present in table before processing rest of the statements in my PL/SQL procedure. Is there an efficient way to achieve that considering that the table is having huge number of records like 10K.
I don’t think many readers of the forum would consider 10K to be a huge number of records; nevertheless it is a question that could reasonably be asked, and should prompt a little discssion.
First question to ask, of course is: how often do you do this and how important is it to be as efficient as possible. We don’t want to waste a couple of days of coding and testing to save five seconds every 24 hours. Some context is needed before charging into high-tech geek solution mode.
Here’s a little surprise that came up on the OTN database forum a few days ago. Rather than describe it, I’m just going to create a data set to demonstrate it, initially using 126.96.36.199 although the same thing happens on 188.8.131.52. The target is a query that joins to a range/hash composite partitioned table and uses a Bloom filter to do partition pruning at the subpartition level. (Note to self: is it possible to see Bloom filters that operate at both the partition and subpartition level from a single join? I suspect not.). Here’s my code to create and populate both the partitioned table and a driving table for the query:
A recent OTN post demonstrated a very important point about looking at execution plans – especially when you don’t use the right data types. The question was:
We’ve this query which throws invalid number
SELECT * FROM table A WHERE A.corporate_id IN (59375,54387) AND TRUNC(created_dt) BETWEEN '19-DEC-14' AND '25-DEC-14';
However it works fine if we use not in instead of in
SELECT * FROM table A WHERE A.corporate_id NOT IN (59375,54387) AND TRUNC(created_dt) BETWEEN '19-DEC-14' AND '25-DEC-14';
From time to time someone publishes a query on the OTN database forum and asks how to make it go faster, and you look at it and think it’s a nice example to explain a couple of principles because it’s short, easy to understand, obvious what sort of things might be wrong, and easy to fix. Then, after you’ve made a couple of suggestions and explained a couple of ideas the provider simply fades into the distance and doesn’t tell you any more about the query, or whether they’ve taken advantage of your advice, or found some other way to address the problem.
Such a query, with its execution plan, appeared a couple of weeks ago:
This is another little reference list I should have created some time ago. It covers a series of posts on interpreting parallel execution plans and understanding where the work happens.
There’s a live example on OTN at the moment of an interesting class of problem that can require some imaginative thinking. It revolves around a design that uses a row in one table to hold the low and high values for a range of values in another table. The problem is then simply to count the number of rows in the second table that fall into the range given by the first table. There’s an obvious query you can write (a join with inequality) but if you have to join each row in the first table to several million rows in the second table, then aggregate to count them, that’s an expensive strategy. Here’s the query (with numbers of rows involved) that showed up on OTN; it’s an insert statement, and the problem is that it takes 7 hours to insert 37,600 rows:
It’s easy to make mistakes, or overlook defects, when constructing parallel queries – especially if you’re a developer who hasn’t been given the right tools to make it easy to test your code. Here’s a little trap I came across recently that’s probably documented somewhere, which could be spotted easily if you had access to the OEM SQL Monitoring screen, but would be very easy to miss if you didn’t check the execution plan very carefully. I’ll start with a little script to generate some data: