While browsing the web recently for articles on the HyperLogLog algorithm that Oracle uses for some of its approximate functions, I came upon a blog post written in Jan 2014 with the title Use Subqueries to Count Distinct 50X Faster. There are various ways that subqueries can be used to rewrite queries for improved performance, but when the title caught my eye I couldn’t think of a way in which they could improve “count distinct”. It turned out that the word “subquery” was being used (quite correctly) in the sense of “inline view” while my mind had immediately turned to subqueries in the select list or where clause.
I thought I had written this note a few years ago, on OTN or Oracle-L if not on my blog, but I can’t find any sign of it so I’ve decided it’s time to write it (again) – starting as a question about the following code:
An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of following this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table; but the view was a complex view (apparently non-mergeable) and the update took several hours to complete even though the view, when instantiated, held only 63 rows.
The OP told us that the query “select * from view” took seven minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in (approximately) that seven minutes, rather than using the correlated update approach that seemed to take something in the ballpark of 7 minutes per row updated.
I think the “column group” variant of extended stats is a wonderful addition to the Oracle code base, but there’s a very important detail about using the feature that I hadn’t really noticed until a question came up on the OTN database forum recently about a very bad join cardinality estimate.
The point is this: if you have a multi-column equality join and the optimizer needs some help to get a better estimate of join cardinality then column group statistics may help if you create matching stats at both ends of the join. There is a variation on this directive that helps to explain why I hadn’t noticed it before – multi-column indexes (with exactly the correct columns) have the same effect and, most significantly, the combination of one column group and a matching multi-column index will do the trick.
I received an email recently that started with the sort of opening sentence that I see far more often than I want to:
I have come across an interesting scenario that I would like to run by you, for your opinion.
It’s not that I object to being sent interesting scenarios, it’s just that they are rarely interesting – and this wasn’t one of those rare interesting ones. On the plus side it reminded me that I hadn’t vented one of my popular rants for some time.
Here’s the problem – see if you can work out the error before you get to the rant:
“I’ve got a table and a view on that table; and I’ve got a query that is supposed to use the view. Whether I use the table or the view in query the optimizer uses the primary key on the table to access the table – but when I use the table the query takes about 30 ms, when I use the view the query takes about 903 ms”.
In my last note on index usage I introduced the idea of looking at v$segstat (or v$segment_statistics) and comparing the “logical reads” statistic with the “db block changes” statistic as an indicator of whether or not the index was used in execution plans. This week I’ll explain the idea and show you some results – with a little commentary – from a production system that was reported on the OTN database forum.
The question of how to identify indexes that could be dropped re-appeared (yet again) on the OTN database forum last week. It’s not really surprising that it recurs so regularly – the problem isn’t an easy one to solve but new (and even less new) users keep hoping that there’s a quick and easy solution.
There are, however, strategies and pointers that can help you to optimise the trade-off between effort, risk, and reward. Broadly the idea is to spend a small amount of effort finding a relatively small number of “expensive” indexes that might be safe to drop, so that when you do the detailed analysis you have a good chance that the time spent will be rewarded by a positive result.
Before we get to some results posted on OTN, it’s worth thinking about the global impact and what we’re trying to achieve, and the threats that go with our attempt to achieve it.
I need to check if at least one record present in table before processing rest of the statements in my PL/SQL procedure. Is there an efficient way to achieve that considering that the table is having huge number of records like 10K.
I don’t think many readers of the forum would consider 10K to be a huge number of records; nevertheless it is a question that could reasonably be asked, and should prompt a little discssion.
First question to ask, of course is: how often do you do this and how important is it to be as efficient as possible. We don’t want to waste a couple of days of coding and testing to save five seconds every 24 hours. Some context is needed before charging into high-tech geek solution mode.
Several years go (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:
This whole thing about “not exists” subqueries can run and run. In the previous episode I walked through some ideas of how the following query might perform depending on the data, the indexes, and the transformation that the optimizer might apply: