The question of how to identify indexes that could be dropped re-appeared (yet again) on the OTN database forum last week. It’s not really surprising that it recurs so regularly – the problem isn’t an easy one to solve but new (and even less new) users keep hoping that there’s a quick and easy solution.
There are, however, strategies and pointers that can help you to optimise the trade-off between effort, risk, and reward. Broadly the idea is to spend a small amount of effort finding a relatively small number of “expensive” indexes that might be safe to drop, so that when you do the detailed analysis you have a good chance that the time spent will be rewarded by a positive result.
Before we get to some results posted on OTN, it’s worth thinking about the global impact and what we’re trying to achieve, and the threats that go with our attempt to achieve it.
Here’s an oddity prompted by a question that appeared on Oracle-L last night. The question was basically – “Why can’t I build an index in parallel when it’s single column with most of the rows set to null and only a couple of values for the non-null entries”.
That’s an interesting question, since the description of the index shouldn’t produce any reason for anything to go wrong, so I spent a few minutes on trying to emulate the problem. I created a table with 10M rows and a column that was 3% ‘Y’ and 0.1% ‘N’, then created and dropped an index in parallel in parallel a few times. The report I used to prove that the index build had run parallel build showed an interesting waste of resources. Here’s the code to build the table and index:
Here’s one of those little details that I might have known once, or maybe it wasn’t true in earlier versions of oracle, or maybe I just never noticed it and it’s “always” been true; and it’s a detail I’ll probably have forgotten again a couple of years from now. Consider the following two ways of creating a table with primary key:
A question came up on the OTN database forum recently asking if you could have a partitioned index on a non-partitioned table.
(Aside: I’m not sure whether it would be quicker to read the manuals or try the experiment – either would probably be quicker than posing the question to the forum. As so often happens in these RTFM questions the OP didn’t bother to acknowledge any of the responses)
I’ve just been motivated to resurrect a couple of articles I wrote for DBAZine about 12 years ago on the topic of bitmap indexes. All three links point to Word 97 documents which I posted on my old website in September 2003. Despite their age they’re still surprisingly good.
There’s a live example on OTN at the moment of an interesting class of problem that can require some imaginative thinking. It revolves around a design that uses a row in one table to hold the low and high values for a range of values in another table. The problem is then simply to count the number of rows in the second table that fall into the range given by the first table. There’s an obvious query you can write (a join with inequality) but if you have to join each row in the first table to several million rows in the second table, then aggregate to count them, that’s an expensive strategy. Here’s the query (with numbers of rows involved) that showed up on OTN; it’s an insert statement, and the problem is that it takes 7 hours to insert 37,600 rows:
In an earlier (not very serious) post about count(*) I pointed out how the optimizer sometimes does a redundant “bitmap conversion to rowid” when counting. In the basic count(*) example I showed this wasn’t a realistic issue unless you had set cursor_sharing to “force” (or the now-deprecated “similar”). There are, however, some cases where the optimizer can do this in more realistic circumstances and this posting models a scenario I came across a few years ago. The exact execution path has changed over time (i.e. version) but the anomaly persists, even in 220.127.116.11.
First we create a “fact” table and a dimension table, with a bitmap index on the fact table and a corresponding primary key on the dimension table:
Because you can never have enough of a good thing.
Here’s a thought – The optimizer doesn’t treat all constants equally. No explanations, just read the code – execution plans at the end:
I have a table with several indexes on it, and I have two versions of a query that I might run against that table. Examine them carefully, then come up with some plausible reason why it’s possible (with no intervening DDL, DML, stats collection, parameter fiddling etc., etc., etc.) for the second form of the query to be inherently more efficient than the first.
Here’s a script to create a table, with index, and collect stats on it. Once I’ve collected stats I’ve checked the execution plan to discover that a hint has been ignored (for a well-known reason):