The question of how to identify indexes that could be dropped re-appeared (yet again) on the OTN database forum last week. It’s not really surprising that it recurs so regularly – the problem isn’t an easy one to solve but new (and even less new) users keep hoping that there’s a quick and easy solution.
There are, however, strategies and pointers that can help you to optimise the trade-off between effort, risk, and reward. Broadly the idea is to spend a small amount of effort finding a relatively small number of “expensive” indexes that might be safe to drop, so that when you do the detailed analysis you have a good chance that the time spent will be rewarded by a positive result.
Before we get to some results posted on OTN, it’s worth thinking about the global impact and what we’re trying to achieve, and the threats that go with our attempt to achieve it.
The key detail, of course, is that index maintenance is an expensive process. We could insert 1,000 rows into a table at a cost of writing about 25 table blocks plus a few undo blocks plus something like half a megabyte of redo (assuming, for the purposes of illustration that each row is about 200 bytes on insert). Add one index to the table and we might have to locate and modify 1,000 separate index leaf blocks. The increment on the redo might be about quarter of a megabyte and we may have to access 1,000 different undo blocks for read consistency reasons, but the simple fact that we may need 1,000 buffers to be able to maintain that index is likely to be a significant extra cost on the insert. Make that 10 indexes, or 70 (as one unhappy DBA once told me) and the probability of being able to do high-speed inserts becomes rather low.
Of course we hope that our indexes will allow our queries to operate efficiently with great precision, but inevitably we get to a point where the benefit of precision is outweighed by the cost of maintenance. Our target, then, is to design the set of indexes that makes it possible for the optimizer to find good paths for all the important queries and “good enough” paths for the rest. By the time the system is live, though, it’s too late for “proper design”, and the only option is for damage limitation, a bit of guesswork, and some live testing with fingers crossed (thank goodness for invisible indexes).
The starting point is usually an attempt to identify “the indexes we are not using”, which is typically translated into “the indexes that do not appear in execution plans” – but that’s not actually a good target, for various reasons:
Provided you remember that problems like these exist, and think carefully about the indexes that your strategy suggests, there are various ways you could approach the problem of identifying indexes that don’t get into execution plans.
The ink had barely dried on the manual pages for this view before several people (including me) had written notes explaining why this view wasn’t particularly helpful. (I think I even said something about this in Practical Oracle 8i). I won’t repeat the discussion here but it revolves around the fact that an index is flagged as “used” even if it has only been used once in a single execution of a single statement – so you don’t get any idea of the real importance of the index.
If you review the set of in-memory execution plans (and the AWR or Statspack equivalents) you can identify indexes which definitely have been used – but (a) it’s expensive to scan v$sql_plan frequently and (b) the AWR/Statspack repositories only capture a subset of the more expensive plans, so it’s easy to miss indexes which have been used and are relatively important but aren’t in the repository and don’t happen to be in memory at the moments you look.
If you examine the index definitions you may spot indexes where look very similar. If one index starts with the same columns, in the same order, as another index, there is a good chance that you could reduce two indexes to one – especially if the whole of one of the indexes is the “leading edge” of the other – for example:
Even if the leading edges match and the trailing edges differ we might be able to collapse two indexes into one – depending on how selective the leading columns are and how the indexes are used – for example:
which could perhaps be replaced by one of :
Guessing about the use of a typical date column, though, it’s possible that in this example the current trailing date columns are used with a range-based predicate, so it’s possible that this strategy won’t be effective for this pair of indexes.
Even if the order of later columns in the index doesn’t match you may still find that a pair of indexes could be reduced to a single index – for example the pair:
which could perhaps be replaced by just:
As a safety measure, of course, you would probably create a new index, then make the subject indexes invisible, and wait for at least a week to see whether any performance problems appear (remembering that one automatic performance threat would be the increase in workload as yet another index – temporarily – has to be maintained).
The difficulty of eliminating indexes by examination is that it takes a lot of effort to investigate all the possibilities, so you really need some way of choosing a relatively small subset of indexes that might be worth the effort. This brings me to the principle topic of this posting – using segment statistics to help you pick which indexes might be worth the effort.
Oracle records a number of workload statistics for each object in memory. The view v$segstat is an efficient version of these statistics, and v$segment_statistics is a friendlier version that joins v$segstat to tables user$, obj$ and ts$, with a filter against ind$ to turn meaningless numbers into names.
SQL> desc V$segstat Name Null? Type ----------------------- -------- ---------------- TS# NUMBER OBJ# NUMBER DATAOBJ# NUMBER STATISTIC_NAME VARCHAR2(64) STATISTIC# NUMBER VALUE NUMBER SQL> desc V$segment_statistics Name Null? Type ----------------------- -------- ---------------- OWNER VARCHAR2(30) OBJECT_NAME VARCHAR2(30) SUBOBJECT_NAME VARCHAR2(30) TABLESPACE_NAME VARCHAR2(30) TS# NUMBER OBJ# NUMBER DATAOBJ# NUMBER OBJECT_TYPE VARCHAR2(18) STATISTIC_NAME VARCHAR2(64) STATISTIC# NUMBER VALUE NUMBER
For each segment Oracle records the following statistics (according to v$segstat_name – but there are a couple more hidden statistics reported in the underlying x$ksolsstat object):
NAME SAMPLED -------------------------------- ------- logical reads YES buffer busy waits NO gc buffer busy NO db block changes YES physical reads NO physical writes NO physical read requests NO physical write requests NO physical reads direct NO physical writes direct NO optimized physical reads NO optimized physical writes NO gc cr blocks received NO gc current blocks received NO ITL waits NO row lock waits NO space used NO space allocated NO segment scans NO
Both Statspack (at level 7) and the AWR report have several “Top N” sections for segment statistics. If we examine these stats for all the indexes on a given table we can get some clues about which indexes are likely to be worth further investigation to see if they could be dropped.
One very simple measure is the number of “physical reads” (which, for indexes, will generally be very similar to “physical read requests”). Since a (real) physical read is generally going to take a significant amount of time, segments with very large numbers of physical reads could be contributing a lot of of time to the total database time – so it’s worth knowing why it’s responsible for so many physical reads and worth cross-checking with v$sql_plan (and its historic equivalents) which statements seem to be using or modifying this index.
Even if it turns out that the index is absolutely necessary, you might still be able to spot opportunities to improve efficiency. If it is subject to a significant number of physical reads it may be that the index is just very large – could you make it smaller by rebuilding it with compression on some of the leading columns, is it an index which (for some reason you can identify) tends to degenerate over time and waste a lot of space and should you rebuild it occasionally. It might be possible (depending on the predicates used) to re-arrange the column order in such a way that the activity is focused onto a particular section of the index rather than being spread across the entire index – or you could even find that by careful choice of global partitioning (which is legal on even a non-partitioned table) you might be able to isolate the activity to a small section of the index.
A more interesting measure, though, comes from comparing the “logical reads” with the number of “db block changes”; and that’s the point of this posting – except that I’ve spent so much time on it already that I’m going to have to write part 2 some time next week.