From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly, with requests about how to work around the (perceived) threat. The quick answer is – if you need to save space then stop holding on to the history for so long, and then clean up the mess left by the history that you have captured; on top of that you could stop gathering so many histograms because you probably don’t need them, they often introduce instability to your execution plans, and they are often the largest single component of the history (unless you are using incremental stats on partitioned objects***)
Patrick Jolliffe alerted the Oracle-L list to a problem that appears when you combine fixed length character columns (i.e. char() or nchar()) with column group statistics. The underlying cause of the problem is the “blank padding” semantics that Oracle uses by default to compare varchar2 with char, so I’ll start with a little demo of that. First some sample data:
I was setting up a few tests on a copy of 220.127.116.11 recently when I made a mistake creating the table – I forgot to put in a couple of CAST() calls in the select list, so I just patched things up with a couple of “modify column” commands. Since I was planning to smash the table in all sorts of ways and it had taken me several minutes to create the data set (10 million rows) I decided to create a clean copy of the data so that I could just drop the original table and copy back the clean version – and after I’d done this I noticed something a little odd.
Here’s the code (cut down to just 10,000 rows), with a little output:
I think the “column group” variant of extended stats is a wonderful addition to the Oracle code base, but there’s a very important detail about using the feature that I hadn’t really noticed until a question came up on the OTN database forum recently about a very bad join cardinality estimate.
The point is this: if you have a multi-column equality join and the optimizer needs some help to get a better estimate of join cardinality then column group statistics may help if you create matching stats at both ends of the join. There is a variation on this directive that helps to explain why I hadn’t noticed it before – multi-column indexes (with exactly the correct columns) have the same effect and, most significantly, the combination of one column group and a matching multi-column index will do the trick.
I think column groups can be amazingly useful in helping the optimizer to generate good execution plans because of the way they supply better details about cardinality; unfortunately we’ve already seen a few cases (don’t forget to check the updates and comments) where the feature is disabled, and another example of this appeared on OTN very recently.
Modifying the example from OTN to make a more convincing demonstration of the issue, here’s some SQL to prepare a demonstration:
I had a recent conversation at Oracle OpenWorld 2015 about a locking anomaly in a 3-node RAC system which was causing unexpected deadlocks. Coincidentally, this conversation came about shortly after I had been listening to Martin Widlake talking about using the procedure dbms_stats.set_table_prefs() to adjust the way that Oracle calculates the clustering_factor for indexes. The juxtaposition of these two topics made me realise that the advice I had given in “Cost Based Oracle – Fundamentals” 10 years ago was (probably) incomplete, and needed some verification. The sticking point was RAC.
A surprising question came up on OTN a couple of days ago:
Why does a query for “column = 999999999999999999” run slower than a query for “column > 999999999999999998” (that’s 18 digit numbers, if you don’t want to count them). With the equality predicate the query is very slow, with the range-based predicate perfomance is good.
Here’s a live one from OTN – here are a couple of extracts from the problem statement:
We’re experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night.
It’s resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between).
It started occurring about 2 months ago and now it’s becoming more prevalent where the bad query plan is coming up more often.
I noticed that the query plans vary for a simple query.
We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>sys_context( ‘userenv’, ‘current_schema’ ), estimate_percent => 1);)
The query and two execution plans look like this:
I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.
A histogram, created in the right way, at the right time, and supported by the correct client-side code, can be a huge benefit to the optimizer; but if you don’t create and use them wisely they can easily become a source of inconsistent performance, and the automatic statistics gathering can introduce an undesirable overhead during the overnight batch. This note explains how you can create histograms very cheaply on the few columns where they are most likely to have a beneficial effect.
I’ve written about optimizer defects with descending indexes before now but a problem came up on the OTN database forum a few days ago that made me decide to look very closely at an example where the arithmetic was clearly defective. The problem revolves around a table with two indexes, one on a date column (TH_UPDATE_TIMESTAMP) and the other a compound index which starts with the same column in descending order (TH_UPDATE_TIMESTAMP DESC, TH_TXN_CODE). The optimizer was picking the “descending” index in cases where it was clearly the wrong index (even after the statistics had been refreshed and all the usual errors relating to date-based indexes had been discounted). Here’s an execution plan from the system which shows that there’s something wrong with the optimizer: