Both an ORACLE-L discussion here and a Stack Exchange post raise essentially the same question, which is, paraphrased: what is the performance overhead of datafile autoextend events? The conventional wisdom, with which I agree, is that the overhead is insignificant. However, the poster in the Stack Exchange dialog asked how one would determine this empirically. [...]
So where were we?
We wanted to implement the following business rule: "Cannot have a manager without a clerk in the same department." At the end of this post, we tried implementing that using a straightforward row-trigger, which introduced us to the mutating table error. We then explained (here and here) that this error is there for a very good reason.
We've also clarified that the two common 'workarounds' 1a and 1b really should not be used.
So using a row-trigger to check for a clerk whenever a manager is inserted will not work. Remember, the reason we wanted to use a row-trigger is that inside it we have :new.deptno available, which identifies the department into which a manager just got inserted. That enabled us to efficiently check only that department for the necessary clerk.
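As a reminder, the kind of row-trigger meant here would look roughly like the sketch below. It is illustrative only, assuming the usual EMP table with JOB and DEPTNO columns; the trigger and error-message names are made up. On a multi-row insert, the query against EMP inside the row-trigger raises ORA-04091, the mutating table error.

```sql
-- Sketch only: this is the row-trigger approach that runs into ORA-04091.
create or replace trigger emp_mgr_needs_clerk_row
after insert on emp
for each row
when (new.job = 'MANAGER')
declare
  l_clerk_count number;
begin
  -- We can use :new.deptno to restrict the check to one department,
  -- but EMP is mutating here, so on a multi-row insert this query
  -- raises ORA-04091.
  select count(*)
  into   l_clerk_count
  from   emp
  where  deptno = :new.deptno
  and    job    = 'CLERK';

  if l_clerk_count = 0 then
    raise_application_error(-20000,
      'Cannot have a manager without a clerk in the same department.');
  end if;
end;
/
```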
So what about a statement trigger then? Obviously it forces us to validate all departments, since within the context of the after insert statement trigger, we do not know anything about the actual insert(s), other than that some insert took place.
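A straightforward after-insert statement trigger along those lines might look like the following sketch (again assuming the usual EMP table; the names are illustrative). Because a statement trigger fires after the statement completes, EMP is no longer mutating and may be queried, but we have no :new.deptno, so every department must be checked.

```sql
-- Sketch only: validates the rule across ALL departments on every insert.
create or replace trigger emp_mgr_needs_clerk_stmt
after insert on emp
declare
  l_violations number;
begin
  -- We know nothing about the rows just inserted, so we must check
  -- every department that holds a manager.
  select count(*)
  into   l_violations
  from   emp m
  where  m.job = 'MANAGER'
  and    not exists ( select null
                      from   emp c
                      where  c.deptno = m.deptno
                      and    c.job    = 'CLERK' );

  if l_violations > 0 then
    raise_application_error(-20000,
      'Cannot have a manager without a clerk in the same department.');
  end if;
end;
/
```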
Above is an attempt to implement our rule using the after insert statement trigger. As you can see we validate all departments. This is not a very efficient solution to implement our business rule. Not only do we validate all departments, we also validate them on all types of inserts. Obviously when we insert, for instance, a trainer, there is no need at all to validate this rule (assuming the rule was fine, before this insert), since inserting a trainer cannot violate this rule: such an insert doesn't 'touch' the rule.
In multiple future posts we will explain in detail how to a) work around the mutating table issue and b) get an efficient (and correct) implementation for our rule using a statement trigger. But before we do that, we'd like to spend some time on common use cases of triggers, and on why they gave rise to the general consensus out there that triggers are harmful.
January 4, 2012 (Back to the Previous Post in the Series) I thought that I would begin this blog article with a bit of a test case that demonstrates a very simple deadlock between two sessions. In Session 1 (create the test table with a primary key and insert a single row): CREATE TABLE T1 ( [...]
I received an email recently that had a nice example of what can potentially go wrong with an index. Let’s first create a simple table with a unique index and populate it with 200,000 rows (following demo run on 184.108.40.206): So far, everything is as expected. We have an index with 200,000 rows that currently has [...]
A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.
We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 220.127.116.11).
create table t1 (
        skew, skew2, padding
)
as
select
        r1, r2, rpad('x',400)
from
        ( select /*+ no_merge */ rownum r1 from all_objects where rownum <= 80 ) v1,
        ( select /*+ no_merge */ rownum r2 from all_objects where rownum <= 80 ) v2
where
        r2 <= r1
order by
        r2, r1
;

alter table t1 modify skew not null;
alter table t1 modify skew2 not null;

create index t1_skew on t1(skew);

analyze table t1 compute statistics for all indexed columns size 75;
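Not part of the original script, but if you want to look at the histogram that the analyze call produced, a query against the standard user_tab_histograms dictionary view along these lines would show the bucket endpoints:

```sql
select  endpoint_number, endpoint_value
from    user_tab_histograms
where   table_name  = 'T1'
and     column_name = 'SKEW'
order by
        endpoint_number;
```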
The way I’ve created the data set, the column skew has one row with the value 1, two rows with the value 2, and so on up to 80 rows with the value 80. I’ve put in a bid to collect a histogram of 75 buckets – which is the default, by the way, for the analyze command – on any indexed columns. (Interestingly, the resulting histogram on column skew ended up holding 74 buckets.)
To demonstrate the calculation of selectivity, I then enabled the 10053 trace and ran a query to select one of the “non-popular” values (i.e. a value with a fairly small number of duplicates). The section of the trace file I want to talk about appears as the “Single Table Access Path”.
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Column (#1):
    NewDensity:0.008940, OldDensity:0.006757 BktCnt:74, PopBktCnt:31, PopValCnt:15, NDV:80
  Column (#1): SKEW(
    AvgLen: 2 NDV: 80 Nulls: 0 Density: 0.008940 Min: 1 Max: 80
    Histogram: HtBal  #Bkts:  74  UncompBkts:  74  EndPtVals:  59
  Table: T1  Alias: T1
    Card: Original: 3240.000000  Rounded: 29  Computed: 28.96  Non Adjusted: 28.96
  Access Path: TableScan
    Cost:  32.00  Resp: 32.00  Degree: 0
      Cost_io: 32.00  Cost_cpu: 0
      Resp_io: 32.00  Resp_cpu: 0
  Access Path: index (AllEqRange)
    Index: T1_SKEW
    resc_io: 30.00  resc_cpu: 0
    ix_sel: 0.008940  ix_sel_with_filters: 0.008940
    Cost: 30.00  Resp: 30.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: T1_SKEW
         Cost: 30.00  Degree: 1  Resp: 30.00  Card: 28.96  Bytes: 0
A few points to notice on line 4:
The NDV (number of distinct values) is 80; this means that the Density, in the absence of a histogram, would be 1/80 = 0.0125
We have a NewDensity and an OldDensity. The OldDensity is the value that would have been reported simply as Density in 10g or 9i, and is derived using the mechanism I described in “Cost Based Oracle – Fundamentals” a few years ago. Informally this was:
sum of the square of the frequency of the non-popular values /
(number of non-null rows * number of non-popular non-null rows)
This formula for OldDensity is (I assume) a fairly subtle formula based on the expected number of rows for a randomly selected non-popular value in the presence of popular values. The NewDensity, however, seems to take the much simpler approach of “factoring out” the popular values. There are two ways you can approach the arithmetic – one is to think of the number of rows you expect for the query “column = ‘non-popular value’”, the other is to think of the number of non-popular values and then adjust for the relative volume of non-popular rows in the table.
Line 9 tells us there are 3240 rows in the table.
Line 4 tells us there are 80 (NDV) distinct values of which 15 (PopValCnt) are popular, and 74 (BktCnt) buckets of which 31 (PopBktCnt) contain popular values.
From this we determine that there are (80 – 15 = ) 65 non-popular values and (3240 * (74-31)/74 = ) 1883 non-popular rows.
Hence we infer that a typical non-popular value will report (1883 / 65 = ) 29 rows – which is the rounded cardinality we see in line 9.
If we consider only non-popular values, then the selectivity is 1/(number of non-popular values) = 1/65.
But this selectivity applies only to the (74 – 31 = ) 43 non-popular buckets out of the total 74.
To generate a selectivity that can be applied to the original cardinality of the table we have to scale it accordingly.
The selectivity, labelled the NewDensity, is (1/65) * (43/74) = 0.00894 – which is the value we see in line 4.
(Following this, of course, the cardinality for ‘column = constant’ would be 3,240 * 0.00894 = 29.)
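Pulling those steps together – a sketch of the arithmetic only, using the figures from the trace (65 non-popular values, 43 non-popular buckets out of 74, 3,240 rows) – the whole calculation can be checked with a one-line query:

```sql
select  (1/65) * (43/74)        as new_density,   -- approximately 0.008940
        3240 * (1/65) * (43/74) as cardinality    -- approximately 28.97, rounded to 29
from    dual;
```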
I have been a little cavalier with rounding throughout the example, just to keep the numbers looking a little tidier.
If the column is allowed to be null then our calculation of cardinality would use (3,240 – number of nulls) instead of 3,240. The method for calculating the selectivity would not change, but the resulting figure would be applied to (3,240 – number of nulls).
Happy New Year !! As has been mentioned to me on quite a number of occasions, I haven’t really covered the area of Index Organized Tables (IOTs) on this blog. This despite covering the subject of IOTs in quite some detail in my Index Internals Seminar. Which I readily admit is a bit of a shame as [...]
January 1, 2012 (Back to the Previous Post in the Series) Nearly two years ago I posted the following execution plan in a blog article and asked whether or not there was anything strange about the %CPU column: -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ [...]