Here’s a simple update statement that identifies a few rows in a table then updates a column where a matching value can be derived from another table – it’s an example of an update by correlated subquery:
update t1 set small_vc = ( select max(small_vc) from t2 where t2.id = t1.id ) where mod(id,100) = 0 and exists ( select null from t2 where t2.id = t1.id ) ;
There are a number of different strategies of varying efficiency we could use to modify the data in the same way. We could use the merge command to compare the results of the aggregate subquery on t2 with rows in t1, taking advantage of the 10g enhancement of the merge command to apply results only on matched rows. We could do something with pl/sql bulk selects from the aggregate then apply a forall bulk update taking advantage of array exceptions. In fact, depending on the volume and pattern of data in the two tables, either of the three mechanisms might turn out to be the most efficient in some circumstances.
There is a fourth mechanism which we can’t (or shouldn’t) employ – even though there may be cases where it is more efficient than the three I’ve mentioned so far. It’s the mechanism of the updateable join view, where we join t1 to an aggregate query on t2 on the column that was the correlation column, then update across the join:
update ( select t1.small_vc t1_vc, v1.max_vc v1_vc from t1, ( select t2.id, max(small_vc) max_vc from t2 group by id ) v1 where mod(t1.id,100) = 0 and v1.id = t1.id ) set t1_vc = v1_vc ;
There is a problem with this statement, though – it fails with Oracle error “ORA-01779: cannot modify a column which maps to a non key-preserved table” unless you add the hint /*+ bypass_ujvc */ (bypass update join view check ?) and even that hint fails to work in my copy of 11.2.0.2.
A critical feature of Oracle’s implementation of updatable join views is that only one table will be updated by the statement, and that table must be “key-preserved” – meaning that any row from the table to be updated can (logically) appear at most once in the view. A simple restriction to ensure that this requirement is met is to insist that the join(s) to the other tables(s) should always be with equality on the primary (or unique) key of the table(s).
In this case, though, because we aggregate on t2.id, and then join on t2.id, it is clearly the case that any row that appears in t1 can only appear once in the join between t1 and the aggregate of t2. In this update, t1 is clearly key-preserved – but, absent the hint, the optimizer will not allow the mechanism to come into play. (SQL Server will handle this particular example correctly, by the way.)
So, yet another item on my wishlist for the optimizer – let’s see a few more cases of updateable join views being recognised.
In passing, the path that the optimizer produces for join view update, when hinted, isn’t ideal because it fails to spot an opportunity for transitive closure that would make it more efficient. Here’s the execution plan (with Predicate Section):
---------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | ---------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | | | | 100 | | 1 | UPDATE | T1 | | | | | |* 2 | HASH JOIN | | 100 | 3500 | | 100 | |* 3 | TABLE ACCESS FULL | T1 | 100 | 1500 | | 27 | | 4 | VIEW | | 10000 | 195K| | 72 | | 5 | SORT GROUP BY | | 10000 | 146K| 488K| 72 | | 6 | TABLE ACCESS FULL| T2 | 10000 | 146K| | 27 | ---------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("V1"."ID"="T1"."ID") 3 - filter(MOD("T1"."ID",100)=0)
As you can see, Oracle aggregates the whole of t2, then uses the result to probe a hash table built from a subset of t1. But the basic join is on t2.id = t1.id, which means that the optimizer could, in principle, generate the predicate mod(v1.id,100) = 0 and then push it into the inline view before aggregating a subset of t2. Here’s the plan if you add the predicate manually:
------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 100 | 3500 | 51 | | 1 | UPDATE | T1 | | | | |* 2 | HASH JOIN | | 100 | 3500 | 51 | | 3 | VIEW | | 100 | 2000 | 23 | | 4 | SORT GROUP BY | | 100 | 1500 | 23 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 1500 | 23 | |* 6 | INDEX FULL SCAN | T2_I1 | 100 | | 21 | |* 7 | TABLE ACCESS FULL | T1 | 100 | 1500 | 27 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("V1"."ID"="T1"."ID") 6 - filter(MOD("T2"."ID",100)=0) 7 - filter(MOD("T1"."ID",100)=0)
It’s an interesting little side effect of the change (for my particular data set) that the probe table is now the t1 table (whereas the probe was a sorted aggregate of t2.id in the earlier version). This means that Oracle will be updating t1 as it scans it; in the previous plan Oracle would be jumping around t1 at random to update it in an order dictated by the arrival order of the aggregated rows coming from t2. In principle, this type of change in the order of updates could result in a reduction in the number of random physical I/Os that take place on t1.
If you want to experiment with variations on this problem then you can start with code like the following to create the two tables – bear in mind that there’s no great point in examining the variation in performance that different mechanisms give when playing with such a small (and regular) data set:
create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level ) select rownum id, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum ; create table t2 as select * from t1; -- collect stats create unique index t1_i1 on t1(id); create unique index t2_i1 on t2(id); alter table t1 modify id not null; alter table t2 modify id not null;
Recent comments
1 year 46 weeks ago
2 years 6 weeks ago
2 years 10 weeks ago
2 years 11 weeks ago
2 years 15 weeks ago
2 years 37 weeks ago
3 years 5 weeks ago
3 years 34 weeks ago
4 years 19 weeks ago
4 years 19 weeks ago