I wish

Here’s a simple update statement that identifies a few rows in a table and then updates a column with a matching value derived from another table – it’s an example of an update by correlated subquery:

update	t1
set	small_vc = (
		select
			max(small_vc)
		from
			t2
		where	t2.id = t1.id
	)
where
	mod(id,100) = 0
and	exists (
		select null
		from t2
		where 	t2.id = t1.id
	)
;

There are a number of different strategies, of varying efficiency, that we could use to modify the data in the same way. We could use the merge command to compare the results of the aggregate subquery on t2 with the rows in t1, taking advantage of the 10g enhancement that allows a merge to apply its changes only to matched rows. We could use pl/sql to bulk select from the aggregate and then apply a forall bulk update, taking advantage of array exceptions. In fact, depending on the volume and pattern of data in the two tables, any of the three mechanisms might turn out to be the most efficient in some circumstances.
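For illustration, the merge version might look something like this (a sketch only – from 10g onwards you no longer need a “when not matched” clause, so the statement can be written as a pure update of matched rows):

merge	into t1
using	(
	select	id, max(small_vc) max_vc
	from	t2
	group by
		id
	)	v1
on	(t1.id = v1.id)
when matched then
	update	set t1.small_vc = v1.max_vc
	where	mod(t1.id,100) = 0
;

And the pl/sql version might look something like this (again just a sketch – the collection names and the minimal error reporting are mine):

declare
	type id_list is table of t1.id%type;
	type vc_list is table of t1.small_vc%type;

	m_id	id_list;
	m_vc	vc_list;

	dml_errors	exception;
	pragma exception_init(dml_errors, -24381);
begin
	-- pick up the aggregated values for the ids we intend to update
	select	t2.id, max(t2.small_vc)
	bulk collect into
		m_id, m_vc
	from	t2
	where	mod(t2.id,100) = 0
	and	exists (
			select	null
			from	t1
			where	t1.id = t2.id
		)
	group by
		t2.id
	;

	-- array update, carrying on past individual failures
	forall i in 1..m_id.count save exceptions
		update	t1
		set	small_vc = m_vc(i)
		where	id = m_id(i);

exception
	when dml_errors then
		dbms_output.put_line(
			sql%bulk_exceptions.count || ' update(s) failed'
		);
end;
/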

There is a fourth mechanism which we can’t (or shouldn’t) employ – even though there may be cases where it is more efficient than the three I’ve mentioned so far. It’s the mechanism of the updateable join view, where we join t1 to an aggregate query on t2 on the column that was the correlation column, then update across the join:

update
	(
	select
		t1.small_vc	t1_vc,
		v1.max_vc	v1_vc
	from
		t1,
		(
		select
			t2.id,
			max(small_vc)	max_vc
		from
			t2
		group by
			id
		)	v1
	where
		mod(t1.id,100) = 0
	and	v1.id = t1.id
	)
set	t1_vc = v1_vc
;

There is a problem with this statement, though – it fails with Oracle error “ORA-01779: cannot modify a column which maps to a non key-preserved table” unless you add the hint /*+ bypass_ujvc */ (presumably “bypass update join view check”) – and even that hint fails to work in my copy of 11.2.0.2.

A critical feature of Oracle’s implementation of updatable join views is that only one table will be updated by the statement, and that table must be “key-preserved” – meaning that any row from the table to be updated can (logically) appear at most once in the view. A simple restriction to ensure that this requirement is met is to insist that the join(s) to the other table(s) should always be with equality on the primary (or unique) key of the table(s).
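For example, a version of the update that joins t1 directly to t2 on the (uniquely indexed) id column – see the index definitions in the footnote script – ought to be accepted without any hint, because the uniqueness of t2.id guarantees that each row of t1 can appear at most once in the view:

update
	(
	select
		t1.small_vc	t1_vc,
		t2.small_vc	t2_vc
	from
		t1,
		t2
	where
		t2.id = t1.id
	and	mod(t1.id,100) = 0
	)
set	t1_vc = t2_vc
;

With this particular data set it even produces the same result as the aggregate version – because t2.id is unique, max(small_vc) per id is just small_vc – but it is only legal because Oracle can see the uniqueness declared on t2.id.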

In this case, though, because we aggregate on t2.id, and then join on t2.id, it is clearly the case that any row that appears in t1 can only appear once in the join between t1 and the aggregate of t2. In this update, t1 is clearly key-preserved – but, absent the hint, the optimizer will not allow the mechanism to come into play. (SQL Server will handle this particular example correctly, by the way.)

So, yet another item on my wishlist for the optimizer – let’s see a few more cases of updateable join views being recognised.

In passing, the path that the optimizer produces for the join view update, when hinted, isn’t ideal: it fails to spot an opportunity for transitive closure that would make the update more efficient. Here’s the execution plan (with its Predicate Information section):

----------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost  |
----------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |      |       |       |       |   100 |
|   1 |  UPDATE               | T1   |       |       |       |       |
|*  2 |   HASH JOIN           |      |   100 |  3500 |       |   100 |
|*  3 |    TABLE ACCESS FULL  | T1   |   100 |  1500 |       |    27 |
|   4 |    VIEW               |      | 10000 |   195K|       |    72 |
|   5 |     SORT GROUP BY     |      | 10000 |   146K|   488K|    72 |
|   6 |      TABLE ACCESS FULL| T2   | 10000 |   146K|       |    27 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   3 - filter(MOD("T1"."ID",100)=0)

As you can see, Oracle aggregates the whole of t2, then uses the result to probe a hash table built from a subset of t1. But the basic join is on t2.id = t1.id, which means that the optimizer could, in principle, generate the predicate mod(v1.id,100) = 0 and push it into the inline view so that only a subset of t2 had to be aggregated. You can see the effect by adding the predicate manually, as sketched below.
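The modified statement is just the original join view update with an extra filter on t2.id inside the inline view (I’m assuming the bypass_ujvc hint still has to be in place for the update to run at all):

update	/*+ bypass_ujvc */
	(
	select
		t1.small_vc	t1_vc,
		v1.max_vc	v1_vc
	from
		t1,
		(
		select
			t2.id,
			max(small_vc)	max_vc
		from
			t2
		where
			mod(t2.id,100) = 0
		group by
			id
		)	v1
	where
		mod(t1.id,100) = 0
	and	v1.id = t1.id
	)
set	t1_vc = v1_vc
;

And this is the plan you get: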

-------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                |       |   100 |  3500 |    51 |
|   1 |  UPDATE                         | T1    |       |       |       |
|*  2 |   HASH JOIN                     |       |   100 |  3500 |    51 |
|   3 |    VIEW                         |       |   100 |  2000 |    23 |
|   4 |     SORT GROUP BY               |       |   100 |  1500 |    23 |
|   5 |      TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1500 |    23 |
|*  6 |       INDEX FULL SCAN           | T2_I1 |   100 |       |    21 |
|*  7 |    TABLE ACCESS FULL            | T1    |   100 |  1500 |    27 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   6 - filter(MOD("T2"."ID",100)=0)
   7 - filter(MOD("T1"."ID",100)=0)

It’s an interesting little side effect of the change (for my particular data set) that the probe table is now t1 – in the earlier plan the probe rowsource was the sorted aggregate of t2. This means that Oracle updates t1 as it scans it; in the previous plan Oracle would have been jumping around t1, updating rows in the order dictated by the arrival of the aggregated rows from t2. In principle, this change in the order of updates could reduce the number of random physical I/Os performed against t1.

Footnote:

If you want to experiment with variations on this problem then you can start with code like the following to create the two tables – bear in mind that there’s no great point in examining the variation in performance that different mechanisms give when playing with such a small (and regular) data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 10000	-- assumed limit: anything generating at least 100 rows is enough here
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000	-- matches the 10,000 rows reported in the plans above
;

create table t2 as select * from t1;

-- collect stats

create unique index t1_i1 on t1(id);
create unique index t2_i1 on t2(id);
alter table t1 modify id not null;
alter table t2 modify id not null;
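
The “-- collect stats” line above is just a placeholder; if you want something explicit, a call along these lines will do (the method_opt setting is simply one reasonable choice, not part of the original script):

begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T2',
		method_opt	=> 'for all columns size 1'
	);
end;
/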