Who's online

There are currently 0 users and 28 guests online.

Recent comments


Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception and I’ll dive into how to do so with R and how I used R plots to visually present a problem and confirm a hypothesis. But first some background…


Frequently DBAs use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems and in this case the adventure started the same way. In the AWR report Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 event. This needed deeper investigation as often times the cause for longer log file sync times is related to longer log file parallel write times.

Correlation oddity

This one’s so odd I nearly posted it as a “Quiz Night” – but decided that it would be friendlier simply to demonstrate it. Here’s a simple script to create a couple of identical tables. It’s using my standard environment but, apart from fiddling with optimizer settings, I doubt if there’s any reason why you need to worry too much about getting the environment exactly right.

How To Cancel A Query Running In Another Session

This is not really anything new - in fact Tanel Poder has already blogged about it a while ago. Tanel has specifically covered the handling of "urgent" TCP packets and how this could be used to signal a "cancel" to another process, however this only works on Unix environments and not with Windows SQL*Plus clients. In Tanel's article it is also mentioned that there is an officially documented way of doing this via the Resource Manager if you happen to have an Enterprise Edition license.

In my quick tests however the call to DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS using "CANCEL_SQL" as consumer group only errors out with ORA-29366 saying that the specified consumer group is invalid.

Star Transformation And Cardinality Estimates

If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.

Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:

drop table d;

purge table d;

drop table t;

purge table t;

create table t
rownum as id
, mod(rownum, 100) + 1 as fk1
, 1000 + mod(rownum, 10) + 1 as fk2
, 2000 + mod(rownum, 100) + 1 as fk3
, rpad('x', 100) as filler
connect by
level <= 1000000

exec dbms_stats.gather_table_stats(null, 't')

create bitmap index t_fk1 on t (fk1);

What the heck is the SQL Execution ID – SQL_EXEC_ID?

Ok, I think it’s time to write another blog entry. I’ve been traveling and dealing with jetlag from 10-hour time difference, then traveling some more, spoken at conferences, drank beer, had fun, then traveled some more, trained customers, hacked some Exadatas and now I’m back home.

Anyway, do you know what is the SQL_EXEC_ID in V$SESSION and ASH views?

Oh yeah, it’s the “SQL Execution ID” just like the documentation says … all clear. Um … is it? I’d like to know more about it – what does it actually stand for?! Is it session level, instance level or a RAC-global counter? And why does it start from 16 million, not 1?


Parallel Downgrade

There are many reasons why a parallel execution might not run with the expected degree of parallelism (DOP), beginning with running out of parallel slaves (PARALLEL_MAX_SERVERS or PROCESSES reached), PARALLEL_ADAPTIVE_MULTI_USER, downgrades at execution time via the Resource Manager, or the more recent features like PARALLEL_DEGREE_LIMIT or the Auto DOP introduced in Oracle 11.2.

Upgrade Argh

Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, from an instance running

Top 5 Timed Foreground Events
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read                  3,464,056       6,593      2   33.5 User I/O
DB CPU                                            3,503          17.8
db flash cache single block ph    2,293,604       3,008      1   15.3 User I/O
db file sequential read             200,779       2,294     11   11.6 User I/O
enq: TC - contention                     82       1,571  19158    8.0 Other

Flash Cache

Have you ever heard the suggestion that if you see time lost on event write complete waits you need to get some faster discs.
So what’s the next move when you’ve got 96GB of flash cache plugged into your server (check the parameters below) and see time lost on event write complete waits: flash cache ?

db_flash_cache_file           /flash/oracle/flash.dat
db_flash_cache_size           96636764160

Here’s an extract from a standard AWR report:

Row Lock Waits

Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.

Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?

A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.

There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)


Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

The modes and types give us some clues about what’s going on: TX/4 is typically about indexes involved in referential integrity (though there are a couple of more exotic reasons such as wait for ITLs, Freelists or tablespace status change); conversion of a TM lock from mode 3 to mode 5 is only possible (as far as I know) in the context of missing foreign key indexes when you delete a parent row.

Here’s a simple data set to help demonstrate the type of thing that could have caused this deadlock:

drop table child;
drop table parent;

create table parent (
	id		number(4),
	name		varchar2(10),
	constraint par_pk
		primary key (id)

create table child(
	id_p	number(4),
	id	number(4),
	name	varchar2(10),
	constraint chi_pk
		primary key (id, id_p),
	constraint chi_fk_par
		foreign key(id_p)
		references parent
		on delete cascade

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1, 1, 'Simon');
insert into child values(2, 1, 'Janet');


Note that I have define the primary key on the child the “wrong way round”, so that the foreign key doesn’t have a supporting index. Note also that the foreign key constraint is defined as ‘on delete cascade’ – this isn’t a necessity, but it means I won’t have to delete child rows explicitly in my demo.

Now we take the following steps:

Session 1: delete from parent where id = 1;

This will delete the child row – temporarily taking a mode 4 (S) lock on the child table – then delete the parent row. Both tables will end up locked in mode 3.

Session 2: insert into child values (1,2,'Sally');

This will lock the parent table in mode 2, lock the child table in mode 3, then wait with a TX mode 4 for session 1 to commit or rollback. If session 1 commits it will raise Oracle error: “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”; if session 1 rolls back the insert will succeed.

Session 1: delete from parent where id = 2;

This will attempt to lock the child table in mode 4, find that there it already has the child locked in mode three (which is incompatible with mode 4) and therefore attempt to convert to mode 5 (SSX). This will make it queue, waiting for session 2 to commit.
Three seconds later session 2 (the first to start waiting) will timeout and report a deadlock with the follow deadlock graph:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-00015818-00000000        14     371    SX             17     368    SX   SSX
TX-0009000e-000054ae        17     368     X             14     371           S
session 371: DID 0001-000E-00000018	session 368: DID 0001-0011-00000005
session 368: DID 0001-0011-00000005	session 371: DID 0001-000E-00000018
Rows waited on:
Session 368: no row
Session 371: no row
Session 368:
  pid=17 serial=66 audsid=2251285 user: 52/TEST_USER
  O/S info: user: HP-LAPTOPV1\jonathan, term: HP-LAPTOPV1, ospid: 2300:3528, machine: WORKGROUP_JL\HP-LAPTOPV1
            program: sqlplus.exe
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  delete from parent where id = 2
End of information on OTHER waiting sessions.
Current SQL statement for this session:
insert into child values(1,2,'new')

You’ll notice that there are no rows waited for – session 1 isn’t waiting for a row it’s waiting for a table and session 2 isn’t waiting for a table row it’s waiting for an index entry.

Footnote: There are several variations on the theme of one session inserting child rows when the other session has deleted (or inserted) the parent. The uncommitted parent change is an easy cause of the TX/4; the delete with unindexed foreign key is a necessary cause of the SX -> SSX.