Oakies Blog Aggregator

Oracle Redo Log Waits



In an ongoing effort to document the main Oracle wait events, here is my current effort at documenting Oracle's redo log related wait events. The original article is at

http://sites.google.com/site/embtdbo

and I will be making additional changes and adding content as I have time.


Redo

Redo is written to disk when:
User commits
Log buffer is 1/3 full (_log_io_size)
Log buffer accumulates 1M of redo
Every 3 seconds
DBWR asks LGWR to flush redo
Sessions committing wait for LGWR

Redo Log Wait Events

log file sync

Wait for redo flush upon:
Commit
Rollback
Arguments
P1 = buffer# in log buffer that needs to be flushed
P2 = not used
P3 = not used
Commit less
Often possible in loops that commit every iteration
Commit every 50 or 100 iterations instead (see the sketch below)
Put redo on dedicated disk
Use Raw Device or Direct IO
More Radical
Consider Ram Disks
Can stripe if redo writes are comparable to stripe size
Striping shouldn’t hurt
Striping can help
Ex: imp – issues large redo writes – striping can improve it by 10-30%
Alternate disks for redo and archiving of redo
Possible from 10gR2:
ALTER SYSTEM SET COMMIT_WRITE = 'BATCH,NOWAIT';
Commits could be lost if the machine crashes
Or on an IO error
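
For illustration, a minimal PL/SQL sketch of batching commits (the table name and per-row work are hypothetical):

declare
n_rows number := 0;
begin
for rec in (select id from some_table) loop -- hypothetical driving query
-- ... per-row processing using rec ...
n_rows := n_rows + 1;
if mod(n_rows, 100) = 0 then
commit; -- one log file sync wait per 100 rows instead of one per row
end if;
end loop;
commit; -- flush the remainder
end;
/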

log buffer space

Wait for space in the redo log buffer in SGA
Solution
Increase log_buffer parameter in init.ora
Above 3M a larger log_buffer has little effect; if the waits persist, the bottleneck is redo disk IO (see the quick check below)
Improve disk IO for redo
Faster disk
Raw file
Direct IO
Dedicated disk
p1, p2, p3 – no values
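
A quick sanity check for this wait (a sketch; statistic names from v$sysstat):

select name, value
from v$sysstat
where name in ('redo buffer allocation retries', 'redo log space requests');
-- non-zero, steadily growing values indicate waits for log buffer space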

log file switch (archiving needed)

No p1,p2,p3
Database “hangs” for transactions
archive log stop;
-- make room in log_archive_dest
archive log start;

log file switch (checkpoint incomplete)

No p1,p2,p3 args
Wait for checkpoint to complete because all log files are full
Solutions
Add more log files
Increase size of log files (example below)
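
For example (group numbers, paths and sizes are illustrative):

-- add a new, larger group; rotate out the old ones once they are inactive
alter database add logfile group 4 ('/u02/oradata/ORCL/redo04.log') size 512m;
alter system switch logfile;
alter database drop logfile group 1;
alter database add logfile group 1 ('/u02/oradata/ORCL/redo01.log') size 512m;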

log file switch (private strand flush incomplete)

New wait event in 10g
Like a "log file switch completion"

log file switch completion

No p1,p2,p3
Wait for lgwr to switch log files when generating redo
Solution:
Increase redo log file size
Reduces frequency of switches
What happens when a log file switch occurs:
Get next log file from control file
Get Redo Copy and Redo Allocation latch
Flush redo
Close File
Update Controlfile
Set new file to Current
Set old file to Active
If in Archivelog mode add file to archive list
Open all members of new logfile group
Write the SCN to the headers
Enable redo log generation
DBWR makes a list of blocks that need to be written out in order to overwrite the redo log file

switch logfile command

Same as log file switch completion but the command is executed by the DBA
Alter system switch logfile;

Redo Log Sizing Concerns and Considerations

What happens to recovery time if I change my redo log file sizes?
Larger Redo Log size can increase recovery time but
There are init.ora parameters to limit this
Standby DB: ARCHIVE_LAG_TARGET
Seconds, limits lag between primary and standby
Increases log file switches
FAST_START_MTTR_TARGET
Seconds to Recovery
Easy to use and accurate
Is overridden by FAST_START_IO_TARGET
Is overridden by LOG_CHECKPOINT_INTERVAL
alter system set fast_start_mttr_target=17 scope=both;
SQL> select ESTIMATED_MTTR from V$INSTANCE_RECOVERY;
ESTIMATED_MTTR
--------------
21


Notes about Hadoop

My notes from two presentations given to the data mining SIG of the local ACM chapter.

Hadoop is a scalable fault-tolerant grid operating system for data storage and processing.

It is not a database. It is more similar to an operating system: Hadoop has a file system (HDFS) and a job scheduler (Map-Reduce). Both are distributed. You can load any kind of data into Hadoop.

It is quite popular – the last Hadoop Summit had 750 attendees. Not bad for a new open-source technology. It is also quite efficient for some tasks: a Hadoop cluster of 1,460 nodes can sort a terabyte of data in 62 seconds – currently the world record for sorting a terabyte.

Hadoop Design Axioms:

  • System will manage and heal itself. (Because it runs on commodity hardware, failure is inevitable.)
  • Performance will scale linearly. (With few limitations.)
  • Compute should move to data. (Processing jobs should run on the machines holding the data they process.)
  • Simple core. Modular and extensible.

HDFS:
Distributed file system. Block size is 64M (!). User configures replication factor – each block is replicated on K machines (K chosen by user). More replication can be configured for hot blocks.
A name node keeps track of the blocks and if a node fails the data on it will be replicated to other nodes.

Map-Reduce:
Distributes jobs. It tries to run jobs local to their data to avoid network overhead. It also detects failures, and even servers running behind on their processing: if a part of the job is lagging, it will start copies of that part on other servers, in the hope that one of the copies finishes faster.

Hadoop Ecosystem:

  • HBase: an implementation of Google's BigTable. Key-value based. Good for quick lookups, but not batch processing. Transactional.
  • Pig, Hive, Sqoop: Different languages. Map-Reduce is like assembly – high performance, low-level, too much detail for most tasks. Hive is a SQL language for Hadoop.

Hadoop vs. RDBMS?
RDBMS – expensive, structured, fast, interactive, has standards, transactional.
Hadoop – affordable, unstructured, scalable, resilient. Solves both storage and processing.

Hive and Hadoop at Facebook
Facebook was receiving 200GB of new data each day as of March 2008. That's a lot of data to manage. Facebook's philosophy is that more insights can be achieved from running simpler algorithms on more data.

Why Hadoop? The cost of storage, and the limitations of data-analysis systems: many have limited scalability, and they were closed and proprietary.

Why not plain map-reduce? Not many developers have experience with it, and Facebook needed well-known schemas and structure.

Hive was built on top of Hadoop to solve these problems. It saves metadata and adds SQL. It also allows integrating with other systems. Hive has tables, which have partitions, which hold buckets; buckets are used for sampling. Hive is very extensible: you can have user-defined functions, types, objects, etc.
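
For illustration, a minimal HiveQL sketch of that hierarchy (table and column names are made up):

-- a table partitioned by day and bucketed by user id
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- buckets make sampling cheap: read roughly 1/32 of the data
SELECT count(*) FROM page_views TABLESAMPLE (BUCKET 1 OUT OF 32 ON user_id);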

Hive does optimizations – join order, different processing for skewed data. The optimizer is rule based and uses hints. It also does some kind of dynamic sampling. You can look at the explain plans for the jobs and use that for tuning. Hive uses columnar compression.

Hive supports integration with JDBC, ODBC and Thrift.

It lacks resource management and needs monitoring to catch and kill “bad” jobs.

Concurrency wise, the idea is that you insert data, “publish” it and from the moment it is published everyone else can see it – but it cannot be modified or deleted. This means no read/write contention.

When your projection is not cost-free

First of all I have to admit that the title of this post is a bit misleading - the projection is almost never "cost-free". Indirectly, cost will be generated by accessing rows from row sources to obtain any columns selected, and with the introduction of System Statistics, Oracle even assigns a direct CPU cost to accessing a particular column of a row - as has been pointed out by Joze Senegacnik and is demonstrated e.g. in Christian Antognini's "Troubleshooting Oracle Performance" book on page 117.

However, Oracle obviously treats work that has to be performed as part of the projection differently from work that has to be performed as part of the selection.

In particular, user-defined functions or scalar subqueries will not be accounted for in the costing when calculating the overall query cost.

This holds true even for user-defined functions that have a cost assigned via the Extensible Optimizer framework. For more information regarding the "Extensible Optimizer" framework, read Adrian Billington's article on oracle-developer.net or refer e.g. to our latest "Expert Oracle Practices" OakTable book, where Joze Senegacnik dedicated a whole chapter to this topic.

Of course the best way to deal with this situation is to avoid using user-defined functions or scalar subqueries and replace them with appropriate joins/subqueries. In most cases this yields the best performance, provided that the optimizer comes up with a reasonable execution plan, and it also solves the issue of unaccounted work, because regular join constructs will be considered in the cost calculation.

However, what do you do if you're in the situation that you can't simply change the query (e.g. a third-party vendor application)?

If a user-defined function or scalar subquery is used as part of the selection clause, the cost-based optimizer will make use of any cost associated with the function, or evaluated for the scalar subquery, whereas the same construct used as part of the projection will not be taken into account for the cost.

In particular, in the case of user-defined functions that perform costly operations, for example recursive SQL, and are not declared as deterministic (or are really non-deterministic in nature), this can make a significant difference.

The problem with functions that are not declared as deterministic is that the built-in caching feature of Oracle that can help with scalar subqueries or with functions declared as deterministic (since Oracle 10.2) cannot be used to alleviate the potentially resource-intensive, numerous calls to the function.
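
For illustration, a minimal sketch of such a declaration (the function is hypothetical):

create or replace function f_cached(p_in in number)
return number
deterministic -- allows Oracle (10.2+) to cache results for repeated identical inputs
as
begin
-- only safe if the result really depends on nothing but the input
return p_in * 2;
end;
/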

Therefore, in the case of more complex queries that make use of views, the view merging transformations applied to the query can lead to quite different amounts of work being performed.

Consider the following example setup:

drop function generate_lio;

drop table t1;

create table t1 (
run_id integer not null, /* identify the process inserting the data */
batch_id integer not null, /* represents clustered data, could also be a (arriving) date */
a_value number null, /* represents sequence based data */
a_random number null, /* represents randomly scattered data */
a_date timestamp default systimestamp not null, /* represents the insert timestamp */
filler char(1) default 'x' not null /* can be used to size the row as required */
);

create index t1_idx1 on
t1 (
a_random
);

insert into t1 (
run_id
, batch_id
, a_value
, a_random
)
select
1 as run_id
, trunc((id - 1) / 400) + 1 as batch_id
, id as a_value
, trunc(dbms_random.value(1, 40000.999999999)) as a_random
from
(
select
level as id
from
dual
connect by
level <= 40000
);

commit;

exec dbms_stats.gather_table_stats(null, 't1')

create or replace function generate_lio(in_lio in number default 1)
return number
--deterministic
as
n_val number;
begin
select /*+ first_rows(1) */
run_id
into
n_val
from
(
select
*
from
(
select
rownum as rn
, run_id
, substrb(rowid, 1, 15) as block
, a_random
from
t1
where
a_random is not null
order by
a_random
)
where rownum <= in_lio
)
where
rn = in_lio;
return trunc(dbms_random.value(0, 1000)) * n_val;
end generate_lio;
/

associate statistics with functions generate_lio default cost (100000, 3, 0) default selectivity 100;

The code creates a table holding 40,000 rows and a simple index with a bad clustering factor. A user-defined function that generates logical I/O based on that index is created and assigned a default cost (and selectivity) using ASSOCIATE STATISTICS. The function deliberately uses the DBMS_RANDOM package to simulate non-deterministic behaviour. Each BATCH_ID in the table covers 400 rows, with 100 batch_ids in total.

Now consider the following query:

/* Hint is required from 11g on to prevent GROUP BY placement */
select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
/* See the difference in the runtime statistics
between merging this view and not */
select /* no_merge */
generate_lio(1) as val
, batch_id
from
t1
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
group by
t1.filler;

This query will generate 160,000 rows by combining 400 rows from each row source. Here is the execution plan (all tested on 11.1.0.7) when merging the view T2 by default:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 146 (28)| 00:00:02 |
| 1 | HASH GROUP BY | | 1 | 8 | 146 (28)| 00:00:02 |
|* 2 | HASH JOIN | | 160K| 1250K| 115 (8)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| T1 | 400 | 1200 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("GENERATE_LIO"(1))[22]
2 - (#keys=1) "T1"."FILLER"[CHARACTER,1]
3 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
4 - "BATCH_ID"[NUMBER,22]

Notice in particular where the evaluation of the function takes place according to the "Projection" information ("+PROJECTION" option of DBMS_XPLAN.DISPLAY).
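
For reference, a minimal sketch of producing this output for the merged variant (assuming a PLAN_TABLE exists):

explain plan for
select /*+ no_place_group_by */ sum(val), t1.filler
from t1, (select generate_lio(1) as val, batch_id from t1) t2
where t1.batch_id = t2.batch_id and t1.batch_id = 42
group by t1.filler;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'TYPICAL +PROJECTION'));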

And here is the execution plan when explicitly requesting to not merge the view T2:

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 146 (28)| 00:00:02 |
| 1 | HASH GROUP BY | | 1 | 21 | 146 (28)| 00:00:02 |
|* 2 | HASH JOIN | | 160K| 3281K| 115 (8)| 00:00:02 |
|* 3 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
| 4 | VIEW | | 400 | 6400 | 55 (4)| 00:00:01 |
|* 5 | TABLE ACCESS FULL| T1 | 400 | 2400 | 55 (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
5 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("VAL")[22]
2 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "VAL"[NUMBER,22]
3 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
4 - "VAL"[NUMBER,22], "T2"."BATCH_ID"[NUMBER,22]
5 - "BATCH_ID"[NUMBER,22]

It is not that obvious from the "Projection" information, but in this case the function (the "VAL" of the "Projection" in operation id 4) is evaluated before the join takes place.

Notice that the (still undocumented) "NO_PLACE_GROUP_BY" hint is required from 11g on to prevent the optimizer from getting too clever with this kind of statement. The GROUP BY is used in this case to simplify the result set processing by aggregating it into a single row, but in 11g the new GROUP BY placement feature by default pushes the GROUP BY into the view, effectively solving the issue of excessive function calls simply by reducing the row source sizes that subsequently get joined. Since this is not supposed to be the point of this demonstration, the clever trick of pushing the GROUP BY into the view is prevented. However, it is interesting to note how new optimizer features can help to solve problems as a side-effect.

The problem described here can still be seen in 11g without any hints when not using a GROUP BY clause to aggregate the result set.

If you run this with STATISTICS_LEVEL set to ALL and check the runtime statistics (DBMS_XPLAN.DISPLAY_CURSOR), you'll notice a significant difference between merging the view T2 and not merging it. By default Oracle will merge the view, and obviously perform the "projection" as part of the HASH GROUP BY operation after joining the data, leading to 160,000 calls to the function, each generating three logical I/Os (when using 1 as the parameter to the function).
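
A minimal sketch of gathering these runtime statistics (assuming the query of interest is the last statement executed in the session):

alter session set statistics_level = all;

-- execute the query from above here, then:

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));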

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:14.58 | 480K| | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:14.58 | 480K| 805K| 805K| 602K (0)|
|* 2 | HASH JOIN | | 1 | 160K| 160K|00:00:00.01 | 390 | 988K| 988K| 317K (0)|
|* 3 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
|* 4 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42)

Compare the difference in runtime and the number of logical I/Os performed ("Buffers") to this one, when not merging the view:

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.34 | 1590 | | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.34 | 1590 | 805K| 805K| 369K (0)|
|* 2 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 1590 | 988K| 988K| 354K (0)|
|* 3 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
| 4 | VIEW | | 1 | 400 | 400 |00:00:00.04 | 1395 | | | |
|* 5 | TABLE ACCESS FULL| T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
5 - filter("BATCH_ID"=42)

So obviously it is a good idea in this particular case to not merge the view, but although an I/O cost has been explicitly assigned to the function, you can see that both execution plans have exactly the same cost and Oracle happily merges the view.

The costing looks different when using the function as part of the selection clause:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select
1 as val
, batch_id
, a_date
, run_id
from
t1
)t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
and generate_lio(t2.run_id) >= 0
group by
t1.filler;

Note that I had to change the function parameter to something that refers to a column expression, otherwise the optimizer treats "generate_lio(1)" as "independent" and adds a FILTER operation that evaluates "generate_lio" exactly once.

It is now obvious from the execution plan that the cost of executing the function is considered:

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 1356 (4)| 00:00:17 |
| 1 | HASH GROUP BY | | 1 | 11 | 1356 (4)| 00:00:17 |
|* 2 | HASH JOIN | | 160K| 1718K| 1325 (2)| 00:00:16 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| T1 | 400 | 2400 | 1265 (1)| 00:00:16 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."BATCH_ID"="BATCH_ID")
3 - filter("T1"."BATCH_ID"=42)
4 - filter("BATCH_ID"=42 AND "GENERATE_LIO"("RUN_ID")>=0)

Notice the increased cost of operation id 4 - calling the function 400 times with an associated I/O cost of 3, plus the associated CPU cost on top.

A similar behaviour can be seen when using scalar subqueries as part of the selection or projection.

Consider this query:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select /* no_merge */
(
select
min(run_id)
from
t1 a
where
a.a_random = b.a_random
) as val
, batch_id
from
t1 b
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
group by
t1.filler;

Checking the execution plan you'll notice again the same cost as above when using the function in the projection:

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 146 (28)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
| 4 | HASH GROUP BY | | 1 | 13 | 146 (28)| 00:00:02 |
|* 5 | HASH JOIN | | 160K| 2031K| 115 (8)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | T1 | 400 | 3200 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
7 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) MIN("RUN_ID")[22]
2 - "RUN_ID"[NUMBER,22]
3 - "A".ROWID[ROWID,10]
4 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM( (SELECT MIN("RUN_ID") FROM
"T1" "A" WHERE "A"."A_RANDOM"=:B1))[22]
5 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "B"."A_RANDOM"[NUMBER,22]
6 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
7 - "BATCH_ID"[NUMBER,22], "B"."A_RANDOM"[NUMBER,22]

Notice again the subtle difference in the projection when preventing the view merge:

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 146 (28)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
| 4 | HASH GROUP BY | | 1 | 21 | 146 (28)| 00:00:02 |
|* 5 | HASH JOIN | | 160K| 3281K| 115 (8)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | T1 | 400 | 2000 | 55 (4)| 00:00:01 |
| 7 | VIEW | | 400 | 6400 | 55 (4)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | T1 | 400 | 4400 | 55 (4)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="T2"."BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
8 - filter("BATCH_ID"=42)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) MIN("RUN_ID")[22]
2 - "RUN_ID"[NUMBER,22]
3 - "A".ROWID[ROWID,10]
4 - (#keys=1) "T1"."FILLER"[CHARACTER,1], SUM("VAL")[22]
5 - (#keys=1) "T1"."FILLER"[CHARACTER,1], "VAL"[NUMBER,22]
6 - "T1"."BATCH_ID"[NUMBER,22], "T1"."FILLER"[CHARACTER,1]
7 - "VAL"[NUMBER,22], "T2"."BATCH_ID"[NUMBER,22]
8 - "BATCH_ID"[NUMBER,22], "B"."A_RANDOM"[NUMBER,22]

It is also interesting to note that although the subquery is executed as part of the VIEW projection step in operation id 7, the scalar subquery is still shown at the top level of the query, starting with operation id 1. It would be more accurate to show it as a child of operation id 7 in this particular case, but this is probably not supported by EXPLAIN PLAN at present.

At runtime, however, the outcome is different from the function case, mainly due to the filter optimization / subquery caching feature, which also makes the subquery implicitly deterministic - it will only get executed as many times as there are distinct input values, which is the A_RANDOM column in this case.

In both cases the subquery will be executed only approx. 400 times, because there are only 400 distinct values in the generated row source.

View merged:

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.32 | 1418 | | | |
| 1 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.01 | 1028 | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.01 | 1028 | | | |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.01 | 402 | | | |
| 4 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.32 | 1418 | 805K| 805K| 570K (0)|
|* 5 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 390 | 988K| 988K| 390K (0)|
|* 6 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
|* 7 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."A_RANDOM"=:B1)
5 - access("T1"."BATCH_ID"="BATCH_ID")
6 - filter("T1"."BATCH_ID"=42)
7 - filter("BATCH_ID"=42)

View not merged:

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.24 | 1418 | | | |
| 1 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.02 | 1028 | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.02 | 1028 | | | |
|* 3 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.02 | 402 | | | |
| 4 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.24 | 1418 | 805K| 805K| 362K (0)|
|* 5 | HASH JOIN | | 1 | 160K| 160K|00:00:00.02 | 1418 | 988K| 988K| 377K (0)|
|* 6 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.02 | 195 | | | |
| 7 | VIEW | | 1 | 400 | 400 |00:00:00.02 | 1223 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
-----------------------------------------------------------------------------------------------------------------------------

It is obvious from the "Buffers" column that the scalar subquery has been executed at different steps of the execution plan.

However, in cases where you're not that lucky and the filter optimization doesn't work that efficiently, there might still be a significant difference between the merged and unmerged view variants of the query.

Again, when moving the subquery to the selection, the cost calculation looks quite different:

select /*+ no_place_group_by */
sum(val), t1.filler
from
t1
, (
select
run_id as val
, batch_id
from
t1
) t2
where
t1.batch_id = t2.batch_id
and t1.batch_id = 42
and t1.run_id >= (
select /*+ no_unnest */
min(run_id)
from
t1 a
where
a.a_random = t1.a_random
)
group by
t1.filler;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 349K (1)| 01:10:00 |
| 1 | HASH GROUP BY | | 1 | 19 | 349K (1)| 01:10:00 |
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN | | 160K| 2968K| 116 (9)| 00:00:02 |
|* 4 | TABLE ACCESS FULL | T1 | 400 | 2400 | 55 (4)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | T1 | 400 | 5200 | 56 (6)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 8 | | |
| 7 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 3 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_IDX1 | 2 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."RUN_ID">= (SELECT /*+ NO_UNNEST */ MIN("RUN_ID") FROM "T1"
"A" WHERE "A"."A_RANDOM"=:B1))
3 - access("T1"."BATCH_ID"="BATCH_ID")
4 - filter("BATCH_ID"=42)
5 - filter("T1"."BATCH_ID"=42)
8 - access("A"."A_RANDOM"=:B1)

It is interesting that the FILTER operation is executed "late" (and therefore potentially more often) - since it depends only on the T1.RUN_ID and T1.A_RANDOM columns, it could be executed "earlier" while processing the T1 row source, and in fact this can be achieved by adding the PUSH_SUBQ hint to the subquery. I haven't investigated this further, but maybe the optimizer doesn't cost the different subquery pushing options when explicitly requested not to unnest it (the NO_UNNEST hint) - without the NO_UNNEST hint the subquery is transformed into a join in this particular case.

It can be seen that in this case the optimizer used a "worst case" approach estimating that the scalar subquery gets executed many, many times. Very likely the cost increase can be explained by 160,000 times the cost of the scalar subquery (which might be less than 3 and gets rounded up in the EXPLAIN PLAN output) due to the "late" execution.

At runtime the filter optimization again works very efficiently:

-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.38 | 1418 | | | |
| 1 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.38 | 1418 | 805K| 805K| 362K (0)|
|* 2 | FILTER | | 1 | | 160K|00:00:00.16 | 1418 | | | |
|* 3 | HASH JOIN | | 1 | 160K| 160K|00:00:00.01 | 390 | 968K| 968K| 377K (0)|
|* 4 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
|* 5 | TABLE ACCESS FULL | T1 | 1 | 400 | 400 |00:00:00.01 | 195 | | | |
| 6 | SORT AGGREGATE | | 398 | 1 | 398 |00:00:00.04 | 1028 | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| T1 | 398 | 2 | 781 |00:00:00.02 | 1028 | | | |
|* 8 | INDEX RANGE SCAN | T1_IDX1 | 398 | 2 | 781 |00:00:00.02 | 402 | | | |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."RUN_ID">=)
3 - access("T1"."BATCH_ID"="BATCH_ID")
4 - filter("BATCH_ID"=42)
5 - filter("T1"."BATCH_ID"=42)
8 - access("A"."A_RANDOM"=:B1)

It is not obvious to me why the optimizer treats these two cases differently - in the case of the selection it uses a "worst-case" approach, but it does not apply the same approach in the case of the projection.

In summary, you need to be careful in particular when using functions as part of the projection clause of complex queries - the order of evaluation might make a significant difference to the overall query performance.

As already mentioned, the best way to deal with such constructs is to avoid them. If you cannot, you should first evaluate whether the function can be declared as deterministic (results will be cached from 10.2 on), or whether you can use the workaround of wrapping the function call into a scalar subquery (select f(x) from dual) to take advantage of the subquery caching feature, which also makes the function implicitly deterministic.
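
Applied to the test case from above, the wrapper looks like this (a sketch; the benefit depends on the number of distinct input values):

-- direct call in the projection - evaluated once per row:
select generate_lio(batch_id) as val, batch_id from t1;

-- wrapped in a scalar subquery - subquery caching limits the calls
-- to roughly one per distinct BATCH_ID:
select (select generate_lio(batch_id) from dual) as val, batch_id from t1;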

How to set up a private DNS for your virtual cluster

One of the challenges I faced recently was building a virtual cluster based on Oracle 11g Release 2 on top of Oracle Enterprise Linux (OEL) running inside VMware server. Although I have an existing virtual Oracle 11g Release 1 cluster, I decided to build a new one in order to be able to teach both […]

Does the Query Optimizer Cost PX Distribution Methods?

The short answer to this question is “yes”, it does. Unfortunately, the distribution costs are not externalized through the execution plans and, as a result, this limitation (yes, it is really a limitation in the current implementation, not a bug) confuses everyone who carefully looks at the information provided in an execution plan of a [...]

NEXTGRES Gateway: MySQL Emulator for Oracle

So, a few people have asked me what NEXTGRES Gateway is.  My short answer: the ultimate database compatibility server. Sorry if this blog entry sounds very marketing-oriented, but I’ve been working on this personal project non-stop for the last 8 months and am really excited about it. NEXTGRES Gateway in a nutshell: Designed to assist […]

The most common passwords!

32 million passwords were analyzed at rockyou.com and these were the top 20.

To put it in a different perspective: if a hacker used the first 116 passwords on each account, they would break into 5% of the accounts. With the first 5000 passwords tried on each account, they'd break into 20% of the accounts!
It's so much more meaningful for me to see real-life data.
For Oracle-specific security, password hacking and protection, see
Pete Finnigan's site.

The Core Performance Fundamentals Of Oracle Data Warehousing – Partitioning

[back to Introduction] Partitioning is an essential performance feature for an Oracle data warehouse because partition elimination (or partition pruning) generally results in the elimination of a significant amount of table data to be scanned. This results in a need for less system resources and improved query performance. Someone once told me “the fastest I/O is the one that never happens.” This is precisely the reason that partitioning is a must for Oracle data warehouses – it’s a huge I/O eliminator. I frequently refer to partition elimination as the anti-index. An index is used to find a small amount of data that is required; partitioning is used to eliminate vast amounts of data that is not required.

Main Uses For Partitioning

I would classify the main reasons to use partitioning in your Oracle data warehouse into these four areas:

  • Data Elimination
  • Partition-Wise Joins
  • Manageability (Partition Exchange Load, Local Indexes, etc.)
  • Information Lifecycle Management (ILM)

Partitioning Basics

The most common partitioning design pattern found in Oracle data warehouses is to partition the fact tables by range (or interval) on the event date/time column. This allows for partition elimination of all the data not in the desired time window in queries. For example: If I have a [...]

ASH simulation Scripts (SASH and SASHMON)


(see http://sites.google.com/site/embtdbo/wait-event-documentation/sash-1)

I've gotten a few questions on SASH and SASHMON recently, so I decided to repost the following info:

Active Session History, aka ASH, is only available in Oracle 10g and higher and also requires the Diagnostics Pack license. If you are on an older version of Oracle or don't have the Diagnostics Pack license, then you have the option of simulating ASH (S-ASH) yourself. Your mileage may vary on the scripts below. No guarantees on them working correctly on your systems. Make sure you test and understand them.

For the scripts below it is highly recommended to put the repository on a different database than the database being monitored. I've only tested resource consumption for gathering data (less than 1% of 1 CPU) and not the resource usage of actually storing the data.

--------------------------------------------

(v1 is available at http://ashmasters.com/ash-simulation/)

V2 Installs on Windows or Unix on Standard Edition

The scripts below are newer, so there will probably be some hiccups in them (some modules, such as SQL and session statistics collection, aren't implemented yet). Please comment on the blog with any issues and/or solutions you find. I wrote these scripts both as an option for installing on Windows (since the first version was a shell script) and to allow Standard Edition as a repository while still supporting data purging using "poor man's partitioning", i.e. having a separate table for each day of the week and using a view with a UNION ALL of the 7 tables in the data mining scripts (see the sketch below). I find the following scripts easier to follow, read, understand and change than the single script above.
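
The "poor man's partitioning" looks roughly like this (table and view names are made up; the actual SASH scripts may differ):

-- one physical table per day of the week, combined by a view
create or replace view sash_all_days as
select * from sash_day_1 union all
select * from sash_day_2 union all
select * from sash_day_3 union all
select * from sash_day_4 union all
select * from sash_day_5 union all
select * from sash_day_6 union all
select * from sash_day_7;

-- purging the oldest day is then just a truncate, e.g.:
truncate table sash_day_1;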

SASH creates a view called v$active_session_history and scripts that run on the "real" v$active_session_history should also work on SASH.v$active_session_history.

Repository Creation -

repo_0_user.sql - run as SYS or SYSTEM. Creates SASH user
repo_1_tables.sql - run as SASH only !! ( WARNING - Installs the SASH schema on repository database including simulated DBA_ and V$ views. Will cause problems if run as SYS or SYSTEM)

repo_2_pkg.sql - run as SASH, optional, create an automatic purge procedure on repository machine
repo_3_jobs.sql - run as SASH, optional, start a job on repository machine to purge oldest day of data

Monitored Database Setup (do this on each database to be monitored)

targ_1_pkg.sql - install collection package on each database to be monitored
targ_2_jobs.sql - start up collection in a job on each database to be monitored

Data Mining S-ASH

When running scripts written explicitly for ASH (or running ASHMON) on SASH data, there are a couple of issues.

1) WAIT GROUPS : SASH doesn't collect wait groups because it's set up mainly for versions 8 and 9 (since ASH is already there on 10g+). The wait groups are required to run ASHMON or scripts that use "wait_class" in v$active_session_history.

repo_4_waitgroups.sql - run as SASH, sets up wait groups

2) CURRENT DATABASE : SASH collects data for multiple databases into the same schema, thus scripts and ASHMON have to filter by the correct DBID. I do this by having a table SASH_TARGET that contains the DBID that I'm interested in. Then views like v$active_session_history include a filter on DBID from SASH_TARGET. Thus to change databases, I just change the DBID in SASH_TARGET.

repo_5_curdb.sql - run as SASH, changes the DBID in SASH_TARGET (shows a list of monitored databases and prompts for the one to filter for)

ASHMON - graphical monitor

When running ASHMON, the user to connect to is SASH, i.e. the owner of the repository data.


Performance Metrics and XML Reporting in PeopleSoft

I am working with a PeopleSoft system that makes extensive use of XML Publisher reporting.  Generally these reports are based on SQL queries that are defined in the PeopleSoft PS/Query utility. 

These queries are run by an Application Engine program PSXPQRYRPT that runs the SQL query and produces the report in a single process.  This is a generic application engine program that runs any XML report.  Line 45 (in PeopleTools 8.49) of step MAIN.ExecRPT executes the report with the ProcessReport PeopleCode command.

&oRptDefn.ProcessReport(&TemplateId, &LanguageCd, &AsOfDate, &oRptDefn.GetOutDestFormatString(&OutDestFormat));

Batch Timings
Analysis of the Application Engine batch timings indicates that nearly all the time in this Application Engine program is spent in PeopleCode, and that this is not SQL execution time.  This is misleading.  The ProcessReport command is PeopleCode, but behind the scenes it also issues the SQL in the report data source.  Not all the time is SQL, but the Application Engine batch timings do not count any of this as SQL because it is not in the PeopleCode SQL Class.

Let’s look at an example Batch Timings report (I have edited it down, removing zero and insignificant timings).

PeopleSoft Application Engine Timings
(All timings in seconds)

C o m p i l e E x e c u t e F e t c h Total
SQL Statement Count Time Count Time Count Time Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
PeopleCode
Record.SelectByKey PSDBFIELD 8 0.1 8 0.0 0.1
SELECT PSPRCSRQST 2 0.1 2 0.0 0.1
SELECT PSXPTMPLDEFN 3 0.1 3 0.0 0.1
--------
0.3
AE Program: PSXPQRYRPT
MAIN.ExecRpt.H 1 0.0 1 0.0 1 0.0 0.0
--------
0.0
------------------------------------------------------------------------------------------
Call Non-SQL SQL Total
PeopleCode Count Time Time Time
------------------------------ ------- -------- -------- --------
AE Program: PSXPQRYRPT
MAIN.ExecRpt 1 643.2 0.3 643.5
-------- -------- --------
643.2 0.3 643.5
------------------------------------------------------------------------------------------
E x e c u t e
PEOPLECODE Builtin/Method Count Time
------------------------------------------------------- ------- --------
Boolean(Type 5) BuiltIns 90 0.1
DateTime(Type 11) BuiltIns 1 0.1
SQL(Type 524290) Methods 19 0.1
SQL(Type 524290) BuiltIns 9 0.1
Record(Type 524291) Methods 1104 0.1
Session(Type 524303) Methods 207 633.2
JavaObject(Type 524315) BuiltIns 6 2.2
PostReport(Type 524324) Methods 2 0.7
------------------------------------------------------------------------------------------
Total run time : 644.0
Total time in application SQL : 0.3 Percent time in application SQL : 0.0%
Total time in PeopleCode : 643.2 Percent time in PeopleCode : 99.9%
------------------------------------------------------------------------------------------
  • The total execution time is 644 seconds.
  • 643.2s are reported as being in PeopleCode, and as coming from the MAIN.ExecRpt step.
  • Only 0.3s of that is SQL time, and that comes from PeopleCode functions in the SQL Class (CreateRecord, CreateSQL, etc.).
  • The ProcessReport method does not appear in the PeopleCode SQL analysis section of the report, because it is not classed as SQL.

Identifying the Report ID and Report Source
So, the next question is: how can I find out which report this process is running?  Different instances of this report may be running different queries.

You can get the list of files generated by a process from the Content Management PeopleTools table PS_CDM_FILE_LIST.  Application Engine processes usually produce various log files (with extensions .aet, .trc and .log); the name of the other file is the same as the name of the report, followed by an extension that will vary depending on the output format.
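
For example, a quick look at what one process instance produced (a sketch using the same table and columns as the larger query below; 1234 is a placeholder):

select f.filename, f.cdm_file_type
from sysadm.ps_cdm_file_list f
where f.prcsinstance = 1234
and not f.cdm_file_type in ('AET','TRC','LOG');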

You can look at the report definition on-line under Reporting Tools -> XML Publisher -> Report Definition, and that will show you the Data Source ID.

The Data Source is defined in a component accessible from the previous entry in the same menu.

Analysing Execution Times
I have written this query to aggregate execution time for PSXPQRYRPT by output file name. 

  • PSPRCSRQST is the process request table, from which I get the run time of the process.
  • PS_CDM_FILE_LIST lists the files generated by the process which would be posted to the Report Repository. If I exclude the usual trace files, I am left with the report ID.
  • PSXPRPTDEFN is the report definition record, from which I can get the data source ID
  • PSXPDATSRC specifies the data source.  A data source type (DS_TYPE) of QRY indicates a PeopleSoft PS/Query.  If it is a private query, the OPRID will have a value.
column report_defn_id heading 'Report ID'      format a12 
column ds_type heading 'Type' format a4
column ds_id heading 'Data Source ID' format a30
column oprid heading 'Owner' format a10
column processes heading 'Prcs' format 990
column reports heading 'Reps' format 9,990
column secs heading 'Total|Seconds' format 999,990
column median heading 'Median|Seconds' format 999,990
column variance heading 'Variance' format 9990.0
SELECT d.report_defn_id, d.ds_type, d.ds_id, d.oprid
, SUM(y.processes) processes
, SUM(y.reports) reports
, SUM(y.secs) secs
, median(y.secs) median
, variance(y.secs) variance
FROM (
SELECT x.prcsinstance
, x.filename
, COUNT(DISTINCT x.prcsinstance) processes
, COUNT(*) reports
, SUM(x.secs) secs
FROM (
SELECT r.prcsinstance
, f.filename
, 86400*(r.enddttm-r.begindttm)*ratio_to_report(1) over (partition by r.prcsinstance) secs
FROM sysadm.psprcsrqst r
, sysadm.ps_cdm_file_list f
WHERE r.prcsname = 'PSXPQRYRPT'
AND r.prcsinstance = f.prcsinstance
AND NOT f.cdm_file_type IN('AET','TRC','LOG')
AND r.begindttm >= TRUNC(SYSDATE)
) x
GROUP BY x.prcsinstance, x.filename
) y
, sysadm.psxprptdefn d
WHERE d.report_defn_id = SUBSTR(y.filename,1,instr(y.filename,'.')-1)
GROUP BY d.report_defn_id, d.ds_type, d.ds_id, d.oprid
ORDER BY secs DESC
/

I can now see which reports have been run and how many times, how many copies of each report have been produced, and where the processing time is being spent - and so which one I should look at first.

Total   Median
Report ID Type Data Source ID Owner Prcs Reps Seconds Seconds Variance
------------ ---- ------------------------------ ---------- ---- ------ -------- -------- --------
XXX_WK_LATE QRY XXX_WKLY_LATENESS_RPT 20 20 2,973 148 3702.9
XXX_HRAM_CON QRY XXX_HRPD_CNT_AMD_RPT_QRY 92 92 2,677 27 108.4
XXX_CKOUT_RP QRY XXX_CHECKOUT_REPORT 47 47 2,043 41 347.7
XXX_BNK_RPT QRY XXX_BNK_DTLS_SCH_QRY 1 75 141 141 0.0

Conclusion

If you have a SQL performance problem with an XML report because the source PS/Query performs poorly, the batch timings will lead you to believe that you have a PeopleCode problem and not a SQL problem. 

Beware, this may not be the case.

Most of the execution time for XML Reports sourced from PS/Queries is almost certain to be SQL execution time.  The above query will tell you which queries are consuming how much time, and so inform your tuning effort.