Search

Top 60 Oracle Blogs

Recent comments

AW-argh

This is another of the blog notes that have been sitting around for several years – in this case since May 2014, based on a script I wrote a year earlier. It makes an important point about “inconsistency” of timing in the way that Oracle records statistics of work done. As a consequence of being first drafted in May 2014 the original examples showed AWR results from 10.2.0.5 and 11.2.0.4 – I’ve just run the same test on 19.3.0.0 to see if anything has changed.

 

[Originally drafted May 2014]: I had to post this as a reminder of how easy it is to forget things – especially when there are small but significant changes between versions. It’s based loosely on a conversation from Oracle-L, but I’m going to work the issue in the opposite order by running some code and showing you the ongoing performance statistics rather than the usual AWR approach of reading the performance stats and trying to guess what happened.

The demonstration needs two sessions to run; it’s based on one session running some CPU-intensive SQL inside an anonymous PL/SQL block with a second another session launching AWR snapshots at carefully timed moments. Here’s the code for the working session:

rem
rem     Script:         awr_timing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2013
rem

alter session set "_old_connect_by_enabled"=true';

create table kill_cpu(n, primary key(n))
organization index
as
select  rownum n
from    all_objects
where   rownum <= 26 -- > comment to avoid wordpress format issue
;

execute dbms_stats.gather_table_stats(user,'kill_cpu')

pause Take an AWR snapshot from another session and when it has completed  press return

declare
        m_ct    number;
begin

        select  count(*) X
        into    m_ct
        from    kill_cpu
        connect by
                n > prior n
        start with
                n = 1
        ;

        dbms_lock.sleep(30);

end;
/

You may recognise an old piece of SQL that I’ve often used as a way of stressing a CPU and seeing how fast Oracle can run. The “alter session” at the top of the code is necessary to use the pre-10g method of running a “connect by” query so that the SQL does a huge number of buffer gets (and “buffer is pinned count” visits). On my current laptop the query takes about 45 seconds (all CPU) to complete. I’ve wrapped this query inside a pl/sql block that then sleeps for 30 seconds.

From the second session you need to launch an AWR snapshot 4 times – once in the pause shown above, then (approximately) every 30 seconds thereafter. The second one should execute while the SQL statement is still running, the third one should execute while the sleep(30) is taking place, and the fourth one should execute after the pl/sql block has ended and the SQL*Plus prompt is visible.

Once you’ve got 4 snapshots you can generate 3 AWR reports. The question to ask then, is “what do the reports say about CPU usage?” Here are a few (paraphrased) numbers – starting with 10.2.0.5 comparing the “Top 5 timed events”, “Time Model”, and “Instance Activity” There are three sets of figures, the first reported while the SQL was still running, the second reported after the SQL statement had completed and the dbms_lock.sleep() call is executing, the last reported after the PL/SQL block has completed. There are some little oddities in the numbers due to backgorund “noise” – but the key points are still clearly visible:

While the SQL was executing

Top 5
-----
CPU Time                       26 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     26.9        100.0
DB CPU                                       26.2         97.6

Instance Activity
-----------------
CPU used by this session         0.65 seconds
recursive cpu usage              0.67 seconds

SQL ordered by CPU
------------------
31 seconds reported for both the SQL and PLSQL

During the sleep()

Top 5
-----
CPU Time                        19 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     19.0        100.0
DB CPU                                       18.6         98.1

Instance Activity
-----------------
CPU used by this session         0.66 seconds
recursive cpu usage             44.82 seconds

SQL ordered by CPU
------------------
14 seconds reported for both the SQL and PLSQL

After the PL/SQL block ended

Top 5
-----
CPU Time                         1 second

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                      1.4         99.9
DB CPU                                        1.4         99.7

Instance Activity
-----------------
CPU used by this session        44.68 seconds
recursive cpu usage              0.50 seconds

SQL ordered by CPU
------------------
1 second reported for the PLSQL, but the SQL was not reported

Points to notice:

While the SQL was excecuting (and had been executing for about 26 seconds, the Time Model mechanism was recording the work done by the SQL, and the Top N information echoed the Time model CPU figure. At the same time the “CPU used …” Instance Activity Statistics have not recorded any CPU time for the session – and they won’t until the SQL statement completes. Despite this, the “SQL ordered by …” reports double-count in real-time, show the SQL and the PL/SQL cursors as consuming (with rounding errors, presumably) 31 seconds each.

After the SQL execution was over and the session was sleeping the Time model (hence the Top 5) had recorded a further 19 seconds of work. The instance activity, however, has now accumulated 44 seconds of CPU, but only as “recursive CPU usage” (recursive because our SQL was called from with a PL/SQL block), with no “CPU used by this session”. The “SQL ordered by …” figures have recorded the amount of CPU used by both the SQL and PL/SQL in the interval (i.e. 14 seconds – which is a little off) recorded against both.)

After the PL/SQL block has completed the Time Model and the Top 5 report both say that nothing much happened in the interval, but the Instance Activity suddenly reports 44.68 seconds of CPU used by this session – which (roughly speaking) is truish as the PL/SQL block ended and assigned the accumulated recursive CPU usage to the session CPU figure. Finally, when we get down to the “SQL ordered by CPU” the SQL was not reported  – it did no work in the interval – but the PL/SQL block was still reported but only with a generous 1 second of CPU since all it did in the interval was finish the sleep call and tidy up the stack before ending.

Now the same sets of figures for 11.2.0.4 – there’s a lot of similarity, but one significant difference:

While the SQL was executing


Top 5
-----
CPU Time                        26.6 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     27.0        100.0
DB CPU                                       26.6         98.5

Instance Activity
-----------------
CPU used by this session         1.09 seconds
recursive cpu usage              1.07 seconds

SQL ordered by CPU
------------------
25.6 seconds reported for both the SQL and PLSQL

During the sleep()

Top 5
-----
CPU Time                        15.1 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     15.3         99.8
DB CPU                                       15.1         98.2

Instance Activity
-----------------
CPU used by this session        41.09 seconds
recursive cpu usage             41.03 seconds

SQL ordered by CPU
------------------
14.3 seconds reported for the SQL
13.9 seconds reported for the PLSQL

After the PL/SQL block ended

Top 5
-----
CPU Time                         1.4 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                      1.5         99.6
DB CPU                                        1.4         95.4

Instance Activity
-----------------
CPU used by this session         1.02 seconds
recursive cpu usage              0.95 seconds

SQL ordered by CPU
------------------
0.5 seconds reported for the PLSQL, and no sign of the SQL

Spot the one difference in the pattern – during the sleep() the Instance Activity Statistic “CPU used by this session” is recording the full CPU time for the complete query, whereas the time for the query appeared only in the “recursive cpu” in the 10.2.0.5 report.

I frequently point out that for proper understanding of the content of an AWR report you need to cross-check different ways in which Oracle reports “the same” information. This is often to warn you about checking underlying figures before jumping to conclusions about “hit ratios”, sometimes it’s to remind you that while the Top 5 might say some average looks okay the event histogram may say that what you’re looking at is mostly excellent with an occasional disaster thrown in. In this blog note I just want to remind you that if you only ever look at one set of figures about CPU usage there are a few special effects (particularly relating to long running PL/SQL / Java / SQL) where you may have to work out a pattern of behaviour to explain unexpectedly large (or small) figures and contradictory figures, The key to the problem is recognising that different statistics may be updated at different stages in a complex process.

Footnote

I doubt if many people still run 11.2.0.4, so I also re-ran the test on 19.3.0.0 before publishing. The behaviour hasn’t changed since 11.2.0.4 although the query ran a little faster, perhaps due to changes in the mechanisms for this type of “connect by pump”.

11.2.0.4 stats


Name                                            Value
----                                            -----
session logical reads                      33,554,435
consistent gets                            33,554,435
consistent gets from cache                 33,554,435
consistent gets from cache (fastpath)      33,554,431
no work - consistent read gets             33,554,431
index scans kdiixs1                        33,554,433
buffer is not pinned count                 16,777,219


19.3.0.0 stats

Name                                            Value
----                                            -----
session logical reads                      16,843,299
consistent gets                            16,843,299
consistent gets from cache                 16,843,299
consistent gets pin                        16,843,298
consistent gets pin (fastpath)             16,843,298
no work - consistent read gets             16,790,166
index range scans                          33,554,433
buffer is not pinned count                 16,790,169

Some changes are trivial (like the change of name for “index scans kdiixs1”) some are interesting (like some gets not being labelled as “pin” and “pin (fastpath)”), some are baffling (like how you can manage 33M index range scans while doing only 16M buffer gets!)