Oakies Blog Aggregator

Finding the Next Primary Key Value, a Pre-fixed Solution

January 20, 2012: A request for assistance came in from an ERP mailing list. The original poster (OP) is running an unspecified version of Oracle Database 9i and needs a solution to generate new part numbers with prefixed characters that describe the type of part, followed by a sequential number that is [...]

SCN – What, why, and how?

In this blog entry, we will explore the wonderful world of SCNs and how the Oracle database uses SCNs internally. We will also explore a few new bugs and clarify a few misconceptions about the SCN itself.

What is SCN?

SCN (System Change Number) is the primary mechanism for maintaining data consistency in the Oracle database. SCNs are used primarily in the following areas (this is not a complete list):

  1. Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCNs). Given redo records from two threads (as in the case of RAC), recovery will order them by SCN, essentially maintaining a strict sequential order. As explained in my paper, every redo record also has multiple change vectors.
  2. Every data block also has a block SCN (aka block version). In addition, a change vector in a redo record carries the expected block SCN, meaning a change vector can be applied to one and only one version of the block. The code checks whether the target SCN in a change vector matches the block SCN before applying the redo record; if there is a mismatch, corruption errors are thrown.
  3. Read consistency also uses SCNs. Every query has a query environment, which includes the SCN at the start of the query. A session can see a transaction's changes only if that transaction's commit SCN is lower than the query environment SCN (a small flashback-query sketch follows this list).
  4. Commit. Every commit generates an SCN, aka the commit SCN, that marks a transaction boundary. Group commits are possible too.
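
Flashback query makes the SCN-based visibility rule of point 3 explicit. Here is a minimal sketch (an illustration, not from the original post; it assumes execute privilege on dbms_flashback and reuses the rs.dropme table from the test case later in this post):

-- Capture the current SCN into a bind variable.
variable scn number
exec :scn := dbms_flashback.get_system_change_number

-- ... other sessions insert into rs.dropme and commit here;
-- their commit SCNs are higher than :scn ...

-- A query as of :scn cannot see changes whose commit SCN is above it.
select count(*) from rs.dropme as of scn :scn;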

SCN format

The SCN is a huge number with two components: base and wrap. Wrap is a 16-bit number and base is a 32-bit number, giving the format wrap.base. When the base exceeds 4 billion, the wrap is incremented by 1; essentially, the wrap counts the number of times the base has wrapped around 4 billion. A few simple SQL scripts will illustrate this:

In the SQL statement below, we use a dbms_flashback package call to get the current system change number, and we also convert that number to hex format to break down the SCN.

col curscn format 99999999999999999999999

select to_char(dbms_flashback.get_system_change_number,'xxxxxxxxxxxxxxxxxxxxxx'),
dbms_flashback.get_system_change_number curscn from dual;

TO_CHAR(DBMS_FLASHBACK. CURSCN
----------------------- ------------------------
280000371 10737419121

Here, the hex value of the SCN is 0x280000371 and the decimal value is 10737419121. The hex value 0x280000371 can be split into two components, better written as 0x2.80000371, where 0x2 is the wrap and 0x80000371 is the hex representation of the base. To verify the base and wrap, we can put them back together to reconstruct the SCN: essentially, multiply the wrap by 4 billion and add the base to get the SCN in number format. The script below shows the output; note that the two numbers match.

col n2 format  99999999999999999999999

select to_number(2,'xxxxxxx') * 4 * power(2,30) + to_number(80000371,'xxxxxxxxxxxxxxxxxxxxxx') n2 from dual;

                      N2
------------------------
             10737419121

Continuing this logic, the maximum value of the wrap defines the maximum value of the SCN, i.e. maximum wrap * 4 billion = 65536 * 4 * power(2,30) = 281,474,976,710,656, or about 281 trillion values.

Does each change increment SCN?

Not necessarily. The SCN is not incremented for every change. For example, in the script below, we change the table 1000 times, but only a few SCNs are generated.

create table  rs.dropme (n1 number , n2 number);
test_case_scn.sql:
--------------cut --------------
col curscn format 99999999999999999999999
select dbms_flashback.get_system_change_number curscn from dual;
begin
 for i in 1 .. 1000
 loop
 insert into rs.dropme values(i, i);
 end loop;
end;
/
select dbms_flashback.get_system_change_number curscn from dual;
------------cut -----------------
alter system switch logfile;
SQL> @test_case_scn
  CURSCN
------------------------
10737428262
PL/SQL procedure successfully completed.

CURSCN
------------------------
10737428271
SQL> alter system switch logfile;
System altered.
Even though there were 1000 changes to the table, the SCN increased by just 9. If we dump the redo records using the script dump_last_log.sql (inlined at the end of this post), we can see that redo records have both an SCN and a SUBSCN, as shown below. Many redo records share the same SCN and SUBSCN combination.
REDO RECORD - Thread:1 RBA: 0x000010.0000001c.018c LEN: 0x00fc VLD: 0x01
SCN: 0x0002.8000fb87 SUBSCN:  1 01/19/2012 09:14:27
REDO RECORD - Thread:1 RBA: 0x000010.0000001d.0098 LEN: 0x00fc VLD: 0x01
SCN: 0x0002.8000fb87 SUBSCN:  1 01/19/2012 09:14:27
REDO RECORD - Thread:1 RBA: 0x000010.0000001d.0194 LEN: 0x00fc VLD: 0x01
SCN: 0x0002.8000fb87 SUBSCN:  1 01/19/2012 09:14:27
REDO RECORD - Thread:1 RBA: 0x000010.0000001e.00a0 LEN: 0x00fc VLD: 0x01
SCN: 0x0002.8000fb87 SUBSCN:  1 01/19/2012 09:14:27
...

Database links and SCNs

Database-link-based transactions can increase SCNs too. For example, say three databases, db1, db2, and db3, participate in a distributed transaction, and their current SCNs are 1000, 2000, and 5000 respectively. At commit time, a co-ordinated SCN is needed for the distributed transaction, and the maximum SCN among the participating databases is chosen; the SCN of all three databases will be raised to 5000.
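
You can watch this synchronization happen. A sketch (db2_link is a hypothetical database link name, not one from the original post):

-- Local SCN before any distributed activity.
select current_scn from v$database;

-- Any call over a database link synchronizes SCNs between the two databases.
select count(*) from dual@db2_link;

-- If the remote SCN was higher, the local SCN jumps to at least that value.
select current_scn from v$database;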

Can you run out of SCN?

As you saw earlier, the maximum SCN hard limit is about 281 trillion. In addition, there is also a soft limit, imposed by the Oracle code as a protection mechanism. If the next SCN would exceed the soft limit, ORA-600 [2252] is raised and the operation is cancelled. For example, in the case of a database-link-based distributed transaction, if the co-ordinated SCN is greater than the soft limit, an ORA-600 is raised.

This soft limit is calculated using the formula (number of seconds since 1/1/1988) * 16384. As the number of seconds since 1/1/1988 is continuously increasing, the soft limit rises at a rate of 16K per second. Unless your database is running full steam, generating over 16K SCNs per second, you won't run into that soft limit easily. [But you could trigger ORA-600 [2252] by resetting your server clock back towards 1/1/1988.]
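
The current soft limit can be estimated with a quick query. This is a sketch of the formula above (the internal code may differ in detail):

select round((sysdate - to_date('01/01/1988','MM/DD/YYYY')) * 24*60*60) seconds_since_1988,
       round((sysdate - to_date('01/01/1988','MM/DD/YYYY')) * 24*60*60 * 16384) soft_limit
from dual;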

A problem arises if many interconnected databases each generate SCNs at a higher rate in a round-robin fashion: DB1 generates 20K SCNs per second for the first 5 minutes, DB2 for the next 5 minutes, DB3 for the next 5 minutes, and so on. In that case, all three databases sustain a 20K-SCNs-per-second rate. Each database slowly catches up to the soft limit (gaining exactly 1 second for every 4 seconds of elapsed time), and even then it would take many years to reach the soft limit, assuming the databases stay continuously active. But then there is that infamous hot backup bug, hated by my client.

(BTW, at the normal 16K-per-second rate, it would take 544 years to reach the hard limit: 65536*4*1024*1024*1024 / 16384 / 60/60/24/365.)

Here is an example of the ORA-600 [2252] error. In the line printed below, 2838 is the SCN wrap and 395527372 is the SCN base; converted to a decimal SCN, this is in the 12-trillion range. A database-link-based connection was trying to raise the SCN past that 12-trillion value, but the database rejected it because the SCN would have exceeded the soft limit.

 ORA-00600: internal error code, arguments: [2252], [2838], [395527372], [], [], [], [], [], [], [], [], []
BTW, in 10g this 16K-per-second rate was hard-coded, but in 11gR2 the limit is controlled by an underscore parameter, _max_reasonable_scn_rate, defaulting to 32K.
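
If you have SYS access, the parameter's value can be checked with the usual x$ query (a sketch; underscore parameters do not show up in v$parameter):

select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
  and i.ksppinm = '_max_reasonable_scn_rate';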

Hot backup bug

Most DBAs use RMAN for backups, but there are still a few databases that use hot backup mode, primarily because of disk-mirror-based backups. It is common to see a higher SCN rate when the database is altered into hot backup mode. An SGA variable array keeps track of the backup mode at the file level. When you alter the database out of backup mode, the SGA variables are reset and the higher SCN rate returns to normal. Due to a bug (12371955), that SGA variable is not reset, leaving the database thinking it is still in hot backup mode, so it keeps generating SCNs at the higher rate. (If you recycle the database later, of course, the variable is reset and the rate returns to normal.) There is a way to dump the SGA variable to check whether the database currently thinks it is in hot backup mode.

Due to this bug, a highly active database can sustain an SCN rate over 16K per second. Over a long period (in fact, it will probably take many years) the SCN catches up to the soft limit. Once the soft limit is reached, the next SCN update throws ORA-600 [2252] errors. Of course, this SCN growth is propagated to other databases over database links. As the soft limit calculation is time based, the time zone of the server also matters: for example, if the values are close to the soft limit, databases running in the US Eastern time zone will have a higher soft limit, by 3*60*60*16384 (about 177 million), than databases running in the Pacific time zone.

Salient points of the bug are:
  1. There is no danger of corruption; sessions might die or the database might throw ORA-600 errors. In rare cases, the database has to be kept down for a few hours, or distributed transactions removed from the database, so that the headroom between the soft limit and the current SCN widens.
  2. This bug affects you only if you use the 'ALTER DATABASE' command to put the database in backup mode. If you use the 'ALTER TABLESPACE' command for backups, you are not affected by this bug.
  3. The SCN rate is directly related to activity. If the database has low activity, the SCN rate is lower, even when the database is altered into backup mode with this bug.
Oracle has released a script that can tell you how close your database is to the soft limit, aka the SCN headroom. So, first check whether your database has an SCN issue at all; the script is available under bug 13498243 and tells you how many days of SCN headroom you have.
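
The official check is that script, but the idea behind it can be sketched in a single query using the 16K-per-second formula (an approximation only; use the script from bug 13498243 for the real answer):

select round(((sysdate - to_date('01/01/1988','MM/DD/YYYY')) * 24*60*60*16384
              - current_scn) / (16384 * 60*60*24), 1) headroom_days
from v$database;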

How to check SCN rate?

There are multiple ways to check SCN rate in your database.
Method 1: smon_scn_time keeps track of the mapping between time and SCN at approximately 5-minute granularity, and can be used to measure the SCN rate; see the code below. Although this is the easier check, remember that there is no easy way to tell whether an SCN increase is due to intrinsic activity in the database or due to an external database raising the SCN through distributed transaction activity. We will discuss how to differentiate the two later.
with t1 as(
select time_dp , 24*60*60*(time_dp - lag(time_dp) over (order by time_dp)) timediff,
  scn - lag(scn) over(order by time_dp) scndiff
from smon_scn_time
)
select time_dp , timediff, scndiff,
       trunc(scndiff/timediff) rate_per_sec
from t1
order by 1
/
TIME_DP                TIMEDIFF    SCNDIFF RATE_PER_SEC
-------------------- ---------- ---------- ------------
19-JAN-2012 15:23:21        315       2931            9
19-JAN-2012 15:25:46        145        708            4
19-JAN-2012 15:28:00        134       1268            9
19-JAN-2012 15:30:48        168        597            3
19-JAN-2012 15:35:51        303       4148           13
19-JAN-2012 15:36:47         56        103            1
19-JAN-2012 15:42:14        327        671            2
Method 2:
v$archived_log can also be used to check the SCN rate of the database. The code below computes the SCN rate per second from v$archived_log. Even if you are running RAC, a query against v$archived_log is sufficient, as it holds the archived logs from all threads. If there is an SCN spike, say from a remote database, you will see it in the output of the query below.
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
col first_change# format 99999999999999999999
col next_change# format 99999999999999999999
select  thread#,  first_time, next_time, first_change# ,next_change#, sequence#,
   next_change#-first_change# diff, round ((next_change#-first_change#)/(next_time-first_time)/24/60/60) rt
from (
select thread#, first_time, first_change#,next_time,  next_change#, sequence#,dest_id from v$archived_log
where next_time > sysdate-30 and dest_id=1
order by next_time
)
order by  first_time, thread#
/

   THREAD# FIRST_TIME                   FIRST_CHANGE#          NEXT_CHANGE#  SEQUENCE#       DIFF         RT
---------- -------------------- --------------------- --------------------- ---------- ---------- ----------
         2 12-JAN-2012 16:10:30              25995867              26026647        308      30780          0
         1 17-JAN-2012 14:05:00              26026649              26028427        555       1778          1
         1 17-JAN-2012 14:05:00              26026649              26028427        555       1778          1
         2 17-JAN-2012 14:05:00              26026647              26028432        309       1785          1
         2 17-JAN-2012 14:05:00              26026647              26028432        309       1785          1
         1 17-JAN-2012 14:27:21              26028427            1073743815        556 1047715388     814076
         2 17-JAN-2012 14:48:48              26028157              26028230          1         73          3
         2 18-JAN-2012 14:22:23              26076103           10737418303          3 1.0711E+10    7448778
         1 18-JAN-2012 14:22:24              26076106           10737427850          5 1.0711E+10    1458319
         1 18-JAN-2012 16:24:49           10737427850           10737427884          6         34          2
         1 18-JAN-2012 16:25:03           10737427884           10737428252          7        368          1

In the output above, you can see huge SCN jumps in the DIFF column: about 1 billion in the log beginning at 14:27 on 17-JAN, and about 10 billion on 18-JAN. You can't easily tell whether such an increase came from external systems or from intrinsic activity. In this specific case, because it is such an extreme SCN increase, I would guess that it came from external systems. (Usually this level of SCN increase will not happen at your production site; this example is just to illustrate the concept.)

What happens in RAC?

In RAC, the instance that receives the update from an external system raises the database SCN to the new, higher SCN. As soon as the other instances ask for the next SCN, the increase is propagated to them too.
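
You can observe how closely the instances track each other with a quick query (a sketch):

-- In a RAC database, the per-instance view of the current SCN stays closely in sync.
select inst_id, current_scn from gv$database order by inst_id;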

Can two threads get same SCN?

The obvious answer is no; the correct answer is yes. For example, the redo records below, taken from two threads, have the exact same SCN and SUBSCN. This is not a problem or a concern: buffer changes are protected by the GCS layer code, and row changes are protected by the locking mechanism.
node 1:

REDO RECORD - Thread:1 RBA: 0x000010.0000007f.0114 LEN: 0x0138 VLD: 0x01
SCN: 0x0002.8000fb91 SUBSCN:  1 01/19/2012 09:14:27

node 2:
REDO RECORD - Thread:2 RBA: 0x000007.00000003.0010 LEN: 0x0068 VLD: 0x05
SCN: 0x0002.8000fb91 SUBSCN:  1 01/19/2012 09:14:27

Intrinsic vs Extrinsic SCN growth

There is a statistic that can also guide us in determining whether SCN growth is intrinsic or extrinsic. The statistic 'calls to kcmgas' gives an approximate number of calls made to allocate SCNs. It is an estimate only, not an absolute count of generated SCNs. Let's explore this statistic with a script and a helper function.


create or replace function get_my_statistics (l_stat_name varchar2)
return number as
  l_value number;
begin
  -- Return the current session's value for the given statistic name.
  select ses.value into l_value
  from v$sesstat ses, v$statname stat
  where stat.statistic# = ses.statistic#
    and ses.sid = (select sid from v$mystat where rownum <= 1)
    and stat.name = l_stat_name;
  return l_value;
end;
/
alter system switch logfile;
host sleep 5
create table rs.dropme (n1 number , n2 number);
col curscn format 99999999999999999999999
select dbms_flashback.get_system_change_number curscn , get_my_statistics('calls to kcmgas') kcmgas from dual;
begin
for i in 1 .. 100000
loop
insert into rs.dropme values(i, i);
end loop;
end;
/
select dbms_flashback.get_system_change_number curscn , get_my_statistics('calls to kcmgas') kcmgas from dual;
alter system switch logfile;

Output of the above script is:
                  CURSCN     KCMGAS
------------------------ ----------
             10737522265          0
 PL/SQL procedure successfully completed.
                  CURSCN     KCMGAS
------------------------ ----------
             10737523122        826
From the output, we can see 857 SCNs generated vs 826 kcmgas calls from this session. Other background processes also generate SCNs, which would explain the difference. Even at the instance level it doesn't match exactly, but multiplying the 'calls to kcmgas' statistic by 1.1 gives a better estimate. This method can be used to identify whether SCN growth in a database is intrinsic or extrinsic. It can also be used to identify the instance generating the most SCNs in a RAC cluster, or the database generating the most SCNs in a complex interconnected environment.
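
At the instance level, the same statistic is visible in v$sysstat, or across a RAC cluster in gv$sysstat, which is what you would sample over time to find the instance generating the most SCNs. A sketch:

-- Approximate SCN allocation calls per instance since startup.
select inst_id, value kcmgas_calls
from gv$sysstat
where name = 'calls to kcmgas'
order by inst_id;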

SCN Vulnerability issue

I am not going to discuss the details of this vulnerability at all, but note that it requires access to the production database, and DBAs with security in mind don't allow production access that easily anyway. So, in my opinion, it is a problem that must be addressed, but you would need a malicious DBA with expert-level knowledge to misuse it. Follow Oracle Support's direction on this one, as I usually stay away from discussing security vulnerabilities. Check here for details.

Summary

I have been holding off on publishing this blog entry for many months. Since this issue is now in the public knowledge domain, I can share it without any repercussions. In a nutshell, understanding SCN generation and its intrinsic details is important, and armed with the scripts above, you can review your own environment.

The dump_last_log.sql script was not rendering properly in HTML format, so here it is printed inline:

-------------------------------------------------------------------------------------------------
--  Script : dump_last_log.sql
-------------------------------------------------------------------------------------------------
-- This script will dump the last log file.
--   If the log file is big with enormous activity, this might take much resource.
--
--
--  Author : Riyaj Shamsudeen
--  No implied or explicit warranty !
-------------------------------------------------------------------------------------------------
set serveroutput on size 1000000
declare
  v_sqltext varchar2(255);
begin
  -- Build a dump command for the log file one sequence before the CURRENT log
  -- of this instance's thread.
  select 'alter system dump logfile '||chr(39)||member||chr(39)
  into v_sqltext
  from v$log lg, v$logfile lgfile
  where lg.group# = lgfile.group#
    and lg.sequence# = (select sequence# - 1 from v$log
                        where status = 'CURRENT'
                          and thread# = (select thread# from v$instance))
    and lg.thread# = (select thread# from v$instance)
    and rownum < 2;
  dbms_output.put_line ('Executing :'||v_sqltext);
  execute immediate v_sqltext;
end;
/

update 1: Fixed formatting and typo.

update 2: Fixed to read "Essentially, multiply wrap by 4 billion and add base to get the SCN in number format".
update 3: updated a typo in a code fragment.

The strangest Oracle problem I ever encountered – can you guess the cause?

Before I joined Blue Gecko, I did independent remote DBA work, and called myself ORA-600 Consulting. Stemming from my hair-raising experiences in the trenches at Amazon in the late ’90s / early 2000s, I decided to specialize in emergency DBA work for companies in the midst of crises (I know, great idea for someone who wanted to get away from the Amazon craziness, right?).

One day in 2009, a company in Florida called my cell phone at 2AM. They described their problem as follows:

We have a 32-bit Intel server running Red Hat Enterprise Linux 4 and Oracle Database Enterprise Edition 9.2.0.1. There are four databases ranging in size from 20G to 100G. The storage is EXT3 filesystems on partitions of an Apple Xserve RAID5 array.

We had a power outage yesterday, and the database server powered down and booted back up. Prior to yesterday, it has not rebooted for about one year. We have been running trouble-free for the previous year. Upon reboot, Oracle started automatically, but all of the databases appeared as they did about one year ago. It is like the database hasn’t been saving the changes we have been making for the past year. None of the inserts, updates or deletes made in the past year are present in the databases. We are absolutely flummoxed. Please help!

I logged into the server and it was just as they described. Even the alert log and messages files ended suddenly about one year prior, and picked up again on the day of the most recent reboot. There was no trace of the intervening 12 months of work. The customer was ready to resort to their backups, but wanted to understand the problem before they proceeded. In addition, restoring backups would mean losing the last 24 hours of transactions, since archivelogs had not gone to tape for that long, and they were missing just like everything else from before the most recent reboot.

They weren’t the only ones who were flummoxed. I just sat there thinking, “where do I start?” After some poking around, though, I solved the problem. Any guesses what went wrong here? I’ll post the solution in about a week. No fair posting the solution if I’ve told you this story before!


Fast Recovery Area

I'd be grateful if anyone who is responsible for Oracle database backups could take a quick (3-question) survey on their use, or otherwise, of the Fast (Flash) Recovery Area feature: http://www.surveymonkey.com/s/J93L7WT All responses are anonymous.

Quiz Night

In my previous post, I made the comment:

In general, if you have a three-column index that starts with the same columns in the same order as the two-column index then the three-column index will be bigger and have a higher clustering_factor.

So what scenarios can you come up with that fall outside the general case?
Alternatively, what argument could you put forward to justify the general claim?

I’ll try to respond to comments on this post a little more quickly than the last one, but I still have quite a lot of other comments to catch up on.

Buffer Busy Waits and Disk file operations I/O

 

A database is experiencing high waits on 'buffer busy waits'. Here is an example period where 5 sessions were completely blocked on buffer busy waits for 4 minutes:

  select
       to_char(min(sample_time),'HH24:MI') minst,
       to_char(max(sample_time),'HH24:MI') maxst,
       count(*),
       session_id,
       ash.p1,
       ash.p2,
       ash.SQL_ID,
       blocking_session bsid
  from DBA_HIST_ACTIVE_SESS_HISTORY ash
  where event='buffer busy waits'
     and session_state='WAITING'
group by
       session_id
       ,sql_id
       ,blocking_session
       ,ash.p1
       ,ash.p2
Order by  1
/

gives

MINST MAXST   COUNT(*) SESSION_ID         P1         P2 SQL_ID           BSID
----- ----- ---------- ---------- ---------- ---------- ------------- -------
21:54 21:58         26       1526       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25        528       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25       1514       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         26        777       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25         33       1812     278732 3gbsbw6w8jdb3      11

All five sessions are waiting on the same file and block, held by one session (SID 11), and all are executing the same statement, a SELECT FOR UPDATE.

The blocker is executing the same SQL statement and spends those 4 minutes waiting on "Disk file operations I/O":

select
       min(to_char(sample_time,'HH24:MI')) minst,
       max(to_char(sample_time,'HH24:MI')) maxst,
       session_id,
       substr(event,0,30) event,
       count(*),
       ash.SQL_ID,
       blocking_session bsid
from DBA_HIST_ACTIVE_SESS_HISTORY ash
where
       session_id in  ( 11)
group by
     event,sql_id,session_id, blocking_session
order by 1
/

gives

MINST MAXST EVENT                            COUNT(*) SQL_ID           BSID
----- ----- ------------------------------ ---------- ------------- -------
21:54 21:58 Disk file operations I/O               26 3gbsbw6w8jdb3

What are “Disk file operations I/O”?

From the docs , http://docs.oracle.com/cd/E18283_01/server.112/e17110/waitevents003.htm#insertedID40

Disk file operations I/O

This event is used to wait for disk file operations (for example, open, close, seek, and resize). It is also used for miscellaneous I/O operations such as block dumps and password file accesses.

FileOperation - type of file operation
fileno        - file identification number
filetype      - type of file (for example, log file, data file, and so on)

What kinds of FileOperation and filetype are occurring?
select p1,p3, count(*) from
dba_hist_active_sess_history
where event ='Disk file operations I/O'
group by p1,p3
/
        P1         P3   COUNT(*)
---------- ---------- ----------
         2          1        193
         2          3         14
         4          4          1
         2          2       4459
         3          4        160
         1         18        103
So it is mainly FileOperation 2 with filetype 2.
What are the file types? I'm not sure, but thanks to Andy Klock on Oracle-L, this looks like a possibility:
select distinct filetype_id, filetype_name from DBA_HIST_IOSTAT_FILETYPE order by 1;

FILETYPE_ID FILETYPE_NAME
----------- ------------------------------
          0 Other
          1 Control File
          2 Data File
          3 Log File
          4 Archive Log
          6 Temp File
          9 Data File Backup
         10 Data File Incremental Backup
         11 Archive Log Backup
         12 Data File Copy
         17 Flashback Log
         18 Data Pump Dump File

What about FileOperation = 2? After a call to Oracle Support, it looks like on this version (11.2.0.3) the values are:

    1 file creation
    2 file open
    3 file resize
    4 file deletion
    5 file close
    6 wait for all aio requests to finish
    7 write verification
    8 wait for miscellaneous io (ftp, block dump, passwd file)
    9 read from snapshot files

putting this together gives a script like:

col file_type for a20
col file_operation for a20
select
    decode(p3,0 ,'Other',
              1 ,'Control File',
              2 ,'Data File',
              3 ,'Log File',
              4 ,'Archive Log',
              6 ,'Temp File',
              9 ,'Data File Backup',
              10,'Data File Incremental Backup',
              11,'Archive Log Backup',
              12,'Data File Copy',
              17,'Flashback Log',
              18,'Data Pump Dump File',
                  'unknown '||p3)  file_type,
    decode(p1,1 ,'file creation',
              2 ,'file open',
              3 ,'file resize',
              4 ,'file deletion',
              5 ,'file close',
              6 ,'wait for all aio requests to finish',
              7 ,'write verification',
              8 ,'wait for miscellaneous io (ftp, block dump, passwd file)',
              9 ,'read from snapshot files',
                 'unknown '||p1) file_operation,
    decode(p3,2,-1,p2) file#,
    count(*)
from dba_hist_active_sess_history
where event ='Disk file operations I/O'
group by p1,p3,
    decode(p3,2,-1,p2)
/

with output like

FILE_TYPE            FILE_OPERATION            FILE#   COUNT(*)
-------------------- -------------------- ---------- ----------
Control File         file open                     0        193
Data File            file open                    -1       4460
Archive Log          file deletion                 0          1
Log File             file open                     0         14
Data Pump Dump File  file creation                 0        103
Archive Log          file resize                   8        160

The -1 for data files groups all the data files into one line; otherwise, in the case above, there would have been over 200 lines of output.
For the data files, how do the I/O latencies compare with the "Disk file operations I/O" waits?
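
The query that produced the output below isn't shown in the original post; here is a sketch of an AWR query that would produce this shape of output (the BHOU, AVG_MS, and CT column names are assumptions matching the output):

select to_char(sn.end_interval_time,'HH24MI') bhou,
       e.event_name,
       round((e.time_waited_micro - b.time_waited_micro) /
             nullif(e.total_waits - b.total_waits, 0) / 1000, 2) avg_ms,
       e.total_waits - b.total_waits ct
from dba_hist_system_event b, dba_hist_system_event e, dba_hist_snapshot sn
where e.snap_id = b.snap_id + 1
  and e.dbid = b.dbid
  and e.instance_number = b.instance_number
  and e.event_id = b.event_id
  and sn.snap_id = e.snap_id
  and sn.dbid = e.dbid
  and sn.instance_number = e.instance_number
  and e.event_name in ('Disk file operations I/O',
                       'db file scattered read',
                       'db file sequential read')
order by e.event_name, sn.end_interval_time
/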

BHOU EVENT_NAME                        AVG_MS           CT
---- ------------------------------ --------- ------------
1054 Disk file operations I/O            2.00       13,547
1130 Disk file operations I/O            1.52       10,658
1200 Disk file operations I/O            1.57        9,846
1230 Disk file operations I/O            2.45        8,704
1300 Disk file operations I/O            3.84        9,526
1330 Disk file operations I/O            2.39       11,989
1400 Disk file operations I/O            1.68       14,698
1430 Disk file operations I/O            2.89       14,863
1500 Disk file operations I/O          860.85       10,577
1530 Disk file operations I/O           12.97       11,783
1600 Disk file operations I/O          623.88       10,902
1630 Disk file operations I/O          357.75       12,428
1700 Disk file operations I/O          294.84       10,543
1730 Disk file operations I/O           12.97       10,623
1800 Disk file operations I/O          461.91       14,443
1830 Disk file operations I/O           12.83       18,504
1900 Disk file operations I/O          443.37        9,563
1930 Disk file operations I/O          237.39       11,737
2000 Disk file operations I/O          542.44       13,027
2033 Disk file operations I/O            6.11        8,389
2100 Disk file operations I/O           16.85       10,561
2130 Disk file operations I/O          306.17        9,873
2200 Disk file operations I/O           20.83       11,335
2230 Disk file operations I/O           12.92       10,158
2300 Disk file operations I/O           13.42       11,025
2330 Disk file operations I/O           15.01       10,883
0000 Disk file operations I/O            5.33        8,533
1054 db file scattered read              1.50       92,394
1130 db file scattered read              1.33       73,243
1200 db file scattered read              1.82      122,988
1230 db file scattered read              2.53      255,474
1300 db file scattered read              4.26      288,144
1330 db file scattered read              2.47      308,045
1400 db file scattered read              2.60       91,684
1430 db file scattered read              3.56      176,324
1500 db file scattered read              4.95      621,658
1530 db file scattered read              5.11      227,565
1600 db file scattered read              5.86      472,804
1630 db file scattered read              9.44      224,984
1700 db file scattered read              9.40      165,238
1730 db file scattered read              7.78      349,003
1800 db file scattered read              6.93      252,761
1830 db file scattered read              7.79      151,760
1900 db file scattered read              5.48      165,369
1930 db file scattered read              3.09      200,868
2000 db file scattered read              3.45      136,647
2033 db file scattered read              5.17      136,330
2100 db file scattered read             11.16      103,799
2130 db file scattered read             10.44      118,025
2200 db file scattered read             20.02      127,638
2230 db file scattered read             13.66      157,210
2300 db file scattered read             10.95       98,493
2330 db file scattered read              8.39      149,606
0000 db file scattered read              4.16      230,075
1054 db file sequential read             3.04    1,152,102
1130 db file sequential read             7.75      165,262
1200 db file sequential read             6.74       23,876
1230 db file sequential read             5.30       10,026
1300 db file sequential read             3.34      496,681
1330 db file sequential read             1.58    1,253,208
1400 db file sequential read             8.86      239,247
1430 db file sequential read            12.91      191,376
1500 db file sequential read            19.97       73,061
1530 db file sequential read            17.80       43,662
1600 db file sequential read            12.41      144,741
1630 db file sequential read             8.99      411,254
1700 db file sequential read             8.03      540,138
1730 db file sequential read             9.26      422,317
1800 db file sequential read            19.16      155,787
1830 db file sequential read             6.01      641,517
1900 db file sequential read             4.79      573,674
1930 db file sequential read             2.72      824,991
2000 db file sequential read             1.59      504,650
2033 db file sequential read             1.88      324,741
2100 db file sequential read            24.32       74,026
2130 db file sequential read            16.05       67,545
2200 db file sequential read            15.52      219,928
2230 db file sequential read             9.87      259,956
2300 db file sequential read            15.18      122,362
2330 db file sequential read             9.97       94,124
0000 db file sequential read            14.19       50,264

Tough to see what is going on. Let's lay it out in a bubble chart: latency in ms on the Y axis, time on the X axis (AWR snapshots every half hour), and the size of each ball representing the number of requests:

The "Disk file operations I/O" waits don't seem to correlate with disk activity or latency. When disk activity is high, on the left, with much data coming off spindle (i.e. 8ms average), "Disk file operations I/O" is fast, i.e. 2ms. But at 20:00 there is medium I/O activity, much of it coming from cache (i.e. 2ms average), and the disk file operations are slow.

The database is not using ASM. The number of data files is 900. The open file descriptor limit is set at 8192.

Might be time to run some DTrace on the file operations to get a better idea of what is going on.

UPDATE:

Full disclosure: this database was running over NFS from Delphix, whose OS supports DTrace. I ran DTrace on all the NFS operations and none took over 2 seconds, which is far from the 4 minutes reported by the database.

I ended up writing a script to run pstack on sessions that had been waiting on "Disk file operations I/O" for more than 10 seconds. When the first pstack was taken, there was no guarantee the process was still in the same wait, but if a second pstack was taken for the same process, that guaranteed the first pstack was taken during the wait. All such waiters that had two pstacks were waiting, in the first pstack, in

_close_sys()

So, although Oracle appears to be reporting slow file open operations, the pstacks show the problem is on file close.
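
As an aside, here is a sketch of the kind of query that could feed such a pstack script with OS process IDs (an illustration, not the actual script used; it uses v$session.seconds_in_wait for the 10-second filter):

-- OS pids of sessions currently waiting on the event for more than 10 seconds.
select p.spid
from v$session s, v$process p
where s.paddr = p.addr
  and s.state = 'WAITING'
  and s.event = 'Disk file operations I/O'
  and s.seconds_in_wait > 10;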

Googling for _close_sys() on HP-UX (the database was running on HP-UX), I found a couple of links, but nothing definitive:

http://www.unix.com/hp-ux/30396-socket-close-hangs-cpu-go-100-a.html
http://h30499.www3.hp.com/t5/Networking/CPU-usage-is-100-when-close-sys-call-is-made/td-p/4609196

Oracle reports a few bugs with the wait event "Disk file operations I/O", but none seem applicable to this installation:

Bug 10361651 – File open may hang/spin forever if dNFS is unresponsive [ID 10361651.8]

This bug is supposed to be fixed in 11.2.0.3 and applies to dNFS, though the issue sounds like it would fit: if an I/O operation takes "long", then Oracle's wait times get corrupted and the process waits much, much too long on the event.

Using R and Oracle tracefiles

If you are not familiar with R, this is the description from the R site: "R is a language and environment for statistical computing and graphics." I encountered R at the Erasmus university, where I am working on a project with DNA data which is stored in an Oracle database (see http://huvariome.erasmusmc.nl/; it's down at the moment).

R is about statistics, and it helps greatly because of its ability to graph statistical data, which you can use to correlate figures. An example is shown here by Greg Rahn (Structured Data), and here is an example by Stefan Moeding (the name is a guess: I've composed it from the author name on the blog's articles and guessed that the site name (www.moeding.net) is his surname).

I will probably blog about using database contents with R in the future (most probably active session history (ASH) data). This blog post is about using R with Oracle database SQL trace files that include waits (also known as 10046 level 8 traces, or 10046/8 traces).

Getting hands-on

If you want to experiment with this, the first thing you need is R itself. My advice would be to get and use RStudio, which is a free and open-source R integrated development environment (IDE).

Next you need a real Oracle tracefile. If you just want to experiment and don't have a database at hand to generate one, I have a sample trace file: v11203_ora_19295.trc. I tried to read the tracefile directly into R (R is very good at consuming CSV data) and process it to make it usable, but that was very hard. So, to get the tracefile into a CSV-like format, I use a small sed (stream editor; available on Mac OS X, Linux, and most modern platforms except Windows) script called waits.sed. This script outputs all WAIT lines in the tracefile in a form that can easily be used as a CSV file.

This is the content of the waits.sed script:

# format WAIT rows from an oracle 10046/sql trace to
# 1:WAIT 2:cursor number 3:wait name 4:elapsed time 5:p1 6:p2 7:p3 8:object number 9:time
s/^\(WAIT\)\ #\([0-9]*\):\ nam='\(.*\)'\ ela=\ *\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ obj#=\([0-9\-]*\)\ tim=\([0-9]*\)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/

In order to make a file with only waits, use the following command:

sed -f waits.sed v11203_ora_19295.trc | grep ^WAIT > v11203_ora_19295.trc.waits

This creates a new file called v11203_ora_19295.trc.waits with only the waits in a CSV format.

Now we are at the point where we get to use R! Open RStudio, and open this R script: tracewaits.R. Now choose 'Edit' > 'Run code' > 'Run all'. If the v11203_ora_19295.trc.waits file is in the correct directory, the 'Plots' tab at the lower right will display the following picture:

Alright, not much to see yet! This tells us there are two cursors with wait lines in this tracefile. To choose the waits of the one cursor of interest in this tracefile, uncomment (remove the hash from) the following line in the R script:

#trc <- subset(trc, cursor == "47388340751480")

Now run the script again. Now the graph show something far more interesting:

What do we see here? The Y axis shows the value in the 'p3' column of the waits. Most waits are 'direct path read' (light blue), for which 'p3' is the number of blocks read per IO call. The X axis numbers are the seconds of the timestamp in the 'tim' column.

We see the number of blocks read per IO call is low for the first few IO calls, then builds up to 126, and later becomes 128 (the maximum in this case). What we see here is automatic extent allocation: the first few allocated extents are very small, then they get bigger, and so on.

Now let’s add the time waited for the waits to the graph, and see if there’s a correlation between blocks read per wait and time waited per wait:

Uncomment the following line:

#p <- p + geom_line(aes(tim, ela,colour="elapsed time")) + stat_smooth(aes(y=ela))

And run the script again. This adds the time waited (the 'ela' column) to the graph as a line, plus a trendline (stat_smooth):

The 'ela' (elapsed time) line shows why the bars in the graph are not evenly spaced: the IO time fluctuates heavily (please mind this is because of my overloaded laptop :) ). Also, looking at the beginning, I can see that the IO times (on my laptop at least) for the small IOs are lower than for the bigger IOs.
(Add the following line to the script to zoom in on the beginning:)

trc <- subset(trc, p2 <= 642)

Which gives this graph:

This clearly shows that the bigger (126-block) IOs have a higher response time.

I hope this encourages people to use R and to extend the waits.sed script (it's a very simple script now and ONLY handles waits, not parse, execute, fetch, etc.) and the R script. Using 'facet_grid' you can stack multiple graphs of tracefiles/waits exactly aligned with each other, so you can see any potential correlation between the two. You can also overlay multiple graphs, as shown in this example: a graph showing the number of blocks per IO and a graph showing the elapsed time per IO.

Have fun using and experimenting with R!

Tagged: oracle, R, trace, tracefile, statistics, correlation, performance

Oracle documentation on a Kindle

I recently bought myself a Kindle, the keyboard 3G version: the keyboard because I know I will want to add notes to things, and the 3G version for no better reason than some vague idea of being able to download things when I am away from my WiFi.

So, how about getting Oracle documentation onto it? You can get the Oracle manuals as PDFs (as opposed to HTML), so I knew it was possible and that others had done it before. A quick web search shows a few people have done this already; one of the best posts is by Robin Moffat.

Anyway, this is my take on it.

1) Don't just download the PDF versions of the manuals and copy them onto your Kindle. It will work, but it is not ideal. PDF files are shown as a full-page image in portrait mode, and parts are unreadable. Swap to landscape mode and most text becomes legible, and you can zoom in. In both modes there is no table of contents and none of the links between sections work. All you can do is step back and forth page by page or skip directly to a page, i.e. go to page 127. This is not so bad actually, as quite often the manual states the page to go to for a particular figure or topic.

2) Do download the MOBI format of the manuals you want, if available. Oracle started producing its manuals in Mobi and Epub formats last year. I understand that Amazon's .AZW format is based on the .MOBI (Mobipocket) format, and so the text re-flows to fit the screen of the Kindle. I've checked a few of the DBA-type manuals for V10 and V11 and Mobi files seem generally available, though not for a couple I checked for 10.1. If there is no Mobi, you can still revert to downloading the PDF version.

3) You cannot download a whole set of manuals in this format, and you won't see an option to download an individual manual in MOBI format until you go into the HTML version of the document.

I can understand that it would be a task for someone in Oracle to create a new downloadable ZIP of all books in these formats or, better still, sets covering a business function (like all DBA-type books or all developer-type books), but it would be convenient.
Anyhow, go to OTN's documentation section, pick the version you want, and navigate to the online version of the manual.

Here I go to the 11.2 version – note, I’m clicking on the online set of manuals, not the download option.


Select the HTML version of the document you want; in this case I am grabbing a copy of the Performance Tuning Guide. As you can see, this is also where you can choose the PDF version of the manual.

Once the first page comes up, you will see the options for PDF, Mobi, and Epub versions at the top right of the screen (see below). I wonder how many people have not realised the manuals are now available in these new ebook formats, given the option only appears once you are in the manual itself?

I've already clicked the Mobi link, and you can see at the bottom left of the screen-shot that it has already downloaded {I'm using Chrome, BTW}. Over my slightly dodgy 4Mb broadband connection it took only a few seconds.

4) I don't like the fact that the files are called things like E25789-01.mobi, so I rename them as I move them from my download directory to a dedicated directory. You then attach your Kindle to your computer and drag the files over to the Kindle's "documents" folder; next time you go to the main menu on the Kindle, they will appear with the correct title (irrespective of whether you renamed them or not).

5) If you download the PDFs, I would strongly suggest you rename the files before you move them to the Kindle, as they will come up with that name. I have a book called e26088 on my Kindle now; which manual is that? {Don't tell me, I know.} I have not tried renaming the file on the Kindle itself yet.

6) You don't have to use a PC as an intermediate staging area; you can download the manuals directly to your Kindle if you have a WiFi connection. Check out chapter 6 of the Kindle user guide, 4th edition, for details, but in short, you can surf the web on your Kindle. Press HOME, then MENU, and go down to EXPERIMENTAL. Click on "Launch Browser" (if you don't have wireless turned on, you should get prompted). I'd recommend you flick the Kindle into landscape mode for this next bit, and don't expect lightning-fast response. If it does not take you to the BOOKMARKS page, use the menu button to get there, and I'd suggest you do a Google search for OTN to get to the site. Once there, navigate as described before. When you click on the .Mobi file, it should download to your Kindle in a few seconds. Don't leave the page until it has downloaded, as otherwise the download will fail.

There you go: you can build up whatever set of Oracle manuals you like on your Kindle or other ebook reader and never be parted from them. Even on holiday…

I've obviously only just got going with my Kindle. I have to say, reading manuals on it is not my ideal way of reading such material {story books I am fine with}. I find panning around tables and diagrams a bit clunky, and the Kindle does not recognise the existence of chapters in the Oracle Mobi manuals, or pages for that matter. However, the table of contents works, as do links, so it is reasonably easy to move around a manual. Up until now I've carried around a set of Oracle manuals as an unzipped copy of the HTML download saved to a micro-USB stick, but some sites do not allow foreign USB drives to be used. I think I prefer reading manuals on my netbook to the Kindle, but the Kindle is very light and convenient. If I ever get one of those modern smart-phone doo-dahs, I can see me dropping the netbook in favour of the smartphone and this Kindle.

Of course, nothing beats a big desk and a load of manuals and reference books scattered across it, open at relevant places, plus maybe some more stuff on an LCD screen.

Advert: ODTUG Webcast – “Thinking Clearly about XML”

Tomorrow I will make a first-time attempt at doing a webcast for ODTUG in the US, about how to get started with XML in the Oracle database. The subject was initially inspired by Cary Millsap's great whitepaper, "Thinking Clearly About Performance", and by my attempt at doing an OakTalk during UKOUG 2011.

Read More…