Oakies Blog Aggregator

The Helsinki Platform

Nice picture from "OraDude" showing the Fat Database, or Helsinki's first observation.

Dynamic sampling and partitioned tables

------------------------------------------------------------------------------

Update January 2010: A thread on OTN mentioned this blog post together with another, seemingly contradictory post by Asif Momen.

So why do these two blog posts come to different conclusions regarding dynamic sampling and partitions with missing statistics?

This is the good thing about documented test cases: I reproduced what Asif had done and found that the significant difference between the two test cases is the existence of global-level statistics.

In my test case below, I explicitly gathered statistics at the partition level only, so there are no statistics at the global/table level (as can be seen from the output of the query against user_tab_statistics below).

Asif, on the other hand, gathered statistics at the global/table level, as can be seen from his blog post.

So the conclusion seems to be: if the query prunes to a single partition that has no statistics, dynamic sampling will be used only if no global/table-level statistics are available. If global/table-level statistics are available, the optimizer won't perform dynamic sampling and will revert to those global/table-level statistics instead.

Oddly, this doesn't seem to apply to the subpartition/partition-level case: repeating a similar setup with subpartitions that have no statistics, but with statistics available at the partition level, dynamic sampling was still used (tested on 11.1.0.7 Win32).
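Assuming the standard 10g/11g dictionary views, a quick way to see which levels actually have statistics, and therefore to predict which of the two behaviours to expect, is something like this sketch:

```sql
-- OBJECT_TYPE is TABLE, PARTITION or SUBPARTITION; a TABLE row with a
-- non-null NUM_ROWS means global statistics exist, which (per the above)
-- prevents dynamic sampling for partitions lacking their own statistics
select object_type, partition_name, subpartition_name, num_rows
from   user_tab_statistics
where  table_name = 'WR_TEST'
order  by object_type, partition_name, subpartition_name;
```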

------------------------------------------------------------------------------

Dynamic sampling for tables with missing statistics is enabled by default from Oracle 10g onwards (OPTIMIZER_DYNAMIC_SAMPLING = 2). You can get the same behaviour in Oracle 9i by raising the default dynamic sampling level from 1 to at least 2, either at the system or session level (the OPTIMIZER_DYNAMIC_SAMPLING parameter) or at the statement level (the DYNAMIC_SAMPLING hint). For more information, see the documentation.
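As a quick sketch of the options just mentioned (the table name, alias and the level 4 in the hint are arbitrary examples, not recommendations):

```sql
-- Session level: raise the dynamic sampling level (the 9i default is 1)
alter session set optimizer_dynamic_sampling = 2;

-- Statement level: the DYNAMIC_SAMPLING hint overrides the setting for
-- a single query (here sampling table alias t at level 4)
select /*+ dynamic_sampling(t 4) */ count(*)
from   some_table t;
```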

An interesting question is what happens with a partitioned table when statistics are missing for some of the partitions or subpartitions while they have been gathered for others.

Does dynamic sampling kick in selectively depending on which partition is accessed, or does the optimizer simply check whether the table itself has statistics?

The following test case, which requires 11.1 or later because it uses list/range composite partitioning for the subpartition-specific tests, shows the results on 11.1.0.7 Win32:

SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL>
SQL> -- Range partitioning testcase
SQL> CREATE TABLE wr_test
2 ( test_id
3 , trade_date
4 , CONSTRAINT test_pk PRIMARY KEY (trade_date, test_id) using index local)
5 PARTITION BY RANGE (trade_date)
6 ( PARTITION p_jan VALUES LESS THAN (DATE '2009-02-01')
7 , PARTITION p_feb VALUES LESS THAN (DATE '2009-03-01')
8 , PARTITION p_mar VALUES LESS THAN (DATE '2009-04-01') )
9 AS
10 SELECT ROWNUM AS test_id
11 , DATE '2009-02-01' + trunc(dbms_random.value(0, 59)) as trade_date
12 FROM dual
13 connect by level <= 1000;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'WR_TEST', partname=>'p_feb', granularity=>'partition')

PL/SQL procedure successfully completed.

SQL>
SQL> select
2 partition_name
3 , num_rows
4 from
5 user_tab_statistics
6 where
7 table_name = 'WR_TEST';

PARTITION_NAME NUM_ROWS
------------------------------ ----------

P_JAN
P_FEB 491
P_MAR

4 rows selected.

SQL>
SQL> -- Dynamic sampling is selectively used on partitions without statistics
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-03-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1136113187

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16 | 352 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 16 | 352 | 2 (0)| 00:00:01 | 3 | 3 |
|* 2 | TABLE ACCESS FULL | WR_TEST | 16 | 352 | 2 (0)| 00:00:01 | 3 | 3 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TRADE_DATE"=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- dynamic sampling used for this statement

18 rows selected.

SQL>
SQL> -- No dynamic sampling with statistics in place
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-02-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3091737428

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 198 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 18 | 198 | 2 (0)| 00:00:01 | 2 | 2 |
|* 2 | INDEX RANGE SCAN | TEST_PK | 18 | 198 | 2 (0)| 00:00:01 | 2 | 2 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TRADE_DATE"=TO_DATE(' 2009-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

14 rows selected.

SQL>
SQL> drop table wr_test purge;

Table dropped.

SQL>
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL>
SQL> -- composite partitioning testcase
SQL> CREATE TABLE wr_test
2 ( test_id
3 , trade_date
4 , CONSTRAINT test_pk PRIMARY KEY (trade_date, test_id) using index local)
5 partition by list (test_id)
6 SUBPARTITION BY RANGE (trade_date)
7 (
8 partition p_default values (default)
9 ( SUBPARTITION p_jan VALUES LESS THAN (DATE '2009-02-01')
10 , SUBPARTITION p_feb VALUES LESS THAN (DATE '2009-03-01')
11 , SUBPARTITION p_mar VALUES LESS THAN (DATE '2009-04-01') )
12 )
13 AS
14 SELECT ROWNUM AS test_id
15 , DATE '2009-02-01' + trunc(dbms_random.value(0, 59)) as trade_date
16 FROM dual
17 connect by level <= 1000;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'WR_TEST', partname=>'p_feb', granularity=>'subpartition')

PL/SQL procedure successfully completed.

SQL>
SQL> select
2 partition_name
3 , subpartition_name
4 , num_rows
5 from
6 user_tab_statistics
7 where
8 table_name = 'WR_TEST';

PARTITION_NAME SUBPARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------

P_DEFAULT
P_DEFAULT P_JAN
P_DEFAULT P_FEB 491
P_DEFAULT P_MAR

5 rows selected.

SQL>
SQL> -- Dynamic sampling also is selectively used on SUBpartitions without statistics
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-03-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1060835009

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16 | 352 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 16 | 352 | 2 (0)| 00:00:01 | 1 | 1 |
| 2 | PARTITION RANGE SINGLE| | 16 | 352 | 2 (0)| 00:00:01 | 3 | 3 |
|* 3 | TABLE ACCESS FULL | WR_TEST | 16 | 352 | 2 (0)| 00:00:01 | 3 | 3 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("TRADE_DATE"=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- dynamic sampling used for this statement

19 rows selected.

SQL>
SQL> -- No dynamic sampling with statistics in place
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-02-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1060835009

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 198 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 18 | 198 | 2 (0)| 00:00:01 | 1 | 1 |
| 2 | PARTITION RANGE SINGLE| | 18 | 198 | 2 (0)| 00:00:01 | 2 | 2 |
|* 3 | TABLE ACCESS FULL | WR_TEST | 18 | 198 | 2 (0)| 00:00:01 | 2 | 2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("TRADE_DATE"=TO_DATE(' 2009-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

SQL>
SQL> -- Different treatment of subpartitions in pre-10.2.0.4
SQL> alter session set optimizer_features_enable = '10.2.0.3';

Session altered.

SQL>
SQL> -- Now uses partition-level statistics
SQL> -- These are missing
SQL> -- Therefore dynamic sampling
SQL> -- Although the subpartition accessed has statistics
SQL> -- But these are not used by the pre-10.2.0.4 optimizer code
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-02-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1060835009

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23 | 506 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 23 | 506 | 2 (0)| 00:00:01 | 1 | 1 |
| 2 | PARTITION RANGE SINGLE| | 23 | 506 | 2 (0)| 00:00:01 | 2 | 2 |
|* 3 | TABLE ACCESS FULL | WR_TEST | 23 | 506 | 2 (0)| 00:00:01 | 2 | 2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("TRADE_DATE"=TO_DATE(' 2009-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- dynamic sampling used for this statement

19 rows selected.

SQL>
SQL> -- Gathering statistics on partition level
SQL> exec dbms_stats.gather_table_stats(null, 'WR_TEST', partname=>'p_default', granularity=>'partition', method_opt=>'for all columns size 1')

PL/SQL procedure successfully completed.

SQL>
SQL> -- No longer using dynamic sampling
SQL> explain plan for
2 select * from wr_test where trade_date = date '2009-02-01';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3944471208

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 187 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 17 | 187 | 2 (0)| 00:00:01 | 1 | 1 |
| 2 | PARTITION RANGE SINGLE| | 17 | 187 | 2 (0)| 00:00:01 | 2 | 2 |
|* 3 | INDEX RANGE SCAN | TEST_PK | 17 | 187 | 2 (0)| 00:00:01 | 2 | 2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TRADE_DATE"=TO_DATE(' 2009-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

SQL>

So, as can be seen, in 11.1 dynamic sampling is used selectively, depending on the kind of partition pruning the optimizer recognizes at parse time and on whether statistics have been gathered for the pruned (sub)partition; this also applies at the subpartition level. The same can be seen in 10.2.0.4, apart from the severe bug regarding single-subpartition pruning in the 10.2.0.4 patch set release, as shown here.

As already mentioned a couple of times on my blog, the optimizer code of pre-10.2.0.4 versions doesn't use subpartition-level statistics even when pruning to a single subpartition, and always reverts to the partition level.

Diagnosing and Resolving “gc block lost”

Last week, one of our clients had a sudden slowdown across all of their applications, which run on a two-node RAC environment.

Below is the summary of the setup:
– Server and Storage: SunFire X4200 with LUNs on EMC CX300
– OS: RHEL 4.3 ES
– Oracle 10.2.0.3 (database and clusterware)
– Database Files, Flash Recovery Area, OCR, and Voting disk are located on OCFS2 filesystems
– Application: Forms and Reports (6i and even lower)

As per the DBA, the workload on the database was normal and there had been no changes on the RAC nodes or in the applications. Hmm, I couldn't really tell, because I hadn't looked into their workload before and had no past data to compare against.

I first set up OS Watcher on both nodes (with a 60-second snapshot interval) to get an end-to-end view of server performance while doing the diagnosis on the database. It's a cool tool that automatically gathers a lot of info when you start it (CPU, I/O, memory, network, process info); it also has a graphing facility, which makes it faster to detect spikes and trends.

Then I looked into the database parameters and compared them with an RDA of the database I had from before, just to check whether any parameter changes might have caused the sudden slowdown. Hmm... there was nothing new.

Then I queried V$SESSION:

SQL> select count(*) from v$session;      -- server1

  COUNT(*)
----------
       385

SQL> select count(*) from v$session;     -- server2

  COUNT(*)
----------
    49

I was surprised! The session counts show that far more users are connected to server1 (almost 89% of the total). This could be because
1) the clients are on lower versions (< SQL*Plus 8.1 or OCI8, see Note 97926.1) that may not support TAF (FAILOVER_MODE) and load balancing (LOAD_BALANCE), or
2) they are using TNS entries that explicitly connect to server1.
Since the RAC database has different applications connected to it, knowing which applications are on server1 and how many sessions each has is enough to determine which users of particular applications could be moved to server2 to at least balance the number of users across both nodes. But doing this may still not solve the issue, as I didn't yet have enough information that this was the root cause of the problem.
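For reference, a client-side tnsnames.ora entry with both load balancing and TAF enabled might look like the sketch below (the alias, host and service names are invented for illustration); note that, as per point 1 above, older clients may simply ignore these settings:

```
# Hypothetical tnsnames.ora entry -- all names invented
RACDB =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = server1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = server2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = racdb)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
    )
  )
```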

The query below shows that none of the users has failover capabilities, which could be attributed to the two points mentioned above:

select distinct s.inst_id instance_id, s.failover_type failover_type, s.failover_method
failover_method, s.failed_over failed_over
from gv$session s;
INSTANCE_ID FAILOVER_TYPE        FAILOVER_METHOD      FAILED_OVER
----------- -------------------- -------------------- --------------------
          1 NONE                 NONE                 NO
          2 NONE                 NONE                 NO

The query below shows the distinct applications/modules on server1 and their session counts. You'll also notice that most of them are inactive (some of these users should connect to server2 to balance the load across both nodes):

SQL> select distinct inst_id, module, program, status, count(*)
from gv$session
group by inst_id, module, program, status
order by 1,2,3,4;

   INST_ID MODULE                         PROGRAM                        STATUS       COUNT(*)
---------- -------------------------------------------------- -------------------------------------------------- ---------- ----------
     1 F45RUN32.EXE                       F45RUN32.EXE                   INACTIVE       23       

... output snipped ....

     1 xxxPgm                         xxxPgm                         INACTIVE       21      

... output snipped ....

     1 c:\xxxwin95\BIN\F45RUN32.EXE L:\xxx\VS\FORMS\MAI   c:\xxxwin95\BIN\F45RUN32.EXE L:\xxx\VS\FORMS\MAI   INACTIVE       44
     1 c:\xxxwin95\BIN\F45RUN32.EXE L:\MAIN.FMX       c:\xxxwin95\BIN\F45RUN32.EXE L:\MAIN.FMX       INACTIVE       31
     1 c:\xxxwin95\BIN\F45RUN32.EXE w:\xxxMENU.fmx       c:\xxxwin95\BIN\F45RUN32.EXE w:\xxxMENU.fmx     INACTIVE       14
     1 c:\xxxwin95\BIN\F45RUN32.EXE w:\xxxlogin.fmx       c:\xxxwin95\BIN\F45RUN32.EXE w:\xxxlogin.fmx   INACTIVE       96  

... output snipped ....

     1 c:\xxxwin95\BIN\R25SRV32.exe               c:\xxxwin95\BIN\R25SRV32.exe           INACTIVE       54      

... output snipped ....

     1 ifweb90.exe                        ifweb90.exe                    INACTIVE       11

Below are some data samples from DBA_HIST_ACTIVE_SESS_HISTORY, graphed using Tanel Poder's PerfSheet (which can be found here and here) to clearly show the distribution of modules and number of users. Most of the users on server1 use "c:\xxxwin95\BIN\F45RUN32.EXE w:\xxxlogin.fmx".

GcBlocksLost_1_SessionCount

The graph above might lead us to conclude that the large number of users on server1 accounts for the slowdown of the transactions. But as per the DBA, no changes had been made and they were running the same transactions as before on server1, with acceptable response times. Also, OS Watcher showed that CPU utilization (peaking at 60%, mostly around 40%) and the run queue were low, disk I/O utilization and service times were average (always below 10ms), there was enough memory on the server (3GB free out of 12GB; the SGA is 2GB), and there was no swapping.

Hmm... drilling down on the wait interface, at the level of the individual sessions doing the important business operations, will give us a definite conclusion about what the real bottleneck in the database is.

The graph below is another sample from DBA_HIST_ACTIVE_SESS_HISTORY; it shows that server1 (in the blue box) was suffering from "gc cr block lost" and "gc cr multi block request" waits from 7am to 4pm. Metalink Doc ID 563566.1, "gc lost blocks diagnostics", indicates that this is a cluster problem, possibly an issue with the network interconnect.

GcBlocksLost_2_server1waits

To confirm the bottleneck shown by the DBA_HIST_ACTIVE_SESS_HISTORY sample above, the ADDM and AWR reports from the peak period (2-3pm and 3-4pm) were analyzed. Below are their outputs:

ADDM output:

          DETAILED ADDM REPORT FOR TASK 'TASK_137854' WITH ID 137854
          ----------------------------------------------------------

              Analysis Period: 28-MAY-2009 from 13:58:59 to 14:59:12
         Database ID/Instance: 3967623528/2
      Database/Instance Names: xxx
                    Host Name: xxx
             Database Version: 10.2.0.3.0
               Snapshot Range: from 15642 to 15643
                Database Time: 3710 seconds
        Average Database Load: 1 active sessions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FINDING 1: 52% impact (1917 seconds)
------------------------------------
Cluster multi-block requests were consuming significant database time.

   RECOMMENDATION 1: SQL Tuning, 37% benefit (1389 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "0h0fn2d19adtk". Look for an alternative plan that does not use
         object scans.
         RELEVANT OBJECT: SQL statement with SQL_ID 0h0fn2d19adtk

         ... output snipped ... 

   RECOMMENDATION 2: SQL Tuning, 8.4% benefit (310 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "8jd43xr4rp00u". Look for an alternative plan that does not use
         object scans.
         RELEVANT OBJECT: SQL statement with SQL_ID 8jd43xr4rp00u

         ... output snipped ... 

   RECOMMENDATION 3: SQL Tuning, 4.6% benefit (172 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "6wpgf3s0vzuks". Look for an alternative plan that does not use
         object scans.
         RELEVANT OBJECT: SQL statement with SQL_ID 6wpgf3s0vzuks

         ... output snipped ... 

   SYMPTOMS THAT LED TO THE FINDING:
      SYMPTOM: Inter-instance messaging was consuming significant database
               time on this instance. (94% impact [3488 seconds])
         SYMPTOM: Wait class "Cluster" was consuming significant database
                  time. (94% impact [3488 seconds])

FINDING 2: 51% impact (1874 seconds)
------------------------------------
Global Cache Service Processes (LMSn) in other instances were not processing
requests fast enough.

   RECOMMENDATION 1: DB Configuration, 51% benefit (1874 seconds)
      ACTION: Increase throughput of the Global Cache Service (LMSn)
         processes. Increase the number of Global Cache Service processes by
         increasing the value of the parameter "gcs_server_processes".
         Alternatively, if the host is CPU bound consider increasing the OS
         priority of the Global Cache Service processes.
      RATIONALE: The value of parameter "gcs_server_processes" was "2" during
         the analysis period.

   SYMPTOMS THAT LED TO THE FINDING:
      SYMPTOM: Inter-instance messaging was consuming significant database
               time on this instance. (94% impact [3488 seconds])
         SYMPTOM: Wait class "Cluster" was consuming significant database
                  time. (94% impact [3488 seconds])

FINDING 3: 36% impact (1330 seconds)
------------------------------------
Cluster communications that were retried due to lost blocks consumed
significant database time.

   RECOMMENDATION 1: Host Configuration, 36% benefit (1330 seconds)
      ACTION: Check the configuration of the cluster interconnect. Check OS
         setup like adapter setting, firmware and driver release. Check that
         the OS's socket receive buffers are large enough to store an entire
         multiblock read. The value of parameter
         "db_file_multiblock_read_count" may be decreased as a workaround.
      RATIONALE: The instance was consuming 477 kilo bits per second of
         interconnect bandwidth.

   SYMPTOMS THAT LED TO THE FINDING:
      SYMPTOM: Inter-instance messaging was consuming significant database
               time on this instance. (94% impact [3488 seconds])
         SYMPTOM: Wait class "Cluster" was consuming significant database
                  time. (94% impact [3488 seconds])

#######################################################################

          DETAILED ADDM REPORT FOR TASK 'TASK_137874' WITH ID 137874
          ----------------------------------------------------------

              Analysis Period: 28-MAY-2009 from 14:59:12 to 15:58:44
         Database ID/Instance: 3967623528/2
      Database/Instance Names: xxx
                    Host Name: xxx
             Database Version: 10.2.0.3.0
               Snapshot Range: from 15643 to 15644
                Database Time: 1563 seconds
        Average Database Load: .4 active sessions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FINDING 1: 42% impact (654 seconds)
-----------------------------------
Cluster communications that were retried due to lost blocks consumed
significant database time.

   RECOMMENDATION 1: Host Configuration, 42% benefit (654 seconds)
      ACTION: Check the configuration of the cluster interconnect. Check OS
         setup like adapter setting, firmware and driver release. Check that
         the OS's socket receive buffers are large enough to store an entire
         multiblock read. The value of parameter
         "db_file_multiblock_read_count" may be decreased as a workaround.
      RATIONALE: The instance was consuming 134 kilo bits per second of
         interconnect bandwidth.

   SYMPTOMS THAT LED TO THE FINDING:
      SYMPTOM: Inter-instance messaging was consuming significant database
               time on this instance. (88% impact [1371 seconds])
         SYMPTOM: Wait class "Cluster" was consuming significant database
                  time. (88% impact [1372 seconds])

AWR output:

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
xxxxx         3967623528 xxxxxx              2 10.2.0.3.0  YES xxx

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     15642 28-May-09 13:58:58        57      27.0
  End Snap:     15643 28-May-09 14:59:12        64      28.7
   Elapsed:               60.23 (mins)
   DB Time:               61.83 (mins)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc cr multi block request            19,644       1,902     97   51.3    Cluster
gc cr block lost                      1,011       1,117   1104   30.1    Cluster
enq: WF - contention                    626         268    429    7.2      Other
gc current block lost                   186         203   1089    5.5    Cluster
cr request retry                        678         162    240    4.4      Other

Per Wait Class
~~~~~~~~~~~~~~~~~~
                                                                  Avg
                                       %Time       Total Wait    wait     Waits
Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
Cluster                        34,450    7.0            3,385      98       6.8
Other                          91,292   64.7              467       5      18.1
User I/O                       26,510     .0              102       4       5.3
Concurrency                    29,668     .0               54       2       5.9
System I/O                     13,360     .0               11       1       2.6
Commit                          1,313     .0                4       3       0.3
Application                     2,374     .0                2       1       0.5
Network                       129,774     .0                0       0      25.7
Configuration                      20     .0                0       0       0.0

#######################################################################

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
xxxxx         3967623528 xxxxxx              2 10.2.0.3.0  YES xxx

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     15643 28-May-09 14:59:12        64      28.7
  End Snap:     15644 28-May-09 15:58:43        64      29.6
   Elapsed:               59.53 (mins)
   DB Time:               26.05 (mins)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc cr block lost                        389         429   1103   27.5    Cluster
gc cr multi block request             2,360         422    179   27.0    Cluster
enq: WF - contention                    510         212    416   13.6      Other
gc current block lost                   188         204   1084   13.0    Cluster
cr request retry                        326         181    554   11.6      Other

Per Wait Class
~~~~~~~~~~~~~~~~~~
                                                                  Avg
                                       %Time       Total Wait    wait     Waits
Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
Cluster                        10,239    8.9            1,247     122       2.1
Other                         153,082   71.9              468       3      31.0
Application                     1,784   18.0              161      90       0.4
Concurrency                    33,014     .0               53       2       6.7
User I/O                        4,246     .0               27       6       0.9
System I/O                     13,648     .0               14       1       2.8
Commit                          1,546     .0                4       3       0.3
Network                       138,333     .0                1       0      28.0
Configuration                      16     .0                0       4       0.0

The only thing left to do was to drill down on the complaining users and check the network performance and the interconnect... unfortunately the users were going home and I couldn't see any significant database activity. So I called it a night, analyzed my activity log, and read Metalink Doc ID 563566.1. I also advised the DBA to distribute the users across the nodes.
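One sketch of that drill-down, following the counters that Doc ID 563566.1 points at (statistic names as in 10.2, to the best of my knowledge), would be to watch the system-wide lost-block counters on both instances:

```sql
-- If "gc blocks lost" keeps climbing relative to the blocks received,
-- the interconnect (NIC settings, switch, UDP buffers) is the suspect
select inst_id, name, value
from   gv$sysstat
where  name in ('gc blocks lost',
                'gc cr blocks received',
                'gc current blocks received')
order  by inst_id, name;
```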

Next day afternoon, it was time to do the drill-down. You can see in the graph below (a sample from GV$SESSION) that the users were already distributed, but there was still a performance problem. I told you this would not solve the issue!

Two weeks to go for ODTUG

ODTUG Kaleidoscope 2009 is approaching soon, and I'll be hosting two presentations. "Fat Databases: a Layered Approach" will basically be the Helsinki Declaration talk, only crammed into just one hour. I'll probably skip the four observations and go straight to the WoD application and its code classification (DL, BL and UI code), and close with a short demo by building a page (with Apex, of course).

Rebound for database design?

My fellow Oaktable member Robyn Sands posted something very much in line with the Helsinki declaration. Good comments too.

on the importance of a good data model ...

An article by Bert Scalzo was published in Information Management this week. The topic is 'Is Data Modeling Still Relevant?': it's short, to the point and well worth reading. The article doesn't recommend a specific tool; it simply recommends the practice of capturing a model of the data at rest (the old-school approach) in addition to the newer techniques that focus on capturing the data in motion.

Manually Booting Tuxedo Application Server Processes in Parallel

Normally when an Application Server is booted, initialisation of each process completes before the next one is started. The ability to boot Application Server processes in parallel was added to the psadmin utility in PeopleTools 8.48. However, psadmin is merely a wrapper for the BEA Tuxedo tmadmin command line utility, and it has always been possible to do this manually in previous versions of PeopleTools via the tmadmin utility as follows.

1. Boot the Tuxedo Bulletin Board Liaison process.

#boot the Tuxedo administrative processes
boot -A

2. Boot the PeopleSoft Application Server processes, specifying the -w parameter so that the boot command doesn't wait for each process to start.

boot -g APPSRV -w

If you are running PUBSUB or other servers in other groups then you would also boot them here.

3. Boot the JREPSRV process (which maps Java Classes to Tuxedo Services).

boot -g JREPGRP

4. List the servers with the printserver command (abbreviated psr) so you know that the PeopleSoft servers have booted.

psr

5. When all the other processes have booted, boot the WSL and JSL processes.

boot -g BASE
boot -g JSLGRP
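The steps above can be collected into a small script. This is a hypothetical sketch, not part of the delivered psadmin or Tuxedo tooling: the boot_sequence function simply emits the tmadmin commands in order, so you can review them or pipe them into tmadmin on a server where TUXCONFIG is set. The group names (APPSRV, JREPGRP, BASE, JSLGRP) follow the conventional PeopleSoft domain configuration and may differ in your domain.

```shell
#!/bin/sh
# Hypothetical helper (not part of psadmin): emit the tmadmin boot
# sequence described above, one command per line, in order.
boot_sequence() {
  cat <<'EOF'
boot -A
boot -g APPSRV -w
boot -g JREPGRP
psr
boot -g BASE
boot -g JSLGRP
EOF
}

# On a real application server (with TUXCONFIG set) you would run:
#   boot_sequence | tmadmin
# Here we just print the commands for review.
boot_sequence
```

Keeping the sequence in one place makes it easy to see that the WSL and JSL listeners (BASE and JSLGRP) are deliberately booted last, after the server processes they hand connections to.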


Resuming transmission...

I've been busy lately: preparing papers and presentations for the upcoming ODTUG, presenting at Hotsos-revisited, and presenting at the Dutch DBA-symposium. All spare time went into these activities, and the Helsinki blog just had to wait in line. But now I intend to resume transmission over here again. A couple of weeks ago I received the following comment on this blog: toon, Suppose we have a

Collaborate09 thoughts...

I've been meaning to write up my thoughts from this year's IOUG Collaborate event.  Ian Abramson, the president of the IOUG, has a nice series of write-ups on the event itself - starting with day 1 through day 3 and then a wrap-up post.  He describes it as a three day event - but to me it is more of a four day thing.  For the last couple of years, I've done a university day at Collaborate - so once again I spent all day Sunday May 3rd in a conference center talking about Oracle for eight hours.  This year I talked about Encryption, all of the Flashback technologies, Read & Write Consistency (a really nice way to introduce the flashback stuff actually, sort of a pre-requisite) and Database Reorganizations (when to, when not to and when you have to - how to).

I also had a one hour "regular" session on Monday where I presented on "What's new in Oracle Application Development" where I touched on general database enhancements, ODP.Net, PHP/Ruby/Python, Java/JDBC, SQL Developer and APEX advances.  This is the session where I got most of the questions/comments this year.

I think I might have scared away some of the DBA crowd with my session title - they see "Application Development" and stay away - but I know for sure at least one DBA was in the audience.  He came up to me afterwards and commented on how glad he was to have attended - for a pretty simple reason.  In each of the tools I discussed (mostly .NET, SQL Developer and APEX) - I pointed out how they easily tied the application into the AWR (automatic workload repository) and ASH (active session history) facilities and how they could be used as an interface to the tuning/diagnostic packs.  He is sure the developers he supports are unaware of this tie-in, and he himself certainly was - but his first course of action upon getting back to the office would be to expose them to it.  The problem he has had in the past is that many people consider tuning "a DBA task" - when in fact - it is really a developer's thing to do.  Since most developers don't have access to, or don't want access to, Enterprise Manager - they assumed the advanced tuning features were not available to them.  Not any more - all of the development tools now have tie-ins to AWR/ASH and various features of the tuning/diagnostic packs.

One of the areas of large interest is the new (in beta) SQL Developer data modeling features.  A lot of people (self included) still find the good old ERD (entity relationship diagram) a good way to envision/model your schema.  SQL Developer now has an engine to do just that - with the ability to spit out various different physical models from the same logical model - either in support of different databases (yes, it does not just do Oracle), or in support of different environments (simple storage characteristic changes).


One thing I find interesting/good about an event like Collaborate is the ability to learn something new.  Even after using Oracle for going on 22 years, I still learn something new often (almost every day).  Regardless of your skill level, there is the opportunity to expand your knowledge of the stuff you use.  For example - Jonathan Lewis (a pretty smart guy when it comes to Oracle knowledge) posted about some things he discovered for the first time.  I strongly encourage everyone to attend at least one event if possible every year - to network, to learn and..... to present.  IOUG Collaborate is a good venue for doing that - UKOUG is another - and the call for papers for the UKOUG event in November is currently happening....  Take a chance - go for it.  It'll probably go much better than my first presentation :)