Oakies Blog Aggregator

Filtering PeopleTools SQL from Performance Monitor Traces

I have been doing some on-line performance tuning on a PeopleSoft Financials system using PeopleSoft Performance Monitor (PPM).  End-users have collected verbose PPM traces. Usually, when I use PPM in a production system, all the components are fully cached by the normal activity of the users (except when the application server caches have recently been cleared).  However, when working in a user test environment it is common to find that the components are not fully cached. This presents two problems.

  • The application servers spend quite a lot of time executing queries on the PeopleTools tables to load the components, pages and PeopleCode into their caches. We can see in the screenshot of the component trace that there is a warning message that component objects are not fully cached, and that these  cache misses skew timings.
  • In verbose mode, the PPM traces collect a lot of additional transactions capturing executions and fetches against PeopleTools tables. The PPM analytic components cannot always manage the resultant volume of transactions.
    Figure 1. Component trace as collected by PPM

    If I go further down the same page and look in the SQL Summary, I can see SQL operations against PeopleTools tables (they are easily identifiable in that they generally do not have an underscore as the third character). Not only are 5 of the top 8 SQL operations related to PeopleTools tables, we can also see that they account for over 13,000 executions, which means there are at least 13,000 rows of additional data to be read from PSPMTRANSHIST.

    Figure 2. SQL Summary of PPM trace with PeopleTools SQL

    When I open the longest running server round trip (this is also referred to as a Performance Monitoring Unit or PMU), I can only load 1001 rows before I get a message warning that the maximum row limit has been reached. The duration summary and the number of executions and fetches cannot be calculated and hence 0 is displayed.

    Figure 3: Details of longest PMU with PeopleTools SQL

    Another consequence of the PeopleTools data is that it can take a long time to open the PMU tree. There is no screenshot of the PMU tree here because in this case I had so much data that I couldn't open it before the transaction timed out!

    Solution 

    My solution to this problem is to delete the transactions that relate to PeopleTools SQL and correct the durations, and the number of executions and fetches held in summary transactions. The rationale is that these transactions would not normally occur in significant quantities in a real production system, and there is not much I can do about them when they do.
    The first step is to clone the trace. I could work on the trace directly, but I want to preserve the original data.
    PPM transactions are held in the table PSPMTRANSHIST. They have a unique identifier PM_INSTANCE_ID. A single server round trip, also called a Performance Monitoring Unit (PMU), will consist of many transactions. They can be shown as a tree and each transaction has another field PM_PARENT_INST_ID which holds the instance of the parent. This links the data together and we can use hierarchical queries in Oracle SQL to walk the tree. Another field PM_TOP_INST_ID identifies the root transaction in the tree.
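    For illustration, a minimal sketch of such a hierarchical query (assuming you already know the instance number of the PMU's root transaction; the bind variable name is mine) might look like this:

    SELECT LEVEL, h.pm_instance_id, h.pm_parent_inst_id
    ,      h.pm_trans_defn_id, h.pm_trans_duration
    FROM   pspmtranshist h
    START WITH h.pm_instance_id = :pm_top_inst_id   /*root transaction of the PMU*/
    CONNECT BY PRIOR h.pm_instance_id = h.pm_parent_inst_id
    ORDER SIBLINGS BY h.pm_instance_id;
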
    Cloning a PPM trace is simply a matter of inserting data into PSPMTRANSHIST. However, when I clone a PPM trace I have to make sure that the instance numbers are distinct but still link correctly. In my system I can take a very simple approach. All the instance numbers actually collected by PPM are greater than 1E16, so I simply apply the modulus function to shift the cloned instance numbers consistently into a lower range while preserving the parent/child relationships. This approach may break down in the future, but it will do for now.
    On an Oracle database, PL/SQL is an effective way to write simple procedural processes. I have written two anonymous blocks of code: one to clone the trace, and one to remove the PeopleTools transactions and correct the summaries.
    Note that the cloned trace will be purged from PPM like any other data by the delivered PPM archive process.

    REM xPT.sql
    BEGIN --duplicate PPM traces
    FOR i IN (
    SELECT h.*
    FROM pspmtranshist h
    WHERE pm_perf_trace != ' ' /*rows must have a trace name*/
    -- AND pm_perf_trace = '9b. XXXXXXXXXX' /*I could specify a specific trace by name*/
    AND pm_instance_id > 1E16 /*only look at instance > 1e16 so I do not clone cloned traces*/
    ) LOOP
    INSERT INTO pspmtranshist
    (PM_INSTANCE_ID, PM_TRANS_DEFN_SET, PM_TRANS_DEFN_ID, PM_AGENTID, PM_TRANS_STATUS,
    OPRID, PM_PERF_TRACE, PM_CONTEXT_VALUE1, PM_CONTEXT_VALUE2, PM_CONTEXT_VALUE3,
    PM_CONTEXTID_1, PM_CONTEXTID_2, PM_CONTEXTID_3, PM_PROCESS_ID, PM_AGENT_STRT_DTTM,
    PM_MON_STRT_DTTM, PM_TRANS_DURATION, PM_PARENT_INST_ID, PM_TOP_INST_ID, PM_METRIC_VALUE1,
    PM_METRIC_VALUE2, PM_METRIC_VALUE3, PM_METRIC_VALUE4, PM_METRIC_VALUE5, PM_METRIC_VALUE6,
    PM_METRIC_VALUE7, PM_ADDTNL_DESCR)
    VALUES
    (MOD(i.PM_INSTANCE_ID,1E16) /*apply modulus to instance number*/
    ,i.PM_TRANS_DEFN_SET, i.PM_TRANS_DEFN_ID, i.PM_AGENTID, i.PM_TRANS_STATUS,
    i.OPRID,
    SUBSTR('xPT'||i.PM_PERF_TRACE,1,30) /*adjust trace name*/,
    i.PM_CONTEXT_VALUE1, i.PM_CONTEXT_VALUE2, i.PM_CONTEXT_VALUE3,
    i.PM_CONTEXTID_1, i.PM_CONTEXTID_2, i.PM_CONTEXTID_3, i.PM_PROCESS_ID, i.PM_AGENT_STRT_DTTM,
    i.PM_MON_STRT_DTTM, i.PM_TRANS_DURATION,
    MOD(i.PM_PARENT_INST_ID,1E16), MOD(i.PM_TOP_INST_ID,1E16), /*apply modulus to parent and top instance number*/
    i.PM_METRIC_VALUE1, i.PM_METRIC_VALUE2, i.PM_METRIC_VALUE3, i.PM_METRIC_VALUE4, i.PM_METRIC_VALUE5,
    i.PM_METRIC_VALUE6, i.PM_METRIC_VALUE7, i.PM_ADDTNL_DESCR);
    END LOOP;
    COMMIT;
    END;
    /

    Now I will work on the cloned trace. I want to remove certain transactions:

    • PeopleTools SQL. Metric value 7 reports the SQL operation and the SQL table name, so if the first word is SELECT and the second word is a PeopleTools table name then it is a PeopleTools SQL operation. A list of PeopleTools tables can be obtained from the object security table PSOBJGROUP (see the sketch after this list).
    • Implicit Commit transactions. This is easy - it is just transaction type 425. 
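
    For illustration, here is a sketch of a query that lists the transactions to be removed from a cloned trace (the same identification logic appears in the script further down; the 'xPT' trace-name prefix is the one set by the cloning script):

    SELECT h.pm_instance_id, h.pm_trans_defn_id, h.pm_metric_value7
    FROM   pspmtranshist h
    WHERE  h.pm_perf_trace LIKE 'xPT%'             /*cloned traces only*/
    AND   (h.pm_trans_defn_id = 425                /*implicit commit*/
        OR EXISTS(
            SELECT 'x'
            FROM   psobjgroup o                    /*PeopleTools tables per object security*/
            WHERE  o.objgroupid = 'PEOPLETOOLS'
            AND    o.enttype    = 'R'
            AND    h.pm_metric_value7 LIKE 'SELECT '||o.entname||'%'));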

    Having deleted the PeopleTools transactions, I must also

    • Correct the transaction duration for any parents of the deleted transaction. I work up the hierarchy of transactions and deduct the duration of the transaction that I am deleting from all of its parents.
    • Transaction types 400, 427 and 428 all record PeopleTools SQL time (metric 66). When I reach such a transaction while working up the hierarchy, I also deduct the duration of the deleted transaction from the PeopleTools SQL time metric in that parent transaction.
    • Delete any children of the transactions that I delete. 
    • I must also count each PeopleTools SQL Execution transaction (type 408) and each PeopleTools SQL Fetch transaction (type 414) that I delete. These counts are also deducted from the summaries on the parent transaction 400. 

    The summaries in transaction 400 are used on the 'Round Trip Details' components, and if they are not adjusted you can get misleading results. Without the adjustments, I have encountered PMUs where more than 100% of the total duration is spent in SQL - which is obviously impossible.
    Although this technique of first cloning the whole trace and then deleting the PeopleTools operations can be quite slow, it is not something that you are going to do very often. 

    REM xPT.sql
    REM (c)Go-Faster Consultancy Ltd. 2014
    set serveroutput on echo on
    DECLARE
    l_pm_instance_id_m4 INTEGER;
    l_fetch_count INTEGER;
    l_exec_count INTEGER;
    BEGIN /*now remove PeopleTools SQL transaction and any children and adjust trans durations*/
    FOR i IN (
    WITH x AS ( /*returns PeopleTools tables as defined in Object security*/
    SELECT o.entname recname
    FROM psobjgroup o
    WHERE o.objgroupid = 'PEOPLETOOLS'
    AND o.enttype = 'R'
    )
    SELECT h.pm_instance_id, h.pm_parent_inst_id, h.pm_trans_duration, h.pm_trans_defn_id
    FROM pspmtranshist h
    LEFT OUTER JOIN x
    ON h.pm_metric_value7 LIKE 'SELECT '||x.recname||'%'
    AND x.recname = upper(regexp_substr(pm_metric_value7,'[^ ,]+',8,1)) /*first word after select*/
    WHERE pm_perf_trace like 'xPT%' /*restrict to cloned traces*/
    -- AND pm_perf_trace = 'xPT9b. XXXXXXXXXX' /*work on a specific trace*/
    AND pm_instance_id < 1E16 /*restrict to cloned traces*/
    AND ( x.recname IS NOT NULL
    OR h.pm_trans_defn_id IN(425 /*Implicit Commit*/))
    ORDER BY pm_instance_id DESC
    ) LOOP
    l_pm_instance_id_m4 := TO_NUMBER(NULL);
     
        IF i.pm_parent_inst_id>0 AND i.pm_trans_duration>0 THEN
    FOR j IN(
    SELECT h.pm_instance_id, h.pm_parent_inst_id, h.pm_top_inst_id, h.pm_trans_defn_id
    , d.pm_metricid_3, d.pm_metricid_4
    FROM pspmtranshist h
    INNER JOIN pspmtransdefn d
    ON d.pm_trans_defn_set = h.pm_trans_defn_set
    AND d.pm_trans_defn_id = h.pm_trans_Defn_id
    START WITH h.pm_instance_id = i.pm_parent_inst_id
    CONNECT BY prior h.pm_parent_inst_id = h.pm_instance_id
    ) LOOP
    /*decrement parent transaction times*/
    IF j.pm_metricid_4 = 66 /*PeopleTools SQL Time (ms)*/ THEN --decrement metric 4 on transaction 400
    --dbms_output.put_line('ID:'||i.pm_instance_id||' Type:'||i.pm_trans_defn_id||' decrement metric_value4 by '||i.pm_trans_duration);
    UPDATE pspmtranshist
    SET pm_metric_value4 = pm_metric_value4 - i.pm_trans_duration
    WHERE pm_instance_id = j.pm_instance_id
    AND pm_trans_Defn_id = j.pm_trans_defn_id
    AND pm_metric_value4 >= i.pm_trans_duration
    RETURNING pm_instance_id INTO l_pm_instance_id_m4;
    ELSIF j.pm_metricid_3 = 66 /*PeopleTools SQL Time (ms)*/ THEN --SQL time on serialisation
    --dbms_output.put_line('ID:'||i.pm_instance_id||' Type:'||i.pm_trans_defn_id||' decrement metric_value3 by '||i.pm_trans_duration);
    UPDATE pspmtranshist
    SET pm_metric_value3 = pm_metric_value3 - i.pm_trans_duration
    WHERE pm_instance_id = j.pm_instance_id
    AND pm_trans_Defn_id = j.pm_trans_defn_id
    AND pm_metric_value3 >= i.pm_trans_duration;
    END IF;

    UPDATE pspmtranshist
    SET pm_trans_duration = pm_trans_duration - i.pm_trans_duration
    WHERE pm_instance_id = j.pm_instance_id
    AND pm_trans_duration >= i.pm_trans_duration;
    END LOOP;
    END IF;

    l_fetch_count := 0;
    l_exec_count := 0;
    FOR j IN( /*identify transaction to be deleted and any children*/
    SELECT pm_instance_id, pm_parent_inst_id, pm_top_inst_id, pm_trans_defn_id, pm_metric_value3
    FROM pspmtranshist
    START WITH pm_instance_id = i.pm_instance_id
    CONNECT BY PRIOR pm_instance_id = pm_parent_inst_id
    ) LOOP
    IF j.pm_trans_defn_id = 408 THEN /*if PeopleTools SQL*/
    l_exec_count := l_exec_count + 1;
    ELSIF j.pm_trans_defn_id = 414 THEN /*if PeopleTools SQL Fetch*/
    l_fetch_count := l_fetch_count + j.pm_metric_value3;
    END IF;
    DELETE FROM pspmtranshist h /*delete tools transaction*/
    WHERE h.pm_instance_id = j.pm_instance_id;
    END LOOP;

    IF l_pm_instance_id_m4 > 0 THEN
    --dbms_output.put_line('ID:'||l_pm_instance_id_m4||' Decrement '||l_exec_Count||' executions, '||l_fetch_count||' fetches');
    UPDATE pspmtranshist
    SET pm_metric_value5 = pm_metric_value5 - l_exec_count
    , pm_metric_value6 = pm_metric_value6 - l_fetch_count
    WHERE pm_instance_id = l_pm_instance_id_m4;
    l_fetch_count := 0;
    l_exec_count := 0;
    END IF;

    END LOOP;
    END;
    /
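
    Once the script has run, a quick sanity check is to confirm that no remaining transaction reports more PeopleTools SQL time than its total duration (a sketch, assuming as above that metric value 4 on transaction type 400 holds the PeopleTools SQL time; no rows returned is the result you want):

    SELECT pm_instance_id, pm_trans_duration, pm_metric_value4
    FROM   pspmtranshist
    WHERE  pm_perf_trace LIKE 'xPT%'   /*cloned traces only*/
    AND    pm_trans_defn_id = 400
    AND    pm_metric_value4 > pm_trans_duration;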

    Now, I have a second PPM trace that I can open in the analytic component.

    Figure 4: Original and Cloned PPM traces

    When I open the cloned trace, both timings in the duration summary have reduced as have the number of executions and fetches.  The durations of the individual server round trips have also reduced.

    Figure 5: Component Trace without PeopleTools transactions

    All of the PeopleTools SQL operations have disappeared from the SQL summary.

    Figure 6: SQL Summary of PPM trace after removing PeopleTools SQL transactions

    The SQL summary now only has 125 rows of data.

    Figure 7: SQL Summary of PMU without PeopleTools SQL

    Now, the PPM tree component opens quickly and without error.

    Figure 8: PMU Tree after removing PeopleTools SQL

    There may still be more transactions in a PMU than I can show in a screenshot, but I can now find the statement that took the most time quite quickly.

    Figure 9: Long SQL transaction further down same PMU tree

    Conclusions 

    I think that it is reasonable and useful to remove PeopleTools SQL operations from a PPM trace.
    In normal production operation, components will mostly be cached, and this approach renders traces collected in non-production environments both usable in the PPM analytic components and more realistic for performance tuning. However, when deleting some transactions from a PMU, it is essential that the summary data held in other transactions in the same PMU is also corrected so that the metrics remain consistent.

    Oracle Enterprise Manager 12c Command Line Interface is Available!

    The new Oracle Enterprise Manager 12c Command Line Interface book is available from a number of outlets, including Amazon and directly from Apress.  If you are an EM12c fanatic, or are just starting out and want to learn more, consider the new book, which shows why the command line returns the DBA to the golden age, empowering Enterprise Manager to script and automate tasks at a global level!

    (Book cover image: ISBN 9781484202395)



    Copyright © DBA Kevlar [Oracle Enterprise Manager 12c Command Line Interface is Available!], All Right Reserved. 2014.

      Interesting Oracle Syntax Error

      As shared by a well known Oracle and Big Data performance geek!

      SQL> ALTER SYSTEM SET inmemory_size = 5T SCOPE=spfile;
      ALTER SYSTEM SET inmemory_size = 5T SCOPE=spfile
      *
      ERROR at line 1:
      ORA-32005: error while parsing size specification [5T]
      
      
      SQL> ALTER SYSTEM SET inmemory_size = 5120G SCOPE=spfile;
      
      System altered.
      

      :)

      Upgrades

      One of the worst problems with upgrades is that things sometimes stop working. A particular nuisance is the execution plan that suddenly stops appearing, to be replaced by an alternative plan that is much less efficient.

      Apart from the nuisance of the time spent trying to force the old plan to re-appear, plus the time spent working out a way of rewriting the query when you finally decide the old plan simply isn’t going to re-appear, there’s also the worry about WHY the old plan won’t appear. Is it some sort of bug, is it that some new optimizer feature has disabled some older optimizer feature, or is it that someone in the optimizer group realised that the old plan was capable of producing the wrong results in some circumstances … it’s that last possibility that I find most worrying.

      Here’s an example that appeared recently on OTN that’s still got me wondering about the possibility of wrong results (in the general case). We start with a couple of tables, a view, and a pipelined function. This example is a simple model of the problem that showed up on OTN; it’s based on generated data so that anyone who wants to can play around with it to see if they can bypass the problem without making any significant changes to the shape of the code:

      
      create table t1
      as
      with generator as (
      	select	--+ materialize
      		rownum id
      	from dual
      	connect by
      		level <= 1e4
      )
      select
      	rownum			id,
      	rownum			n1,
      	mod(rownum,100)		n_100,
      	rpad('x',100)		padding
      from
      	generator	v1
      ;
      
      create table t2
      as
      with generator as (
      	select	--+ materialize
      		rownum id
      	from dual
      	connect by
      		level <= 1e4
      )
      select
      	rownum			id,
      	rownum			n1,
      	mod(rownum,100)		n_100,
      	rpad('x',100)		padding
      from
      	generator	v1
      ;
      
      alter table t2 add constraint t2_pk primary key(id);
      
      begin
      	dbms_stats.gather_table_stats(
      		ownname		 => user,
      		tabname		 =>'T1',
      		method_opt	 => 'for all columns size 1'
      	);
      
      	dbms_stats.gather_table_stats(
      		ownname		 => user,
      		tabname		 =>'T2',
      		method_opt	 => 'for all columns size 1'
      	);
      
      end;
      /
      
      create or replace type myScalarType as object (
              x int,
              y varchar2(15),
              d date
      )
      /
      
      create or replace type myArrayType as table of myScalarType
      /
      
      create or replace function t_fun1(i_in number)
      return myArrayType
      pipelined
      as
      begin
      	pipe row (myscalartype(i_in,     lpad(i_in,15),     trunc(sysdate) + i_in    ));
      	pipe row (myscalartype(i_in + 1, lpad(i_in + 1,15), trunc(sysdate) + i_in + 1));
      	return;
      end;
      /
      
      create or replace view v1
      as
      select
      	--+ leading(t2 x) index(t2)
      	x.x, x.y, x.d,
      	t2.id, t2.n1
      from
      	t2,
      	table(t_fun1(t2.n_100)) x
      where
      	mod(t2.n1,3) = 1
      union all
      select
      	--+ leading(t2 x) index(t2)
      	x.x, x.y, x.d,
      	t2.id, t2.n1
      from
      	t2,
      	table(t_fun1(t2.n_100)) x
      where
      	mod(t2.n1,3) = 2
      ;
      
      

      A key part of the problem is the UNION ALL view, where each subquery holds a join to a pipeline function. We’re about to write a query that joins to this view, and wants to push a join predicate into the view. Here’s the SQL:

      
      select
      	/*+ leading(t1 v1) use_nl(v1) */
      	v1.x, v1.y, v1.d,
      	v1.n1,
      	t1.n1
      from
      	t1,
      	v1
      where
      	t1.n_100 = 0
      and	v1.id = t1.n1
      ;
      
      

      You’ll notice that the join v1.id = t1.n1 could (in principle) be pushed inside the view to become t2.id = t1.n1 in the two branches of the UNION ALL; this would make it possible for the nested loop that I’ve hinted between t1 and v1 to operate efficiently – and in 11.1.0.7 this is exactly what happens:

      
      ------------------------------------------------------------------------------------------------
      | Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
      ------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                      |        | 16336 |   733K|   123   (1)| 00:00:01 |
      |   1 |  NESTED LOOPS                         |        | 16336 |   733K|   123   (1)| 00:00:01 |
      |*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
      |   3 |   VIEW                                | V1     |   163 |  6357 |     1   (0)| 00:00:01 |
      |   4 |    UNION-ALL PARTITION                |        |       |       |            |          |
      |   5 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
      |*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
      |*  7 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
      |   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
      |   9 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
      |* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
      |* 11 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
      |  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
      ------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         2 - filter("T1"."N_100"=0)
         6 - filter(MOD("T2"."N1",3)=1)
         7 - access("T2"."ID"="T1"."N1")
        10 - filter(MOD("T2"."N1",3)=2)
        11 - access("T2"."ID"="T1"."N1")
      
      

      For each row returned by the tablescan at line 2 we call the view operator at line 3 to generate a rowsource, but we can see in the predicate sections for lines 7 and 11 that the join value has been pushed inside the view, allowing us to access t2 through its primary key index. Depending on the data definitions, constraints, view definition, and version of Oracle, you might see the UNION ALL operator displaying the PARTITION option or the PUSHED PREDICATE option in cases of this type.

      So now we upgrade to 11.2.0.4 (probably any 11.2.x.x version) and get the following plan:

      
      ------------------------------------------------------------------------------------------------
      | Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
      ------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                      |        |  1633K|    99M|   296K  (4)| 00:24:43 |
      |   1 |  NESTED LOOPS                         |        |  1633K|    99M|   296K  (4)| 00:24:43 |
      |*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
      |*  3 |   VIEW                                | V1     | 16336 |   909K|  2966   (4)| 00:00:15 |
      |   4 |    UNION-ALL                          |        |       |       |            |          |
      |   5 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
      |*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
      |   7 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
      |   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
      |   9 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
      |* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
      |  11 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
      |  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
      ------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         2 - filter("T1"."N_100"=0)
         3 - filter("V1"."ID"="T1"."N1")
         6 - filter(MOD("T2"."N1",3)=1)
        10 - filter(MOD("T2"."N1",3)=2)
      
      

      In this plan the critical join predicate appears at line 3; the predicate hasn’t been pushed. On the other hand the index() hints in the view have, inevitably, been obeyed (resulting in index full scans), as has the use_nl() hint in the main query – leading to a rather more expensive and time-consuming execution plan.

      The first, quick, debugging step is simply to set the optimizer_features_enable back to 11.1.0.7 – with no effect; the second is to try adding the push_pred() hint to the query – with no effect; the third is to generate the outline section of the execution plans and copy the entire set of hints from the good plan into the bad plan, noting as we do so that the good plan actually uses the hint OLD_PUSH_PRED(@"SEL$1" "V1"@"SEL$1" ("T2"."ID")) – still no effect.
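
      For reference, a sketch of the first two of those attempts (the exact placement of the push_pred() hint is my assumption, not taken from the original thread):

      alter session set optimizer_features_enable = '11.1.0.7';

      select
      	/*+ leading(t1 v1) use_nl(v1) push_pred(v1) */
      	v1.x, v1.y, v1.d,
      	v1.n1,
      	t1.n1
      from
      	t1,
      	v1
      where
      	t1.n_100 = 0
      and	v1.id = t1.n1
      ;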

      Since I happen to know a few things about what is likely to appear in the 10053 (optimizer) trace file, my next step would be to flush the shared pool, enable the trace, and then check the trace file (using grep or find depending on whether I was running UNIX or Windows) for the phrase “JPPD bypassed”; this is what I got:

      
      test_ora_9897.trc:OJPPD:     OJPPD bypassed: View contains TABLE expression.
      test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.
      test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.
      
      

      So 11.1.0.7 had a plan that used the old_push_pred() hint, but 11.2.0.4 explicitly bypassed the option (the rubric near the top of the trace file translates OJPPD to “old-style (non-cost-based) JPPD”, where JPPD translates to “join predicate push-down”). It looks like the plan we got from 11.1.0.7 has been deliberately blocked in 11.2.0.4. So now it’s time to worry whether or not that means I could have been getting wrong results from 11.1.0.7.

      In my test case, of course, I can bypass the problem by explicitly rewriting the query – but I’ll have to move the join with t1 inside the view for both subqueries; alternatively, given the trivial nature of the pipeline function, I could replace the table() operator with a join to another union all view. In real life such changes are not always so easy to implement.
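
      For the record, here is a sketch of that first rewrite, joining t1 inside each branch instead of joining to the view (my reconstruction, not code from the original thread):

      select
      	x.x, x.y, x.d,
      	t2.n1,
      	t1.n1
      from
      	t1,
      	t2,
      	table(t_fun1(t2.n_100)) x
      where
      	t1.n_100 = 0
      and	t2.id = t1.n1
      and	mod(t2.n1,3) = 1
      union all
      select
      	x.x, x.y, x.d,
      	t2.n1,
      	t1.n1
      from
      	t1,
      	t2,
      	table(t_fun1(t2.n_100)) x
      where
      	t1.n_100 = 0
      and	t2.id = t1.n1
      and	mod(t2.n1,3) = 2
      ;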

      Footnote: the restriction is still in place on 12.1.0.2.

      Footnote 2: somewhere I’ve probably published a short note explaining that one of my standard pre-emptive strikes on an upgrade is to run the following command to extract useful information from the executable: “strings -a oracle | grep bypass”: it can be very helpful to have a list of situations in which some query transformation is bypassed.

       

      Digital Learning: LVC customer quotes


      Here’s a collection of customer quotes as a follow-up to my last post about the importance of attitude towards Live Virtual Classes (LVCs). They are from courses that I have taught personally this year with an average delivery score of about 96%:

      Oracle Database 12c: Managing Multitenant Architecture

      Oracle Grid Infrastructure 11g: Manage Clusterware and ASM

      Oracle Database 12c: New Features for Administrators

      Exadata Database Machine Administration Workshop

      Oracle 11g: RAC and Grid Infrastructure Administration Accelerated

      “My first experience of an LVC. Pleased to say it was very positive. Introduction and start on Monday morning was smooth. I would recommend to my colleagues.”

      Always important to make a good first impression!

      “The whole LVC package just worked. From the comfort of my own environment with a great instructor makes for happy learning :)”

      And that is exactly what we strive to deliver.

      “Both, host and producer were very professional and guided the students through the course.”

      An LVC producer takes care of all the technical aspects apart from the course itself, such as access to the learning platform. The instructor appears as “host” on the learning platform.

      “Instructor professionally answered students’ questions and kept up a positive mood in the community!”

      LVCs can be funny too :-)

      “I appreciate the way how the course was presented. Very well controlled time, organization of presentation, exercises. Interaction with us was great. Always ready to answer a question, give an examples to difficult topic, illustrating topics.”

      So much about allegedly missing interaction in LVCs.

      “I work few years on RAC databases, my knowledge was not so clear regarding some topic on RAC and Grid after completing this training I’m sure that I will handle our RAC and Grid environment differently and for sure will have positive impact in our production environment. Great thank!”

      You cannot top that with a classroom course either :-)

      “LVC is offering great and flexible way to gain knowledge without travel or hotels etc.”

      “LVCs reduce travel costs and help the students to manage their time on their own, i.e. to join the classes from home and focus on the presented content.”

      Trust me, I didn’t make up the last two although they may sound like your manager talking – or mine, for that matter ;-)


      Mobile Front and Center at Oracle Open World 2014

      Wow! Sorry I have not posted anything since Oracle Open World. Lots of great new stuff is coming from Oracle that you should look into, including some pretty cool stuff for mobile development.

      1. Oracle’s Mobile Application Framework (MAF) is more than just a rework of the ADF Mobile package (all existing ADF Mobile apps will be migrated to MAF); MAF includes new and updated components and a complete ecosystem to support it. See OTN for more: http://www.oracle.com/technetwork/developer-tools/maf/overview/index.html
      2. Oracle is sponsoring the Oracle Mobile Framework Developer Challenge to allow developers to show their MAF capabilities: https://blogs.oracle.com/imc/entry/maf12c_developer_challenge
      3. Oracle’s Mobile Suite and Oracle Mobile Security Suite provide a comprehensive way to secure web applications:  http://www.oracle.com/technetwork/developer-tools/maf/overview/index.html
      4. Oracle’s User Experience (UX) gets a new UI facelift named “Alta” that goes beyond mere skinning to include a new design philosophy and proven patterns in a “mobile first”, responsive, whitespace-heavy interface: http://www.oracle.com/webfolder/ux/middleware/alta/index.html
      5. Oracle Enterprise Mobile Platform allows quick and easy development of mobile applications using simple drag-and-drop wizards – no programming required! Resourceful users can build their own applications, and developers can use it to prototype and create templates for more comprehensive MAF applications. http://www.oracle.com/us/technologies/mobile/overview/index.html
      6. Oracle Database 12c provides native REST (http://www.oracle.com/technetwork/developer-tools/rest-data-services/overview/index.html) and JSON (http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246) support to better serve mobile applications.

      I hope this article interests you in learning more about Oracle’s mobility efforts. If you’re in Michigan or Reston, please come see my MAF demos!

      EM12c Release 4, Health Overview

      As part of the projects I work on at Oracle, I often ensure that customers who wish to deploy Enterprise Manager (EM12c) to large environments have the correct settings and are tuned to offer the best performance, from the Oracle Management Repository database, through WebLogic, up to the console URLs accessed by users.  This means that these large environments often receive recommendations from our experts that differ from the EM12c “out of the box” settings.

      For those that aren’t receiving internal Oracle training on what to look for and how to tune EM12c tiers, there are some new features in release 4 that should be checked out by anyone using EM12c.

      EM12c Health Overview

      The Health Overview is accessible via the Setup menu (right side of the console): Manage Cloud Control, then Health Overview.


      We’ll go over each of these new monitoring tools, but the Health Overview includes valuable information about the health of both the Oracle Management Repository, (OMR) and the Oracle Management Service, (OMS).

      The overview page breaks down into easy to understand sections.  The first is basic information and workload on the OMS:


      From here you can see all pertinent, high-level information about the OMS/OMR: OMS information, the number of agents with status counts on availability, and whether a load balancer is used in the EM12c configuration.  Important target availability status is posted, along with how many administrators have been given access to the EM console.

      Below this we can view the backlog graph on the right hand side of the page:


      That right-hand graph is important: along with the upload rate, you can see whether there is a consistent backlog of XML files waiting to be uploaded, which can signal performance trouble.  A backlog can cause problems, because a backlog at the loader can delay receipt of critical alerts and information about a target.  If the backlog becomes too extensive, an agent can reach a threshold on how many backlogged files it can handle and stop collecting, which is a significant issue.  If serious backlog issues are noted, it is important to consider tuning options to address them, such as adding a load balancer to help manage the workload, or a second OMS.

      Repository Details

      The next section includes connection information, which also has the service name, the database name and database type, the port and the job system status.  On the right is a graph showing whether any backoff requests have been sent.  These occur when the OMS is busy processing an XML upload and requests the agent to hold off on sending any more files until it has finished.


      Notification Information

      Scanning down from the backoff graph in the Health Overview displays the notification backlog graph.  Knowing how backlogged your time-sensitive notifications are is crucial: when someone asks why a notification was not received in a timely manner, you can quickly assess whether it is an issue with EM12c or whether the problem resides elsewhere.


      Alerts

      The last section in the health overview covers incident management.  You can easily see if there are any Metric Collection Errors, (corresponding this to the backlog data above if necessary), access related Metric Errors and Related Alerts.


      This view is really there to give you a very high-level account of what incidents are currently open, together with related alerts and metric errors.  If you wish to get more details about all the incidents currently in the queue, you can launch the Incident Manager directly from the Health Overview console.

      We’ll dig into the deeper diagnostic data offered in EM12c, release 4 for the OMR, OMS and Agents in subsequent posts, so until next time!




      Copyright © DBA Kevlar [EM12c Release 4, Health Overview], All Right Reserved. 2014.

      The First PASS Summit Bloggers’ Meetup

      We are stoked to announce the first ever PASS Summit Bloggers’ Meetup!

      What: PASS Summit Bloggers’ Meetup 2014
      When: Thursday, November 6th, 5pm – 7pm
      Where: Sports Restaurant & Bar, 140 4th Ave North, Suite 130, Seattle, WA 98109
      How: Please comment with “COUNT ME IN” if coming — we need to know attendance numbers.

      We’re excited to meet old friends, and make new ones in Seattle this year. Pythian will sponsor munchies and drinks. There will be a networking contest with some cool prizes, plus everyone will get their very own PASS Summit Bloggers Meetup shirt! Meetups are a great way to make new friends in the community, so come on out — all bloggers are welcome!

      If you are planning to attend, please comment here with the phrase “COUNT ME IN”. This will help us ensure we have the attendance numbers right. Please provide your blog URL with your comment — it’s a Bloggers Meetup after all! Make sure you comment here if you are attending so that we have enough room, food, and drinks.

      The location is perfect to get ready for the appreciation event — minutes walk from EMP Museum! Snacks and drinks before the big event and mingle with fellow bloggers. What can be better?

      Of course, do not forget to blog and tweet about this year’s bloggers meetup using #Summit14 #sqlpass. See you there!

      first_rows(10)

      No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:

      
      begin
      	dbms_stats.set_system_stats('MBRC',16);
      	dbms_stats.set_system_stats('MREADTIM',12);
      	dbms_stats.set_system_stats('SREADTIM',5);
      	dbms_stats.set_system_stats('CPUSPEED',1000);
      end;
      /
      
      create table t1
      as
      with generator as (
      	select	--+ materialize
      		rownum id 
      	from dual 
      	connect by 
      		level <= 1e4
      )
      select
      	rownum					id,
      	trunc(dbms_random.value(1,1000))	n1,
      	lpad(rownum,10,'0')	small_vc,
      	rpad('x',100)		padding
      from
      	generator	v1,
      	generator	v2
      where
      	rownum <= 1e6
      ;
      
      create index t1_n1 on t1(id, n1);
      
      create table t2
      as
      with generator as (
      	select	--+ materialize
      		rownum id 
      	from dual 
      	connect by 
      		level <= 1e4
      )
      select
      	rownum					id,
      	trunc(dbms_random.value(10001,20001))	x1,
      	lpad(rownum,10,'0')	small_vc,
      	rpad('x',100)		padding
      from
      	generator	v1,
      	generator	v2
      where
      	rownum <= 1e6
      ;
      
      create index t2_i1 on t2(x1);
      
      begin
      	dbms_stats.gather_table_stats(
      		ownname		 => user,
      		tabname		 =>'T1',
      		estimate_percent => 100,
      		method_opt	 => 'for all columns size 1'
      	);
      
      	dbms_stats.gather_table_stats(
      		ownname		 => user,
      		tabname		 =>'T2',
      		estimate_percent => 100,
      		method_opt	 => 'for all columns size 1'
      	);
      
      end;
      /
      
      
      create or replace view  v1
      as
      select 
      	id, n1, small_vc, padding
      from	t1 
      where	n1 between 101 and 300
      union all
      select 
      	id, n1, small_vc, padding
      from	t1 
      where	n1 between 501 and 700
      ;
      
      

      The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:

      
      select
      	/*+ gather_plan_statistics */
      	v1.small_vc,
      	v1.n1
      from
      	v1,
      	t2
      where
      	t2.id = v1.id
      and	t2.x1 = 15000
      ;
      
      

      I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):

      
      first_rows_10 plan
      ----------------------------------------------------------------------------------------------
      | Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
      |*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
      |   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
      |   3 |    UNION-ALL                         |       |       |       |            |          |
      |*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
      |*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
      |   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
      |*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------------------
      
      all_rows plan
      ------------------------------------------------------------------------------------------------
      | Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
      ------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
      |   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
      |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
      |*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
      |   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
      |   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
      |   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
      |*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
      |   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
      |*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
      ------------------------------------------------------------------------------------------------
      
      first_rows_10 plan hinted under all_rows optimisation
      ---------------------------------------------------------------------------------------------- 
      | Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
      ---------------------------------------------------------------------------------------------- 
      |   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
      |*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
      |   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
      |   3 |    UNION-ALL                 |       |       |       |       |            |          | 
      |*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
      |*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
      |   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
      |*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------------------
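      Purely for completeness, here is roughly how I switch between the three runs. This is a sketch, not the full test script: the session-level commands are standard, but I haven’t reproduced the exact hint set for the third run; the easy way to get it is to pull the outline from the first_rows_10 execution and embed those hints in the query.

      alter session set optimizer_mode = all_rows;          -- run 1
      alter session set optimizer_mode = first_rows_10;     -- run 2

      -- run 3: stay with all_rows, but embed the hints captured from run 2,
      -- which can be listed with the "outline" format option, e.g.:
      select * from table(dbms_xplan.display_cursor(null, null, 'outline'));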
      
      

      I’m not sure why the first_rows_10 plan uses “table access by index rowid batched” while the hinted plan does not, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may give a small benefit to the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.
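      If anyone wants to beat me to it, this is the sort of thing needed to generate that 10053 (optimizer) trace. It's a minimal sketch: the tracefile_identifier is only there to make the resulting trace file easy to find, and the statement has to be hard-parsed for the trace to be written.

      alter session set tracefile_identifier = 'frn_union_all';
      alter session set events '10053 trace name context forever, level 1';

      -- parse/execute the query (change a literal or add a comment if necessary
      -- to force a hard parse), then switch the trace off:

      alter session set events '10053 trace name context off';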

      In the first_rows_10 plan the optimizer has decided that building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it has made that decision on the assumption that it will only need about 10 rows from each copy of the table – and at run-time it HAS to fetch all the relevant t1 rows to build the hash table before it can return any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – roughly 400,000 rows go into a very expensive hash table build, and almost all of that data is redundant.

      In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimal amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*Plus, setting the arraysize to 11 and the pagesize to 5, and stopping after the first 10 rows (a sketch of the SQL*Plus harness appears after the plans). Here are the plans showing the rowsource execution stats:

      
      first_rows_10 plan
      ------------------------------------------------------------------------------------------------------------------------------------------------
      | Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
      ------------------------------------------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
      |*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
      |   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
      |   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
      |*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
      |*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
      |   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
      |*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
      ------------------------------------------------------------------------------------------------------------------------------------------------
      
      all_rows plan
      -----------------------------------------------------------------------------------------------------------------------
      | Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
      -----------------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
      |   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
      |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
      |*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
      |   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
      |   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
      |   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
      |*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
      |   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
      |*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
      -----------------------------------------------------------------------------------------------------------------------
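      For reference, this is roughly the SQL*Plus harness behind those two sets of stats. It's a sketch only, since the exact mechanics of abandoning the query after the first page of output are a detail I've glossed over:

      set arraysize 11
      set pagesize  5
      set pause on

      alter session set optimizer_mode = first_rows_10;   -- then all_rows for the second run

      -- re-run the test query from the top of this note, abandon it after the
      -- first page of output has appeared, then:

      set pause off
      select * from table(dbms_xplan.display_cursor(null, null, 'allstats last cost'));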
      
      

      If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I would have paid a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.

      Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order, avoiding a sort, rather than doing a (faster) full tablescan followed by a sort). This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole hash table before it can take any further step – we don’t get 10 rows quickly and the rest more slowly; we just get 10 very slow rows.
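      As a purely hypothetical illustration of that “simplest example” (big_t and sort_col are invented names, and “likely” is the operative word since the choice still depends on the arithmetic):

      -- first_rows(10): likely to walk an index on sort_col so the first few rows
      -- can be returned without a sort, even though the total work may be greater
      select  /*+ first_rows(10) */  *
      from    big_t
      order by
              sort_col
      ;

      -- all_rows: likely to prefer a full tablescan plus sort, which finishes the
      -- whole job sooner but can't return anything until the sort is complete
      select  /*+ all_rows */  *
      from    big_t
      order by
              sort_col
      ;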

      Footnote:

      It’s possible that this is an example of bug 9633142 (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW), but that’s reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting “_fix_control”=’4887636:off’ does bypass the problem. (The fix control, introduced in 11.1.0.6, has the description: “remove restriction from first K row optimization”.)
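      If you want to check the state of that fix control on your own system before experimenting, something along these lines (from a suitably privileged account) will show its current value and description; the alter session is the workaround mentioned above:

      select  bugno, value, description
      from    v$system_fix_control
      where   bugno = 4887636
      ;

      alter session set "_fix_control" = '4887636:off';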