Oakies Blog Aggregator

Hadoop for Database Professionals class at NoCOUG Fall Conference on 9th Nov

If you happen to be in the Bay Area on Thursday 9th November, then come check out the NoCOUG Fall Conference at California State University in downtown Oakland, CA.

Gluent is delivering a Hadoop for Database Professionals class as a separate track there (with myself and Michael Rainey as speakers) where we’ll explain the basics & concepts of modern distributed data processing platforms and then show a bunch of Hadoop demos too (mostly SQL-on-Hadoop stuff that database folks care about).

This should be a useful class to attend if you are wondering why all the buzz about Hadoop and various distributed “Big Data” computing technologies – and where these technologies work well (and where they don’t) in a traditional enterprise. All explained using database professionals’ terminology. And there’s a networking event at the end!

You can check out the event agenda here and can RSVP at http://nocoug.org/rsvp.html. If you aren’t already a NoCOUG member, you can still attend for free as a Gluent Guest using the secret code…. “GLUENT” :-)

See you soon!

NB! I am running one more Advanced Oracle Troubleshooting training in 2018! You can attend the live online training and download personal video recordings too. Part 1 starts on 29th January 2018 - sign up here!

Max PDBs in Standard Edition

Here is a small warning. You might expect that features not available in your edition are blocked, but in 12.2 you can create more than one PDB in Standard Edition, and you should set MAX_PDBS to prevent that.

12cR1

In 12.1 Standard Edition, when you try to create more than one PDB (i.e. with CON_ID > 3) you get an error:

ORA-65010: maximum number of pluggable databases created

12cR2

But it seems that this has been lost in 12.2:


oracle@SE122 ~$ sqlplus / as sysdba
 
SQL*Plus: Release 12.2.0.1.0 Production on Thu Oct 10 11:41:56 2017
 
Copyright (c) 1982, 2016, Oracle. All rights reserved.
 
 
Connected to:
Oracle Database 12c Standard Edition Release 12.2.0.1.0 - 64bit Production
 
SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB1 READ WRITE NO
 
SQL> create pluggable database PDB2 admin user admin identified by me;
 
Pluggable database created.
 
SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB1 READ WRITE NO
4 PDB2 MOUNTED
SQL> alter pluggable database PDB2 open;
 
Pluggable database altered.

So, if you are in Standard Edition, don’t forget to set MAX_PDBS to 1:

SQL> alter pluggable database PDB2 close immediate;
 
Pluggable database altered.
 
SQL> drop pluggable database PDB2 including datafiles;
 
Pluggable database dropped.
 
SQL> alter system set max_pdbs=1;
 
System altered.
 
SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB1 READ WRITE NO
 
12:18:28 SQL> create pluggable database PDB2 admin user pdbadmin identified by oracle;
create pluggable database PDB2 admin user pdbadmin identified by oracle
*
ERROR at line 1:
ORA-65010: maximum number of pluggable databases created

 

The article Max PDBs in Standard Edition first appeared on Blog dbi services.

nVision Performance Tuning: 3. Indexing of Ledger, Budget, and Summary Ledger Tables on Non-Engineered Oracle Systems

This blog post is part of a series that discusses how to get optimal performance from PeopleSoft nVision reporting as used in General Ledger.

If you are on a conventional non-engineered Oracle system, then it is not going to be viable to full scan the ledger tables for every query.  You are going to have to use indexes to find your data.  Every customer's nVision reports are unique to that customer and the nature of their business.  Different customers will analyse their data by different combinations of attributes.  Indexes will be needed to match those analysis criteria.
A survey of the nVision reports and the SQL they produce will reveal the various sets of columns by which ledger data is queried. The heaviest SQL statements can be identified from Active Session History (ASH) or the Automatic Workload Repository (AWR) (or Statspack if the Diagnostics pack is not licenced); a sketch of such a query is shown below. Let's then take the two statements that follow it as examples and consider the indexes that would be needed to support them.
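A hedged sketch of such an ASH/AWR query (it requires the Diagnostics pack; the filter on the tree selector table name and the 7-day window are only example assumptions):

SELECT h.sql_id, COUNT(*) AS ash_samples
FROM   dba_hist_active_sess_history h
       JOIN dba_hist_sqltext t ON t.dbid = h.dbid AND t.sql_id = h.sql_id
WHERE  t.sql_text LIKE 'SELECT%PSTREESELECT%'
AND    h.sample_time > SYSDATE - 7
GROUP  BY h.sql_id
ORDER  BY ash_samples DESC;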

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, PSTREESELECT10 L2, PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1
AND L4.SELECTOR_NUM=78948
AND A.CHARTFIELD3>= L4.RANGE_FROM_10 AND A.CHARTFIELD3 <= L4.RANGE_TO_10
AND L4.TREE_NODE_NUM BETWEEN 1000000000 AND 2000000000
AND L2.SELECTOR_NUM=92481 AND A.ACCOUNT=L2.RANGE_FROM_10
AND (A.DEPTID BETWEEN '10000' AND '18999'

OR A.DEPTID='29150'
OR A.DEPTID=' ')
AND L.SELECTOR_NUM=92469 AND A.CHARTFIELD1=L.RANGE_FROM_10
AND A.CURRENCY_CD='USD'
GROUP BY L4.TREE_NODE_NUM
  • nVision queries will always contain single value predicates on LEDGER and FISCAL_YEAR.
  • You may see some queries on a single ACCOUNTING_PERIOD, as in the query above, usually the current period.  Otherwise, they might be on a range of ACCOUNTING_PERIODs, usually the year to date as in the example below.
  • Then there will be predicates on the other analysis columns, in this case DEPTID, ACCOUNT, CHARTFIELD1, CHARTFIELD3.
  • You may see criteria on CURRENCY_CD, however, this may not be very selective.  Generally, most of the queries will be on the base currency, and most of the data will be in the base currency.
  • Sometimes you may see a criterion on STATISTICS_CODE=' '.  Normally, only a very few ledger rows have a statistics code, so there is no benefit in adding STATISTICS_CODE to any index.
  • Generally, single value predicate columns should be placed at the front of the index, followed by the other analysis criteria.
  • However, I would not attempt to overload the index with additional non-selective columns because unless you index all the referenced columns, including the amount columns, you will not be able to satisfy the query from the index only, and you will still have to visit the table.
  • So for this query, you might build an index on the following columns: LEDGER, FISCAL_YEAR, ACCOUNTING_PERIOD, DEPTID, ACCOUNT, CHARTFIELD1, CHARTFIELD3, CURRENCY_CD (sketched as DDL below).
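A hedged DDL sketch of that index (the index name, tablespace and compression prefix are assumptions, not delivered PeopleSoft values):

CREATE INDEX ps_ledger_nv1 ON ps_ledger
  (ledger, fiscal_year, accounting_period, deptid, account,
   chartfield1, chartfield3, currency_cd)
  TABLESPACE psindex PCTFREE 0 COMPRESS 3;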

This query, from the same system, is on a different combination of columns.

SELECT L2.TREE_NODE_NUM,L3.TREE_NODE_NUM,L4.TREE_NODE_NUM,SUM(A.POSTED_BASE_AMT)
FROM PS_LEDGER A, PSTREESELECT05 L1, PSTREESELECT10 L2, PSTREESELECT10 L3, PSTREESELECT10 L4, PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS' AND A.FISCAL_YEAR=2015
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 11
AND L1.SELECTOR_NUM=30369 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
AND L2.SELECTOR_NUM=30374 AND A.CHARTFIELD1=L2.RANGE_FROM_10
AND L3.SELECTOR_NUM=30375 AND A.ACCOUNT=L3.RANGE_FROM_10
AND L4.SELECTOR_NUM=30376 AND A.CHARTFIELD2=L4.RANGE_FROM_10
AND L.SELECTOR_NUM=30372 AND A.CHARTFIELD1=L.RANGE_FROM_10
AND A.CURRENCY_CD='USD'
GROUP BY L2.TREE_NODE_NUM,L3.TREE_NODE_NUM,L4.TREE_NODE_NUM
  • In this case, the above query is part of a fiscal year-to-date report, returning the same period in a previous fiscal year.  The system is currently in period 11, so this report is looking at accounting periods 1 to 11.  ACCOUNTING_PERIOD is a much less selective criterion in this case.  If the partitioning option were not available, I might be tempted to demote ACCOUNTING_PERIOD further down the list of indexed columns if the criterion on the next column is more selective.
  • This query uses 5 trees.  A cartesian join of all 5 is likely to produce a product so large that it will not be effective to do index lookups for each member of the product.  The columns with the most selective criteria should be earlier in the index.  It might not even be worth including columns with non-selective criteria in the index.
  • So you might build an index on the following columns: LEDGER, FISCAL_YEAR, BUSINESS_UNIT, ACCOUNT, CHARTFIELD1, CHARTFIELD2, CURRENCY_CD, ACCOUNTING_PERIOD (again, sketched as DDL below).
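Again, a hedged sketch only (names and compression prefix are assumptions):

CREATE INDEX ps_ledger_nv2 ON ps_ledger
  (ledger, fiscal_year, business_unit, account,
   chartfield1, chartfield2, currency_cd, accounting_period)
  TABLESPACE psindex PCTFREE 0 COMPRESS 2;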

Recommendations

It is easy to see how you might build up a large number of indexes on the ledger and summary ledger tables.  You might have pairs of indexes, one for single accounting periods and perhaps another similar one for year-to-date for each set of analysis criteria.
However, as you add indexes, the overhead of index maintenance experienced by any process that posts to the ledger, or by the summary ledger build process, will increase.  There is a balance to be struck: you will have to build as many indexes with as many columns as are necessary, but as few as possible.
If you partition the tables on FISCAL_YEAR and ACCOUNTING_PERIOD and locally partition the indexes, you will only need one of the indexes.
Index leaf block dictionary compression can significantly reduce the size of the index.  Any increase in the CPU overhead of using and maintaining the index will be more than offset by the reduction in I/O overhead.  The optimal prefix length can be calculated using the command ANALYZE INDEX … VALIDATE STRUCTURE.  This feature does not require the Advanced Compression licence.
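For example, a minimal sketch of finding and applying the optimal prefix length (the index name is an assumption; INDEX_STATS holds only the result of the last ANALYZE in the current session):

ANALYZE INDEX ps_ledger_nv1 VALIDATE STRUCTURE;

SELECT name, opt_cmpr_count, opt_cmpr_pctsave FROM index_stats;

ALTER INDEX ps_ledger_nv1 REBUILD COMPRESS 3;  -- use the reported OPT_CMPR_COUNT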

“Oh…another language is too hard”

We had a request on AskTOM a few days ago asking for an implementation of the XIRR function in PL/SQL.

I didn’t really know much about what that function was, or what it did, but a quick web search yielded plenty of examples in Excel, where it is a pre-delivered function, as described here:

“Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic.”

That explains what the function does, but doesn’t explain how the function result is derived.  So I pushed back to the author of the question saying that if they could give us an algorithm for the function, then we could take a look at implementing the algorithm in PL/SQL.  After some research they came back with the following diagram taken from: http://maestro.ipcc.ca/files/rate_of_return/XIRR_ROR.pdf

[Figure: the XIRR formula and iteration steps from the linked PDF]

The formula with some cool mathematical symbols on the page looks like an algorithm but it is not – the real nuts and bolts are in bullet point “3”.  There is no “formula” as such for XIRR – we need an iterative approach, namely: try an initial set of conditions that yields an answer “close” to the desired result, and if it is not close enough, keep iterating, using previous results to slowly “narrow in” on the result until we meet some nominated tolerance value.  This was starting to sound complicated :-)

Luckily, our author also posted a link to a solution that has been written in C#.  So I posed myself the question – “How hard is it to port from C#…” (a language I have never used professionally as a developer) “…to something that can run in the database as PL/SQL?”.  After all, often we hear that database-centric code cannot be tackled in an organization because the task of understanding PL/SQL as a non-PL/SQL developer is simply “too onerous”.

Here is the original C# code.


using System;
using System.Collections.Generic;
using System.Linq;


namespace Xirr
{
    public class Program
    {
        private const Double DaysPerYear = 365.0;
        private const int MaxIterations = 100;
        private const double DefaultTolerance = 1E-6;
        private const double DefaultGuess = 0.1;

    private static readonly Func<IEnumerable<CashItem>, Double> NewthonsMethod =
        cf => NewtonsMethodImplementation(cf, Xnpv, XnpvPrime);

    private static readonly Func<IEnumerable<CashItem>, Double> BisectionMethod =
        cf => BisectionMethodImplementation(cf, Xnpv);

    public static void Main(string[] args)
    {
        RunScenario(new[]
            {
                // this scenario fails with Newton's but succeeds with slower Bisection
                new CashItem(new DateTime(2012, 6, 1), 0.01),
                new CashItem(new DateTime(2012, 7, 23), 3042626.18),
                new CashItem(new DateTime(2012, 11, 7), -491356.62),
                new CashItem(new DateTime(2012, 11, 30), 631579.92),
                new CashItem(new DateTime(2012, 12, 1), 19769.5),
                new CashItem(new DateTime(2013, 1, 16), 1551771.47),
                new CashItem(new DateTime(2013, 2, 8), -304595),
                new CashItem(new DateTime(2013, 3, 26), 3880609.64),
                new CashItem(new DateTime(2013, 3, 31), -4331949.61)
            });
        RunScenario(new[]
            {
                new CashItem(new DateTime(2001, 5, 1), 10000),
                new CashItem(new DateTime(2002, 3, 1), 2000),
                new CashItem(new DateTime(2002, 5, 1), -5500),
                new CashItem(new DateTime(2002, 9, 1), 3000),
                new CashItem(new DateTime(2003, 2, 1), 3500),
                new CashItem(new DateTime(2003, 5, 1), -15000)
            });
    }

    private static void RunScenario(IEnumerable<CashItem> cashFlow)
    {
        try
        {
            try
            {
                var result = CalcXirr(cashFlow, NewthonsMethod);
                Console.WriteLine("XIRR [Newton's] value is {0}", result);
            }
            catch (InvalidOperationException)
            {
                // Failed: try another algorithm
                var result = CalcXirr(cashFlow, BisectionMethod);
                Console.WriteLine("XIRR [Bisection] (Newton's failed) value is {0}", result);
            }
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message);
        }
        catch (InvalidOperationException exception)
        {
            Console.WriteLine(exception.Message);
        }
    }

    private static double CalcXirr(IEnumerable<CashItem> cashFlow, Func<IEnumerable<CashItem>, double> method)
    {
        if (cashFlow.Count(cf => cf.Amount > 0) == 0)
            throw new ArgumentException("Add at least one positive item");

        if (cashFlow.Count(c => c.Amount < 0) == 0)
            throw new ArgumentException("Add at least one negative item");

        var result = method(cashFlow);

        if (Double.IsInfinity(result))
            throw new InvalidOperationException("Could not calculate: Infinity");

        if (Double.IsNaN(result))
            throw new InvalidOperationException("Could not calculate: Not a number");

        return result;
    }

    private static Double NewtonsMethodImplementation(IEnumerable<CashItem> cashFlow,
                                                      Func<IEnumerable<CashItem>, Double, Double> f,
                                                      Func<IEnumerable<CashItem>, Double, Double> df,
                                                      Double guess = DefaultGuess,
                                                      Double tolerance = DefaultTolerance,
                                                      int maxIterations = MaxIterations)
    {
        var x0 = guess;
        var i = 0;
        Double error;
        do
        {
            var dfx0 = df(cashFlow, x0);
            if (Math.Abs(dfx0 - 0) < Double.Epsilon)
                throw new InvalidOperationException("Could not calculate: No solution found. df(x) = 0");

            var fx0 = f(cashFlow, x0);
            var x1 = x0 - fx0/dfx0;
            error = Math.Abs(x1 - x0);

            x0 = x1;
        } while (error > tolerance && ++i < maxIterations);
        if (i == maxIterations)
            throw new InvalidOperationException("Could not calculate: No solution found. Max iterations reached.");

        return x0;
    }

    internal static Double BisectionMethodImplementation(IEnumerable<CashItem> cashFlow,
                                                         Func<IEnumerable<CashItem>, Double, Double> f,
                                                         Double tolerance = DefaultTolerance,
                                                         int maxIterations = MaxIterations)
    {
        // From "Applied Numerical Analysis" by Gerald
        var brackets = Brackets.Find(Xnpv, cashFlow);
        if (Math.Abs(brackets.First - brackets.Second) < Double.Epsilon)
            throw new ArgumentException("Could not calculate: bracket failed");

        Double f3;
        Double result;
        var x1 = brackets.First;
        var x2 = brackets.Second;

        var i = 0;
        do
        {
            var f1 = f(cashFlow, x1);
            var f2 = f(cashFlow, x2);

            if (Math.Abs(f1) < Double.Epsilon && Math.Abs(f2) < Double.Epsilon)
                throw new InvalidOperationException("Could not calculate: No solution found");

            if (f1*f2 > 0)
                throw new ArgumentException("Could not calculate: bracket failed for x1, x2");

            result = (x1 + x2)/2;
            f3 = f(cashFlow, result);

            if (f3*f1 < 0)
                x2 = result;
            else
                x1 = result;
        } while (Math.Abs(x1 - x2)/2 > tolerance && Math.Abs(f3) > Double.Epsilon && ++i < maxIterations);

        if (i == maxIterations)
            throw new InvalidOperationException("Could not calculate: No solution found");

        return result;
    }

    private static Double Xnpv(IEnumerable<CashItem> cashFlow, Double rate)
    {
        if (rate <= -1)
            rate = -1 + 1E-10; // Very funky ... Better check what an IRR <= -100% means

        var startDate = cashFlow.OrderBy(i => i.Date).First().Date;
        return
            (from item in cashFlow
             let days = -(item.Date - startDate).Days
             select item.Amount*Math.Pow(1 + rate, days/DaysPerYear)).Sum();
    }

    private static Double XnpvPrime(IEnumerable<CashItem> cashFlow, Double rate)
    {
        var startDate = cashFlow.OrderBy(i => i.Date).First().Date;
        return (from item in cashFlow
                let daysRatio = -(item.Date - startDate).Days/DaysPerYear
                select item.Amount*daysRatio*Math.Pow(1.0 + rate, daysRatio - 1)).Sum();
    }

    public struct Brackets
    {
        public readonly Double First;
        public readonly Double Second;

        public Brackets(Double first, Double second)
        {
            First = first;
            Second = second;
        }

        internal static Brackets Find(Func<IEnumerable<CashItem>, Double, Double> f,
                                      IEnumerable<CashItem> cashFlow,
                                      Double guess = DefaultGuess,
                                      int maxIterations = MaxIterations)
        {
            const Double bracketStep = 0.5;
            var leftBracket = guess - bracketStep;
            var rightBracket = guess + bracketStep;
            var i = 0;
            while (f(cashFlow, leftBracket)*f(cashFlow, rightBracket) > 0 && i++ < maxIterations)
            {
                leftBracket -= bracketStep;
                rightBracket += bracketStep;
            }

            return i >= maxIterations
                       ? new Brackets(0, 0)
                       : new Brackets(leftBracket, rightBracket);
        }
    }

    public struct CashItem
    {
        public DateTime Date;
        public Double Amount;

        public CashItem(DateTime date, Double amount)
        {
            Date = date;
            Amount = amount;
        }
    }
}

And after …maybe…30 minutes, I got a PL/SQL version up and running.


SQL> set serverout on
SQL> declare
  2
  3    cDaysPerYear      number := 365.0;
  4    cMaxIterations    number := 100;
  5    cDefaultTolerance number := 1E-6;
  6    cDefaultGuess     number := 0.1;
  7    cDouble_Epsilon   number := 0.00000000001;
  8
  9    type cash_obj is record ( dte date, amount number );
 10    type cash_list is table of cash_obj index by pls_integer;
 11    cash_item cash_list;
 12
 13    type bracket_rec is record ( frst number, second number );
 14
 15    function Xnpv(cashFlow cash_list, prate number) return number is
 16      startdate date := cashFlow(1).dte;
 17      dys number;
 18      tot number := 0;
 19      rate number := prate;
 20    begin
 21        if (rate <= -1) then rate := -1 + 1E-10; end if;
 22
 23        for item in 1 .. cashFlow.count loop
 24             dys := -(cashFlow(item).Dte - startDate);
 25             tot := tot + cashFlow(item).Amount*Power(1 + rate, dys/cDaysPerYear);
 26        end loop;
 27        return tot;
 28    end;
 29
 30    function XnpvPrime(cashFlow cash_list, prate number) return number is
 31      startdate date := cashFlow(1).dte;
 32      daysRatio number;
 33      tot number := 0;
 34      rate number := prate;
 35    begin
 36        for item in 1 .. cashFlow.count loop
 37             daysRatio := -(cashFlow(item).Dte - startDate)/cDaysPerYear;
 38             tot := tot + cashFlow(item).Amount*daysRatio*Power(1 + rate, daysRatio - 1);
 39        end loop;
 40        return tot;
 41    end;
 42
 43    function BracketsFind(cashFlow cash_list,guess number := cDefaultGuess) return bracket_rec is
 44      x bracket_rec;
 45      bracketStep number := 0.5;
 46      leftBracket number := guess - bracketStep;
 47      rightBracket number := guess + bracketStep;
 48      i int := 0;
 49    begin
 50      while Xnpv(cashFlow, leftBracket)*Xnpv(cashFlow, rightBracket) > 0 and i < cMaxIterations
 51      loop
 52          leftBracket := leftBracket - bracketStep;
 53          rightBracket:= rightBracket + bracketStep;
 54          i := i + 1;
 55      end loop;
 56
 57      if i >= cmaxIterations then
 58         x.frst := 0;
 59         x.second := 0;
 60         return x;
 61      else
 62         x.frst := leftBracket;
 63         x.second := rightBracket;
 64         return x;
 65      end if;
 66    end;
 67
 68    function BisectionMethodImplementation(cashFlow cash_list,tolerance number := cDefaultTolerance, maxIterations number := cMaxIterations) return number is
 69
 70       brackets bracket_rec;
 71       f3 number;
 72       result number;
 73       x1 number;
 74       x2 number;
 75       i int := 0;
 76       f1 number;
 77       f2 number;
 78    begin
 79       brackets := BracketsFind(cashFlow);
 80       if Abs(brackets.Frst - brackets.Second) < cDouble_Epsilon then
 81          raise_application_error(-20000,'Could not calculate: bracket failed');
 82       end if;
 83       x1 := brackets.Frst;
 84       x2 := brackets.Second;
 85
 86       loop
 87
 88          f1 := Xnpv(cashFlow, x1);
 89          f2 := Xnpv(cashFlow, x2);
 90
 91          if Abs(f1) < cDouble_Epsilon and Abs(f2) < cDouble_Epsilon then
 92              raise_application_error(-20000,'Could not calculate: No solution found');
 93          end if;
 94
 95          if (f1*f2 > 0) then
 96              raise_application_error(-20000,'Could not calculate: bracket failed for x1, x2');
 97          end if;
 98
 99          result := (x1 + x2)/2;
100          f3 := Xnpv(cashFlow, result);
101
102          if (f3*f1 < 0) then
103              x2 := result;
104          else
105              x1 := result;
106          end if;
107          i := i + 1;
108          exit when not
109            (Abs(x1 - x2)/2 > tolerance and Abs(f3) > cDouble_Epsilon and i < cmaxIterations);
110       end loop;
111
112       if (i = cmaxIterations) then
113          raise_application_error(-20000,'Could not calculate: No solution found');
114       end if;
115
116       return result;
117    end;
118
119    function NewtonsMethodImplementation(cashFlow cash_list,tolerance number := cDefaultTolerance, maxIterations number := cMaxIterations,guess number := cDefaultGuess) return number is
120        x0 number := guess;
121        i int := 0;
122        error number;
123        fx0 number;
124        x1 number;
125        dfx0 number;
126    begin
127        loop
128            dfx0 := XnpvPrime(cashFlow, x0);
129            if (Abs(dfx0 - 0) < cDouble_Epsilon) then
130                raise_application_error(-20000,'Could not calculate: No solution found. df(x) = 0');
131            end if;
132
133            fx0 := Xnpv(cashFlow, x0);
134            x1 := x0 - fx0/dfx0;
135            error := Abs(x1 - x0);
136
137            x0 := x1;
138          i := i + 1;
139          exit when not (error > tolerance and  i < cmaxIterations );
140        end loop;
141
142        if (i = maxIterations) then
143            raise_application_error(-20000,'Could not calculate: No solution found. Max iterations reached.');
144        end if;
145        return x0;
146      end;
147
148      function CalcXirr(cashFlow cash_list) return number is
149        ok boolean := false;
150      begin
151        for i in 1 .. cashFlow.count loop
152           ok := ok or cashFlow(i).amount > 0;
153        end loop;
154        if not ok then raise_application_error(-20000,'Add at least one positive item');   end if;
155
156        ok := false;
157        for i in 1 .. cashFlow.count loop
158           ok := ok or cashFlow(i).amount < 0;
159        end loop;
160        if not ok then raise_application_error(-20000,'Add at least one negative item');   end if;
161
162  --      return BisectionMethodImplementation(cashFlow);
163        return NewtonsMethodImplementation(cashFlow);
164
165      end;
166
167  begin
168
169    cash_item(1).dte := date '2008-01-01';
170    cash_item(2).dte := date '2008-03-01';
171    cash_item(3).dte := date '2008-10-30';
172    cash_item(4).dte := date '2009-02-15';
173    cash_item(5).dte := date '2009-04-01';
174
175    cash_item(1).Amount := -10000;
176    cash_item(2).Amount := 2750;
177    cash_item(3).Amount := 4250;
178    cash_item(4).Amount := 3250;
179    cash_item(5).Amount := 2750;
180
181    dbms_output.put_line(CalcXirr(cash_item));
182  end;
183  /
.3733625335188315103083924482521883278399

PL/SQL procedure successfully completed.

I validated the results against various data examples I found on the web.  The example above maps to the example in the Excel link previously mentioned.

 

[Screenshot: the same cash flows and XIRR result in Excel]

Let me assure you, I am in no way claiming that this was some sort of incredible feat on my part.  It was simply taking each function in turn, making some assumptions about the variables and the coding constructs, iteratively converting the code over, and then testing it for validity within an anonymous block.  And similarly, the PL/SQL code is not a shining example of quality coding standards – quite the opposite.  It deliberately tries to stay close to the naming standards and conventions that were in the C# source.  Besides a couple of web searches (for example, to examine what Double.Epsilon represented) the process was relatively straightforward.

You may be rolling your eyes and thinking to yourself: “Here we go… he’s going to tell us that C# is junk and that we should refactor our C# applications into the database”, but that is definitely not the case.  My point is – if you have some data-intensive code in the application tier that would benefit from locating that code closer to the database, you might surprise yourself by discovering how easy it is to port the logic from one coding language to another, in this case, PL/SQL.  And for your Oracle databases, there is simply no better data-centric language than PL/SQL.

International Woman Traveling with a 16in Carry-on

I travel a seriously good amount.  I will present at about 35-40 events and attend company functions and meetings, and with a considerable number of those outside the Denver area this year, I’ll be on a plane for around 5-6 trips per month.

Women commonly pack more, but I receive a few comments per my average three-day trip on how I’m able to travel with a 16 inch carry-on.  I’m picking up new tricks every day, but one thing I’ve come to learn is that once you find a great travel trick, you tend to look no further, so I thought I’d share and hopefully it will provide value to others who travel.  These tips and tricks aren’t just for the women travelers, either, so guys, please do read on.  I’ll make sure to include links to anything I’m able to as I build out this post and may revisit it to add updates as I come across new products or tips.

Suitcases

So it took me a while, but as varied as my trips can be, I have two suitcases and two backpacks.  They’re all carry-on size, as I would sooner give up my right hand than my baggage to the airlines.  I shop primarily at TJ Maxx, and when in Europe I love looking around, as they have the most awesome bags that are FUNCTIONAL and small.  I hate waste, and bling annoys the heck out of me…

Database is more than persistence. It is shared. My thoughts about Microservice Database.

Development teams often see the database as a ‘persistence layer’ only. And sometimes it is said that the database is always the cause of the problems, especially when implementing continuous integration and trying to be agile. Then comes the idea to have this ‘persistence layer’ in an isolated environment, such as a docker container, with a database for each developer. However, this overlooks the real cause of the problems, which is not the persistence function of the database, but the fact that it is shared. And when you share something in a multi-user environment, you reach another level of complexity. And if you are not prepared for that, you perceive it as a problem.

This philosophical blog post contains some cool words which, in my opinion, must be carefully considered when dealing with databases: agile, persistence, stateless, docker, microservices, NoSQL, containers, lake, cloud…


When I hear about a ‘persistence layer’ it always reminds me of a meeting a few years ago where the architects of a new application drew on the whiteboard something that they could have called microservices a few years later. Dedicated web services reading data, processing it, and writing the result to the ‘persistence layer’. And when the data was only a transient interface between two of those services, they wrote ‘temporary persistence’ on the whiteboard diagram. When you come up with two antonyms to define something, there is probably something wrong with the design.

Commitment

In order to understand what’s beyond persistence, you should think about what happens when you:
SQL> commit work;
Commit complete.

First, the commit statement closes a transaction that has been started by the first DML. That’s the first purpose of a database management system: to process transactions. This is already something that is not very clear to some development architects. They designed stateless services, just because thinking about transactions is not easy when there is no one-to-one relationship between the database session and the end user (as in client/server on a dedicated terminal). Then they encountered issues such as lost updates as soon as they had more than one user, and then implemented optimistic locking in their application server, not always in the right layer. I recommend this talk from Vlad Mihalcea about transactions.
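As an illustration of what handling this in the right layer can look like, here is a minimal optimistic-locking sketch using a version column (table and column names are hypothetical):

UPDATE accounts
SET    balance = :new_balance,
       version = version + 1
WHERE  account_id = :id
AND    version    = :expected_version;
-- if SQL%ROWCOUNT = 0, another session changed the row: re-read and retry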

Second, the commit statement ensures that changes are persistent and durable. The changes are written to disk, to backups, and to standby databases. This could be as simple as a write call, but it is a bit more complex for performance reasons: random writes are done in the cache and written to disk asynchronously, while the redo transaction log is written sequentially and synced on commit. Here, the developer does not need to care about the underlying mechanisms, beyond trying to change only what is needed and commit only when needed.

Third, the commit marks the changes as publicly visible. Because the data is shared among multiple users. This is why developing and testing on your own personal database is limited to unit tests. As soon as you have to work on implementing a real use case, you must work on a database shared by different services. I have seen enterprises go early to Database as a Service for agile development, where each developer was working on his own copy (thin clone) of the database. They are now asking for common environments where multiple developers can work and test their different parts of the application. Back to this pre-agile idea, and back to the same problem: the database is shared.

Finally, I think that some developers like to see the database as only a ‘persistence layer’ just because it is easier. You can be lazy and let the database system and the database administrators manage the durability of the data. The non-optimal performance will be compensated by software and hardware. And ignoring the two other properties of a database system is just walking away from the complexity: NoSQL to ignore transactions and consistency, and containers to ignore the sharing concern.

Impedance Mismatch

This is not new. ‘Object Oriented’ was the buzzword before ‘buzzword’ itself was a buzzword. And OO development was fine until it had to manipulate data from the database. The problems encountered when trying to match the object-oriented model with the relational model were called the ‘impedance mismatch’. And they built frameworks to do this matching as if it were simple, such as matching the Object Identity concept with the Primary Key concept. And my experience is that this was nice for building good Proofs of Concept, but failed in production on consistency, performance, and scalability. Object Oriented development is good for non-shared transient objects. A GUI is based on objects, such as a window or a text field, where the object identity is the address in memory. If you restart the application, it is another object, at a different memory address. And the GUI on your colleague’s computer is again another object. It is not persistent but transient. And it is not shared: they are all different. Applying this model to data doesn’t work. You can simulate persistence with an Object-Relational Mapping (ORM) but sharing will be a problem. Those ORMs usually work on proxy cached objects in the application server, trying to re-invent the management of concurrent changes, without using the database system which is built for that.

RDBMS

The current shift from ‘database technology’ to ‘developer community’ is probably a good idea, but only if we do not repeat the same errors, such as using the database as a black box to persist objects. We must keep in mind what Relational Database Management Systems were built for.

Before RDBMS, data was modeled hierarchically. This was good for performance (because data is clustered) but was good for only one use case. Other use cases had to replicate the data into another physical design. Relational modeling stores data in a way it can be used by all use cases. For example, you can look at the orders from one customer, or at the orders for one product. The business entities are in different tables and are joined dynamically when queried. This is a data point of view. The developer builds something different because the developer works on one use case. This was the trend for XML a few years ago, and JSON now, bringing back the hierarchical model that failed decades ago. Very good when working on your own use case, but very bad when data is shared with other use cases. You get a good Proof of Concept and good unit tests, but integration will be impossible. I think that we will see the same with microservices: each one designed for its own usage without thinking about other (including future) use cases.

Before RDBMS, the data definition was included in the code of each use case. A COBOL program had a data division describing the structure of the data that would be accessed by the program. This was not good for shared databases because one structural change had to change all programs. And this is why the relational model was introduced with physical and logical independence. The logical model is designed for the application, and it is the common interface between the developers and the database administrators. Here again, I’ve seen some application architects going backward, using ORM mappings in the same way the COBOL data division was used in the past.

Microservice Database

Today, developers are running their code in containers. This has the same properties as the OO development I described before: isolated and transient. Docker starts a few processes in its own image of the disk. This is perfect. But, like they did with OO, they try to bring this idea to the database. And that will fail again because, except in the early stages of development, you need a database that is shared and persistent.

In my opinion, running the Oracle Database in a Docker container gives the wrong message to the developer community. A database involves a lot of resources and is highly coupled with the OS. My lab databases on my laptop are VirtualBox VMs: easy to build, easy to start, and easy to share. I tested databases on Docker 3 years ago to see how we could build data virtualization and thin clones for development, with a standby database on Docker. I’ve written an article about that for DOAG. The technology has evolved, but I think that it is not a good idea except for the fun of trying something new.

Today, we have containers on the database itself, with multitenant pluggable databases and application containers. You can provision a pluggable database for each docker container running the application. And you have easy ways to copy, clone, refresh or have common data. This is ok for development.
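For example, provisioning a copy for a developer or for a container can be a simple clone (the PDB names are hypothetical; the snapshot variant additionally relies on CLONEDB or storage-level support):

CREATE PLUGGABLE DATABASE pdb1_dev FROM pdb1;
ALTER PLUGGABLE DATABASE pdb1_dev OPEN;

-- a thin, copy-on-write clone where the platform supports it
CREATE PLUGGABLE DATABASE pdb1_dev2 FROM pdb1 SNAPSHOT COPY;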

Application containers have been introduced for SaaS: each application tenant has its own pluggable database. The data in the PDB is still shared by all application use-cases, and metadata (and some reference tables) is shared in application root by several applications.

However, we also see in 12cR2, and probably more in 18c, some cross-PDB DML which allows queries and modifications among several pluggable databases. It is not yet stable, using database links for modifications (see here). But I can imagine that it will be fixed in 18c or later, with the ability to update different PDBs within the same transaction.

And then, I foresee how this will be perceived by some developer architects (those ignoring that the database must be shared). They will build microservices, with small application servers running in a Docker container. And they will map one PDB for each service, doing something like a Microservice Database. Some will store data in XML, others in JSON, and some in relational tables. All those will be consolidated into a multi-tenant database to be managed easily. And my fear is that we will see a large bowl of spaghetti to exchange data between those Microservice PDBs using database links, common views, cross-PDB DML, and maybe some logical replication.

In my opinion, microservices can be an interesting design for future applications, but trying to map the same idea to the database will be a failure if developers don’t realize that the database is not there only to persist data but also to share it. But that’s just my own current opinion and I’ve no idea about the future. Maybe all this data modified by microservices will be shared in real time in a big Data Lake universally available on the cloud, and all IT problems about concurrency and latency will be solved.

 

The article Database is more than persistence. It is shared. My thoughts about Microservice Database. first appeared on Blog dbi services.

Archivelog deletion policy on Data Guard configuration

The deletion policy on a Data Guard configuration should be:

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
for the site where you don’t backup. It can be the standby or the primary.

and:

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;
for the site where you do the backups. It can be the primary or the standby.

I’ve always configured it this way, but I recently discovered that the order of the subclauses matters. Do not CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY; because no archivelogs will be reclaimable, and your recovery area will fill up. This is probably a bug. I’ll update this post when I have more information about it.

Test case

I’m on the standby database where I do the backups:

 
Recovery Manager: Release 12.2.0.1.0 - Production on Sun Oct 22 17:37:18 2017
 
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
 
RMAN-06568: connected to target database: CDB2 (DBID=693620717, not open)
 
RMAN>
RMAN-03029: echo set on
 

I start with all default configuration:

RMAN> show all;
RMAN-06607: RMAN configuration parameters for database with db_unique_name CDB2B are:
RMAN-01005: CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
RMAN-01005: CONFIGURE BACKUP OPTIMIZATION OFF; # default
RMAN-01005: CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
RMAN-01005: CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
RMAN-01005: CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
RMAN-01005: CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
RMAN-01005: CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
RMAN-01005: CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
RMAN-01005: CONFIGURE MAXSETSIZE TO UNLIMITED; # default
RMAN-01005: CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
RMAN-01005: CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
RMAN-01005: CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
RMAN-01005: CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
RMAN-01005: CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/12.2.0/dbhome_1/dbs/snapcf_CDB2B.f'; # default

I have one full backup and then some archivelog backups:

RMAN> list backup summary;
RMAN-06345: List of Backups
RMAN-06346: ===============
RMAN-06347: Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
RMAN-06348: ------- -- -- - ----------- --------------- ------- ------- ---------- ---
RMAN-06349: 9 B F A DISK 22-OCT 17:05:16 1 1 NO TAG20171022T170250
RMAN-06349: 10 B F A DISK 22-OCT 17:06:23 1 1 NO TAG20171022T170250
RMAN-06349: 11 B F A DISK 22-OCT 17:07:18 1 1 NO TAG20171022T170250
RMAN-06349: 12 B F A DISK 22-OCT 17:07:35 1 1 NO TAG20171022T170732
RMAN-06349: 13 B A A DISK 22-OCT 17:15:09 1 1 NO TAG20171022T171502
RMAN-06349: 14 B F A DISK 22-OCT 17:15:19 1 1 NO TAG20171022T171518
RMAN-06349: 15 B A A DISK 22-OCT 17:27:28 1 1 NO TAG20171022T172721
RMAN-06349: 16 B F A DISK 22-OCT 17:27:39 1 1 NO TAG20171022T172737

I have no obsolete backups:

RMAN> report obsolete;
RMAN-06524: RMAN retention policy will be applied to the command
RMAN-06511: RMAN retention policy is set to redundancy 1
RMAN-06147: no obsolete backups found

APPLIED ON ALL STANDBY

I have 1.08% of the recovery area that has been applied:

RMAN> configure archivelog deletion policy to applied on all standby;
RMAN-06601: new RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
RMAN-06604: new RMAN configuration parameters are successfully stored
 
 
RMAN> select percent_space_used,percent_space_reclaimable,number_of_files from v$recovery_area_usage where file_type='ARCHIVED LOG';
RMAN-01005: PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
RMAN-01005: ------------------ ------------------------- ---------------
RMAN-01005: 1.3 1.08 3

BACKED UP 1 TIMES TO DISK

I have 1.3% of the recovery area that has been backed up:

RMAN> configure archivelog deletion policy to backed up 1 times to disk;
RMAN-06600: old RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
RMAN-06601: new RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK;
RMAN-06604: new RMAN configuration parameters are successfully stored
 
 
RMAN> select percent_space_used,percent_space_reclaimable,number_of_files from v$recovery_area_usage where file_type='ARCHIVED LOG';
RMAN-01005: PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
RMAN-01005: ------------------ ------------------------- ---------------
RMAN-01005: 1.3 1.3 3

This looks good. I have some archived logs that have been backed up but not applied yet.

Both in the ‘bad’ order

But now I want to combine both:

RMAN> configure archivelog deletion policy to backed up 1 times to disk applied on all standby;
RMAN-06600: old RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK;
RMAN-06601: new RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY;
RMAN-06604: new RMAN configuration parameters are successfully stored
 
 
RMAN> select percent_space_used,percent_space_reclaimable,number_of_files from v$recovery_area_usage where file_type='ARCHIVED LOG';
RMAN-01005: PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
RMAN-01005: ------------------ ------------------------- ---------------
RMAN-01005: 1.3 0 3

Nothing is reclaimable here, which is in my opinion a bug.

Both in the ‘right’ order

Trying the same but with different order:

RMAN> configure archivelog deletion policy to applied on all standby backed up 1 times to disk;
RMAN-06600: old RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY;
RMAN-06601: new RMAN configuration parameters:
RMAN-01005: CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;
RMAN-06604: new RMAN configuration parameters are successfully stored
 
 
RMAN> select percent_space_used,percent_space_reclaimable,number_of_files from v$recovery_area_usage where file_type='ARCHIVED LOG';
RMAN-01005: PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
RMAN-01005: ------------------ ------------------------- ---------------
RMAN-01005: 1.3 1.08 3

This looks good. 1.08% has been applied and backed up.

I’ve verified the same behavior whether the backups are done on the primary or on the standby, and the same in 11.2.0.4, 12.1 and 12.2 with the latest RU.
When the deletion policy starts with the backup clause before the standby clause, the files are not marked as reclaimable. However, they can still be deleted with a ‘delete archivelog’ statement without the ‘force’ option.
The behavior is the same with ‘shipped to’ instead of ‘applied on’.

So what?

Be careful with the syntax: the ‘applied’ or ‘shipped’ clause must be written before the ‘backup’ one.
Check that archived logs are reclaimable. The query I use for that is in: https://blog.dbi-services.com/archivelog-deletion-policy-for-standby-database-in-oracle-data-guard/. When more archived log detail is required, I also have this query: https://blog.dbi-services.com/drilling-down-vrecoveryareausage/
Always have a look at the recovery area usage after you change the deletion policy, the site where you backup, or after a switchover.
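As a hedged sketch of that kind of check, V$ARCHIVED_LOG shows, per destination, whether each archived log has been applied and how many times it has been backed up:

SELECT thread#, sequence#, applied, backup_count, deleted, status
FROM   v$archived_log
WHERE  dest_id = 1
ORDER  BY completion_time;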

 

The article Archivelog deletion policy on Data Guard configuration first appeared on Blog dbi services.

Oracle 12.1 big table caching IO code path

Recently I was prompted to look at the ‘automatic big table caching’ feature, introduced in Oracle version 12.1.0.2, by Roger Macnicol’s blogposts about Oracle database IO and caching or not caching (https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-1 https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-2). If you want to read something about the feature in general, search for the feature name; you’ll find several blogposts about it.

If you are not familiar with automatic big table caching, it’s a feature which allows you to dedicate a percentage of the buffer cache to scanning big tables. Big tables are tables that are not small :-), which technically means tables bigger (a higher number of blocks) than the “_small_table_threshold” parameter, which defaults to 2% of the buffer cache or 20 blocks, whichever is bigger. The choice to store blocks in the big table cache area depends on the segment “temperature”, which essentially is a kind of touch count. There is one other caveat of using this feature, which I currently find weird: once a database is operating in clustered mode (RAC), serial scans can no longer use automatic big table caching; it is then used exclusively for caching blocks scanned by parallel query workers.
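For reference, a minimal sketch of enabling the feature and looking at the cache and the segment temperatures (the 40% target is an arbitrary example; view and column names are as documented for 12.1, so check your version):

ALTER SYSTEM SET db_big_table_cache_percent_target = 40;

SELECT bt_cache_target, object_count, memory_buf_alloc, min_cached_temp
FROM   v$bt_scan_cache;

SELECT dataobj#, size_in_blks, temperature, policy, cached_in_mem
FROM   v$bt_scan_obj_temps
ORDER  BY temperature DESC;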

The purpose of this blogpost is to investigate the IO implementation of using the big table caching feature. The available general read code paths that I am aware of in Oracle are:
– Single block reads, visible via the wait event ‘db file sequential read’, which always go to the buffer cache.
– Multiple single block reads, visible via the wait ‘db file parallel read’, which always go to the buffer cache.
– Multi block reads, visible via the wait event ‘db file scattered read’, which always go to the buffer cache.
– Multi block reads, visible via the wait event ‘direct path read’, which go to the process’ PGA.
– Multi block reads, visible via the wait event ‘direct path read’, which go to the in-memory area.

For full scans of a segment, essentially the choices are to go buffered, or go direct. The question that I asked myself is: what code path is used when the automatic big table caching feature is used?

The reason for asking myself this is that the difference between a cached multi block read and a direct path read is significant. The significance of the decision is whether multiple IO requests can be submitted for multiple extents, leveraging asynchronous IO, which is possible in the direct path read code path, or whether IO requests can only be submitted for a single extent, which is what happens in the db file scattered read code path. To summarise the essence: can I only execute a single IO request (scattered read) or multiple IO requests (direct path read)?

The answer is very simple to find if you enable the big table caching feature and then read a table that is considered big with an empty big table cache. A table that is read into the big table area shows the ‘db file scattered read’ wait event. What is interesting is what happens when a big table is scanned that is not being read into the big table area: the wait event switches to ‘direct path read’. Or actually I should say: the general rules are applied for choosing between buffered or direct path reads, potentially resulting in a direct path read.
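A quick, hedged way to verify which read path your own test session used is to look at its wait events after the scan:

SELECT event, total_waits, time_waited_micro
FROM   v$session_event
WHERE  sid = SYS_CONTEXT('userenv', 'sid')
AND    event IN ('db file scattered read', 'direct path read');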

I want to emphasise again the importance of the IO code path decision. If you have an Oracle database server that uses modern high performance hardware for its disk subsystem, not choosing the correct multi block read code path can severely influence throughput.

For example, if your IO subsystem has an average IO time of 1ms, the maximal theoretical throughput for buffered IO is 1/0.001*1MB=1GBPS (this will be way lower in reality). Direct path reads start off with 2 IO requests, which obviously means that that’s double the amount of buffered IO, but this can, depending on an Oracle internal resource calculation, grow up to 32 IO requests at the same time.

Tagged: AIO, IO, IO subsystem, oracle, performance, wait events

VirtualBox 5.2 exports the VM to the Oracle Cloud

The new release of Oracle VM VirtualBox (aka VirtualBox) is out, with new functionality to export a VM to Oracle Cloud Compute (aka Oracle Cloud Infrastructure). That can be interesting: prepare a VM on my laptop and move it to the Cloud to make it accessible from everywhere. Here’s my first try. In my opinion, it’s a good idea but probably needs further evolution.

VirtualBox

Here is what is new: in addition to .ova you can export to an Oracle Public Cloud image:
CaptureVboxCloud000

This takes some time, as it compresses and writes all the disk images

CaptureVboxCloud002

The result is a .tar.gz for each disk attached to my VM. It is actually the image of the disk (.img) that is tar-ed and then gzipped. My VM (called VM101) had two disks (VM101-disk1.vdi and VM101-disk2.vdi). The export generated VM101.tar.gz (containing VM101-disk002.img, which looks like my first disk) and VM101-disk003.tar.gz (containing VM101-disk003.img, which looks like my second disk).

Here is the content:


$ tar -ztvf VM101.tar.gz
-rw-r----- vboxopc10/vbox_v5.2.VBOX_VERSION_PATCHr11 4294967296 2017-10-19 21:23 VM101-disk002.img
 
$ tar -ztvf VM101-disk003.tar.gz
-rw-r----- vboxopc10/vbox_v5.2.VBOX_VERSION_PATCHr11 27917287424 2017-10-19 21:25 VM101-disk003.img

The .img is the image of the disk, with the partition and boot sector.

Compute Cloud

In the Oracle Public Cloud I can import this image: Compute Classic -> Images -> Upload Image

CaptureVboxCloud003

I upload only the image of the first disk, which contains the root filesystem:

CaptureVboxCloud004

CaptureVboxCloud005

And then I create the compute instance with the ‘Associate Image’ button:

CaptureVboxCloud006

Now, I’m ready to create an instance for it: Instance -> Customize -> Private Images

CaptureVboxCloud010

Then, I can define the shape (OCPU and memory), upload my SSH public key, and add storage (I could add my second disk here) and create the instance.

Here I’ve started it:

CaptureVboxCloud008

Unfortunately, my VM still has the network interface defined for my VirtualBox environment, so I have no way to connect to it. I hope that this feature will evolve to also export virtual network interfaces.

I have not seen any way to open a terminal on the console. The only thing I can do is take snapshots of it:

CaptureVboxCloud009

Ok, so there’s a problem way before the network interfaces. My VM from Oracle VM VirtualBox (aka VirtualBox) now starts on Oracle VM (aka OVM), and besides the similar marketing name, they are different hypervisors (OVM running Xen). Probably a driver is missing to access the block devices, and maybe this is Bug 21244825.

That’s probably all for my tests on this until the next version. It is currently not easy to have a VM that can be started on different hypervisors and network environments.

So what?

Nothing very special here. Moving a VM from one hypervisor to another is not an easy thing, but it is a good idea. And I hope that the integration with the Oracle Cloud will be easier in the future for virtual disks and network interfaces. For the Oracle Cloud, it would be nice to have access to the console, but at least a screenshot may help to troubleshoot.

 

The article VirtualBox 5.2 exports the VM to the Oracle Cloud first appeared on Blog dbi services.

nVision Performance Tuning: 2. Effects of Performance Options

This blog post is part of a series that discusses how to get optimal performance from PeopleSoft nVision reporting as used in General Ledger.

In this post, I examine the effect of different settings for the tree performance options on how the SQL is generated.  It is common to see different performance options in use on different trees in the same SQL query.  It is important to be able to look at a piece of SQL generated by nVision and work out which performance options have been set on which trees.

  • Access Method: join to tree selector –v- use literal values
  • Tree Selectors: static –v- dynamic
  • Selector Options: single values –v- inequalities/between

Access Method

This option allows you to control how the tree selector table is combined with the ledger table.  The choices of interest are whether to join to the tree selector, or whether to use literal values where feasible.

Join

Here, two tree selectors are joined to the ledger table

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, … PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017

AND L.SELECTOR_NUM=73130
AND A.CHARTFIELD1=L.RANGE_FROM_10
AND A.CURRENCY_CD='GBP'
GROUP BY L4.TREE_NODE_NUM

It is not feasible to use literal values for L4 because the data is grouped by L4.TREE_NODE_NUM; however, it is feasible to use literal values for L because no column from this table is included in either the SELECT or GROUP BY clauses.

Literal Values

When 'use literal values' is selected, nVision replaces the join to the tree selector table with a set of literal values generated from the contents of the tree selector table.
In this example, the join to the CHARTFIELD1 tree selector has been replaced with a series of literal value predicates, one for every selected tree leaf: a simple equality condition for single-value leaves, and a BETWEEN condition for ranged leaves.

SELECT L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L2
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1

AND L2.SELECTOR_NUM=101142
AND A.ACCOUNT=L2.RANGE_FROM_10
AND (A.CHARTFIELD1='0070700'
OR A.CHARTFIELD1='0078999'

OR A.CHARTFIELD1 BETWEEN '0091100' AND '0091101'
OR A.CHARTFIELD1='0150204')
AND A.CURRENCY_CD='GBP'
GROUP BY L2.TREE_NODE_NUM

Recommendations

In general, the literal values option is very beneficial.

  • It simplifies the queries generated by nVision by removing a table from the FROM clause, leaving the optimizer with fewer choices when determining the execution plan.
  • Data is filtered on the scan of the ledger table rather than when it is joined to the tree selector.  On an engineered system, these predicates can be pushed down to the storage cells.

However, there are some disadvantages:

  • As the number of leaves on a tree increases, so does the number of literal predicates in the query, and the time taken to parse the SQL increases.  As a rough guideline, I would say that the parse time starts to become significant once the tree exceeds 2000 leaves, and you might be better off using the join option.
  • Also, as the number of literal predicates increases, the time taken to evaluate them also increases.  This applies to both conventional and engineered systems.
  • Some PeopleSoft customers have found it generally beneficial to the system to raise OPTIMIZER_DYNAMIC_SAMPLING from the default of 2 to 4, so that Oracle collects dynamic statistics when there are expressions in WHERE clause predicates or multiple predicates on the same table (see Database SQL Tuning Guide -> Managing Optimizer Statistics).  However, this can significantly increase the parse overhead of nVision SQL with many literal predicates.  If necessary, this parameter can be reset at session level for nVision with a logon trigger (see the sketch after this list).
  • The nVision client also has to do more work to generate the SQL.
  • Literal values increase the variability of the SQL generated: reports using different tree nodes will generate SQL with different numbers of predicates, and there is effectively no chance that any two statements would be similar enough to use any of Oracle's plan stability technologies (SQL baselines, SQL profiles, SQL plan directives).
  • In theory, cursor sharing could reduce the parse overhead, but my experience is that the optimizer does not correctly cost the predicates, resulting in different execution plans with poorer performance.
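A minimal sketch of such a logon trigger is shown below.  It is not the author's exact implementation, and it assumes that nVision database sessions can be recognised by a module name beginning 'PSNVS'; verify how your nVision sessions actually identify themselves before using anything like this.

CREATE OR REPLACE TRIGGER nvision_dynamic_sampling
AFTER LOGON ON DATABASE
DECLARE
  l_module VARCHAR2(64);
BEGIN
  l_module := SYS_CONTEXT('USERENV','MODULE');
  -- Assumption: nVision sessions identify themselves with a module beginning 'PSNVS'.
  IF l_module LIKE 'PSNVS%' THEN
    -- Reset dynamic sampling to the default for nVision sessions only, while the rest
    -- of the database continues to run with the raised system-wide value.
    EXECUTE IMMEDIATE 'ALTER SESSION SET optimizer_dynamic_sampling = 2';
  END IF;
EXCEPTION
  WHEN OTHERS THEN NULL; -- never prevent a logon because of this trigger
END;
/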

Tree Selectors

This option allows you to choose whether a tree selector is managed in a static or dynamic manner.  As you will see, the Tree Selectors performance option is tied in with the Selector Options performance option.

Static

Where a tree uses a static selector, the entire tree is extracted into the tree selector.  There will be no tree node number criteria in the extract statement.

INSERT INTO PSTREESELECT10
(SELECTOR_NUM,TREE_NODE_NUM,RANGE_FROM_10,RANGE_TO_10)
SELECT 78722,L.TREE_NODE_NUM, SUBSTR(L.RANGE_FROM,1,10),SUBSTR(L.RANGE_TO,1,10)
FROM PSTREELEAF L
WHERE L.SETID='GLOBE'
AND L.SETCNTRLVALUE=' '
AND L.TREE_NAME='FUNCTION'
AND L.EFFDT=TO_DATE('1901-01-01','YYYY-MM-DD')

Note that the tree leaf table was not joined to the underlying detail table; ranged leaves simply become ranged selectors.  This is because there is no mechanism to determine when the detail table has changed, and thus when the static selector would be out of date and need to be refreshed.  This is also why single value joins cannot be performed with static selectors.
Tree node number criteria will appear in the nVision queries to identify the required selectors.

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, PSTREESELECT10 L2, PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1
AND L4.SELECTOR_NUM=89052
AND A.CHARTFIELD3>= L4.RANGE_FROM_10
AND A.CHARTFIELD3 <= L4.RANGE_TO_10
AND L4.TREE_NODE_NUM BETWEEN 1000000000 AND 2000000000

A control row is maintained on the table PSTREESELCTL.  This row indicates that a particular selector is static, and its version number is compared to the version number on PSTREEDEFN to determine whether the selector is still valid or whether the tree has changed.  If the tree has changed, it will have a higher version number, and the selector will be extracted again.  The selector number from the query can be looked up on this table to identify the tree.
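For example, something like the following query could map a selector number taken from an nVision statement back to its tree.  This is a sketch only; the exact column names on PSTREESELCTL and PSTREEDEFN should be checked against your PeopleTools release.

SELECT C.SELECTOR_NUM, C.SETID, C.TREE_NAME, C.EFFDT
,      C.VERSION SELECTOR_VERSION, D.VERSION TREE_VERSION
FROM   PSTREESELCTL C, PSTREEDEFN D
WHERE  C.SELECTOR_NUM  = :selector_num   -- selector number taken from the nVision SQL
AND    D.SETID         = C.SETID
AND    D.SETCNTRLVALUE = C.SETCNTRLVALUE
AND    D.TREE_NAME     = C.TREE_NAME
AND    D.EFFDT         = C.EFFDT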

Dynamic

Dynamic selectors are built on-the-fly by nVision as the report runs.  They select just the rows from the tree that are required by the report.

INSERT INTO PSTREESELECT10
(SELECTOR_NUM,TREE_NODE_NUM,RANGE_FROM_10,RANGE_TO_10)
SELECT DISTINCT 108090,L.TREE_NODE_NUM,D.ACCOUNT,D.ACCOUNT FROM
PS_GL_ACCOUNT_TBL D, PSTREELEAF L
WHERE L.SETID='GLOBE' AND L.SETCNTRLVALUE=' '
AND L.TREE_NAME='ACCOUNT'
AND L.EFFDT=TO_DATE('1901-01-01','YYYY-MM-DD')
AND L.RANGE_FROM<>L.RANGE_TO
AND D.ACCOUNT BETWEEN L.RANGE_FROM AND L.RANGE_TO
AND D.SETID='GLOBE' AND
(L.TREE_NODE_NUM BETWEEN 789473386 AND 793372019
OR L.TREE_NODE_NUM BETWEEN 810915873 AND 812865189

OR L.TREE_NODE_NUM BETWEEN 1089668204 AND 1095516154
OR L.TREE_NODE_NUM BETWEEN 1113060008 AND 1115009324)

All the tree node number predicates are found in the query that populates the selector table and not in the query that generates the report.

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, PSTREESELECT10 L2, PSTREESELECT10 L
WHERE A.LEDGER='S_USMGT'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1

AND L.SELECTOR_NUM=96774
AND A.CHARTFIELD1=L.RANGE_FROM_10 AND
A.CURRENCY_CD='USD' GROUP BY L4.TREE_NODE_NUM

nVision will delete dynamic selectors after they have been used.  However, if the report crashes or times out, rows will be left in the selector table, and this debris can build up over time.  So there is a requirement to manage these tables.
Note that in this example, when the selector was extracted, the tree leaf table was joined to the underlying table on which the tree was based (in this case GL_ACCOUNT_TBL).  This occurs when the single value join option is selected.  The tree selector then contains a row for each value rather than for each tree leaf, which results in larger tree selectors where ranged leaves are used.
PeopleSoft do not provide any way to identify the tree used to create a dynamic selector.  From the SQL query, all we can see is the field to which it was joined; the above example is a tree related to CHARTFIELD1.
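As a rough heuristic only (a sketch, not a supported method), the contents of a dynamic selector can be matched back to the tree leaves to see which trees' leaf ranges cover its values; the tree with the most matching rows is usually the one that was used.

SELECT L.SETID, L.TREE_NAME, L.EFFDT, COUNT(*) MATCHING_ROWS
FROM   PSTREESELECT10 S, PSTREELEAF L
WHERE  S.SELECTOR_NUM = :selector_num   -- selector number taken from the nVision SQL
AND    S.RANGE_FROM_10 BETWEEN L.RANGE_FROM AND L.RANGE_TO
GROUP  BY L.SETID, L.TREE_NAME, L.EFFDT
ORDER  BY MATCHING_ROWS DESC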

Recommendations

Although static selectors are easier to manage, the decision to use dynamic selectors is driven by the need to use single value joins.

  • It is important to maintain up-to-date optimizer statistics on the selector tables.  As new selectors are extracted, ever-increasing selector numbers are allocated from a sequence maintained on the table PSTREESELNUM.  If the statistics are not refreshed, the selector numbers will exceed the high value recorded in the column statistics for SELECTOR_NUM, and the database will not correctly estimate the number of rows expected, resulting in inappropriate execution plans (see the sketch after this list).
  • It is easier to manage optimizer statistics on static selectors because the PSTREESELCTL table records when they have been updated.  In fact, it is possible to put a trigger on that table to maintain statistics.  However, static selectors imply range predicates that bring other problems that I discuss in the next section.
  • Using compound triggers on the selector tables, it is possible to maintain a log of selectors, maintain statistics on them, and purge selectors left behind by nVision reports that have not cleaned up after themselves.  This can also be used to identify the trees used in dynamic selectors.  I will describe this solution later.
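As a simple illustration of the statistics point above (a sketch only, assuming the usual SYSADM schema owner and the 10-character selector table; the author's full trigger-based solution is described in a later post), statistics on a selector table can be refreshed with DBMS_STATS:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SYSADM'          -- assumption: typical PeopleSoft schema owner
  , tabname    => 'PSTREESELECT10'
  , method_opt => 'FOR ALL COLUMNS SIZE 1'
  , cascade    => TRUE);
END;
/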

Recommendation: Prior to Oracle 11g, I recommended using static selectors and ranged joins; with careful management of statistics on the tree selectors and appropriate indexing of the ledger tables, good results can be obtained.  The tree selectors are Cartesian-joined together before index lookups are performed on the ledger tables.
However, this approach is generally not effective for queries that reference more than 3 tree selectors, nor for very large trees.  The size of the Cartesian product is the product of the number of rows extracted from each tree selector.  For example, three selectors with 100 rows each will result in 1 million lookups on the ledger table.  As the size of the Cartesian product grows, the number of index look-ups also grows.  There comes a point when it is better to remove the least selective tree from the Cartesian product and hash join it after looking up the ledger.  Striking the correct balance can be difficult, and leads to using different performance options in different reports.
From Oracle 11g, and especially on engineered systems, I now prefer dynamic selectors because they permit the use of single value joins, which can take advantage of Bloom filters and extended statistics.  However, where smaller trees are in use that do not significantly cut down the query result set, range joins may still be preferable, and in that case so might static selectors.

Selector Options

This performance option controls how the tree selector is joined to the ledger table.  The choice is between single value equality joins or inequality joins.

Ranged Joins

There are two forms of ranged joins, either using the BETWEEN keyword or using a pair of inequalities.  There is no difference to Oracle because the optimizer rewrites BETWEEN as a pair of inequalities.
NB: If you specify single value joins on a static selector, you still get an inequality ranged join because only the tree leaves are extracted to the selector.  nVision still does this even if all of the leaves in the tree are single values and not ranges.

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, PSTREESELECT10 L2, PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1
AND L4.SELECTOR_NUM=89052
AND A.CHARTFIELD3>= L4.RANGE_FROM_10
AND A.CHARTFIELD3 <= L4.RANGE_TO_10

AND A.CURRENCY_CD='GBP'
GROUP BY L4.TREE_NODE_NUM

Single Value Joins

If single value joins are selected, nVision generates an equality join between the tree selector and the ledger table.  The tree selector must be dynamic, and nVision extracts all the intermediate values between the tree node ranges from the underlying detail table.

SELECT L4.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A, PSTREESELECT10 L4, PSTREESELECT10 L2, PSTREESELECT10 L
WHERE A.LEDGER='ACTUALS'
AND A.FISCAL_YEAR=2017
AND A.ACCOUNTING_PERIOD=1

AND L.SELECTOR_NUM=96774
AND A.CHARTFIELD1=L.RANGE_FROM_10
AND A.CURRENCY_CD='GBP'
GROUP BY L4.TREE_NODE_NUM

Recommendations

In general, I would recommend using single value joins and therefore also dynamic selectors.  The main advantage is that they enable two particular optimisations in Oracle.

  • Bloom filters only work with equality predicates, and therefore only with single value joins.  They enable a filter on one column of one table to be roughly mapped to another column of another table upon which there is an equality join.  This filters data earlier in a query, cutting down the size of the eventual join operation.  It is a rough match in that it may return some false positives to the subsequent hash operation, but it will not omit any results.  The hash join then does an exact match of the rows to produce the correct result set.  On an Engineered Oracle system this filter can also be pushed down to the storage cells. 
  • This is an example of the kind of execution plan that we should see when a Bloom filter is used.  A filter has been created on PSTREESELECT10 at line 3, and it is applied to PS_LEDGER at line 6.
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 12876 (100)| | | |
| 1 | HASH GROUP BY | | 501 | 35571 | 12876 (42)| 00:00:01 | | |
| 2 | HASH JOIN | | 975K| 66M| 12822 (41)| 00:00:01 | | |
| 3 | JOIN FILTER CREATE | :BF0000 | 2577 | 59271 | 7 (0)| 00:00:01 | | |
| 4 | PARTITION RANGE SINGLE | | 2577 | 59271 | 7 (0)| 00:00:01 | 87968 | 87968 |
| 5 | INDEX FAST FULL SCAN | PS_PSTREESELECT10 | 2577 | 59271 | 7 (0)| 00:00:01 | 87968 | 87968 |
| 6 | JOIN FILTER USE | :BF0000 | 1715K| 78M| 12804 (41)| 00:00:01 | | |
| 7 | PARTITION RANGE ITERATOR | | 1715K| 78M| 12804 (41)| 00:00:01 | 27 | 34 |
| 8 | MAT_VIEW REWRITE ACCESS STORAGE FULL| PS_LEDGER | 1715K| 78M| 12804 (41)| 00:00:01 | 27 | 34 |
-------------------------------------------------------------------------------------------------------------------------------
  • Extended statistics (i.e. on groups of columns) also only work with equality predicates, so single value join queries can also benefit from them (a sketch follows).
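For example (a sketch only; the column group below is illustrative and should be chosen from the equality predicates that actually appear together in your nVision SQL, and statistics should be gathered afterwards so the column group statistics are populated):

SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         'SYSADM'                                            -- assumption: typical PeopleSoft schema owner
       , 'PS_LEDGER'
       , '(LEDGER,FISCAL_YEAR,ACCOUNTING_PERIOD,CURRENCY_CD)')  -- illustrative column group
FROM   DUAL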

There are drawbacks to be overcome.

  • Single value joins require the use of dynamic selectors.  That in turn requires managing optimizer statistics on the dynamic selectors, and dealing with debris left in the tree selector tables by failed reports that have not cleaned up after themselves.
  • Single value joins can result in larger tree selector tables with many more rows than there are tree leaves.  For large trees, or trees with many values in leaf ranges, resulting in more than about 2000 rows on the tree selector table, it can be advantageous to change the Access Method performance option from literal values back to join, because the parse overhead becomes significant.