Oakies Blog Aggregator

FBI trouble

In our application we make extensive use of a function-based index (FBI) on an important table. A couple of days ago I ran into an interesting issue involving this FBI, a view and a GROUP BY query. I have to admit I don’t have an explanation of what exactly it is or what to call it properly, hence just “trouble” in the subject line.

This is a synthetic setup to demonstrate the case:

drop table t cascade constraints purge
;
create table t (
    id    number not null,
    x     varchar2(100),
    y     number,
    pad   varchar2(50)
);
insert into t
with g as (select /*+ materialize */ null from all_source where rownum <= 1000)
select mod(rownum, 1000) id,
       lpad('x', rownum/1e5, 'x') x,
       1 y,
       lpad('x', 50, 'x') pad
  from g g1, g g2
 where rownum <= 1e6
;

create index t_indx on t(id, coalesce(upper(x), to_char(y)))
;
exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for all columns size 1', cascade=>true)
exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for all hidden columns size 1', no_invalidate=>false)

create or replace view v1
as
select t.*, coalesce(upper(x), to_char(y)) func from t
;
create or replace view v2
as
select v1.id, nvl(coalesce(upper(x), to_char(y)), 1) y, func from v1
;

So there’s a table with a function-based index defined on it. There are also two views selecting data from the table: one adds the FBI expression as a column, the other tries to restrict the selected data to what is stored in the index. So far so good:

SQL> explain plan for select * from v2 where id = :1;

Explained.

SQL> select * from table(dbms_xplan.display(null,null,'+outline +projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 4058602070

---------------------------------------------------------------------------
| Id  | Operation        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |  1000 | 10000 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T_INDX |  1000 | 10000 |     5   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$5C160134" "T"@"SEL$3" "T_INDX")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$3")
      OUTLINE(@"SEL$335DD26A")
      OUTLINE(@"SEL$1")
      MERGE(@"SEL$335DD26A")
      OUTLINE_LEAF(@"SEL$5C160134")
      ALL_ROWS
      DB_VERSION('11.2.0.3')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."ID"=TO_NUMBER(:1))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "T"."ID"[NUMBER,22], COALESCE(UPPER("X"),TO_CHAR("Y"))[VARCHAR2,1
       00]

39 rows selected.

The query is answered purely from the index and does not touch the table at all, exactly as wanted. Now a somewhat dumb query is fired against this view:

SQL> explain plan for
  2  select count(t2.y)
  3    from (select 1 id from dual) t1
  4        ,(select id, y from v2 where id = :1) t2
  5  where t1.id = t2.id(+)
  6  group by t1.id
  7  ;

Explained.

SQL> select * from table(dbms_xplan.display(null,null,'+outline +projection'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2568130530

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     1 |    27 |     7   (0)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT         |        |     1 |    27 |     7   (0)| 00:00:01 |
|   2 |   NESTED LOOPS OUTER          |        |     1 |    27 |     7   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL          | DUAL   |     1 |     2 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID| T      |     1 |    25 |     5   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | T_INDX |     1 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$70D5F4CE" "T"@"SEL$5")
      LEADING(@"SEL$70D5F4CE" "DUAL"@"SEL$2" "T"@"SEL$5")
      INDEX_RS_ASC(@"SEL$70D5F4CE" "T"@"SEL$5" "T_INDX")
      FULL(@"SEL$70D5F4CE" "DUAL"@"SEL$2")
      OUTLINE(@"SEL$5")
      OUTLINE(@"SEL$4")
      MERGE(@"SEL$5")
      OUTLINE(@"SEL$7286615E")
      OUTLINE(@"SEL$3")
      MERGE(@"SEL$7286615E")
      OUTLINE(@"SEL$C8360722")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      MERGE(@"SEL$C8360722")
      MERGE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$70D5F4CE")
      ALL_ROWS
      DB_VERSION('11.2.0.3')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T"."ID"(+)=TO_NUMBER(:1))
       filter("T"."ID"(+)=CASE  WHEN (ROWID IS NOT NULL) THEN 1 ELSE 1 END )

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) 1[2], COUNT(CASE  WHEN ROWID IS NOT NULL THEN
       NVL(COALESCE(UPPER("T"."X"),TO_CHAR("T"."Y")),'1') ELSE NULL END )[22]
   2 - (#keys=0) ROWID[ROWID,10], "T"."X"[VARCHAR2,100], "T"."Y"[NUMBER,22]
   3 - ROWID[ROWID,10]
   4 - ROWID[ROWID,10], "T"."X"[VARCHAR2,100], "T"."Y"[NUMBER,22]
   5 - ROWID[ROWID,10]

56 rows selected.

See how Oracle reacted: it’s now accessing the table, which is not wanted at all. The optimizer trace shows that the query undergoes several rounds of view merging and is transformed to:

select count(case
                 when "T".ROWID is not null then
                  nvl(coalesce(upper("T"."X"), to_char("T"."Y")), 1)
                 else
                  null
             end) "COUNT(T2.Y)"
  from "SYS"."DUAL" "DUAL", TIM."T" "T"
 where case
           when "DUAL".ROWID is not null then
            1
           else
            1
       end = "T"."ID"(+)
   and "T"."ID"(+) = :b1
 group by 1

I assume this transformation introduced the additional CASE expression in the select list, and that is what made Oracle access the table. Well, now I know why it happens and how to get rid of it, but I’m really curious to know what this transformation is called and how to control it. There’s really no information in the 10053 trace about this feature, except that the predicate first appears under the “Outer Join Elimination” transformation – but turning OJE off does not change anything. If anyone knows more on this topic – welcome to the comments.
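For what it’s worth, since the CASE expression appears as a by-product of view merging, one workaround consistent with this explanation (my assumption, not something taken from the trace) is to stop the inline view from being merged:

select /*+ no_merge(t2) */ count(t2.y)
  from (select 1 id from dual) t1
      ,(select id, y from v2 where id = :1) t2
 where t1.id = t2.id(+)
 group by t1.id;

With the view left unmerged, the projected expression can still be evaluated from the index columns alone, so the index-only access should survive.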

Update Dec 19th: I’ve uploaded the 10053 trace generated for a modified query (see comments): link to download. Please note that the file is a plain *.txt with a .doc extension.

Filed under: CBO, Oracle Tagged: CBQT, FBI, indexes

Approximate NDV

This is just a temporary note (comments disabled) to make sure that any subscribers get a message that Amit Poddar has given me permission to publish his presentation on the 11g Approximate NDV. There are links to a white paper and the presentation slides on my posting about Hotsos 2009 where I saw him give the presentation.

Note – do not link back to this posting, I intend to delete it in a few days.

The Thing…

The first film I ever saw on VHS was The Thing (originally released 1982). I’m not sure exactly how long after the cinema release it was, but I remember I was still in school, so I guess it was about 1983-84 and I was about 14-15 years old. A kid at school had just got a video recorder and I went over to his house to see it. It was one of those giant top-loader things. I was totally in awe of it. I thought the film was pretty neat too. :)

So 29 years after the first film was made they decided to make a prequel called The Thing (2011). If you’ve seen the original, then you’ll remember the scene at the start where the Norwegian helicopter is chasing the dog and a guy is taking pot-shots at it. The film is based around the events in the Norwegian camp prior to the first scene in the original film.

So how was it? Pretty darn good in my opinion. It’s hard to watch the original without feeling how dated it is. The new film is a fairly similar plot, in fact it might as well be a remake, but the visuals are brought right up to date. It’s a bit of a gore-fest at times and it makes you jump big-time on several occasions. If you liked the original, this is a worthy remake/prequel.

Cheers

Tim…




"Workarounds" for ORA-04091

In the previous post, we demonstrated running into the mutating table error. The example involved the well-known EMP table and a business rule that we are trying to implement: all departments with a Manager should also employ a Clerk. We started our implementation for that rule by writing a straight-forward 'after-row-insert' trigger on the EMP table.

  • The row trigger detects whenever a Manager is inserted (remember, we can only detect this with a row trigger, not with a statement trigger);
  • It then calls a stored procedure to which it supplies the department number (deptno-column value) of the inserted Manager;
  • The procedure then queries the EMP table to see if a Clerk exists in this department;
    • If it finds no such Clerk, the procedure raises an application error, which causes the after-insert-row trigger to fail. This in turn prevents the insert of the Manager, since it would violate our business rule (the Manager insert will undergo a statement-level rollback due to the row-trigger failing);
    • If it finds such a Clerk, then the procedure finishes successfully, which causes the after-insert-row trigger to finish successfully, which causes the Manager insert to execute successfully.
A straight-forward 'use-case' of using a (row) trigger, right? Of course this setup didn't work, since the cursor in the stored procedure will always hit the mutating table error, thus preventing a Manager insert even if the necessary Clerk would indeed be present.
So, what's the workaround for this? Well, there are two I would like to discuss in this post: one of them is very 'popular', the other one is -- I guess -- not so well-known.
I cannot stress enough, right up front, that both 'workarounds' are red herrings.
Workaround 1a: use the autonomous_transaction pragma inside the stored procedure.
If we add just one line at the top of the stored procedure, we can fix this baby (the PRAGMA line in the sketch below).
By ensuring that all code inside the P_CHECK_FOR_CLERK procedure runs as an autonomous transaction, we prevent the mutating table error. Why? Because it causes Oracle to 'step out' of the current transaction context, in which EMP is currently 'mutating', and open a new transaction in which EMP is not mutating and can therefore be queried without hitting the mutating table error. So all we need to do is add the PRAGMA line. We do not need to add an explicit COMMIT (or ROLLBACK) at the end of the procedure, which is usually required to end an autonomous transaction, since the procedure doesn't really start a transaction at all: it just issues a query.
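The original post showed the procedure as a screenshot; here is a minimal sketch of what it presumably looks like (the logic follows the description above, but the exact error number and text are my own):

create or replace procedure p_check_for_clerk(p_deptno in number)
is
  pragma autonomous_transaction;  -- the added line
  l_clerk_count number;
begin
  -- query EMP outside the mutating transaction context
  select count(*)
    into l_clerk_count
    from emp
   where deptno = p_deptno
     and job    = 'CLERK';
  if l_clerk_count = 0 then
    raise_application_error(-20999,
      'Department '||p_deptno||' has a Manager but no Clerk.');
  end if;
end;
/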
This is the popular workaround. Now for the lesser known workaround.
Workaround 1b: use a loopback db-link to query the mutating table.
By querying the EMP table to look for a Clerk, via a loopback database link, we can fix this baby too.
So we create a loopback database link (which we call loopback) that connects back to our application owner schema. We then add the text '@loopback' to the table name (EMP) inside the FROM-clause of the query that was hitting the mutating table error (see the sketch below). Done. We'll demonstrate in a minute that this 'fixes' our issue as well.
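Again, the original showed this as a screenshot; a minimal sketch, with the connection details as obvious placeholders:

-- loopback link back into the application owner schema
create database link loopback
  connect to app_owner identified by app_owner_password
  using 'ORCL';

-- inside p_check_for_clerk, the query now runs via the link:
select count(*)
  into l_clerk_count
  from emp@loopback
 where deptno = p_deptno
   and job    = 'CLERK';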
But before we do that you really need to understand a fundamental difference between these two workarounds.
In the preliminaries that we discussed in the previous post, one of the observations that was made about an autonomous transaction was that all code in such transaction cannot see any (uncommitted) changes made by the other (parent) transaction. So what this means for our example is that the cursor in the stored procedure (the one that tries to find a Clerk) which is executed by an 'after-row' trigger, will see the EMP table without the Manager that was inserted by our triggering insert statement.
Yet.
The cursor that uses the loopback database link, which is executed from within a new session created by the database link, *will* see the mutating table as it exists at that point-in-time in the transaction of the 'original' session (the one that issued the Manager insert). Why? Because Oracle detects that both sessions are now part of a distributed transaction, and the database link session will see the same database snapshot as the other session. Now this is pretty radical, I think. When I discovered this behavior I was stunned for a few moments: here we have a database session (OK, it's one that was created by a database link, but still) that is able to see uncommitted changes of a different session! I will show you a demo that proves this behavior in a minute.
Now wouldn't that be an in-depth (trick) question for some fancy certification exam:
"When can a database session see uncommitted changes of another session?"
a) Never.
b) Always, this is default behavior.
c) If it's using an autonomous transaction.
d) If the session was opened by a database link.
;-)
Now let's test both workarounds. Remember we are trying to insert a manager into department 42, in which no other employee works yet.
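The test itself was shown as a screenshot; it presumably ran along these lines (the column values are my own):

insert into emp (empno, ename, job, deptno)
values (9999, 'SMITH', 'MANAGER', 42);
-- with either workaround in place this no longer raises ORA-04091,
-- but fails with our business rule error, since department 42 has no Clerk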

Et voilà: no more mutating table error. Our row trigger and procedure work; they give us the business rule error.

So what's wrong with both approaches? Can you guess it? I'll give you a hint: think about multi-row inserts. Will they work given workaround 1a (the autonomous transaction) or workaround 1b (the db-link trick)?

I'll discuss this in the sequel, which should follow promptly.

Up the Villa…

I had a bit of good news the other day. Nephew #2 (6 years old) is now playing for Aston Villa Academy. He’s quite small for his age and he trialed with a group of boys a year older than him, so my brother said he looked tiny and the kit they gave him was gigantic on him. He’s used to playing with Nephew #1, who’s just turned 10, so playing with bigger kids doesn’t faze him, and he’s a bit more cocky than Nephew #1, which they seem to like at these places. Anyway, at the end of the session they said they want him. They are not allowed to sign exclusively until they are 8 years old, so he can keep playing for Wolverhampton Wanderers Academy and Telford.

So now I have to support:

  • West Brom and Telford because Nephew #1 plays for them.
  • Villa, Wolves and Telford because Nephew #2 plays for them.
  • Man United because Nephew #1 and #2 (and their parents) support them.

I don’t even like football really, but if they can keep improving for the next 10 years and be in the freakishly small percentage of people that actually make it to become professional footballers, I’ll be able to live in the manner I would like to become accustomed to. Muhahaha (little finger in the corner of the mouth)… :)

Cheers

Tim…




I wish

Here are a few thoughts on dbms_stats – in particular the procedure gather_index_stats.

The procedure counts the number of used leaf blocks and the number of distinct keys using a count distinct operation, which means you get an expensive aggregation operation when you gather stats on a large index. It would be a nice efficiency feature if Oracle changed the code to use the new Approximate NDV mechanism for these counts.

Then there are a couple of additions I’d like to see to the actual stats gathered: the average space used by a key (including the rowid), calculated as sum(space used) / count(index entries), so that we could do a quick comparison of the index size with the space allocation without resorting to complex estimates. The code would, of course, have to handle the prefix and the tail for compressed indexes, so possibly it would be better to record the two figures for space used, and include the figure for number of prefix entries. Note, the number of prefix entries is not the same as the number of distinct values for prefixes, as the same combination of prefix values may appear in several consecutive leaf blocks – and this brings me to another feature enhancement, to match SQL Server.
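As an aside (my addition, not part of the wish list): the nearest thing available today is to validate the index structure and read index_stats, though that locks the table and reads the whole index, which is exactly why a dbms_stats-time figure would be welcome:

-- the index name is a placeholder; index_stats is populated only
-- for the current session, and the analyze takes a lock
analyze index my_index validate structure;

select used_space / lf_rows as avg_space_per_entry,
       lf_rows, lf_blks, used_space, btree_space
  from index_stats;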

For a multi-column index, how about a new structure to hold the number of distinct values for each of the possible prefixes for that index. This could piggy-back on the existing extended statistics feature, and wouldn’t be too expensive to operate if it used the approximate NDV mechanism; it could, of course, be a bit of a threat, since this approach would use some of the 1,000-column limit that you’re allowed for a single table. There are also a few messy details to handle relating to multiple indexes starting with the same combination of columns in different orders – and the side effects of dropping one index from a set with such an overlap.

Footnote 1: When gathering stats for a Unique index, the code copies num_rows to distinct_keys, so that’s one aggregation optimisation already in place.

Footnote 2: There may be some empty blocks which don’t get counted – see my book Cost-Based Oracle Fundamentals for the impact this can have on costing index fast full scans … I need to check if the optimizer still uses the number of used leaf blocks to cost the fast full scan, or whether it has switched to using the high water mark of the segment.

Getting Started with DTrace

Structure of a DTrace script

#!/usr/sbin/dtrace -s

something_to_trace
/ filters /
{ actions }

Something_else_to_trace
/ filters_optional /
{ take some actions }

The script has sections that fire when the specified probe fires in the OS. For example, if I do a send over TCP then my “something_to_trace” could be a probe (an event) called “tcp:::send”. I could further filter by the receiving machine’s IP address. Then, when a packet is sent over TCP and the receiver is the IP in the filter, I can take some actions, like printing the size of the packet.

What can I trace? What are the possible “something_to_trace”, i.e. the probes?
To get a list of OS probes, run

  dtrace -l
    -l = list instead of enable probes
The list will be long. It’s good to have some idea what one is searching for, like “io”, “tcp”, “zfs” etc, and grep for areas of interest.
The output format is 5 columns: first the probe id, followed by the name in 4 parts:
 id  provider  module  function  name
One can get the list of probes for just a subset using the “-ln” flag and including part of the probe, either provider, module, function or name. The provider, module, function and name are concatenated together with colons (:). I commonly use only provider and name, for example
  sudo dtrace -ln tcp:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
 7301        tcp                ip                    tcp_input_data receive
 7302        tcp                ip                tcp_input_listener receive
 7303        tcp                ip          tcp_xmit_listeners_reset receive
 7304        tcp                ip                   tcp_fuse_output receive
    -n = Specify probe name to trace or  list
Now if I have a specific probe I want to trace, I can get more information about the data available for that probe using the verbose flag as in:
    dtrace -lvn tcp:ip:tcp_input_data:receive
       ID   PROVIDER            MODULE                          FUNCTION NAME
     7301        tcp                ip                    tcp_input_data receive

        Argument Types
                args[0]: pktinfo_t *
                args[1]: csinfo_t *
                args[2]: ipinfo_t *
                args[3]: tcpsinfo_t *
                args[4]: tcpinfo_t *
Now I see the arguments for this probe, and I have access to these arguments when the probe fires, but what do these arguments contain? Here is a crucial trick: you can look up these arguments at:
        http://src.illumos.org

(also, one can look in /usr/lib/dtrace, or previously http://cvs.opensolaris.org/source/)

For example, I see that args[3] is “tcpsinfo_t”. What is “tcpsinfo_t”? I can type “tcpsinfo_t” into the Symbol field on http://src.illumos.org. I get a list of appearances of this structure. I click on the first one, as it looks like it’s probably the structure definition, and get:
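(The listing itself was a screenshot in the original post. From memory, the relevant part of the translator structure looks roughly like this, abridged; check the illumos source for the authoritative definition:)

typedef struct tcpsinfo {
        ...
        uint16_t tcps_lport;   /* local port */
        uint16_t tcps_rport;   /* remote port */
        string   tcps_laddr;   /* local address, as a string */
        string   tcps_raddr;   /* remote address, as a string */
        int32_t  tcps_state;   /* TCP state */
        ...
} tcpsinfo_t;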

So now I can see the contents of the structure and the variable type of each field. From the above I see the receiver’s address is “tcps_raddr” in args[3], so I can now filter on the receiver’s address.

A good example of trying to understand dtrace probes and arguments is found here: http://blogs.oracle.com/observatory/entry/d_script_archeology

And now I’m ready to put together a script accessing the contents of these structures:

#!/usr/sbin/dtrace -s
#pragma D option quiet
tcp:::send
/ args[3]->tcps_raddr == "192.168.1.1" /
{ printf("send     %6d \n",args[2]->ip_plength);
}
tcp:::receive
/ args[3]->tcps_raddr == "192.168.1.1" /
{ printf("receive %6d  \n", args[2]->ip_plength );
}

(I added the line “#pragma D option quiet” to suppress the default output of the names of the probes that fired, limiting the output to only the print statements.)
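Saved as, say, tcp_mon.d (the filename is mine), the script runs like any other DTrace script:

chmod +x tcp_mon.d
sudo ./tcp_mon.d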

The main point of this blog was to show how to get a list of OS probes to trace and, for those probes, how to find out the contents of the data structures available. Access to these structures depends on “translators.” See

http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Features

The translators are defined in /usr/lib/dtrace. So instead of looking up structures in the illumos source, one can look for them in the translator files in /usr/lib/dtrace. The translators provide a stable way to access the OS structures. The OS structures can be accessed directly without the translators, but it’s more complicated and not stable across OS versions, not to mention OS types.

For Oracle, there are no translators, though you could write them.

For Oracle, to get an idea of what you can trace, get the PID of a running Oracle process, say in my case 11602:

    dtrace -ln pid11602:oracle:kcbr*:

The list will be quite large; it more or less requires knowing what you are looking for. Above I give “kcbr*” for Kernel Cache Buffer Recent. Check out http://www.orafaq.com/wiki/X$_Table for an idea of the layers that make up the start of some of the function names. (It would be a lot more fun to trace Oracle with DTrace if one had access to the source!)

The best references for DTrace are, first, the DTrace book and the DTrace community, then Brendan Gregg’s blog at:

http://dtrace.org/blogs/brendan/

If you are specifically interested in TCP then Alan Maguire’s blog is good.

And for some specific examples with Oracle, check out:

A few other notes:

Useful variables that can be filtered on or printed out

  • pid – process id
  • execname – executable name
  • timestamp – timestamp in nanoseconds
  • tid – thread id
  • cwd – current working directory
  • probeprov, probemod, probefunc, probename – probe’s provider, module, func and name

DTrace variables (from Brendan’s blog; see http://dtrace.org/blogs/brendan/2011/11/25/dtrace-variable-types/):


DTrace variable types, in order of preference:

type                prefix    scope                 overhead      multi-CPU safe   example assignment
aggregation         @         global                low           yes              @x = count();
clause local        this->    clause instance[1]    very low      yes              this->x = 1;
thread local        self->    thread                medium        yes              self->x = 1;
scalar              none      global                low-medium    no[2]            x = 1;
associative array   none      global                medium-high   no[2]            x[y] = 1;

The above list is a nice cheat sheet. I started off using scalar global variables, but the right way is to use aggregations, clause-local and thread-local variables.
Definitely read about aggregates when getting started with DTrace, as aggregates are key to getting useful data out of it:

http://wikis.sun.com/display/DTrace/Aggregations
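As a taster, this classic one-liner uses an aggregation to count system calls by executable name (Ctrl-C prints the result):

sudo dtrace -n 'syscall:::entry { @[execname] = count(); }'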

For formatting output data, wrap scripts with something like Perl and post-process the data. Here is an example of piping dtrace data to Perl and having Perl format the output. (The Perl code is set up to aggregate similar types of data coming from I/O, ZFS and NFS, but below is just the code for monitoring NFS.)

/usr/sbin/dtrace -n '
#pragma D option quiet
nfsv3:::op-read-start, nfsv3:::op-write-start ,nfsv4:::op-read-start {
        tm[args[1]->noi_xid] = timestamp;
        sz[args[1]->noi_xid] = args[2]->count    ;
}
nfsv4:::op-write-start {
        tm[args[1]->noi_xid] = timestamp;
        sz[args[1]->noi_xid] = args[2]->data_len ;
}
nfsv3:::op-read-done, nfsv3:::op-write-done, nfsv4:::op-read-done, nfsv4:::op-write-done
/tm[args[1]->noi_xid]/
{       this->delta= (timestamp - tm[args[1]->noi_xid]);
        this->type =  probename == "op-write-done" ? "W" : "R";
        @nfs_tm[this->type]=sum(this->delta);
        @nfs_mx[this->type]=max(this->delta);
        @nfs_ct[this->type]=count();
        @nfs_sz[this->type]=sum(sz[args[1]->noi_xid]);
        tm[args[1]->noi_xid] = 0;
        sz[args[1]->noi_xid] = 0;
}
profile:::tick-1sec
{      printa("nfs_tm ,%s,%@d\n",@nfs_tm);
       printa("nfs_mx ,%s,%@d\n",@nfs_mx);
       printa("nfs_ct ,%s,%@d\n",@nfs_ct);
       printa("nfs_sz ,%s,%@d\n",@nfs_sz);
       clear(@nfs_mx);
       clear(@nfs_tm);
       clear(@nfs_ct);
       clear(@nfs_sz);
       printf("!\n");
}
' | perl -e '
while (my $line = <STDIN>) {
       $line=~ s/\s+//g;
       if ( $line eq "!"  ) {
          printf("--|------| %10s %10s %10s %10s\n", "avg_ms","MB/s","mx_ms","count");
          foreach $type ("R","W") {
            foreach $class ("nfs") {
             $ct=${$class . "_ct"}{$type}||0;
             $sz=${$class . "_sz"}{$type}||0;
             $mx=${$class . "_mx"}{$type}||0;
             $tm=${$class . "_tm"}{$type}||0;
               if ( $ct > 0 ) {
                   $ms=(($tm/1000000)/$ct);
                   printf("$type | $class  : %10.2f",$ms);
               } else {
                   printf("$type | $class  : %10.2f",0) ;
               }
               printf(" %10.2f",$sz/(1024*1024));
               $mx_ms=$mx/1000000;
               printf(" %10.2f",$mx_ms);
               printf(" %10d",$ct);
               print "\n";
            }
          }
       } else {
          ($area, $r_w, $value)=split(",",$line);
        ${$area}{$r_w}=$value;
       }
}'

 

PS: I tried using AWK for formatting, but the data seems to get buffered so there isn’t immediate output. I tried using “fflush” and printing to /dev/stdout but couldn’t get it to work consistently:

dtrace [code] | awk  '
{
       # hack to get AWK to flush out immediately
       printf("%s\n",$0) > "/dev/stdout"
       close("/dev/stdout")
       #fflush(stdout)
       #system("")
}'

 

PS: One thing to keep in mind is overhead. The overhead of DTrace is small, but it's not zero. See

http://dtrace.org/blogs/brendan/2011/02/18/dtrace-pid-provider-overhead/

For more information, here is the key part from the above link by Brendan Gregg:

I'd suggest considering pid provider overhead as follows:

Don't worry too much about pid provider probe cost at < 1000 events/sec.
At > 10,000 events/sec, pid provider probe cost will be noticeable.
At > 100,000 events/sec, pid provider probe cost may be painful.

Let's discuss these using the 6 us per probe result from the previous tests. This could be faster (~2 us) or slower (> 15 us) depending on architecture and the probe action.

  • at 1000 events/sec, that could cost 6 milliseconds of CPU time. If the application was 100% on-CPU, it may slow by 0.6% - ie, don't worry.
  • At 10,000 events/sec, that could cost 60 milliseconds of CPU time. For a 100% on-CPU application, then the slow down may be 6%. If it was idle some of the time (systems often are), then the overhead may be less, as there are spare cycles to run DTrace (really depends on the location of the probe in the code-path, though).
  • At 100,000 events/sec, the CPU cost could be 600 milliseconds. This can significantly slow down an application. Whether that's a problem depends on the app and its function - how sensitive it is to latency.

 

DTrace is quite terse and limited in actions and functions. As I said above, for formatting data, wrap DTrace in Perl and do calculations and formatting in Perl. Why? A few examples:

  • DTrace doesn't have floating point operations
  • DTrace aggregate arrays don't have a method allowing you to query the keys and step through them, thus there is no way to do calculations on values in one aggregate array with values in another.

Just save yourself time and headache, and do the calculations and formatting in Perl.

DTrace is limited in functions. Many of the functions are hard to track down and/or have little documentation. Here is a list of functions:

http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Internals

http://developers.sun.com/solaris/articles/dtrace_quickref/dtrace_functions.html


Provision Oracle RDBMS software via RPM

I have always asked myself why Oracle doesn’t package their software as an RPM – surely such a large organisation has the resources to do so!

Well, the short answer is they don’t give you an RPM, except for the XE version of the database, which prompted me to do it myself. The big problem anyone faces with RPM is that the format doesn’t seem to support files larger than 2GB. Everybody knows that the Oracle database installation is > 2GB, which requires a little trick on our side. And the trick is not even obscure: some time ago I read an interesting article written by Frits Hoogland about cloning Oracle homes. It’s still very relevant and can be found here:

http://fritshoogland.wordpress.com/2010/07/03/cloning-your-oracle-database-software-installation/

Now that gave me the idea:

  1. You install the oracle binaries on a reference host
  2. Apply any patches and PSUs you need
  3. Wrap the oracle home up in a tar-ball just the way Frits describes, by descending into $ORACLE_HOME and creating a tar archive of all files, excluding those ending in “*.log”, network config files in $ORACLE_HOME/network/admin and anything in $ORACLE_HOME/dbs (see the sketch after this list). We don’t want to proliferate our database initialisation files …
  4. You make that tarball available on a central repository and export that with CIFS/NFS or whatever other mechanism you like
  5. Mount this exported file system in /media, so that /media/db11.2.0.3/ has the database.tar.gz file available
  6. Install the RPM
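A minimal sketch of step 3 (paths and the exclude list follow the description above; adjust to taste):

cd /u01/app/oracle/product/11.2.0.3
tar --exclude='*.log' --exclude='./network/admin/*' --exclude='./dbs/*' \
    -czvf /tmp/database.tar.gz .
# then copy /tmp/database.tar.gz to the repository exported as /media/db11.2.0.3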

Simple! Piet de Visser would be proud. As it turned out, creating (or building) the RPM is the hard part! I spent a fair amount of time understanding it, and realised that it’s amazing! However, IMO RPM is primarily aimed at packaging/building software from source, something the oracle installer doesn’t really do in the same way.

To get you started, here’s a sample spec file I used, named Oracle11203EE.spec.

%define name            Oracle11203EE
# package release
%define release         1.0
# vendor release
%define version         11.2.0.3

Buildroot:              %{_topdir}/BUILDROOT/%{name}-%{release}-rootdir
Summary:                Oracle Database 11.2.0.3 EE
License:                Commercial
Name:                   %{name}
Version:                %{version}
Release:                %{release}
Group:                  Applications/Databases
Vendor:                 Martin Bach Consulting
URL:                    http://martincarstenbach.wordpress.com
Requires:               oracle-validated

%description
Oracle 11.2.0.3 Enterprise Edition installation. Requires the binaries
in tar-compressed format in /media/db11.2.0.3/database.tar.gz ready for
cloning. A good source for how to package your system up for cloning can
be found here:

http://fritshoogland.wordpress.com/2010/07/03/cloning-your-oracle-database-software-installation/

The oracle-validated package is used on Oracle Linux 5 to create a sensible
installation environment. Ensure you review the environment before you build oracle!

%post
if [ -d "/u01/app/oracle/product/11.2.0.3/" ]; then
    echo "ORACLE_HOME directory exists - aborting"
    exit 127
fi

echo "cloning the oracle home now to /u01/app/oracle/product/11.2.0.3/"
mkdir -p /u01/app/oracle/product/11.2.0.3
tar --gzip -xvf /media/db11.2.0.3/database.tar.gz -C /u01/app/oracle/product/11.2.0.3
cd /u01/app/oracle/product/11.2.0.3/clone/bin
./clone.pl ORACLE_HOME="/u01/app/oracle/product/11.2.0.3" ORACLE_BASE="/u01/app/oracle" -defaultHomeName

%files
%defattr(-,root,root,-)

How RPM normally works

When building RPMs, a certain directory structure is assumed, such as a top directory (/home/martin/rpm) and certain subdirectories:

  • BUILD
  • BUILDROOT
  • RPMS
  • SOURCES
  • SPECS
  • SRPMS
  • tmp

The RPM build process is controlled via a “SPEC” file. Documentation of these isn’t too great, especially since the format and process have changed over time. The SPEC file follows the process you would normally use when compiling a piece of software from source, just putting more structure around it.

Your source code goes into SOURCES (still tar-compressed). RPM can be instructed to prepare (= uncompress and patch) the tarball, which by default goes into BUILD. When running the configure command, you usually set the prefix to BUILDROOT. After the build completes (using make), you install the package into BUILDROOT. RPM then catalogs the files inside BUILDROOT and uses the defattr() clause in the SPEC file to assign permissions. This is a big relief – in earlier versions of RPM, each file that went into the RPM had to be listed under the %files section.

How this process works

The above SPEC skips almost all of these steps. In our example it’s not necessary to download any sources; all the logic is contained in the SPEC file’s %post section. In other words, all you need to do to build the RPM is copy and paste the above code into the Oracle11203EE.spec file. You build the actual RPM using “rpmbuild -ba Oracle11203EE.spec”. At the end, you find an RPM file in the RPMS/x86_64 directory which can from then on be used to install Oracle on any server. And yes, the RPM is empty except for some header information and a few lines of code.
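Installation on a target server (assuming the default file naming and that the tarball share is already mounted under /media) then boils down to:

rpm -ivh Oracle11203EE-11.2.0.3-1.0.x86_64.rpm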

Note that the SPEC file isn’t really complete: add mount commands and error checking as you see fit. You might also want to replace literals with variables etc. – the idea was to give away the principle.


Mike Carey: The Naming of the Beasts…

“The Naming of the Beasts” is book 5 in the Felix Castor series by Mike Carey. Juliet, the succubus, has gone all wild, beaten up her human wife and is on the verge of feasting on aroused men’s souls again. Felix’s friend Rafi, still possessed by the demon Asmodeus, has gone AWOL and started a killing spree. Life’s never easy when you’re a freelance exorcist… :)

This is the last in the series so far and it maintains the pace of the other books, while tying up a lot of loose ends. According to Wikipedia the next book is out late 2011 (imminently). It will be interesting to see what Felix does next, since the main thrust of the story of the first 5 books is now concluded.

Cheers

Tim…




And so the evenings start drawing out (honest!)

I know I’ve blogged about this before, but it was early on when very few people read my ramblings, so I am mentioning it again…

For those of us in the Northern Hemisphere, today is the day when the evenings start drawing out, which personally I find a relief as going home in the dark depresses me. Sunset tomorrow will be later than today – by all of a few seconds but, heck, later is later. {If I am out by a day, please don’t tell me – don’t shatter my good mood!}

However, as many of you are probably thinking, the shortest day in the Northern Hemisphere is not until the 22nd December (it’s the 21st or 22nd, depending on how long ago the last leap year was). Mornings continue to get later until around the 3rd January. This is because the earth is not “standing totally upright” in its orbit. If you think of the plane in which the earth circles around the sun as a flat surface, with the north pole at the top of the planet and a pole sticking through the earth that it spins around every day, then that pole is leaning back away from the sun today and slightly to one side, like a staggering drunk.

For the timing of sunrise and sunset for the city nearest you, check out this nice website here. This link will show London but you can change that.

The original post is here. It does not say any more but there are a couple of pretty sunset pictures on it.

Of course, if you are in the Southern Hemisphere {say Perth, Australia} then your sunrises have just started getting later by today. But time for the Barby in the evening is still drawing out for a week or two. We can all be happy :-)