Oakies Blog Aggregator

Why ask how, when why is so much more fun?

OK. So the original quote from Spawn is exactly the opposite, but let’s go with it… :)

A few times in the past I’ve been asked questions and started to give a direct answer, then someone smarter has jumped in and asked the killer question. Why? Quite often it’s easy to answer the initial question, so rather than understand the reason for the question, you just respond and pat yourself on the back. That’s great, but without knowing the context of the question, the “right answer” could actually be the “wrong answer”. As Tom always says, “The answer to every question is *it depends*!”

I had another situation like that recently. The question was, “How can I install VNC on a Linux box?” Pretty simple answer, and I know a guy who wrote an article on that, so I pointed them to the article. Job done!

Then I got a pang of guilt and the conversation went like this…

  • Q: Why do you want to install VNC?
  • A: Because my boss told me to.
  • Q: But why does your boss want you to install VNC?
  • A: Because the network connection breaks sometimes, making an “ssh -X user@host” session a dodgy solution.

Now I have nothing against VNC itself, but installing it on a server is one more attack vector to worry about, especially if it’s not necessary. Knowing the context allowed me to talk about silent installs, command line DBCA, running things in the background, even the screen command.

If the person goes away and installs VNC, that’s no skin off my nose, but just answering how, without knowing the context, could well have opened them, or me, up to criticism down the line.

So next time you answer a question and are about to enable smug mode, ask yourself if you have actually helped, or just taken the easy route.

Cheers

Tim…


Why ask how, when why is so much more fun? was first posted on January 26, 2016 at 7:43 pm.

Trace file size

Here’s a convenient enhancement for tracing that came up on Twitter a few days ago – first in a tweet that I retweeted, then in a question from Christian Antognini based on this bit of the 12c Oracle documentation (opens in separate tab). The question was – does it work for you?

The new description for max_dump_file_size says that for large enough values Oracle will split the file into multiple chunks of a few megabytes, using a suffix to identify the sequence of the chunks, keeping only the first chunk and the most recent chunks. Unfortunately this doesn’t seem to be true. However, prompted by Chris’ question I ran a quick query against the full parameter list looking for parameters with the word “trace” in their name:


select
        /*+
                leading(nam val val2)
                full(nam)
                full(val)  use_hash(val)  no_swap_join_inputs(val)
                full(val2) use_hash(val2) no_swap_join_inputs(val2)
        */
        nam.ksppinm                             name,
        val.ksppstvl                            ses_val,
        val2.ksppstvl                           sys_val,
        nam.ksppdesc                            description,
        nam.indx+1                              numb,
        nam.ksppity                             type,
        val.ksppstdf                            is_def,
        decode(bitand(nam.ksppiflg/256,1),
                1,'True',
                  'False'
        )                                       ses_mod,
        decode(bitand(nam.ksppiflg/65536,3),
                1,'Immediate',
                2,'Deferred' ,
                3,'Immediate',
                  'False'
        )                                       sys_mod,
        decode(bitand(val.ksppstvf,7),
                1,'Modified',
                4,'System Modified',
                  'False'
        )                                       is_mod,
        decode(bitand(val.ksppstvf,2),
                2,'True',
                  'False'
        )                                       is_adj,
        val.ksppstcmnt                          notes
from
        x$ksppi         nam,
        x$ksppcv        val,
        x$ksppsv        val2
where
        nam.indx = val.indx
and     val2.indx = val.indx
and     ksppinm like '%&m_search.%'
order by
        nam.ksppinm
;

Glancing through the result I spotted a couple of interesting parameters with the letters “uts” in their names, so re-ran my query looking for all the “uts” parameters, getting the following (edited) list:


NAME                           SYS_VAL         DESCRIPTION    
------------------------------ --------------- ---------------------------------------------
_diag_uts_control              0               UTS control parameter
_uts_first_segment_retain      TRUE            Should we retain the first trace segment
_uts_first_segment_size        0               Maximum size (in bytes) of first segments 
_uts_trace_disk_threshold      0               Trace disk threshold parameter
_uts_trace_segment_size        0               Maximum size (in bytes) of a trace segment
_uts_trace_segments            5               Maximum number of trace segments 

Note particularly the “first segment size” and “trace segment size” – both defaulting to zero (a zero default often means a hidden internal setting, though that doesn’t seem to be the case here; maybe that’s what the “diag control” is for). I haven’t investigated all the effects, but after a little experimentation I found that all I needed to do to get the behaviour attributed to max_dump_file_size was to set the following two parameters – which I could do at the session level.


alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;

The minimum value for these parameters is the one I’ve shown above (5120 KB) and with the default value for _uts_trace_segments you will get a maximum of 5 trace files with sequential names like the following:

ls -ltr *4901*.trc

-rw-r----- 1 oracle oinstall 5243099 Jan 26 08:15 orcl_ora_4901_1.trc
-rw-r----- 1 oracle oinstall 5243064 Jan 26 08:15 orcl_ora_4901_12.trc
-rw-r----- 1 oracle oinstall 5243058 Jan 26 08:15 orcl_ora_4901_13.trc
-rw-r----- 1 oracle oinstall 5242993 Jan 26 08:15 orcl_ora_4901_14.trc
-rw-r----- 1 oracle oinstall 1363680 Jan 26 08:15 orcl_ora_4901.trc

As you can see I’m currently generating my 15th trace, and Oracle has kept the first one and the previous three. It’s always working on a file with no suffix to its name but as soon as that file hits its limiting size (plus or minus a few bytes) it gets its appropriate suffix, the oldest file is deleted, and a new trace file without a suffix is started.

Apart from the usual header information the trace files start and end with lines like:

*** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_11.trc ***
  
*** TRACE SEGMENT RENAMED TO /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_12.trc ***

There is one little trap to watch out for: if you set either of these parameters to be larger than max_dump_file_size, tracing stops as soon as one of the segments hits the max_dump_file_size, and that trace file ends with the usual “overflow” message – e.g. when I changed the max_dump_file_size to 4M in mid-session:

*** DUMP FILE SIZE IS LIMITED TO 4194304 BYTES ***

In my case I had started with max_dump_file_size set to 20M, so I got lucky with my choice of 5M as the segment size.
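Rather than relying on luck, it’s worth setting max_dump_file_size explicitly before the segment parameters. A minimal sketch, using the values from this article (the comments are mine):

alter session set max_dump_file_size = '20M';           -- comfortably larger than one segment
alter session set "_uts_first_segment_size" = 5242880;  -- 5MB first segment
alter session set "_uts_trace_segment_size" = 5242880;  -- 5MB per subsequent segment

With _uts_trace_segments left at its default of 5, this caps the total trace volume at roughly 25MB.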

Further investigation is left as an exercise to the interested reader.

 

Oracle Midlands : Event #13

Tomorrow is Oracle Midlands Event #13.


Franck is a super-smart guy, so please show your support and start the year as you mean to go on!

Cheers

Tim…


Oracle Midlands : Event #13 was first posted on January 25, 2016 at 11:41 pm.

PL/SQL context switch, part 2

This is the second blog post on using PL/SQL inside SQL. If you landed on this page and have not read the first part, click this link and read that first. I got some reactions to the first article, one of which was: what does this look like with ‘pragma udf’ in the function?

Pragma udf (udf stands for user defined function) is a way to speed up PL/SQL functions used in SQL, available starting from version 12. If you want to know more about the use of pragma udf, and when it does help and when it doesn’t, please google for it.

create or replace function add_one( value number ) return number is
        pragma udf;
        l_value number(10):= value;
begin
        return l_value+1;
end;
/

select sum(add_one(id)) from t2;

As you can see, really the only thing you have to do is add ‘pragma udf’ in the declaration section of PL/SQL.

Here is what the flamegraph looks like:

What is visible is that between the PL/SQL interpreter (pfrrun) and the function that makes the operand evaluation switch to PL/SQL (evapls) there is now only one function, peidxrex. However, inside the evapls function two additional functions are called (kkxmsagof and kkxmsagif, not readable in the graph) which take noticeable time. The conclusion at this point is that pragma udf does it in yet another way than both a native PL/SQL function and subquery factoring.

Profiling this using the systemtap script:

global evapls_time, pfrrun_time, evapls_tot=0, pfrrun_tot=0

probe begin {
	printf("Begin.\n")
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls") {
	if ( pid() == target() )
		evapls_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls").return {
	if ( pid() == target() )
		evapls_tot+=local_clock_us()-evapls_time
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun") {
	if ( pid() == target() )
		pfrrun_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun").return {
	if ( pid() == target() )
		pfrrun_tot+=local_clock_us()-pfrrun_time
}

probe end {
	printf("\nevapls time: %12d\npfrrun time: %12d\n", evapls_tot, pfrrun_tot)
}

Shows:

# stap -x 92509 plsql.stap
Begin.
^C
evapls time:      2211412
pfrrun time:       804178

So that’s very close to the time for this function with subquery factoring (2192685), just a bit longer. This depends very strictly on what is actually done, so mileage may vary for your own use.

While we are at it, let’s have a look at what this looks like when no PL/SQL is used at all, so:

select sum(id+1) from t2;

Here it is:

The function used for adding is now evaaddrset. From the size of the kdst_fetch function you can see that the query takes far less time. Let’s measure it with a changed version of the systemtap script:

global evaaddrset_time, evaaddrset_tot=0

probe begin {
	printf("Begin.\n")
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evaaddrset") {
	if ( pid() == target() )
		evaaddrset_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evaaddrset").return {
	if ( pid() == target() )
		evaaddrset_tot+=local_clock_us()-evaaddrset_time
}

probe end {
	printf("\nevaaddrset time: %12d\n", evaaddrset_tot)
}

This is what the output looks like:

# stap -x 92509 plsql.stap
Begin.
^C
evaaddrset time:        43389

A simple calculation shows that doing the addition native in SQL only takes 43389/2211412*100=2% of the runtime of PL/SQL with pragma udf.

Tagged: internals, oracle, performance, profiling, systemtap

ORA_ROWSCN – When Was My Record Committed

I was going to do a follow-up post to my post on USERENV(‘COMMITSCN’) just to describe the slightly better known but still overlooked (and possibly more useful) ORA_ROWSCN – but I don’t need to!

Neil Chandler has already done this excellent post on it.

Go and look at that post; it has all the information and detail you were actually looking for.

However, for any of you search-engine-seekers who can’t face the pain of following a link {it’s such a hard life for the modern knowledge-by-mouse-click generation}, here are the bare bones:

ORA_ROWSCN is a pseudocolumn that, by default, shows you the SCN (System Change Number) of when the block was last changed. You can create a table with the ROWDEPENDENCIES extension to track it for individual rows, but you can’t alter an existing table to add it. So usually you see a (conservative) last-changed SCN for the block. See Neil’s blog post for more detail.
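For completeness, creating a table with row-level tracking is just a matter of the ROWDEPENDENCIES keyword – a minimal sketch with made-up columns (it costs a few extra bytes per row):

create table address_rd (
        house_number  number,
        addr_line_1   varchar2(60),
        post_code     varchar2(8)
)
rowdependencies;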

To convert the SCN to a date/time there are various ways, but Oracle will do it for you, if the change was within the last few days, with the SCN_TO_TIMESTAMP function. If the row/block was last changed more than a few days ago, that function will error with ORA-08181. I wish it just returned null rather than ORA-08181, but then you can always wrap the call in your own function that handles that – see the sketch below.
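A minimal sketch of such a wrapper – the function name is my own invention; it traps only ORA-08181 and lets anything else propagate:

create or replace function scn_to_ts_safe (p_scn in number)
return timestamp is
        e_scn_too_old exception;
        pragma exception_init(e_scn_too_old, -8181);
begin
        return scn_to_timestamp(p_scn);
exception
        when e_scn_too_old then
                return null;   -- SCN too old for the mapping, as discussed above
end;
/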

Below is a short demo using test tables I don’t describe:

select house_number, addr_line_1,post_code
      ,ora_rowscn
      ,scn_to_timestamp(ora_rowscn) time_of_scn
from address 
where house_number = 100
and addr_line_1 like 'COTHAM SOUTH%'
/

  HOUSE
 NUMBER ADDR_LINE_1          POST_COD ORA_ROWSCN TIME_OF_SCN
------- -------------------- -------- ---------- ------------------------
    100 COTHAM SOUTH TERRACE SH5 8FA    11425626 24-JAN-16 20.44.56.00000
    100 COTHAM SOUTH DRIVE   LS20 1QY   11427281 24-JAN-16 20.51.29.00000
    100 COTHAM SOUTH         BD17 7JW   11437843 24-JAN-16 20.53.39.00000
    100 COTHAM SOUTH TERRACE LS7 9SK    11448376 24-JAN-16 20.54.56.00000
    100 COTHAM SOUTH TERRACE LS16 4SW   11460162 24-JAN-16 21.20.29.00000
    100 COTHAM SOUTH TERRACE LS7 1GL    11461400 24-JAN-16 21.25.48.00000
    100 COTHAM SOUTH         LS20 1TO   11471921 24-JAN-16 21.26.53.00000
    100 COTHAM SOUTH DRIVE   LS1 5EJ    11471921 24-JAN-16 21.26.53.00000
    100 COTHAM SOUTH DRIVE   SG 3LO     11482461 24-JAN-16 21.28.05.00000
...

--However, if the change is more than about 120 hours ago...
select surname,first_forename,dob,ora_rowscn
     ,scn_to_timestamp(ora_rowscn) time_of_scn
from person where surname='KINOCK'
and DOB between sysdate -10000 and sysdate -9500
/

     ,scn_to_timestamp(ora_rowscn) time_of_scn
      *
ERROR at line 2:
ORA-08181: specified number is not a valid system change number
ORA-06512: at "SYS.SCN_TO_TIMESTAMP", line 1

-- which is a bit misleading as it was a valid SCN, just not for the SCN_TO_TIMESTAMP function
-- remove the column based on scn_to_timestamp...

select surname,first_forename,dob,ora_rowscn
     --,scn_to_timestamp(ora_rowscn) time_of_scn
from person where surname='KINOCK'
and DOB between sysdate -10000 and sysdate -9500

SURNAME         FIRST_FORENAME  DOB               ORA_ROWSCN
--------------- --------------- ----------------- ----------
KINOCK          ABIGAIL         22-APR-1989 00:00    2518996
KINOCK          FRANCESCA       23-APR-1989 00:00    2539749
KINOCK          GIANO           10-NOV-1989 00:00    2567890
KINOCK          GILLIAN         11-JAN-1990 00:00    2716278
...

Look, you really should go and look at Neil’s post: click here

Semijoin_driver

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the BITMAP MERGE operation, and to demonstrate a very simple case I’ll start with a data set with just one bitmap index:


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t expect to use a single bitmap index to access data from a large table; usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column – but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for columns with a very small number of distinct values. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it, and this means that the rows for any given value are scattered widely across the table, but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.
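You can see this scattering-with-colocation directly – the following query isn’t part of the original test, but it uses only the t1 table created above to report the block number for the first few rows of each value, and rows for n1 = 1 and n1 = 5 will typically report the same block:

select
        id, n1,
        dbms_rowid.rowid_block_number(rowid)    block_no
from
        t1
where
        n1 in (1,5)
and     id <= 20
;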

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table – with two rows per leaf block. (By comparison, a B-tree index on the column was 2,077 leaf blocks uncompressed, or 1,538 leaf blocks when compressed.)

So here’s the query I want to play with, followed by the run-time execution plan with stats (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen – and since I’ve asked the question you know that the answer is almost sure to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        max(small_vc)
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

select  id from ...

Original query structure
         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Modified query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a big difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data in the wrong order we may be able to find ways of modifying the SQL to visit the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Transformation, where the optimizer rewrites a simple join so that each dimension table (typically) appears twice, first as an IN subquery driving the bitmap selection, then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semijoin approach is not considered in this case (and in slightly more complex ones).

 

 

My reference: bitmap_merge.sql, star_hack3.sql

Raspberry Pi Version 2 Wifi Inconsistent Connectivity

I’d had some difficulty with WiFi consistently staying connected on each of my Raspberry Pi units. I’d had a little time to look into it, but no research had offered an answer, and the only fix I’d come up with was a shell script I wrote to force a restart of the NIC, which helped some. There was no improvement when I added an antenna (I considered that all the wires, sensors, etc. might be interfering with the signal).

Finally, this weekend, I started to dig around and noted that although the user interface showed both connections – to my home WiFi and my Smartphone – as enabled, /etc/wpa_supplicant/wpa_supplicant.conf showed the following for each entry:

disabled=1

I couldn’t find any support documentation that backed up the theory that there was a mismatch between what the GUI stated and the configuration file that is the backbone of the settings, but with all the other data involved, it made sense that this should be set to either:

enabled=1

Or

disabled=0

So, I took a deep breath and updated one WiFi connection entry in the file with “disabled=0”, saved and rebooted.
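For reference, the resulting network entry in wpa_supplicant.conf looks something like this – the SSID and passphrase here are placeholders, and the last line is the one I changed:

network={
    ssid="MyHomeWiFi"
    psk="my-passphrase"
    key_mgmt=WPA-PSK
    disabled=0
}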

Sure enough, my home WiFi connection stayed up and my internet access was consistent. No more disconnects and no more challenges with connectivity.

Another change I made afterwards, which is backed up by multiple resources, is editing the /etc/network/interfaces configuration file. Scan the file and look for an entry that says:

auto wlan0

If you don’t see this entry, add it to the top of the file (you should see at least one other line starting with “auto”, so add the line under that one if there is), then save and reboot. This will set up your RPi to start the WiFi NIC automatically, which will then, in turn, connect to your WiFi network. A sketch of a typical layout follows below.
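On a Wheezy-era image the relevant part of /etc/network/interfaces might end up looking something like this (a sketch – your file may differ in the details):

auto lo
iface lo inet loopback

auto wlan0
allow-hotplug wlan0
iface wlan0 inet dhcp
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf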

I’ve made this change to four of my five RPis (the PiZero Jessie load doesn’t experience this issue, which is what tipped me off that it was a software issue with Wheezy and not a hardware problem…).


Kellyn-1, Frustration-0



Copyright © DBA Kevlar [Raspberry Pi Version 2 Wifi Inconsistent Connectivity], All Right Reserved. 2016.

SQL – is there a better way?

Next Tuesday at 11:20 am at Oracle headquarters there will be a panel discussion on SQL. You can participate whether you are there or not, as we are collecting questions ahead of time and will video the panel to be posted later. The panel’s participants will be:

You can add your questions to the list of questions here on the Google spreadsheet or in the comments on this post.

Is it worth discussing webscale? Webscale seems like a joke word, mainly from that infamous “NoSQL is webscale” video, though ironically I’ve seen it in new industry announcements even this week.

General areas of interest are: what is the future of SQL? Will there always be SQL, or will there be something to replace it? How can we improve SQL? Where is the industry headed with SQL, NoSQL and other ways of storing and retrieving data?

Currently the questions are:

  1. Why is SQL getting a bad rap?
    • Isn’t SQL slow? Why is it so slow?
    • Aren’t alternatives to SQL (XQuery, JSON find()) easier to use?
    • If SQL is so great, how do you explain the rise of NoSQL databases?
    • How much of the problem with Oracle SQL is its divergence from Codd’s relational algebra (allowing duplicate rows, NULLs, etc.)?
    • Chris Date, Fabian Pascal and others have expressed dismay at the differences between industry SQL and the relational algebra as set down by Codd. How much of a problem are these differences really?
  2. Is SQL hard to learn?
    • Is SQL a *real* programming language? Is it a functional language?
    • I’ve heard that I need to know relational algebra in order to be effective with SQL, is that true?
    • I’m an object-oriented developer, what’s an easy way for me to learn and use SQL? ***
    • How do I write loops in SQL?
    • Joins look painful — isn’t there an easier way?
    • I don’t want to filter entire tables; why doesn’t SQL let me look at the first few rows?
    • I’ve heard of JDBC, but it sounds really low level, like C; is there an easier way for me to interact with SQL?
    • How do I represent sparse data in SQL?
    • Is there a good IDE I can use for SQL? Anything other than SQL Developer?
  3. Why does Oracle have a NoSQL product?
    • Oracle seems to have flip-flopped on NoSQL. Why? Is it because Oracle EE and SE have too much overhead for OLTP and ACID compliance?
      1. in May 2011 Oracle recommended not using NoSQL in a white paper titled “Debunking the NoSQL Hype,” the final advice being  “Go for the tried and true path. Don’t be risking your data on NoSQL databases.” *
      2. In Sept 2011 Oracle releases Oracle NoSQL database **
    • Is Oracle NoSQL webscale? What is webscale?
    • NoSQL can do 5.6 million queries per second. Can Oracle do the same?
    • Why is Hadoop so popular even though Oracle SQL can run circles around HBase, Hive, Pig, Impala, etc.?
  4. What’s the difference between PL/SQL and SQL?
    • What does PL/SQL bring to the performance table? (question for Bryn and Steven)
  5. 12c Release 2 performance and reliability features e.g. Sharding
  6. Oracle Developer Advocate program (question for Steven)
  7. There was a time when Oracle only had “declarative referential integrity”, which still lives on as DISABLE RELY NOVALIDATE (see the sketch below). When will Oracle implement the SQL-92 ASSERTION for arbitrary constraints? There is a rumor that Oracle is thinking about it.
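For anyone who hasn’t met it, this is the sort of declarative, non-enforced constraint meant in question 7 – a hedged sketch with invented table names:

alter table orders
        add constraint orders_cust_fk
        foreign key (customer_id)
        references customers (customer_id)
        rely disable novalidate;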

 

* https://www.google.com/#q=%2B%22Debunking+the+NoSQL+Hype%22

**http://www.oracle.com/technetwork/products/nosqldb/learnmore/nosql-wp-1436762.pdf

Oracle suggested that the NoSQL approach was well-suited for certain use-cases:
“The Oracle NoSQL Database, with its ‘No Single Point of Failure’ architecture,
is the right solution when data access is “simple” in nature and application demands exceed the volume or latency capability of traditional data management solutions.  For example, click-stream data from high volume web sites, high-throughput event processing and social networking communications all represent application domains that produce extraordinary volumes of simple keyed data. Monitoring online retail behavior, accessing customer profiles, pulling up appropriate customer ads and storing and forwarding real-time communication are examples of domains requiring the ultimate in low-latency access. Highly distributed applications such as real-time sensor aggregation and scalable authentication also represent domains well-suited to Oracle NoSQL Database.”

*** “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” (incorrectly attributed to Esther Dyson, the editor of Release 1.0).

Thoughts

The big questions for me are

  • Will there ever be an alternative to SQL?
  • Are there ways to improve SQL significantly?
  • Have we learned anything from NoSQL? Has NoSQL shown up any major flaws in SQL and the relational model?

The questions wanting to know if there are alternatives to doing joins struck me as strange when I first saw them. Now, with some reflection, they bring up the old argument of whether to normalize or de-normalize. In the de-normalized model we can often access all the data we want without a join, but we lose the flexibility of the normalized model. A rough illustration follows below.
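As a sketch of that trade-off (the table names here are invented), the same lookup in both models:

-- normalized: the customer name costs a join
select o.order_id, c.cust_name
from   orders o
join   customers c on c.cust_id = o.cust_id;

-- denormalized: no join needed, but cust_name is repeated
-- on every order row and must be kept in sync
select order_id, cust_name
from   orders_wide;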

 

 

PL/SQL context switch

Whenever you use PL/SQL in SQL statements, the Oracle engine needs to switch from doing SQL to doing PL/SQL, and switch back after it is done. Generally, this is called a “context switch”. This is an example of that:

-- A function that uses PL/SQL 
create or replace function add_one( value number ) return number is
        l_value number(10):= value;
begin
        return l_value+1;
end;
/
-- A SQL statement that uses the PL/SQL function
select sum(add_one(id)) from t2;

Of course the functionality of the function is superfluous; it can easily be done in ‘pure’ SQL with ‘select sum(id+1) from t2’. But that is not the point.
Also, I added a sum() function for the sake of preventing output to the screen for every row.

The first thing to check is whether there is a difference in performance between executing with sum(id+1) and sum(add_one(id)). If there isn’t, we can stop here :-)

TS@frits > set timing on
TS@frits > select sum(id+1) from t2;

 SUM(ID+1)
----------
5.0000E+11

Elapsed: 00:00:00.19
TS@frits > select sum(add_one(id)) from t2;

SUM(ADD_ONE(ID))
----------------
      5.0000E+11

Elapsed: 00:00:01.13

This statement executes a full table scan; I checked carefully that the IO times were alike.
But it looks like there is a whopping difference between including PL/SQL and not: 113/19*100=595%, or worded differently: almost six times slower.

How does this work? Let’s have a look using stapflame. The idea behind flame graphs in general is to understand in which (C code) functions (user mode and kernel mode) the time is spent. Because of the full backtrace (all the called functions on top of each other), it gives an insight into how a program is working.

The first thing we need to establish is what PL/SQL looks like from the perspective of C functions. For that reason, I created a bogus PL/SQL program to profile:

declare
        t number:=0;
begin
        while t < 1000000 loop
                t:=t+1;
        end loop;
end;
/

Yes, that is right: the only thing this anonymous PL/SQL block does is declare a number variable named ‘t’, and then loop, adding one to the variable until it reaches the number 1000000. Again, what this program does is not interesting nor functional; the only thing it needs to do is run, so when we profile the program we are sure it is doing PL/SQL.

I ran this anonymous PL/SQL block using my stapflame utility to generate a flamegraph, and this is what it looks like:
I have taken the flamegraph of all time considered on CPU by the Oracle database.

First of all, one important property of flamegraphs is shown: the sequence is random. If you look at the kpoal8 function, you see there are two different paths taken from this function: opiexe (oracle program interface execute) and opiosq0 (oracle program interface prepare to parse). Of course the PL/SQL block is first parsed and then executed, so the actual order is different from what is shown.

What is also very visible is that almost all execution time is spent in a function ‘pfrrun’, which seems to be the main function for executing PL/SQL. On top of that we can see some functions which roughly resemble the functionality used in the PL/SQL block: pfrinstr_ADDN (addition, t:=t+1) and pfrinstr_RELBRNCH (the loop). This also gives a fair indication that PL/SQL is interpreted rather than compiled. Anyway, what is important is that, from the little test above, we can distinguish running PL/SQL from SQL by the pfrrun function.

Now let’s look at a flamegraph of running a PL/SQL function in a SQL statement:
The flamegraph shows all the time considered running on CPU for executing the statement ‘select sum(add_one(id)) from t2’. There is a lot to see!

When we look on top of the kpoal8 function, we see the function opifch2. This means the vast majority of the time is spent in the fetch phase of the SQL statement. On top of the opifch2 function we see two functions whose names start with qer; ‘qer’ probably means Query Execute Rowsource. ‘qertbFetch’ is the fetch procedure for table scans. Then we see the kdsttgr function (kernel data scan table get row), and then the ultra-fast table scan function (kdstf; kernel data scan table full) followed by a lot of zeros and/or ones and ending with ‘km’ or ‘kmP’. There are a lot of kdstf functions in the Oracle executable; I assume the zeros and ones after ‘kdstf’ are a bitmap of potentially needed functions during the full scan, so the full table scan function can omit a lot of choices at runtime, leading to better CPU efficiency. See an article by Tanel on what this means.

In the full table scan function there are two main functions which consume time: kdst_fetch and qesaFastAggNonDistSS. ‘kdst_fetch’ and deeper functions are related to doing the IO for reading the data from the data file. ‘qesaFastAggNonDistSS’ is the main function for actually processing the data. It in turn has two main functions consuming its time: evaopn2 and a function visible as ‘sl..’, which is actually a function called slnxsum – in other words, the sum() function. The function evaopn2 is a function to evaluate operands; this evaluation is the path towards executing the PL/SQL function.

Actually, when looking carefully at the evaopn2 function, we see the slnxsum function and ‘evapls’, which is the function that switches to PL/SQL. The two main functions in ‘evapls’ are kgmexec and opiomc. Again the order is switched here; what happens is that first a cursor must be mapped for executing PL/SQL (the opiomc function), after which it can be executed (the kgmexec function).

In order to understand the time taken by “switching” to PL/SQL, we can take the total time the query engine spends processing everything PL/SQL related – the total time taken by the ‘evapls’ function – and measure the time actually spent running PL/SQL – the time taken by the ‘pfrrun’ function. This can be accomplished by a simple systemtap script (please mind you need a recent systemtap version, preferably gotten from https://sourceware.org/git/systemtap.git, and kernel version 3.10 or later to be able to use return probes):

global evapls_time, pfrrun_time, evapls_tot=0, pfrrun_tot=0

probe begin {
	printf("Begin.\n")
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls") {
	if ( pid() == target() )
		evapls_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls").return {
	if ( pid() == target() )
		evapls_tot+=local_clock_us()-evapls_time
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun") {
	if ( pid() == target() )
		pfrrun_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun").return {
	if ( pid() == target() )
		pfrrun_tot+=local_clock_us()-pfrrun_time
}

probe end {
	printf("\nevapls time: %12d\npfrrun time: %12d\n", evapls_tot, pfrrun_tot)
}

This is what it looks like on my machine:

# stap -x 29680 plsql.stap
Begin.
^C
evapls time:      3203637
pfrrun time:       951830

So the overhead, or context switching time – which must be Oracle server code executing between the evapls function, where Oracle determines it needs to execute PL/SQL, and the pfrrun function, which is the PL/SQL interpreter – is on my machine:
(1-951830/3203637)*100=70%
Wow!

One way of reducing this overhead is using subquery factoring, alias the ‘with clause’. To use the function that way, this is how the SQL should be written:

with
function add_one( value number ) return number is
	l_value number(10):= value;
begin
	return l_value+1;
end;
select sum(add_one(id)) from t2;
/

Let’s have a look at the flamegraph of this construction:
It becomes apparent that with subquery factoring there are far fewer functions between the evapls and pfrrun functions.
Normal PL/SQL: kgmexec, kkxmpexe, kkxdexe, peidxexe, peidxr_run, plsql_run.
Subquery factoring: kkxmss_speedy_stub, peidxrex.
Also note that there is no code path for mapping a cursor.

Let’s have a look at the timing:

# stap -x 29680 plsql.stap
Begin.
^C
evapls time:      2192685
pfrrun time:       880230

The time spent in PL/SQL, measured as the total time spent in the evapls function, is reduced by 32% ((1-2192685/3203637)*100).
However, if you calculate the overhead, it is still pretty significant: (1-880230/2192685)*100=60%

Conclusion
The simplest conclusion I can make is: do not use PL/SQL if you can solve it in SQL, as in the example above. This does not mean you should never use PL/SQL – on the contrary: in a lot of cases processing should be done where the data is, and sometimes you need a procedural language for that.

I made a lot of assumptions in this little investigation. The function naming (the translation from an Oracle C function name to the functionality it is supposed to deliver) is speculation.

The context switch between SQL mode and PL/SQL mode looks like it is technically setting up the execution environment for PL/SQL. Indeed this takes time, and the true PL/SQL execution time when repeatedly executing PL/SQL is very low in my case. Please mind actual times will differ on different systems. However, the main conclusion is that using PL/SQL in SQL execution is probably not the most performant thing to do, even with subquery factoring.

Tagged: oracle, performance, plsql, profiling, systemtap

[Cloud | On-Premise | Outsourcing | In-Sourcing] and why you will fail!

I was reading this article about UK government in-sourcing all the work they previously outsourced.

This could be a story about any one of a number of failed outsourcing or cloud migration projects I’ve read about over the years. They all follow the same pattern.

  • The company is having an internal problem that they don’t know how to solve. It could be related to costs, productivity, a paradigm shift in business practices or just an existing internal project that is failing.
  • They decide to launch down a path of outsourcing or cloud migration with unrealistic expectations of what they can achieve and no real idea of what benefits they will get, other than what Gartner told them.
  • When it doesn’t go to plan, they blame the outsourcing company, the cloud provider, the business analysts, Gartner, terrorists etc. Notably, the only thing that doesn’t get linked to the failure is themselves.

You might have heard this saying,

“You can’t outsource a problem!”

Just hoping to push your problems on to someone else is a guaranteed fail. If you can’t clearly articulate what you want and understand the consequences of your choices, how will you ever get a result you are happy with?

Over the years we’ve seen a number of high profile consultancies get kicked off government projects. The replacement consultancy comes in, hires all the same staff that failed last time, then continues on the failure train. I’m not going to mention names, but if you have paid any attention to UK government IT projects over the last decade you will know who and what I mean.

Every time you hear someone complaining about failing projects or problems with a specific model (cloud, on-premise, outsourcing, in-sourcing), it’s worth taking a step back and asking yourself where the problem really is. It’s much easier to blame other people than admit you’re part of the problem! These sayings spring to mind.

“Garbage in, garbage out!”

“A bad workman blames his tools!”

Cheers

Tim…

PS. I’ve never done anything wrong. It’s the rest of the world that is to blame… :)

Update: I wasn’t suggesting this is only an issue in public sector projects. It just so happens this rant was sparked by a story about public sector stuff. :)


[Cloud | On-Premise | Outsourcing | In-Sourcing] and why you will fail! was first posted on January 22, 2016 at 2:47 pm.