Search

Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

Kernel NFS fights back… Oracle throughput matches Direct NFS with latest Solaris improvements

After my recent series of postings, I was made aware of David Lutz’s blog on NFS client performance with Solaris.  It turns out that you can vastly improve the performance of NFS clients using a new parameter to adjust the number of client connections.

root@saemrmb9> grep rpcmod /etc/system
set rpcmod:clnt_max_conns=8

This parameter was introduced in a patch for various flavors of Solaris.  For details on the various flavors, see David Lutz’s recent blog entry on improving NFS client performance.  Soon, it should be the default in Solaris making out-of-box client performance scream.

DSS query throughput with Kernel NFS

I re-ran the DSS query referenced in my last entry and now kNFS matches the throughput of dNFS with 10gigE.


Kernel NFS throughput with Solaris 10 Update 8 (set rpcmod:clnt_max_conns=8)

This is great news for customers not yet on Oracle 11g.  With this latest fix to Solaris, you can match the throughput of Direct NFS on older versions of Oracle.  In a future post, I will explore the CPU impact of dNFS and kNFS with OLTP style transactions.

Posted in Oracle, Storage Tagged: 11g, 7410, analytics, database, dNFS, NAS, NFS, Oracle, performance, Solaris, Sun, tuning

Pipelined functions in oracle 9i

Functions can now stream data as virtual tables. May 2002 (updated May 2007)

CBO oddities in determining selectivity on NVARCHAR2 data type

Some time ago I was involved in solving of a performance problem which was really weird - a simple select on a indexed column with unique value always resulted in a full table scan. The key column was of type NVARCHAR2. To better understand the behavior I created a simple reproducible test case.

Let us start with table and index creation.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

SQL> show parameter semant

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
nls_length_semantics string CHAR

SQL> create table x2 (id varchar2(32), idn nvarchar2(32), id16 nvarchar2(16), id20 nvarchar2(20));

Table created.

SQL> insert into x2 select lpad(rownum,32,'0'), lpad(rownum,32,'0'),lpad(rownum,16,'0'),lpad(rownum,20,'0')
from dual connect by level <= 20000; 20000 rows created. SQL> commit;

Commit complete.

SQL> create index x2_idn on x2(idn);

Index created.

SQL> analyze table x2 compute statistics;

Table analyzed.

SQL> explain plan for select * from x2 where idn=U'00000000000000000000000000000009';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 156302112

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 3281K| 136 (0)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| X2 | 20000 | 3281K| 136 (0)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("IDN"=U'00000000000000000000000000000009')

We have 20,000 rows with unique values in IDN column in the table, we have statistics and still the CBO produces an execution plan with FULL TABLE SCAN and with a very strange estimate of 20,000 rows reurned by this query.
Therefore our first question should be: What is wrong with the statistics? A simple query on USER_TAB_COLS shows the following results:

SQL> select column_name,num_distinct, low_value,high_value from user_tab_cols where table_name='X2';

COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE
------------------------------ ------------ ---------------------------------------------------------------- ----------------------------------------------------------------
ID 20000 3030303030303030303030303030303030303030303030303030303030303031 3030303030303030303030303030303030303030303030303030303230303030
IDN 1 0030003000300030003000300030003000300030003000300030003000300030 0030003000300030003000300030003000300030003000300030003000300030
ID16 20000 0030003000300030003000300030003000300030003000300030003000300031 0030003000300030003000300030003000300030003000320030003000300030
ID20 3 0030003000300030003000300030003000300030003000300030003000300030 0030003000300030003000300030003000300030003000300030003000300032

To display raw values I used function developed by Greg Rahn which gives the following result:

SQL> column lo_val format a32
SQL> column hi_val format a32
SQL> select column_name,
2 num_distinct,
3 display_raw(low_value,data_type) lo_val,
4 display_raw(high_value,data_type) hi_val
5 from user_tab_cols
6 where table_name='X2';

COLUMN_NAME NUM_DISTINCT LO_VAL HI_VAL
------------------------------ ------------ -------------------------------- --------------------------------
ID 20000 00000000000000000000000000000001 00000000000000000000000000020000
IDN 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID16 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ID20 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2

We can see that the NUM_DISTINCT column has quite strange values. Reason for this is that Oracle analyzes only first 32 bytes (and not characters) of the string. The default character set for NVARCHAR2 type is AL16UTF16 - a 2 byte character set.
The column IDN was defined as NVARCHAR2(32) and was actually completely filled so every row stores 64 bytes in this column. Here is the proof:

SQL> select dump(idn) from x2 where rownum = 1;

DUMP(IDN)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Typ=1 Len=64: 0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,48,0,56,0,51

Due to considering only first 32 bytes analyze comes back with a strange result that there is only one distinct value, what is of course true, as all distinct digits are beyond first 32 bytes.

Let us continue and use DBMS_STATS.GATHER_TABLE_STATS for analysis.

SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user ,
4 tabname=> 'X2',
5 cascade=> TRUE,
6 estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
7 degree=> DBMS_STATS.AUTO_DEGREE,
8 no_invalidate=> FALSE,
9 granularity=> 'ALL',
10 method_opt=> 'FOR ALL COLUMNS SIZE AUTO');
11 end;
12 /

SQL> select column_name,num_distinct,
2 display_raw(low_value,data_type) lo_val,
3 display_raw(high_value,data_type) hi_val
4 from user_tab_cols where table_name='X2'
5 /

COLUMN_NAME NUM_DISTINCT LO_VAL HI_VAL
------------------------------ ------------ -------------------------------- --------------------------------
ID 19820 00000000000000000000000000000001 00000000000000000000000000020000
IDN 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID16 20000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ID20 19942 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2

DBMS_STATS produces much more reliable cardinalities but the low/high values are still showing only 0 for IDN column. Here there is no difference between ANALYZE and DBMS_STATS as it can't be. Oracle looks only at first 32 bytes and that is the fact. Therefore the CBO will be not able to properly determine the cardinality for the range predicates. We will look closer this in the continuation of this post.
This fact might become dangerous also for all databases which use multi-byte character set as default character set. Actually Oracle will always look only at first 32 bytes and in case that the column will contain many many multi-byte characters this can become pretty odd. All this is something new and becomes more critical in now days when a lot of databases are migrated from single-byte character set to a multi-byte character sets. For sure migrating to any flavor of UTF-8/AL32UTF8 character set represent such a thread.

Conclusions about statistics for NVARCHAR2 data type
When I realized what is the reason behind for such behavior I suggested to my customer to switch back to rule based optimizer what solved the problem. The other possibility would be to change all NVARCHAR2 columns to VARCHAR2 but the customer accept this.
First, but the most important conclusion would be not to use NVARCHAR2 data type for key columns, but rather VARCHAR2. Or paraphrasing Mogens Norgard: "We don't use NVARCHAR2 data type!" :-)

*Range cardinalities on columns of NVARCHAR2 data type*
Reason for this part of post is my involvement in one thread on OTN where a lady from Austria, my neighbor country, was complaining about the "strange" execution plans and consequently also "un-wanted" performance. In one mail she explained me that they are experiencing performance problems since they have upgraded from Oracle9i to 10g. She also said that at the same time they moved from VARCHAR2 data type to NVARCHAR2 datatype.
My first impression was that this might be the same situation as I have already explained above. But after digging deeply in the problem, I realized that the problem lies in completely mis-estimated selectivity and consequently also the cardinality when NVARCHAR2 data type column is in question.

To be able to study the case I prepared an extremely simple reproducible test case. Let us start with the setup of the test environment:

SQL> create table x1 (cv varchar2(10), cn nvarchar2(10));

Table created.

SQL> insert into x1 select lpad(rownum,8,'0'), lpad(rownum,8,'0')
from dual connect by level <= 20000; 20000 rows created. SQL> commit;

Commit complete.

SQL> select cv,cn from x1 where rownum <= 10; CV CN ---------- ---------- 00000469 00000469 00000470 00000470 00000471 00000471 00000472 00000472 00000473 00000473 00000474 00000474 00000475 00000475 00000476 00000476 00000477 00000477 00000478 00000478 10 rows selected. SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user ,
4 tabname=> 'X1',
5 cascade=> TRUE,
6 estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
7 degree=> DBMS_STATS.AUTO_DEGREE,
8 no_invalidate=> FALSE,
9 granularity=> 'ALL',
10 method_opt=> 'FOR ALL COLUMNS SIZE 1');
11* end;
SQL> /

PL/SQL procedure successfully completed.

Let us run our first test SQL. I have used bind variables named :a1, :a2 of type VARCHAR2 and :n1, :n2 of type NVARCHAR2. I have event written a simple anonymous PL/SQL block where I selected the values for bind variables from the same table in order to be sure that the binding is done properly. While I was running the tests I enabled events 10053 and 10046.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
SQL_ID 1fj17ram77n5w, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 20 */ *
from x1
where cn >= to_nchar(:a1)
and cn <= to_nchar(:a2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.08 | 102 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 1 | 9999 |00:00:00.03 | 102 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYS_OP_C2C(:A1)<=SYS_OP_C2C(:A2)) 2 - filter(("CN"<=SYS_OP_C2C(:A2) AND "CN">=SYS_OP_C2C(:A1)))

PARSING IN CURSOR #12 len=111 dep=0 uid=88 oct=3 lid=88 tim=254514943459 hv=2792607932 ad='23f7056c' sqlid='1fj17ram77n5w'
select /*+ gather_plan_statistics */ /* run 20 */ *
from x1
where cn >= to_nchar(:a1)
and cn <= to_nchar(:a2) END OF STMT PARSE #12:c=0,e=734,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254514943452 BINDS #12: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0fec0abc bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0fec0adc bln=32 avl=08 flg=01 value="00009999"

I got a hard parse (mis=1), optimizer peeked at the values of bind variables and the estimated cardinality was 1 while the actual one was 9999. From 10046 trace we can clearly see the values of bind variables used by optimizer for the optimization. I used VARCHAR2 type of columns as bind variables but used TO_NCHAR function to convert to NVARCHAR2 data type. The optimizer internally rewrote the SQL so that instead of TO_NCHAR used internal function SYS_OP_C2C which performs conversion from one character set to another.

SQL> select SYS_OP_C2C('Š') from dual;

S
-
S

For next run I used NVARCHAR2 type bind variables and here is the result:

SQL_ID d1bn5vyb6dcgn, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 15 */ *
from x1
where cn >= to_nchar(:n1)
and cn <= to_nchar(:n2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.08 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.04 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:N1<=:N2) 2 - filter(("CN">=:N1 AND "CN"<=:N2)) PARSING IN CURSOR #15 len=111 dep=0 uid=88 oct=3 lid=88 tim=254434564390 hv=2523312628 ad='2c8d2200' sqlid='d1bn5vyb6dcgn' select /*+ gather_plan_statistics */ /* run 15 */ * from x1 where cn >= to_nchar(:n1)
and cn <= to_nchar(:n2) END OF STMT PARSE #15:c=0,e=746,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254434564383 BINDS #15: Bind#0 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=256 off=0 kxsbbbfp=0febbee0 bln=128 avl=16 flg=05 value="" Bind#1 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=128 kxsbbbfp=0febbf60 bln=128 avl=16 flg=01 value=""

Here we got the estimated number of rows 50 what is exactly the default use by the CBO for ranges - selectivity is 0.0025. So the CBO simply decided to throw in the default selectivity. The values of bind variables are empty ("") - CBO just ignored them during the optimization phase but they were of course respected by the runtime engine.

SQL_ID cx88wv0q05tst, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 14 */ *
from x1
where substr(cn,1,10) >= to_nchar(:a1)
and substr(cn,1,10) <= to_nchar(:a2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.09 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.05 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYS_OP_C2C(:A1)<=SYS_OP_C2C(:A2)) 2 - filter((SUBSTR("CN",1,10)>=SYS_OP_C2C(:A1) AND
SUBSTR("CN",1,10)<=SYS_OP_C2C(:A2))) PARSING IN CURSOR #12 len=137 dep=0 uid=88 oct=3 lid=88 tim=254227180414 hv=2415463600 ad='23f7e900' sqlid='f5wapff7zk35h' select /*+ gather_plan_statistics */ /* run 16 */ * from x1 where substr(cn,1,10) >= to_nchar(:a1)
and substr(cn,1,10) <= to_nchar(:a2) END OF STMT PARSE #12:c=0,e=767,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254227180407 BINDS #12: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0ff75ac0 bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0ff75ae0 bln=32 avl=08 flg=01 value="00009999"

In this case I added used SUBSTR function on the column and VARCHAR2 type bind variables and again the CBO threw in the default for range 0.0025, but we can see the values of bind variables.

SQL_ID auyacfyb6bj9r, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 14 */ *
from x1
where substr(cn,1,10) >= to_nchar(:n1)
and substr(cn,1,10) <= to_nchar(:n2) Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.09 | 759 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 50 | 9999 |00:00:00.05 | 759 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:N1<=:N2) 2 - filter((SUBSTR("CN",1,10)>=:N1 AND SUBSTR("CN",1,10)<=:N2)) PARSING IN CURSOR #16 len=137 dep=0 uid=88 oct=3 lid=88 tim=254203929826 hv=1230940210 ad='23f78074' sqlid='13u552p4pxa1k' select /*+ gather_plan_statistics */ /* run 15 */ * from x1 where substr(cn,1,10) >= to_nchar(:n1)
and substr(cn,1,10) <= to_nchar(:n2) END OF STMT PARSE #16:c=0,e=770,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=254203929819 BINDS #16: Bind#0 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=256 off=0 kxsbbbfp=10460bd0 bln=128 avl=16 flg=05 value="" Bind#1 oacdty=01 mxl=128(40) mxlc=20 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=128 kxsbbbfp=10460c50 bln=128 avl=16 flg=01 value=""

In the CBO trace the inforamation was:

*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
Bind#0
oacdty=01 mxl=32(20) mxlc=10 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=02 csi=2000 siz=64 off=0
kxsbbbfp=0f1f5a1c bln=32 avl=16 flg=05
value=""
Bind#1
oacdty=01 mxl=32(20) mxlc=10 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=02 csi=2000 siz=0 off=32
kxsbbbfp=0f1f5a3c bln=32 avl=16 flg=01
value=""

In the last case I just changed from VARCHAR2 to NVARCHAR2 bind variables and the result was expected - the default selectivity and the values of bind variables were again "".

Every time, when the CBO used binds for estimating selectivity I could see the following section after the execution plan:

Peeked Binds
============
Bind variable information
position=1
datatype(code)=1
datatype(string)=VARCHAR2(32)
char set id=170
char format=1
max length=32
value=00000000
Bind variable information
position=2
datatype(code)=1
datatype(string)=VARCHAR2(32)
char set id=170
char format=1
max length=32
value=00009999
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.1.0.6')
DB_VERSION('11.1.0.6')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "X1"@"SEL$1")
END_OUTLINE_DATA
*/

And finally one case with VARCHAR2 data type.

SQL_ID bznnx6zk4hm11, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ /* run 22 */ *
from x1 where cv >=:a1
and cv <= :a2 Plan hash value: 2189453339 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- |* 1 | FILTER | | 1 | | 9999 |00:00:00.07 | 102 | |* 2 | TABLE ACCESS FULL| X1 | 1 | 9999 | 9999 |00:00:00.02 | 102 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:A1<=:A2) 2 - filter(("CV"<=:A2 AND "CV">=:A1))

PARSING IN CURSOR #13 len=89 dep=0 uid=88 oct=3 lid=88 tim=265619391820 hv=3829943329 ad='2e0c3568' sqlid='bznnx6zk4hm11'
select /*+ gather_plan_statistics */ /* run 22 */ *
from x1
where cv >= :a1
and cv <= :a2 END OF STMT PARSE #13:c=0,e=762,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=265619391813 BINDS #13: Bind#0 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=64 off=0 kxsbbbfp=0fec0abc bln=32 avl=08 flg=05 value="00000000" Bind#1 oacdty=01 mxl=32(10) mxlc=10 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=170 siz=0 off=32 kxsbbbfp=0fec0adc bln=32 avl=08 flg=01 value="00009999"

In this last case I was querying the VARCHAR2 column and using bind variables of VARCHAR2 type and this time the optimizer estimated cardinality correctly. Of course it performed peeking at the values of bind variables.

Randolf Geist responded to my post on OTN and he suggested to confuse the CBO by using values like 'A0000000' to 'A00009999'. According to his observations also the selectivity for range predicates on VARCHAR2 data types is quite off after using this trick. Building a histogram returns things back to normal on both data types.
He pointed out that the CBO is obviously so smart to spot stored digits in VARCHAR2 columns and do good estimations by treating them as numbers while this is not true for the VARCHAR2 data type.

CONCLUSION
It is obvious that the CBO is not capable to perforem same estimates for ranges on NVARCHAR2 column as it is capable to do on VARCHAR2 column when numbers are stored as strings.
But the most important conclusion to be made is that one should use *appropriate* data types.

Of course that are many possibilities that the CBO might behave differently (like using DYNAMIC_SAMPLING, ...). So I am looking forward for the contributions of others, not just Randolf Geist.

This case was used to raise SR #3-1193039441: CBO and nvarchar2 --> A new Bug: 9220704 has been filed with development.

Does CREATE INDEX Gather Global Statistics?

You can add the COMPUTE STATISTICS clause to the CREATE INDEX statement. It instructs the SQL statement to gather and store index statistics in the data dictionary, while creating the index. This is useful because the overhead associated with the gathering of statistics while executing this SQL statement is negligible. In Oracle9i, the gathering of [...]

Force Cursor Invalidation

Many times it occurs that an inappropriate execution plan is used which was produced by using the current values of bind variables provided at the time of the hard parse. But later on the variables change so much that another execution plan would be required. Unfortunately there is no automatism in 9i and 10g that would spot this fact. Oracle finally resolved this problem in 11g.

The trick is to virtually set the statistics for the object which is involved in the query. What I mean by virtually is that I read the current statistics and store the same statistics back what makes no harm but the side effect is that the cursor is invalidated and hence it will be re-parsed and hopefully this time optimized for the right values of bind variables.

Here is the code:

CREATE OR REPLACE PROCEDURE Invalidate_statistics (
p_ownname VARCHAR2,
p_tabname VARCHAR2
) IS
m_srec DBMS_STATS.STATREC;
m_distcnt NUMBER;
m_density NUMBER;
m_nullcnt NUMBER;
m_avgclen NUMBER;
m_colname VARCHAR2(30);
--
-- This procedure invalidates all cursors present in v$sql
-- which refer to the table passed as input parameter
--
-- requires DBA privileges
--
-- This is a workaround to solve the problem of bind variable peeking
--
BEGIN
-- first find column definition for the first column - doesn't matter for which
-- column we reset the statistics
BEGIN
SELECT column_name INTO m_colname
FROM DBA_TAB_COLUMNS
WHERE owner = p_ownname
AND table_name = p_tabname
AND ROWNUM = 1;
EXCEPTION
WHEN NO_DATA_FOUND THEN RAISE_APPLICATION_ERROR( -20000,
'Table '||p_ownname||'.'||p_tabname||' does not exist');
END;
-- virtually change the statistics - replace with the same values what will cause
-- invalidation of all cursors which refer to this table - NO_INVALIDATE must be FALSE
DBMS_STATS.GET_COLUMN_STATS (ownname => p_ownname,
tabname => p_tabname,
colname => m_colname,
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => m_srec,
avgclen => m_avgclen
);
DBMS_STATS.SET_COLUMN_STATS (ownname => p_ownname,
tabname => p_tabname,
colname => m_colname,
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => m_srec,
avgclen => m_avgclen,
no_invalidate => FALSE
);
END;
/

The procedure reads and stores back the statistics for the first column of the table.

SQL> exec Invalidate_statistics('SCOTT','EMP')

PL/SQL procedure successfully completed.

Sometimes it requires to re-execute the invalidation as the cursor is currently executing. One can use this in 9i,10g,11g.

Direct NFS vs Kernel NFS bake-off with Oracle 11g and Solaris… and the winner is

NOTE::  Please see my next entry on Kernel NFS performance and the improvements that come with the latest Solaris.

==============

After experimenting with dNFS it was time to do a comparison with the “old” way.  I was a little surprised by the results, but I guess that really explains why Oracle decided to embed the NFS client into the database :)

bake-off with OLTP style transactions

This experiment was designed to load up a machine, a T5240, with OLTP style transactions until no more CPU available.  The dataset was big enough to push about 36,000 IOPS read and 1,500 IOPS write during peak throughput.  As you can see, dNFS performed well which allowed the system to scale until DB server CPU was fully utilized.   On the other hand, Kernel NFS throttles after 32 users and is unable to use the available CPU to scale transactional throughput.

lower cpu overhead yields better throughput

A common measure for benchmarks is to figure out how many transactions per CPU are possible.  Below, I plotted the CPU content needed for a particular transaction rate.  This chart shows the total measured CPU (user+system) to for a given TPS rate.


dNFS vs kNFS (TPS/CPU)

As expected, the transaction rate per CPU is greater when using dNFS vs kNFS.  Please do note, that this is a T5240 machine that has 128 threads or virtual CPUs.  I don’t want to go into semantics of sockets, cores, pipelines, and threads but thought it was at least worth noting.  Oracle sees a thread of a T5240 as a CPU, so that is what I used for this comparison.

silly little torture test

When doing the OLTP style tests with a normal sized SGA, I was not able to fully utilize the 10gigE interface or the Sun 7410 storage.   So, I decided to do a silly little micro benchmark with a real small SGA.  This benchmark just does simple read-only queries that essentially result in a bunch of random 8k IO.  I have included the output from the Fishworks analytics below for both kNFS and dNFS.


Random IOPS with kNFS and Sun Open Storage


Random IOPS with dNFS and Sun 7410 open storage

I was able to hit ~90K IOPS with 729MB/sec of throughput with just one 10gigE interface connected to Sun 7140 unified storage.  This is an excellent result with Oracle 11gR2 and dNFS for a random test IO test… but there is still more bandwidth available.  So, I decided to do a quick DSS style query to see if I could break the 1GB/sec barrier.

===dNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111
Elapsed: 00:00:06.36

===kNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111

Elapsed: 00:00:16.18

kNFS table scan


dNFS table scan

Excellent, with a simple scan I was able to do 1.14GB/sec with dNFS more than doubling the throughput of kNFS.

configuration notes and basic tuning

I was running on a T5240 with Solaris 10 Update 8.

$ cat /etc/release
Solaris 10 10/09 s10s_u8wos_08a SPARC
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009

This machine has the a built-in 10gigE interface which uses multiple threads to increase throughput.  Out of the box, there is very little to tuned as long as you are on Solaris 10 Update 8.  I experimented with various settings, but found that only basic tcp settings were required.

ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152

Finally, on the storage front, I was using the Sun Storage 7140 Unified storage server as the NFS server for this test.  This server was born out of the Fishworks project and is an excellent platform for deploying NFS based databases…. watch out NetApp.

what does it all mean?

dNFS wins hands down.  Standard kernel NFS only essentially allows one client per “mount” point.  So eventually, we see data queued to a mount point.  This essentially clips the throughput far too soon.   Direct NFS solves this problem by having each Oracle shadow process mount the device directly.  Also with dNFS, all the desired tuning and mount point options are not necessary.  Oracle knows what options are most efficient for transferring blocks of data and configures the connection properly.

When I began down this path of discovery, I was only using NFS attached storage because nothing else was available in our lab… and IO was not initially a huge part of the project at hand.  Being a performance guy who benchmarks systems to squeeze out the last percentage point of performance, I was skeptical about NAS devices.  Traditionally, NAS was limited by slow networks and clumsy SW stacks.   But times change.   Fast 10gigE networks and Fishworks storage combined with clever SW like Direct NFS really showed this old dog a new trick.

Posted in Oracle, Storage Tagged: 11g, 7410, analytics, dNFS, fishworks, NAS, NFS, Oracle, performance, Solaris, Sun

The Core Performance Fundamentals Of Oracle Data Warehousing – Introduction

At the 2009 Oracle OpenWorld Unconference back in October I lead a chalk and talk session entitled The Core Performance Fundamentals Of Oracle Data Warehousing. Since this was a chalk and talk I spared the audience any powerpoint slides but I had several people request that make it into a presentation so they could share [...]

The CPU Costing Model – A Few Thoughts Part II

As previously discussed, the formula used by the CBO using the CPU costing model is basically:
 
(sum of all the single block I/Os x average wait time for a single block I/O +
 sum of all the multiblock I/Os x average wait time for a multiblock I/O +
 sum of all the required CPU cycles / CPU cycles per second)
/
average [...]

2 Year Anniversary !!

A little anniversary slipped by unnoticed this past weekend. Which isn’t really too surprising considering how busy things have been lately, with Christmas just around the corner.
I’ve now been at this blogging lark for 2 whole years !!
Hopefully, there are a few people out there just that little bit wiser now regarding how indexes work [...]

Measurement Error Trap In Trace File (event 10046)

Some time ago I had an interesting case which I can use to clearly describe how one can be caught in measurement error trap.

But let us start at the beginning with this response time analysis:

Response Time Component Time % Elap AvgEla
---------------------------------------- ----------- ------- ---------
CPU service 3934.97s 48.39% 0.000716
un-accounted for time 1363.01s 16.76%
db file sequential read 1122.00s 13.80% 0.032253
gc buffer busy 451.73s 5.56% 0.011746
log buffer space 451.64s 5.55% 0.123974
buffer busy waits 176.79s 2.17% 0.029579
gc cr block 2-way 156.49s 1.92% 0.003287
gc cr grant 2-way 100.20s 1.23% 0.006541
latch: cache buffers chains 98.92s 1.22% 0.005708
gc current grant 2-way 69.68s 0.86% 0.006728
latch: library cache 30.10s 0.37% 0.010030
row cache lock 28.95s 0.36% 0.018727
gc current block 2-way 26.72s 0.33% 0.003828
gc cr block busy 19.35s 0.24% 0.006802
gc current grant busy 15.30s 0.19% 0.004999
latch: row cache objects 14.28s 0.18% 0.006165
gc cr block 3-way 11.73s 0.14% 0.002952
gc current block 3-way 11.34s 0.14% 0.003440
log file sync 10.71s 0.13% 0.315066
enq: SQ - contention 9.14s 0.11% 0.060911
My first thought was that there is an I/O problem as the average single block I/O took 32 milliseconds.


After digging for a while I have produced the following graphs which represent the I/O timing.

Let's first look at some facts:
- 3 node RAC
- Same storage
- Single block read time for Instance 1 was substantially different from read times for other instances during off hours
- 5 batch jobs during off hours
- The timings for Instance 1 are obviously not correct

I used the fact that I was performing the analysis on a 3 node RAC to check what are the timings on other two nodes and as you can see from the second graph, they were quite different.

Here is now the explanation:

The output from strace showed:

gettimeofday({1159440978, 931945}, NULL) = 0
pread(14, "\6\242\0\0\375\23\0\2+\254.\0\0\0\1\6\0054\0\0\1\0\5\0"..., 8192, 455 057408) = 8192
gettimeofday({1159440978, 944159}, NULL) = 0

Oracle records time just before performing a system call (pread). When the system call completes Oracle again records the current time and the difference reports as a wait time. Unfortunately due to high CPU load the process was for quite a while waiting at the system level to get on CPU and only then was able to read the current time and therefore the reported elapsed time was quite exaggerated.

Conclusion: Waiting in runque for CPU exaggerates all wait times of the process.

It is always a good practice to confirm findings with a different method. In this case I could use also the operating system tools to measure I/O timings. The same situation one can have on any other kind of wait event but there is not always a possibility to check it independently.

Finally, there was not really an I/O problem but the system was quite CPU bound. Of course the average single block I/O time over 5 - 10 ms shows that probably we are experiencing also I/O bottleneck. The customer later on replaced disk storage with a faster one together with the HW used for RAC.