statistics

METHOD_OPT

Here is a poll prompted by today’s Twitter discussion about the default METHOD_OPT value (which has been ‘for all columns size auto’ since 10g) and the automatic statistics gathering job (an auto-task since 11g). The CBO development team suggests using the default job to gather statistics with default options. Do you use the job exactly this way on real production databases? I’m very interested to see some numbers. Please share the poll link http://poll.fm/3n7pn so that more people vote. Thanks.
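For reference, here’s a quick way to check the current default on your own system (a minimal sketch – dbms_stats.get_prefs is the 11g interface, 10g used dbms_stats.get_param):

select dbms_stats.get_prefs('METHOD_OPT') from dual;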

Incremental Partition Statistics Review

Introduction

Here is a summary of the findings while evaluating Incremental Partition Statistics that have been introduced in Oracle 11g.

The most important point to understand is that Incremental Partition Statistics are not "cost-free", so anyone who is telling you that you can gather statistics on the lowest level (partition or sub-partition in case of composite partitioning) without any noticeable overhead in comparison to non-incremental statistics (on the lowest level) is not telling you the truth.

Although this might be obvious, I’ve already personally heard someone making such claims, so it’s probably worth mentioning.

In principle you need to test on your individual system whether the overhead that is added to each statistics update on the lowest level outweighs the overhead of actually gathering statistics on the higher levels – in particular, of course, on the global level.
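As a concrete starting point for such a test, here is a minimal sketch of switching a partitioned table to incremental statistics (the table and partition names PT_RANGE and P600 are purely illustrative). With the preference set, a gather at partition level lets Oracle derive the global stats from stored synopses instead of re-scanning the whole table:

begin
	dbms_stats.set_table_prefs(
		ownname	=> user,
		tabname	=> 'PT_RANGE',
		pname	=> 'INCREMENTAL',
		pvalue	=> 'TRUE'
	);

	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'PT_RANGE',
		partname	=> 'P600',
		granularity	=> 'AUTO'
	);
end;
/

The overhead mentioned above comes largely from creating and maintaining the synopses, so it’s the elapsed time and space usage of exactly this kind of call that you need to compare against a non-incremental gather.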

NewDensity

A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.

We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 11.2.0.2).
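(If you want to repeat the test in the more modern style, a hedged sketch of an equivalent dbms_stats call – the table and column names t1 and n1 are illustrative – would look something like this; the size clause sets the maximum number of histogram buckets, and 254 is the 11g maximum:

begin
	dbms_stats.gather_table_stats(
		ownname			=> user,
		tabname			=> 'T1',
		method_opt		=> 'for columns n1 size 254',
		estimate_percent	=> 100
	);
end;
/
)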

I Wish

A couple of days ago I wrote about some things I’d like to see in index statistics, which means changing dbms_stats.gather_index_stats(). Here’s an idea for dbms_stats.gather_table_stats(). I’d like to see the number of chained and migrated rows counted separately in dba_tables when we run the packaged procedure. The optimizer will use information about chained or migrated rows, but the information is only collected if you use the analyze command (and even then the two figures are summed into the single value chain_cnt).
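A quick sketch of the current state of affairs (the table name t1 is illustrative): only the analyze command populates chain_cnt, and it lumps chained and migrated rows together:

analyze table t1 compute statistics;

select	table_name, num_rows, chain_cnt
from	user_tables
where	table_name = 'T1';

(Don’t do this on a production table without thinking – analyze will overwrite the optimizer statistics that dbms_stats gathered.)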

I wish

Here are a few thoughts on dbms_stats – in particular the procedure gather_index_stats.

The procedure counts the number of used leaf blocks and the number of distinct keys using a count distinct operation, which means you get an expensive aggregation operation when you gather stats on a large index. It would be a nice efficiency feature if Oracle changed the code to use the new Approximate NDV mechanism for these counts.
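The mechanism already exists for table stats – it’s enabled through a global preference, so a sketch of checking and switching it on in 11g looks like this:

select dbms_stats.get_prefs('APPROXIMATE_NDV') from dual;

begin
	dbms_stats.set_global_prefs('APPROXIMATE_NDV', 'TRUE');
end;
/

My wish is simply that the same approach were applied to the aggregates inside gather_index_stats.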

System Stats

A quick collation – and warning – for 11.2

Bottom line – be careful about what you do with system stats on 11.2

Footnote: the MOS link is a search string producing a list of references. I set it up like that because one of the articles referencing the bug is called “Things to consider before upgrade to 11.2.0.2” and it’s worth reading.

Addendum: one of the people on the two-day course I’ve just run in Berlin sent me a link to a quick note on how to set your own values for the system stats if you hit this bug. It’s actually quite a reasonable thing to do whether or not you hit the bug, given the way that gathering the stats can produce unsuitable figures anyway: setting system stats. (I’ve also added their company blog to the links on the right; they have a number of interesting items and post fairly regularly.)
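For completeness, here’s a minimal sketch of setting your own system stats (the values are purely illustrative, not recommendations – derive sensible figures from your own hardware):

begin
	dbms_stats.delete_system_stats;
	dbms_stats.set_system_stats('CPUSPEED', 1000);	-- millions of operations per second
	dbms_stats.set_system_stats('SREADTIM',    5);	-- single block read time, milliseconds
	dbms_stats.set_system_stats('MREADTIM',   10);	-- multiblock read time, milliseconds
	dbms_stats.set_system_stats('MBRC',       16);	-- achieved multiblock read count
end;
/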

blevel=1

Here’s one of those quick answers I give sometimes on forums or newsgroups. I forget where and when I wrote this, and what the specific question was – but it was something to do with rebuilding an index on a small table where data was constantly being deleted and inserted.

Another problem with high insert/delete rates appears with very small indexes.

If you have a table that is small but constantly recycles its space you may also find you have an index where the number of leaf blocks puts you close to the borderline between having blevel = 1 and blevel = 2. If the size crosses that border occasionally and the statistics are updated to reflect the change – which is quite likely for a table subject to lots of updates and deletes if you have automatic stats collection enabled – then execution plans could change, resulting in dramatic changes in performance.

The workaround is fairly obvious – don’t let Oracle collect stats automatically on that table, instead create a stats-collection strategy for eliminating the change in blevel. For example, keep the stats locked except when you run your own code to deal with the stats, making sure that you overwrite the index blevel with 1 even if it has just crossed the boundary to 2.
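A hedged sketch of that strategy (the table t1 and its primary key index t1_pk are assumed purely for illustration):

begin
	dbms_stats.unlock_table_stats(user, 'T1');

	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		cascade		=> true
	);

	-- force the blevel back to 1 even if the index has just crossed the boundary to 2
	dbms_stats.set_index_stats(
		ownname		=> user,
		indname		=> 'T1_PK',
		indlevel	=> 1
	);

	dbms_stats.lock_table_stats(user, 'T1');
end;
/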

Footnote: the reason why a change from 1 to 2 is dramatic is that Oracle ignores the blevel in the optimizer arithmetic when it is set to 1; so the change from 1 to 2 actually has the impact of a change from zero to 2. The cost of a nested loop access is “cost of a single access multiplied by the number of times you do it” – so the sudden appearance of a 2 in the formula adds “2 * number of times you visit the table” to the cost if your small table is the second table in the nested loop – and suddenly the nested loop becomes much more expensive without any real change in the data size.

Footnote 2: it should be obvious that you don’t need to rebuild the index once you know what the problem is; but since we’re talking about a small index with a blevel that is usually 1 it probably won’t take more than a fraction of a second to rebuild the index and there’s a fair chance you can find a safe moment to do it. In terms of complexity the solution is just as simple as the stats solution – so you might as well consider it. The only thing you need to be careful about is that you don’t happen to rebuild the index at a time when the blevel is likely to be 2.

Footnote 3: For an example of the type of code that will adjust the blevel of an index see this URL. (Note, the example talks about copying stats from one place to another – but the principle is the same.)

Virtual bug

I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.

Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(

The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.
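For anyone who wants to check their own system, here’s a sketch of the combination in question (all names illustrative): a virtual column, auto_sample_size, and a parallel gather:

create table t1 (
	n1	number,
	n2	number,
	n3	number generated always as (n1 + n2) virtual
);

begin
	dbms_stats.gather_table_stats(
		ownname			=> user,
		tabname			=> 'T1',
		estimate_percent	=> dbms_stats.auto_sample_size,
		degree			=> 4
	);
end;
/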

Rows per block

A recent question on the OTN database forum:

Can anyone please point me to a document, or a way to calculate the average number of rows per block in Oracle 10.2.0.3

One answer would be to collect stats and then approximate it as block size / avg_row_len – although you have to worry about things like row overheads, the row directory, and block overheads before you can be sure you’ve got it right. On top of this, the average might not be too helpful anyway. So here’s another (not necessarily fast) option that gives you more information about the blocks that have any rows in them (I picked the source$ table from a 10g system because source$ is often good for some extreme behaviour).


break on report

compute sum of tot_blocks on report
compute sum of tot_rows   on report

column avg_rows format 999.99

select
	twentieth,
	min(rows_per_block)			min_rows,
	max(rows_per_block)			max_rows,
	sum(block_ct)				tot_blocks,
	sum(row_total)				tot_rows,
	round(sum(row_total)/sum(block_ct),2)	avg_rows
from
	(
	select
		ntile(20) over (order by rows_per_block) twentieth,
		rows_per_block,
		count(*)			block_ct,
		rows_per_block * count(*)	row_total
	from
		(
		select
			fno, bno, count(*) rows_per_block
		from
			(
			select
				dbms_rowid.rowid_relative_fno(rowid)	as fno,
				dbms_rowid.rowid_block_number(rowid)	as bno
			from
				source$
			)
		group by
			fno, bno
		)
	group by
		rows_per_block
	order by
		rows_per_block
	)
group by
	twentieth
order by
	twentieth
;

I’ve used the ntile() function to split the results into 20 lines; obviously you might want to change this according to the expected variation in row counts for your target table. In my case the results looked like this:

 TWENTIETH   MIN_ROWS   MAX_ROWS TOT_BLOCKS   TOT_ROWS AVG_ROWS
---------- ---------- ---------- ---------- ---------- --------
         1          1         11       2706       3470     1.28
         2         12         22         31        492    15.87
         3         23         34         30        868    28.93
         4         35         45         20        813    40.65
         5         46         57         13        664    51.08
         6         59         70         18       1144    63.56
         7         71         81         23       1751    76.13
         8         82         91         47       4095    87.13
         9         92        101         79       7737    97.94
        10        102        111        140      14976   106.97
        11        112        121        281      32799   116.72
        12        122        131        326      41184   126.33
        13        132        141        384      52370   136.38
        14        142        151        325      47479   146.09
        15        152        161        225      35125   156.11
        16        162        171        110      18260   166.00
        17        172        181         58      10207   175.98
        18        182        191         18       3352   186.22
        19        193        205         22       4377   198.95
        20        206        222         16       3375   210.94
                                 ---------- ----------
sum                                    4872     284538

Of course, the moment you see a result like this it prompts you to ask more questions.

Is the “bell curve” effect that you can see centred around the 13th ntile indicative of a normal distribution of row lengths? If so, why is the first ntile such an extreme outlier – does it indicate a number of peculiarly long rows, did time of arrival have a special effect, is it the result of a particular pattern of delete activity … and so on.

Averages are generally very bad indicators if you’re worried about the behaviour of an Oracle system.

Stats collection

From time to time I see people asking how they can check how far their system has got in its call to collect stats. The question is often a little ambiguous – they know which call to dbms_stats they’ve used, so they know whether they’re trying to monitor stats collection against a single table (taking advantage of v$session_longops springs to mind) or against an entire schema.

Here’s one simple-minded approach that I whipped up a few years ago – it’s possible you could do better with the latest versions of dbms_stats. Its purpose is simply to give you some way of checking what work dbms_stats needs to do (in this example, for a schema), so that you can check how much has been done and how much remains. The basic method can be modified in various ways to match your tastes.

We start with the simple observation that many calls to dbms_stats have an “options” parameter that allows you to specify things like ‘LIST STALE’, rather than ‘GATHER STALE’. The “list” options populate an object table type with details of the data segments whose statistics would be gathered by the corresponding “gather” option. So we could start with a simple piece of code to do the following:

declare
	m_object_list	dbms_stats.objecttab;
begin

	dbms_stats.gather_schema_stats(
		ownname		=> 'test_user',
		options		=> 'LIST AUTO',
--		options		=> 'LIST STALE',
--		options		=> 'LIST EMPTY',
		objlist		=> m_object_list
	);

	for i in 1..m_object_list.count loop
		dbms_output.put_line(
			rpad(m_object_list(i).ownname,30)     ||
			rpad(m_object_list(i).objtype, 6)     ||
			rpad(m_object_list(i).objname,30)     ||
			rpad(m_object_list(i).partname,30)    ||
			rpad(m_object_list(i).subpartname,30) ||
			lpad(m_object_list(i).confidence,4)
		);
	end loop;
end;
/

Call this before you make your call to gather stats (in general it might be better to use the utl_file package to write to a file rather than using dbms_output and capturing screen output); then, as time passes, you can check the “last_analyzed” column on the relevant view to see which objects are still waiting for their stats to be collected. Of course this approach is a little clunky, and requires a certain amount of tedious manual labour to get the check done, but once we have the basic principle the rest is easy. Let’s start by using the code in a pipelined function.

create or replace function list_stale (
	i_option	in	varchar2,
	i_user		in	varchar2 default user
)
return dbms_stats.objecttab pipelined
as
	pragma autonomous_transaction;
	m_object_list	dbms_stats.objecttab;
begin

	if i_option not in (
		'LIST AUTO', 'LIST STALE','LIST EMPTY'
	) then
		null;
	else
		dbms_stats.gather_schema_stats(
			ownname		=> i_user,
			options		=> i_option,
			objlist		=> m_object_list
		);

		for i in 1..m_object_list.count loop
			pipe row (m_object_list(i));
		end loop;

	end if;

	return;
end;
/

You’ll notice that I’ve declared the function to run as an autonomous transaction – the call to dbms_stats does various things (such as updating the mon_mods$ and col_usage$ tables) that you’re not supposed to do in a pipelined function, but you can hide these by using the autonomous_transaction pragma. (Strangely, the error message you get in the absence of the pragma is: “ORA-04092: cannot COMMIT in a trigger”.)

With this definition I can call the function with code like:

select  *
from    table(list_stale('LIST AUTO'));

OWNNAME      OBJTYP OBJNAME         PARTNAME         SUBPARTNAME            CONFIDENCE
------------ ------ --------------- ---------------- ---------------------- ----------
TEST_USER    TABLE  PT_RANGE                                                       100
TEST_USER    TABLE  PT_RANGE        P600                                           100
TEST_USER    TABLE  PT_RL                                                          100
TEST_USER    TABLE  PT_RL           P_2002_MAR                                     100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_THE_REST           100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_MN                 100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_EAST_COAST         100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_CA                 100
TEST_USER    TABLE  T1                                                             100
TEST_USER    INDEX  PT_PK                                                          100
TEST_USER    INDEX  PT_PK           P600                                           100
TEST_USER    INDEX  T1_PK                                                          100
        ... etc ...

This will give me a list of all objects in my schema that need stats collected – either because they have no stats, or their stats are stale.

But I can be more subtle than this. I could, for example, write a query that joins this “table” to the view dba_segments and keep running it as the stats collection proceeds, to report the objects that are still waiting for stats and the sizes of those objects. The SQL needs just a little thought: the objtype is only reported as “TABLE” or “INDEX”, so you have to do a UNION ALL in the join and work out which rows are really for partitions and which for subpartitions. And then you have to mess about a bit with outer joins because Oracle will, for example, want to collect stats at the table level when a partition is stale – and there is no data segment at the table level of a partitioned table.

Your code might look something like this:

with list_result as (
	select *
	from	table(list_stale('LIST AUTO'))
)
select
	lst.objname, nvl(seg.segment_type,lst.objtype) segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.partname is null
and	seg.owner(+) = lst.ownname
and	seg.segment_type(+) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name is null
union all
select
	lst.objname, nvl(seg.segment_type,lst.objtype || ' PARTITION') segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is null
and	lst.partname is not null
and	seg.owner(+) = lst.ownname
and	substr(seg.segment_type(+),1,5) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name(+) = lst.partname
union all
select
	lst.objname, segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is not null
and	seg.owner = lst.ownname
and	substr(seg.segment_type,1,5) = lst.objtype
and	seg.segment_name = lst.objname
and	seg.partition_name = lst.subpartname
order by
	1,2,3
/

In a tiny schema, where every data segment is one of my “1MB uniform extents”, this is a sample of the output I got:

OBJNAME         SEGMENT_TYPE       PARTNAME                           BLOCKS
--------------- ------------------ ------------------------------ ----------
PT_PK           INDEX
PT_PK           INDEX PARTITION    P200                                  128
PT_PK           INDEX PARTITION    P400                                  128
PT_PK           INDEX PARTITION    P600                                  128
PT_RANGE        TABLE
PT_RANGE        TABLE PARTITION    P200                                  128
PT_RANGE        TABLE PARTITION    P400                                  128
PT_RANGE        TABLE PARTITION    P600                                  128
PT_RL           TABLE
PT_RL           TABLE PARTITION    P_2002_FEB
PT_RL           TABLE PARTITION    P_2002_MAR
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX
RL_PK           INDEX PARTITION    P_2002_FEB
RL_PK           INDEX PARTITION    P_2002_MAR
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
T1              TABLE                                                    128
T1_PK           INDEX                                                    128
T2              TABLE                                                    128
T2_I1           INDEX                                                    128

Obviously you could do something more complex to associate indexes with their tables, then sum subpartitions back to partitions and partitions back to table level stats – but a simple list of objects with primary physical sizes might be sufficient to give you an idea of how much work is still to be done. If you want to be really elegant (or extravagant) you could even combine this SQL with another piece reporting the objects with last_analyzed more recent than the start of run including, if you’re not running a parallel collection, the approximate time taken for each object (think lag() and comparing consecutive times).