You’ve probably heard sayings like “sometimes things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in nice tabular format like dbms_xplan gives. One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning. Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a [...]
I recently came across a tidy solution to a common problem – how to minimise code maintenance in a procedure while maximising flexibility of the procedure. The task was fairly simple – create a ref cursor for a calling program to return data that (a) followed complex selection rules and (b) allowed the user to specify numerous types of input.
The principle was simple – the final ref cursor was driven by a list of (say) order ids – and the details to be returned about those orders required some fairly complex SQL to execute. To separate the complexity of constructing the list of columns from the complexity of identifying the required rows the developers had split the procedure into two stages. First, select the list of relevant order ids using one of several possible statements – the appropriate statement being derived from analysis of the inputs to the procedure; secondly open a ref cursor using that list of order ids. In this way if a new set of rules for selection appeared the only new code needed was a new query to select the ids – the main body of code didn’t need to be modified and re-optimised.
They could have inserted the list of order ids into a global temporary table (GTT), of course; but what they actually did was to “bulk collect” them into an object table type, and then use the table() operator to reference them in their ref cursor. Here’s a very simple example demonstrating the concept:
create type number_type_table as table of number;
/
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
cast(rownum as number(8,0)) id,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 100000
;
alter table t1 add constraint t1_pk primary key(id);
-- collect some stats on the table
declare
m_n1_tab number_type_table;
begin
select
id
bulk collect into m_n1_tab
from
t1
where
rownum <= 10
;
for c1 in (
select
/*+
cardinality(l1 10)
*/
r1.small_vc
from
table(m_n1_tab) l1,
t1 r1
where
r1.id = l1.column_value
) loop
dbms_output.put_line(c1.small_vc);
end loop;
end;
/
I’ve used the table t1 both as the source of a few id values and as the target for the final select – the client code was obviously more subtle and complex.
Key details – from the top down – are: the “table type”, the efficiency of using a “bulk collect”, and the table(m_n1_tab) that appears in the cursor loop. (I’ve used a simple cursor loop in my demonstration rather than opening a ref cursor). You’ll notice that I’ve included a /*+ cardinality */ hint to give the optimizer a rough idea of the volume of data in the collection.
It’s clean, it’s simple, and it works well – until you start using distributed queries, and then sometimes the performance is dire. In this example my target table has a primary key, and I’m joining 10 rows on the primary key – the code is very fast and the plan (running 11.1.0.6) is as follows:
SELECT /*+ cardinality(l1 10) */ R1.SMALL_VC FROM TABLE(:B1 )
L1, T1 R1 WHERE R1.ID = L1.COLUMN_VALUE
Plan hash value: 609144676
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 39 (100)| |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 180 | 39 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| | | | | |
|* 4 | INDEX UNIQUE SCAN | T1_PK | 1 | | 0 (0)| |
| 5 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 16 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("R1"."ID"=VALUE(KOKBF$))
But when I changed the code so that the second use of t1 was referenced (through a loopback database link) as a remote table the query took about four minutes to complete, with a plan that looked like this:
SELECT /*+ cardinality(l1 10) */ R1.SMALL_VC FROM TABLE(:B1 )
L1, T1@D11G@LOOPBACK R1 WHERE R1.ID = L1.COLUMN_VALUE
Plan hash value: 3896059973
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 39 (100)| | | |
| 1 | NESTED LOOPS | | 10 | 220 | 39 (0)| 00:00:01 | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH| | | | | | | |
|* 3 | FILTER | | 1 | 20 | 1 (0)| 00:00:01 | | |
| 4 | REMOTE | T1 | | | | | D11G@~ | R->S |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("R1"."ID"=VALUE(KOKBF$))
4 - SELECT "ID","SMALL_VC" FROM "T1" "R1" (accessing 'D11G@LOOPBACK' )
(Notice that the second use of table t1 is now “t1@d11g@loopback” – a remote table).
The optimizer has decided to take the same nested loop execution path, presumably because it can see the high precision indexed access path – but then something has gone wrong. The SQL that gets sent to the remote database doesn’t include an access predicate on table t1 – so the remote database does a full tablescan for each row in the collection. The local database pulls the entire set of rows across the database link and then uses a filter operation at line 3 to eliminate all but the one row it needed.
What’s gone wrong ? The problem (I think) is that the value we want to pass to the remote database comes from an abstract data type (ADT) and although we know that the base type is a simple number type the code isn’t designed to deduce that. Since the remote database may not know how to treat our abstract data types we can’t possibly send them across the network. (The client’s immediate problem was that remote remote table was 1.2M blocks long – ca. 10GB – and it took about 90 seconds to get one row by tablescan.)
The client could have switched to using GTTs in the cases that went wrong, of course, but didn’t really want to have two different code mechanisms. Fortunately, once I’d modelled the problem I found a workaround. It’s a method that you may not be able to take advantage of in all cases but it was perfect for the client. Since the basic problem is that we don’t seem to be allowed to send abstract types across the network all we have to do is turn the abstract type into an Oracle base type:
begin select id bulk collect into m_n1_tab from t1 where rownum <= 10 ; for c1 in ( select r1.small_vc from ( select /*+ no_merge cardinality(l1 10) */ cast(column_value as number(8,0)) v1 from table(m_n1_tab) l1 ) l1, t1@d11g@loopback r1 where r1.id = l1.v1 ) loop dbms_output.put_line(c1.small_vc); end loop; end; /
We replace the naked table() operator with a non-mergeable inline view that selects the absract data type from the table() operator then casts it into an Oracle base type. With this code change the query runs quickly again and we see the following execution plan.
SELECT R1.SMALL_VC FROM ( SELECT /*+ no_merge cardinality(l1 10) */ CAST(COLUMN_VALUE AS NUMBER(8,0)) V1 FROM TABLE(:B1 ) L1 ) L1, T1@D11G@LOOPBACK R1 WHERE R1.ID = L1.V1 Plan hash value: 966067288 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT| ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 39 (100)| | | | | 1 | NESTED LOOPS | | 10 | 330 | 39 (0)| 00:00:01 | | | | 2 | VIEW | | 10 | 130 | 29 (0)| 00:00:01 | | | | 3 | COLLECTION ITERATOR PICKLER FETCH| | | | | | | | | 4 | REMOTE | T1 | 1 | 20 | 1 (0)| 00:00:01 | D11G@~ | R->S | ------------------------------------------------------------------------------------------------------------ Remote SQL Information (identified by operation id): ---------------------------------------------------- 4 - SELECT "ID","SMALL_VC" FROM "T1" "R1" WHERE "ID"=:1 (accessing 'D11G@LOOPBACK' )
As you can see from the view operator at line 2, we have created a result set from the collection operation, and the query that goes to the remote site at line 4 now includes our high-precision predicate. It’s also rather convenient that we can include the cardinality hint in the in-line view – the client loves ANSI SQL, and for complex statements it can be quite difficult getting the hints right.
I just got back in the office from a 2 week conference + vacation (SFO,WAS,NY). Then I was finally back in shape to work and do the usual geek stuff again but suddenly my Neo laptop suddenly stopped working! (the one I mentioned here, but it’s now on Fedora)
It can’t even boot to BIOS, certainly a case worse than BSOD.
So after fiddling with the laptop and systematically ruling out other component failures (power cable,monitor,memory,HD), Yes it’s much like troubleshooting an Oracle database! … we decided to bring it to the service center.
In an earlier note on interpreting the content of frequency histograms I made a throwaway comment about the extra complexity of interpreting frequency histograms on character-based columns. This note starts to examine some of the complications.
The driving problem behind character columns is that they can get quite large – up to 4,000 bytes – so the content of an “accurate histogram” could become quite large, and Oracle seems to have taken a strategic decision (at some point in history) to minimise this storage. As a result we can see an algorithm that works roughly as follows:
Given this algorithm, we can do an approximate reversal (which will only be needed when the endpoint_actual_value is not available) by formatting the endpoint_value into a hex string, extracting the first six pairs of digits, converting to numeric and applying the chr() function to get a character value. (You’ll have to fiddle with this bit of code to handle multibyte character sets, of course).
With a nice friendly single-byte character code, the first 5 characters will be extracted correctly, and the sixth will be pretty close to the original. Here’s an example (which also includes the logic to convert the endpoint_number into a frequency):
rem
rem How to read a frequency histogram on a character column
rem
select
endpoint_number,
endpoint_number - nvl(prev_endpoint,0) frequency,
hex_val,
chr(to_number(substr(hex_val, 2,2),'XX')) ||
chr(to_number(substr(hex_val, 4,2),'XX')) ||
chr(to_number(substr(hex_val, 6,2),'XX')) ||
chr(to_number(substr(hex_val, 8,2),'XX')) ||
chr(to_number(substr(hex_val,10,2),'XX')) ||
chr(to_number(substr(hex_val,12,2),'XX')),
endpoint_actual_value
from (
select
endpoint_number,
lag(endpoint_number,1) over(
order by endpoint_number
) prev_endpoint,
to_char(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')hex_val,
endpoint_actual_value
from
dba_tab_histograms
where
owner = 'XXX'
and table_name = 'YYY'
and column_name = 'STATUS_COLUMN'
)
order by
endpoint_number
;
set doc off
doc
ENDPOINT_NUMBER FREQUENCY HEX_VAL CHR(TO ENDPOINT_ACTUAL_VALUE
--------------- ---------- ------------------------------- ------ ------------------------------------------
40254 40254 434C4F534543E9175A7D6A7DC00000 CLOSEC CLOSED
40467 213 434F4E4649524E7E0D374A58200000 CONFIR CONFIRMED
40592 125 44454C49564550D642CA2965000000 DELIVE DELIVERED
41304 712 494E564F49432991BF41C99E800000 INVOIC INVOICED
41336 32 4E4556FFFFFFF1D5FBDBC624E00000 NEVÿÿÿ NEW
41434 98 5041494400000C08C1A415AD800000 PAID PAID
41435 1 5041594D454E5B08040F761BE00000 PAYMEN PAYMENT OVERDUE
41478 43 5049434B4544013F0FF93F6EC00000 PICKED PICKED
41479 1 524546554E4436441DE2A321000000 REFUND REFUND MADE
41480 1 524546554E4436441DE2A321000000 REFUND REFUND PENDING
41482 2 52455455524E2F6693F753B6C00000 RETURN RETURNED
11 rows selected.
#
You’ll notice from the sample output that “REFUND MADE” and “REFUND PENDING” are identical in their numeric representation, and that’s why all the actual values have been stored. You can also see how rounding problems have converted CLOSED to CLOSEC, and the padding applied to short strings (combined with rounding errors) has converted NEW to NEVÿÿÿ.
There are a number of side effects to the 6 bytes / 32 character limits that Oracle has imposed for histograms – and I’ll pick up a couple of those in further posts.
Footnote: It’s interesting to note that space utilisation isn’t considered a threat in 11g when looking at the ‘synopsis’ approach of creating the ‘approximate NDV’ for columns. The difference may be due to the passage of time, of course, on the other hand the threat from synopses is largely limited to disc space whereas histograms have to take up memory (in the dictionary cache / row cache) whenever they are used.
As I mentioned in a previous post, I won’t be doing much flying anymore and so am changing all my seminar offering to online seminars.
So, I’ve changed and re-arranged my seminar content into self-contained 4-hour deep dives and thanks to the online nature (no travel needed), people can choose which days they want to attend. If you’re interested in latch contention only, you can attend the Latch Contention deep dive for example etc. Or you can still attend all the deep dives. The cool thing is that these deep dive sessions take only half a day, too (and are priced accordingly). That way you don’t have to skip work for the whole day (or week) and still can get some of your daily work done too. Hopefully it makes your life a bit easier when getting approval to attend the sessions.
As the main feedback from my seminars has been that “there’s too much to learn” within the short 2-3 days I used to do my seminars in, I have arranged the material so that there will be more time to go deep into the subject area. Also, I have planned plenty of time for questions & answers (1 hour out of the 4 hours is planned Q&A sessions and attendees can also ask questions any time during the lecture & demos).
It looks like I will only offer my Advanced Oracle Troubleshooting v2.0 class online this year. I will probably schedule my Advanced SQL Tuning deep dives in January/February 2011 and the Advanced Troubleshooting class again in March/April and so on (until I go public with my other business, when I won’t have time for full length training anymore).
You can check the current schedule and pricing out here:
Here’s a brief outline of individual half-day Deep Dives I offer:
So, sign up now, seats are limited ;-)
Here’s a hidden threat in the optimizer strategy that may cause performance problems if you’re trying to operate a series of batch updates (or batch deletes).
In the past I’ve pointed out that a predicate like “rownum <= N" generally makes the optimizer use “first_rows(N)” optimisation methods – known in the code as first_k_rows optimisation.
This isn’t true for updates and deletes, as the following simple example indicates:
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;
create index t1_i1 on t1(id);
-- gather_table_stats, no histograms, compute, cascade
explain plan for
update t1 set
small_vc = upper(small_vc)
where
id > 100
and rownum <= 200
;
select * from table(dbms_xplan.display);
explain plan for
select
small_vc
from
t1
where
id > 100
and rownum <= 200
;
select * from table(dbms_xplan.display);
As usual I ran this with system statistics (CPU costing) disabled, using a locally managed tablespace with uniform 1MB extents and freelist management – simply because this leads to a repeatable test. Since I was running 11.1.0.6 I didn’t set the db_file_multiblock_read_count parameter (thus allowing the _db_file_optimizer_read_count to default to 8). These are the plans I got for the update and select respectively:
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 200 | 3000 | 27 |
| 1 | UPDATE | T1 | | | |
|* 2 | COUNT STOPKEY | | | | |
|* 3 | TABLE ACCESS FULL| T1 | 9901 | 145K| 27 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=200)
3 - filter("ID">100)
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 3000 | 6 |
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 200 | 3000 | 6 |
|* 3 | INDEX RANGE SCAN | T1_I1 | | | 2 |
----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=200)
3 - access("ID">100)
Note how the select statement uses an index range scan with stop key as the best strategy for finding 200 rows and then stopping – and the total cost of 6 is the cost of visiting the (well-clustered) table data for two hundred rows. The update statement uses a full tablescan to find the first 200 rows with a total cost of 27 – which happens to be the cost of a completed tablescan, not the cost of “enough of the tablescan to find 200 rows”. The update statement has NOT been optimized with using the first_k_rows strategy – it has used the all_rows strategy.
The demonstration is just a starting-point of course – you need to do several more checks and tests to convince yourself that first_k_rows optimisation isn’t going to appear for updates (and deletes) and to discover why it can be a problem that needs to be addressed. One of the simplest checks is to look at the 10053 (CBO) trace files to see the critical difference, especially to notice what’s in the trace for the select but missing from the trace for the update. The critical lines show the following type of information – but only in the trace file for the select:
First K Rows: K = 200.00, N = 9901.00 First K Rows: Setup end First K Rows: K = 200.00, N = 9901.00 First K Rows: old pf = -1.0000000, new pf = 0.0202000 SINGLE TABLE ACCESS PATH (First K Rows) First K Rows: unchanged join prefix len = 1 Final cost for query block SEL$1 (#0) - First K Rows Plan:
But why might it matter anyway ? Here’s the shape of a piece of SQL, embedded in pl/sql, that I found recently at a client site:
update tabX set
col1 = {constant}
where
col2 in (
complex subquery
)
and {list of other predicates}
and rownum <= 200
returning
id
into
:bind_array
;
For most of the calls to this SQL there would be a small number of rows ready for update, and the pl/sql calling this update statement would populate an array (note the “returning” clause) with the ids for the rows updated and then do something with those ids. Unfortunately there were occasions when the data (and the statistics about the data) covered tens of thousands of rows that needed the update. When this happened the optimizer chose to unnest the complex subquery – instead of using a very precise and efficient filter subquery approach – and do a massive hash semi-join that took a couple of CPU minutes per 200 rows and hammered the system to death for a couple of hours.
If Oracle had followed the first_k_rows optimizer strategy it would have used the “small data” access path and taken much less time to complete the task. As it was we ended up using hints to force the desired access path – in this case it was sufficient to add a /*+ no_unnest */ hint to the subquery.
There are so many things to blog about these past few days. That is mainly about the cool stuffs around OCW and OOW, sessions that I have attended (OCW, unconference, OOW), plus the interesting people that I’ve met on various areas of expertise.. So I’ll be posting some highlights (and a lot of photos) on the next posts.
Last Monday (Sept. 20) I was able to present at the Oracle Closed World @ Thirsty Bear. The full agenda is here http://www.amiando.com/ocw.html?page=434169
One of the simple scripts I use from time to time monitors the amount of undo generated by a long running session by taking a session id (SID) as an input and joining v$session to v$transaction:
column start_scn format 999,999,999,999
select
tr.start_scn, tr.log_io, tr.phy_io, tr.used_ublk, tr.used_urec, recursive
from
v$session se,
V$transaction tr
where
se.sid = &1
and tr.ses_addr = se.saddr
;
START_SCN LOG_IO PHY_IO USED_UBLK USED_UREC REC
---------------- ---------- ---------- ---------- ---------- ---
136,509,466,121 10730956 358074 1 1 NO
136,515,115,543 12 0 1 3 YES
You’ll note that this is a session that has been doing quite a lot of work. I was watching it because I wanted to check that it was only generating undo in a series of very short recursive transactions. (It’s running a coalesce on a very large, very smashed, index).
Thanks to everybody who attended my OCW hacking session!
Sorry to guys who attended via webinar – I’ll do the session again in a few weeks, with audio from end to end hopefully! And I will get someone to assist me with monitoring the transmission quality and attendee questions etc.
Note that this stuff is mostly for hacking and fun – don’t use the undocumented stuff in production!
The links are below:
Download scripts & Tools:
Rlwrap links:
Diagnostic events:
X$TRACE
I haven’t written any articles on X$TRACE yet, but you can find some stuff from one of my very old presentations:
Or just type:
ALTER TRACING ENABLE “10704:4:ALL”
Where 10704 is the KST event number, 4 is the level and ALL means all Oracle PIDs (Oracle PIDs, not OSPID’s).
After hours of careful planning (as you’ll see from the images) I decided to webcast my tomorrow’s Oracle Closed World hacking session…
This session isn’t necessarily going to be useful, but it definitely should be fun !!!
You can register here:
Recent comments
17 weeks 4 days ago
27 weeks 2 days ago
29 weeks 13 hours ago
32 weeks 2 days ago
34 weeks 3 days ago
44 weeks 14 hours ago
45 weeks 4 days ago
46 weeks 4 days ago
46 weeks 5 days ago
49 weeks 3 days ago