In the previous post I demonstrated that Oracle has some problems making efficient use of B*Tree indexes if an IS NULL condition is followed by IN / OR predicates also covered by the same index - the predicates that follow are not used to navigate the index structure efficiently but are applied as filters to all index entries identified by the IS NULL.
In this part I'll show what results I got when repeating the same exercise using Bitmap indexes - after all they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same but very similar) and an index on the single expression that gets searched for via IS NULL - results are again from 18.3.0:
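As a minimal sketch of the bitmap variant (hypothetical table and column names, not the actual test set), the nullable expression can simply be indexed directly:

create table t_null_test (col1 number, col2 number, filler varchar2(100));

-- a bitmap index contains entries for rows where COL1 is NULL,
-- so no extra trick is needed for an IS NULL search
create bitmap index bix_t_null_test_col1 on t_null_test (col1);

select count(*) from t_null_test where col1 is null;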
Indexing null values in Oracle is something that has been written about a lot in the past already. Nowadays it should be common knowledge that Oracle B*Tree indexes don't index entries that are entirely null, but it's possible to include null values in B*Tree indexes when combining them with something guaranteed to be non-null, be it another column or simply a constant expression.
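A minimal sketch of that approach, reusing the hypothetical table from the sketch above:

-- a plain B*Tree index on COL1 alone contains no entries for rows where COL1 is NULL;
-- appending a constant makes every index entry non-null as a whole,
-- so a COL1 IS NULL predicate can be evaluated via the index
create index ix_t_null_test_col1 on t_null_test (col1, 0);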
Jonathan Lewis not too long ago published a note that showed an oddity when dealing with IS NULL predicates that in the end turned out not to be a real threat and looked more like an oddity how Oracle displays the access and filter predicates when accessing an index and using IS NULL together with other predicates following after.
Here is another example (besides the fact that Adaptive Cursor Sharing only gets evaluated during a PARSE call (still valid in 12c) and supports a maximum of 14 bind variables) that I recently came across at a client site, where the default implementation of Adaptive Cursor Sharing fails to create a more suitable execution plan for different bind variable values. Broken down to a bare minimum, the query was sometimes executed using non-existing values for a particular bind variable, while at other times these values existed and were very popular. There were two suitable candidate indexes, and one of them appeared more attractive to the optimizer in the "non-existing" value case.
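As a quick way to see whether Adaptive Cursor Sharing has flagged a statement at all, the cursor flags in V$SQL can be checked; a minimal sketch, where the SQL_ID filter is just a placeholder:

select sql_id, child_number, is_bind_sensitive, is_bind_aware, is_shareable
  from v$sql
 where sql_id = '&sql_id';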
We will take a look at this in this blog post, by testing several different approaches and comparing the elapsed time and the statistics for each scenario.
The tests have been performed on a quarter-rack Exadata Database Machine (2 database nodes with 16 cores each and 3 storage servers). The database is set up for a data warehouse implementation and had been patched with bundle patch 5 at the time of testing.
The tests were executed on a table with 403M rows distributed over 14 range partitions – 7 non-compressed and 7 compressed with the HCC query option. Each test spans two partitions covering 57.5M rows. Please note the dimension tables contain 4,000 rows or fewer. The data is production data and is event based, meaning rows are generated when certain events occur.
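The compressed partitions were set up along these lines; a minimal sketch with hypothetical table and partition names, using the warehouse (query) flavour of HCC:

-- compress an existing range partition with the HCC "query" option
-- (query low / query high are the two warehouse compression levels)
alter table fact_events move partition p_2011_06 compress for query high;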
Each test case/SQL statement has been executed five times under different scenarios (numbered 1 to 5 in the tables below).
The test cases used are from a warehouse environment and I have modified the column and table names. For the bitmap test cases I had to hint the queries to ensure the bitmap indexes were actually used.
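The hints used were along these lines; a sketch with hypothetical table and bind names, not the original (renamed) queries:

-- index_combine asks the optimizer to consider a bitmap access path on the fact table;
-- a plain index(f <bitmap_index_name>) hint works as well
select /*+ index_combine(f) */
       count(*)
  from fact_events f
 where f.dim1_id = :b1;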
The output from the test cases is over 20K lines, so I have summarised the elapsed time and a few statistics in the tables below to provide a better overview.
Table figure for basic breakdown – Test case 1
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 8.87 | 12.00 | 9.22 | 185.05 | 149.55
cell physical IO bytes saved by storage index | 0 | 0 | 0 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 2,562,203,600 | 0 | 2,562,201,584 | 0 | 0
cell flash cache read hits | 9 | 26 | 9 | 9,290 | 2,063
CC Total Rows for Decompression |  | 57,586,000 |  |  | 57,586,000
This is a basic query, which is used for high level summaries and serves as a good baseline to compare against for the other test cases. There is no where clause in this test case, so we will not benefit from any storage indexes here. The first 3 tests are without any indexes on the fact table and perform much better than tests 4 and 5, and we should of course not expect the CBO to follow this path anyway. It is evident for tests 1 and 3 that the performance gained is supported by the storage server offloading and the smart scans. The CC stats above for test 2 tell us that the db node performs the decompression, so this test has to burn extra CPU cycles compared to tests 1 and 3. There is more to be said about test 2, but I'll try to cover that in the conclusion.
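For reference, the statistics shown in these tables can be captured from the session statistics after each run; a minimal sketch, run from the session that executed the test SQL, using the statistic names listed in the tables:

select sn.name, ms.value
  from v$statname sn, v$mystat ms
 where ms.statistic# = sn.statistic#
   and sn.name in ('cell physical IO bytes saved by storage index',
                   'cell physical IO bytes eligible for predicate offload',
                   'cell physical IO interconnect bytes returned by smart scan',
                   'cell flash cache read hits',
                   'CC Total Rows for Decompression');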
Table figure for LC breakdown - Test case 2
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 2.05 | 19.17 | 1.84 | 30.33 | 36.58
cell physical IO bytes saved by storage index | 4,186,636,288 | 0 | 4,186,636,288 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,292,800 | 0 | 5,209,292,800 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 317,496,848 | 0 | 317,497,280 | 0 | 0
cell flash cache read hits | 18 | 59 | 36 | 1,043 | 219
CC Total Rows for Decompression | 0 | 57,782,554 | 0 | 0 | 7,842,364
Similar findings to the 1st test case; however, in this test case we are performing the breakdown for a certain ID, and therefore the performance of tests 1 and 3 improved further from the IO saved by the storage index. For this test case I ran tests 1 and 3 on the same partitions, and it is worth noticing that the second time around the savings from the storage index improved; so the storage indexes are further maintained/improved as we select data from the tables and partitions.
Table figure for lp breakdown - Test case 3
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 2.99 | 6.01 | 2.72 | 49.22 | 39.29
cell physical IO bytes saved by storage index | 2,623,143,936 | 0 | 2,623,799,296 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 674,439,456 | 0 | 674,436,288 | 0 | 0
cell flash cache read hits | 64 | 44 | 10 | 2,113 | 635
CC Total Rows for Decompression | 0 | 57,979,108 | 0 | 0 | 15,582,048
Similar findings to the 2nd test case; this test is just performed on a different ID, which has a higher distinct count than the first ID we tested in test case 2. As a result of that, and of how the data is sorted during insert, we are seeing less IO saved by the storage index.
Table figure for spcl breakdown – Test case 4
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 1.67 | 13.69 | 1.14 | 12.77 | 7.90
cell physical IO bytes saved by storage index | 4,531,191,808 | 0 | 4,532,174,848 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 237,932,736 | 0 | 237,933,312 | 0 | 0
cell flash cache read hits | 73 | 52 | 10 | 594 | 183
CC Total Rows for Decompression | 0 | 57,782,554 | 0 | 0 | 5,614,752
This test case is performed with a where clause on multiple IDs. Again tests 1 and 3 take advantage of the Exadata features and perform well. Tests 4 and 5 are still not close to tests 1 or 3, but have definitely become a bit more competitive. Comparing the two HCC tests (2 and 5), test 5 seems to do better, as it only has to burn CPU cycles for 10% of the result set of test 2. A valid question to ask here would be why we are not seeing any benefit from either storage offloading or storage indexes in test 2, but again I'll defer that discussion to the conclusion.
Table figure for ttl breakdown - Test case 5
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 12.82 | 15.50 | 11.67 | 254.26 | 304.92
cell physical IO bytes saved by storage index | 0 | 0 | 0 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 2,328,566,432 | 0 | 2,328,567,440 | 0 | 0
cell flash cache read hits | 9 | 15 | 9 | 132,467 | 2,341
CC Total Rows for Decompression | 0 | 57,586,000 | 0 | 0 | 61,643,318
Very similar findings to those of the 1st test case; the only difference is that this query looks at the time to a certain event.
Table figure for ttlsc breakdown - Test case 6
Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time sec | 1.16 | 4.57 | 1.01 | 12.71 | 3.87
cell physical IO bytes saved by storage index | 4,697,096,192 | 0 | 4,698,832,896 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,292,800 | 0 | 5,209,292,800 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 55,906,960 | 0 | 55,906,384 | 0 | 0
cell flash cache read hits | 9 | 31 | 10 | 3,891 | 107
CC Total Rows for Decompression | 0 | 57,749,795 | 0 | 0 | 1,998,299
Very similar findings to those we saw for the 4th test case.
Most warehouse-like queries I have performed in our Exadata environment do well without indexes on the fact tables. So it is no surprise to me to hear that more and more people are dropping most of their indexes and taking advantage of the Exadata features. If you would like to keep the primary key indexes on your dimension tables to avoid the hassle of resolving duplicate key issues, that seems to be a valid option as well.
In my environment I have yet to find a case where the bitmap index search could compete with the no-index approach; and even if we found such a case, it would still have to show significant improvements before I would choose that path, considering the benefit of not having to maintain the bitmap indexes after each load. There are also several restrictions on bitmap indexes that it would be nice not to have to worry about.
Now, I mentioned that I would get back to the test 2 results, which were based on Storage FTS on partitions compressed with the HCC query option. In the past I have performed queries on HCC tables and have seen IO savings from the Storage indexes.
Initially I suspected the test 2 results observed above to be a bug, or alternatively to be related to the fact that my HCC compressed partitions are only 29MB apiece versus 2.4GB uncompressed. Oracle support/development has confirmed it to be related to the data size, as we can see from the stat "cell physical IO bytes eligible for predicate offload", which doesn't get bumped up after the query. The reason is that after partition pruning the table is too small for predicate push to kick in, and since predicate push doesn't kick in, the storage indexes won't kick in either.
Please be aware I don't know the storage index internals, but I look forward to learning.
At present I'm quite busy and therefore don't have much time to spend on writing blog notes, but I couldn't resist publishing this small and simple test case.
Often you can read (mostly unqualified) rants in various places and forums about the Cost Based Optimizer and how stupid, unpredictable etc. it seems to be.
So I think it's time to demonstrate how clever the optimizer sometimes can be.
Consider the following setup:
-- Use PCTFREE 99 so that only one row per (leaf) block
-- This can tell us how many "rows" had to be inspected
-- by checking the number of (leaf) blocks accessed
-- Unfortunately Oracle (usually) doesn't provide the information
-- how many rows have been accessed in the execution plan,
-- but only how many rows are returned by an operation
create table t_opt_clever (
id not null constraint pk_opt_clever primary key,
col1 not null,
col2 not null,
col3 not null,
col4 not null,
col5 not null,
filler
)
pctfree 99
pctused 1
as
select
level as id
, round(dbms_random.value(0, 200)) as col1
, round(dbms_random.value(0, 400)) as col2
, case
when level <= 666
then 'FIRST_BUCKET'
when level <= 833
then 'SECOND_BUCKET'
when level <= 1000
then 'THIRD_BUCKET'
end as col3
, round(dbms_random.value(0, 600)) as col4
, round(dbms_random.value(0, 800)) as col5
, rpad('x', 100, 'x') as filler
from
dual
connect by
level <= 1000;
create index idx_opt_clever1 on t_opt_clever (col5, col1, col4, col2) pctfree 99 compute statistics;
create index idx_opt_clever2 on t_opt_clever (col5, col1, col3, col4, col2) pctfree 99 compute statistics;
exec dbms_stats.gather_table_stats(null, 'T_OPT_CLEVER')
-- scale the table and index by factor 1000
exec dbms_stats.set_table_stats(null, 'T_OPT_CLEVER', numrows => 1000000, numblks => 30000)
exec dbms_stats.set_index_stats(null, 'PK_OPT_CLEVER', numrows=> 1000000, numlblks => 2000, numdist=>1000000, clstfct => 100000, indlevel => 3)
exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER1', numrows=> 1000000, numlblks => 14000, numdist=>1000000, clstfct => 1000000, indlevel => 3)
exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER2', numrows=> 1000000, numlblks => 16000, numdist=>1000000, clstfct => 1000000, indlevel => 3)
Basically this simulates a table of 1,000,000 rows with two suboptimal indexes given the following Top 100 query:
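select * from (
  select *
  from t_opt_clever
  where col3 = 'FIRST_BUCKET'
  order by col3, col5, col1, col4, col2
) where rownum <= 100;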
Now what do you think: can one of these indexes be used efficiently by the optimizer, and if so, which one?
At first sight neither index can be used to satisfy the requested sort order (which would avoid a costly full scan of the data and a corresponding SORT ORDER BY (STOPKEY) operation), nor can either be used efficiently to filter the data, because the filter predicate is not among the leading columns.
Let's check the result:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fz6vky8n5a3xq, child number 0
-------------------------------------
select * from ( select * from t_opt_clever where
col3 = 'FIRST_BUCKET' order by col3, col5, col1, col4, col2 ) where
rownum <= 100
Plan hash value: 4203008252
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.29 | 256 | 100 |
| 2 | VIEW | | 1 | 101 | 109 (0)| 100 |00:00:00.29 | 256 | 100 |
| 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 333K| 109 (0)| 100 |00:00:00.29 | 256 | 100 |
|* 4 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 101 | 8 (0)| 100 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
4 - access("COL3"='FIRST_BUCKET')
filter("COL3"='FIRST_BUCKET')
24 rows selected.
That is quite interesting: the index IDX_OPT_CLEVER2 is used and no SORT ORDER BY operation can be found in the execution plan, although the index doesn't match the requested sort order. And here comes the cleverness of the optimizer: it recognizes that, due to the filter predicate on COL3, this index can actually be used to satisfy the sort order, because COL3 is not relevant for the resulting order - it will always be the constant value of the filter predicate. And the same applies to IDX_OPT_CLEVER1, by the way.
But IDX_OPT_CLEVER2 is more efficient than IDX_OPT_CLEVER1 because the filter predicate can be evaluated on the index data already, eliminating some of the rows before visiting the table. Depending on the clustering factor this can make a significant difference to the cost of the operation, since random accesses to table rows potentially require accessing a different block per row.
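As a rough sketch (simplified, and ignoring CPU costing), the traditional index access cost model illustrates why the clustering factor dominates:

cost ~ blevel
       + ceil(effective index selectivity * leaf_blocks)
       + ceil(effective table selectivity * clustering_factor)

With a clustering factor in the order of the row count, as simulated here for both indexes, each additional table row visited adds roughly one block access to the cost, so eliminating rows at the index level before visiting the table pays off.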
This can be seen when forcing the usage of IDX_OPT_CLEVER1:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5tgmgfvyyx6z6, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever1) */ * from
t_opt_clever where col3 = 'FIRST_BUCKET' order by col3,
col5, col1, col4, col2 ) where rownum <= 100
Plan hash value: 678132971
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.20 | 310 | 54 |
| 2 | VIEW | | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
| 4 | INDEX FULL SCAN | IDX_OPT_CLEVER1 | 1 | 1000K| 8 (0)| 154 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
3 - filter("COL3"='FIRST_BUCKET')
23 rows selected.
Two things can be seen here:
1. The optimizer is again smart and is able to avoid the SORT ORDER BY operation, because the index IDX_OPT_CLEVER1 can also be used to return the data in the requested order, again because COL3 is constant.
2. Using IDX_OPT_CLEVER1 is less efficient because more table rows have to be visited to apply the filter predicate.
The fact that the indexes can only be used efficiently under this special circumstance can be verified by changing the filter predicate so that COL3 can have more than a single value, which means it is no longer possible to avoid a sort operation:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID axr6u0yvdk50f, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever2) */ * from
t_opt_clever where col3 in ('FIRST_BUCKET', 'SECOND_BUCKET') order by col3, col5, col1,
col4, col2 ) where rownum <= 100
Plan hash value: 2229390605
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.02 | 1835 | | | |
| 2 | VIEW | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | 20480 | 20480 |18432 (0)|
| 4 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 666K| 683K (1)| 833 |00:00:00.01 | 1835 | | | |
|* 5 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 666K| 16100 (1)| 833 |00:00:00.01 | 1002 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 - filter(("COL3"='FIRST_BUCKET' OR "COL3"='SECOND_BUCKET'))
25 rows selected.
Without the index hint the optimizer chooses a full table scan. Forcing e.g. the index IDX_OPT_CLEVER2 shows that indeed all rows had to be processed first and additionally a sort operation was necessary.
So it's interesting to note that the optimizer recognizes special cases where single-value predicates allow an index usage that otherwise wouldn't be possible. This is a nice move, since it allows the above query to be performed quite efficiently although the setup is suboptimal (e.g. a different index with COL3 as leading column, or an appropriate IOT, could be more suitable, depending on what else is done with the table). Under these (simulated) circumstances this optimization makes quite a difference compared to the otherwise only possible full table scan of a 30,000 block table.
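For illustration, such an alternative index might look like this (a sketch on the same table, not part of the original test case):

-- with COL3 as the leading column the predicate on COL3 becomes an access predicate,
-- and for a single COL3 value the remaining columns still deliver the requested
-- sort order without an extra SORT ORDER BY
create index idx_opt_clever3 on t_opt_clever (col3, col5, col1, col4, col2) pctfree 99;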
By the way, above results could be reproduced on 10.2.0.4 and 11.1.0.7 Win32 using default system statistics and an 8KB LMT MSSM tablespace.