
InMemory Bonus

It should be fairly well known by now that when you enable the 12c In-Memory (Columnar Store) option (and set the inmemory_size) your SQL may take advantage of a new optimizer transformation known as the Vector Transformation, including Vector Aggregation. You may be a little surprised to learn, though, that some of your plans may change even when they show no sign of a vector transformation. This is because the In-Memory Column Store isn’t just about doing tablescans very quickly; it’s also about new code paths for doing clever things with predicates to squeeze all the extra benefits from the technology. Here’s an example:
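(A quick aside before the demo, and not part of the original script: none of this is relevant unless the column store has actually been given some memory. A minimal sketch of enabling it, assuming you’re running from an spfile and can afford an instance restart, might look like this; the 500M figure is just an arbitrary example.)

rem
rem     Hedged sketch - allocate memory to the In-Memory column store
rem     (the parameter change only takes effect after a restart)
rem

alter system set inmemory_size = 500M scope = spfile;
shutdown immediate
startup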


rem
rem     Script:         12c_inmemory_surprise.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2016
rem
rem     Last tested
rem             12.1.0.2
rem

drop table t2 purge;
drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create index t1_n1   on t1(n1)   nologging;
create index t2_rand on t2(rand) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1,n2) size 1'
        );
end;
/
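(Another aside that isn’t in the original script: the method_opt in that gather call implicitly creates a column-group extension on (n1, n2). If you want to confirm that the extension exists you could check user_stat_extensions with something like this sketch:)

rem
rem     Hedged check - the column group should show up as a
rem     system-generated extension on table T1
rem

select
        extension_name,
        extension
from
        user_stat_extensions
where
        table_name = 'T1'
;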

There’s nothing particularly special about these two tables – I engineered them for a different demo, and the call to gather extended stats on the column group (n1, n2) is just a minor detail in getting better cardinality estimates in the upcoming plans. At this point I haven’t specified that either table should be in memory, so let’s see what plan I get from dbms_xplan.display_cursor() when I run a query that should do a hash join using t1 as the build table and t2 as the probe table:


select
        /*+
                qb_name(main)
        */
        count(*)
from    (
        select
                /*+ qb_name(inline) */
                distinct t1.v1, t2.v1
        from
                t1,t2
        where
                t1.n1 = 50
        and     t1.n2 = 50
        and     t2.rand = t1.id
        )
;

select * from table(dbms_xplan.display_cursor);

Plan hash value: 1718706536

-------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |       |       |       |  2441 (100)|          |
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |
|   2 |   VIEW                                  |       | 10001 |       |  2441   (4)| 00:00:01 |
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  2441   (4)| 00:00:01 |
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  2439   (4)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  2416   (4)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   5 - filter("T1"."N2"=50)
   6 - access("T1"."N1"=50)

Thanks to the column group the optimizer has estimated correctly that the number of rows selected from t1 would be 100. Beyond that there’s very little that’s exciting about this execution plan.
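(To spell out the arithmetic behind that 100, as a gloss that isn’t in the original article: n1 and n2 are identical by construction, taking 100 distinct values across t1’s 10,000 rows, so each (n1, n2) combination appears in exactly 100 rows. Without the column group the optimizer would treat the two predicates as independent and estimate 10,000 * 1/100 * 1/100 = 1 row; the column group tells it the combined selectivity is just 1/100, hence 100 rows. A quick query to confirm the raw numbers:)

rem
rem     Hedged sanity check of the data pattern assumed above
rem

select
        count(*)                                total_rows,
        count(distinct n1)                      n1_values,
        count(distinct n2)                      n2_values,
        count(distinct n1 || '/' || n2)         combined_values
from
        t1
;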

So let’s modify t2 to be in-memory and see how the plan changes as we re-execute the query:


alter table t2 inmemory;

select
        /*+ qb_name(main) */
        count(*)
...

select * from table(...);


Plan hash value: 106371239

----------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |         |       |       |   259 (100)|          |
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |
|   2 |   VIEW                                   |         | 10001 |       |   259  (27)| 00:00:01 |
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   259  (27)| 00:00:01 |
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   257  (26)| 00:00:01 |
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   234  (20)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   234  (20)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   6 - filter("T1"."N2"=50)
   7 - access("T1"."N1"=50)
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))

The cost of the tablescan drops dramatically as the optimizer assumes the table will be in the in-memory column store (IMCS), but (in many ways more significantly) we suddenly have a (serial) Bloom filter for the hash join, which eliminates (most of) the data that wouldn’t have survived the hash join without having to use the CPU that would normally be spent probing the build table.
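(A hedged aside: the optimizer costs the plan on the assumption that t2 is in the column store, so if you’re going to compare actual run times it’s worth checking that the segment really has been populated. A sketch using v$im_segments, assuming you have access to the view:)

rem
rem     Hedged check - population status of T2 in the column store
rem

select
        segment_name,
        populate_status,
        bytes,
        inmemory_size,
        bytes_not_populated
from
        v$im_segments
where
        segment_name = 'T2'
;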

This is an interesting example of what the in-memory code path can do for us. There’s no point in using the Bloom filter for a serial hash join in “classic” Oracle because the Bloom filter is basically the bitmap of the hash table that is the first thing Oracle examines when probing the hash table – but with “in-memory” Oracle there are some enhancements to the Bloom filter and the code path that make using the Bloom filter an effective strategy. Most significant, perhaps, is that the in-memory code path can use SIMD instructions to perform multiple probes from t2 simultaneously, so not only do we get the benefits of avoiding disk access, buffer activity and row-by-row access to columns, we also reduce the CPU time spent on making the first-stage comparisons of the hash join. (And shared columnar dictionaries in 12.2 could reduce this even further!)
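(If you want to convince yourself that those JOIN FILTER operations are optional rather than inevitable, a hedged suggestion: as far as I know the px_join_filter / no_px_join_filter hints are the way to encourage or suppress join (Bloom) filters, so you could re-run the query with the filter disabled and compare plans. The hint placement below is my sketch, not taken from the original article:)

select
        /*+
                qb_name(main)
        */
        count(*)
from    (
        select
                /*+ qb_name(inline) no_px_join_filter(t2) */
                distinct t1.v1, t2.v1
        from
                t1,t2
        where
                t1.n1 = 50
        and     t1.n2 = 50
        and     t2.rand = t1.id
        )
;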

Footnote: I also have a note I scribbled down at the Trivadis Performance Days last month that the Bloom filter used with IMCS carries the actual low and high values from the build table. I may have misinterpreted this as I wrote it, but if that’s correct then it’s another step in eliminating data very quickly and very early when using IMCS (or Exadata, if the same feature exists in the Bloom filters that get pushed to the storage servers).