Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

UUID (aka GUID) vs. Oracle sequence number

When you want to generate a unique number, for a primary key, the most common idea (and the right one) is to get a number from an always increasing generator. Oracle provides SEQUENCE for this purpose, which is optimized, easy to use and scalable.

But some application developer has the idea to use a UUID (Universally Unique Identifier) also known as GUID (Globally Unique Identifier). The main advantage is that you do not need a shared object like a sequence because there is no need for a central authority. UUID is generated with a hash function on some local identifiers (host, process, thread) and the hash value is large enough to reduce the probability of collisions without the need for a shared central authority.

However, in the context of Oracle Database:

  • having a shared object is not a problem, as we have a database
  • large values are less efficient (more space in disk, memory and more CPU to process)


In order to put some numbers about this argument, I create a 10 million rows table with an ID generated from a SEQUENCE:

SQL> create sequence DEMO_NUM_ID cache 1e6;
Sequence DEMO_NUM_ID created.
SQL> create table DEMO_NUM
(ID constraint DEMO_NUM_ID primary key, NUM)
as select DEMO_NUM_ID.nextval, rownum
from xmltable('1 to 10000000');
Table DEMO_NUM created.
Elapsed: 00:04:37.796

The data type returned by the sequence is a NUMBER:

Name Null?    Type       
---- -------- ----------

A NUMBER stores two digits per bytes. Here, my 10 million values take on average less than 5 bytes:

SQL> select sum(vsize(ID))/count(*),max(ID),count(*) from DEMO_NUM;
----------------------- ---------- ----------
4.8888893 10000000 10000000

This is very efficient. It is even smaller than a ROWID which is the internal identification of a row.


Here is a similar table with the ID generated as a GUID:

SQL> create table DEMO_GUID
(ID constraint DEMO_GUID_ID primary key, NUM)
as select sys_guid(), rownum
from xmltable('1 to 10000000');
Table DEMO_GUID created.
Elapsed: 00:05:45.900

You can already see that it takes longer to generate. This generates a RAW datatype in 16 bytes.

Name Null? Type
---- -------- ------------

This is very large. Each row where it is a primary key, and each foreign key, and indexes on them, will take 16 bytes where a large part is always the same (hashed from the host, and process). Note that a RAW is displayed with its hexadecimal character translation which is 32 characters here, but does not store it as a VARCHAR2(32) as this will be 2 times larger, and has some additional processing for characterset.

No surprise here, each value is 16 bytes:

SQL> select sum(vsize(ID))/count(*) from DEMO_GUID;


I see only one advantage in GUID primary keys: they are RAW datatypes. I like it because we don’t want arithmetic operations on it. And in addition to that nobody will complain about gaps in numbering. Then, can we store our NUMBER from the sequence as a ROW?

Here, I’m converting to a varchar2 hexadecimal and then to a raw. There’s probably a more efficient method to convert a number to a binary row. UTL_RAW has a CAST_FROM_NUMBER but that is the NUMBER representation. There’s also a CAST_FROM_BINARY_INTEGER. Here, I did a simple conversion through a varchar2, not very nice but the execution time is correct.

SQL> create sequence DEMO_RAW_ID cache 1e6;
Sequence DEMO_RAW_ID created.
SQL> create table DEMO_RAW 
(ID constraint DEMO_RAW_ID primary key, NUM)
as select hextoraw(to_char(DEMO_RAW_ID.nextval,rpad('FM',65,'X')))
, rownum
from xmltable('1 to 10000000');
Table DEMO_RAW created.
Elapsed: 00:04:21.259

This storage is even smaller than the NUMBER. 3 bytes instead of 5 bytes on average for those 10 million values:

SQL> select * from 
(select sum(vsize(ID))/count(*) "vsize(NUM)" from DEMO_NUM),
(select sum(vsize(ID))/count(*) "vsize(RAW)" from DEMO_RAW),
(select sum(vsize(ID))/count(*) "vsize(GUID)" from DEMO_GUID);
vsize(NUM) vsize(RAW) vsize(GUID)
---------- ---------- -----------
4.8888893 2.993421 16

This is totally expected as a byte can store 255 different values, but NUMBER uses only 99 ones with the two digits representation. In my opinion, it would make sense to have a SEQUENCE returning an integer as a RAW binary representation. But as I don’t think people will actually use it, I will not fill an Enhancement Request for that.

Here is the size for the tables and the index on this ID column:

SQL> select segment_name,segment_type,bytes/1024/1024 MBytes 
from dba_segments
where owner=user and segment_name like 'DEMO%' order by mbytes;
SEGMENT_NAME                   SEGMENT_TYPE           MBYTES
------------------------------ ------------------ ----------

Of course, this confirms what we have seen with the average size. The GUID is definitely not a good solution.


12cR2 introduced a compression algorithm for indexes which is interesting even when there are no repeated column values: ADVANCED HIGH (which is available in Enterprise Edition with Advanced Compression Option):

SQL> alter index DEMO_GUID_ID rebuild compress advanced high;
Index DEMO_GUID_ID altered.
Elapsed: 00:01:30.035
SQL> alter index DEMO_RAW_ID rebuild compress advanced high;
Index DEMO_RAW_ID altered.
Elapsed: 00:00:57.193
SQL> alter index DEMO_NUM_ID rebuild compress advanced high;
Index DEMO_NUM_ID altered.
Elapsed: 00:00:49.574

This reduced all 3 indexes. But the GUID one is still the largest one even if a large part of the values are repeated.

SQL> select segment_name,segment_type,bytes/1024/1024 MBytes from dba_segments where owner=user and segment_name like 'DEMO%' order by mbytes;
SEGMENT_NAME                   SEGMENT_TYPE           MBYTES
------------------------------ ------------------ ----------

It is interesting to see that the benefit of RAW number vs. NUMBER datatype is smaller once compressed.

When is GUID smaller than a NUMBER?

Here is where a NUMBER starts to be larger than a 16 bytes GUID:

1E30+1 is the first number that reaches 17 bytes. You will never reach this number with a sequence. Just calculate the size of a database storing these numbers — even with huge gaps in the sequences. That’s impossible. And anyway, GUID is not a solution there given the high probability of collisions.

In summary: do not use GUID or UUID for your primary keys.

A NUMBER generated from a SEQUENCE is the right/efficient/scalable way to get a unique identifier. And if you want to be sure to have no collision when merging or replicating from another system, then just add another ‘system identifier’ as an additional column in the primary key. This can be a node number in a distributed system, or a 3-letter identifier of the company. When column values have a repeated subset, there’s no efficient deduplication or compression techniques. But when it is a different column that is fully repeated, table and index compression can be used. So, if you really need to add a hashed hostname, put it in an additional column rather than the UUID idea of mixing all in one value.

Your New Years Resolution

Aligning roughly with the calendar year, based on the Chinese zodiak we’re about to go from the year of the dog to the year of the pig. But for me, in the “Information Technology Zodiak” Smile , 2018 was the year of the hack, just as it was in 2017 and just as it will be for 2019.

I’ve not dedicated much time to keeping a record of all of the high profile breaches this year, but just off the top of my head I can think of:

  • The Elasticsearch breach,
  • The Starwood breach which, as a frequent traveller, probably guarantees I’ve been impacted in some as yet unknown way,
  • The Quora breach, a site that is very popular amongst I.T professionals.
  • The Google+ breach, which peeved me more about the fact that it was hush hushed by Google in order to preserve reputational damage. Um…what happened to looking after customers?

Adding to that list, whilst we were discussing security this year at Sangam 18, indian friends reminded me of the enormous Aadhaar breach at the start of the year, which is perhaps one of the largest of all time in terms of numbers of people impact.

And to be honest, I lost track of how many, which, where etc the amount of dramas that Facebook had.

I’m a realist so I’m not saying that hacks can always be avoided, or that anyone who gets hacked is being negligent. Sometimes even the best efforts are not good enough, because the hacks get more and more sophisticated all the time. But at the very least, I think we should set a goal for 2019 that all of our data at rest should be encrypted. If we can at least start with that small step, then at least if our data at rest is somehow nefariously obtained, it will still have a veil of protection.

So what’s your I.T resolution for next year? I hope it’s security related.

Another day…another "use the right datatype" post

Here’s an interesting little oddity (aka bug) with scalar queries.

We’ll start with a simple working example

SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 int, c2 varchar2(10));

Table created.

SQL> insert into t2 values(1,'t1');

1 row created.

SQL> insert into t2 values(1,'t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

That all seems straightforward:

  • We got the value 1 from T1.C1,
  • used that as an input to the query into T2
  • got the maximum of the 2 matching rows from table T2
  • return the result as a column outer query

Let us now repeat the example, but we’ll now make column C1 in table T2 a VARCHAR2. The query will remain unchanged, so now we are comparing a numeric “1” with a varchar2 value of “1”.

SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 varchar2(10), c2 varchar2(10));

Table created.

SQL> insert into t2 values('1','t1');

1 row created.

SQL> insert into t2 values('01','t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1
         1 t01

2 rows selected.

Whoops…that doesn’t look right.

Don’t get me wrong, that is an error in the way that we are processing the query, because we should not be getting 2 rows back from it. But it is another example of where you can encounter boundary cases (and hence bugs) when you stray from the well-trodden route. Something is obviously going awry during the implicit conversion, because if we explicitly take care of it, then things are fine.

SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = to_number(b.c1) )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

I’d love it if there was an init.ora parameter that could be used in (say) development environments that would error out any SQL statement with an implicit data type conversion, but until that day comes (and it may be never!) then please make sure you take care with your data types!

Happy New Year everyone!

Index Only access with Oracle, MySQL, PostgreSQL, and Microsoft SQL Server

In my previous post about the advantages of index access over full table scans, I mentioned covering indexes. This is when an Index Range Scan can retrieve all columns without going to the table. Adding to an index all the columns used by the SELECT or WHERE clause is an important tuning technique for queries that are around the inflection point between index access and table full scan. But not all RDBMS are equal. What PostgreSQL calls ‘Index Only’ actually reads the table, except for static data with no concurrent modifications.

I’ll show the execution plans for this Index Only access on Oracle, MySQL, PostgreSQL, and MS SQLServer. As my skills on the non-Oracle ones are very limited, do not hesitate to comment if you think something is not correct.

I create a DEMO table

As I don’t want to bother with datatypes names. I use CREATE TABLE AS SELECT

This works on MySQL and PostgreSQL:

create table DEMO as select 1 A,1 B,1 C union all select 1 A,1 B,1 C

With Oracle I need to mention a table and DUAL is there for this goal:

create table DEMO as select 1 A, 1 B, 1 C from dual 
union all select 1 A, 1 B , 1 C from dual;

With SQL Server , no ‘Create Table As Select’, but the INTO syntax is used to mention the table to create:

select 1 A, 1 B, 1 C into DEMO union all select 1 A, 1 B , 1 C;

This table has 2 rows. I’ll add more rows so that the optimizer/query planner can base its estimations on statistics.

Insert rows

My goal is to query with a predicate ‘where A=2’ returning a small number of rows, as I want to benefit from index access. This works on all databases: insert a lot of (1,1,1) with a cartesian product and a few (2,0,0)

insert into DEMO select 1,1,1 
insert into DEMO values (2,0,0);
insert into DEMO select a.A, a.B, a.C from DEMO a;
create index DEMO_AB on DEMO(A,B);
select A,count(*) from DEMO group by A;

Because this version of Oracle gathers statistics during the CTAS but not for further DML, I delete statistics to rely on dynamic sampling:

begin dbms_stats.delete_table_stats('','DEMO'); end;

Explain plan

For MySQL I don’t know how to get the execution plan with execution statistics, so I simply used EXPLAIN

explain select sum(A) from DEMO where A=2;
explain select sum(B) from DEMO where A=2;
explain select sum(C) from DEMO where A=2;

The goal is to show the plan when I select only the column used for access(A), or an additional column found in the index (B) or a column which exists only in the table. The one with B is the one where I expect Index Only access.

For PostgreSQL the EXPLAIN statement can execute and show execution statistics:

explain (analyze,verbose,costs,buffers)
select sum(A) from DEMO where A=2;

For SQL Server, this is done with SET STATISTICS PROFILE before running the statements:

set statistics profile on

For Oracle, I used DBMS_XPLAN :

select /*+ gather_plan_statistics */ sum(B) from DEMO where A=2;
select plan_table_output from dbms_xplan.display_cursor(format=>'allstats last +projection');


I’ve run them with db<>fiddle so that you can see the exact execution with the link provided.

Oracle 18c

When I access only A or B there is no need to go to the table with Oracle. This is a true Index Only access.

The +PROJECTION format shows which columns are returned by each operation and here it is clear that B comes from the index. No need to go to the table for each index entry.

The last example which selects column C is the only one where Oracle has to access the table:

In summary, with Oracle, I have to go to the table only when I need a column from it that is not in the index. This is a very interesting solution when I have a range scan which selects a large number of rows, where accessing rows scattered in the table is expensive, but where the table is too large for a full table scan to be efficient. Then the Index Only access is a solution. This technique is also known as Fat Index (from Tapio Lahdenmaki work).

PostgreSQL 11

PostgreSQL, since 9.2, has an access path explicitly called Index Only but if you run my example, you will not see it. And even when you query only the column A you will access the table:

This is just an optimized version of going to the table for each index entry, where a bitmap from index entries is built before scanning the table. The query planner may have chosen Index Only access, but it would be less efficient because - despite its name - Index Only access will finally access to the table rows. You see ‘Heap Fetches’ when it is the case. I blogged about it in the past. The reason is that PostgreSQL MVCC (multi-versioning to be able to read without blocking writes) is at Tuple level, and the index has no information about what is committed or not.

Postgres has a little optimization for this, using the visibility map, but this is maintained asynchronously by the vacuum process. In my db<>fiddle example you will see a real Index Only after a full vacuum:

In summary, PostgreSQL Index Only is really ‘Index Only’ when you see ‘Heap Fetches: 0’ and this means that the ‘fat index’ optimization is possible only for tables with rare modifications and frequent vacuum.

MySQL 8.0

MySQL with the InnoDB engine is MVCC at block level like Oracle, where versioned index blocks contain the information about the visibility. You can see that when I select column A or B which are in the index, the ‘Extra’ information of the execution plan mentions ‘Using Index’ which means that there’s no access to the table:

SQL Server 2017

SQL Server can do Index Only:

I can see the difference when selecting the C column which is not in the index. The need to go to the table for each index entry (RID Lookup) makes the optimizer chose a Table Scan for this example (except when hinted):

SQL Lite

As db<>fiddle is down for SQLite when writing this, I’ve run it with sqlfiddle!5/59a60/4 and I see ‘COVERING’ mentioned when selecting only A or B:


In summary

Oracle and MySQL with InnoDB engine are the ones who can really do Index Only, aka Covering Index, in a high OLTP (where readers do not block writers). SQL Server can also do it, but with reads blocking writes (except with snapshot isolation). PostgreSQL can also do it, with non-blocking reads, but not efficiently when the tables have concurrent modifications.

Of course, my favorite is Oracle as this access path, and the MVCC at block level, is there for decades. Here is Oracle 7.3 execution plan for the same example:

A Christmas Carol

You better watch out,
Let me tell you a fact.
If your SQL has literals,
You’re gonna be hacked.

SQL Injection is comin’ to town

We’ve got a library cache,
It’s memory all linked twice.
But it only works well,
When you’re binding all nice.

SQL Injection is comin’ to town

We know when you’ve been lazy,
And concatenated simple strings.
So just make sure you bind your stuff,
And don’t let the bad guys in.

So… you better watch out,
Let me tell you a fact.
If your SQL has literals,
You’re gonna be hacked.

SQL Injection is comin’ to town


Merry Christmas everyone! Smile

Oracle Security Blog Posts

I teach many training classes on Oracle security to lots of students worldwide both on-site and on-line and one area I often cover quote briefly is where can you find more information or keep up to date on Oracle security....[Read More]

Posted by Pete On 23/12/18 At 05:53 PM

We still need partitions and indexes, and I will continue to commute by train even when…

We still need indexes, and I will continue to commute by train even when self-driving cars become reality…

When the Autonomous Data Warehouse Cloud Service had been announced, Oracle came with this surprising idea that we do not need to create Indexes, Partitions and Materialized views for our analytic queries. It was even blocked in ADW and recently released but not recommended. Automatic indexing is for ATP and the message for ADW is: you don’t need indexes for your analytic queries.

In my opinion, and even with the best performance in non-index access, we will still need index range scans. And even when it is not the fastest access path. Because the fastest response time is not the first performance criteria for the end-user.

Full Table Scan vs. Index access

Here is a diagram I used to explain the cost of Full Table Scan vs. Index Access, depending on the number of rows to return.

FTS vs. Index cost / number of rows

Full Table Scan depends only on the size of the table. It has the same cost to return zero, one, half, or all the rows. Index Access is linearly proportional to the number of rows. The optimizer has a hard job to determine the point of inflection. And anyway, around the point of inflection, no method is really optimal. I always recommend providing structures that make one of those two access paths obviously the best.

The idea of the Autonomous Data Warehouse is to lower the cost of the Full Table Scan, thanks to a combination of hardware and software optimizations. Of course, it cannot be lowered to compete with an index unique access retrieving one row, like for OLTP ‘select * where PK=’. But the inflection point can be moved down enough so that Full Table Scan response time is correct for any analytic query. Then, with only one access method, the optimizer job is easier, less prone to errors. We don’t have to design indexes, and all this can be branded as autonomous tuning.

Markus Winand —

Before going to the problem of having no indexes to range scan, I’ll try to summarize quickly the properties of Full Table Scans and Index access. There’s also an excellent read to go into the detail: by Markus Winand.

There are two access paths to read table rows:

  • Full Table Scan, which is always available (at the exception of IOTs - Index Organized Tables, as there’s no table) where all the formatted blocks of a Heap table are read, without a specific order.
  • Access by ROWID where a sorted structure (as an Index) or hash function provides the physical address where to get the rows we need, without having to scan all blocks.

Full Table Scan is efficient because it can to read rows as it is physically the most efficient: contiguous blocks, in large I/O calls, in parallel, asynchronous, bypassing all caches. However, it has to read a lot more than needed. Without any determined order. Then the rows have to be filtered out later. And often sorted or hashed later, to be joined, ordered or deduplicated. Many features have improved Full Table Scan to keep it efficient even when we do not need a lot of rows: Storage Indexes, Zone Maps, SmartScan, Compression, Column Store, In-Memory, HCC,… Those features keep Full Table Scans still efficient for analytic queries where we query only a subset of rows and columns.

Yes, Full Table Scan on ADW is fast. But from my experience, ‘fast’ response time is not the most important criteria for the end-user. They want predictable response time.

Users prefer predictability over fast response time

You may think that Full Table Scan is predictable. That’s because you know the size of the table. But the end-user doesn’t know if the table is compressed, or full of free space. They do not realize how it can be longer when requesting the result to be ordered. They do not understand that querying today’s orders is not faster than querying the whole week. They cannot predict the response time because it is not proportional to the result they expect.

And there’s more: when many people run the same Full Table Scan, even when expecting different data, they are in competition and all response time will be longer because of buffer busy waits. When you query at a time where there’s a lot of activity on the storage, the response time will also be longer. All this cannot be predicted by the end-user. And the improvements such as Storage Index, In-Memory, Compression,… are not predictable by nature as they depend on the activity. They adapt from the previous queries and concurrent workload.

Even when Index Access response time is higher, it stays more predictable because it processes only the required rows. The time will be the same when the table grows by 200%. Or when 5 users are running similar queries. The cost is proportional to the number of rows returned: the end-user will accept easily that querying one week takes x7 more time than for one day. If it takes 5 minutes, they will manage it because they expect this longer duration. They don’t want the fastest response time, they just want a predictable time.

Autonomous Optimization as presented by Larry Ellison

The features of the Oracle Autonomous Database are presented with an analogy with cars, to push on the self-driving idea. For OLTP, with the Autonomous Transaction Processing, Oracle develops Automatic Indexing. And the analogy is: indexes are like new roads. The optimizer evaluates the execution plans (plans are like driving directions) with the goal to choose the one with the fastest response time. There’s no decision about the most predictable, the one which is less subject to contention with others, or the one which has a response time proportional to the result set. The CBO chose the lower cost, and the CBO cost is the estimated response time. With this goal, an optimized table scan looks the best.

But my goal, keeping the response time predictable by the end-user, is different. I’ll continue the analogy with the cars. Here is what Google Maps estimates for my morning commute to work:

Car: 35 min — 1 h 5 min

This is how ‘predictable’ is the commute by car at that time - the peak hour around Geneva. Typically between 35 and 65 minutes. It is, on average, faster than the train. But the time is not linearly proportional to the kilometers. Most of the time is spent in the few ‘red’ kilometers. That’s my analogy with Full Table Scan. We can drive fast, very fast. But the time it takes to reach the destination is uncertain.

And here is what Google Maps estimates for the same commute by train:

Train: 1 h — 1 h 1 min

This is completely predictable, proportional to the distance, and with nearly no variations. Index Access is like my commute by train: most of the time not faster than driving, but finally preferable because of its stability and predictability. And with good design, Index Access can be as precise as Swiss trains ;)

You can guess that I take the train to go to work. I don’t care that’s it takes longer, because I know it and can manage it — like opening my laptop and writing a blog post — so that it is not wasted time. And I know exactly when I’ll arrive at work for a meeting, or at home to pick up the kids.

Index, partitions, materialized views

If predictability is your first criteria, then your optimization should not rely on caches, parallelism, compression, storage indexes, in-memory population, result cache,… Of course, all these features are welcome when they kick-in at execution time, but they will not help you to forecast a predictable response time. The predictable optimizations are those for which the cost is proportional to the expected result, where we can forecast performance:

  • Partitions clearly reduce the cost of Full Table Scan with something predictable. The partition pruning has a business meaning. If the user queries a specific time range or geographical area, she knows that the time to retrieve will be proportional to it. Partitions, when properly designed, also keep the cost independent from the table growth.
  • Materialized views give this end-user predictability for aggregations. When a user queries for the monthly amount of sales in the year, she expects a 12 rows result, and she may not imagine that millions of rows have to be scanned for this. With a materialized summary, the response time will meet the volume of the result.
  • Indexes, when properly designed, will range scan exactly the values required for the query. And they will not add additional time for sorting or deduplicating the rows. Index access will give the best user experience because the retrieval is proportional to the cost imagined by the user. And Top-N queries is an important part of analytic queries. Only an index range scan answer a Top-N query without reading all the rows.

Index access is preferable when we query a limited number of rows. Full Table Scan is better when we need to read a large part of the table. The optimizer can choose the best one, but it is based on estimations. If you are in the area around the inflection point, you have the risk of execution plan change between two plans that are not optimal. This is where Cardinality Feedback can be good or bad. You should never leave critical queries in this zone. In the diagram at the top of this post, there’s a dotted line for ‘covering index’ which pushes the inflection point to a cardinality where it is still an alternative to FTS for a high number of rows. This is when the index contains all required columns and there’s no need to go to the table (and no dependency with the clustering factor, which is difficult to predict by the end-user).

In my opinion

With the price of a limited human-labor index design, we can provide speed and predictability to our users. Expensive hardware and software can compete with the former, but not the latter. And this is why I think we will always need indexes. The reason is the same as why I prefer to commute by train: predictability of the time.

By the way, there’s another reason after predictability and speed: green computing. Look at the resource (CPU and I/O) usage efficiency and guess which execution plan has the lower carbon footprint. Full Table Scan in Parallel Query, like all the trendy Big Data solutions where you scale by adding more nodes, is awesome and fast, but extremely expensive on resources. Indexes, when correctly designed, will read only what is needed to get the result, with predictable response time.

QC vs. PX

One last post before closing down for the Christmas break.

Here’s a little puzzle with a remarkably easy and obvious solution that Ivica Arsov presented at the UKOUG Tech2018 conference. It’s a brilliant little puzzle that makes a very important point, because it reminded me that most problems are easy and obvious only after you’ve seen them at least once. If you’ve done a load of testing and investigation into something it’s easy to forget that there may be many scenarios you haven’t even thought of testing – so when you see the next puzzle your mind follows all the things you’ve done previously and doesn’t think that you might be looking at something new.

In this case I had to wait until the end of the presentation to discover how “easy and obvious” the solution was. Here’s a query with its results: all I’m going to do is join a session (from v$session) with all its parallel execution slaves by looking for the matching qcsid in v$px_session:

break on server_group skip 1 duplicate

        px.sid, px.qcsid,
        px.server_group, px.server_set, px.server#,
        V$px_session px,
        v$session ss
        ss.username = 'TEST_USER'
and     ss.sid = px.sid
order by
        px.server_group nulls first, px.server_set, px.server#

---------- ---------- ------------ ---------- ---------- -------------
       357        357                                    b4wg6286xn324

       357        125            1          1          1 bppfad1y1auhj
       357        246                       1          2 bppfad1y1auhj
       357        364                       1          3 bppfad1y1auhj

       357          7            2          1          1 5vdbyjy0c7dam
       357        133                       1          2 5vdbyjy0c7dam
       357        253                       1          3 5vdbyjy0c7dam

As you can see session 357 is reported as a query coordinator session, with two parallel server groups of 3 slave processes each. Strangely, though, the coordinator and the two groups of parallel query slaves are reporting different SQL_IDs which is probably contrary to the experience that most of us have had. When a parallel query (or DML or DDL statement) is executing the query co-ordinator and all its slave processes should report the same SQL_ID – so what’s happening here?

Little pause for thought …
… and no doubt some of you were immediately aware of the probable explanation. It’s very simple if you’ve come across the phenomenon before. Here’s the SQL that allowed me (from another session) to capture this result:

rem     Script: px_qc_joke_2.sql
rem     Author: Jonathan Lewis
rem     Dated:  Dec 2018

create table t1 nologging
select ao.*
        all_objects ao,
         select rownum id
         from   dual
         connect by level  comment to avoid wordpress format issue

create table t2 nologging as select * from t1;
create table t3 nologging as select * from t1;

        dbms_stats.gather_table_stats(user,'t1', method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats(user,'t2', method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats(user,'t3', method_opt=>'for all columns size 1');

prompt  =====================
prompt  Starting PL/SQL block
prompt  Prepare to query v$
prompt  =====================

        cursor c1 is select /*+ parallel (t1 3) */ object_id from t1;
        cursor c2 is select /*+ parallel (t2 3) */ object_id from t2;
        m_id1 number;
        m_id2 number;
        open c1;
        fetch c1 into  m_id1;

        open c2;
        fetch c2 into  m_id2;

        for r in (select owner from t3 where object_id = least(m_id1,m_id2)) loop
        end loop;

        dbms_output.put_line(m_id1 || ' - ' || m_id2); 

        close c2;
        close c1;

I’ve opened two cursors on parallel queries inside an anonymous PL/SQL block. The driving session is the query co-ordinator for two different parallel queries at the same time because it’s keeping two cursors open, and it’s also executing the cursor that is the driving query block. If we check v$sql for the three SQL_IDs reported from v$px_session this is what we see:

declare  cursor c1 is select /*+ parallel (t1 3) */ object_id from t1;  cursor c
2 is select /*+ parallel (t2 3) */ object_id from t2;  m_id1 number;  m_id2 numb
er; begin  open c1;  fetch c1 into  m_id1;   open c2;  fetch c2 into  m_id2;   f
or r in (select owner from t3 where object_id = least(m_id1,m_id2)) loop   dbms_
output.put_line(r.owner);   dbms_lock.sleep(1);  end loop;   dbms_output.put_lin
e(m_id1 || ' - ' || m_id2);   close c2;  close c1; end;

SELECT /*+ parallel (t1 3) */ OBJECT_ID FROM T1

SELECT /*+ parallel (t2 3) */ OBJECT_ID FROM T2

Apart from the warning that it’s easy to be misled by a problem because you keep thinking of all the cases you’ve seen before there’s another important point behind this little quiz. It’s often said that when you run parallel queries you may actually use “2 * DOP” parallel query slaves – this is true (though for more complicated queries you may get multiple DFO trees at once, each with its “2 * DOP(n)” slaves) – it’s worth remembering that even with very simple queries a single session can have many cursors open at once, holding “2 * DOP” slave for each one, and ruin the response time for every other session because every other session ends up running serial queries.

Update (Fen 2019)

On a re-read of this note it cross my mind to wonder what would happen if parallel query queueing were enabled and you had two sessions executing PL/SQL blocks that ended up colliding in the following way:

  • Session 1: open parallel cursor 1a – using half the available PX slaves
  • Session 2: open parallel cursor 2a – using the rest of the PX slaves
  • Session 1: open parallel cursor 1b – demanding the rest of the slaves, and joining the parallel queue.
  • Session 2: open parallel cursor 2b – demanding the rest of the slave, and joining the parallel queue.

If this scenario is possible you now have a “parallel queue deadlock” between the sessions with no obvious scope for one of them to “rollback” and allow the other to continue. One of the PL/SQL blocks probably has to fail and terminate (which could be very bad news if you’ve done some DML and commits in the middle of the PL/SQL, prior to opening both cursors.

It’s possible, of course, that there are configuration details that mean this collision can’t happen. One day I may find time to test the idea – but maybe an interested (or well-informed) reader will supply the answer (or a reference to the bit of the manual I should have read before writing this update).



Your AskTOM Top 10 for 2018

Here we go folks…here are the top 10 most viewed questions in AskTOM this year!

We’ll count them down from 10 to 1

10) Inserting values into a table with ‘&’

Viewed 80,000 times.

First asked in 2012, this actually is not a database issue but a client issue. Many client tools view ampersand as a substitution variable, and this they intercept the execution before the command is sent to the database. Full details here

9) What is a SID, how to change it, how to find out what it is

Viewed 95,000 times.

A timeless classic. This question was asked in June 2000 – only 1 month after AskTOM went live. Given that most of our documentation contains, and most people are aware of the terms “instance” and “database”, I’ll never really understand why we didn’t go with ORACLE_INSTANCE instead of ORACLE_SID. Full details here.

8) How to Update millions or records in a table

Viewed 100,000 times.

Asked back in 2002, I can understand the popularity of this one. Perhaps the most common performance tuning issue is data conversion or data loading. Sometimes you just need to forget the word UPDATE and re-load the data – it can be much faster. Full details here.

7) How to connect SQLPlus without tnsnames.ora

Viewed 110,000 times.

Is it that much of a big deal to have a tnsnames.ora file? I guess so. Nowadays the EZ connect syntax makes connecting to any database trivial. Full details here.


Viewed 130,000 times.

Another understandable one for the top 10 because of the transition from old style export/import to data pump around the timeframe that this question was first asked. There is also the classic chicken-and-egg issue to deal with, namely, that you need a database in order to import, but won’t a full import than clash with all the existing objects? Full details here.

5) Converting CLOBS TO VARCHAR

Viewed 132,000 times.

Once a golden oldie from 2001! It seems an odd proposition – if you have a CLOB, there’s a good chance you have it because it exceeds the allowable size for VARCHAR2. So converting back down seems a risky exercise. Full details here.

4) ORA-12560: TNS:protocol adapter error

Viewed 135,000 times.

Ah yes, not our finest hour this one. Try to use a database on Windows that has not been started, and do you get an error saying “Database not started”? No. You get a crytpic TNS error. Full details here.

3) Format the Number for display

Viewed 143,000 times.

There is such a thing as “too much of a good thing”. And perhaps our incredible flexibility when it comes to formatting data, and the range of format masks is tough for people to swallow. Full details here.

2) IN (vs) EXISTS and NOT IN (vs) NOT EXISTS

Viewed 144,000 times.

Well, there’s a simple reason this gem keeps on coming back year after year after year. For years, books on performance tuning and blogs on performance tuning would claim that EXISTS was better than IN, or vice versa. Amazingly it has never been the case that one was guaranteed to be better than the other, and more alarmingly a straight swap from one to the other can even impact the results you get back. I think we’ll see this guy in the list every year…..forever. Full details here.

And finally we have arrived at the number 1 most viewed question this year. Which begs me to ask the question:

Why do so many of you need to know how many are in each table? Smile

1) Finding the number of rows in each table by a single sql

Viewed 510,000 times.

Wow. Half a million people need to know how many rows are in their database tables. I’ve no idea why, because the moment you count them, you’re already out of date. Full details here.

And there you … our top 10 for 2018.

Have a great festive season!

Transitive Closure

This is a follow-up to a note I wrote nearly 12 years ago, looking at the problems of transitive closure (or absence thereof) from the opposite direction. Transitive closure gives the optimizer one way of generating new predicates from the predicates you supply in your where clause (or, in some cases, your constraints); but it’s a mechanism with some limitations. Consider the following pairs of predicates:

    t1.col1 = t2.col2
and t2.col2 = t3.col3

    t1.col1 = t2.col2
and t2.col2 = 'X'

A person can see that the first pair of predicate allows us to infer that “t1.col1 = t3.col3” and the second pair of predicates allows us to infer that “t1.col1 = ‘X'”. The optimizer is coded only to recognize the second inference. This has an important side effect that can have a dramatic impact on performance in a way that’s far more likely to appear if your SQL is generated by code. Consider this sample data set (reproduced from the 2006 article):

rem     Script:         transitive_loop.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:
rem     Last tested

create table t1 
        mod(rownum,100) col1,
        rpad('x',200)   v1
        rownum <= 2000

create table t2
        mod(rownum,100) col2,
        rpad('x',200)   v2
        rownum <= 2000

create table t3
        mod(rownum,100) col3,
        rpad('x',200)   v3
        rownum <= 2000

-- gather stats if necessary

set autotrace traceonly explain

prompt  =========================
prompt  Baseline - two hash joins
prompt  =========================

        t1.*, t2.*, t3.*
        t1, t2, t3
        t2.col2 = t1.col1
and     t3.col3 = t2.col2

prompt  ================================================
prompt  Force mismatch between predicates and join order
prompt  ================================================

                leading(t1 t3 t2)
        t1.*, t2.*, t3.*
        t1, t2, t3
        t2.col2 = t1.col1
and     t3.col3 = t2.col2

The first query simply joins the tables in the from clause order on a column we know will have 20 rows for each distinct value, so the result sets will grow from 2,000 rows to 40,000 rows to 800,000 rows. Looking at the second query we would like to think that when we force Oracle to use the join order t1 -> t3 -> t2 it would be able to use the existing predicates to generate the predicate “t3.col3 = t1.col1” and therefore be able to do the same amount of work as the first query (and, perhaps, manage to produce the same final cardinality estimate).

Here are the two plans, taken from an instance of

Baseline - two hash joins

| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT    |      |   800K|   466M|    48  (38)| 00:00:01 |
|*  1 |  HASH JOIN          |      |   800K|   466M|    48  (38)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   398K|    10   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    21   (5)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   398K|    10   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - access("T3"."COL3"="T2"."COL2")
   3 - access("T2"."COL2"="T1"."COL1")

Force mismatch between predicates and join order

| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT      |      |   800K|   466M| 16926   (3)| 00:00:01 |
|*  1 |  HASH JOIN            |      |   800K|   466M| 16926   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | T2   |  2000 |   398K|    10   (0)| 00:00:01 |
|   3 |   MERGE JOIN CARTESIAN|      |  4000K|  1556M| 16835   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    BUFFER SORT        |      |  2000 |   398K| 16825   (2)| 00:00:01 |
|   6 |     TABLE ACCESS FULL | T3   |  2000 |   398K|     8   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - access("T2"."COL2"="T1"."COL1" AND "T3"."COL3"="T2"."COL2")

As you can see, there’s a dramatic difference between the two plans, and a huge difference in cost (though the predicted time for both is still no more than 1 second).

The first plan, where we leave Oracle to choose the join order, builds an in-memory hash table from t3, then joins t1 to t2 with a hash table and uses the result to join to t3 by probing the in-memory hash table.

The second plan, where we force Oracle to use a join order that (I am pretending) we believe to be a better join order results in Oracle doing a Cartesian merge join between t1 and t3 that explodes the intermediate result set up to 4 million rows (and the optimizer’s estimate is correct) before eliminating a huge amount of redundant data.

As far as performance is concerned, the first query took 0.81 seconds to generate its result set, the second query took 8.81 seconds. In both cases CPU time was close to 100% of the total time.

As a follow-up demo I added the extra predicate “t3.col3 = t1.col1” to the second query, allowing the optimizer to use a hash join with the join order t1 -> t3 -> t2, and this brought the run time back down (with a slight increase due to the extra predicate check on the second join).


The choice of columns in join predicates may stop Oracle from choosing the best join order because it is not able to use transitive closure to generate all the extra predicates that the human eye can see. If you are using programs to generate SQL rather than writing SQL by hand you are more likely to see this limitation resulting in some execution plans being less efficient than they could be.