
Oakies Blog Aggregator

DML Tracking

You’ve probably seen questions on the internet occasionally about finding out how frequently an object has been modified. The question is a little ambiguous – does it mean how much change has occurred, or how many DML statements have been executed; either may be an interesting measure. Of course, Oracle gave us a method of answering the first question a long time ago: v$segstat (or v$segment_statistics if you don’t mind doing the join) and the resulting content in the AWR or Statspack reports:

Segments by DB Blocks Changes   DB/Inst: XXXXXXXX/XXXXXXXX  Snaps: 85563-85564
-> % of Capture shows % of DB Block Changes for each top segment compared
-> with total DB Block Changes for all segments captured by the Snapshot

           Tablespace                      Subobject  Obj.      DB Block    % of
Owner         Name    Object Name            Name     Type       Changes Capture
---------- ---------- -------------------- ---------- ----- ------------ -------
XXXXXXXXX  DATA       MLOG$_xxxxxxxxxxxxxx            TABLE    1,144,112   22.16
XXXXXXXXX  DATA       PK_xxxxxxxxxxxxxxxxx            INDEX      614,256   11.90
XXXXXXXXX  DATA       xxxxxxxxxxxxxxx                 TABLE      490,080    9.49
XXXXXXXXX  IDX        IDX_xxxxxxxxxxxxxxxx            INDEX      353,392    6.84
XXXXXXXXX  IDX        PK_yyyyyyyyyyyyyyy              INDEX      273,664    5.30

There’s still a little ambiguity in this report (in the right circumstances you can change multiple rows in a block while only reporting a single block change), but it’s a very good indicator of the volume of change to an object; the question remains, though: is the million or so block changes due to one SQL statement or to 1,000,000 SQL statements?

There is an (undocumented) option for finding the answer to that question. Since it’s undocumented I wouldn’t use it on production unless I was really desperate, and even then I’d do the standard “check with Oracle Support” first; however I might use it on a test system if I were doing something like running tests of a new version of an overnight suite of batch jobs.

You can enable “DML frequency tracking” which seems to count the number of DML statements that modify an object or, to be more accurate, a table. It doesn’t do anything to identify indexes, although it will report index organized tables (using the table name); it won’t identify partition-level DML individually for partitioned tables, it will simply record the DML as being relevant to the table.

To enable the feature you modify the hidden parameter _dml_frequency_tracking and then query the dynamic performance view v$object_dml_frequencies:

alter system set "_dml_frequency_tracking"=true;

select
	object, working, slot0, slot1, slot2, slot3, slot4
from
	v$object_dml_frequencies
order by
	working + slot0 + slot1 + slot2 + slot3
;

The view has 25 columns for numbered “slots”, and a “working” column. Each slot represents a 15 minute interval (set by parameter _dml_frequency_tracking_slot_time) and only 4 slots are maintained by default (set by parameter _dml_frequency_tracking_slots). The values roll from slot to slot every 15 minutes, and the “working” column keeps track of objects that have suffered DML since the last roll-over. Unfortunately there doesn’t seem to be any obvious way of tracking when a roll-over takes place, so you can’t tell how long the “working” column has been active for.

If you have the default 4 slots set, then “working” and slots 0 to 2 are populated with 15 minutes worth of information each, and slot3 seems to be the accumulated history of all accesses since you enabled the feature.

The view sits on top of x$ksxm_dft, but there are no interesting extra columns in the x$ that aren’t displayed in the v$ and gv$. (It’s an interesting little quirk that the v$ doesn’t follow the usual pattern of being a view of the gv$ restricted to the current instance.) You can see the information in the shared pool (by querying v$sgastat) in an area named “dml frequency” – although the figure reported the last time I checked was wrong by a factor of roughly 3, showing about 8KB when I was expecting roughly 24KB.
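As a quick check, something along these lines shows that allocation (a sketch; the pool and size you see will vary):

select pool, name, bytes
from	v$sgastat
where	name = 'dml frequency';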

I haven’t spent a lot of time investigating the feature yet, so there are probably a few questions and boundary conditions to test before using it in anger; but there don’t appear to be any special latches associated with the counters (so the results might under-count in a highly concurrent system, unless, perhaps, they are collected under a mutex).

Related parameters

_dml_frequency_tracking           (false) Control DML frequency tracking — can be set at system level on demand
_dml_frequency_tracking_advance   (true)  Control automatic advance and broadcast of DML frequencies — ?
_dml_frequency_tracking_slot_time (15)    Time length of each slot for DML frequency tracking — needs restart
_dml_frequency_tracking_slots     (4)     Number of slots to use for DML frequency tracking — needs restart

OLTP compression: migrated rows are compressed

In his articles Compression in Oracle – Part 2: Read-Only Data and Compression in Oracle – Part 3: OLTP Compression, Jonathan Lewis has shown that block (re)compression is never attempted on updates - it is attempted only on inserts (and, of course, only if the used space crosses the PCTFREE threshold).
Now, since a row migration could be considered a row re-insertion - does it trigger a compression attempt of the block that the row is migrated into? The answer is yes, as I will show you in a while.
It is worth remembering that row migration can be quite dramatic in an OLTP-compressed table, because (as shown by Jonathan above) when a tokenized column of a compressed row is updated to a new value, the whole token is expanded, modified and left uncompressed, even if the new value could be substituted by a token already present in the block. Hence any kind of update, even the tiniest, has the potential to inflate the row considerably, and of course, if the row can't fit into the free space, the row has to be migrated to a new block.
Compressing migrated rows has the benefit that at least the whole size of the table is probably not going to grow much, assuming a reference scenario where rows were inserted randomly in the first place (i.e. without cleverly colocating similar rows in the same block, e.g. by inserting them in bulk using a well-designed order by), and assuming that the updating process that causes migration is random as well (which is probably almost always true). The "only" overhead is the additional block get necessary to follow the rowid that acts as the "forwarding address" from the original row position (where only the row header is stored) to the new one (where the row piece is now stored).
Side note: it's interesting that this overhead is not present when doing a full table scan, since full scans simply do not follow the "forwarding address": they are going to find the row piece anyway (a well-known fact that is also checked in my test case for completeness). Since, as reasoned above for our reference scenario, the table size probably does not change much, a frequently-full-scanning DWH is probably going to enjoy similar performance even after updates (which is a bit ironic for a feature named "OLTP compression"); not so for OLTP systems, or DWHs that use single-row access a lot (e.g. by following rowids from bitmap indexes), which have to pay the migration penalty.
But let's leave speculations alone, and get back to hacking ...
test case: migrated rows are compressed
Note: the test case (main.sql), its spool (main.lst) and block dumps (main.trc) are available for download, alongside the other test cases mentioned in the following.
For simplicity, we are going to use a tablespace with segment space managed manually, i.e. where free blocks are linked together by one freelist, and with a block size of 4K; Oracle version is
The test table is (note the PCTFREE set to 50%):

create table t (x varchar2(400 char))
pctfree 50
compress for oltp;

The initial row set is generated by:

insert /*+ append */ into t
select rpad( 'A', 400, 'A') from dual connect by level <= 10000;

That is, all identical rows, all chars set to 'A' (coded as 0x41 in the WE8ISO8859P1 single-byte charset of my test database). Note that each row is sized 400 bytes and hence, when not compressed, only 10 rows can fit in a 4K block, and only 5 rows would be inserted given the PCTFREE=50 setting.
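For reference, the block dumps quoted below can be produced with something along these lines (the file and block numbers are illustrative; pick them from dba_extents for your own segment):

alter system checkpoint;
alter system dump datafile 4 block 2;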
The block dump of the first block after the segment header shows that we have 2017 bytes free (tosp=0x7e1), as expected. In the row dump we find, in order, first the (lonely) token:

tab 0, row 0, @0xdf3
tl: 405 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [400] {repeat 41, 400 times}
bindmp: 00 8a fa 01 90 {repeat 41, 400 times}

and then 137 highly-compressed rows, all with the same structure, referencing the token:

tab 1, row 0, @0xdee
tl: 5 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [400] {repeat 41, 400 times}
bindmp: 2c 00 01 01 00

Note: check the above mentioned articles by Jonathan for a detailed explanation of the compression format. The bindmp line shows what's actually stored in the block: "2c 00 01" are the flag byte, the lock byte, and the column count; "01" means "1 token follows" and "00" is the pointer to token zero.
Let's update every row to lowercase, i.e. to a string of 400 'a' characters (0x61), thus causing massive row migration:

update t set x = lower(x);
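As a rough cross-check (not part of the original test case): chain_cnt in user_tables counts chained as well as migrated rows, so after the update something like the following gives an upper bound on the number of migrations:

analyze table t compute statistics;

select chain_cnt from user_tables where table_name = 'T';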

After the update, the token is gone from our block; the first 5 rows were decompressed but kept in the block, since they fit in the free space:

tab 1, row 0, @0xc5c
tl: 407 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [400] {repeat 61, 400 times}
bindmp: 2c 02 01 00 fa 01 90 {repeat 61, 400 times}

but, indeed, all the other 132 (= 137 - 5) rows were migrated:

tab 1, row 5, @0x5f7
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x04000148.0
bindmp: 20 02 00 04 00 01 48 00 00

The flag byte "--H-----" means "this is just the row header" (check this great note by Steve Adams) and nrid is the "forwarding address" rowid we spoke about previously.
Now, the interesting part - migrated rows got compressed.
Indeed, walking down the blocks containing the migrated rows, we see
a) token 0, holding 400 bytes set to 0x61 (same as above, not shown)
b) a certain number (usually a dozen) of compressed rows:

tab 1, row 0, @0xce0
tl: 11 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x04000101.5
col  0: [400] {repeat 61, 400 times}
bindmp: 0c 01 01 04 00 01 01 00 05 01 00

Note that the row is migrated: the flag byte "----FL--" means "this is the First and Last row piece, the Header is not here" and hrid is the pointer to the header (i.e. the original row position). Note also that the row is compressed: the bindmp shows the usual row header triplet "0c 01 01", then the hrid "04 00 01 01 00 05", and then the usual "01" meaning "1 token follows" plus "00", the pointer to token zero.
c) one or two uncompressed rows:

tab 1, row 10, @0xae0
tl: 413 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x04000101.f
col  0: [400] {repeat 61, 400 times}
bindmp: 0c 01 01 04 00 01 01 00 0f 00 fa 01 90 {repeat 61, 400 times}

Note the bindmp, and the flag byte telling us, again, that this row is indeed migrated.
In addition, tosp is usually set to 0x907 (2311 bytes), about half of the block, honoring the PCTFREE setting of 50%.
This layout is typical of a series of row insertions into an empty block: the first inserts are simply stored uncompressed, until one that would push the used space across the pctfree threshold triggers a potential compression before being inserted uncompressed. Note the order (already noted by Jonathan Lewis): first the compression is potentially attempted, then the row is stored, hence at least one uncompressed row is always present. There can be more, because the compression is not always attempted, depending, possibly, on some cost consideration by the kernel. Check thr.sql and thr.trc if interested.
uncompressed un-migration
As a side investigation, I have also checked what happens when a row is "un-migrated"; I was expecting that un-migration could trigger a compression, but this does not seem to be the case, at least in my scenario.
I have prepared the first block (check the last part of main.trc) with only 12 migrated rows (leaving a very big free space of tosp=0xde6=3558 bytes), and then I have updated them back to the original uppercase value using "update t set x = upper(x)". These rows, being identical, could comfortably have been un-migrated and compressed all together in the original block (remember that we observed a block with 137 such rows at the start of the main investigation), but that has not happened. Instead, the first 8 rows have indeed been un-migrated but left uncompressed, and the last 4 have simply been left migrated; also, the block free space has dropped to a mere tosp=0x176=374 bytes, thus severely eating into the PCTFREE reserve.
Un-migration is therefore a potential source of table size increase and/or pctfree space consumption, if it behaves this way in all scenarios; I have not checked this extensively, though.

MEDIAN: For When You Don't Really Mean It

Median is a measure of central tendency useful in describing the typical
experience, or the typical case. It's a type of average along with the
mean, but less susceptible to skew from outlying...

Read the full post at


Heretics of Dune

Heretics of Dune is the fifth book in the Dune series by Frank Herbert.

This book picks up the story 1,500 years after the last one ended. The descendants of Siona had scattered throughout the universe, hidden from prescient minds by their unique genetic traits. Now some of them return from the scattering, but for what purpose?

The first book in the series is what drew me in. The next couple of books were not great. The fourth was a lot better. This one continues the upward trend. The intensity builds pretty much from the start all the way through, but the ending is a little weak. The next book starts where this one left off, so I guess that’s the reason for the weak ending in this case.

As with the previous books, there are some fantastic sound bites. When you are reading the books on a Kindle you can see the pages littered with other people’s highlights. I could list hundreds, but here are just a few.


“Quite naturally, holders of power wish to suppress “wild” research. Unrestricted questing after knowledge has a long history of producing unwanted competition.”

“Bureaucracy destroys initiative. There is little that bureaucrats hate more than innovation, especially innovation that produces better results than the old routines. Improvements always make those at the top of the heap look inept. Who enjoys appearing inept?”

“… we only hate what’s really dangerous to us.”



Heretics of Dune was first posted on April 6, 2013 at 3:24 pm.

Exadata and the db_block_checksum parameter.

With Exadata version came the Unbreakable Linux Kernel for Exadata, where it had been the stock EL5 Red Hat kernel (2.6.18) prior to this version. With the unbreakable kernel came the opportunity to run the perf utility. This utility makes it possible to see which functions are active inside an executable, provided there's a symbol table, and the oracle database executable has a symbol table! One reason to do this is to get a more granular overview of what the Oracle database is doing than the wait interface offers, especially a more detailed breakdown of the time the wait interface simply reports as 'on cpu'.

Right after the Exadata upgrade, I ran a simple query (which probably doesn’t reflect any real customer case) to get an idea. I have run ‘select count(*) from bigtable’ on Exadata before, and saw most of the time being CPU, and a minor part of it being ‘cell smart table scan’. Now with perf I have the opportunity to get more detail on where that CPU time is spent!

These are the top 5 functions from that session:

    31.50%     476423   oracle  /u01/app/oracle/product/         [.] sxorchk
    30.20%     456774   oracle  /u01/app/oracle/product/         [.] kdstf00000010010kmP
     7.48%     113083   oracle  [kernel]                                                     [k] __default_send_IPI_dest_field
     6.96%     105301   oracle  /u01/app/oracle/product/         [.] qeaeCn1Serial
     2.94%      44475   oracle  /u01/app/oracle/product/         [.] kcbhvbo

So the top function used during the processing of the SQL is a function called ‘sxorchk’. Sadly, Oracle doesn’t provide information about this function. Anyway, it executes an XOR function in order to check information read from storage. Probably the name stands for System XOR CHecK.

Alright, once we know this we can look into the parameters of the instance I am executing the SQL in, which are taken from the default Exadata parameter template (!!):

SYS@t11203 AS SYSDBA> show parameter db_block

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_buffers		     integer	 0
db_block_checking		     string	 FALSE
db_block_checksum		     string	 TYPICAL
db_block_size			     integer	 8192

So, this should be related to db_block_checksum, since db_block_checking is off/false. Well, since this is a test instance, let’s turn it off and redo the scan:

    49.35%     480911   oracle  /u01/app/oracle/product/         [.] kdstf00000010010kmP
    11.01%     107299   oracle  /u01/app/oracle/product/         [.] qeaeCn1Serial
     6.56%      63885   oracle  [kernel]                                                     [k] __default_send_IPI_dest_field
     3.97%      38712   oracle  /u01/app/oracle/product/         [.] kcbhvbo
     3.49%      33970   oracle  /u01/app/oracle/product/         [.] kdst_fetch

The sxorchk function is gone now! This is also reflected in the response time: the time with db_block_checksum set to typical is 00:01:02.44, and with db_block_checksum set to false it is 00:00:41.68 on my system. That is a difference of roughly 20 seconds or, as we can see from the first perf function list, 31.5% of the time. That is a significant amount of time!
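For completeness, the change between the two runs amounts to a single statement, sketched here (test systems only!):

alter system set db_block_checksum = false;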

When I discussed this with Tuomas Pystynen, he asked me a very valid question: if this is a smartscan, the database does not get blocks, it gets result-sets. So blocks cannot be checked on the database layer. Is this offloaded to the storage/cell server?

Well, if it is offloaded to the storage, it will not be reflected in a function on the database layer. In other words: something is XOR’ed by the database foreground process, and that something is enabled (and can be disabled) with db_block_checksum! The logical conclusion from these facts would be that blocks are involved in some way…

There is a way to know what is actually happening: looking at the backtrace of the sxorchk function! Let’s do that!

    31.50%   oracle  /u01/app/oracle/product/         [.] sxorchk
                |--99.76%-- kcbhxoro
                |          kcbhvbo
                |          |          
                |          |--100.00%-- kcbhvb_corruptcheck
                |          |          kcbl_sage_block_check
                |          |          kcfis_oss_block_verify
                |          |          kcfis_validate_translation
                |          |          kcfis_process_completed_req
                |          |          kcfis_process_completed_buffer
                |          |          kcfis_process_reaped_io
                |          |          kcfis_read
                |          |          kcbl_predpush_get
                |          |          kcbldrget
                |          |          kcbgtcr
                |          |          ktrget3
                |          |          ktrget2
                |          |          kdst_fetch

What we see here is that the function is called from the kcfis (Kernel Cache File Intelligent Storage is my guess) layer, from a function called ‘kcfis_oss_block_verify’; in other words: a block, which is actually the result set sent from the cell server, is validated/checked. That is understandable, but the use of the parameter ‘db_block_checksum’ for controlling this is misleading, to put it in a nice way.

The next question I asked myself is: how about a normal/non-Exadata database? Well, I can mimic a non-Exadata database by setting the parameter ‘cell_offload_processing’ to false to disable smartscans!
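The statement itself is a one-liner; something along these lines (a session-level sketch) disables the smartscan:

alter session set cell_offload_processing = false;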

This is how the top 5 functions look with db_block_checksum set to true, without smartscan:

    20.83%     397620   oracle  /u01/app/oracle/product/         [.] sxorchk
    18.53%     353741   oracle  /u01/app/oracle/product/         [.] kdstf00000010000kmP
    10.05%     191847   oracle  [kernel]                                                     [k] __default_send_IPI_dest_field
     5.35%     102161   oracle  /u01/app/oracle/product/         [.] qeaeCn1Serial
     2.73%      52103   oracle  /u01/app/oracle/product/         [.] kcbhvbo

And this is how the top 5 functions look with db_block_checksum set to false, without smartscan:

    36.51%     706798   oracle  /u01/app/oracle/product/         [.] kdstf00000010000kmP
    10.47%     202645   oracle  [kernel]                                                     [k] __default_send_IPI_dest_field
     5.58%     107941   oracle  /u01/app/oracle/product/         [.] qeaeCn1Serial
     3.57%      69044   oracle  /u01/app/oracle/product/         [.] kcbhvbo
     2.38%      46036   oracle  /u01/app/oracle/product/         [.] ktrgcm

If we get the backtrace of the sxorchk function without smartscan enabled:

    20.83%     397620   oracle  /u01/app/oracle/product/         [.] sxorchk
                |--99.79%-- kcbhxoro
                |          kcbhvbo
                |          kcbzvb
                |          kcbldrget
                |          kcbgtcr
                |          ktrget3
                |          ktrget2
                |          kdst_fetch

We can see the sxorchk function is called from kcbldrget (the direct path load function), with more or less the same helper functions preparing for the XOR function, and no other layers (like kcfis in the backtrace with smartscan enabled).

My conclusion is: db_block_checksum is a parameter which enables checking of the data received from “storage”, which apparently is done via an XOR function. This checking is done for blocks on a regular/non-Exadata system, as the parameter name suggests, but on Exadata with a smartscan the checking is still done, regardless of the fact that no blocks but result sets are received from “storage”.

The checking takes a significant portion of time during the processing of my (very simple!) query. Other functions can probably make the overall processing more CPU intensive, which means the relative portion of time spent on checking gets smaller.

With Exadata/smartscan the relative amount of time spent on sxorchk with my simple SQL is 32%; without smartscan the relative amount of time drops to 21%. This is still a significant amount of time (and in both cases the function in which the most time is spent!).

The final conclusion is that you should think about the setting of this parameter if you are doing a lot of physical IO, and set it according to the needs of the database.

One addition: this was done on a V2 half rack version of Exadata (Nehalem CPU), so anyone with an X2-[28] or X3-[28] is welcome to profile a ‘select count(*) from bigtable’ and post it here to see if there’s any difference. The database version is BP15.

Tagged: oracle exadata performance xor perf profiling

Switching Back to Regular Listener Log Format

Did you ever miss the older listener log file format and want to turn off the ADR-style log introduced in 11g? Well, it's really very simple.


Oracle introduced the Automatic Diagnostic Repository (ADR) with Oracle 11g Release 1. This brought some streamlining of the various log and trace files generated by different Oracle components such as the database, listener, ASM, etc. This is why you no longer find the alert log in the usual location specified by the familiar background_dump_dest initialization parameter, but in a directory under the ADR base (set by a different parameter, diagnostic_dest). Similarly, listener logs now go in this format:


Remember, this is in XML format, not the usual listener.log. The idea was to present the information in the listener log in a consistent, machine-readable format instead of the usually cryptic, inconsistent older format. Here is an example of the new format:

<msg type='UNKNOWN' level='16' host_id='oradba2'
 host_addr='' version='1'>
 <txt>31-MAR-2013 13:17:22 * service_update * D112D2 * 0
 </txt>
</msg>
<msg time='2013-03-31T13:17:25.317-04:00' org_id='oracle' comp_id='tnslsnr'
 type='UNKNOWN' level='16' host_id='oradba2'>
 <txt>WARNING: Subscription for node down event still pending
 </txt>
</msg>

Being in XML format, the files can now be read unambiguously by many tools, since the data is enclosed within meaningful tags. Additionally, the XML-format listener log files are now rotated: after reaching a certain threshold the file is renamed to log_1.xml and a new log.xml is created, somewhat akin to the archived log concept for redo log files.
While the new format proved useful for new tools, there were also myriad existing tools that read the older log format perfectly. So Oracle didn't stop the practice of writing to the old format log. The old format log is still called listener.log, but the directory it is created in is different: $ADR_BASE/tnslsnr/Hostname/listener/trace. Unfortunately there is no archiving scheme for this file, so it simply keeps growing.
In the pre-11g days you could temporarily redirect the log to a different location and archive the old one by setting the following parameter in listener.ora:

log_directory = tempLocation

However, in Oracle 11g R1 and beyond, this will not work; you can't set the location of the log_directory.


So, what's the solution? Simple. Just set the following parameter in listener.ora:

diag_adr_enabled_listener = off

This will disable the ADR style logging for the listener. Now, suppose you want to set the directory to /tmp and log file name to listener_0405.log, add the following into listener.ora (assuming the name of the listener is "listener"; otherwise make the necessary change below):

log_file_listener = listener_0405.log
log_directory_listener = /tmp
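So the complete set of additions to listener.ora (for a listener named "listener"; adjust the suffix for other names) is just these three lines together:

diag_adr_enabled_listener = off
log_file_listener = listener_0405.log
log_directory_listener = /tmp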

That's it. The ADR-style logging will be permanently gone and you will be reunited with your sorely missed pre-11g style logging. You can confirm it:

LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Alias                     listener
Version                   TNSLSNR for Linux: Version - Production
Start Date                26-NOV-2012 16:50:58
Uptime                    129 days 15 hr. 33 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/11.2.0/grid/network/admin/listener.ora
Listener Log File         /tmp/listener_0405.log
Listening Endpoints Summary...

Services Summary...

Service "+ASM" has 1 instance(s).

... output truncated ...

Happy logging.

P.S. By the way, you can also change the values by issuing set commands from the LSNRCTL command prompt:

LSNRCTL> set log_directory '/tmp'

However, if you have heeded my earlier advice, you might have set admin_restrictions to ON, in which case you can't use the set command. Instead, you would put the value in listener.ora and reload the listener for the desired effect.

Delphix Reminder

For anyone who was planning to join the webinar I’m doing with Kyle and a couple of the Delphix team – this is a transient blog post to remind you that it starts in a few minutes.

Free Webinar

It's webinar time again.

Join me on Wednesday, May 8th, for an overview session on the specifics of Oracle Parallel Execution.

The session starts at 16:00 UK (17:00 Central European) time. The webinar is totally free and the recording will be made available afterwards.

Here's the link to the official landing page where you can register and below is the official abstract:


Oracle Parallel Execution, a feature of the Enterprise Edition, allows you to automatically distribute the processing of a SQL statement execution among multiple worker processes. This requires additional effort when analysing and troubleshooting such parallel executions, since it adds complexity that is simply not there with normal serial execution where only a single process is involved.

In this webinar I'll provide an overview of what these additional challenges are and how these can be approached. A live Q&A session will follow the presentation.

NoSQL and Oracle, Sex and Marriage

At last week’s Dallas Oracle Users Group meeting, an Oracle DBA asked me, “With all the new database alternatives out there today, like all these open source NoSQL databases, would you recommend for us to learn some of those?”

I told him I had a theory about how these got so popular and that I wanted to share that before I answered his question.

My theory is this. Developers perceive Oracle as being too costly, time-consuming, and complex:

  • An Oracle Database costs a lot. If you don’t already have an Oracle license, it’s going to take time and money to get one. On the other hand, you can just install Mongo DB today.
  • Even if you have an Oracle site-wide license, the Oracle software is probably centrally controlled. To get an installation done, you’re probably going to have to negotiate, justify, write a proposal, fill out forms, you know, supplicate yourself to—er, I mean negotiate with—your internal IT guys to get an Oracle Database installed. It’s a lot easier to just install MySQL yourself.
  • Oracle is too complicated. Even if you have a site license and someone who’s happy to install it for you, it’s so big and complicated and proprietary... The only way to run an Oracle Database is with SQL (a declarative language that is alien to many developers) executed through a thick, proprietary, possibly even difficult-to-install layer of software like Oracle Enterprise Manager, Oracle SQL Developer, or sqlplus. Isn’t there an open source database out there that you could just manage from your operating system command line?

When a developer is thinking about installing a database today because he needs one to write his next feature, he wants something cheap, quick, and lightweight. None of those constraints really sounds like Oracle, does it?

So your Java developers install this NoSQL thing, because it’s easy, and then they write a bunch of application code on top of it. Maybe so much code that there’s no affordable way to turn back. Eventually, though, someone will accidentally crash a machine in the middle of something, and there’ll be a whole bunch of partway finished jobs that die. Out of all the rows that are supposed to be in the database, some will be there and some won’t, and so now your company will have to figure out how to delete the parts of those jobs that aren’t supposed to be there.

Because now everyone understands that this kind of thing will probably happen again, too, the exercise may well turn into a feature specification for various “eraser” functions for the application, which (I hope, anyway) will eventually lead to the team discovering the technical term transaction. A transaction is a unit of work that must be atomic, consistent, isolated, and durable (that’s where the acronym ACID comes from). If your database doesn’t guarantee that every arbitrarily complex unit of work (every transaction) makes it either 100% into the database or not at all, then your developers have to write that feature themselves. That’s a big, tremendously complex feature. On an Oracle Database, the transaction is a fundamental right given automatically to every user on the system.
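The atomicity point can be sketched in a couple of lines of SQL (using a hypothetical accounts table): both updates become permanent together at the commit or, after a failure or rollback, neither survives.

update accounts set balance = balance - 100 where id = 1;
update accounts set balance = balance + 100 where id = 2;
commit;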

Let’s look at just that ‘I’ in ACID for a moment: isolation. How big a deal is transaction isolation? Imagine that your system has a query that runs from 1 pm to 2 pm. Imagine that it prints results to paper as it runs. Now suppose that at 1:30 pm, some user on the system updates two rows in your query’s base table: the table’s first row and its last row. At 1:30, the pre-update version of that first row has already been printed to paper (that happened shortly after 1 pm). The question is, what’s supposed to happen at 2 pm when it comes time to print the information for the final row? You should hope for the old value of that final row—the value as of 1 pm—to print out; otherwise, your report details won’t add up to your report totals. However, if your database doesn’t handle that transaction isolation feature for you automatically, then either you’ll have to lock the table when you run the report (creating a 30-minute-long performance problem for the person wanting to update the table at 1:30), or your query will have to make a snapshot of the table at 1 pm, which is going to require both a lot of extra code and that same lock I just described. On an Oracle Database, high-performance, non-locking read consistency is a standard feature.

And what about backups? Backups are intimately related to the read consistency problem, because backups are just really long queries that get persisted to some secondary storage device. Are you going to quiesce your whole database—freeze the whole system—for whatever duration is required to take a cold backup? That’s the simplest sounding approach, but if you’re going to try to run an actual business with this system, then shutting it down every day—taking down time—to back it up is a real operational problem. Anything fancier (for example, rolling downtime, quiescing parts of your database but not the whole thing) will add cost, time, and complexity. On an Oracle Database, high-performance online “hot” backups are a standard feature.
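Oracle’s RMAN hot backups are far beyond a short sketch, but the idea of an online backup—copying a consistent image of the database while it stays open for business—can be shown with SQLite’s backup API (the `orders` table is invented for illustration):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
src.executemany("INSERT INTO orders (total) VALUES (?)",
                [(10,), (20,), (30,)])
src.commit()

# Take the backup while the source stays open for business: no shutdown,
# no quiescing. The API copies a consistent snapshot, page by page.
dest = sqlite3.connect(":memory:")
src.backup(dest)

copied = dest.execute("SELECT COUNT(*), SUM(total) FROM orders").fetchone()
print(copied)  # (3, 60)
```

The backup is a complete, consistent copy even though the source database was never frozen—the “really long query persisted to secondary storage” idea from the paragraph above, handled by the database rather than by downtime.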

The thing is, your developers could write code to do transactions (read consistency and all) and incremental (“hot”) backups. Of course they could. Oh, and constraints, and triggers (don’t forget to remind them to handle the mutating table problem), and automatic query optimization, and more, ...but to write those features Really Really Well™, it would take them 30 years and a hundred of their smartest friends to help write it, test it, and fund it. Maybe that’s an exaggeration. Maybe it would take them only a couple years. But Oracle has already done all that for you, and they offer it at a cost that doesn’t seem as high once you understand what all is in there. (And of course, if you buy it on May 31, they’ll cut you a break.)

So I looked at the guy who asked me the question, and I told him, it’s kind of like getting married. When you think about getting married, you’re probably focused mostly on the sex. You’re probably not spending too much time thinking, “Oh, baby, this is the woman I want to be doing family budgets with in fifteen years.” But you need to be. You need to be thinking about the boring stuff like transactions and read consistency and backups and constraints and triggers and automatic query optimization when you select the database you’re going to marry.

Of course, my 15-year-old son was in the room when I said this. I think he probably took it the right way.

So my answer to the original question—“Should I learn some of these other technologies?”—is “Yes, absolutely,” for at least three reasons:

  • Maybe some development group down the hall is thinking of installing Mongo DB this week so they can get their next set of features implemented. If you know something about both Mongo DB and Oracle, you can help that development group and your managers make better informed decisions about that choice. Maybe Mongo DB is all they need. Maybe it’s not. You can help.
  • You’re going to learn a lot more than you expect when you learn another database technology, just like learning another natural language (like English, Spanish, etc.) teaches you things you didn’t expect to learn about your native language.
  • Finally, I encourage you to diversify your knowledge, if for no other reason than your own self-confidence. What if market factors conspire in such a manner that you find yourself competing for an Oracle-unrelated job? A track record of having learned at least two database technologies is proof to yourself that you’re not going to have that much of a problem learning your third.