Importing geo-partitioned data… the easy way


Setting the stage

I started at Cockroach Labs back in June 2019 to help others learn how to architect and develop applications using a geo-distributed database.  There has been a resurgence in distributed database technology, but the focus on geo-distribution is quite distinctive to CockroachDB.  While the underlying technology is new, developers and DBAs who come with a wealth of experience need to know how to make the best use of it.  Given this situation, I thought it would be good to start a blog series exploring various topics that face anyone beginning to architect database solutions with CockroachDB.

To start using a database, the first step is to IMPORT table data so you can begin to see how the database performs and responds.  And thus the IMPORT series has started!
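As a taste of what the series will cover, a basic IMPORT in CockroachDB can be as simple as pointing the statement at a CSV file; the table definition and file URL below are hypothetical placeholders:

```sql
-- Hypothetical table and source file: a minimal CockroachDB IMPORT sketch
IMPORT TABLE users (
    id   UUID PRIMARY KEY,
    city STRING,
    name STRING
) CSV DATA ('https://example.com/exports/users.csv');
```

Geo-partitioned tables layer partitioning clauses on top of a definition like this, which is where the interesting questions start.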

DB links

A recent thread on the ODC SQL & PL/SQL forum raised the question of how data moves across a database link – is it row by row or array based (or other)? It’s a good question as it’s fairly common knowledge that distributed queries can be very much slower than the same query when executed on an equivalent set of local tables, so it’s worth having a little poke around to see if there’s anything in the way the data moves across the link that could be tweaked to improve performance.
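For anyone who hasn't played with links recently, here's a sketch of the setup under discussion; the link name, credentials, and TNS alias are all hypothetical:

```sql
-- Hypothetical link: credentials and TNS alias are placeholders
create database link remote_db
connect to scott identified by tiger
using 'remote_tns_alias';

-- A distributed query pulling rows across the link
select t.id, t.padding
from   t1@remote_db t
where  t.id <= 100;
```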

20 Indexes

If your system has to run a lot of distributed queries there's a limit on indexes that might affect performance: when deriving an execution plan for a distributed query the optimizer will consider a maximum of twenty indexes on each remote table. If you have any tables with a ridiculous number of indexes (various 3rd party accounting and CRM systems spring to mind), and if you drop and recreate indexes on those tables in the wrong order, then execution plans may change for the simple reason that the optimizer is considering a different subset of the available indexes.
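A quick sanity check for tables that might fall foul of the limit is to count indexes per table as seen from the local site; the link name remote_db is a hypothetical placeholder:

```sql
-- Find remote tables with more than twenty indexes
select table_name, count(*) index_count
from   all_indexes@remote_db
group  by table_name
having count(*) > 20
order  by count(*) desc;
```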

FBI Limitation

A recent question on the ODC (OTN) database forum prompted me to point out that the optimizer doesn’t consider function-based indexes on remote tables in distributed joins. I then spent 20 minutes trying to find the blog note where I had demonstrated this effect, or an entry in the manuals reporting the limitation – but I couldn’t find anything, so I’ve written a quick demo which I’ve run to show the effect. First, the SQL to create a couple of tables and a couple of indexes:
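The original SQL isn’t reproduced in this extract, but a sketch of the kind of setup involved might look like this (all names invented for illustration):

```sql
-- Table with a function-based index
create table t1
as
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from    all_objects
where   rownum <= 10000
;

-- The function-based index the optimizer will not consider
-- when t1 is referenced over a database link
create index t1_fbi on t1(upper(v1));
```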

Distributed Trap

Here’s an interesting (and potentially very useful) observation from an OTN database forum thread that appeared at the end of last week. It concerns the problems of pulling data from remote systems, and I’ll start by building some data:
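The data-creation SQL isn’t included in this extract; a representative sketch (names and volumes invented) of the sort of starting point would be:

```sql
-- Hypothetical starting data set
create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        rpad('x',100)           padding
from    dual
connect by
        level <= 10000
;
```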

Uniquely parallel

Here’s a surprising (to me) execution plan – parallel execution to find one row in a table using a unique scan of a unique index – produced by running the following script (data creation SQL to follow):

Distributed Sets

In an earlier post I’ve described how a distributed query can operate at a remote site if it’s a simple select but has to operate at the local site if it’s a CTAS (create table as select) or insert as select. There’s (at least) one special case where this turns out to be untrue … provided you write the query in the correct fashion. I discovered this only as a result of doing a few experiments in response to a question on the OTN database forum.
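To frame the demonstration that follows: the shape of statement under discussion is an insert as select whose source is a set operation between two tables at the same remote site, along the lines of the following (table and link names hypothetical):

```sql
-- Insert as select driven by a set operation
-- between two tables at the same remote site
insert into local_t
select id, v1 from t1@remote_db
minus
select id, v1 from t2@remote_db
;
```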

Here’s a little demonstration, cut-n-pasted with a little cosmetic editing from an 11gR1 SQL*Plus session:

Distributed Queries – 3

A comment I’ve made many times in the past about distributed queries is that Oracle doesn’t try to retrieve histogram information from remote databases when optimizing a query. Checking back through previous posts, though, I don’t think I’ve ever mentioned it on the blog – so here’s an example demonstrating the point.

I’ll start with some fixed system stats (as they could make a difference to repeatability), and the following data set.
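Here’s a sketch of the kind of skewed data set (names invented) where a histogram makes all the difference to the local optimizer – and is invisible when the table sits at the far end of a database link:

```sql
-- Heavily skewed column: 9,900 rows share one value
create table t1
as
select
        case when rownum <= 9900 then 1 else rownum end  skewed_col,
        rpad('x',100)                                    padding
from    all_objects
where   rownum <= 10000
;

-- Gather stats with a histogram on every column
begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 254'
        );
end;
/
```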

Distributed Queries – 2

I have often said that the optimizer “forgets” that it is dealing with a distributed query once it has collected the stats that it can about the objects in the query, and that as a consequence the driving site for a distributed query will be the local database unless you use the /*+ driving_site */ hint to change it.
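For reference, the hint goes in the select list and names the alias of a table at the site you want to drive the query; all the object names here are hypothetical:

```sql
-- Force the remote site holding t2 to become the driving site
select  /*+ driving_site(t2) */
        t1.small_vc,
        t2.padding
from    local_t1        t1,
        t2@remote_db    t2
where   t2.id = t1.id
;
```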

While investigating an oddity with a distributed query between two databases a few days ago, I noticed something in the 10053 trace file that made me change my mind and go back to look at earlier versions of Oracle.

Here are two sections extracted from a 10053 trace file produced with CPU costing (system statistics) enabled:

  Table: T1  Alias: AWAY
    Card: Original: 3240  Rounded: 41  Computed: 40.50  Non Adjusted: 40.50
  Access Path: TableScan
    Cost:  53.22  Resp: 53.22  Degree: 0
      Cost_io: 53.00  Cost_cpu: 2073815
      Resp_io: 53.00  Resp_cpu: 2073815
  Access Path: index (AllEqRange)
    Index: 0
    resc_io: 4.00  resc_cpu: 29536
    ix_sel: 0.0125  ix_sel_with_filters: 0.0125
    Cost: 4.00  Resp: 4.00  Degree: 1
  Remote table cost added, new values: cost 4.00  resc 4.00  resp .2f  Best:: AccessPath: IndexRange  Index: 0        <<===
         Cost: 4.00  Degree: 1  Resp: 4.00  Card: 40.50  Bytes: 0


  HA cost: 50.54
     resc: 50.54 resc_io: 50.00 resc_cpu: 5187262
     resp: 50.54 resp_io: 50.00 resp_cpu: 5187262
  Cost adjustment for NL join with remote table: 0.72             <<===
Join order aborted: cost > best plan cost

Note the two lines with the reference to “remote” (I’d highlight them properly, but you can’t do highlighting and code in the same text). Notice, also, that one of the programmers made a bit of a mistake with their printf() call in the first of those lines – the literal “.2f” appears where a formatted resp value should be – a bug that is still present.

Clearly Oracle is doing some arithmetic relating to the costs of accessing distributed data, at least from this version onwards (there was nothing similar in the equivalent trace file from the earlier release, and I don’t have a 10.1 available for testing). Unfortunately I have yet to see a single distributed execution plan where it does the right thing – but that might be a problem related to histograms (and the failure to use them) rather than a defect in the algorithms for distributed costing.

I’ll have to spend some time looking at what it does before I can write any more about it – but given the number of times I’ve said the optimizer doesn’t do any arithmetic I thought it was important to point out that I was wrong as soon as I discovered the change.

Footnote: I have added a category “distributed” to my list of categories – and added a link to it at the bottom of every article I’ve written about distributed SQL. That’s a pattern that I may copy across other articles in the future – especially if I can find out how to order the articles by date (ascending).

[Further reading on distributed databases]