Search

Top 60 Oracle Blogs

Recent comments

Index

RDBMS (vs. NoSQL) scales the algorithm before the hardware

By Franck Pachot

.
In The myth of NoSQL (vs. RDBMS) “joins dont scale” I explained that joins actually scale very well with an O(logN) on the input tables size, thanks to B*Tree index access, and can even be bounded by hash partitioning with local index, like in DynamoDB single-table design. Jonathan Lewis added a comment that, given the name of the tables (USERS and ORDERS). we should expect an increasing number of rows returned by the join.

In this post I’ll focus on this: how does it scale when index lookup has to read more and more rows. I’ll still use DynamoDB for the NoSQL example, and this time I’ll do the same in Oracle for the RDBMS example.

Oracle 12c – pre-built join index

By Franck Pachot

.
This post is part of a series of small examples of recent features. I’m running this in the Oracle 20c preview in the Oracle Cloud. I have created a few tables in the previous post with a mini-snowflake scheme: a fact table CASES with the covid-19 cases per country and day. And a dimension hierarchy for the country with COUNTRIES and CONTINENTS tables.

DynamoDB: adding a Global covering index to reduce the cost

By Franck Pachot

.
People often think of indexes as a way to optimize row filtering (“get item” faster and cheaper). But indexes are also about columns (“attribute projection”) like some kind of vertical partitioning. In relational (“SQL”) databases we often add more columns to the indexed key. This is called “covering” or “including” indexes, to avoid reading the whole row. The same is true in NoSQL. I’ll show in this post how, even when an index is not required to filter the items, because the primary key partitioning is sufficient, we may have to create a secondary index to reduce the cost of partial access to the item. Here is an example with AWS DynamoDB where the cost depends on I/O throughput.

Indexing Null Values - Part 2

In the previous post I've demonstrated that Oracle has some problems to make efficient use of B*Tree indexes if an IS NULL condition is followed by IN / OR predicates also covered by the same index - the predicates following are not used to navigate the index structure efficiently but are applied as filters on all index entries identified by the IS NULL.

In this part I'll show what results I got when repeating the same exercise using Bitmap indexes - after all they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same but very similar) and an index on the single expression that gets searched for via IS NULL - results are again from 18.3.0:

Indexing Null Values - Part 1

Indexing null values in Oracle is something that has been written about a lot in the past already. Nowadays it should be common knowledge that Oracle B*Tree indexes don't index entries that are entirely null, but it's possible to include null values in B*Tree indexes when combining them with something guaranteed to be non-null, be it another column or simply a constant expression.

Jonathan Lewis not too long ago published a note that showed an oddity when dealing with IS NULL predicates that in the end turned out not to be a real threat and looked more like an oddity how Oracle displays the access and filter predicates when accessing an index and using IS NULL together with other predicates following after.

Index Only access with Oracle, MySQL, PostgreSQL, and Microsoft SQL Server

In my previous post about the advantages of index access over full table scans, I mentioned covering indexes. This is when an Index Range Scan can retrieve all columns without going to the table. Adding to an index all the columns used by the SELECT or WHERE clause is an important tuning technique for queries that are around the inflection point between index access and table full scan. But not all RDBMS are equal. What PostgreSQL calls ‘Index Only’ actually reads the table, except for static data with no concurrent modifications.

I’ll show the execution plans for this Index Only access on Oracle, MySQL, PostgreSQL, and MS SQLServer. As my skills on the non-Oracle ones are very limited, do not hesitate to comment if you think something is not correct.

We still need partitions and indexes, and I will continue to commute by train even when…

We still need indexes, and I will continue to commute by train even when self-driving cars become reality…

When the Autonomous Data Warehouse Cloud Service had been announced, Oracle came with this surprising idea that we do not need to create Indexes, Partitions and Materialized views for our analytic queries. It was even blocked in ADW and recently released but not recommended. Automatic indexing is for ATP and the message for ADW is: you don’t need indexes for your analytic queries.

In my opinion, and even with the best performance in non-index access, we will still need index range scans. And even when it is not the fastest access path. Because the fastest response time is not the first performance criteria for the end-user.

Full Table Scan vs. Index access

Here is a diagram I used to explain the cost of Full Table Scan vs. Index Access, depending on the number of rows to return.

Oracle Index compression for range scan on file names

A new blog post on the Databases at CERN blog about tables storing long file names in a table, with full path, and index range scan on a prefixed pattern: https://db-blog.web.cern.ch/blog/franck-pachot/2018-11-oracle-index-compression-range-scan-file-names

COMPRESS ADVANCED LOW

The 12cR1 advanced index compression does not help here as all values are unique. Only partial prefix is redundant.

COMPRESS ADVANCED HIGH

The advanced algorithm ‘high’ introduced in 12cR2 can reduce better. But there’s no magic. Redundancy should be addressed at design. Full test:

Column And Table Redefinition With Minimal Locking

TLDR: Note to future self… (1) Read this before you modify a table on a live PostgreSQL database. If you do it wrong then your app might totally hang. There is a right way to do it which avoids that. (2) Especially remember the lock_timeout step. Many blog posts around the ‘net are missing this and it’s very important.

Recently I was chatting with some PostgreSQL users (who, BTW, were doing rather large-scale cool stuff in PG) and they asked a question about making schema changes with minimal impact to the running application. They were specifically curious about changing a primary key from INT to BIGINT.  (Oh, you are making all your new PK fields BIGINT right?)