Amazon DynamoDB: a r(el)ational Glossary

By Franck Pachot

There are many NoSQL databases. And, because SQL is an ISO standard, “No SQL” also means “No Standard”. Many have a similar API and similar objects, but with completely different names. Today, NoSQL databases are used as an additional datastore for some well-defined use cases for which a hashed key-value store fits better than a relational table. And “No SQL” quickly became “Not Only SQL”, as it is complementary to RDBMS databases using SQL. But originally, the idea was to replace RDBMS databases, rejecting the SQL API, and then inventing a “No SQL” data store. When you want to replace something rather than propose something new, you often adopt the same language to make it look similar. And this is why, in my opinion, we find some relational database terms like “Table” and “Index”. But they have a different meaning. Here is a dictionary where I try to explain the DynamoDB artifacts and differentiate them from their relational and SQL meanings.

Amazon DynamoDB: the cost of indexes

By Franck Pachot

This is common to any data structure, whether RDBMS or NoSQL: indexes accelerate reads but slow down writes. This post explains the consequences of adding indexes in DynamoDB.
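To make the trade-off concrete, here is a minimal Python sketch of the write-capacity arithmetic, assuming DynamoDB's provisioned model as I understand it: a standard write consumes one write capacity unit (WCU) per 1 KB of item size, rounded up, and each global secondary index (GSI) that receives the item consumes its own WCUs for the projected index entry. When an update changes an indexed key attribute, the index needs two writes (delete the old entry, put the new one). The function names and sizes are hypothetical, and the model ignores local secondary indexes, transactions, and items that lack the index key attributes.

```python
import math

def wcu(size_bytes: int) -> int:
    """WCUs for one standard write: 1 WCU per 1 KB (1024 bytes), rounded up."""
    return max(1, math.ceil(size_bytes / 1024))

def put_cost(item_bytes: int, gsi_entry_bytes: list[int]) -> int:
    """WCUs to insert a new item that carries the key of every listed GSI:
    one write to the table plus one write per index entry (assumed model)."""
    return wcu(item_bytes) + sum(wcu(b) for b in gsi_entry_bytes)

def key_update_cost(item_bytes: int, gsi_entry_bytes: list[int]) -> int:
    """WCUs for an update that changes the key attribute of every listed GSI:
    each index deletes the old entry and puts the new one.
    Old and new sizes are assumed equal to keep the sketch simple."""
    return wcu(item_bytes) + sum(2 * wcu(b) for b in gsi_entry_bytes)

print(put_cost(3000, []))                 # no index: 3
print(put_cost(3000, [200, 200]))         # two GSIs: 5
print(key_update_cost(3000, [200, 200]))  # both index keys changed: 7
```

With these assumed sizes, two small indexes raise the cost of an insert from 3 to 5 WCUs, and an update that moves both index keys costs 7.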

RDBMS (vs. NoSQL) scales the algorithm before the hardware

By Franck Pachot

In The myth of NoSQL (vs. RDBMS) “joins don’t scale”, I explained that joins actually scale very well, in O(log N) of the input table size, thanks to B*Tree index access, and can even be bounded by hash partitioning with a local index, as in DynamoDB single-table design. Jonathan Lewis added a comment that, given the names of the tables (USERS and ORDERS), we should expect an increasing number of rows returned by the join.

In this post, I’ll focus on this question: how does it scale when the index lookup has to read more and more rows? I’ll still use DynamoDB for the NoSQL example, and this time I’ll do the same in Oracle for the RDBMS example.
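A rough way to see both effects is to count block reads. The Python sketch below is a cost model for intuition only, not how Oracle or DynamoDB account for I/O. It assumes a B*Tree where each block holds about fanout entries (200 here, an arbitrary value) and that returned rows are packed rows_per_block to a block (another assumption). The descent costs one read per level, which grows with the logarithm of the table size, while reading the result grows linearly with the number of rows returned.

```python
import math

def btree_height(num_rows: int, fanout: int = 200) -> int:
    """Smallest height such that fanout**height >= num_rows (modeled B*Tree)."""
    height, capacity = 1, fanout
    while capacity < num_rows:
        capacity *= fanout
        height += 1
    return height

def range_scan_reads(num_rows: int, rows_returned: int,
                     rows_per_block: int = 100, fanout: int = 200) -> int:
    """Block reads: one per level to find the start of the range, then the
    blocks holding the returned rows (assumed packed rows_per_block per block)."""
    return btree_height(num_rows, fanout) + math.ceil(rows_returned / rows_per_block)

print(range_scan_reads(1_000_000, 10))       # 4
print(range_scan_reads(1_000_000_000, 10))   # 5: 1000x more rows, one more level
print(range_scan_reads(1_000_000, 10_000))   # 103: grows with rows returned
```

Growing the table a thousandfold adds one level, while growing the result a thousandfold multiplies the work: the join itself scales, but the volume it returns does not come for free.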

Oracle 12c – pre-built join index

By Franck Pachot

This post is part of a series of small examples of recent features. I’m running this in the Oracle 20c preview in the Oracle Cloud. In the previous post, I created a few tables in a mini snowflake schema: a fact table CASES with the COVID-19 cases per country and day, and a dimension hierarchy for the country with the COUNTRIES and CONTINENTS tables.

DynamoDB: adding a Global covering index to reduce the cost

By Franck Pachot

People often think of indexes as a way to optimize row filtering (making “get item” faster and cheaper). But indexes are also about columns (“attribute projection”), like a kind of vertical partitioning. In relational (“SQL”) databases, we often add more columns to the index key to avoid reading the whole row: these are called “covering” or “including” indexes. The same is true in NoSQL. In this post, I’ll show how, even when an index is not required to filter the items because the primary key partitioning is sufficient, we may have to create a secondary index to reduce the cost of partial access to the item. Here is an example with AWS DynamoDB, where the cost depends on I/O throughput.
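The arithmetic behind this can be sketched in Python, assuming DynamoDB's documented read accounting for a Query: the sizes of all returned items (or, for an index, of the projected index entries) are summed and rounded up to the next 4 KB. One read capacity unit (RCU) covers 4 KB for a strongly consistent read, and an eventually consistent read costs half. Queries on a global secondary index are eventually consistent only. The item sizes (10 KB items, 200-byte projections) are made-up values for illustration.

```python
import math

def query_rcu(item_sizes: list[int], strongly_consistent: bool = False) -> float:
    """RCUs for one Query: total size of returned items, rounded up to 4 KB."""
    units = math.ceil(sum(item_sizes) / 4096)  # 4 KB = 4096 bytes per unit
    # An eventually consistent read costs half of a strongly consistent one
    return float(units) if strongly_consistent else units / 2

# Query on the table: 100 full items of 10 KB each
table_cost = query_rcu([10 * 1024] * 100, strongly_consistent=False)
# Query on a GSI projecting only the needed attributes (~200 bytes per item);
# GSI reads are always eventually consistent
gsi_cost = query_rcu([200] * 100, strongly_consistent=False)
print(table_cost, gsi_cost)  # prints 125.0 2.5
```

Under these assumptions, the covering GSI reads the same 100 items for 2.5 RCUs instead of 125, even though the table's own key was enough to find them.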

Indexing Null Values - Part 2

In the previous post, I demonstrated that Oracle has some problems making efficient use of B*Tree indexes when an IS NULL condition is followed by IN / OR predicates also covered by the same index: the following predicates are not used to navigate the index structure efficiently, but are applied as filters to all index entries identified by the IS NULL.
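The difference can be illustrated with a toy model in Python, which is not how Oracle is implemented. A composite index on two hypothetical columns (a, b) is modeled as a sorted list, with NULL sorting after all non-null values as in Oracle B*Tree indexes. When only a IS NULL is used for access, every entry with a NULL a is read and b IN (...) is applied as a filter. When both predicates are used for access, there is one descent per IN-list value and only matching entries are read. Column names, data, and the "entries read" metric are assumptions for illustration.

```python
import math
from bisect import bisect_left, bisect_right

NULL_KEY = (1, 0)  # NULL sorts after every non-null value, as in Oracle indexes

def key_of(v):
    """Comparable sort key for a nullable column value."""
    return NULL_KEY if v is None else (0, v)

def build_index(rows):
    """Composite index on (a, b) as a sorted list of (key_of(a), b) entries.
    Column b is assumed non-null, so every row gets an entry."""
    return sorted((key_of(a), b) for a, b in rows)

def filter_scan(index, in_list):
    """Access by 'a IS NULL' only, then filter on 'b IN (...)'.
    Returns (matching entries, entries read)."""
    lo = bisect_left(index, (NULL_KEY,))
    hi = bisect_right(index, (NULL_KEY, math.inf))
    matches = sum(1 for _, b in index[lo:hi] if b in in_list)
    return matches, hi - lo

def inlist_probes(index, in_list):
    """Access by both predicates: one descent per IN-list value,
    reading only the entries with that exact (NULL, b) key."""
    matches = read = 0
    for v in in_list:
        lo = bisect_left(index, (NULL_KEY, v))
        hi = bisect_right(index, (NULL_KEY, v))
        matches += hi - lo
        read += hi - lo
    return matches, read

rows = [(None if i % 2 == 0 else i, i % 1000) for i in range(10_000)]
index = build_index(rows)
print(filter_scan(index, {2, 4, 6}))    # (30, 5000)
print(inlist_probes(index, {2, 4, 6}))  # (30, 30)
```

Both approaches find the same 30 rows, but in this model the filter approach reads all 5,000 entries with a NULL a, while the per-value probes read only the 30 that match.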

In this part, I'll show what results I got when repeating the same exercise using Bitmap indexes: after all, they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same, but very similar) and an index on the single expression that gets searched for via IS NULL. The results are again from 18.3.0.
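The reason no trick is needed can be sketched with a toy bitmap index in Python: one bitmap per distinct value, NULL included, with an arbitrary-precision int standing in for the bit set. An IS NULL search is then just the NULL bitmap, and an IN-list combines bitmaps with OR before an AND with other predicates. This is a simplified model (real Oracle bitmap indexes store compressed bitmaps per rowid range), and the columns a and b are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value, including None (NULL), to a bit set of row positions."""
    bitmaps = defaultdict(int)
    for row, v in enumerate(values):
        bitmaps[v] |= 1 << row  # set the bit for this row in the value's bitmap
    return dict(bitmaps)

def rows_of(bitmap):
    """List the row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

a = [None, 1, None, 3, None]   # hypothetical nullable column
b = [7, 7, 8, 7, 9]            # hypothetical second column
idx_a = build_bitmap_index(a)
idx_b = build_bitmap_index(b)

a_is_null = idx_a[None]                        # a IS NULL: the NULL bitmap itself
b_in_list = idx_b.get(7, 0) | idx_b.get(8, 0)  # b IN (7, 8): OR of bitmaps
print(rows_of(a_is_null & b_in_list))          # a IS NULL AND b IN (7, 8): [0, 2]
```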

Indexing Null Values - Part 1

A lot has already been written about indexing null values in Oracle. Nowadays it should be common knowledge that Oracle B*Tree indexes don't index entries that are entirely null, but it's possible to include null values in B*Tree indexes by combining them with something guaranteed to be non-null, be it another column or simply a constant expression.
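A toy Python model of the rule: a row gets a B*Tree entry unless every value in its key is NULL. Appending a constant to the key (the usual trick of indexing something like (col, 0)) therefore guarantees every row is indexed and makes an IS NULL lookup possible. The helper name and sample rows are hypothetical, and the model ignores everything else about the index structure.

```python
def btree_entries(rows, key_columns, constant=None):
    """Return the (key, rowid) pairs a B*Tree index would store (modeled).

    Rows whose key is entirely NULL are skipped. If constant is given, it is
    appended to every key, so no key can be entirely NULL."""
    entries = []
    for rowid, row in enumerate(rows):
        key = tuple(row[c] for c in key_columns)
        if constant is not None:
            key += (constant,)
        if all(v is None for v in key):
            continue  # entirely-null keys are not indexed
        entries.append((key, rowid))
    return entries

rows = [{"x": 1}, {"x": None}, {"x": 3}]

plain = btree_entries(rows, ["x"])                   # like an index on (x)
with_const = btree_entries(rows, ["x"], constant=0)  # like an index on (x, 0)

print(len(plain), len(with_const))                         # 2 3
print([rid for key, rid in with_const if key[0] is None])  # [1]
```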

Not long ago, Jonathan Lewis published a note showing an oddity with IS NULL predicates that, in the end, turned out not to be a real threat: it looked more like a quirk in how Oracle displays the access and filter predicates when accessing an index using IS NULL together with other predicates that follow it.

Index Only access with Oracle, MySQL, PostgreSQL, and Microsoft SQL Server

In my previous post about the advantages of index access over full table scans, I mentioned covering indexes: an Index Range Scan on such an index can retrieve all the columns it needs without going to the table. Adding to an index all the columns used by the SELECT or WHERE clause is an important tuning technique for queries that are around the inflection point between index access and full table scan. But not all RDBMS are equal. What PostgreSQL calls ‘Index Only’ actually reads the table, except for static data with no concurrent modifications.
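A simplified Python model of why this happens: a PostgreSQL index entry carries no visibility information, so an Index Only Scan checks the visibility map for the entry's heap page and must fetch the table tuple whenever that page is not marked all-visible. VACUUM sets those bits and modifications clear them, which is why the table is skipped only for data not modified since the last vacuum. In EXPLAIN ANALYZE this shows up as the “Heap Fetches” count of the Index Only Scan node. The function and the page layout below are hypothetical.

```python
def index_only_scan(index_entries, all_visible_pages):
    """Count heap fetches for an Index Only Scan (modeled).

    index_entries: list of (key, (heap_page, slot)) pairs.
    all_visible_pages: set of heap pages marked all-visible in the visibility map."""
    heap_fetches = 0
    for _key, (page, _slot) in index_entries:
        if page not in all_visible_pages:
            heap_fetches += 1  # visibility can only be checked on the heap page
    return heap_fetches

# 100 index entries spread over 10 heap pages (10 tuples per page)
entries = [(k, (k // 10, k % 10)) for k in range(100)]

print(index_only_scan(entries, set(range(10))))           # just vacuumed: 0
print(index_only_scan(entries, set(range(10)) - {3, 7}))  # pages 3, 7 modified: 20
```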

I’ll show the execution plans for this Index Only access on Oracle, MySQL, PostgreSQL, and MS SQL Server. As my skills with the non-Oracle ones are very limited, do not hesitate to comment if you think something is not correct.

We still need partitions and indexes, and I will continue to commute by train even when…

We still need indexes, and I will continue to commute by train even when self-driving cars become reality…

When the Autonomous Data Warehouse Cloud Service was announced, Oracle came up with the surprising idea that we do not need to create indexes, partitions, and materialized views for our analytic queries. It was even blocked in ADW at first, and has recently been allowed, but it is not recommended. Automatic indexing is for ATP (Autonomous Transaction Processing), and the message for ADW is: you don’t need indexes for your analytic queries.

In my opinion, even with the best performance for non-index access, we will still need index range scans, even when they are not the fastest access path, because the fastest response time is not the first performance criterion for the end user.

Full Table Scan vs. Index access

