Search

Top 60 Oracle Blogs

Recent comments

DynamoDB

DynamoDB Scan: the most efficient operation

By Franck Pachot

.
The title is provocative on purpose because you can read in many places that you should avoid scans, and that Scan operations are less efficient than other operations in DynamoDB. I think that there is a risk, reading those message without understanding what is behind, that people will actually avoid Scans and replace them by something that is even worse. If you want to compare the efficiency of an operation, you must compare it when doing the same thing, or it is an Apple vs. Orange comparison. Here I’ll compare with two extreme use cases: the need to get all items, and the need to get one item only. And then I’ll explain further what is behind the “avoid scans” idea.

I have created a table with 5000 items:

DynamoDB Scan (and why 128.5 RCU?)

By Franck Pachot

.
In the previous post I described the PartiSQL SELECT for DynamoDB and mentioned that a SELECT without a WHERE clause on the partition key may result in a Scan, but the result is automatically paginated. This pagination, and the cost of a Scan, is something that may not be very clear from the documentation and I’ll show it here on the regular DynamoDB API. By not very clear, I think this is why many people in the AWS community fear that, with this new PartiQL API, there is a risk to full scan tables, consuming expensive RCUs. I was also misled, when I started to look at DynamoDB, by the AWS CLI “–no-paginate” option, as well as its “Consumed Capacity” always showing 128.5 even for very large scans. So those examples should, hopefully, clear out some doubts.

DynamoDB PartiQL – part II: SELECT

By Franck Pachot

.

Amazon DynamoDB: the cost of indexes

By Franck Pachot

.
That’s common to any data structure, whether it is RDBMS or NoSQL, indexes are good to accelerate reads but slow the writes. This post explains the consequences of adding indexes in DynamoDB.

Amazon DynamoDB Local: running NoSQL on SQLite

By Franck Pachot

.
DynamoDB is a cloud-native, managed, key-value proprietary database designed by AWS to handle massive throughput for large volume and high concurrency with a simple API.

RDBMS (vs. NoSQL) scales the algorithm before the hardware

By Franck Pachot

.
In The myth of NoSQL (vs. RDBMS) “joins dont scale” I explained that joins actually scale very well with an O(logN) on the input tables size, thanks to B*Tree index access, and can even be bounded by hash partitioning with local index, like in DynamoDB single-table design. Jonathan Lewis added a comment that, given the name of the tables (USERS and ORDERS). we should expect an increasing number of rows returned by the join.

In this post I’ll focus on this: how does it scale when index lookup has to read more and more rows. I’ll still use DynamoDB for the NoSQL example, and this time I’ll do the same in Oracle for the RDBMS example.

DynamoDB: adding a Local covering index to reduce the cost

By Franck Pachot

.
This is a continuation on the previous post on DynamoDB: adding a Global Covering Index to reduce the cost. I have a DynamoDB partitioned on “MyKeyPart”,”MyKeySort” and I have many queries that retrieve a small “MyIndo001” attribute. And less frequent ones needing the large “MyData001” attribute. I have created a Global Secondary Index (GSI) that covers the same key and this small attribute. Now, because the index is prefixed by the partition key, I can create a Local Secondary Index (LSI) to do the same. But there are many limitations. The first one is that I cannot add a local index afterwards. I need to define it at the table creation.

Drop table

Here I am in a lab so that I can drop and re-create the table. In real live you may have to create a new one, copy the items, synchronize it (DynamoDB Stream),…

DynamoDB: adding a Global covering index to reduce the cost

By Franck Pachot

.
People often think of indexes as a way to optimize row filtering (“get item” faster and cheaper). But indexes are also about columns (“attribute projection”) like some kind of vertical partitioning. In relational (“SQL”) databases we often add more columns to the indexed key. This is called “covering” or “including” indexes, to avoid reading the whole row. The same is true in NoSQL. I’ll show in this post how, even when an index is not required to filter the items, because the primary key partitioning is sufficient, we may have to create a secondary index to reduce the cost of partial access to the item. Here is an example with AWS DynamoDB where the cost depends on I/O throughput.