DynamoDB Scan: the most efficient operation

By Franck Pachot

The title is provocative on purpose, because you can read in many places that you should avoid scans and that Scan operations are less efficient than other operations in DynamoDB. I think there is a risk that people who read those messages without understanding what is behind them will actually avoid Scans and replace them with something even worse. If you want to compare the efficiency of operations, you must compare them when they do the same thing, or it is an apples-to-oranges comparison. Here I’ll compare two extreme use cases: the need to get all items, and the need to get only one item. Then I’ll explain further what is behind the “avoid scans” idea.

I have created a table with 5000 items:

aws dynamodb create-table --table-name Demo \
 --attribute-definitions AttributeName=K,AttributeType=N \
 --key-schema            AttributeName=K,KeyType=HASH \
 --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25
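CreateTable is asynchronous: the call returns while the table is still in CREATING status, so writes issued immediately afterwards may fail. The following is a minimal sketch, assuming a configured AWS CLI, of how you could block until the table is ACTIVE before loading it. The helper name `wait_for_table` is hypothetical and not part of the original demo.

```shell
# Hypothetical helper (not from the original post): block until a table is usable.
wait_for_table() {
  # 'wait table-exists' polls DescribeTable until the table status is ACTIVE
  aws dynamodb wait table-exists --table-name "$1" || return 1
  # Print the current status as a sanity check
  aws dynamodb describe-table --table-name "$1" \
    --query 'Table.TableStatus' --output text
}

# Usage (uncomment to run against your own account):
# wait_for_table Demo
```

Wrapping the two calls in a function keeps the table name in one place, which helps if you rerun the demo with a different table.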

for i in {1..5000} ; do
 # one item per call: numeric key K, random string value V
 aws dynamodb put-item     --table-name Demo --item '{"K":{"N":"'${i}'"},"V":{"S":"'"$RANDOM"'"}}'
done
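Before measuring anything, it can be worth checking that all items were actually loaded. The `ItemCount` reported by `describe-table` is only refreshed periodically, so a counting Scan is the more reliable check right after a load. This is a minimal sketch under that assumption. The helper name `count_items` is hypothetical, and the Scan it runs consumes read capacity like any other Scan.

```shell
# Hypothetical helper (not from the original post): count the items in a table.
count_items() {
  # --select COUNT returns only the number of items, not the items themselves;
  # --query Count extracts that number from the response
  aws dynamodb scan --table-name "$1" \
    --select COUNT --query Count --output json
}

# Usage (uncomment to run against your own account):
# count_items Demo
```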

Because each time I demo on a small table I have people commenting with “this proves nothing, the table is too small”, I have to point out that you don’t need petabytes to understand how it scales. This is especially true with DynamoDB, which is designed to scale linearly: no magic happens after reaching a threshold, as it can in an RDBMS (small scans optimized with cache, large scans optimized with storage indexes / zone maps). If you have doubts, you can run the same test, replacing 5000 with 5000000000, and you will observe the same behavior, but do that on your own cloud bill, not mine.
