Now that we know filter expressions aren’t the way to filter your data in DynamoDB, let’s look at a few strategies to properly filter your data. We’ll walk through a few strategies using examples below, but the key point is that in DynamoDB, you must use your table design to filter your data.

When designing your table in DynamoDB, you should think hard about how to segment your data into manageable chunks, each of which is sufficient to satisfy your query. This is how DynamoDB scales: these chunks (partitions) can be spread across different machines.
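To see why the table design has to do the filtering, here is a toy, in-memory sketch in Python (it is not the DynamoDB API). A Scan with a filter expression still reads every item and only applies the filter afterwards, and DynamoDB charges read capacity for what it reads. A Query on the partition key reads just that one chunk. The pk/sk attribute names and the CUSTOMER#/ORDER# key format are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a DynamoDB table: items grouped by partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, item):
        self.partitions[item["pk"]].append(item)

    def scan_with_filter(self, predicate):
        """Like Scan + FilterExpression: every item is read, then the filter runs."""
        read = [item for items in self.partitions.values() for item in items]
        return [item for item in read if predicate(item)], len(read)

    def query(self, pk):
        """Like Query on the partition key: only that one partition is read."""
        read = list(self.partitions.get(pk, []))
        return read, len(read)


table = ToyTable()
for customer in range(100):
    for order in range(10):
        table.put({"pk": f"CUSTOMER#{customer}", "sk": f"ORDER#{order}"})

filtered, scanned = table.scan_with_filter(lambda item: item["pk"] == "CUSTOMER#7")
queried, read = table.query("CUSTOMER#7")
print(len(filtered), scanned)  # 10 1000
print(len(queried), read)      # 10 10
```

Both approaches return the same 10 items, but the scan had to read all 1,000 to find them.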

Results can only be sorted by the range (sort) key of the table or of an index (see created and withScanIndexForward(false)). That's all well and good, except when you decide later that you want to sort by another field. That requires adding a new index, and the problem is that local secondary indexes can only be added when the table is created! Global secondary indexes can be added after the fact, but they are no longer free: you pay roughly the same cost as for another table!
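Here is a minimal, in-memory sketch of what "sorted by the range key" means. Within one partition, items are kept in sort-key order, so a Query can return a key range in ascending order or, with the ScanIndexForward request parameter set to false (withScanIndexForward(false) in the Java SDK), in descending order. The created attribute holding ISO 8601 strings and the sample orders are assumptions, and query_partition is a hypothetical toy function, not the DynamoDB API.

```python
import bisect


def query_partition(items, start=None, end=None, scan_index_forward=True, limit=None):
    """Mimic a Query within one partition whose sort key is 'created'.

    'items' must already be sorted by 'created', the way DynamoDB stores them.
    'start'/'end' form an inclusive range, like a BETWEEN key condition.
    """
    keys = [item["created"] for item in items]
    lo = 0 if start is None else bisect.bisect_left(keys, start)
    hi = len(items) if end is None else bisect.bisect_right(keys, end)
    selected = items[lo:hi]
    if not scan_index_forward:
        selected = selected[::-1]  # descending order, newest first
    return selected if limit is None else selected[:limit]


orders = sorted(
    [
        {"created": "2023-02-01T09:00:00Z", "id": "a"},
        {"created": "2023-02-15T12:30:00Z", "id": "b"},
        {"created": "2023-02-20T08:15:00Z", "id": "c"},
        {"created": "2023-03-01T00:00:00Z", "id": "d"},
    ],
    key=lambda item: item["created"],
)

# The two most recent February orders, newest first.
latest_feb = query_partition(
    orders, "2023-02-01", "2023-02-28T23:59:59Z", scan_index_forward=False, limit=2
)
print([o["id"] for o in latest_feb])  # ['c', 'b']
```

Nothing in this structure can sort the same orders by some other attribute, such as an order total; that would need the attribute as the sort key of another index, which is exactly the cost described above.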

Adding data to DynamoDB is incredibly easy, but given the long-term goals of the project, including the ability to filter on any attribute in each query, DynamoDB does not scale as well for us. This is especially true because, to be flexible enough to query on any attribute, we would need a large number of indices, all of which would need to be kept up to date on writes.

In DynamoDB, if you need to access just a few attributes with the lowest possible latency, consider projecting only those attributes into a global secondary index. The smaller the index, the less it costs to store and the lower your write costs. However, in our case, when we filter, we still want the ability
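As a hedged sketch of projecting only a few attributes, the helper below builds the parameters for DynamoDB's UpdateTable call that adds a global secondary index with ProjectionType INCLUDE. The table, index, and attribute names are hypothetical, both key attributes are assumed to be strings, and it assumes an on-demand (PAY_PER_REQUEST) table, so no ProvisionedThroughput is given; a provisioned table would need one for the new index.

```python
def gsi_create_params(table_name, index_name, hash_key, range_key, projected_attrs):
    """Build UpdateTable parameters that add a GSI projecting only a few attributes."""
    return {
        "TableName": table_name,
        # Key attributes of the new index must be declared (assumed strings here).
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
            {"AttributeName": range_key, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": hash_key, "KeyType": "HASH"},
                        {"AttributeName": range_key, "KeyType": "RANGE"},
                    ],
                    # INCLUDE copies the key attributes plus only these
                    # non-key attributes into the index.
                    "Projection": {
                        "ProjectionType": "INCLUDE",
                        "NonKeyAttributes": list(projected_attrs),
                    },
                }
            }
        ],
    }


params = gsi_create_params("Orders", "status-created-index", "status", "created", ["total"])
# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").update_table(**params)
print(params["GlobalSecondaryIndexUpdates"][0]["Create"]["Projection"])
```

Every item written to the base table with a status and created value then also costs a (small) write to this index, which is the trade-off the paragraph above describes.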

Storage Considerations

When an application writes an item to a table, DynamoDB automatically copies the correct subset of attributes to any global secondary indexes in which those attributes should appear. Your AWS account is charged for storage of the item in the base table and also for storage of attributes in any global secondary indexes on that table.

The amount of space used by an index item is the sum of the following:

The size in bytes of the base table primary key (partition key and sort key)

The size in bytes of the index key attribute

The size in bytes of the projected attributes (if any)

100 bytes of overhead per index item

To estimate the storage requirements for a global secondary index, you can estimate the average size of an item in the index and then multiply by the number of items in the base table that have the global secondary index key attributes.
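The estimate above can be sketched as two small Python helpers. The byte counts and item count in the example are made-up numbers for illustration, not measurements; the 100-byte overhead is the per-item figure listed above.

```python
INDEX_ITEM_OVERHEAD_BYTES = 100  # per-index-item overhead from the list above


def gsi_item_size(base_key_bytes, index_key_bytes, projected_bytes=0):
    """Bytes for one index item: base-table key + index key + projected attributes + overhead."""
    return base_key_bytes + index_key_bytes + projected_bytes + INDEX_ITEM_OVERHEAD_BYTES


def gsi_storage_estimate(avg_index_item_bytes, items_with_index_key):
    """Average index item size times the number of base items that have the index key."""
    return avg_index_item_bytes * items_with_index_key


# Hypothetical sizes: 40-byte base key, 20-byte index key, 60 bytes projected.
size = gsi_item_size(40, 20, 60)
total = gsi_storage_estimate(size, 1_000_000)
print(size, total)  # 220 220000000
```

With these assumed numbers, a million indexed items would add roughly 220 MB of index storage, and nearly half of each item's size is the fixed overhead.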

Querying DynamoDB by date range:

https://medium.com/cloud-native-the-gathering/querying-dynamodb-by-date-range-899b751a6ef2

Why RDS might be better than DynamoDB

https://blog.codebarrel.io/why-we-switched-from-dynamodb-back-to-rds-before-we-even-released-3c2ee092120c

As a result we ended up rewriting our entire persistence layer using RDS and QueryDSL for Java. The resulting code is much more maintainable and we can handle the required load with a medium Postgres RDS instance ($424 per year). We also feel much more confident in our ability to handle new query requirements in future. With DynamoDB, implementing changing requirements could have become a very costly exercise in future due to the need for more global indexes!

Date

February 22, 2023