Accessing data

Amazon’s DynamoDB offers four data access methods. Dynamodb-mapper directly exposes them. They are documented here from the fastest to the slowest. It is worth noting that, because of Amazon’s throughput credits, the slowest is also the most expensive.

Strong vs eventual consistency

While this is not strictly speaking related to the mapper itself, it seems important to clarify this point as it is a key feature of Amazon’s DynamoDB.

Tables are spread across partitions for redundancy and performance purposes. When an item is written, it takes some time to replicate it to all partitions, usually less than a second according to the technical specifications. Accessing an item right after writing it might therefore return an outdated version.

In most applications, this will not be an issue; in that case, we say the data is ‘eventually consistent’. If it does matter, you may request ‘strong consistency’, thus asking for the most up-to-date version. ‘Strong consistency’ is also twice as expensive in terms of capacity units as ‘eventual consistency’, and a bit slower too, so it is important to keep this trade-off in mind.

‘Eventual consistency’ is the default behavior in all requests. It is also the only available option for scan and get_batch.
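
For illustration, here is a minimal sketch of both read modes. The Counter model is hypothetical, and the example assumes the consistent_read keyword exposed by get():

from dynamodb_mapper.model import DynamoDBModel

# Hypothetical model for illustration
class Counter(DynamoDBModel):
    __table__ = u"counters"
    __hash_key__ = u"name"
    __schema__ = {
        u"name": unicode,
        u"value": int,
    }

# Eventually consistent read (default): cheaper, but may return a slightly
# outdated version right after a write
counter = Counter.get(u"page_views")

# Strongly consistent read: always up to date, costs twice the capacity units
counter = Counter.get(u"page_views", consistent_read=True)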

Querying

The four DynamoDB query methods are:

- get()
- get_batch()
- query()
- scan()

They are all classmethods returning instance(s) of the model. To get object(s):

>>> obj = MyModelClass.get(...)

Use get or get_batch to fetch one or more items by exact id. If you need more than one item, it is highly recommended to use get_batch instead of get in a loop, as it avoids the cost of multiple network calls. However, if strong consistency is required, get is the only option as DynamoDB does not support it in batch mode.
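
As an illustration, here is a sketch of fetching several items in a single call. The MyUser model is hypothetical; get_batch is assumed to take a plain list of hash keys (or, for tables with a range_key, a list of (hash_key, range_key) tuples):

from dynamodb_mapper.model import DynamoDBModel

# Hypothetical model for illustration
class MyUser(DynamoDBModel):
    __table__ = u"users"
    __hash_key__ = u"fullname"
    __schema__ = {
        u"fullname": unicode,
    }

# One network round trip for all three users, instead of three with get()
users = MyUser.get_batch([u"Chuck Norris", u"Bruce Lee", u"Jackie Chan"])
for user in users:
    print "user({})".format(user.fullname)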

When objects are logically grouped using a range_key, it is possible to get all of them in a single, fast query provided they all share the same known hash_key. query() also supports a couple of handy filters.

When querying, you pay only for the results you actually get; this is what makes filtering interesting. Filters work both for strings and for numbers. The BEGINS_WITH filter is extremely handy for namespaced range_keys. When using the EQ(x) filter, it may be preferable for readability to rewrite it as a regular get; the cost in terms of read units is strictly the same.
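
For instance, with range_keys namespaced as u"<type>:<id>", a single query can fetch the entries of one namespace only. A minimal sketch with a hypothetical timeline model:

from boto.dynamodb.condition import BEGINS_WITH
from dynamodb_mapper.model import DynamoDBModel

# Hypothetical model for illustration
class MyTimelineModel(DynamoDBModel):
    __table__ = u"timeline"
    __hash_key__ = u"user_id"
    __range_key__ = u"entry_key"
    __schema__ = {
        u"user_id": int,
        # Namespaced range_key, e.g. u"comment:42" or u"post:7"
        u"entry_key": unicode,
    }

# Fetch only the "comment" entries of user 42, in a single query
comments = MyTimelineModel.query(
    hash_key_value=42,
    range_key_condition=BEGINS_WITH(u"comment:"),
)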

If needed, query() supports strong consistency, reversing the scan order and limiting the number of results.

The last method, scan, is a generalised version of query: any field can be filtered and more filters are available. There is a complete list on the Boto website. Nonetheless, scan results are always eventually consistent.

That said, scan is extremely expensive in terms of throughput and its use should be avoided as much as possible. It may even negatively impact pending regular requests, causing them to repeatedly fail. The underlying Boto tries to handle this gracefully, but your overall application’s performance and user experience might suffer a lot. For more information about the impact of scan, please see Amazon’s developer guide.
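
For completeness, here is a sketch of a filtered scan. The MyArticle model is hypothetical, and scan() is assumed to accept a Boto-style dict mapping attribute names to conditions:

from boto.dynamodb.condition import CONTAINS
from dynamodb_mapper.model import DynamoDBModel

# Hypothetical model for illustration
class MyArticle(DynamoDBModel):
    __table__ = u"articles"
    __hash_key__ = u"article_id"
    __schema__ = {
        u"article_id": int,
        u"tags": set,
    }

# Reads the whole table and filters afterwards: expensive, and always
# eventually consistent. Use sparingly.
flagged = MyArticle.scan({u"tags": CONTAINS(u"spam")})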

Use case: Get user Chuck Norris

This first example is pretty straightforward.

from dynamodb_mapper.model import DynamoDBModel

# Example model
class MyUserModel(DynamoDBModel):
    __table__ = u"..."
    __hash_key__ = u"fullname"
    __schema__ = {
        # This is probably not a good key in a real world application because of homonyms
        u"fullname": unicode,
        # [...]
    }

# Get the user
myuser = MyUserModel.get(u"Chuck Norris")

# Do some work
print "myuser({})".format(myuser.fullname)

Use case: Get only objects after 2012-12-21 13:37

At the moment, filters only accept strings and numbers, which is a limitation if you need to filter dates in time-based applications. To work around it, you need to export the datetime object to the internal W3CDTF representation.

from datetime import datetime
from dynamodb_mapper.model import DynamoDBModel, utc_tz
from boto.dynamodb.condition import *

# Example model
class MyDataModel(DynamoDBModel):
    __table__ = u"..."
    __hash_key__ = u"h_key"
    __range_key__ = u"r_key"
    __schema__ = {
        u"h_key": int,
        u"r_key": datetime,
        # [...]
    }

# Build the date condition and export it to W3CDTF representation
date_obj = datetime(2012, 12, 21, 13, 37, 0, tzinfo=utc_tz)
date_str = date_obj.astimezone(utc_tz).strftime("%Y-%m-%dT%H:%M:%S.%f%z")

# Get the results generator
mydata_generator = MyDataModel.query(
    hash_key_value=42,
    range_key_condition=GT(date_str)
)

# Do some work
for data in mydata_generator:
    print "data({}, {})".format(data.h_key, data.r_key)

Use case: Query the most up-to-date revision of a blog post

There is no builtin filter for this, but it can easily be achieved using a combination of the limit and reverse parameters. As query returns a generator, the limit parameter could seem to be of no use. However, DynamoDB internally sends results in batches of 1 MB and you pay for all the results, so... you’d better use it.

from dynamodb_mapper.model import DynamoDBModel

# Example model
class MyBlogPosts(DynamoDBModel):
    __table__ = u"..."
    __hash_key__ = u"post_id"
    __range_key__ = u"revision"
    __schema__ = {
        u"post_id": int,
        u"revision": int,
        u"title": unicode,
        u"tags": set,
        u"content": unicode,
        # [...]
    }

# Get the results generator
mypost_last_revision_generator = MyBlogPosts.query(
    hash_key_value=42,
    limit=1,
    reverse=True
)

# Get the actual blog post to render
try:
    mypost = mypost_last_revision_generator.next()
except StopIteration:
    mypost = None # Not Found

This example could easily be adapted to get the first revision or the first n comments. You may also combine it with a range_key condition to get pagination-like behavior, as sketched below.
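
As a sketch, reusing the MyBlogPosts model above (the cursor value is hypothetical):

from boto.dynamodb.condition import GT

# First revision instead of the last one: keep the natural scan order
first_revision_generator = MyBlogPosts.query(hash_key_value=42, limit=1)

# Pagination-like behavior: resume right after the last revision seen
last_seen_revision = 3  # hypothetical cursor kept by the application
next_page_generator = MyBlogPosts.query(
    hash_key_value=42,
    range_key_condition=GT(last_seen_revision),
    limit=10,
)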