Data models

Models are formal Python objects that tell the mapper how to map DynamoDB data to regular Python objects and vice versa.

Bare minimal model

A bare minimal model with only a hash_key needs to define only a __table__, a __hash_key__ and a __schema__ containing that key.

from dynamodb_mapper.model import DynamoDBModel

class MyModel(DynamoDBModel):
    __table__ = u"..."
    __hash_key__ = u"key"
    __schema__ = {
        u"key": unicode,
        #...
    }

The model can then be instantiated and used like any other Python class.

>>> data = MyModel()
>>> data.key = u"foo/bar"

Initial values can even be specified directly in the constructor. Otherwise, unless defaults are provided, all fields are set to a neutral value for their type (see Default values).

>>> data = MyModel(key=u"foo/bar")
>>> repr(data.key)
"u'foo/bar'"

About keys

While this is not strictly speaking related to the mapper itself, it seems important to clarify this point as this is a key feature of Amazon’s DynamoDB.

Amazon’s DynamoDB supports 1 or 2 keys per object. They must be specified at table creation time and can not be altered afterwards: keys can be neither renamed, added nor removed. It is not even possible to change their values without deleting and re-inserting the object in the table.

The first key is mandatory. It is called the hash_key. The hash_key is used to access data and controls its distribution among database partitions. To take advantage of all the provisioned R/W throughput, keys should be as random as possible. For more information about the hash_key, please see Amazon’s developer guide.

The second key is optional. It is called the range_key. The range_key is used to logically group data under a given hash_key. More information is given below.

Data access relying either on the hash_key or both the hash_key and the range_key is fast and cheap. All other options are very expensive.

We intend to add migration tools to Dynamodb-mapper in a later revision but do not expect miracles in this area.

This is why correctly modeling your data is crucial with DynamoDB.

Creating the table

Unlike other NoSQL engines such as MongoDB, tables must be created and managed explicitly. At the moment, dynamodb-mapper abstracts only the initial table creation. Other lifecycle management operations may be done directly via Boto.

To create the table, use create_table() with the model class as first argument. When calling this method, you must specify how much throughput you want to provision for this table. Throughput is measured as the number of atomic KB requested or sent per second. For more information, please see Amazon’s official documentation.

from dynamodb_mapper.model import DynamoDBModel, ConnectionBorg

conn = ConnectionBorg()
conn.create_table(MyModel, read_units=10, write_units=10, wait_for_active=True)
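As a rough aid for choosing read_units and write_units, the definition above can be turned into a small calculation. This is only a sketch that takes "atomic KB per second" literally (each request is rounded up to a whole KB); required_units is a hypothetical helper, not part of dynamodb-mapper or Boto, and Amazon’s official documentation remains the reference for the exact accounting rules.

```python
import math


def required_units(item_size_bytes, requests_per_second):
    """Estimate throughput units for items of a given size.

    Assumption: every request costs a whole number of KB, rounded up,
    and never less than 1 KB.
    """
    kb_per_request = int(math.ceil(item_size_bytes / 1024.0))
    return max(kb_per_request, 1) * requests_per_second


# Hypothetical workloads, for illustration only
small_items = required_units(300, 10)   # 10 requests/s on 300-byte items
large_items = required_units(2500, 4)   # 4 requests/s on 2500-byte items
```

With these assumptions, small items cost 1 unit per request while a 2500-byte item costs 3, which is why item size matters as much as request rate when provisioning.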

Important note: Unlike most databases, table creation may take up to 1 minute. During this time, the table is not usable. Also, you can not have more than 10 tables in CREATING or DELETING state at any given time for your whole Amazon account. This is a limitation of Amazon’s DynamoDB.

The connection manager automatically reads your credentials from either:

  • /etc/boto.cfg
  • ~/.boto
  • or AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables

If none of these places defines them, or if you want to override them, please use set_credentials() before calling create_table().

For more information on the connection manager, please see ConnectionBorg.

Region

To change the AWS region from the default RegionInfo:us-east-1, use set_region() before calling any method that creates a connection.

You can list the currently available regions like this:

>>> import boto.dynamodb
>>> boto.dynamodb.regions()
[RegionInfo:us-east-1, RegionInfo:us-west-1, RegionInfo:us-west-2,
RegionInfo:ap-northeast-1, RegionInfo:ap-southeast-1, RegionInfo:eu-west-1]

Advanced usage

Namespacing the models

This is more a piece of advice than a feature. In DynamoDB, each customer is allocated a single database. It is highly recommended to namespace your tables with a name of the form <application>-<env>-<model>.
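One way to keep table names consistent is to build them in a single place. The following is a minimal sketch in plain Python; namespaced_table_name is a hypothetical helper, not something dynamodb-mapper provides.

```python
def namespaced_table_name(application, env, model):
    """Build a table name of the form <application>-<env>-<model>."""
    parts = [application, env, model]
    for part in parts:
        # Reject empty parts so that names such as "myapp--ticket" never appear
        if not part:
            raise ValueError("application, env and model must all be non-empty")
    return u"-".join(parts)


# Hypothetical values, matching the style of table names used in this manual
TICKET_TABLE = namespaced_table_name(u"bugtracker", u"dev", u"ticket")
```

The result can then be assigned to __table__ in the model definition, so that switching environment only requires changing one value.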

Deep schema definition and validation with Onctuous

Onctuous (http://pypi.python.org/pypi/onctuous) has been integrated into DynamoDB-Mapper as part of the 1.8.0 release cycle.

Before writing any validator relying on Onctuous, there is a crucial point to take into account. Validators are run both when loading from DynamoDB and when saving to DynamoDB. save stores the output of the validators, while the reading functions feed the validators with raw DynamoDB values, that is to say, the serialized output of the validators.

Hence, validators must accept both serialized and already de-serialized input. As of Onctuous 0.5.2, Coerce can safely do that as it checks the type before attempting anything.
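The "check the type before coercing" idea can be sketched in plain Python, without Onctuous. The names bounded_int and priority are hypothetical and only illustrate the property a custom validator needs: feeding its own output back into it must succeed and give the same value.

```python
def bounded_int(minimum, maximum):
    """Return a validator that coerces its input to int and checks the range."""
    def validate(value):
        # Check the type before coercing, so that a value which is already
        # an int passes through without being converted again
        if not isinstance(value, int):
            value = int(value)  # raises ValueError on garbage such as "abc"
        if not minimum <= value <= maximum:
            raise ValueError("%r is not in [%r, %r]" % (value, minimum, maximum))
        return value
    return validate


priority = bounded_int(1, 5)

first_pass = priority(u"3")         # value given in one representation
second_pass = priority(first_pass)  # validator output fed back to the validator
```

Both passes yield the same integer, so the validator is safe to run on the save path and on the load path alike.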

To sum up, schema entries behave as follows:

  • base types (int, unicode, float, dict, list, ...) work seamlessly
  • datetime type: same special behavior as before
  • [validators] and {'keyname': validators} are automatically (de-)serialized
  • callable validators (All, Range, ...) MUST accept both serialized and de-serialized input

Here is a basic schema example using deep validation:

from dynamodb_mapper.model import DynamoDBModel
from onctuous.validators import Match, Length, All, Coerce
from datetime import datetime

class Article(DynamoDBModel):
    __table__ = "Article"
    __hash_key__ = "slug"
    __schema__ = {
        # Regex validation. Input and output are unicode so no coercion problem
        "slug": Match("^[a-z0-9-]+$"),

        # Regular title and body definition
        "title": unicode,
        "body": unicode,

        # Special case for dates. Note that you would have to handle
        # (de-)serialization yourself if you wanted to apply conditions
        "published_date": datetime,

        # list of tags. I force unicode as an example even though it is not
        # strictly speaking needed here
        "tags": [All(Coerce(unicode), Length(min=3, max=15))],
    }

Using auto-incrementing index

For those coming from a SQL-like world (or even MongoDB UUIDs), adding an ID field or using the default one has become automatic. However, these environments are not limited to 2 indexes. Moreover, DynamoDB has no built-in support for UUIDs. Nonetheless, Dynamodb-mapper implements this feature at a higher level. For more technical background on the internal implementation, please see the reference documentation.

If the field value is left to its default value of 0, a new hash_key will automatically be generated when saving. Otherwise, the item is inserted at the specified hash_key.

Before using this feature, make sure you really need it. In most cases, another field can be used instead. A good hint is “which field would I have marked UNIQUE in SQL?”.

  • for users, the email or login field should do.
  • for blog posts, the permalink could do too.
  • for orders, datetime is a good choice.

In some applications, you need a combination of 2 fields to be unique. You may then consider using one as the hash_key and the other as the range_key or, if the range_key is needed for another purpose, try combining them into a single key.
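Combining two fields into one key can be as simple as joining them with a separator. The sketch below is plain Python; composite_key, split_composite_key and the choice of "/" as separator are assumptions made for illustration, not features of dynamodb-mapper.

```python
SEPARATOR = u"/"  # assumed never to appear in the first field


def composite_key(first, second):
    """Join two unique-together fields into a single key value."""
    if SEPARATOR in first:
        # Otherwise the key could not be split back unambiguously
        raise ValueError("first field must not contain %r" % SEPARATOR)
    return first + SEPARATOR + second


def split_composite_key(key):
    """Recover the two original fields from a composite key."""
    first, second = key.split(SEPARATOR, 1)
    return first, second


# Hypothetical example: one key per (project, ticket reference) pair
key = composite_key(u"bugtracker", u"2012-0042")
fields = split_composite_key(key)
```

Only the first field needs to be free of the separator, because the split stops at the first occurrence.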

At Ludia, this is a feature we do not use anymore in our games at the time of writing.

So, when should you use it? Some applications still need a ticket-like approach, and dates could be confusing for the end user. The best example of this is a bug tracking system.

Use case: Bugtracking System

from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

class Ticket(DynamoDBModel):
    __table__ = u"bugtracker-dev-ticket"
    __hash_key__ = u"ticket_number"
    __schema__ = {
        u"ticket_number": autoincrement_int,
        u"title": unicode,
        u"description": unicode,
        u"tags": set, # target, version, priority, ..., order does not matter
        u"comments": list, # probably not the best because of the 64KB limitation...
        #...
    }

# Create a new ticket and auto-generate an ID
ticket = Ticket()
ticket.title = u"Chuck Norris is the reason why Waldo hides"
ticket.tags = set([u'priority:critical', u'version:yesterday'])
ticket.description = u"Ludia needs to create a new social game to help people all around the world find him again. Where is Waldo?"
ticket.comments.append(u"...")
ticket.save()
print ticket.ticket_number # A new id has been generated

# Create a new ticket and force the ID
ticket = Ticket()
ticket.ticket_number = 42
ticket.title = u"foo/bar"
ticket.save() # create or replace item #42
print ticket.ticket_number # id has not changed

To prevent accidental data overwrite when saving to an arbitrary location, please see the detailed presentation of Saving.

Please note that hash_key=-1 is currently reserved and nothing can be stored at this index.

You can not use autoincrement_int and a range_key at the same time. In the bug tracker example above, it also means that ticket numbers are distributed on the application scope, not on a per-project scope.

This feature is only part of Dynamodb-mapper. When using another mapper or direct data access, you might corrupt the counter. Please see the reference documentation for implementation details and technical limitations.

Using a range_key

Models may define a second key index called range_key. While hash_key only allows dict-like access, range_key allows grouping multiple items under a single hash_key and filtering them further.

For example, let’s say you have a customer and want to track all its orders. The naive/SQL-like implementation would be:

from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

class Customer(DynamoDBModel):
    __table__ = u"myapp-dev-customers"
    __hash_key__ = u"login"
    __schema__ = {
        u"login": unicode,
        u"order_ids": set,
        #...
    }

class Order(DynamoDBModel):
    __table__ = u"myapp-dev-orders"
    __hash_key__ = u"order_id"
    __schema__ = {
        u"order_id": autoincrement_int,
        #...
    }

# Get all orders for customer "John Doe"
customer = Customer.get(u"John Doe")
order_generator = Order.get_batch(customer.order_ids)

But this approach has many drawbacks.

  • It is expensive:
    • An update to generate a new autoinc ID
    • An insertion for the new order item
    • An update to add the new order id to the customer
  • It is risky:
    • Items are limited to 64KB but the order_ids set has no growth limit
  • To get all orders from a given customer, you need to read the customer first and use a get_batch() request

As a first enhancement, and to spare a request, you can use datetime instead of autoincrement_int for the order_id key. But with the power of range keys, you can get all orders in a single request:

from dynamodb_mapper.model import DynamoDBModel
from datetime import datetime

class Customer(DynamoDBModel):
    __table__ = u"myapp-dev-customers"
    __hash_key__ = u"login"
    __schema__ = {
        u"login": unicode,
        #u"order_ids": set, => This field is not needed anymore
        #...
    }

class Order(DynamoDBModel):
    __table__ = u"myapp-dev-orders"
    __hash_key__ = u"login"
    __range_key__ = u"order_id"
    __schema__ = {
        u"login": unicode,
        u"order_id": datetime,
        #...
    }

# Get all orders for customer "John Doe"
Order.query(u"John Doe")

Not only is this approach better, it is also much more powerful. We could easily limit the result count, sort them in reverse order or filter them by creation date if needed. For more background on the querying system, please see the accessing data section of this manual.
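To make the access pattern concrete, here is a toy in-memory model of a table with a range key. It is not how DynamoDB or dynamodb-mapper is implemented; RangeKeyTable and the start, reverse and limit parameter names are hypothetical and do not describe the signature of Order.query.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class RangeKeyTable(object):
    """Toy table: items grouped by hash key and kept sorted by range key."""

    def __init__(self):
        # hash_key -> list of (range_key, item) tuples sorted by range_key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [key for key, _ in rows]
        index = bisect.bisect_left(keys, range_key)
        if index < len(rows) and rows[index][0] == range_key:
            rows[index] = (range_key, item)  # same key pair: replace the item
        else:
            rows.insert(index, (range_key, item))

    def query(self, hash_key, start=None, reverse=False, limit=None):
        rows = self._partitions.get(hash_key, [])
        if start is not None:
            rows = [row for row in rows if row[0] >= start]
        if reverse:
            rows = rows[::-1]
        if limit is not None:
            rows = rows[:limit]
        return [item for _, item in rows]


orders = RangeKeyTable()
orders.put(u"John Doe", datetime(2012, 1, 5), u"books")
orders.put(u"John Doe", datetime(2012, 3, 9), u"laptop")
orders.put(u"John Doe", datetime(2012, 2, 1), u"coffee")
orders.put(u"Jane Roe", datetime(2012, 2, 2), u"tea")

all_orders = orders.query(u"John Doe")
latest = orders.query(u"John Doe", reverse=True, limit=1)
since_february = orders.query(u"John Doe", start=datetime(2012, 2, 1))
```

Every query touches a single hash key and walks an already sorted list, which is why limiting, reversing and filtering by date come almost for free with this layout.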

Default values

When instantiating a model, all fields are initialised to “neutral” values. For containers (dict, set, list, ...), it is the empty container; for unicode, the empty string; for numbers, 0; and so on.

It is also possible to specify the values taken by the fields when instantiating, either with a __defaults__ dict or directly in __init__. The former applies to all new instances while the latter applies on a per-instance basis and has a higher precedence.

__defaults__ is a dict of the form {u'keyname': default_value}. The __init__ syntax follows the same logic: Model(keyname=default_value, ...).

default_value can either be a scalar value or a callable taking no argument and returning a scalar value. The value must be of a type matching the schema definition, otherwise a TypeError exception is raised.
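The precedence and type-checking rules described above can be sketched in plain Python. This only illustrates the documented behavior; it is not the library's code, and resolve_default and initial_values are hypothetical names.

```python
def resolve_default(schema_type, default):
    """Turn a scalar or callable default into a value of the schema type."""
    # A callable default is invoked with no argument
    value = default() if callable(default) else default
    if not isinstance(value, schema_type):
        raise TypeError("expected %s, got %r" % (schema_type.__name__, value))
    return value


def initial_values(schema, defaults, **overrides):
    """Constructor arguments win over defaults, which win over neutral values."""
    values = {}
    for name, schema_type in schema.items():
        if name in overrides:
            values[name] = resolve_default(schema_type, overrides[name])
        elif name in defaults:
            values[name] = resolve_default(schema_type, defaults[name])
        else:
            values[name] = schema_type()  # neutral value: 0, empty string, ...
    return values


text = type(u"")  # unicode on Python 2, str on Python 3
schema = {u"player_id": int, u"strength": text, u"inventory": list}
defaults = {u"strength": u"weak", u"inventory": list}

player = initial_values(schema, defaults, strength=u"chuck norris")
rookie = initial_values(schema, defaults)
```

Using a callable such as list for container defaults gives every instance its own fresh container instead of one object shared by all instances.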

Example:

from dynamodb_mapper.model import DynamoDBModel, utc_tz
from datetime import datetime

# define a model with defaults
class PlayerStrength(DynamoDBModel):
    __table__ = u"player_strength"
    __hash_key__ = u"player_id"
    __schema__ = {
        u"player_id": int,
        u"strength": unicode,
        u"last_update": datetime,
    }
    __defaults__ = {
        u"strength": u'weak', # scalar default value
        u"last_update": lambda: datetime.now(utc_tz), # callable default value
    }

>>> player = PlayerStrength(strength=u"chuck norris") # overload one of the defaults
>>> print player.strength
chuck norris
>>> print player.last_update
2012-12-21 13:37:00.00000