.. _data-models:

###########
Data models
###########

.. currentmodule:: dynamodb_mapper.model

Models are formal Python objects telling the mapper how to map DynamoDB data
to regular Python objects and vice versa.

Bare minimal model
==================

A bare minimal model with only a ``hash_key`` needs only to define a
``__table__`` and a ``hash_key``.

::

    from dynamodb_mapper.model import DynamoDBModel

    class MyModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"key"
        __schema__ = {
            u"key": unicode,
            #...
        }

The model can then be instantiated and used like any other Python class.

>>> data = MyModel()
>>> data.key = u"foo/bar"

Initial values can even be specified directly in the constructor. Otherwise,
unless :ref:`defaults are provided <using-default-values>`, all fields are
set to ``None``.

>>> data = MyModel(key=u"foo/bar")
>>> repr(data.key)
"u'foo/bar'"

About keys
==========

While this is not strictly speaking related to the mapper itself, it seems
important to clarify this point as it is a key feature of Amazon's DynamoDB.

Amazon's DynamoDB supports 1 or 2 keys per object. They must be specified at
table creation time and can not be altered afterwards: neither renamed, added
nor removed. It is not even possible to change their values without deleting
and re-inserting the object in the table.

The first key is mandatory. It is called the ``hash_key``. The ``hash_key``
is used to access data and controls its replication among database
partitions. To take advantage of all the provisioned R/W throughput, keys
should be as random as possible. For more information about ``hash_key``,
please see `Amazon's developer guide`_.

The second key is optional. It is called the ``range_key``. The ``range_key``
is used to logically group data sharing a given ``hash_key``.
:ref:`More information below <range-key>`.

Data access relying either on the ``hash_key`` alone or on both the
``hash_key`` and the ``range_key`` is fast and cheap. All other options are
**very** expensive.
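The advice that hash keys should be as random as possible can be illustrated
with a small plain-Python sketch. This is not part of dynamodb-mapper and the
function name is illustrative: it simply shows how naturally sequential
identifiers can be turned into well-distributed keys by hashing them first.

```python
import hashlib

def distributed_hash_key(natural_id):
    """Derive a well-distributed hash key from a naturally sequential id.

    Sequential ids (1, 2, 3, ...) tend to concentrate load on a few
    partitions; hashing them first spreads items evenly.
    """
    digest = hashlib.md5(str(natural_id).encode("utf-8")).hexdigest()
    return digest[:16]  # 64 bits of the digest are plenty for a short key

# Sequential ids map to unrelated-looking, well-spread keys
keys = {distributed_hash_key(i) for i in range(1000)}
assert len(keys) == 1000  # no collisions on this small range
```

The trade-off is that hashed keys lose any meaning they had, so this pattern
only makes sense when items are always fetched by their exact id anyway.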
We intend to add migration tools to Dynamodb-mapper in a later revision but
do not expect miracles in this area. This is why correctly modeling your data
is crucial with DynamoDB.

Creating the table
==================

Unlike other NoSQL engines like MongoDB, tables must be created and managed
explicitly. At the moment, dynamodb-mapper abstracts only the initial table
creation. Other lifecycle management operations may be done directly via
Boto.

To create the table, use :py:meth:`~.ConnectionBorg.create_table` with the
model class as first argument. When calling this method, you must specify how
much throughput you want to provision for this table. Throughput is measured
as the number of atomic KB requested or sent per second. For more
information, please see `Amazon's official documentation`_.

::

    from dynamodb_mapper.model import DynamoDBModel, ConnectionBorg

    conn = ConnectionBorg()
    conn.create_table(MyModel, read_units=10, write_units=10, wait_for_active=True)

Important note: unlike most databases, table creation may take up to 1
minute. During this time, the table is *not* usable. Also, you can not have
more than 10 tables in ``CREATING`` or ``DELETING`` state at any given time
for your whole Amazon account. This is an Amazon's DynamoDB limitation.

The connection manager automatically reads your credentials from either:

- ``/etc/boto.cfg``
- ``~/.boto``
- or the ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment
  variables

If none of these places defines them or if you want to overload them, please
use :py:meth:`~.ConnectionBorg.set_credentials` before calling
``create_table``. For more information on the connection manager, please see
:py:class:`~.ConnectionBorg`.

Region
------

To change the AWS region from the default ``us-east-1``, use
:py:meth:`~.ConnectionBorg.set_region` before any method that creates a
connection. The region defaults to ``RegionInfo:us-east-1``.
You can list the currently available regions like this:

::

    >>> import boto.dynamodb
    >>> boto.dynamodb.regions()
    [RegionInfo:us-east-1, RegionInfo:us-west-1, RegionInfo:us-west-2,
    RegionInfo:ap-northeast-1, RegionInfo:ap-southeast-1, RegionInfo:eu-west-1]

.. TODO: more documentation/features on table lifecycle

Advanced usage
==============

Namespacing the models
----------------------

This is more advice than a feature. In DynamoDB, each customer is allocated a
single database. It is highly recommended to namespace your tables with a
name of the form ``<application>-<environment>-<model>``.

Deep schema definition and validation with Onctuous
---------------------------------------------------

Onctuous (http://pypi.python.org/pypi/onctuous) has been integrated into
DynamoDB-Mapper as part of the 1.8.0 release cycle.

Before writing any validator relying on Onctuous, there is a crucial point to
take into account: validators are run when loading from DynamoDB *and* when
saving to DynamoDB. ``save`` stores the output of the validators, while the
reading functions feed the validators with raw DynamoDB values, that is to
say, the serialized output of the validators. Hence, validators must accept
both serialized and already de-serialized input. As of Onctuous 0.5.2,
``Coerce`` can safely do that as it checks the type before attempting
anything.

To sum up, schema entries of the form:

- base types (``int``, ``unicode``, ``float``, ``dict``, ``list``, ...) work
  seamlessly;
- the ``datetime`` type keeps the same special behavior as before;
- ``[validators]`` and ``{'keyname': validators}`` are automatically
  (de-)serialized;
- callable validators (``All``, ``Range``, ...) MUST accept both serialized
  and de-serialized input.

Here is a basic schema example using deep validation:

::

    from dynamodb_mapper.model import DynamoDBModel
    from onctuous.validators import Match, Length, All, Coerce
    from datetime import datetime

    class Article(DynamoDBModel):
        __table__ = "Article"
        __hash_key__ = "slug"
        __schema__ = {
            # Regex validation.
            # Input and output are unicode, so no coercion problem.
            "slug": Match("^[a-z0-9-]+$"),

            # Regular title and body definition
            "title": unicode,
            "body": unicode,

            # Special case for dates. Note that you would have to handle
            # (de-)serialization yourself if you wanted to apply conditions.
            "published_date": datetime,

            # List of tags. Unicode is forced here as an example even though
            # it is not strictly speaking needed.
            "tags": [All(Coerce(unicode), Length(min=3, max=15))],
        }

.. _auto-increment-when-to-use:

Using auto-incrementing index
-----------------------------

For those coming from a SQL-like world (or even from MongoDB and its UUIDs),
adding an ID field or relying on a default one has become automatic. However,
these environments are not limited to 2 indexes. Moreover, DynamoDB has no
built-in support for UUIDs. Nonetheless, Dynamodb-mapper implements this
feature at a higher level; for more technical background, please see the
:py:class:`~.autoincrement_int` reference documentation on the internal
implementation.

If the field value is left to its default value of 0, a new ``hash_key`` will
automatically be generated when saving. Otherwise, the item is inserted at
the specified ``hash_key``.

Before using this feature, make sure you *really need it*. In most cases
another field can be used in its place. A good hint is "which field would I
have marked UNIQUE in SQL?".

- For users, the ``email`` or ``login`` field should do it.
- For blog posts, ``permalink`` could do it too.
- For orders, ``datetime`` is a good choice.

In some applications, you need a combination of 2 fields to be unique. You
may then consider using one as the ``hash_key`` and the other as the
``range_key``, or, if the ``range_key`` is needed for another purpose, try
combining them.

At Ludia, this is a feature we do not use anymore in our games at the time of
writing. So, when to use it? Some applications still need a ticket-like
approach where dates could be confusing for the end user. The best example of
this is a bug tracking system.
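Combining two fields into a single ``hash_key``, as suggested above, can be
done with a trivial helper. This sketch is not part of dynamodb-mapper; the
function name, separator and example values are illustrative only.

```python
def combined_hash_key(field_a, field_b, separator=u"/"):
    """Build a composite hash key from two fields whose combination is
    unique. The separator must not occur in field_a, otherwise two
    different pairs could produce the same key."""
    assert separator not in field_a
    return field_a + separator + field_b

# e.g. a login plus an order date, unique together
key = combined_hash_key(u"john.doe", u"2012-12-21")
print(key)  # → john.doe/2012-12-21
```

The same helper can also be used at read time, since you can rebuild the
exact key from the two original values.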
Use case: Bugtracking System
----------------------------

::

    from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

    class Ticket(DynamoDBModel):
        __table__ = u"bugtracker-dev-ticket"
        __hash_key__ = u"ticket_number"
        __schema__ = {
            u"ticket_number": autoincrement_int,
            u"title": unicode,
            u"description": unicode,
            u"tags": set, # target, version, priority, ..., order does not matter
            u"comments": list, # probably not the best because of the 64KB limitation...
            #...
        }

    # Create a new ticket and auto-generate an ID
    ticket = Ticket()
    ticket.title = u"Chuck Norris is the reason why Waldo hides"
    ticket.tags = set([u'priority:critical', u'version:yesterday'])
    ticket.description = u"Ludia needs to create a new social game to help people all around the world find him again. Where is Waldo?"
    ticket.comments.append(u"...")
    ticket.save()
    print ticket.ticket_number # A new id has been generated

    # Create a new ticket and force the ID
    ticket = Ticket()
    ticket.ticket_number = 42
    ticket.title = u"foo/bar"
    ticket.save() # create or replace item #42
    print ticket.ticket_number # id has not changed

To prevent accidental data overwrites when saving to an arbitrary location,
please see the detailed presentation of :ref:`saving`.

.. Suggestion: remove the range_key limitation when using `autoincrement_int`.
   Might be useful to store revisions, for example.

Please note that ``hash_key=-1`` is currently reserved and nothing can be
stored at this index.

You can not use ``autoincrement_int`` and a ``range_key`` at the same time.
In the bug tracker example above, it also means that ticket numbers are
allocated at the application scope, not per project.

This feature is only part of Dynamodb-mapper. When using another mapper or
direct data access, you might *corrupt* the counter. Please see the
:py:class:`~.autoincrement_int` reference documentation for implementation
details and technical limitations.
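The counter pattern behind auto-incrementing indexes can be sketched in plain
Python. This is only an illustration of the general idea (an atomic increment
on a counter stored at the reserved ``hash_key=-1``), not the actual
dynamodb-mapper implementation; the ``table`` dict and the ``"value"`` field
name are stand-ins.

```python
def allocate_id(table):
    """Sketch of the auto-increment pattern: increment a counter stored
    at the reserved hash_key -1 and use the result as the new item id.
    In DynamoDB this increment would be an atomic UpdateItem; here
    `table` is just a plain dict standing in for the table."""
    counter = table.setdefault(-1, {"value": 0})
    counter["value"] += 1
    return counter["value"]

table = {}
first_id = allocate_id(table)          # → 1
ticket_number = allocate_id(table)     # → 2
table[ticket_number] = {u"title": u"Chuck Norris is the reason why Waldo hides"}
```

This also makes the corruption warning above concrete: any other client that
writes to item ``-1`` without the same convention breaks ID allocation.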
.. _range-key:

Using a range_key
-----------------

Models may define a second key index called ``range_key``. While the
``hash_key`` only allows dict-like access, the ``range_key`` allows grouping
multiple items under a single ``hash_key`` and further filtering them.

For example, let's say you have a customer and want to track all their
orders. The naive/SQL-like implementation would be:

::

    from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

    class Customer(DynamoDBModel):
        __table__ = u"myapp-dev-customers"
        __hash_key__ = u"login"
        __schema__ = {
            u"login": unicode,
            u"order_ids": set,
            #...
        }

    class Order(DynamoDBModel):
        __table__ = u"myapp-dev-orders"
        __hash_key__ = u"order_id"
        __schema__ = {
            u"order_id": autoincrement_int,
            #...
        }

    # Get all orders for customer "John Doe"
    customer = Customer(u"John Doe")
    order_generator = Order.get_batch(customer.order_ids)

But this approach has many drawbacks.

- It is expensive:

  - an update to generate a new auto-incremented ID;
  - an insertion for the new order item;
  - an update to add the new order id to the customer.

- It is risky: items are limited to 64KB but the ``order_ids`` set has no
  growth limit.
- To get all orders from a given customer, you need to read the customer
  first and then use a :py:meth:`~.DynamoDBModel.get_batch` request.

As a first enhancement, and to spare a request, you can use ``datetime``
instead of ``autoincrement_int`` for the key ``order_id``. But with the power
of range keys, you could get all orders in a single request:

::

    from dynamodb_mapper.model import DynamoDBModel
    from datetime import datetime

    class Customer(DynamoDBModel):
        __table__ = u"myapp-dev-customers"
        __hash_key__ = u"login"
        __schema__ = {
            u"login": unicode,
            #u"orders": set, => this field is not needed anymore
            #...
        }

    class Order(DynamoDBModel):
        __table__ = u"myapp-dev-orders"
        __hash_key__ = u"login"
        __range_key__ = u"order_id"
        __schema__ = {
            u"login": unicode,
            u"order_id": datetime,
            #...
        }

    # Get all orders for customer "John Doe"
    Order.query(u"John Doe")

Not only is this approach better, it is also much more powerful. We could
easily limit the result count, sort the orders in reverse order or filter
them by creation date if needed. For more background on the querying system,
please see the accessing data section of this manual.

.. _using-default-values:

Default values
--------------

When instantiating a model, all fields are initialised to "neutral" values.
For containers (``dict``, ``set``, ``list``, ...) it is the empty container;
for ``unicode``, it is the empty string; for numbers, 0...

It is also possible to specify the values taken by the fields at
instantiation time, either with a ``__defaults__`` dict or directly in
``__init__``. The former applies to all new instances while the latter is
obviously on a per-instance basis and has a higher precedence.

``__defaults__`` is a ``{u'keyname': default_value}`` dict. The ``__init__``
syntax follows the same logic: ``Model(keyname=default_value, ...)``.

``default_value`` can either be a scalar value or a callable with no argument
returning a scalar value. The value must be of a type matching the schema
definition, otherwise a ``TypeError`` exception is raised.

Example:

::

    from dynamodb_mapper.model import DynamoDBModel, utc_tz
    from datetime import datetime

    # Define a model with defaults
    class PlayerStrength(DynamoDBModel):
        __table__ = u"player_strength"
        __hash_key__ = u"player_id"
        __schema__ = {
            u"player_id": int,
            u"strength": unicode,
            u"last_update": datetime,
        }
        __defaults__ = {
            u"strength": u'weak', # scalar default value
            u"last_update": lambda: datetime.now(utc_tz), # callable default value
        }

>>> player = PlayerStrength(strength=u"chuck norris") # overload one of the defaults
>>> print player.strength
chuck norris
>>> print player.last_update
2012-12-21 13:37:00.00000

Related exceptions
==================

SchemaError
-----------

.. autoclass:: SchemaError

InvalidRegionError
------------------
.. autoclass:: InvalidRegionError
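As a closing illustration of the "Default values" section above, the
precedence rule (constructor arguments override ``__defaults__``, and
callable defaults are invoked once per instance) can be sketched without
dynamodb-mapper at all. The helper name is hypothetical, and the stdlib
``timezone.utc`` stands in for dynamodb-mapper's ``utc_tz``.

```python
from datetime import datetime, timezone

def apply_defaults(defaults, **kwargs):
    """Sketch of the precedence rule: constructor keyword arguments win
    over __defaults__, and callable defaults are evaluated per call so
    each instance gets a fresh value."""
    fields = {key: (value() if callable(value) else value)
              for key, value in defaults.items()}
    fields.update(kwargs)  # per-instance values have higher precedence
    return fields

defaults = {
    u"strength": u"weak",                                # scalar default
    u"last_update": lambda: datetime.now(timezone.utc),  # callable default
}

player = apply_defaults(defaults, strength=u"chuck norris")
print(player[u"strength"])  # → chuck norris
```

Using a callable for ``last_update`` matters: a plain ``datetime.now(...)``
value would be evaluated once at class definition time and shared by every
instance.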