.. _data-models:

###########
Data models
###########

.. currentmodule:: dynamodb_mapper.model

Models are formal Python objects telling the mapper how to map DynamoDB data
to regular Python objects and vice versa.

Bare minimal model
==================

A bare minimal model with only a ``hash_key`` needs only to define a
``__table__`` and a ``hash_key``.

::

    from dynamodb_mapper.model import DynamoDBModel

    class MyModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"key"
        __schema__ = {
            u"key": unicode,
            #...
        }

The model can then be instantiated and used like any other Python class.

>>> data = MyModel()
>>> data.key = u"foo/bar"

Initial values can even be specified directly in the constructor. Otherwise,
unless :ref:`defaults are provided <using-default-values>`, all fields are
set to ``None``.

>>> data = MyModel(key=u"foo/bar")
>>> repr(data.key)
"u'foo/bar'"

About keys
==========

While this is not strictly speaking related to the mapper itself, it seems
important to clarify this point as it is a key feature of Amazon's DynamoDB.

Amazon's DynamoDB supports 1 or 2 keys per object. They must be specified at
table creation time and can not be altered afterwards: neither renamed, added
nor removed. It is not even possible to change their values without deleting
and re-inserting the object in the table.

The first key is mandatory. It is called the ``hash_key``. The ``hash_key``
is used to access data and controls its replication among database
partitions. To take advantage of all the provisioned R/W throughput, keys
should be as random as possible. For more information about ``hash_key``,
please see `Amazon's developer guide`_.

The second key is optional. It is called the ``range_key``. The ``range_key``
is used to logically group data sharing a given ``hash_key``.
:ref:`More information below <range-key>`.

Data access relying either on the ``hash_key`` alone or on both the
``hash_key`` and the ``range_key`` is fast and cheap. All other options are
**very** expensive.
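The advice that hash keys should be as random as possible can be illustrated
with a small plain-Python sketch. This is not part of dynamodb-mapper and the
function name is illustrative: it simply shows how naturally sequential
identifiers can be turned into well-distributed keys by hashing them first.

```python
import hashlib

def distributed_hash_key(natural_id):
    """Derive a well-distributed hash key from a naturally sequential id.

    Sequential ids (1, 2, 3, ...) tend to concentrate load on a few
    partitions; hashing them first spreads items evenly.
    """
    digest = hashlib.md5(str(natural_id).encode("utf-8")).hexdigest()
    return digest[:16]  # 64 bits of the digest are plenty for a short key

# Sequential ids map to unrelated-looking, well-spread keys
keys = {distributed_hash_key(i) for i in range(1000)}
assert len(keys) == 1000  # no collisions on this small range
```

The trade-off is that hashed keys lose any meaning they had, so this pattern
only makes sense when items are always fetched by their exact id anyway.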
We intend to add migration tools to Dynamodb-mapper in a later revision but
do not expect miracles in this area. This is why correctly modeling your data
is crucial with DynamoDB.

Creating the table
==================

Unlike other NoSQL engines like MongoDB, tables must be created and managed
explicitly. At the moment, dynamodb-mapper abstracts only the initial table
creation. Other lifecycle management operations may be done directly via
Boto.

To create the table, use :py:meth:`~.ConnectionBorg.create_table` with the
model class as first argument. When calling this method, you must specify how
much throughput you want to provision for this table. Throughput is measured
as the number of atomic KB requested or sent per second. For more
information, please see `Amazon's official documentation`_.

::

    from dynamodb_mapper.model import DynamoDBModel, ConnectionBorg

    conn = ConnectionBorg()
    conn.create_table(MyModel, read_units=10, write_units=10, wait_for_active=True)

Important note: unlike most databases, table creation may take up to 1
minute. During this time, the table is *not* usable. Also, you can not have
more than 10 tables in ``CREATING`` or ``DELETING`` state at any given time
for your whole Amazon account. This is an Amazon's DynamoDB limitation.

The connection manager automatically reads your credentials from either:

- ``/etc/boto.cfg``
- ``~/.boto``
- or the ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment
  variables

If none of these places defines them or if you want to overload them, please
use :py:meth:`~.ConnectionBorg.set_credentials` before calling
``create_table``. For more information on the connection manager, please see
:py:class:`~.ConnectionBorg`.

Region
------

To change the AWS region from the default ``us-east-1``, use
:py:meth:`~.ConnectionBorg.set_region` before any method that creates a
connection. The region defaults to ``RegionInfo:us-east-1``.
You can list the currently available regions like this:

::

    >>> import boto.dynamodb
    >>> boto.dynamodb.regions()
    [RegionInfo:us-east-1, RegionInfo:us-west-1, RegionInfo:us-west-2,
    RegionInfo:ap-northeast-1, RegionInfo:ap-southeast-1, RegionInfo:eu-west-1]

.. TODO: more documentation/features on table lifecycle

Advanced usage
==============

Namespacing the models
----------------------

This is more advice than a feature. In DynamoDB, each customer is allocated a
single database. It is highly recommended to namespace your tables with a
name of the form ``<application>-<environment>-<model>``.

Deep schema definition and validation with Onctuous
---------------------------------------------------

Onctuous (http://pypi.python.org/pypi/onctuous) has been integrated into
DynamoDB-Mapper as part of the 1.8.0 release cycle.

Before writing any validator relying on Onctuous, there is a crucial point to
take into account: validators are run when loading from DynamoDB *and* when
saving to DynamoDB. ``save`` stores the output of the validators, while the
reading functions feed the validators with raw DynamoDB values, that is to
say, the serialized output of the validators. Hence, validators must accept
both serialized and already de-serialized input. As of Onctuous 0.5.2,
``Coerce`` can safely do that as it checks the type before attempting
anything.

To sum up, schema entries of the form:

- base types (``int``, ``unicode``, ``float``, ``dict``, ``list``, ...) work
  seamlessly;
- the ``datetime`` type keeps the same special behavior as before;
- ``[validators]`` and ``{'keyname': validators}`` are automatically
  (de-)serialized;
- callable validators (``All``, ``Range``, ...) MUST accept both serialized
  and de-serialized input.

Here is a basic schema example using deep validation:

::

    from dynamodb_mapper.model import DynamoDBModel
    from onctuous.validators import Match, Length, All, Coerce
    from datetime import datetime

    class Article(DynamoDBModel):
        __table__ = "Article"
        __hash_key__ = "slug"
        __schema__ = {
            # Regex validation.
            # Input and output are unicode, so no coercion problem.
            "slug": Match("^[a-z0-9-]+$"),

            # Regular title and body definition
            "title": unicode,
            "body": unicode,

            # Special case for dates. Note that you would have to handle
            # (de-)serialization yourself if you wanted to apply conditions.
            "published_date": datetime,

            # List of tags. Unicode is forced here as an example even though
            # it is not strictly speaking needed.
            "tags": [All(Coerce(unicode), Length(min=3, max=15))],
        }

.. _auto-increment-when-to-use:

Using auto-incrementing index
-----------------------------

For those coming from a SQL-like world (or even from MongoDB and its UUIDs),
adding an ID field or relying on a default one has become automatic. However,
these environments are not limited to 2 indexes. Moreover, DynamoDB has no
built-in support for UUIDs. Nonetheless, Dynamodb-mapper implements this
feature at a higher level; for more technical background, please see the
:py:class:`~.autoincrement_int` reference documentation on the internal
implementation.

If the field value is left to its default value of 0, a new ``hash_key`` will
automatically be generated when saving. Otherwise, the item is inserted at
the specified ``hash_key``.

Before using this feature, make sure you *really need it*. In most cases
another field can be used in its place. A good hint is "which field would I
have marked UNIQUE in SQL?".

- For users, the ``email`` or ``login`` field should do it.
- For blog posts, ``permalink`` could do it too.
- For orders, ``datetime`` is a good choice.

In some applications, you need a combination of 2 fields to be unique. You
may then consider using one as the ``hash_key`` and the other as the
``range_key``, or, if the ``range_key`` is needed for another purpose, try
combining them.

At Ludia, this is a feature we do not use anymore in our games at the time of
writing. So, when to use it? Some applications still need a ticket-like
approach where dates could be confusing for the end user. The best example of
this is a bug tracking system.
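Combining two fields into a single ``hash_key``, as suggested above, can be
done with a trivial helper. This sketch is not part of dynamodb-mapper; the
function name, separator and example values are illustrative only.

```python
def combined_hash_key(field_a, field_b, separator=u"/"):
    """Build a composite hash key from two fields whose combination is
    unique. The separator must not occur in field_a, otherwise two
    different pairs could produce the same key."""
    assert separator not in field_a
    return field_a + separator + field_b

# e.g. a login plus an order date, unique together
key = combined_hash_key(u"john.doe", u"2012-12-21")
print(key)  # → john.doe/2012-12-21
```

The same helper can also be used at read time, since you can rebuild the
exact key from the two original values.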
Use case: Bugtracking System
----------------------------

::

    from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

    class Ticket(DynamoDBModel):
        __table__ = u"bugtracker-dev-ticket"
        __hash_key__ = u"ticket_number"
        __schema__ = {
            u"ticket_number": autoincrement_int,
            u"title": unicode,
            u"description": unicode,
            u"tags": set, # target, version, priority, ..., order does not matter
            u"comments": list, # probably not the best because of the 64KB limitation...
            #...
        }

    # Create a new ticket and auto-generate an ID
    ticket = Ticket()
    ticket.title = u"Chuck Norris is the reason why Waldo hides"
    ticket.tags = set([u'priority:critical', u'version:yesterday'])
    ticket.description = u"Ludia needs to create a new social game to help people all around the world find him again. Where is Waldo?"
    ticket.comments.append(u"...")
    ticket.save()
    print ticket.ticket_number # A new id has been generated

    # Create a new ticket and force the ID
    ticket = Ticket()
    ticket.ticket_number = 42
    ticket.title = u"foo/bar"
    ticket.save() # create or replace item #42
    print ticket.ticket_number # id has not changed

To prevent accidental data overwrites when saving to an arbitrary location,
please see the detailed presentation of :ref:`saving`.

.. Suggestion: remove the range_key limitation when using `autoincrement_int`.
   Might be useful to store revisions, for example.

Please note that ``hash_key=-1`` is currently reserved and nothing can be
stored at this index.

You can not use ``autoincrement_int`` and a ``range_key`` at the same time.
In the bug tracker example above, it also means that ticket numbers are
allocated at the application scope, not per project.

This feature is only part of Dynamodb-mapper. When using another mapper or
direct data access, you might *corrupt* the counter. Please see the
:py:class:`~.autoincrement_int` reference documentation for implementation
details and technical limitations.
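The counter pattern behind auto-incrementing indexes can be sketched in plain
Python. This is only an illustration of the general idea (an atomic increment
on a counter stored at the reserved ``hash_key=-1``), not the actual
dynamodb-mapper implementation; the ``table`` dict and the ``"value"`` field
name are stand-ins.

```python
def allocate_id(table):
    """Sketch of the auto-increment pattern: increment a counter stored
    at the reserved hash_key -1 and use the result as the new item id.
    In DynamoDB this increment would be an atomic UpdateItem; here
    `table` is just a plain dict standing in for the table."""
    counter = table.setdefault(-1, {"value": 0})
    counter["value"] += 1
    return counter["value"]

table = {}
first_id = allocate_id(table)          # → 1
ticket_number = allocate_id(table)     # → 2
table[ticket_number] = {u"title": u"Chuck Norris is the reason why Waldo hides"}
```

This also makes the corruption warning above concrete: any other client that
writes to item ``-1`` without the same convention breaks ID allocation.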
.. _range-key:

Using a range_key
-----------------

Models may define a second key index called ``range_key``. While the
``hash_key`` only allows dict-like access, the ``range_key`` allows grouping
multiple items under a single ``hash_key`` and further filtering them.

For example, let's say you have a customer and want to track all their
orders. The naive/SQL-like implementation would be:

::

    from dynamodb_mapper.model import DynamoDBModel, autoincrement_int

    class Customer(DynamoDBModel):
        __table__ = u"myapp-dev-customers"
        __hash_key__ = u"login"
        __schema__ = {
            u"login": unicode,
            u"order_ids": set,
            #...
        }

    class Order(DynamoDBModel):
        __table__ = u"myapp-dev-orders"
        __hash_key__ = u"order_id"
        __schema__ = {
            u"order_id": autoincrement_int,
            #...
        }

    # Get all orders for customer "John Doe"
    customer = Customer(u"John Doe")
    order_generator = Order.get_batch(customer.order_ids)

But this approach has many drawbacks.

- It is expensive:

  - an update to generate a new auto-incremented ID;
  - an insertion for the new order item;
  - an update to add the new order id to the customer.

- It is risky: items are limited to 64KB but the ``order_ids`` set has no
  growth limit.
- To get all orders from a given customer, you need to read the customer
  first and then use a :py:meth:`~.DynamoDBModel.get_batch` request.

As a first enhancement, and to spare a request, you can use ``datetime``
instead of ``autoincrement_int`` for the key ``order_id``. But with the power
of range keys, you could get all orders in a single request:

::

    from dynamodb_mapper.model import DynamoDBModel
    from datetime import datetime

    class Customer(DynamoDBModel):
        __table__ = u"myapp-dev-customers"
        __hash_key__ = u"login"
        __schema__ = {
            u"login": unicode,
            #u"orders": set, => this field is not needed anymore
            #...
        }

    class Order(DynamoDBModel):
        __table__ = u"myapp-dev-orders"
        __hash_key__ = u"login"
        __range_key__ = u"order_id"
        __schema__ = {
            u"login": unicode,
            u"order_id": datetime,
            #...
        }

    # Get all orders for customer "John Doe"
    Order.query(u"John Doe")

Not only is this approach better, it is also much more powerful. We could
easily limit the result count, sort the orders in reverse order or filter
them by creation date if needed. For more background on the querying system,
please see the accessing data section of this manual.

.. _using-default-values:

Default values
--------------

When instantiating a model, all fields are initialised to "neutral" values.
For containers (``dict``, ``set``, ``list``, ...) it is the empty container;
for ``unicode``, it is the empty string; for numbers, 0...

It is also possible to specify the values taken by the fields at
instantiation time, either with a ``__defaults__`` dict or directly in
``__init__``. The former applies to all new instances while the latter is
obviously on a per-instance basis and has a higher precedence.

``__defaults__`` is a ``{u'keyname': default_value}`` dict. The ``__init__``
syntax follows the same logic: ``Model(keyname=default_value, ...)``.

``default_value`` can either be a scalar value or a callable with no argument
returning a scalar value. The value must be of a type matching the schema
definition, otherwise a ``TypeError`` exception is raised.

Example:

::

    from dynamodb_mapper.model import DynamoDBModel, utc_tz
    from datetime import datetime

    # Define a model with defaults
    class PlayerStrength(DynamoDBModel):
        __table__ = u"player_strength"
        __hash_key__ = u"player_id"
        __schema__ = {
            u"player_id": int,
            u"strength": unicode,
            u"last_update": datetime,
        }
        __defaults__ = {
            u"strength": u'weak', # scalar default value
            u"last_update": lambda: datetime.now(utc_tz), # callable default value
        }

>>> player = PlayerStrength(strength=u"chuck norris") # overload one of the defaults
>>> print player.strength
chuck norris
>>> print player.last_update
2012-12-21 13:37:00.00000

Related exceptions
==================

SchemaError
-----------

.. autoclass:: SchemaError

InvalidRegionError
------------------
.. autoclass:: InvalidRegionError
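As a closing illustration of the "Default values" section above, the
precedence rule (constructor arguments override ``__defaults__``, and
callable defaults are invoked once per instance) can be sketched without
dynamodb-mapper at all. The helper name is hypothetical, and the stdlib
``timezone.utc`` stands in for dynamodb-mapper's ``utc_tz``.

```python
from datetime import datetime, timezone

def apply_defaults(defaults, **kwargs):
    """Sketch of the precedence rule: constructor keyword arguments win
    over __defaults__, and callable defaults are evaluated per call so
    each instance gets a fresh value."""
    fields = {key: (value() if callable(value) else value)
              for key, value in defaults.items()}
    fields.update(kwargs)  # per-instance values have higher precedence
    return fields

defaults = {
    u"strength": u"weak",                                # scalar default
    u"last_update": lambda: datetime.now(timezone.utc),  # callable default
}

player = apply_defaults(defaults, strength=u"chuck norris")
print(player[u"strength"])  # → chuck norris
```

Using a callable for ``last_update`` matters: a plain ``datetime.now(...)``
value would be evaluated once at class definition time and shared by every
instance.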