.. _accessing-data:

##############
Accessing data
##############

Amazon's DynamoDB offers four data access methods. Dynamodb-mapper directly
exposes them. They are documented here from the fastest to the slowest. It is
interesting to note that, because of Amazon's throughput credit, the slowest
is also the most expensive.

Strong vs eventual consistency
==============================

While this is not strictly speaking related to the mapper itself, it seems
important to clarify this point as it is a key feature of Amazon's DynamoDB.

Tables are spread among partitions for redundancy and performance purposes.
When writing an item, it takes some time to replicate it to all partitions;
usually less than a second, according to the technical specifications.
Accessing an item right after writing it might therefore get you an outdated
version.

In most applications, this will not be an issue. In this case we say that data
is 'eventually consistent'. If this matters, you may request 'strong
consistency', thus asking for the most up to date version. 'Strong
consistency' is also more than twice as expensive in terms of capacity units
as 'eventual consistency', and a bit slower too, so it is important to keep
this aspect in mind.

'Eventual consistency' is the default behavior in all requests. It is also the
only available option for ``scan`` and ``get_batch``.

.. todo: get with update

Querying
========

The four DynamoDB query methods are:

- :py:meth:`~.DynamoDBModel.get`
- :py:meth:`~.DynamoDBModel.get_batch`
- :py:meth:`~.DynamoDBModel.query`
- :py:meth:`~.DynamoDBModel.scan`

They all are ``classmethods`` returning instance(s) of the model. To get
object(s):

>>> obj = MyModelClass.get(...)

Use ``get`` or ``get_batch`` to get one or more items by exact id. If you need
more than one item, it is highly recommended to use ``get_batch`` instead of
``get`` in a loop, as it avoids the cost of multiple network calls.
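To see why batching beats a loop of single ``get`` calls, it helps to picture
the chunking: DynamoDB's ``BatchGetItem`` operation accepts up to 100 keys per
request, so fetching ``n`` items costs roughly ``ceil(n / 100)`` round trips
instead of ``n``. The following is a minimal, library-independent sketch of
that chunking logic; the helper names are illustrative, not dynamodb-mapper's
actual internals. ::

    # Illustration only: plain-Python sketch of how a batch getter can chunk
    # keys to limit network round trips. The 100-key limit mirrors DynamoDB's
    # BatchGetItem limit; this is not dynamodb-mapper's actual code.

    BATCH_LIMIT = 100

    def chunk_keys(keys, size=BATCH_LIMIT):
        """Yield successive chunks of at most `size` keys."""
        for start in range(0, len(keys), size):
            yield keys[start:start + size]

    def count_round_trips(n_items, size=BATCH_LIMIT):
        """Network calls needed: one per chunk instead of one per item."""
        return (n_items + size - 1) // size

    print(count_round_trips(250))  # 3 batch calls instead of 250 single gets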
However, if strong consistency is required, ``get`` is the only option, as
DynamoDB does not support it in batch mode.

When objects are logically grouped using a :ref:`range_key`, it is possible to
get all of them in a single, fast query, provided they all have the same known
``hash_key``. :py:meth:`~.DynamoDBModel.query` also supports a couple of handy
filters.

When querying, you pay only for the results you really get; this is what makes
filtering interesting. Filters work both for strings and for numbers. The
``BEGINSWITH`` filter is extremely handy for namespaced ``range_key``. When
using the ``EQ(x)`` filter, it may be preferable for readability to rewrite it
as a regular ``get``; the cost in terms of read units is strictly speaking the
same.

If needed, :py:meth:`~.DynamoDBModel.query` supports ``strong consistency``,
reversing the scan order and limiting the result count.

The last method, ``scan``, is like a generalised version of ``query``. Any
field can be filtered and more filters are available; there is a complete list
on the Boto website. Nonetheless, ``scan`` results are *always* ``eventually
consistent``.

This said, ``scan`` is extremely expensive in terms of throughput and its use
should be avoided as much as possible. It may even negatively impact pending
regular requests, causing them to repeatedly fail. The underlying Boto layer
tries to handle this gracefully, but your overall application's performance
and user experience might suffer a lot. For more information about the impact
of ``scan``, please see Amazon's developer guide.

To retrieve the results of :py:meth:`~.DynamoDBModel.get_batch`,
:py:meth:`~.DynamoDBModel.query` and :py:meth:`~.DynamoDBModel.scan`, just
loop over the result list. Technically, they all rely on high-level generators
abstracting the query chunking logic.

All querying methods persist the original raw object for
:ref:`raise_on_conflict` and transactions.
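To illustrate why ``BEGINSWITH`` pairs so well with namespaced ``range_key``
values, here is a plain-Python sketch with no DynamoDB involved: keys carry a
namespace prefix such as ``comment:`` or ``like:``, and a begins-with test
selects a whole namespace in one pass, much like the real filter does server
side. The ``"namespace:timestamp"`` key format is an assumption for this
example. ::

    # Plain-Python illustration of a namespaced range_key. In DynamoDB the
    # equivalent selection would be done server side with the BEGINSWITH
    # filter; the "namespace:timestamp" key format is assumed for the example.

    items = [
        {"range_key": "comment:2012-12-21T13:37:00", "body": "First!"},
        {"range_key": "like:2012-12-21T13:38:00",    "user": "alice"},
        {"range_key": "comment:2012-12-22T09:00:00", "body": "Nice post"},
    ]

    def begins_with(items, prefix):
        """Local stand-in for the BEGINSWITH range_key filter."""
        return [item for item in items if item["range_key"].startswith(prefix)]

    comments = begins_with(items, "comment:")
    print(len(comments))  # 2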
Use case: Get user ``Chuck Norris``
-----------------------------------

This first example is pretty straightforward. ::

    from dynamodb_mapper.model import DynamoDBModel

    # Example model
    class MyUserModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"fullname"
        __schema__ = {
            # This is probably a good key in a real world application because of homonyms
            u"fullname": unicode,
            # [...]
        }

    # Get the user
    myuser = MyUserModel.get(u"Chuck Norris")

    # Do some work
    print "myuser({})".format(myuser.fullname)

Use case: Get only objects after ``2012-12-21 13:37``
-----------------------------------------------------

At the moment, filters only accept strings and numbers, which is a limitation
if you need to filter dates for time based applications. To work around it,
you need to export the ``datetime`` object to the internal W3CDTF
representation. ::

    from datetime import datetime
    from dynamodb_mapper.model import DynamoDBModel, utc_tz
    from boto.dynamodb.condition import *

    # Example model
    class MyDataModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"h_key"
        __range_key__ = u"r_key"
        __schema__ = {
            u"h_key": int,
            u"r_key": datetime,
            # [...]
        }

    # Build the date condition and export it to its W3CDTF representation
    date_obj = datetime(2012, 12, 21, 13, 37, 0, tzinfo=utc_tz)
    date_str = date_obj.astimezone(utc_tz).strftime("%Y-%m-%dT%H:%M:%S.%f%z")

    # Get the results generator
    mydata_generator = MyDataModel.query(
        hash_key_value=42,
        range_key_condition=GT(date_str)
    )

    # Do some work
    for data in mydata_generator:
        print "data({}, {})".format(data.h_key, data.r_key)

Use case: Query the most up to date revision of a blogpost
----------------------------------------------------------

There is no builtin filter for this, but it can easily be achieved using a
conjunction of the ``limit`` and ``reverse`` parameters. As ``query`` returns
a generator, the ``limit`` parameter could seem to be of no use. However,
DynamoDB internally sends results in batches of up to 1MB and you pay for all
the results it reads, so
you'd better use it. ::

    from dynamodb_mapper.model import DynamoDBModel

    # Example model
    class MyBlogPosts(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"post_id"
        __range_key__ = u"revision"
        __schema__ = {
            u"post_id": int,
            u"revision": int,
            u"title": unicode,
            u"tags": set,
            u"content": unicode,
            # [...]
        }

    # Get the results generator
    mypost_last_revision_generator = MyBlogPosts.query(
        hash_key_value=42,
        limit=1,
        reverse=True
    )

    # Get the actual blog post to render
    try:
        mypost = mypost_last_revision_generator.next()
    except StopIteration:
        mypost = None  # Not found

This example could easily be adapted to get the first revision or the ``n``
first comments. You may also combine it with a condition to get pagination
like behavior.

.. TODO: use case with prefixing
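The ``limit``/``reverse`` pattern above can also be pictured locally: DynamoDB
returns range keys in ascending order, so reversing the order and limiting to
one result yields the highest revision. Below is a stdlib-only sketch of that
selection; the in-memory ``rows`` table and the helper name are assumptions
standing in for a real DynamoDB table. ::

    # Local illustration of the reverse + limit=1 pattern: among all revisions
    # of post 42, keep only the highest one. The in-memory "rows" list stands
    # in for DynamoDB, which stores range keys in ascending order.

    rows = [
        {"post_id": 42, "revision": 1, "title": "Draft"},
        {"post_id": 42, "revision": 2, "title": "Typo fixes"},
        {"post_id": 42, "revision": 3, "title": "Final"},
    ]

    def latest_revision(rows, post_id):
        """Order matching rows by descending revision, keep the first."""
        matching = (r for r in rows if r["post_id"] == post_id)
        ordered = sorted(matching, key=lambda r: r["revision"], reverse=True)
        return next(iter(ordered[:1]), None)  # None plays the "not found" role

    print(latest_revision(rows, 42)["title"])  # Final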