.. _accessing-data:

##############
Accessing data
##############

Amazon's DynamoDB offers four data access methods. Dynamodb-mapper directly
exposes them. They are documented here from the fastest to the slowest. It is
interesting to note that, because of Amazon's throughput credit, the slowest
is also the most expensive.

Strong vs eventual consistency
==============================

While this is not strictly speaking related to the mapper itself, it seems
important to clarify this point as it is a key feature of Amazon's DynamoDB.

Tables are spread among partitions for redundancy and performance purposes.
When writing an item, it takes some time to replicate it to all partitions;
usually less than a second, according to the technical specifications.
Accessing an item right after writing it might therefore get you an outdated
version.

In most applications, this will not be an issue. In this case we say that data
is 'eventually consistent'. If this matters, you may request 'strong
consistency', thus asking for the most up to date version. 'Strong
consistency' is also more than twice as expensive in terms of capacity units
as 'eventual consistency', and a bit slower too, so it is important to keep
this aspect in mind.

'Eventual consistency' is the default behavior in all requests. It is also the
only available option for ``scan`` and ``get_batch``.

.. todo: get with update

Querying
========

The four DynamoDB query methods are:

- :py:meth:`~.DynamoDBModel.get`
- :py:meth:`~.DynamoDBModel.get_batch`
- :py:meth:`~.DynamoDBModel.query`
- :py:meth:`~.DynamoDBModel.scan`

They all are ``classmethods`` returning instance(s) of the model. To get
object(s):

>>> obj = MyModelClass.get(...)

Use ``get`` or ``get_batch`` to get one or more items by exact id. If you need
more than one item, it is highly recommended to use ``get_batch`` instead of
``get`` in a loop, as it avoids the cost of multiple network calls.
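To see why batching beats a loop of single ``get`` calls, it helps to picture
the chunking: DynamoDB's ``BatchGetItem`` operation accepts up to 100 keys per
request, so fetching ``n`` items costs roughly ``ceil(n / 100)`` round trips
instead of ``n``. The following is a minimal, library-independent sketch of
that chunking logic; the helper names are illustrative, not dynamodb-mapper's
actual internals. ::

    # Illustration only: plain-Python sketch of how a batch getter can chunk
    # keys to limit network round trips. The 100-key limit mirrors DynamoDB's
    # BatchGetItem limit; this is not dynamodb-mapper's actual code.

    BATCH_LIMIT = 100

    def chunk_keys(keys, size=BATCH_LIMIT):
        """Yield successive chunks of at most `size` keys."""
        for start in range(0, len(keys), size):
            yield keys[start:start + size]

    def count_round_trips(n_items, size=BATCH_LIMIT):
        """Network calls needed: one per chunk instead of one per item."""
        return (n_items + size - 1) // size

    print(count_round_trips(250))  # 3 batch calls instead of 250 single gets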
However, if strong consistency is required, ``get`` is the only option, as
DynamoDB does not support it in batch mode.

When objects are logically grouped using a :ref:`range_key`, it is possible to
get all of them in a single, fast query, provided they all have the same known
``hash_key``. :py:meth:`~.DynamoDBModel.query` also supports a couple of handy
filters.

When querying, you pay only for the results you really get; this is what makes
filtering interesting. Filters work both for strings and for numbers. The
``BEGINSWITH`` filter is extremely handy for namespaced ``range_key``. When
using the ``EQ(x)`` filter, it may be preferable for readability to rewrite it
as a regular ``get``; the cost in terms of read units is strictly speaking the
same.

If needed, :py:meth:`~.DynamoDBModel.query` supports ``strong consistency``,
reversing the scan order and limiting the result count.

The last method, ``scan``, is like a generalised version of ``query``. Any
field can be filtered and more filters are available; there is a complete list
on the Boto website. Nonetheless, ``scan`` results are *always* ``eventually
consistent``.

This said, ``scan`` is extremely expensive in terms of throughput and its use
should be avoided as much as possible. It may even negatively impact pending
regular requests, causing them to repeatedly fail. The underlying Boto layer
tries to handle this gracefully, but your overall application's performance
and user experience might suffer a lot. For more information about the impact
of ``scan``, please see Amazon's developer guide.

To retrieve the results of :py:meth:`~.DynamoDBModel.get_batch`,
:py:meth:`~.DynamoDBModel.query` and :py:meth:`~.DynamoDBModel.scan`, just
loop over the result list. Technically, they all rely on high-level generators
abstracting the query chunking logic.

All querying methods persist the original raw object for
:ref:`raise_on_conflict` and transactions.
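To illustrate why ``BEGINSWITH`` pairs so well with namespaced ``range_key``
values, here is a plain-Python sketch with no DynamoDB involved: keys carry a
namespace prefix such as ``comment:`` or ``like:``, and a begins-with test
selects a whole namespace in one pass, much like the real filter does server
side. The ``"namespace:timestamp"`` key format is an assumption for this
example. ::

    # Plain-Python illustration of a namespaced range_key. In DynamoDB the
    # equivalent selection would be done server side with the BEGINSWITH
    # filter; the "namespace:timestamp" key format is assumed for the example.

    items = [
        {"range_key": "comment:2012-12-21T13:37:00", "body": "First!"},
        {"range_key": "like:2012-12-21T13:38:00",    "user": "alice"},
        {"range_key": "comment:2012-12-22T09:00:00", "body": "Nice post"},
    ]

    def begins_with(items, prefix):
        """Local stand-in for the BEGINSWITH range_key filter."""
        return [item for item in items if item["range_key"].startswith(prefix)]

    comments = begins_with(items, "comment:")
    print(len(comments))  # 2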
Use case: Get user ``Chuck Norris``
-----------------------------------

This first example is pretty straightforward. ::

    from dynamodb_mapper.model import DynamoDBModel

    # Example model
    class MyUserModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"fullname"
        __schema__ = {
            # This is probably a good key in a real world application because of homonyms
            u"fullname": unicode,
            # [...]
        }

    # Get the user
    myuser = MyUserModel.get(u"Chuck Norris")

    # Do some work
    print "myuser({})".format(myuser.fullname)

Use case: Get only objects after ``2012-12-21 13:37``
-----------------------------------------------------

At the moment, filters only accept strings and numbers, which is a limitation
if you need to filter dates for time based applications. To work around it,
you need to export the ``datetime`` object to the internal W3CDTF
representation. ::

    from datetime import datetime
    from dynamodb_mapper.model import DynamoDBModel, utc_tz
    from boto.dynamodb.condition import *

    # Example model
    class MyDataModel(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"h_key"
        __range_key__ = u"r_key"
        __schema__ = {
            u"h_key": int,
            u"r_key": datetime,
            # [...]
        }

    # Build the date condition and export it to its W3CDTF representation
    date_obj = datetime(2012, 12, 21, 13, 37, 0, tzinfo=utc_tz)
    date_str = date_obj.astimezone(utc_tz).strftime("%Y-%m-%dT%H:%M:%S.%f%z")

    # Get the results generator
    mydata_generator = MyDataModel.query(
        hash_key_value=42,
        range_key_condition=GT(date_str)
    )

    # Do some work
    for data in mydata_generator:
        print "data({}, {})".format(data.h_key, data.r_key)

Use case: Query the most up to date revision of a blogpost
----------------------------------------------------------

There is no builtin filter for this, but it can easily be achieved using a
conjunction of the ``limit`` and ``reverse`` parameters. As ``query`` returns
a generator, the ``limit`` parameter could seem to be of no use. However,
DynamoDB internally sends results in batches of up to 1MB and you pay for all
the results it reads, so
you'd better use it. ::

    from dynamodb_mapper.model import DynamoDBModel

    # Example model
    class MyBlogPosts(DynamoDBModel):
        __table__ = u"..."
        __hash_key__ = u"post_id"
        __range_key__ = u"revision"
        __schema__ = {
            u"post_id": int,
            u"revision": int,
            u"title": unicode,
            u"tags": set,
            u"content": unicode,
            # [...]
        }

    # Get the results generator
    mypost_last_revision_generator = MyBlogPosts.query(
        hash_key_value=42,
        limit=1,
        reverse=True
    )

    # Get the actual blog post to render
    try:
        mypost = mypost_last_revision_generator.next()
    except StopIteration:
        mypost = None  # Not found

This example could easily be adapted to get the first revision or the ``n``
first comments. You may also combine it with a condition to get pagination
like behavior.

.. TODO: use case with prefixing
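The ``limit``/``reverse`` pattern above can also be pictured locally: DynamoDB
returns range keys in ascending order, so reversing the order and limiting to
one result yields the highest revision. Below is a stdlib-only sketch of that
selection; the in-memory ``rows`` table and the helper name are assumptions
standing in for a real DynamoDB table. ::

    # Local illustration of the reverse + limit=1 pattern: among all revisions
    # of post 42, keep only the highest one. The in-memory "rows" list stands
    # in for DynamoDB, which stores range keys in ascending order.

    rows = [
        {"post_id": 42, "revision": 1, "title": "Draft"},
        {"post_id": 42, "revision": 2, "title": "Typo fixes"},
        {"post_id": 42, "revision": 3, "title": "Final"},
    ]

    def latest_revision(rows, post_id):
        """Order matching rows by descending revision, keep the first."""
        matching = (r for r in rows if r["post_id"] == post_id)
        ordered = sorted(matching, key=lambda r: r["revision"], reverse=True)
        return next(iter(ordered[:1]), None)  # None plays the "not found" role

    print(latest_revision(rows, 42)["title"])  # Final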