.. _migrations:

##########
Migrations
##########

As development goes on, the data schema of an application evolves. Since this
is NoSQL, there is no notion of "column", hence no way to update a whole table
at once. In a sense, this is a good thing: migrations may be done lazily, with
no need to lock the database for hours. The migration module aims to provide
simple tools for the most common migration scenarios.

Migration concepts
==================

Migrations involve 2 steps:

1. detecting the current version
2. if need be, performing operations

Version detection is **always** performed as long as a ``Migration`` class is
associated with the ``DynamoDbModel``, to make sure the object is up to date.
The version is detected by running ``check_N`` successively on the raw boto
data, where ``N`` is a revision integer. Revision numbers do not need to be
consecutive and are sorted in natural decreasing order. This means that
``N=11`` is considered bigger than ``N=2``.

- If ``check_N`` returns ``True``, the detected version will be ``N``.
- If ``check_N`` returns ``False``, detection goes on with the next lower
  version.
- If no ``check_N`` succeeds, :py:class:`~.VersionError` is raised.

The migration itself is performed by successively running ``migrate_to_N`` on
the raw boto data. This enables you to run incremental migrations. The first
migrator run has ``N > current_version``. Revision numbers ``N`` need not be
consecutive nor have ``check_N`` equivalents.

If your lowest possible version is ``n``, you need to have a ``check_n`` but
no ``migrate_to_n``, as there is no lower version to migrate from. On the
contrary, the latest revision needs both a migrator and a version checker:
the migrator is needed to update older objects, while the version checker
ensures the Item is at the latest revision. If it returns ``True``, no
migration is performed.

At the end of the process, the version is assumed to be the latest. No
additional check is performed.
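The detection and migration loop described above can be sketched as follows.
This is a simplified illustration, not the actual dynamodb_mapper
implementation; the ``MigrationSketch`` class and the ``detect_version`` and
``migrate`` helper names are invented for the example, but the ``check_N`` /
``migrate_to_N`` naming convention is the one described in this document::

    import re

    class VersionError(Exception):
        """Raised when no check_N accepts the raw data."""

    class MigrationSketch(object):
        def _revisions(self):
            # Collect N from every check_N method and sort the integers
            # in natural decreasing order, so N=11 beats N=2.
            revs = []
            for name in dir(self):
                match = re.match(r"^check_(\d+)$", name)
                if match:
                    revs.append(int(match.group(1)))
            return sorted(revs, reverse=True)

        def detect_version(self, raw_data):
            # Try each checker, highest revision first; the first one
            # that returns True gives the detected version.
            for n in self._revisions():
                if getattr(self, "check_%d" % n)(raw_data):
                    return n
            raise VersionError("No known revision matches this item")

        def migrate(self, raw_data):
            current = self.detect_version(raw_data)
            # Run every migrate_to_N with N > current, lowest first,
            # so migrations are applied incrementally.
            for n in sorted(self._revisions()):
                if n > current and hasattr(self, "migrate_to_%d" % n):
                    raw_data = getattr(self, "migrate_to_%d" % n)(raw_data)
            return raw_data

Note that, as stated above, a revision may have a ``migrate_to_N`` without a
``check_N``; this sketch only iterates checker-backed revisions for brevity.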
The migrated object needs to be saved manually.

When will the migration be useful?
----------------------------------

Non-null field added
    - **detection**: field missing from raw_data
    - **migration**: add the field to raw_data
    - Note: this is of no use if empty values are allowed, as there is no
      distinction between empty and non-existing values in boto

Renamed field
    - **detection**: old field name present in raw_data
    - **migration**: insert a new field with the old value and ``del`` the
      old field from raw_data

Deleted field
    - **detection**: old field still present in raw_data
    - **migration**: ``del`` the old field from raw_data

Type change
    - **detection**: converting the raw data field to the expected type fails
    - **migration**: perform the type conversion manually and serialize it
      back *before* returning the other data

When will it be of no use?
--------------------------

Table rename
    You need to manually fall back to the old table.

Field migration between tables
    You still need some high-level magic.

For complex use cases, you may consider freezing your application and running
an EMR job on it.

Use case: Rename field 'mail' to 'email'
========================================

Migration engine
----------------

::

    from dynamodb_mapper.migration import Migration

    class UserMigration(Migration):
        # Is it at least compatible with the first revision?
        def check_1(self, raw_data):
            field_count = 0
            field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
            field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
            field_count += u"mail" in raw_data and isinstance(raw_data[u"mail"], unicode)
            return field_count == len(raw_data)
        # No migrator to version 1: it can not be older than version 1!

        # Is the object up to date?
        def check_2(self, raw_data):
            field_count = 0
            field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
            field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
            field_count += u"email" in raw_data and isinstance(raw_data[u"email"], unicode)
            return field_count == len(raw_data)

        # Migrate from the previous revision (1) to this one (the latest).
        def migrate_to_2(self, raw_data):
            raw_data[u"email"] = raw_data[u"mail"]
            del raw_data[u"mail"]
            return raw_data

Enable migrations in model
--------------------------

::

    from dynamodb_mapper.model import DynamoDBModel

    class User(DynamoDBModel):
        __table__ = "user"
        __hash_key__ = "id"
        __migrator__ = UserMigration  # Single line to add!
        __schema__ = {
            "id": unicode,
            "energy": int,
            "email": unicode,
        }

Example run
-----------

Let's say you have an object at revision 1 in the database. It will look like
this:

::

    raw_data_version_1 = {
        u"id": u"Jackson",
        u"energy": 6742348,
        u"mail": u"jackson@tldr-ludia.com",
    }

Now, migrate it:

>>> jackson = User.get(u"Jackson")
>>> # Done, jackson is migrated, but let's check it
>>> print jackson.email
u"jackson@tldr-ludia.com"
>>> jackson.save(raise_on_conflict=True)  # Should go fine if no concurrent access

``raise_on_conflict`` integration
=================================

Internally, ``raise_on_conflict`` relies on the raw data dict from boto to
perform conflict detection. This dict is stored in the model instance *before*
the migration engine is triggered, so the ``raise_on_conflict`` feature keeps
working as expected. This behavior guarantees that :ref:`transactions` work as
expected even when dealing with migrated objects.

Related exceptions
==================

VersionError
------------

.. autoclass:: dynamodb_mapper.migration.VersionError
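For completeness, the rename migration in this use case can also be exercised
by hand, outside the mapper and without a DynamoDB connection. The sketch
below is standalone: ``migrate_to_2`` is a plain-function copy of the method
from ``UserMigration`` above, and the ``u""`` prefixes are dropped for
brevity::

    raw_data_version_1 = {
        "id": "Jackson",
        "energy": 6742348,
        "mail": "jackson@tldr-ludia.com",
    }

    def migrate_to_2(raw_data):
        # Same body as UserMigration.migrate_to_2: copy the value to
        # the new field name, then drop the old field.
        raw_data["email"] = raw_data["mail"]
        del raw_data["mail"]
        return raw_data

    # Work on a copy so the revision-1 dict stays intact.
    migrated = migrate_to_2(dict(raw_data_version_1))
    # migrated now has an "email" key and no "mail" key.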