Migrations

As development goes on, the application's data schema evolves. As this is NoSQL, there is no notion of “column”, hence no way to update a whole table at once. In a sense, this is a good thing: migrations may be done lazily, with no need to lock the database for hours.

The migration module aims to provide simple tools for the most common migration scenarios.

Migration concepts

A migration involves 2 steps:

  1. detect the current version
  2. if need be, perform the migration operations

Version detection is always performed as long as a Migration class is associated with the DynamoDBModel, to make sure the object is up to date.

The version is detected by running check_N successively on the raw boto data, where N is an integer revision number. Revision numbers do not need to be consecutive and are tried in decreasing numeric order. This means that N=11 is considered greater than N=2.

  • If check_N returns True, the detected version is N.
  • If check_N returns False, go on with the next lower version.
  • If no check_N succeeds, VersionError is raised.
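The detection rules above can be sketched in a few lines of standalone Python. This is not the library's implementation: the helper detect_version, the local VersionError class and the ExampleMigration class are hypothetical names introduced for illustration. Only the check_N naming convention and the decreasing integer order come from the text.

```python
import re


class VersionError(Exception):
    """Local stand-in for the library's VersionError (hypothetical)."""


def detect_version(migration, raw_data):
    """Return the highest N whose check_N(raw_data) returns True."""
    revisions = []
    for name in dir(migration):
        match = re.match(r"^check_(\d+)$", name)
        if match:
            revisions.append(int(match.group(1)))

    # Integer sort, highest first: 11 is tried before 2.
    for revision in sorted(revisions, reverse=True):
        if getattr(migration, "check_%d" % revision)(raw_data):
            return revision

    raise VersionError("no check_N matched the raw data")


class ExampleMigration(object):
    """Hypothetical checkers with non-consecutive revisions 2 and 11."""

    def check_2(self, raw_data):
        return "mail" in raw_data

    def check_11(self, raw_data):
        return "email" in raw_data


print(detect_version(ExampleMigration(), {"id": "a", "email": "a@example.com"}))
print(detect_version(ExampleMigration(), {"id": "a", "mail": "a@example.com"}))
```

Sorting the revisions as integers rather than as strings is what makes 11 rank above 2.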

The migration itself is performed by successively running migrate_to_N on the raw boto data, which makes incremental migrations possible. The first migrator run is the first one with N > current_version. Revision numbers N need not be consecutive, and a migrate_to_N does not need a matching check_N.
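As a rough illustration of this chaining, the standalone sketch below applies every migrate_to_N with N greater than the detected version. It assumes migrators run in increasing order of N, which the text implies but does not state outright. The helper run_migrations and the ExampleMigration class are hypothetical and are not part of the library.

```python
import re


def run_migrations(migration, raw_data, current_version):
    """Apply every migrate_to_N with N > current_version, lowest N first."""
    revisions = []
    for name in dir(migration):
        match = re.match(r"^migrate_to_(\d+)$", name)
        if match:
            revisions.append(int(match.group(1)))

    for revision in sorted(revisions):
        if revision > current_version:
            raw_data = getattr(migration, "migrate_to_%d" % revision)(raw_data)
    return raw_data


class ExampleMigration(object):
    """Hypothetical migrators; revisions 2 and 5 are not consecutive."""

    def migrate_to_2(self, raw_data):
        # Rename "mail" to "email".
        raw_data["email"] = raw_data.pop("mail")
        return raw_data

    def migrate_to_5(self, raw_data):
        # Add an "energy" field when it is absent (assumed default: 0).
        raw_data.setdefault("energy", 0)
        return raw_data


item = {"id": "a", "mail": "a@example.com"}
migrated = run_migrations(ExampleMigration(), item, current_version=1)
print(sorted(migrated.items()))
```

An item detected at revision 2 would skip migrate_to_2 and only go through migrate_to_5.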

If your lowest possible version is n, you need a check_n but no migrate_to_n, as there is no lower version to migrate from. Conversely, the latest revision needs both a migrator and a version checker: the migrator updates older objects, while the version checker confirms that the Item is already at the latest revision. If it returns True, no migration is performed.

At the end of the process, the version is assumed to be the latest. No additional check will be performed. The migrated object needs to be saved manually.

When will the migration be useful?

Non-null field added
  • detection: no field in raw_data
  • migration: add the field in raw_data
  • Note: this is of no use if empty values are allowed, as boto makes no distinction between empty and non-existent values
Renamed field
  • detection: old field name in raw_data
  • migration: insert a new field with the old value and del the old field in raw_data.
Deleted field
  • detection: old field still exists in raw_data
  • migration: del old field from raw_data
Type change
  • detection: converting the raw_data field to the expected type fails
  • migration: perform the type conversion manually and serialize it back before returning raw_data
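Two of the scenarios above, the added non-null field and the type change, can be sketched as plain functions on a raw data dict. The field names "score" and "energy", the default value and the legacy label table are all assumptions made for the example. In a real Migration subclass, these bodies would live in the check_N and migrate_to_N methods, with the detection predicates describing the older revision.

```python
# Scenario "non-null field added": hypothetical field "score".
def score_is_missing(raw_data):
    """Detection: the field is absent from the raw data."""
    return "score" not in raw_data


def add_score(raw_data):
    """Migration: add the field with an assumed default value."""
    raw_data["score"] = 0
    return raw_data


# Scenario "type change": hypothetical field "energy", formerly a text label.
ENERGY_LABELS = {"low": 10, "high": 100}  # assumed legacy values


def energy_needs_conversion(raw_data):
    """Detection: converting the field to the expected type fails."""
    try:
        int(raw_data["energy"])
    except ValueError:
        return True
    return False


def convert_energy(raw_data):
    """Migration: convert manually and store the new value back."""
    raw_data["energy"] = ENERGY_LABELS[raw_data["energy"]]
    return raw_data


old_item = {"id": "Jackson", "energy": "high"}
if score_is_missing(old_item):
    old_item = add_score(old_item)
if energy_needs_conversion(old_item):
    old_item = convert_energy(old_item)
print(sorted(old_item.items()))
```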

When will it be of no use?

Table rename
You need to manually fall back to the old table.
Field migration between tables
You still need some high level magic.

For complex use cases, you may consider freezing your application and running an EMR job on it.

Use case: Rename field ‘mail’ to ‘email’

Migration engine

from dynamodb_mapper.migration import Migration

class UserMigration(Migration):
    # Is it at least compatible with the first revision?
    def check_1(self, raw_data):
        field_count = 0
        field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
        field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
        field_count += u"mail" in raw_data and isinstance(raw_data[u"mail"], unicode)

        return field_count == len(raw_data)

    # No migrator to version 1: it cannot be older than version 1!

    # Is the object up to date?
    def check_2(self, raw_data):
        field_count = 0
        field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
        field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
        field_count += u"email" in raw_data and isinstance(raw_data[u"email"], unicode)

        return field_count == len(raw_data)

    # Migrate from the previous revision (1) to this one (the latest)
    def migrate_to_2(self, raw_data):
        raw_data[u"email"] = raw_data[u"mail"]
        del raw_data[u"mail"]
        return raw_data

Enable migrations in model

from dynamodb_mapper.model import DynamoDBModel

class User(DynamoDBModel):
    __table__ = "user"
    __hash_key__ = "id"
    __migrator__ = UserMigration  # Single line to add!
    __schema__ = {
        "id": unicode,
        "energy": int,
        "email": unicode
    }

Example run

Let’s say you have an object at revision 1 in the db. It will look like this:

raw_data_version_1 = {
    u"id": u"Jackson",
    u"energy": 6742348,
    u"mail": u"jackson@tldr-ludia.com",
}

Now, migrate it:

>>> jackson = User.get(u"Jackson")
>>> # Done, jackson is migrated, but let's check it
>>> print jackson.email
jackson@tldr-ludia.com
>>> # Save manually; this should go fine if there is no concurrent access
>>> jackson.save(raise_on_conflict=True)

raise_on_conflict integration

Internally, raise_on_conflict relies on the raw data dict from boto to detect conflicts. This dict is stored in the model instance before the migration engine is triggered, so that the raise_on_conflict feature keeps working as expected.

This behavior guarantees that transactions work as expected even when dealing with migrated objects.