As development progresses, the data schema of the application evolves. Since this is NoSQL, there is no notion of “column”, hence no way to update a whole table at once. In a sense, this is a good thing: migrations may be done lazily, with no need to lock the database for hours.
The migration module aims to provide simple tools for the most common migration scenarios.
Migrations involve two steps:
- detecting the current version
- if need be, performing the migration operations
Version detection is always performed as long as a Migration class is associated with the DynamoDBModel, to make sure the object is up to date.
The version is detected by running check_N successively on the raw boto data, where N is a revision integer. Revision numbers do not need to be consecutive and are sorted in natural decreasing order; this means that N=11 is considered bigger than N=2.
- If check_N returns True, the detected version is N.
- If check_N returns False, continue with the next lower revision.
- If no check_N succeeds, VersionError is raised.
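The detection loop described above can be sketched as follows. This is an illustrative re-implementation, not the library's actual code; the function name detect_version and the use of introspection over check_N method names are assumptions.

```python
import re


class VersionError(Exception):
    """Raised when no check_N method recognizes the raw data."""


def detect_version(migration, raw_data):
    # Collect N from all check_N method names, sorted in natural
    # decreasing order so that, e.g., 11 is tried before 2.
    revisions = sorted(
        (int(m.group(1))
         for m in (re.match(r"check_(\d+)$", name) for name in dir(migration))
         if m),
        reverse=True,
    )
    for n in revisions:
        # First checker that accepts the data wins.
        if getattr(migration, "check_%d" % n)(raw_data):
            return n
    raise VersionError("raw data matches no known revision")
```

Sorting numerically rather than lexicographically is what makes the ordering "natural": the string "check_11" would otherwise sort below "check_2".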
The migration itself is performed by successively running migrate_to_N on the raw boto data, which enables incremental migrations. The first migrator run is the lowest with N > current_version. Revision numbers need not be consecutive, nor have check_N equivalents.
If your lowest possible version is n, you need a check_n but no migrate_to_n, as there is no lower version to migrate from. Conversely, the latest revision needs both a migrator and a version checker: the migrator updates older objects, while the version checker ensures the Item is at the latest revision. If it returns True, no migration will be performed.
At the end of the process, the version is assumed to be the latest; no additional check is performed. The migrated object needs to be saved manually.
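The incremental migration loop can be sketched like this. Again, this is an assumed illustration of the mechanism, not the library's internals; run_migrations is a hypothetical name.

```python
import re


def run_migrations(migration, raw_data, current_version):
    # Gather N from all migrate_to_N method names, in increasing order.
    revisions = sorted(
        int(m.group(1))
        for m in (re.match(r"migrate_to_(\d+)$", name) for name in dir(migration))
        if m
    )
    for n in revisions:
        # Only run migrators above the detected version, in order,
        # so each step upgrades from the previous revision.
        if n > current_version:
            raw_data = getattr(migration, "migrate_to_%d" % n)(raw_data)
    return raw_data
```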
For complex use cases, you may consider freezing your application and running an EMR job on it.
from dynamodb_mapper.migration import Migration

class UserMigration(Migration):
    # Is it at least compatible with the first revision?
    def check_1(self, raw_data):
        field_count = 0
        field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
        field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
        field_count += u"mail" in raw_data and isinstance(raw_data[u"mail"], unicode)
        return field_count == len(raw_data)

    # No migrator to version 1: it cannot be older than version 1!

    # Is the object up to date?
    def check_2(self, raw_data):
        field_count = 0
        field_count += u"id" in raw_data and isinstance(raw_data[u"id"], unicode)
        field_count += u"energy" in raw_data and isinstance(raw_data[u"energy"], int)
        field_count += u"email" in raw_data and isinstance(raw_data[u"email"], unicode)
        return field_count == len(raw_data)

    # Migrate from the previous revision (1) to this one (the latest).
    def migrate_to_2(self, raw_data):
        raw_data[u"email"] = raw_data[u"mail"]
        del raw_data[u"mail"]
        return raw_data
from dynamodb_mapper.model import DynamoDBModel

class User(DynamoDBModel):
    __table__ = "user"
    __hash_key__ = "id"
    __migrator__ = UserMigration  # Single line to add!
    __schema__ = {
        "id": unicode,
        "energy": int,
        "email": unicode,
    }
Let’s say you have an object at revision 1 in the database. It will look like this:
raw_data_version_1 = {
    u"id": u"Jackson",
    u"energy": 6742348,
    u"mail": u"jackson@tldr-ludia.com",
}
Now, migrate it:
>>> jackson = User.get(u"Jackson")
# Done, jackson is migrated, but let's check it
>>> print jackson.email
u"jackson@tldr-ludia.com"  # Alright!
>>> jackson.save(raise_on_conflict=True)
# Should go fine if there is no concurrent access
Internally, raise_on_conflict relies on the raw data dict from boto to perform conflict detection. This dict is stored in the model instance before the migration engine is triggered, so the raise_on_conflict feature keeps working as expected.
This behavior guarantees that Transactions work as expected, even when dealing with migrated objects.
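The key point above is the ordering: the pre-migration snapshot is taken before any migrator touches the data. A toy model, purely illustrative (FakeModel and expected_values are invented names, not the library's API), shows why that matters:

```python
class FakeModel:
    """Toy model: keeps the pre-migration raw dict for conflict detection."""

    def __init__(self, raw_data, migrator=None):
        # Snapshot first: conflict detection must compare against the
        # values actually read from DynamoDB, not the migrated ones.
        self._raw_data = dict(raw_data)
        self.data = migrator(dict(raw_data)) if migrator else dict(raw_data)

    def expected_values(self):
        # What a conditional write would assert the item still contains.
        return self._raw_data
```

If the snapshot were taken after migration, the conditional write would assert values (e.g. the renamed "email" field) that the item in the table never contained, and every save of a migrated object would spuriously conflict.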