Model Evolution and Migrations¶
One of the most irritating parts of maintaining an application over time is the need to migrate data from one version of the schema to another. While Ming can’t completely remove the pain of migrations, it does seek to make them as simple as possible.
Performing Migrations¶
First of all let’s populate our database with some stray data that needs to be migrated:
>>> import random
>>> TAGS = ['foo', 'bar', 'snafu', 'mongodb']
>>>
>>> # Insert the documents through PyMongo so that Ming is not involved
>>> session.db.wiki_page.insert_many([
... dict(title='Page %s' % idx, text='Text of Page %s' % idx, tags=random.sample(TAGS, 2)) for idx in range(10)
... ])
InsertManyResult([ObjectId('66e1e8c2a8572d7f63002564'), ObjectId('66e1e8c2a8572d7f63002565'), ObjectId('66e1e8c2a8572d7f63002566'), ObjectId('66e1e8c2a8572d7f63002567'), ObjectId('66e1e8c2a8572d7f63002568'), ObjectId('66e1e8c2a8572d7f63002569'), ObjectId('66e1e8c2a8572d7f6300256a'), ObjectId('66e1e8c2a8572d7f6300256b'), ObjectId('66e1e8c2a8572d7f6300256c'), ObjectId('66e1e8c2a8572d7f6300256d')], acknowledged=True)
>>>
>>> session.db.wiki_page.find_one()
{'_id': ObjectId('66e1e8c2a8572d7f63002564'), 'title': 'Page 0', 'text': 'Text of Page 0', 'tags': ['bar', 'foo']}
Suppose we decide that we want to gather the metadata of our pages into a metadata property, which will contain the categories and tags of the page.
We might write our new schema as follows:
class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(schema.String(required=True))
    text = FieldProperty(schema.String(if_missing=''))
    metadata = FieldProperty(schema.Object({
        'tags': schema.Array(schema.String),
        'categories': schema.Array(schema.String)
    }))
But now if we try to .find() documents in our database, the existing tags have gone missing: the old documents keep them in a top-level tags field, which the new schema knows nothing about, so metadata comes back empty:
>>> WikiPage.query.find().first()
<WikiPage _id=ObjectId('66e1e8c2a8572d7f63002564')
title='Page 0' text='Text of Page 0' metadata=I{'tags':
[], 'categories': []}>
What we need now is a migration. Luckily, Ming makes migrations manageable.
First of all we need to declare the previous schema so that Ming knows how to validate the old values (previous versions' schemas are declared using the Ming Foundation Layer, as they are not tracked by the UnitOfWork or IdentityMap):
from ming import collection, Field

OldWikiPageCollection = collection('wiki_page', session,
    Field('_id', schema.ObjectId),
    Field('title', schema.String),
    Field('text', schema.String),
    Field('tags', schema.Array(schema.String))
)
Whenever Ming fetches a document from the database it validates it against our model schema. If the validation fails, it checks the document against the previous version of the schema (provided as __mongometa__.version_of), and if that validation passes, the __mongometa__.migrate function is called to upgrade the data.
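Conceptually, the loading path works roughly like the following sketch (illustrative only, not Ming's actual implementation; the function and argument names are made up):

from ming import schema

def load_document(raw_doc, current_schema, previous_schema, migrate):
    """Rough sketch of Ming's validate-then-migrate fallback."""
    try:
        # First try the current model schema.
        return current_schema.validate(raw_doc)
    except schema.Invalid:
        # Fall back to the schema declared as __mongometa__.version_of ...
        old_doc = previous_schema.validate(raw_doc)
        # ... and upgrade the data with __mongometa__.migrate before re-validating.
        return current_schema.validate(migrate(old_doc))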
So, to be able to upgrade our data, all we need to do is include the previous schema and a migration function in our __mongometa__:
class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'
        version_of = OldWikiPageCollection

        @staticmethod
        def migrate(data):
            result = dict(data, metadata={'tags': data['tags']}, _version=1)
            del result['tags']
            return result

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(schema.String(required=True))
    text = FieldProperty(schema.String(if_missing=''))
    _version = FieldProperty(1, required=True)
    metadata = FieldProperty(schema.Object({
        'tags': schema.Array(schema.String),
        'categories': schema.Array(schema.String)
    }))
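You can sanity-check the migration function by hand on a raw dictionary (the values here are illustrative; this assumes the inner __mongometa__ class remains reachable as an ordinary class attribute, which it normally is):

>>> old_doc = dict(title='Page 0', text='Text of Page 0', tags=['bar', 'foo'])
>>> WikiPage.__mongometa__.migrate(old_doc)
{'title': 'Page 0', 'text': 'Text of Page 0', 'metadata': {'tags': ['bar', 'foo']}, '_version': 1}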
To force the migration we also added a _version property, which passes validation only when its value is 1 (using schema.Value). As the old documents do not provide a _version field, they won't pass validation and so they will trigger the migration process:
>>> WikiPage.query.find().limit(3).all()
[<WikiPage _id=ObjectId('66e1e8c2a8572d7f63002564')
title='Page 0' text='Text of Page 0' _version=1
metadata=I{'categories': [], 'tags': ['bar', 'foo']}>, <WikiPage _id=ObjectId('66e1e8c2a8572d7f63002565')
title='Page 1' text='Text of Page 1' _version=1
metadata=I{'categories': [], 'tags': ['foo', 'bar']}>, <WikiPage _id=ObjectId('66e1e8c2a8572d7f63002566')
title='Page 2' text='Text of Page 2' _version=1
metadata=I{'categories': [], 'tags': ['mongodb', 'foo']}>]
And that’s it.
Lazy Migrations¶
Migrations are performed lazily as objects are loaded from the database, so you only pay the cost of migrating the data you actually access. The migrated data is also not saved back to the database unless the object is modified. This is easy to see by querying the documents directly through PyMongo: on MongoDB they still have tags outside of metadata:
>>> next(session.db.wiki_page.find())
{'_id': ObjectId('66e1e8c2a8572d7f63002564'), 'title': 'Page 0', 'text': 'Text of Page 0', 'tags': ['bar', 'foo']}
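If you do want the migrated form of a particular object persisted, modifying it and flushing the ODM session should write the upgraded document back. A minimal sketch, assuming the session configured earlier:

# Loading the page runs the migration in memory; any modification marks the
# object dirty, so the flush writes the migrated document back to MongoDB.
page = WikiPage.query.find({'title': 'Page 0'}).first()
page.text = page.text + ' (edited)'
session.flush()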
Eager Migrations¶
If instead you wish to migrate all the objects in a collection and save them back, you can use the migrate function available on the foundation layer manager:
>>> next(session.db.wiki_page.find()).get('tags')
['bar', 'foo']
>>>
>>> from ming.odm import mapper
>>> mapper(WikiPage).collection.m.migrate()
>>>
>>> next(session.db.wiki_page.find()).get('metadata')
{'categories': [], 'tags': ['bar', 'foo']}
That will automatically migrate all the documents in the collection one by one.
Chained Migrations¶
If your schema has evolved multiple times, you can chain migrations by adding a version_of to each of the previous versions of the schema:
class MyModel(MappedClass):
    class __mongometa__:
        session = session
        name = 'mymodel'
        version_of = collection('mymodel', session,
            Field('_id', schema.ObjectId),
            Field('name', schema.String),
            Field('_version', schema.Value(1, required=True)),
            version_of=collection('mymodel', session,
                Field('_id', schema.ObjectId),
                Field('name', schema.String),
            ),
            migrate=lambda data: dict(_id=data['_id'], name=data['name'].upper(), _version=1)
        )

        @staticmethod
        def migrate(data):
            return dict(_id=data['_id'], name=data['name'][::-1], _version=2)

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    _version = FieldProperty(2, required=True)
Then just apply all the migrations as you normally would:
>>> session.db.mymodel.insert_one(dict(name='desrever'))
InsertOneResult([ObjectId('66e1e8c2a8572d7f6300256e')], acknowledged=True)
>>> session.db.mymodel.find_one()
{'_id': ObjectId('66e1e8c2a8572d7f6300256e'), 'name': 'desrever'}
>>>
>>> # Apply migration to version 1 and then to version 2
>>> mapper(MyModel).collection.m.migrate()
>>>
>>> session.db.mymodel.find_one()
{'_id': ObjectId('66e1e8c2a8572d7f6300256e'), '_version': 2, 'name': 'REVERSED'}
The resulting document's name changed from "desrever" to "REVERSED": the migration to _version=1 uppercased the name and then the migration to _version=2 reversed it.
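Tracing the chain by hand on the raw document makes the two steps explicit (plain Python with an illustrative _id, mirroring the two migrate functions above):

doc = {'_id': 'some-id', 'name': 'desrever'}    # original document, no _version

# Step 1: oldest schema -> version 1 (the migrate lambda on the nested collection)
v1 = dict(_id=doc['_id'], name=doc['name'].upper(), _version=1)    # name == 'DESREVER'

# Step 2: version 1 -> version 2 (MyModel's __mongometa__.migrate)
v2 = dict(_id=v1['_id'], name=v1['name'][::-1], _version=2)        # name == 'REVERSED'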
Note
When migrating, make sure you always bring forward the _id value from the old data, or you will end up with duplicate documents at each migration step, as a new id would be generated for the newly migrated documents.
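For example, a migration step written like the first function below would lose the _id and create brand new documents when the migrated data is saved, while the second keeps updating the same document (sketch only; the field names are illustrative):

# Risky: no _id in the result, so a fresh ObjectId is generated on save and
# the migrated copy ends up inserted next to the original document.
def migrate_bad(data):
    return dict(name=data['name'], _version=2)

# Safe: carry the existing _id forward so the same document is updated in place.
def migrate_good(data):
    return dict(_id=data['_id'], name=data['name'], _version=2)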