Custom Schemas and Properties

Ming provides built-in properties for the most common use cases: RelationProperty, ForeignIdProperty, FieldProperty and FieldPropertyWithMissingNone. They allow you to define relations, reference other models, store common values, and read back properties that might not be available on the collection.

Likewise, for validating data, the ming.schema module provides schemas that cover the most common use cases and data types.

While those are usually enough, there might be cases where you want to define your own properties or schemas to enforce stricter constraints or add conversions.

Custom Schema

Schemas are only used to enforce constraints. They usually do not convert values, and they are applied both when data is loaded from the database and when it is saved to it.

Validating with Schemas

Schemas can be easily implemented by subclassing FancySchemaItem, which already provides if_missing and required support. The actual validation is then performed by a custom _validate method:

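import re

from ming import schema

class EmailSchema(schema.FancySchemaItem):
    # Deliberately simple pattern: local part, "@", one domain label, 2-3 letter TLD.
    regex = re.compile(r'[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$')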

    def _validate(self, value, **kw):
        if not self.regex.match(value):
            raise schema.Invalid('Not a valid email address', value)
        return value

Then objects can use that schema like any other:

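from ming.odm import FieldProperty, MappedClass

# `session` is assumed to be an ODM session created elsewhere.
class Contact(MappedClass):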
    class __mongometa__:
        session = session
        name = 'contact'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    email = FieldProperty(EmailSchema)

And can be created and queried as usual:

>>> Contact(name='Contact 1', email='contact1@server.com')
<Contact _id=ObjectId('59525d4c20daf901cadb2561')
  email='contact1@server.com' name='Contact 1'>
>>> session.flush()
>>> session.clear()
>>> 
>>> c = Contact.query.find().first()
>>> c.email
u'contact1@server.com'

Trying to create a Contact with an invalid email address will fail as it won’t pass our schema validation:

>>> try:
...     Contact(name='Contact 1', email='this-is-invalid')
... except schema.Invalid as e:
...     error = e
... 
>>> error
Invalid('Not a valid email address',)
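
The pattern behind EmailSchema can also be tried on its own, without Ming or a database. The following is a minimal sketch in Python 3; EMAIL_RE, is_valid_email and the sample addresses are hypothetical names introduced for illustration. It suggests that the pattern is deliberately strict and rejects some addresses that are valid in practice, such as those with subdomains or longer top-level domains:

```python
import re

# Same pattern as EmailSchema.regex above.
EMAIL_RE = re.compile(r'[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$')


def is_valid_email(value):
    """Return True when value matches the EmailSchema pattern."""
    return EMAIL_RE.match(value) is not None


samples = [
    'contact1@server.com',   # accepted
    'this-is-invalid',       # rejected: no "@" and domain part
    'user@mail.server.com',  # rejected: only one domain label is allowed
    'user@server.info',      # rejected: TLD longer than three letters
]

# Map each sample address to the outcome of the check.
results = {address: is_valid_email(address) for address in samples}
```

If such addresses need to be accepted, the regular expression is the only part of the schema that would have to change.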

Schema validation is enforced not only on newly created items or when setting properties, but also on data loaded from the database. Because MongoDB is schema-less, there is no guarantee that the data we load back is properly formatted:

>>> session.db.contact.insert(dict(name='Invalid Contact',
...                                email='this-is-invalid'))
ObjectId('59525d4c20daf901cadb2562')
>>> 
>>> try:
...     c1 = Contact.query.find().first()
... except schema.Invalid as e:
...     error = e
... 
>>> error
Invalid('email:Not a valid email address',)

Schemas can’t convert

Because schemas are validated both when saving and when loading data, a good use case for a custom schema is checking that a field stores a properly formatted email address. A schema cannot, however, be used to produce a hashed password.

Let’s see what happens when we try to perform some kind of conversion inside a schema. For example we might try to define a PasswordSchema:

import hashlib

class PasswordSchema(schema.FancySchemaItem):
    def _validate(self, value, **kwargs):
        # Anti-pattern: this converts the value instead of only checking it.
        return hashlib.md5(value).hexdigest()

which is used by our UserWithSchema class:

class UserWithSchema(MappedClass):
    class __mongometa__:
        session = session
        name = 'user_with_schema'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    password = FieldProperty(PasswordSchema)

Then we can create a new user and query it back:

>>> user = UserWithSchema(name='User 1',
...                       password='12345678')
>>> session.flush()
>>> session.clear()
>>> 
>>> user = UserWithSchema.query.find().first()
>>> user.password
'579646aad11fae4dd295812fb4526245'

At first sight it might seem that everything worked as expected: our user was created and the password is an MD5 hash.

But is it the right MD5 hash?

>>> user = UserWithSchema.query.find().first()
>>> hashlib.md5('12345678').hexdigest() == user.password
False

It looks like it isn’t. In fact, the value we get when querying the user back even differs from the value stored in the database:

>>> user = UserWithSchema.query.find().first()
>>> user.password
'579646aad11fae4dd295812fb4526245'
>>> 
>>> user = session.db.user_with_schema.find_one()
>>> user['password']
u'550e1bafe077ff0b0b67f4e32f29d751'

And what is stored in the database is not the MD5 hash of our password either:

>>> user = session.db.user_with_schema.find_one()
>>> user['password']
u'550e1bafe077ff0b0b67f4e32f29d751'
>>> 
>>> hashlib.md5('12345678').hexdigest()
'25d55ad283aa400af464c76d713c07ad'

That’s because the hash was applied again, on top of the previous result, every time validation was performed:

>>> user = UserWithSchema.query.find().first()
>>> 
>>> pwdmd5 = hashlib.md5('12345678').hexdigest()
>>> for i in range(10):
...     if pwdmd5 == user.password:
...         break
...     pwdmd5 = hashlib.md5(pwdmd5).hexdigest()
... 
>>> pwdmd5 == user.password
True
>>> i
2

The lesson is that schemas should never be used to convert values, because validation can run any number of times as the document is saved and loaded back, and each run would convert the value again.
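
The difference between checking and converting can be reproduced without Ming. The sketch below assumes Python 3 (where hashlib needs bytes, hence the encode call) and uses the hypothetical helpers md5_hex, validate_email and apply_n_times. Three applications stand in for the validation passes performed across a save and a load; the exact number in Ming may differ:

```python
import hashlib
import re


def md5_hex(value):
    # Python 3 hashlib works on bytes, so the text is encoded first.
    return hashlib.md5(value.encode('utf-8')).hexdigest()


def validate_email(value):
    # A pure check: returns its input unchanged or raises.
    if not re.match(r'[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$', value):
        raise ValueError('Not a valid email address')
    return value


def apply_n_times(func, value, times):
    # Feed the result of each call back into the next one.
    for _ in range(times):
        value = func(value)
    return value


email = 'contact1@server.com'
# A check can run any number of times without changing the value.
email_stable = apply_n_times(validate_email, email, 3) == email

once = md5_hex('12345678')
thrice = apply_n_times(md5_hex, '12345678', 3)
# A conversion yields a different value on every additional pass.
hash_stable = once == thrice
```

Here email_stable is true while hash_stable is false, which is the behaviour observed with PasswordSchema.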

So what can we use to convert values? Custom properties.

Custom Properties

Custom Properties are specific to the ODM layer and are not available on the Ming Foundation Layer, which implements only schema validation.

The benefit of custom properties over schemas is that you know exactly when the value is read or saved, so they can safely be used to convert values to and from Python.

Converting with Properties

Ming Properties implement the Python Descriptor Protocol, which is based on the __get__, __set__ and __delete__ methods to retrieve, save and remove values from an object.
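
As a reminder of how the protocol works, here is a minimal descriptor that is unrelated to Ming. UpperCased and Country are hypothetical names used only for illustration; the descriptor keeps its value in the instance __dict__ and upper-cases it on assignment:

```python
class UpperCased(object):
    """Hypothetical descriptor that stores text upper-cased."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls=None):
        if instance is None:
            # Accessed on the class itself, so return the descriptor.
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # Conversion happens exactly once, at assignment time.
        instance.__dict__[self.name] = value.upper()

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Country(object):
    code = UpperCased('code')


country = Country()
country.code = 'it'       # calls UpperCased.__set__
stored = country.code     # calls UpperCased.__get__
del country.code          # calls UpperCased.__delete__
still_set = 'code' in country.__dict__
```

Because conversion lives in __set__, it runs only when a value is assigned, not on every validation pass.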

So implementing a custom property is a matter of subclassing FieldProperty and providing our custom behaviour:

class PasswordProperty(FieldProperty):
    def __init__(self):
        # Password is always a required string.
        super(PasswordProperty, self).__init__(schema.String(required=True))

    def __get__(self, instance, cls=None):
        if instance is None:
            # Accessed on the class itself, so return the descriptor.
            return self

        class Password(str):
            def __new__(cls, content):
                self = str.__new__(cls, '******')
                self.raw_value = content
                return self

        # As we don't want to leak passwords we return an asterisked string
        # but the real value of the password will always be available as .raw_value
        # so we can check passwords when logging in.
        return Password(super(PasswordProperty, self).__get__(instance, cls))

    def __set__(self, instance, value):
        pwd = hashlib.md5(value).hexdigest()
        super(PasswordProperty, self).__set__(instance, pwd)

Then we can use it like any other property in our model:

class User(MappedClass):
    class __mongometa__:
        session = session
        name = 'user'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    password = PasswordProperty()

This is already enough to be able to store properly hashed passwords:

>>> user = User(name='User 1',
...             password='12345678')
>>> session.flush()
>>> 
>>> user = session.db.user.find_one()
>>> user['password']
u'25d55ad283aa400af464c76d713c07ad'
>>> user['password'] == hashlib.md5('12345678').hexdigest()
True

Since the property guards against leaking passwords by always returning an asterisked string, let’s check that it works as expected:

>>> user = User.query.find().first()
>>> user.password
'******'
>>> user.password.raw_value
u'25d55ad283aa400af464c76d713c07ad'

As we can see, the password is returned as a Password instance: a string of asterisks that also exposes the real stored value as .raw_value.
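
The comment in PasswordProperty mentions checking passwords at login through .raw_value. A minimal sketch of such a check, assuming Python 3 and a hypothetical check_password helper that is not part of Ming, could look like this. The Password class is repeated from above so the snippet runs on its own, and MD5 is used only to match the hashing in PasswordProperty.__set__:

```python
import hashlib
import hmac


class Password(str):
    """Masked string that keeps the stored hash in .raw_value."""

    def __new__(cls, content):
        self = str.__new__(cls, '******')
        self.raw_value = content
        return self


def check_password(password, candidate):
    """Hypothetical helper: compare a login attempt with the stored hash."""
    candidate_hash = hashlib.md5(candidate.encode('utf-8')).hexdigest()
    # compare_digest performs a constant-time comparison of the two hashes.
    return hmac.compare_digest(password.raw_value, candidate_hash)


# The hash stored for '12345678' in the example above.
stored = Password('25d55ad283aa400af464c76d713c07ad')
login_ok = check_password(stored, '12345678')
login_failed = check_password(stored, 'not-the-password')
```

With a mapped User instance, the first argument would be user.password, since the property already returns a Password object.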