Custom Schemas and Properties
Ming provides builtin properties for the most common use cases: RelationProperty, ForeignIdProperty, FieldProperty and FieldPropertyWithMissingNone. They allow you to define relations and references to other models, store common values, and get back properties that might not be available on the collection.
When validating data, the ming.schema module also provides schemas that cover the most common use cases and data types.
While those are usually enough, there might be cases where you want to define your own properties or schemas to enforce stricter constraints or add conversions.
Custom Schema
Schemas are only used to enforce constraints: they usually do not convert values, and they are applied both when data is loaded from the database and when it is saved to it.
Validating with Schemas
Schemas can be easily implemented by subclassing FancySchemaItem, which already provides if_missing and required support. The actual validation is then performed by a custom ._validate method:
import re

from ming import schema


class EmailSchema(schema.FancySchemaItem):
    regex = re.compile(r'[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$')

    def _validate(self, value, **kw):
        # Reject anything that does not look like an email address.
        if not self.regex.match(value):
            raise schema.Invalid('Not a valid email address', value)
        return value
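The pattern used by EmailSchema is deliberately simple. To see what it accepts, the following minimal sketch exercises the same regular expression with the standard library only, outside of Ming. The is_valid_email helper is a hypothetical name introduced for illustration.

```python
import re

# Same pattern as EmailSchema.regex above.
EMAIL_RE = re.compile(r'[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$')


def is_valid_email(value):
    """Hypothetical helper mirroring the check done in EmailSchema._validate."""
    return EMAIL_RE.match(value) is not None


for candidate in ('contact1@server.com',
                  'this-is-invalid',
                  'user@mail.server.com',
                  'user@server.info'):
    print(candidate, is_valid_email(candidate))
```

Only the first address passes. The pattern allows a single domain label followed by a two or three letter suffix, so addresses with subdomains or longer top-level domains are rejected. Loosen the expression if your data needs them.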
Then objects can use that schema like any other:
class Contact(MappedClass):
    class __mongometa__:
        session = session
        name = 'contact'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    email = FieldProperty(EmailSchema)
And can be created and queried as usual:
>>> Contact(name='Contact 1', email='contact1@server.com')
<Contact _id=ObjectId('67646ad12d02456b0b9b62ee')
name='Contact 1' email='contact1@server.com'>
>>> session.flush()
>>> session.clear()
>>>
>>> c = Contact.query.find().first()
>>> c.email
'contact1@server.com'
Trying to create a Contact with an invalid email address will fail, as it won’t pass our schema validation:
>>> try:
...     Contact(name='Contact 1', email='this-is-invalid')
... except schema.Invalid as e:
...     error = e
...
>>> error
Invalid('Not a valid email address')
Schema validation is enforced not only on newly created items or when setting properties, but also on data loaded from the database. As MongoDB is schema-less, there is no guarantee that the data we load back is properly formatted. Here the existing contact has been deleted and an invalid document is inserted directly into the collection, bypassing Ming:
DeleteResult({'n': 1}, acknowledged=True)
>>> session.db.contact.insert_one(dict(name='Invalid Contact',
...                                    email='this-is-invalid'))
InsertOneResult([ObjectId('67646ad12d02456b0b9b62ef')], acknowledged=True)
>>>
>>> try:
...     c1 = Contact.query.find().first()
... except schema.Invalid as e:
...     error = e
...
>>> error
Invalid('email:Not a valid email address')
Schemas can’t convert
As schemas are validated both when saving and when loading data, a good use case for a custom schema is checking that a field stores a properly formatted email address. A schema cannot, however, be used to turn a value into something else, such as a hashed password.
Let’s see what happens when we try to perform some kind of conversion inside a schema. For example we might try to define a PasswordSchema:
class PasswordSchema(schema.FancySchemaItem):
    def _validate(self, value, **kwargs):
        # hashlib only accepts bytes on Python 3, so encode the string first.
        return hashlib.md5(value.encode('utf-8')).hexdigest()

which is used by our UserWithSchema class:

class UserWithSchema(MappedClass):
    class __mongometa__:
        session = session
        name = 'user_with_schema'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    password = FieldProperty(PasswordSchema)

Then we can create a new user and query it back:

>>> user = UserWithSchema(name='User 1',
...                       password='12345678')
>>> session.flush()
>>> session.clear()
>>>
>>> user = UserWithSchema.query.find().first()
>>> len(user.password)
32

At first sight it might seem that everything worked as expected. Our user got created and the password looks like an md5 hex digest.
But is it the right md5?

>>> user = UserWithSchema.query.find().first()
>>> hashlib.md5('12345678'.encode('utf-8')).hexdigest() == user.password
False

It looks like it isn’t. Actually when we query the user back we get something which is even different from the value stored on the database:

>>> user = UserWithSchema.query.find().first()
>>> stored = session.db.user_with_schema.find_one()
>>> user.password == stored['password']
False

And what we have on the db is not even the md5 of our password:

>>> user = session.db.user_with_schema.find_one()
>>> user['password'] == hashlib.md5('12345678'.encode('utf-8')).hexdigest()
False

That’s because our value is actually the md5 recursively applied multiple times whenever the validation was performed:

>>> user = UserWithSchema.query.find().first()
>>>
>>> pwdmd5 = hashlib.md5('12345678'.encode('utf-8')).hexdigest()
>>> for i in range(10):
...     if pwdmd5 == user.password:
...         break
...     pwdmd5 = hashlib.md5(pwdmd5.encode('utf-8')).hexdigest()
...
>>> pwdmd5 == user.password
True
>>> i > 0
True
So what we learnt is that schemas should never be used to convert values as they can be applied recursively any number of times whenever the document is saved or loaded back!
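The underlying issue is that a validator may run any number of times, so it has to be idempotent. The following minimal sketch uses only the standard library and does not involve Ming. It contrasts a pure check with a hashing step. The names check_password_length and hash_password are hypothetical and used only for illustration.

```python
import hashlib


def check_password_length(value):
    """A pure check: returns the value unchanged, so repeating it is harmless."""
    if len(value) < 8:
        raise ValueError('Password too short')
    return value


def hash_password(value):
    """A conversion: the output differs from the input on every call."""
    return hashlib.md5(value.encode('utf-8')).hexdigest()


password = '12345678'

# Applying the check twice gives back the original value.
checked_twice = check_password_length(check_password_length(password))
print(checked_twice == password)  # True

# Applying the conversion twice does not match applying it once.
once = hash_password(password)
twice = hash_password(once)
print(once == twice)  # False
```

A check like the first function is safe to place in a schema. A conversion like the second one is not, because each extra validation pass changes the stored value again.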
So what can we use to convert values? Custom properties.
Custom Properties
Custom Properties are specific to the ODM layer and are not available on the Ming Foundation Layer, which implements only schema validation.
The benefit of custom properties over schemas is that you know exactly when the value is read or saved, so they can be safely used to convert values to and from Python.
Converting with Properties
Ming Properties actually implement the Python Descriptor Protocol, which is based on the __get__, __set__ and __delete__ methods to retrieve, save and remove values from an object.
So implementing a custom property is a matter of subclassing FieldProperty and providing our custom behaviour:
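As a refresher on the protocol itself, here is a minimal plain-Python descriptor, independent of Ming, that converts a value when it is assigned. The UpperCased and Tag classes are hypothetical and exist only for illustration.

```python
class UpperCased:
    """Hypothetical descriptor that stores strings upper-cased."""

    def __set_name__(self, owner, name):
        # Keep the converted value on the instance under a private name.
        self.storage_name = '_' + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor itself.
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        # The conversion happens exactly once, when the value is assigned.
        setattr(instance, self.storage_name, value.upper())

    def __delete__(self, instance):
        delattr(instance, self.storage_name)


class Tag:
    label = UpperCased()


tag = Tag()
tag.label = 'draft'
print(tag.label)  # DRAFT
```

Because the conversion lives in __set__, it runs once per assignment rather than every time the object is validated, which is the property the custom FieldProperty subclass relies on.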
class PasswordProperty(FieldProperty):
    def __init__(self):
        # Password is always a required string.
        super().__init__(schema.String(required=True))

    def __get__(self, instance, cls=None):
        if instance is None:
            return self

        class Password(str):
            def __new__(cls, content):
                self = str.__new__(cls, '******')
                self.raw_value = content
                return self

        # As we don't want to leak passwords we return an asterisked string
        # but the real value of the password will always be available as .raw_value
        # so we can check passwords when logging in.
        return Password(super().__get__(instance, cls))

    def __set__(self, instance, value):
        # hashlib only accepts bytes on Python 3, so encode the string first.
        pwd = hashlib.md5(value.encode('utf-8')).hexdigest()
        super().__set__(instance, pwd)

Then we can use it like any other property in our model:

class User(MappedClass):
    class __mongometa__:
        session = session
        name = 'user'

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))
    password = PasswordProperty()

This is already enough to be able to store properly hashed passwords:

>>> user = User(name='User 1',
...             password='12345678')
>>> session.flush()
>>>
>>> user = session.db.user.find_one()
>>> user['password']
'25d55ad283aa400af464c76d713c07ad'
>>> user['password'] == hashlib.md5('12345678'.encode('utf-8')).hexdigest()
True

And as we provided some kind of password leakage prevention by always returning an asterisked string for the password, let’s check if it works as expected:

>>> user = User.query.find().first()
>>> user.password
'******'
>>> user.password.raw_value
'25d55ad283aa400af464c76d713c07ad'
As we can see the password is properly returned as a Password instance, which is a string of asterisks that also provides the real value as .raw_value.
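Building on .raw_value, a login check could look like the following minimal sketch. It is standalone: the Password class is copied from the property above, while check_password is a hypothetical helper that is not part of Ming. It assumes candidate passwords are hashed with the same md5 scheme used when setting the property.

```python
import hashlib
import hmac


class Password(str):
    """Same str subclass returned by PasswordProperty.__get__ above."""

    def __new__(cls, content):
        self = str.__new__(cls, '******')
        self.raw_value = content
        return self


def check_password(stored, candidate):
    """Hypothetical helper: compare a candidate password with the stored hash."""
    candidate_hash = hashlib.md5(candidate.encode('utf-8')).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(stored.raw_value, candidate_hash)


stored = Password(hashlib.md5('12345678'.encode('utf-8')).hexdigest())
print(stored)                               # ******
print(check_password(stored, '12345678'))   # True
print(check_password(stored, 'wrong-pass')) # False
```

With a mapped object, the same helper would be called as check_password(user.password, candidate), since reading the property returns a Password instance.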