MongoEngine/mongoengine

Support for Mongo JSON schema validation

edanaher opened this issue · 1 comments

Problem

While MongoEngine has lovely schema validation, it only applies to changes that go through MongoEngine. Any changes made through pymongo or other access can easily bypass this validation and write data that does not match MongoEngine's schema. This can result in exciting failures to read or save data, since the existing data doesn't match the schema.

Proposed High-level Solution

Mongo supports extremely flexible Schema Validation via the JSON Schema standard. MongoEngine could leverage this to enforce its schema at the database level, avoiding the issue of outside sources writing schema-invalid data.

Notably, Mongo schema validation also supports a "moderate" validation level, which validates inserts and updates to documents that are already valid but leaves existing invalid documents alone. This allows an easy, incremental migration to Mongo schema validation.
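For concreteness, here is a minimal sketch of what attaching a validator looks like at the database level. It only builds the collMod command document; the collection name "users", the database name "mydb", the schema contents, and the helper name build_collmod_command are all made up for illustration.

```python
def build_collmod_command(collection_name, json_schema, level="moderate"):
    """Build a collMod command that attaches a $jsonSchema validator.

    "moderate" skips updates to existing documents that already fail
    validation; "strict" validates every insert and update.
    """
    if level not in ("off", "moderate", "strict"):
        raise ValueError(f"unknown validation level: {level}")
    # The command name has to be the first key of the command document.
    return {
        "collMod": collection_name,
        "validator": {"$jsonSchema": json_schema},
        "validationLevel": level,
    }


# Hypothetical schema for a "users" collection.
user_schema = {
    "bsonType": "object",
    "required": ["email"],
    "properties": {
        "email": {"bsonType": "string"},
        "age": {"bsonType": "int", "minimum": 0},
    },
}

command = build_collmod_command("users", user_schema)

# With a live connection this could be applied roughly as (not run here):
#   from pymongo import MongoClient
#   MongoClient()["mydb"].command(command)
```

Because the command never touches stored documents, switching from "moderate" to "strict" later is just another collMod call.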

Interface

There are a couple of options for the interface:

  1. Allow a field or model to have a flag indicating that its constraints should be propagated to a Mongo JSON schema. When this is set, MongoEngine could convert its schema to a JSON schema (I believe the JSON schema spec is sufficiently expressive for this, though there may be some edge cases that cannot be converted).
  2. Add another metadata field (on models or fields) that expresses the constraint directly in JSON schema terms. This may be more expressive, but requires additional developer effort to write the schema, and risks the JSON schema and MongoEngine schema differing and causing writes to fail unexpectedly.
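To give a feel for option 1, here is a rough sketch of the conversion step. So that it runs without MongoEngine installed, it uses small stand-in classes whose attribute names (required, max_length, min_value, max_value) mirror MongoEngine's StringField and IntField; the helpers field_to_schema and fields_to_schema are hypothetical, not part of MongoEngine.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins for MongoEngine fields (assumption: same attribute names).
@dataclass
class StringField:
    required: bool = False
    max_length: Optional[int] = None


@dataclass
class IntField:
    required: bool = False
    min_value: Optional[int] = None
    max_value: Optional[int] = None


def field_to_schema(field):
    """Translate one field into a $jsonSchema property (hypothetical helper)."""
    if isinstance(field, StringField):
        schema = {"bsonType": "string"}
        if field.max_length is not None:
            schema["maxLength"] = field.max_length
        return schema
    if isinstance(field, IntField):
        schema = {"bsonType": ["int", "long"]}
        if field.min_value is not None:
            schema["minimum"] = field.min_value
        if field.max_value is not None:
            schema["maximum"] = field.max_value
        return schema
    # One of the "edge cases that cannot be converted".
    raise TypeError(f"no JSON schema mapping for {type(field).__name__}")


def fields_to_schema(fields):
    """Build a $jsonSchema object from a {name: field} mapping."""
    schema = {
        "bsonType": "object",
        "properties": {name: field_to_schema(f) for name, f in fields.items()},
    }
    required = [name for name, f in fields.items() if f.required]
    if required:  # omit the key rather than emit an empty "required" array
        schema["required"] = required
    return schema


user_fields = {
    "email": StringField(required=True, max_length=255),
    "age": IntField(min_value=0),
}
user_schema = fields_to_schema(user_fields)
```

A real converter would also need to decide how to treat explicit null values, custom validation functions, and embedded or reference fields, which is where the two schemas could drift apart.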

Implementation

I am not familiar with MongoEngine internals; my gut is that this would be a major undertaking, but I would love to hear otherwise.

The natural implementation would be to treat this like migrations in a traditional relational database, much like Django ORM or other ORMs: when you update a JSON schema, a migration file is generated that encapsulates the change, and that change must be run via a migration process. That would be a significant change to MongoEngine and, to some degree, goes against the spirit of Mongo.

A less invasive implementation would be to just check the schemas on startup. Particularly with moderate validation, updating the schema is a much lighter process than in a relational database: it doesn't touch any data and doesn't require modifying existing documents to fit the new schema. It's just a command to update the validator. This would have some implications for startup time, but assuming a reasonable number of collections and reasonable latency, it should be negligible. Alternatively, this could even be done in the background after startup, though at some cost to consistency if writes are performed before the schemas are updated.
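The startup check could look something like the sketch below. It assumes `db` behaves like a pymongo Database (using its list_collections, command, and create_collection methods); the function name sync_validator is hypothetical, and a missing validationLevel in the collection options is assumed to mean the server default of "strict".

```python
def sync_validator(db, collection_name, json_schema, level="moderate"):
    """Bring a collection's validator in line with the desired JSON schema.

    Returns True if a command was sent to the server, False if the stored
    validator already matched.
    """
    desired = {"$jsonSchema": json_schema}
    infos = list(db.list_collections(filter={"name": collection_name}))

    if not infos:
        # Collection does not exist yet: create it with the validator attached.
        db.create_collection(
            collection_name, validator=desired, validationLevel=level
        )
        return True

    options = infos[0].get("options", {})
    current_level = options.get("validationLevel", "strict")
    if options.get("validator") == desired and current_level == level:
        return False  # already up to date, nothing to send

    # Metadata-only change: existing documents are not rewritten.
    db.command(
        "collMod", collection_name, validator=desired, validationLevel=level
    )
    return True
```

Calling this once per registered model at startup costs one listCollections round trip per collection, plus a collMod only when the schema actually changed.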

Thoughts?

I suspect that this is not something that will be implemented in the near term in MongoEngine, and it would be better to do something outside of MongoEngine to manage JSON schemas. But I want to get this idea out there, since MongoEngine seems like a natural place to put these schemas, and I may be overestimating the work involved.

@edanaher I have recently published a JSON schema generator for MongoEngine models that you might want to check out: https://github.com/symphonicityy/mongoengine-jsonschema