This is an experimental Object-Document Mapping library for MongoDB. You can watch it being developed live on MongoDB's YouTube channel!
- Managing large amounts of data in MongoDB while keeping a data schema flexible is challenging.
- This ODM is not an active record implementation that maps documents in the database directly into similar objects in code.
- This ODM is designed to abstract underlying documents, mapping potentially multiple document schemata into a shared object representation.
- It should also simplify the evolution of documents in the database, automatically migrating individual documents' schemas either on-read or on-write.
- There should be "escape hatches" so that unforeseen mappings can be implemented, hiding away the implementation code behind hopefully reusable components.
The library currently doesn't interact directly with MongoDB - what it does do is wrap BSON documents returned by PyMongo or Motor.
For example, let's say you have a BSON document like this:
import datetime

from bson import ObjectId

user_data_bson = {
    '_id': ObjectId('657072b56731c9e580e9dd70'),
    'bio': 'Music conference able doctor degree debate. Participant usually above '
           'relate.',
    'birth_date': datetime.datetime(1999, 7, 6, 0, 0),
    'email': 'deanjacob@yahoo.com',
    'follower_count': 59,
    'full_name': 'Deborah White',
    'user_id': '4',
    'user_name': '@tanya15',
    # Only a subset of the followers is embedded in the profile document.
    'followers': [
        {
            '_id': ObjectId('657072b66731c9e580e9dda6'),
            'bio': 'Rich beautiful color life. Relationship instead win '
                   'join enough board successful.',
            'user_id': '58',
            'user_name': '@rduncan',
        },
        {
            '_id': ObjectId('657072b66731c9e580e9dd99'),
            'bio': 'Picture day couple democratic morning. Environment '
                   'manage opportunity option star food she. Occur imagine '
                   'population single avoid.',
            'user_id': '45',
            'user_name': '@paynericky',
        },
    ],
}
You can define a wrapper for it like this:
from docbridge import Document


class UserProfile(Document):
    pass
The wrapper doesn't currently do very much - it just makes the dict returned by PyMongo look more like a regular Python object:
profile = UserProfile(user_data_bson, db=None)
print(repr(profile._id)) # ObjectId('657072b56731c9e580e9dd70')
print(repr(profile.user_id))  # '4'
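To make the idea of attribute access over a dict concrete, here is a minimal standalone sketch. It is not docbridge's implementation; the `DictWrapper` class is a hypothetical stand-in that only assumes the wrapper keeps the original dict and looks attributes up in it.

```python
class DictWrapper:
    """Hypothetical stand-in for a document wrapper (not docbridge's code)."""

    def __init__(self, doc):
        self._doc = doc

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails,
        # so self._doc is found the usual way and does not recurse.
        try:
            return self._doc[name]
        except KeyError:
            raise AttributeError(name) from None


wrapped = DictWrapper({"user_id": "4", "follower_count": 59})
print(repr(wrapped.user_id))
print(repr(wrapped.follower_count))
```

Raising `AttributeError` for missing keys keeps `hasattr` and `getattr` with a default behaving as they would on an ordinary object.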
The real power of the library (like with most ODMs) comes from attaching field definitions to the class, to transform the way data is looked up on the underlying document.
Here is how the Field class can be used to map an attribute to a differently named field in the underlying document, or to transform the data in the underlying field, for example to convert a string to an int:
from docbridge import Document, Field


class UserProfile(Document):
    id = Field(field_name="_id")  # id maps to the _id doc field.
    user_id = Field(transform=int)  # user_id transforms the field value to an int


profile = UserProfile(user_data_bson, db=None)
print(repr(profile.id))  # ObjectId('657072b56731c9e580e9dd70')
print(repr(profile.user_id))  # 4 <- This is an int now!
print(repr(profile.follower_count))  # 59 <- You can still access other doc fields as attributes.
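One way a field like this could work is as a Python descriptor. The sketch below is an assumption about the general technique, not a copy of docbridge's Field; `SketchField`, `SketchDocument` and `SketchProfile` are hypothetical names used only for illustration.

```python
class SketchField:
    """Hypothetical descriptor; docbridge's real Field may differ."""

    def __init__(self, field_name=None, transform=None):
        self.field_name = field_name
        self.transform = transform

    def __set_name__(self, owner, name):
        # Default to the attribute's own name when no field_name is given.
        if self.field_name is None:
            self.field_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # Accessed on the class, not on an instance.
        value = instance._doc[self.field_name]
        return self.transform(value) if self.transform else value


class SketchDocument:
    def __init__(self, doc):
        self._doc = doc


class SketchProfile(SketchDocument):
    id = SketchField(field_name="_id")
    user_id = SketchField(transform=int)


sketch = SketchProfile({"_id": "abc", "user_id": "4"})
print(repr(sketch.id))
print(repr(sketch.user_id))
```

Because the transform runs on every read, the underlying document is left untouched, which matches the idea of mapping on top of the stored data rather than rewriting it.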
There are other types of field, though. FallthroughField is one of them. It allows you to try to look up a field by one name, and if the field is missing, it will try other names that it's been configured with.
Note: This field type will probably disappear, as I may merge its functionality into Field.
from docbridge import Document, FallthroughField


class UserProfile(Document):
    # The `name` attribute will look up the "full_name" field,
    # and fall back to "name" if it's missing.
    name = FallthroughField(
        field_names=[
            "full_name",  # v2
            "name",  # v1
        ]
    )


# Note the colon: these are dicts, not sets.
profile = UserProfile({"full_name": "Mark Smith"}, db=None)
assert profile.name == "Mark Smith"  # Works

profile = UserProfile({"name": "Mark Smith"}, db=None)
assert profile.name == "Mark Smith"  # Also works!
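The fallthrough lookup itself is simple to picture. The helper below is a hypothetical standalone function, not part of docbridge, showing the order in which candidate names might be tried against a plain dict.

```python
def first_present(doc, field_names):
    """Return the value of the first name in field_names found in doc.

    Hypothetical helper for illustration; not part of docbridge.
    """
    for name in field_names:
        if name in doc:
            return doc[name]
    raise KeyError(f"None of {field_names!r} found in document")


names = ["full_name", "name"]  # Newest schema first, oldest last.
print(first_present({"full_name": "Mark Smith"}, names))
print(first_present({"name": "Mark Smith"}, names))
```

Checking `name in doc` rather than the truthiness of the value means a field that is present but empty still wins over an older fallback.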
Some support already exists for abstracting MongoDB Design Patterns, like the Subset Pattern. The Subset Pattern keeps document size at a reasonable level by only embedding a subset of related data - for example, only the first 10 followers on a social media profile. The rest of the followers would be stored in their own collection, and loaded only when necessary.
from docbridge import Document, Field, SequenceField


class Follower(Document):
    _id = Field(transform=str)


class Profile(Document):
    _id = Field(transform=str)
    followers = SequenceField(
        type=Follower,
        superset_collection="followers",
        # The following query will be executed on "followers" if the field
        # is iterated past the embedded follower subdocuments.
        superset_query=lambda ob: [
            {
                "$match": {"user_id": ob.user_id},
            },
            {"$unwind": "$followers"},
            {"$replaceRoot": {"newRoot": "$followers"}},
        ],
    )


# Print all the profile's followers to the screen,
# including those in the followers collection.
# `test_db` is assumed to be a database object obtained from PyMongo.
profile = Profile(user_data_bson, db=test_db)
for follower in profile.followers:
    print(follower._id)
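The key behaviour here is that the extra query only runs once iteration moves past the embedded items. The sketch below illustrates that laziness with plain Python and no database; `iter_with_superset` and `fake_query` are hypothetical names, and `fake_query` merely stands in for whatever aggregation would run against the superset collection.

```python
from itertools import chain


def iter_with_superset(embedded, load_superset):
    """Yield embedded items first, then lazily load the rest.

    Hypothetical helper for illustration; not docbridge's implementation.
    """

    def rest():
        # This body only runs once the embedded items are exhausted.
        yield from load_superset()

    return chain(embedded, rest())


calls = []


def fake_query():
    # Stand-in for a database query; records that it was called.
    calls.append("queried")
    return [{"user_name": "@c"}]


items = iter_with_superset([{"user_name": "@a"}, {"user_name": "@b"}], fake_query)
first = next(items)
queried_after_first = list(calls)  # Snapshot: has the query run yet?
remaining = list(items)
```

Code that only looks at the first few followers never pays for the second lookup, which is the point of embedding a subset in the first place.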
I've been developing docbridge on YouTube. You can catch the live streams at 2pm GMT on Wednesdays, or you can view the recordings:
Introducing my plans for the library, and building out the Document class, and the Simple and Fallthrough classes. (The latter two get renamed later to Field and FallthroughField.)
Writing some Pytest test fixtures that will run tests in a transaction, and roll back any changes to the database. Then (attempting to) publish my module to PyPI!
Joins are a fundamental part of data modeling in MongoDB! This episode adds a field type for embedded arrays, and in the next episode it'll be extended to look up data in other collections!
More metaprogramming to turn a sequence of items that is split across documents and collections into a single Python sequence.
It's all very well reading data from the database, but it's also nice to be able to update it!
It turns out there's quite a lot of work to record and replay updates. Let's get on with it!