In this lesson we'll learn how to properly associate and establish relationships between the data stored in a database. By associating data, we can eliminate the need for duplicate data entries and a more organized data structure when retrieving the data.
- fork and clone
In order to understand how and why we set up relationships, read the following article: Modeling Relationships in MongoDB
As you can see, there are many different ways of associating data with MongoDB. There are trade offs to every type of association. What's important to understand, is how to set up the associations.
You'll typically see the following:
- One-To-Many
- Many-To-Many
Take five minutes and read the MongoDB docs on relationships:
Once again, what are the trade-offs between embedding a document vs referencing a document?
What's a one-to-many relationship? A common example is a blog app. A blog has users, a user can have many blog posts. One-to-many relationships are quite common and we need to know how to implement them on the database level.
In MongoDB we can create a one-to-many relationship by either:
- embedding the related documents
- referencing the related document(s)
There are trade-offs to each. We should understand them and pick what suits our use case best.
Let's consider a few examples!
Consider the following user
document with many posts
's embedded in it:
{
_id: ObjectId("3e399709171f6188450e43d2"),
name: "Joe Schmoe",
handle:"joeBeans123",
posts: [
{
title: "123 Fake Street",
description: "Faketon",
likes: 16,
},
{
title: "1 Some Other Street",
description: "Boston",
likes: 32,
}
]
}
As you can see, we can embed content into an existing document
. This is pretty common in MongoDB.
- Cons
- As the user creates more posts, the array grows. This can lead to increased query latency due to scanning the documents within a collection.
- A user might accidently create a duplicate entry leaving your front end with duplicate posts.
- Deleting specific records may become challenging due to how mongoDB scans collections.
- Pros
- simpler queries to find data
- the data is returned with the desired record right off the bat
Here's another way we can embed documents:
{
_id: ObjectId("2e399709171f6188450e43d2")
title: "Learn JavaScript",
description: "Take a coding bootcamp on JavaScript",
status: "active",
user: {
first_name: "Joe",
last_name: "Schmoe",
email: "j.schmoe@gmail.com",
job_title: "Junior Developer"
}
}
{
_id: ObjectId("8e399709171f6588450e43g2")
title: "Learn React",
description: "Take a coding bootcamp on React",
status: "active",
user: {
first_name: "Joe",
last_name: "Schmoe",
email: "j.schmoe@gmail.com",
job_title: "Junior Developer"
}
}
Notice a pattern here. Both of the users are the same. The biggest issue with embedding documents in this way, is that you'll end up creating duplicate records (in this case a user).
Let's see how we would model the same data by having posts reference a user document:
users collection
{
_id: ObjectId("3e399709171f6188450e43d2"),
name: "Joe Schmoe"
}
posts collection
{
_id: ObjectId("9e391709171f6188450e43f4"),
user_id: ObjectId("3e399709171f6188450e43d2"),
title: "123 Fake Street",
description: "Faketon",
likes: 16
}
{
_id: ObjectId("8s32170987gf6188450y43f2"),
user_id: ObjectId("3e399709171f6188450e43d2"),
title: "1 Some Other Street",
description: "Boston",
likes: 32
}
Utilizing this design, we have much more control over the records in our database. By using references
, we create a virtual link between collections to describe what record belongs to who. The reference
is typically the _id
of a document due to it being unique and unmodifiable!
- Cons
- queries to load the associated data are much more complex
- managing multiple schemas/collections becomes trickier as you add more references
- Pros
- reduces the risk of duplicates due to schema restrictions that we can add
- managing the data becomes easier (Deleting and Updating)
- the data is organized in a way that latency does not increase (mongoDB performs some magic to make references super fast)
We can also use references
within the parent document. We store the records within an array of the child documents ObjectId
.
{
_id: ObjectId("4e339749175f6147450e43d1")
first_name: "Joe",
last_name: "Schmoe",
email: "j.schmoe@gmail.com",
job_title: "Junior Developer",
tasks: [ ObjectId("2e399709171f6188450e43d2"), ObjectId("8e399709171f6588450e43g2") ]
}
task
s documents
{
_id: ObjectId("2e399709171f6188450e43d2")
title: "Learn JavaScript",
description: "Take a coding bootcamp on JavaScript",
status: "active"
}
{
_id: ObjectId("8e399709171f6588450e43g2")
title: "Learn React",
description: "Take a coding bootcamp on React",
status: "active"
}
This is a common way to model one-to-many relationships where we know we plan on creating requests for a user and all their associated tasks. Instead of embedding tasks within the user document, we embed the task id. This is more efficient. Our user document stays small. Its efficient to request all users or a specific user. This model supports a data model where a user can have many tasks because all we're storing is the task id inside the user document - so it doesn't take up much space in the user document. And if we do want the task data we can request it from the tasks collection based on the task id found in the user document.
Let's implement document references. We have the concept tasks and users. Tasks belong to users via referencing. How would we create that via code?! Let's start:
npm init -y
npm install mongoose
npm install --save-dev chance
mkdir db models seed
touch db/index.js models/{user,task,index}.js seed/tasksUsers.js query.js
Create a .gitignore
file
echo "
/node_modules
.DS_Store" >> .gitignore
Now let's open up Visual Studio Code and write some code:
code .
Inside our db
folder we are going to use Mongoose to establish a connection to our MongoDB tasksDatabase
:
db/index.js
const mongoose = require('mongoose')
mongoose
.connect('mongodb://127.0.0.1:27017/tasksDatabase')
.then(() => {
console.log('Successfully connected to MongoDB.')
})
.catch((e) => {
console.error('Connection error', e.message)
})
// mongoose.set('debug', true)
const db = mongoose.connection
module.exports = db
Notice
mongoose.set('debug', true)
is commented out. This line of code is super handy if you ever need to debug any mongoDB queries. Feel free to uncomment it and use it.
Let's create our task schema:
models/task.js
const { Schema } = require('mongoose')
const Task = new Schema(
{
title: { type: String, required: true },
description: { type: String, required: true }
},
{ timestamps: true }
)
module.exports = Task
Now we can create our user schema:
models/user.js
const { Schema } = require('mongoose')
const User = new Schema(
{
first_name: { type: String, required: true },
last_name: { type: String, required: true },
email: { type: String, required: true },
job_title: { type: String, required: true },
tasks: [{ type: Schema.Types.ObjectId, ref: 'tasks' }]
},
{ timestamps: true }
)
module.exports = User
We'll now set up our models:
models/index.js
const { model } = require('mongoose')
const TaskSchema = require('./task')
const UserSchema = require('./user')
const User = model('users', UserSchema)
const Task = model('tasks', TaskSchema)
module.exports = {
User,
Task
}
Notice how we create a tasks array that holds a reference to the tasks schema. Our user model now has a relationship to our task model. Our user model can hold an arrays of task ids.
Ok. Let's populate our database with data so we can query against it and make sure we setup our models correctly.
Let's now create a seed file to create some data for our database:
seed/tasksUsers.js
const db = require('../db')
const Chance = require('chance')
const { Task, User } = require('../models')
const chance = new Chance()
db.on('error', console.error.bind(console, 'MongoDB connection error:'))
const createTasks = async () => {
const tasks = [...Array(400)].map((task) => {
return new Task({
title: chance.sentence(),
description: chance.paragraph()
})
})
await Task.insertMany(tasks)
console.log('Created Tasks!')
return tasks
}
const createUsersWithTasks = async (tasks) => {
console.log(tasks)
let lenOfItems = 100
const users = [...Array(lenOfItems)].map((user) => {
const selectedTasks = tasks.splice(0, tasks.length / lenOfItems)
return {
first_name: chance.first(),
last_name: chance.last(),
email: chance.email(),
job_title: chance.profession(),
tasks: selectedTasks.map((task) => task._id)
}
})
await User.insertMany(users)
console.log('Created Users!')
}
const run = async () => {
const tasks = await createTasks()
await createUsersWithTasks(tasks)
db.close()
}
run()
Once you've written a script to seed data test it out:
node seed/tasksUsers.js
You should now be able to open up MongoDB Compass and see your database with all the seed data and proper relationship between users and tasks.
You can also test that your data and relationships are good by writing a simple query file:
query.js
const db = require('./db')
const { User, Task } = require('./models')
const findAllUsers = async () => {
const users = await User.find()
console.log('All users:', users)
}
const findAllTasks = async () => {
const tasks = await Task.find()
console.log('All tasks:', tasks)
}
const findOneWithTasks = async () => {
const user1 = await User.findOne()
// Try to use the populate method here to load all of the tasks for a user
// https://mongoosejs.com/docs/populate.html
console.log(JSON.stringify(user1), null, 2)
}
const run = async () => {
try {
// await findAllUsers()
// await findAllTasks()
// await findOneWithTasks()
} catch (error) {
console.log(error)
} finally {
await db.close()
}
}
run()
In this lesson, we covered a few different ways of associating data with mMongoDB. This knowledge will slowly settle in, the more your work with MongoDB. For now, just be aware that there are two ways to create relationships: embedding and referencing. And there are tradeoffs to both. Feel free to come back to this lesson, study it, review it, and work with it. This knowledge takes a while to solidify.