Using Seed Data

Learning Goals

  • Use a seed file to add sample data to your database
  • Use the Faker gem to quickly generate sample data

Introduction

What good is a database without any data? When building new features for an application that uses a database, it helps to populate that database with realistic data. Active Record, like many other ORMs, refers to the process of adding sample data to the database as "seeding" the database. In this lesson, we'll see some of the conventions and built-in features that make it easy to seed data in an Active Record application.

This lesson is set up as a code-along, so make sure to fork and clone the lesson. Then run these commands to install the dependencies and set up the database:

$ bundle install
$ bundle exec rake db:migrate

In this application, we have a migration for one table, games:

# db/migrate/20210718134231_create_games.rb
class CreateGames < ActiveRecord::Migration[6.1]
  def change
    create_table :games do |t|
      t.string :title
      t.string :genre
      t.string :platform
      t.integer :price
      t.timestamps # generates created_at and updated_at columns
    end
  end
end

And a corresponding Game class that inherits from Active Record:

# app/models/game.rb
class Game < ActiveRecord::Base

end

Why Do We Need Seed Data?

With Active Record, we've seen how simple it is to add data to a database by using built-in methods that will write SQL code for us. We could create a new record in the games table by opening up a console session and using the .create method. The command would look something like this:

Game.create(title: "Breath of the Wild", platform: "Switch", genre: "Action-adventure", price: 60)

Running this command would create the new record and save it to the database, but how can we share this data with other developers who are working on the same application? How could we recover this data if our development database was deleted? We could include the database in version control, but this is generally considered bad practice: a database can grow quite large over time, so tracking it isn't practical. (You'll notice that the .gitignore file in our Active Record projects includes a line that tells Git not to track any .sqlite3 files.) There's got to be a better way!

The common approach to this problem is to share the instructions for creating the data rather than the database itself. By convention, those instructions live in a Ruby file, db/seeds.rb, which is used to populate our database.

We've already seen a similar way to share instructions for setting up a database with other developers: Active Record migrations, which define how the database tables should look. Now we have two kinds of database instructions we can use:

  • Migrations: define how our tables should be set up
  • Seeds: add data to those tables

Using the seeds.rb File

To use the seeds.rb file to add data to the database, all we need to do is write code that uses Active Record methods to create new records. Add this to the db/seeds.rb file:

# db/seeds.rb
Game.create(title: "Breath of the Wild", platform: "Switch", genre: "Action-adventure", price: 60)
Game.create(title: "Final Fantasy VII", platform: "Playstation", genre: "RPG", price: 60)
Game.create(title: "Mario Kart", platform: "Switch", genre: "Racing", price: 60)

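You could run this file directly with ruby db/seeds.rb, but only if it first loads the application environment so that the Game class and the database connection are available. Since seeding is such a common operation, there is a Rake task that handles this for us. Run the Rake task now: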

$ bundle exec rake db:seed

As long as there aren't any error messages, you won't see any output in the terminal. We can check whether the operation succeeded by opening the console:

$ bundle exec rake console

And checking if the records were created:

Game.count
# => 3
Game.last
# => #<Game:0x00007ff40641f698
#  id: 3,
#  title: "Mario Kart",
#  genre: "Racing",

Awesome! Exit out of the console.
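
For some intuition about why the Rake task is the convenient way to run the seed file, here is a minimal sketch using only Ruby's standard library. It assumes that seeding boils down to "load the application code, then load the seed file". The Game class below is a hypothetical in-memory stand-in, not the Active Record model, and the seed file is a throwaway temp file rather than db/seeds.rb.

```ruby
require "tempfile"

# Writes a throwaway seed file (stand-in for db/seeds.rb) and returns its path.
def write_seed_file
  file = Tempfile.create(["seeds", ".rb"])
  file.puts 'Game.create(title: "Mario Kart")'
  file.close
  file.path
end

# Loads the seed file and reports whether it ran without a NameError.
def try_seed(path)
  load path
  :ok
rescue NameError
  :name_error
end

SEED_PATH = write_seed_file

# Before the "environment" is loaded, Game is undefined, so the seed file fails.
BEFORE = try_seed(SEED_PATH)

# Hypothetical stand-in for loading the application environment:
# define a model that simply remembers what was created.
class Game
  @records = []

  class << self
    attr_reader :records

    def create(attrs)
      @records << attrs
      attrs
    end
  end
end

# With the model defined, the very same seed file now runs.
AFTER = try_seed(SEED_PATH)

puts BEFORE
puts AFTER
puts Game.records.length

File.delete(SEED_PATH)
```

The seed file itself never changes between the two attempts; only the surrounding environment does, and setting up that environment is the part the Rake task takes care of.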

What happens if we want to add some more data to the database? Well, we could try adding another .create call in our db/seeds.rb file:

# db/seeds.rb
Game.create(title: "Breath of the Wild", platform: "Switch", genre: "Action-adventure", price: 60)
Game.create(title: "Final Fantasy VII", platform: "Playstation", genre: "RPG", price: 60)
Game.create(title: "Mario Kart", platform: "Switch", genre: "Racing", price: 60)
Game.create(title: "Candy Crush Saga", platform: "Mobile", genre: "Puzzle", price: 0)

Then run the seed file again and open the console:

$ bundle exec rake db:seed
$ bundle exec rake console

Let's see our updated data:

Game.last
# => #<Game:0x00007fc123ae3af8
#  id: 7,
#  title: "Candy Crush Saga",
#  genre: "Puzzle",
#  platform: "Mobile",
Game.count
# => 7

Hmm, we only added four games in the db/seeds.rb file: why are there now seven games in the database? Well, remember: every time we run rake db:seed, we are creating new records in the games table. The first run created three records, and the second run created four more. There's nothing stopping our code from producing duplicate data in the database. We're just instructing Active Record to create new records using this file!

We can use another Rake command to replant the seed data:

$ bundle exec rake db:seed:replant

This command removes the data from all existing tables, and then re-runs the seed file. It's handy if you want to start fresh! Just be cautious using this command, since it will delete all your existing data.

We can now see our fresh database with just four records in the games table, as intended. Run bundle exec rake console:

Game.count
# => 4
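
An alternative to wiping the tables is to write the seed file so that running it twice is harmless. Active Record provides find_or_create_by for this purpose. The sketch below does not use Active Record; GameTable and find_or_create_by_title are hypothetical in-memory stand-ins, written only to contrast the two behaviors, and matching on title alone is an assumption made for the example.

```ruby
# Hypothetical in-memory stand-in for the games table (not Active Record).
class GameTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Always inserts a new row, like calling .create on every seed run.
  def create(attrs)
    row = attrs.merge(id: @rows.length + 1)
    @rows << row
    row
  end

  # Inserts only when no row with the same title exists yet.
  def find_or_create_by_title(attrs)
    @rows.find { |row| row[:title] == attrs[:title] } || create(attrs)
  end
end

SEED_GAMES = [
  { title: "Breath of the Wild", platform: "Switch", genre: "Action-adventure", price: 60 },
  { title: "Final Fantasy VII", platform: "Playstation", genre: "RPG", price: 60 },
  { title: "Mario Kart", platform: "Switch", genre: "Racing", price: 60 },
  { title: "Candy Crush Saga", platform: "Mobile", genre: "Puzzle", price: 0 }
].freeze

# Runs the "seed file" against a table and returns the resulting row count.
def run_seeds(table, insert_method)
  SEED_GAMES.each { |attrs| table.public_send(insert_method, attrs) }
  table.rows.length
end

naive = GameTable.new
run_seeds(naive, :create)
puts run_seeds(naive, :create) # second run doubles the rows

idempotent = GameTable.new
run_seeds(idempotent, :find_or_create_by_title)
puts run_seeds(idempotent, :find_or_create_by_title) # second run adds nothing
```

In a real seed file, the comparable call would look something like Game.find_or_create_by(title: "Mario Kart"), with the remaining attributes set in a block. Whether that or db:seed:replant fits better depends on whether existing data should survive a re-seed.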

Generating Randomized Data

One challenge of seeding a database is thinking up lots of sample data. Ultimately, when you're developing an application, it's helpful to have realistic data, but the actual content is not so important.

One tool that can be used to help generate a lot of realistic randomized data is the Faker gem. This gem is already included in the Gemfile for this application, so we can try it out. Run bundle exec rake console, and try out some Faker methods:

Faker::Name.name
# => "Arnoldo Collier"
Faker::Name.name
# => "Teodoro Thiel"
Faker::Name.name
# => "Monte Stanton"

As you can see, every time we call the .name method, we get a new random name. Faker has a lot of built-in randomized data generators that you can use:

Faker::Internet.email
# => "chi@beatty.co"
Faker::Food.ingredient
# => "Jasmine Rice"
Faker::Kpop.girl_groups
# => "2NE1"

It even has some for generating game data, which we'll use in our seed file. Let's use Faker to generate 50 random games. Replace the data in the seeds.rb file with the following code:

# Add a console message so we can see output when the seed file runs
puts "Seeding games..."

# run a loop 50 times
50.times do
  # create a game with random data
  Game.create(
    title: Faker::Game.title,
    genre: Faker::Game.genre,
    platform: Faker::Game.platform,
    price: rand(0..60) # random integer from 0 to 60, inclusive
  )
end

puts "Done seeding!"

Then, run bundle exec rake db:seed:replant to re-seed the database. Let's check out what random games were created with bundle exec rake console:

Game.count
# => 50
Game.last
# => #<Game:0x00007fb4086909d8
#  id: 50,
#  title: "PlayerUnknown's Battlegrounds",
#  genre: "Trivia",
#  platform: "Nintendo 64",
#  price: 16,
#  created_at: 2021-07-18 14:28:56 UTC,
#  updated_at: 2021-07-18 14:28:56 UTC>

Great! Now we've got plenty of seed data to work with, and an easy way for ourselves or other developers to populate the database any time we need to do so.
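
To see roughly what a generator like Faker is doing, here is a minimal sketch that uses only Ruby's standard library. The word lists and the random_game and sample_games helpers are invented for illustration and are not part of Faker. The sketch also passes in a seeded Random object so that the "random" data comes out the same on every run, which can help when several developers want identical sample data. Faker's README describes a comparable setting, Faker::Config.random.

```ruby
# Invented sample values; a gem like Faker ships far larger lists.
TITLES    = ["Star Quest", "Pixel Racer", "Dungeon Tales", "Sky Builders"].freeze
GENRES    = ["RPG", "Racing", "Puzzle", "Platformer"].freeze
PLATFORMS = ["Switch", "PC", "Mobile"].freeze

# Builds one hash of game attributes using the given random number generator.
def random_game(rng)
  {
    title: TITLES.sample(random: rng),
    genre: GENRES.sample(random: rng),
    platform: PLATFORMS.sample(random: rng),
    price: rng.rand(0..60) # integer from 0 to 60, inclusive
  }
end

# Builds `count` games; the same seed always yields the same list.
def sample_games(count, seed:)
  rng = Random.new(seed)
  Array.new(count) { random_game(rng) }
end

games = sample_games(50, seed: 42)
puts games.length
puts games.first.inspect
```

Each hash has the same keys that Game.create expects in the seed file above, so a list like this could be fed to the model one hash at a time.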

Run learn test now to pass the test and complete this lesson.

Conclusion

In this lesson, we learned why it's important to keep a seed file alongside our database migrations: it lets us and other developers quickly set up the database with sample data. We also learned how to use the Faker gem to quickly generate randomized seed data.

Resources