supabase-community/copycat

feature: Helper utility to generate a collection of faked objects for bulk seeding

Closed this issue · 3 comments

Use Case

When using RedwoodJS and seeding data with Prisma, I often find myself wanting to quickly generate N models populated with data.

For example, given the Prisma model:

model Person {
  id            Int    @id @default(autoincrement())
  fullName      String @unique
  postalAddress String
}

I may want to quickly generate 100 people.

Currently, I do something like this to construct 100 people with some fake data and then save ...

const data: Prisma.PersonCreateArgs['data'][] = [
  ...Array(100).keys(),
].map((key) => {
  return {
    fullName: copycat.fullName(key),
    postalAddress: copycat.postalAddress(key),
  }
})

// ...

await db.person.createMany({ data, skipDuplicates: true })

But I keep having to remember the somewhat clunky syntax

[...Array(100).keys()].map((key) => { /* ... */ })

and it would be useful in copycat to have some method that can:

  • take an argument for how many data objects to generate
  • return a collection of those populated objects to save

I could envision several implementations:

  • a callback to set the "generate" methods ... in the above example, the returned { fullName, postalAddress }
  • an option to return a collection (memory-heavy) to use with createMany
  • an option to "stream", so to speak, so records can be created individually via Prisma's create (to support databases other than Postgres; though if you are using Snaplet, not sure why you would... but it could still be useful)
  • an option to configure the "shape" of the object, like in fictional ... e.g., define:
const person = {
  fullName,
  postalAddress,
}

const person = {
  name: fullName,
  postalAddress,
}

so that there's no need to pass in the key per item.
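
Something like this hypothetical userland helper would capture the idea (generateMany is just a name I'm making up, and it still takes a per-item builder rather than a configured shape):

import { copycat } from '@snaplet/copycat'

// Hypothetical helper (not part of copycat): take a count and a per-item
// builder, return that many populated objects ready to pass to createMany.
const generateMany = <T>(count: number, build: (key: number) => T): T[] =>
  [...Array(count).keys()].map(build)

const data = generateMany(100, (key) => ({
  fullName: copycat.fullName(key),
  postalAddress: copycat.postalAddress(key),
}))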

Hey @dthyresson o/

The use case totally makes sense, thanks for bringing it up!

I actually think a combination of times and shape might help here:

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress
})

copycat.times('someInputKey', 100, person)

/* =>
[
  {
    fullName: 'Arne Turner',
    postalAddress: '884 Williamson Inlet, Minot 3065, Lebanon'
  },
  {
    fullName: 'Casper Deckow',
    postalAddress: '141 Tierra Mountain, Kenner 6145, Cayman Islands'
  },
  // ...
]
*/

Would that work for you?

If it'd help, we could also re-export shape from fictional so you only have to work with copycat.
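
For illustration, the example above would then only need a single import (hypothetical for now, since copycat does not currently re-export shape):

// Hypothetical: assumes a future re-export of fictional's shape from copycat
import { copycat, shape } from '@snaplet/copycat'

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress,
})

copycat.times('someInputKey', 100, person)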

Re streaming and memory, it's a good point: the results from times will all be in memory. Would this be enough for your use case, or do you need to generate a very large number of items?

@justinvdm I didn't know about times ... that's perfect:

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

// ...
    const n = 100_000
    const batch = 10_000
    console.info(`Creating ${n} people to search through ...`)

    const person = shape({
      fullName: copycat.fullName,
      postalAddress: copycat.postalAddress,
    })

    for (let i = 1; i <= n / batch; i++) {
      const data: Prisma.PersonCreateArgs['data'][] = copycat.times(
        `person-${i}`,
        batch,
        person
      )

      console.info(
        `Creating ${data.length} x ${i} = ${data.length * i} records ...`
      )

      await db.person.createMany({ data, skipDuplicates: true })
    }

This is working well for me to batch-create 100k records (yes, I wouldn't typically seed this many, but I needed to demo a case-insensitive search index and needed a larger dataset).

I'll write up a post for Redwood or maybe update the seed documentation to suggest this approach.

  • exporting shape might be a nice improvement
  • eventually, something that streams and creates in batches could also be helpful, e.g.:
// 1_000 times in batches of 100
copycat.inBatches('someInputKey', 1_000, 100, person, callback)

or ... maybe add a batchSize option to times with a possible callback.
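
To make that concrete, a userland approximation could wrap copycat.times along these lines (seedInBatches is hypothetical and not part of copycat; copycat.times and fictional's shape are the real pieces):

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

type PersonRow = { fullName: string; postalAddress: string }

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress,
})

// Hypothetical helper: generate `total` rows, `batchSize` at a time, and hand
// each batch to a save callback (e.g. Prisma's createMany) so only one batch
// is held in memory at once.
async function seedInBatches(
  seed: string,
  total: number,
  batchSize: number,
  save: (batch: PersonRow[]) => Promise<unknown>
) {
  const batches = Math.ceil(total / batchSize)
  for (let i = 0; i < batches; i++) {
    const size = Math.min(batchSize, total - i * batchSize)
    const batch: PersonRow[] = copycat.times(`${seed}-${i}`, size, person)
    await save(batch)
  }
}

// e.g.
// await seedInBatches('person', 1_000, 100, (data) =>
//   db.person.createMany({ data, skipDuplicates: true })
// )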

Note: If I was seeding 100k+, I'd probably restore a snapshot from Snaplet :)

Great, glad times works :)

exporting shape might be a nice improvement

๐Ÿ‘ I'll make a PR for this soon

eventually, something that streams and creates in batches could also be helpful.

๐Ÿ‘ I can see from your example how it would simplify things for use cases like seed scripts.

I'll close this one for now then, thank you for bringing this up.