supabase-community/copycat

feature: Helper utility to generate a collection of faked objects for bulk seeding

Closed this issue · 3 comments

Use Case

When using RedwoodJS and seeding data with Prisma, I often find myself wanting to quickly generate N models populated with data.

For example, given the Prisma model:

model Person {
  id            Int    @id @default(autoincrement())
  fullName      String @unique
  postalAddress String
}

I may want to quickly generate 100 people.

Currently, I do something like this to construct 100 people with some fake data and then save ...

const data: Prisma.PersonCreateArgs['data'][] = [
  ...Array(100).keys(),
].map((key) => {
  return {
    fullName: copycat.fullName(key),
    postalAddress: copycat.postalAddress(key),
  }
})

// ...

await db.person.createMany({ data, skipDuplicates: true })

But I keep having to remember the somewhat clunky syntax

[...Array(100).keys()].map((key) => { /* ... */ })

and it would be useful in copycat to have some method that can:

  • take an argument for how many data objects to generate
  • return a collection of those populated objects to save

I could envision several implementations:

  • a callback to set the "generate" methods ... in the above example, the returned { fullName, postalAddress }
  • an option to return a collection (memory-heavy) to use with createMany
  • an option to "stream", so to speak, so records can be created individually via Prisma's create (to support databases other than Postgres; though if you are using Snaplet, not sure why you would... but it could still be useful)
  • an option to configure the "shape" of the object, like in fictional ... e.g., define:
const person = {
  fullName,
  postalAddress,
}

const person = {
  name: fullName,
  postalAddress,
}

so that there's no need to pass in the key per item.
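
Something like this hypothetical userland helper would capture the idea (generateMany is just a name I'm making up, and it still takes a per-item builder rather than a configured shape):

import { copycat } from '@snaplet/copycat'

// Hypothetical helper (not part of copycat): take a count and a per-item
// builder, return that many populated objects ready to pass to createMany.
const generateMany = <T>(count: number, build: (key: number) => T): T[] =>
  [...Array(count).keys()].map(build)

const data = generateMany(100, (key) => ({
  fullName: copycat.fullName(key),
  postalAddress: copycat.postalAddress(key),
}))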

Hey @dthyresson o/

The use case totally makes sense, thanks for bringing it up!

I actually think a combination of times and shape might help here:

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress
})

copycat.times('someInputKey', 100, person)

/* =>
[
  {
    fullName: 'Arne Turner',
    postalAddress: '884 Williamson Inlet, Minot 3065, Lebanon'
  },
  {
    fullName: 'Casper Deckow',
    postalAddress: '141 Tierra Mountain, Kenner 6145, Cayman Islands'
  },
  // ...
]
*/

Would that work for you?

If it'd help, we could also re-export shape from fictional so you only have to work with copycat.
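
For illustration, the example above would then only need a single import (hypothetical for now, since copycat does not currently re-export shape):

// Hypothetical: assumes a future re-export of fictional's shape from copycat
import { copycat, shape } from '@snaplet/copycat'

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress,
})

copycat.times('someInputKey', 100, person)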

Re streaming and memory, it's a good point: the results from times will all be in memory. Would this be enough for your use case, or do you need to generate a very large number of items?

@justinvdm I didn't know about times ... that's perfect:

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

// ...
    const n = 100_000
    const batch = 10_000
    console.info(`Creating ${n} people to search through ...`)

    const person = shape({
      fullName: copycat.fullName,
      postalAddress: copycat.postalAddress,
    })

    for (let i = 1; i <= n / batch; i++) {
      const data: Prisma.PersonCreateArgs['data'][] = copycat.times(
        `person-${i}`,
        batch,
        person
      )

      console.info(
        `Creating ${data.length} x ${i} = ${data.length * i} records ...`
      )

      await db.person.createMany({ data, skipDuplicates: true })
    }

This is working well for me to batch-create 100k records (yes, I wouldn't typically seed this many, but I needed to demo a case-insensitive search index and needed a larger dataset).

I'll write up a post for Redwood or maybe update the seed documentation to suggest this approach.

  • exporting shape might be a nice improvement
  • eventually, something that streams and creates in batches could also be helpful, e.g.:
// 1_000 times in batches of 100
copycat.inBatches('someInputKey', 1_000, 100, person, callback)

or ... maybe add a batchSize option to times with a possible callback.
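
To make that concrete, a userland approximation could wrap copycat.times along these lines (seedInBatches is hypothetical and not part of copycat; copycat.times and fictional's shape are the real pieces):

import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'

type PersonRow = { fullName: string; postalAddress: string }

const person = shape({
  fullName: copycat.fullName,
  postalAddress: copycat.postalAddress,
})

// Hypothetical helper: generate `total` rows, `batchSize` at a time, and hand
// each batch to a save callback (e.g. Prisma's createMany) so only one batch
// is held in memory at once.
async function seedInBatches(
  seed: string,
  total: number,
  batchSize: number,
  save: (batch: PersonRow[]) => Promise<unknown>
) {
  const batches = Math.ceil(total / batchSize)
  for (let i = 0; i < batches; i++) {
    const size = Math.min(batchSize, total - i * batchSize)
    const batch: PersonRow[] = copycat.times(`${seed}-${i}`, size, person)
    await save(batch)
  }
}

// e.g.
// await seedInBatches('person', 1_000, 100, (data) =>
//   db.person.createMany({ data, skipDuplicates: true })
// )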

Note: If I was seeding 100k+, I'd probably restore a snapshot from Snaplet :)

Great, glad times works :)

exporting shape might be a nice improvement

๐Ÿ‘ I'll make a PR for this soon

eventually, something that streams and creates in batches could also be helpful.

๐Ÿ‘ I can see from your example how it would simplify things for use cases like seed scripts.

I'll close this one for now then, thank you for bringing this up.