feature: Helper utility to generate a collection of faked objects for bulk seeding
Closed this issue · 3 comments
Use Case
When using RedwoodJS and seeding data with Prisma, I often find myself wanting to quickly generate N models populated with data.
For example, given the Prisma model:
model Person {
id Int @id @default(autoincrement())
fullName String @unique
postalAddress String
}
I may want to quickly generate 100 people.
Currently, I do something like this to construct 100 people with some fake data and then save ...
const data: Prisma.PersonCreateArgs['data'][] = [
...Array(100).keys(),
].map((key) => {
return {
fullName: copycat.fullName(key),
postalAddress: copycat.postalAddress(key),
}
})
// ...
await db.person.createMany({ data, skipDuplicates: true })
But, I keep having to remember the somewhat clunky syntax
[...Array(100).keys()].map((key) => {})
and it would be useful in copycat to have some method that can:
- take an argument for how many data objects to generate
- return a collection of those populated objects to save
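Independent of copycat, one way to avoid the spread-over-keys idiom is `Array.from` with a length and a mapping function. The sketch below wraps it in a hypothetical `generate` helper (not part of copycat); `fakePerson` is a stand-in for calls such as `copycat.fullName(key)`.

```typescript
// Hypothetical helper, not part of copycat: build `count` objects by calling
// `make` with each index from 0 to count - 1.
function generate<T>(count: number, make: (key: number) => T): T[] {
  return Array.from({ length: count }, (_, key) => make(key))
}

// Stand-in maker; a real seed script would call copycat.fullName(key) etc.
const fakePerson = (key: number) => ({
  fullName: `Person ${key}`,
  postalAddress: `${key} Example Street`,
})

const people = generate(100, fakePerson)
console.log(people.length) // 100
```

The resulting array has the same form as the `data` array passed to `createMany` above.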
I could envision several implementations:
- a callback to set the "generate" methods ... in the above example, the returned {fullName, postalAddress}
- option to return a collection (memory heavy) to use with createMany
- option to "stream", so to speak, so that records can be created individually via Prisma create (to support databases other than Postgres, but if you are using Snaplet, not sure why... but could be useful still).
- option to configure the "shape" of the object, like shape in fictional ... e.g., define:
const person = {
fullName,
postalAddress,
}
const person = {
name: fullName,
postalAddress,
}
so that there is no need to pass in the key per item.
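To make the "shape" idea concrete, here is a minimal sketch of how such a helper could work: it takes an object whose values are maker functions and returns a single maker that builds the whole object from one input. `shapeOf` and the stand-in makers are hypothetical, and this is not a description of how fictional implements its own helper. As a simplification, every field here receives the same input, whereas a real library may derive a distinct input per field.

```typescript
// A maker turns an input (the "key") into a value.
type Maker<T> = (input: string | number) => T
// One maker per field of the target object.
type Makers<T> = { [K in keyof T]: Maker<T[K]> }

// Hypothetical helper: combine per-field makers into one object maker.
function shapeOf<T>(makers: Makers<T>): Maker<T> {
  return (input) => {
    const result = {} as T
    for (const field of Object.keys(makers) as (keyof T)[]) {
      result[field] = makers[field](input)
    }
    return result
  }
}

// Stand-ins for copycat.fullName and copycat.postalAddress.
const fullNameMaker: Maker<string> = (input) => `Name ${input}`
const postalAddressMaker: Maker<string> = (input) => `${input} Example Street`

const personMaker = shapeOf({ name: fullNameMaker, postalAddress: postalAddressMaker })
console.log(personMaker(7)) // { name: 'Name 7', postalAddress: '7 Example Street' }
```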
Hey @dthyresson o/
The use case totally makes sense, thanks for bringing it up!
I actually think a combination of times and shape might help here:
import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'
const person = shape({
fullName: copycat.fullName,
postalAddress: copycat.postalAddress
})
copycat.times('someInputKey', 100, person)
/* =>
[
{
fullName: 'Arne Turner',
postalAddress: '884 Williamson Inlet, Minot 3065, Lebanon'
},
{
fullName: 'Casper Deckow',
postalAddress: '141 Tierra Mountain, Kenner 6145, Cayman Islands'
},
// ...
]
*/
Would that work for you?
If it'd help, we could also re-export shape from fictional so you only have to work with copycat.
Re streaming and memory, it's a good point: times will all be in-memory. Would this be enough for your use case, or do you need to generate a very large number of items?
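If the full collection were too large to hold in memory, one alternative would be to produce items lazily. The sketch below uses a generator so that only one item exists at a time. `lazyTimes` and the stand-in maker are hypothetical and not part of copycat, and deriving each item's input as `${input}-${i}` is just one assumed scheme for keeping the output deterministic.

```typescript
// Hypothetical lazy variant: yields one item at a time instead of
// building the whole array up front.
function* lazyTimes<T>(
  input: string,
  count: number,
  make: (itemInput: string) => T
): Generator<T> {
  for (let i = 0; i < count; i++) {
    yield make(`${input}-${i}`)
  }
}

// Stand-in maker; a real script would use copycat makers here.
const makeLazyPerson = (itemInput: string) => ({ fullName: `Name ${itemInput}` })

let produced = 0
for (const item of lazyTimes('someInputKey', 3, makeLazyPerson)) {
  // In a seed script, a single-record create call would go here.
  produced += 1
  console.log(item.fullName)
}
```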
@justinvdm I didn't know about times ... that's perfect:
import { copycat } from '@snaplet/copycat'
import { shape } from 'fictional'
// ...
const n = 100_000
const batch = 10_000
console.info(`Creating ${n} people to search through ...`)
const person = shape({
fullName: copycat.fullName,
postalAddress: copycat.postalAddress,
})
for (let i = 1; i <= n / batch; i++) {
const data: Prisma.PersonCreateArgs['data'][] = copycat.times(
`person-${i}`,
batch,
person
)
console.info(
`Creating ${data.length} x ${i} = ${data.length * i} records ...`
)
await db.person.createMany({ data, skipDuplicates: true })
}
This is working well for me to batch create 100k records (yes, I wouldn't typically seed this, but I needed to demo a case-insensitive search index and needed a larger dataset).
I'll write up a post for Redwood or maybe update the seed documentation to suggest this approach.
- exporting shape might be a nice improvement - eventually, something to stream and create in batches could also be helpful.
// 1_000 times in batches of 100
copycat.inBatches('someInputKey', 1_000, 100, person, callback)
or ... maybe add a batchSize option to times with a possible callback
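A rough sketch of what such a batching helper could look like follows. `inBatches` does not exist in copycat; the name, the argument order, and the per-batch input scheme (`${input}-${batchIndex}`, mirroring the `person-${i}` keys in the loop above) are all assumptions. The `makeBatch` parameter stands in for something like `(key, size) => copycat.times(key, size, person)`.

```typescript
// Hypothetical helper: generate `total` items in chunks of `batchSize`,
// handing each chunk to `callback` before generating the next one.
async function inBatches<T>(
  input: string,
  total: number,
  batchSize: number,
  makeBatch: (batchInput: string, size: number) => T[],
  callback: (batch: T[], batchIndex: number) => Promise<void> | void
): Promise<void> {
  if (batchSize <= 0) throw new Error('batchSize must be positive')
  let batchIndex = 0
  for (let done = 0; done < total; done += batchSize) {
    const size = Math.min(batchSize, total - done) // last batch may be smaller
    await callback(makeBatch(`${input}-${batchIndex}`, size), batchIndex)
    batchIndex += 1
  }
}

// Stand-in batch maker that just labels each item.
const makeLabels = (batchInput: string, size: number) =>
  Array.from({ length: size }, (_, i) => `${batchInput}:${i}`)

const batchSizes: number[] = []
const finished = inBatches('someInputKey', 250, 100, makeLabels, (batch) => {
  batchSizes.push(batch.length)
})
finished.then(() => console.log(batchSizes)) // [ 100, 100, 50 ]
```

In a seed script the callback would be where the write happens, for example `async (data) => { await db.person.createMany({ data, skipDuplicates: true }) }`, so only one batch is held in memory at a time.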
Note: If I was seeding 100k+, I'd probably restore a snapshot from Snaplet :)
Great, glad times works :)
exporting shape might be a nice improvement
I'll make a PR for this soon
eventually, something to stream and create in batches could also be helpful.
I can see from your example how it would simplify things for use cases like seed scripts.
I'll close this one for now then, thank you for bringing this up.