MaterializeInc/datagen

Feature: Add support for generating relational data

bobbyiliev opened this issue · 1 comment

An experimental relational option has been added for JSON schemas. We need to add the same for SQL and Avro schemas.

It adds some very basic relational functionality for JSON schemas. You can specify the primary key ("key") and the foreign key ("foreignKey") in the "_meta" block of each schema, e.g.:

[
    {
        "_meta": {
            "topic": "mz_datagen_users",
            "key": "id"
        },
        "id": "datatype.uuid",
        "name": "internet.userName",
        "email": "internet.exampleEmail",
        "phone": "phone.imei",
        "website": "internet.domainName",
        "city": "address.city",
        "company": "company.name"
    },
    {
        "_meta": {
            "topic": "mz_datagen_posts",
            "foreignKey": "user_id",
            "key": "id"
        },
        "id": "datatype.uuid",
        "user_id": "datatype.uuid",
        "title": "lorem.sentence",
        "body": "lorem.paragraph"
    }
]

A unique ID will be generated and used for both users.id and posts.user_id.
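
To make the intended behaviour concrete, here is a minimal sketch in Python. It is not datagen's actual implementation: the function name generate_linked_records is hypothetical, every field is filled with a UUID instead of dispatching on the faker method name, and it assumes the entry without a "foreignKey" is the parent whose "key" value the other entries reference.

```python
import uuid

# Trimmed copy of the schema above; only the key-related fields are kept.
SCHEMA = [
    {
        "_meta": {"topic": "mz_datagen_users", "key": "id"},
        "id": "datatype.uuid",
    },
    {
        "_meta": {"topic": "mz_datagen_posts", "foreignKey": "user_id", "key": "id"},
        "id": "datatype.uuid",
        "user_id": "datatype.uuid",
    },
]


def generate_linked_records(schema):
    """Return {topic: record}, sharing one ID between the parent key and child FKs.

    Hypothetical helper for illustration only.
    """
    shared_id = str(uuid.uuid4())
    records = {}
    for entry in schema:
        meta = entry["_meta"]
        # Placeholder values: a real generator would call the faker method
        # named in the schema instead of always producing a UUID.
        record = {field: str(uuid.uuid4()) for field in entry if field != "_meta"}
        if "foreignKey" in meta:
            # Child entry: point its foreign key at the shared parent ID.
            record[meta["foreignKey"]] = shared_id
        else:
            # Parent entry (assumed): its own key is the shared ID.
            record[meta["key"]] = shared_id
        records[meta["topic"]] = record
    return records


records = generate_linked_records(SCHEMA)
```

With this sketch, records["mz_datagen_users"]["id"] and records["mz_datagen_posts"]["user_id"] hold the same value, while the post keeps its own separate "id".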

Commenting so I don’t forget. We would also want to capture one-to-many, one-to-one, and many-to-many relationships. This would allow us to precompute a keyspace and produce more realistic-looking updates and deletes.

Take users, posts, and post comments as an example. One user has many posts, each post has one user but many comments, and each comment has one post and one user. We should be able to precompute 100 unique users, 10 posts per user, and 5 comments per post.
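
A rough sketch of what precomputing that keyspace could look like, again in Python and purely illustrative. The name precompute_keyspace and its parameters are hypothetical, and picking each comment's author at random from the user pool is an assumption; the comment above only says that each comment has one post and one user.

```python
import random
import uuid


def precompute_keyspace(n_users=100, posts_per_user=10, comments_per_post=5, seed=0):
    """Precompute primary and foreign keys for users, posts, and comments.

    Hypothetical helper: returns three lists of dicts holding only key fields.
    """
    rng = random.Random(seed)  # seeded so the author assignment is repeatable

    users = [{"id": str(uuid.uuid4())} for _ in range(n_users)]

    # One user has many posts; each post references exactly one user.
    posts = [
        {"id": str(uuid.uuid4()), "user_id": user["id"]}
        for user in users
        for _ in range(posts_per_user)
    ]

    # Each comment references one post and one user.
    # Assumption: the commenting user is drawn at random from all users.
    comments = [
        {
            "id": str(uuid.uuid4()),
            "post_id": post["id"],
            "user_id": rng.choice(users)["id"],
        }
        for post in posts
        for _ in range(comments_per_post)
    ]
    return users, posts, comments


users, posts, comments = precompute_keyspace()
```

With the defaults this gives 100 users, 100 × 10 = 1,000 posts, and 1,000 × 5 = 5,000 comments. Updates and deletes could then be drawn from these precomputed keys so that they always refer to rows that exist.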

More thoughts: https://www.notion.so/materialize/Make-Datagen-CLI-Relational-eaea0f1a48e54f528e6ea23c566598d4?pvs=4