Sample data generator library.
This project was motivated by the many limitations that similar web apps impose on their free tiers. Mockalot is easy to run locally or in a Jupyter notebook, and it can also be used as a library in other projects.
- Generates sample data in the form of physical files, in the most popular formats, like csv and parquet.
- Generates sample data in the form of in-memory structures, like lists of dictionaries or even table objects like Pandas or PySpark dataframes.
- Highly customizable column types.
- Modular structure: you can even create your own classes if you want.
Run pip install mockalot
To work with parquet, pandas or pyspark, optional extras are included. You can run pip install mockalot[parquet] to install all the libraries needed to work with parquet. This is not mandatory, though: if your environment already has everything installed out of the box (AWS Glue, Databricks, etc.), you can use Mockalot without worrying about extras.
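To check whether an environment already provides the optional libraries before installing any extras, a minimal sketch using only the standard library might look like the following. The module names (`pandas`, `pyarrow`, `pyspark`) are assumptions about what the extras pull in, since this README does not list them, and `missing_optional_modules` is a hypothetical helper, not part of mockalot.

```python
from importlib.util import find_spec

# Assumed module names; check mockalot's packaging metadata for the real list.
OPTIONAL_MODULES = ("pandas", "pyarrow", "pyspark")


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the top-level modules from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All optional libraries are already available.")
```

If the list comes back empty, the plain pip install mockalot is probably enough for that environment.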
from mockalot import Mockalot
from mockalot.generators import EmailGenerator, UUIDGenerator, NameGenerator
from mockalot.writers import CSVWriter

mocker = Mockalot()

# Configure the sample size, one generator per column, and the output writer.
mocker.set_config("sample_size", 20000) \
    .set_column("id", UUIDGenerator, {}) \
    .set_column("name", NameGenerator, {}) \
    .set_column("email", EmailGenerator, {}) \
    .set_writer(CSVWriter, {"output_filename": "users"})

# Generate the data and write the file.
mocker.run()
The snippet above creates a CSV file with 20k rows and three columns (id, name and email), written to ./output/users.csv.
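One way to sanity-check the generated file is to read it back with the standard library and count columns and rows. This is a minimal sketch that assumes the writer emits a header row as the first line, which this README does not confirm; `summarize_csv` is a hypothetical helper, not part of mockalot.

```python
import csv


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row is assumed to be the header
        row_count = sum(1 for _ in reader)  # every remaining row counts as data
    return header, row_count


# Example usage (path taken from the description above):
# columns, rows = summarize_csv("./output/users.csv")
# print(columns, rows)
```

If the file turns out to have no header row, the first data row would land in the returned column list and the row count would be one short.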
There are more usage examples here.
You can see the project's roadmap here.
mockalot is available under the MIT license. See the LICENSE file for more info.