Utility that generates fake data in json
, yaml
, yml
, cson
, or csv
formats based on models which are defined in yaml
. Data can be generated using any combination of FakerJS, ChanceJS, or Custom Functions.
Generated data can be output in the following formats and destinations:
json
yaml
yml
cson
csv
- Zip Archive of
json
,yaml
,yml
,cson
orcsv
files - Couchbase Server
- Couchbase Sync Gateway Server
npm install fakeit --save-dev
# or
npm install fakeit --global
Usage: fakeit [command] [<file|directory|glob> ...]
Commands:
console [options] outputs the result to the console
couchbase [options] This will output to couchbase
sync-gateway [options] no idea
directory|folder [options] [<dir|file.zip>] [<models...>] Output the file(s) into a directory
help
Options:
-h, --help output usage information
-V, --version output the version number
--root <directory> Defines the root directory from which paths are resolve from (process.cwd())
--babel <glob> The location to the babel config (+(.babelrc|package.json))
-c, --count <n> Overrides the number of documents to generate specified by the model. Defaults to model defined count
-v, --verbose Enables verbose logging mode (false)
-S, --no-spinners Disables progress spinners
-L, --no-log Disables all logging except for errors
-T, --no-timestamp Disables timestamps from logging output
-f, --format <type> this determines the output format to use. Supported formats: json, csv, yaml, yml, cson. (json)
-n, --spacing <n> the number of spaces to use for indention (2)
-l, --limit <n> limit how many files are output at a time (100)
-x, --seed <seed> The global seed to use for repeatable data
All data is generated from one or more YAML files. Models are defined similarly to how models are defined in Swagger, with the addition of a few more properties that are used for data generation:
At the root of a model the following keys are used, if it's not required then it's optional
The name of the model
The data type of the model to be generated. This needs to be set top level, as well as a per property/items basis. It determines the starting data type, and how the result of the build loop will be converted once complete
Note: If type isn't set it defaults to 'null'
.
types | data type | description |
---|---|---|
number, long, integer | 0 |
Converts result to number using parseInt |
double, float | 0 |
Converts result to number using parseFloat |
string | '' |
Converts result to a string using result.toString() |
boolean, bool | false |
Converts result to a boolean if it's not already, if result is a string and is 'false' , '0' , 'undefined' , 'null' it will return false |
array | [] |
returns the result from the build loop |
object, structure | {} |
returns the result from the build loop |
null, undefined, * (anything else) | null |
returns the result from the build loop |
name: Types example
# typically object or array
type: object
key:
build: faker.random.uuid()
properties:
foo:
# can be set on properties of an object
type: object
properties:
bar:
# can be set on nested properties
type: string
data:
value: FakeIt ftw
bar:
type: array
items:
# can be set on items
type: string
data:
min: 1
max: 10
build: faker.random.word()
This is the main data object that is uses the same properties in several different situations.
min
: The minimum number of documents to generatemax
: The maximum number of documents to generatecount
: A fixed number of documents to generate. If this is defined thenmin
andmax
are ignored. Ifmin
,max
, andcount
aren't definedcount
defaults to 1pre_run
: A function that runs before the documents are generatedpre_build
: A function to be run before each document is generatedvalue
: Returns a value (can't be a function).build
: The function to be run when the property is built. Only runs ifvalue
isn't definedfake:
A template string to be used by Faker i.e."{{name.firstName}}"
. This will only run ifbuild
, andvalue
aren't defined.post_build
: A function to be run after each document is generatedpost_run
: A function that runs after all the documents are generated for that model
The following keys can only be defined in the top level data object
dependencies
: An array of dependencies of file paths to the dependencies of the current model. They are relative to the model, and or they can be absolute paths. Don't worry about the order, we will resolve all dependencies automagically #yourwelcomeinputs
: A object/string of input(s) that's required for this model to run. If it's a string the file name is used as the key. The key is what you reference when you want to get data (akathis.inputs[key]
). The value is the file path to the inputs location. It can be relative to the model or an absolute path.
This determines the name of the document that's being generated. It only needs to be defined once per document. This is a reference to a generated property and is used for the filename or Document ID.
If the key is an object it needs the data
option defined above, it will only work with value
, build
, and fake
since this already runs after the document has been built.
If the key is a string then it use the string value to find the value of the document that was just built (using the lodash get method).
In this example after each document is built it will look for the _id
property and return it's result (aka user_1
, user_2
, etc.)
name: Key String Example
type: object
key: _id
data:
pre_run: |
globals.user_counter = 0;
properties:
_id:
type: string
description: The document id
data:
post_build: `user_${this.user_id}`
user_id:
type: integer
description: The users id
data:
build: ++globals.user_counter
In this example the key will be 'user_' + the current user_id
(aka user_1
, user_2
, etc.)
name: Key Object Example
type: object
key:
data:
build: `user_${this.user_id}`
data:
pre_run: |
globals.user_counter = 0;
properties:
user_id:
type: integer
description: The users id
data:
build: ++globals.user_counter
If a seed is defined it will ensure that the documents created repeatable results. If you have a model with a data range of 2-10 a random number between 2 and 10 documents will be created no matter what the seed is. Let's say that 4 documents are generated the first time you run the model, each of those documents will be completely different than the next (as expected). Later you come back and you generate the data again this time it might generate 6 documents. The first 4 documents generated the second time will be exactly the same as the first time you generated the data. The seed can be number or string.
This only works if you use faker
and chance
to generate your random fake data. It can be produced with other fake data generation libraries if they support seeds.
faker.date
functions will not produce the same fake data each time.
For any function defined above be sure to use |
for multi line functions and NOT >
. To see an in depth explanation see this issue
Each of these functions is passed the following variables that can be used at the time of it's execution:
documents
- An object containing a key for each model whose value is an array of each document that has been generatedglobals
- An object containing any global variables that may have been set by any of the run or build functionsinputs
- An object containing a key for each input file used whose value is the deserialized version of the files datafaker
- A reference to FakerJSchance
- A reference to ChanceJSdocument_index
- This is a number that represents the currently generated document's position in the run orderrequire
- This is the noderequire
function, it allows you to require your own packages. Should require and set them in the pre_run functions for better performance.
For the pre_run
, and post_run
the this
context refers to the current model.
For the pre_build
, build
, and post_build
the this
context refers to the object currently being generated.
If you have a nested object being created in an array or something, this
will refer to closest object not the outer object/array.
name: Users
type: object
key:
data:
build: `user_${this.user_id}`
data:
min: 200
max: 500
pre_run: |
globals.user_counter = 0;
properties:
user_id:
description: The users id
data:
build: faker.random.uuid()
name:
description: The users first name
data:
fake: '{{name.firstName}}'
last_name:
description: The users last name
data:
fake: '{{name.lastName}}'
username:
description: The users username
data:
fake: '{{internet.userName}}'
password:
description: The users password
data:
fake: '{{internet.password}}'
email:
description: The users email address
data:
fake: '{{internet.email}}'
phone:
description: The users mobile phone
data:
fake: '{{phone.phoneNumber}}'
post_build: this.phone.replace(/x[0-9]+$/, '')
Results in the following
{
"user_id": "4d9ec95c-f45d-42f4-9d32-4ac81d83f95b",
"name": "Sandy",
"last_name": "Turner",
"username": "Zella61",
"password": "gi7NVXsUoARHhyU",
"email": "Buck_Cormier@hotmail.com",
"phone": "715.612.8609"
}
{
"user_id": "7f513d5b-f944-4a80-b52a-4876627368b7",
"name": "Duane",
"last_name": "VonRueden",
"username": "Mafalda92",
"password": "3uXo4hFZJTdf1hp",
"email": "Rickie_Braun@hotmail.com",
"phone": "(356) 009-7477 "
}
...etc
This is used to define out the properties for an object.
Each key inside of the properties
will be apart of the generated object. Each of the keys use the following properties to build the values.
type
: The data type of the property. Values can be:string
,object
,structure
,number
,integer
,double
,long
,float
,array
,boolean
,bool
description
: A description of the property. This is just extra notes for the developer and doesn't affect the data.data
: The same data options as defined above
name: test
key:
build: faker.random.uuid()
type: object
properties:
id:
data:
build: faker.random.uuid()
title:
type: string
description: The main title to use
data:
# single line is returned just like arrow functions
build: |
faker.random.word()
phone:
type: object
# This can be nested under another key
properties:
home:
type: string
data:
# this will also be returned
build: faker.phone.phoneNumber().replace(/x[0-9]+$/, '')
work:
type: string
data:
# this will also be returned
build: chance.bool({ likelihood: 35 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null
This will return a object like this
{
"id": "4ce4da5c-0614-47d3-8fd6-3614c5461830",
"title": "alliance",
"phone": {
"home": "(949) 194-3347",
"work": "314-939-0541"
}
}
{
"id": "a649bbec-d629-4594-8fc8-ae34d97811a2",
"title": "Unbranded",
"phone": {
"home": "012-296-9810",
"work": null
}
}
etc...
This is used to define out how each item in an array is built
It uses the same structure as properties
does but it will return an array of values.
name: Array example
key:
data:
build: faker.random.uuid()
type: object
properties:
keywords:
type: array
description: An array of keywords
items:
type: string
data:
min: 3
max: 10
build: faker.random.word()
# You can also create a array of objects
phones:
type: array
description: An array of phone numbers
items:
type: object
data:
min: 1
max: 3
properties:
cell:
type: string
data:
build: faker.phone.phoneNumber().replace(/x[0-9]+$/, '')
home:
type: string
data:
build: chance.bool({ likelihood: 45 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null
work:
type: string
data:
build: chance.bool({ likelihood: 10 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null
{
"keywords": [ "GB", "Sports", "redundant", "Plastic", ],
"phones": [
{
"cell": "(555) 555 - 5555",
"home": "(666) 666 - 6666",
"work": null
},
{
"cell": "(777) 777 - 7777",
"home": null
"work": "(888) 888 - 8888",
}
]
}
It can be beneficial to define definitions that can be referenced one or more times throughout a model. This can be accomplished by using the $ref:
property. Consider the following example:
name: Contacts
type: object
key: contact_id
data:
min: 1
max: 4
properties:
contact_id:
data:
build: "chance.guid()"
details:
schema:
$ref: '#/definitions/Details'
phones:
type: array
items:
$ref: '#/definitions/Phone'
data:
min: 1
max: 4
emails:
type: array
items:
$ref: '#/definitions/Email'
data:
min: 0
max: 3
addresses:
type: array
items:
$ref: '#/definitions/Address'
data:
min: 0
max: 3
definitions:
Email:
data:
build: "faker.internet.email()"
Phone:
type: object
properties:
phone_type:
data:
build: "faker.random.arrayElement([ 'Home', 'Work', 'Mobile', 'Main', 'Other' ])"
phone_number:
data:
build: "faker.phone.phoneNumber().replace(/x[0-9]+$/, '')"
extension:
data:
build: "chance.bool({ likelihood: 20 }) ? chance.integer({min: 1000, max: 9999}).toString() : ''"
Address:
type: object
properties:
address_type:
data:
build: "faker.random.arrayElement([ 'Home', 'Work', 'Other' ]);"
address_1:
data:
# This uses es6 and only works if your project already has it install or you're on node 6+
build: "`${faker.address.streetAddress()} ${faker.address.streetSuffix()}`"
address_2:
data:
build: "chance.bool({ likelihood: 35 }) ? faker.address.secondaryAddress() : ''"
city:
data:
build: "faker.address.city()"
state:
data:
build: "faker.address.stateAbbr()"
postal_code:
data:
build: "faker.address.zipCode()"
country:
data:
build: "faker.address.countryCode()"
Details:
type: object
properties:
first_name:
data:
fake: "{{name.firstName}}"
last_name:
data:
build: "return chance.bool({ likelihood: 70 }) ? faker.name.lastName() : ''"
company:
type: string
description: The contacts company
data:
build: "return chance.bool({ likelihood: 30 }) ? faker.company.companyName() : ''"
job_title:
type: string
description: The contacts job_title
data:
build: "return chance.bool({ likelihood: 30 }) ? faker.name.jobTitle() : ''"
For this model we used 4 references:
$ref: '#/definitions/Details'
$ref: '#/definitions/Phone'
$ref: '#/definitions/Email'
$ref: '#/definitions/Address'
These could have been defined inline but that would make it more difficult to see our model definition, and each of these definitions can be reused. References are processed and included before a model is run and it's documents are generated.
The model defaults can be overwritten at run time by executing the pre_run
function. The this
keyword in both the pre_run
and post_run
functions is the processed model. Below are some examples of changing the number of documents the model should generate before the generation process starts.
name: Users
type: object
key: _id
data:
pre_run: |
this.data.count = 100
# etc...
This becomes beneficial if you are providing input data and want to generate a fixed number of documents. Take the following command for example:
Here we want to generate a countries model but we might not necessarily know the exact amount of data being provided by the input. We can reference the input data in our model's pre_run
function and set the number to generate based on the input array.
name: Countries
type: object
key: _id
data:
inputs: '../inputs/countries.csv'
pre_run: |
this.data.count = inputs.countries.length;
# etc...
If you don't want to use the CLI version of this app you can always use the JS api.
import Fakeit from 'fakeit'
const fakeit = new Fakeit()
fakeit.generate('glob/to/models/**/*.yaml')
.then((data) => {
console.log(data)
})
Below are the default options that are used unless overwritten.
import Fakeit from 'fakeit'
const fakeit = new Fakeit({
root: process.cwd(), // The root directory to operate from
babel_config: '+(.babelrc|package.json)', // glob to search for the babel config. This search starts from the closest instance of `node_modules`
seed: 0, // the seed to use. If it's 0 then a random seed is used each time. A string or a number can be passed in as an seed
log: true, // if then logging to the console is enable
verbose: false, // if true then verbose logging is enable
timestamp: true, // if true the logging output to console has timestamps
})
// models can be an a comma delimited string of globs, or an array of globs
// any models that are passed will output/returned.
const models = 'glob/to/models/**/*.yaml'
fakeit.generate(models, {
// this is the format to output it in
// available formats `json`, `csv`, `yaml`, `yml`, `cson`
format: 'json',
// the character(s) to use for spacing
spacing: 2,
// The type of output to use. Below are the available types
// `return`: This will the data in an array
// `console`: This will output the data to the console
// `couchbase`: This will output the data to a Couchbase server.
// `sync-gateway`: This will output the data to a Couchbase Sync Gateway server
// `directory`: The directory path to output the files (aka `path/to/the/destination`)
output: 'return',
// limit how many files are output at a time, this is useful
// to not overload a server or lock up your computer
limit: 100,
// this is used in the console output and if true it will
// format and colorize the output
highlight: true,
// the file name of the zip file. Currently this can only be used if you're
// outputting the data to a directory. It can't be used to output a zip file
// to a server, the console, or returned. (aka `archive.zip`)
archive: '',
// These options are used if the `output` option is `sync-gateway`,
// or `couchbase`. Otherwise they're ignored.
server: '127.0.0.1', // the server address to use for the server
bucket: 'default', // the bucket name
username: '', // the username to use if applicable
password: '', // the password for the account if applicable
timeout: 5000, // timeout for the servers
})
.then((data) => {
// the data returned will always be a string in the format that was set
data = JSON.parse(data)
// do something with data array of arrays
})
To see more examples of some of the things you can do take a look at the test cases that are in this repo
- Model Dependencies are now defined in the model it's self by the file path to the model that the current one depends on. It doesn't matter what order they're because they will be resolve automagically.
- Model Inputs are now defined in the model it's self by the file path to the inputs location that the current model depends on. It can be a string or an object.
- Babel +6 support now exists. We don't install any presets or plugins for you but if
.babelrc
orbabelConfig
exists in thepackage.json
of your project then all the functions are transpiled. - Better error handling has been added so you know what went wrong and where it happened.
- JS support has also been added so you are no longer required to use the command line to create fake data.
- Added support for seeds to allow repeatable data.
- Added a progress indicator to show how many documents have been created