datacraft-xeger

Custom plugin for datacraft that generates values from regular expressions. It uses the rstr package. The name xeger is "regex" spelled backwards, and the plugin takes its inspiration from the original Java package xeger.

Primary language: Python. License: MIT.

Usage in Specs

You can use xeger as a type in your datacraft data specs. The data field holds the regular expression to generate values from. For example:

{
  "ssn":{
    "type": "xeger",
    "data": "\\d{3}-\\d{2}-\\d{4}"
  }
}

Running datacraft against this spec, saved here as xeger.json:

$ datacraft -s xeger.json -i 3 --format json-pretty -x -l error
[
    {
        "ssn": "322-81-1469"
    },
    {
        "ssn": "697-21-8178"
    },
    {
        "ssn": "340-78-5377"
    }
]
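
Because the spec is JSON, each backslash in the pattern is written twice. Once the file is parsed, the pattern is the ordinary regular expression \d{3}-\d{2}-\d{4}. The following is a minimal sketch using only the Python standard library (it is not part of datacraft or datacraft-xeger). It parses a copy of the spec, with the quantifier written as {3}, and checks that the sample values shown above match the pattern.

```python
import json
import re

# A copy of the spec as it would appear in xeger.json.
# The raw string keeps the doubled backslashes exactly as JSON expects them.
spec_text = r'''
{
  "ssn": {
    "type": "xeger",
    "data": "\\d{3}-\\d{2}-\\d{4}"
  }
}
'''

spec = json.loads(spec_text)
pattern = spec["ssn"]["data"]
print(pattern)  # \d{3}-\d{2}-\d{4}

# Sample values copied from the output above.
samples = ["322-81-1469", "697-21-8178", "340-78-5377"]
all_match = all(re.fullmatch(pattern, s) is not None for s in samples)
print(all_match)  # True
```

A check like this can be a quick way to catch a malformed pattern before handing the spec to datacraft.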

Custom Regex Types

You can use the datacraft_xeger module to create custom datacraft value suppliers from regex patterns. The example below registers custom types for the phone number patterns of different countries.

import datacraft
import datacraft_xeger.suppliers as xeger

phone_patterns = {
    # Maps each custom type name to the regex pattern used to generate its values
    'uk-phone': r'\+44 \d{4} \d{6}',
    'aus-phone': r'\+61 4\d{2} \d{3} \d{3}',
    'nz-phone': r'\+64 \d{2} \d{4} \d{4}',
    # ...
}


@datacraft.registry.types('uk-phone')
def _custom_regex_uk_phone(spec, loader):
    return xeger.xeger_supplier(phone_patterns['uk-phone'])


@datacraft.registry.types('aus-phone')
def _custom_regex_aus_phone(spec, loader):
    return xeger.xeger_supplier(phone_patterns['aus-phone'])


@datacraft.registry.types('nz-phone')
def _custom_regex_nz_phone(spec, loader):
    return xeger.xeger_supplier(phone_patterns['nz-phone'])

Once registered, these types can be used as part of the data generation process. See the example data spec:

{
  "name": ["ann", "bob", "carl"],
  "age": { "type": "rand_int_range", "data": [25, 75] },
  "phone": {
    "type": "weighted_ref",
    "data": {
      "UK": 0.5, "AUS": 0.3, "NZ": 0.2
    }
  },
  "refs": {
    "UK": { "type": "uk-phone" },
    "AUS": { "type": "aus-phone" },
    "NZ": { "type": "nz-phone" }
  }
}
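
In this spec the phone field picks one of the entries under refs for each record, using the weights given in data. Conceptually this resembles a weighted random choice over the ref names. The sketch below illustrates that idea with the standard library only. It is not datacraft's implementation, and the helper name pick_ref is hypothetical.

```python
import random
from collections import Counter

# Weights and ref-to-type mapping copied from the spec above.
weights = {"UK": 0.5, "AUS": 0.3, "NZ": 0.2}
refs = {"UK": "uk-phone", "AUS": "aus-phone", "NZ": "nz-phone"}


def pick_ref(rng):
    """Hypothetical helper: choose a ref name in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(42)  # fixed seed so repeated runs give the same counts
total = 10_000
counts = Counter(pick_ref(rng) for _ in range(total))

# Print the observed share of each ref next to the type it points to.
for name in weights:
    print(name, refs[name], counts[name] / total)
```

Over many records, the observed shares should approach 0.5, 0.3, and 0.2.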

To run datacraft against this spec, load the custom code with the -c option:

$ datacraft -s custom.json -c custom.py -i 3 --format json-pretty -x -l warn
[
    {
        "name": "ann",
        "age": 67,
        "phone": "+64 07 2500 7403"
    },
    {
        "name": "bob",
        "age": 49,
        "phone": "+61 435 126 947"
    },
    {
        "name": "carl",
        "age": 61,
        "phone": "+44 7693 148185"
    }
]
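
As a sanity check, each generated phone value can be matched back to the pattern that produced it. The sketch below uses only the standard re module and the patterns from custom.py. The helper name classify is hypothetical and is not part of datacraft or datacraft-xeger.

```python
import re

# Patterns copied from the registration code above.
phone_patterns = {
    'uk-phone': r'\+44 \d{4} \d{6}',
    'aus-phone': r'\+61 4\d{2} \d{3} \d{3}',
    'nz-phone': r'\+64 \d{2} \d{4} \d{4}',
}


def classify(phone):
    """Hypothetical helper: return the type name whose pattern fully matches."""
    for type_name, pattern in phone_patterns.items():
        if re.fullmatch(pattern, phone):
            return type_name
    return None


# Phone values copied from the output above.
generated = ["+64 07 2500 7403", "+61 435 126 947", "+44 7693 148185"]
results = [classify(p) for p in generated]
print(results)  # ['nz-phone', 'aus-phone', 'uk-phone']
```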