Convert JSON schema to ElasticSearch mappings
A mapping type has:
Meta-fields
Meta-fields are used to customize how a document's associated metadata is treated. Examples of meta-fields include the document's _index, _type, _id, and _source fields.
Fields or properties
A mapping type contains a list of fields or properties pertinent to the document.
Field datatypes
Each field has a data type which can be:
- a simple type like text, keyword, date, long, double, boolean or ip
- a type which supports the hierarchical nature of JSON such as object or nested
- a specialised type like geo_point, geo_shape, or completion
It is often useful to index the same field in different ways for different purposes. For instance, a string field could be indexed as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.
This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
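As a minimal sketch of what such a multi-field mapping can look like, the snippet below builds a mapping entry that indexes a string as text and adds a keyword sub-field through the fields parameter. The helper name withKeywordSubField and the sub-field name raw are illustrative choices and are not part of this library.

```javascript
// Hypothetical helper (not part of json-schema-to-es-mapping):
// a text mapping with an extra keyword sub-field for sorting/aggregations.
function withKeywordSubField(analyzer = "standard", subFieldName = "raw") {
  return {
    type: "text",
    analyzer,
    fields: {
      [subFieldName]: { type: "keyword" }
    }
  };
}

// Example: an English-analyzed title with a "raw" keyword sub-field
const titleMapping = withKeywordSubField("english");
console.log(JSON.stringify(titleMapping, null, 2));
```

If this entry were used for a field named title, full-text queries would target title while sorting and aggregations would target title.raw.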
- npm:
npm install json-schema-to-es-mapping -S
- yarn:
yarn add json-schema-to-es-mapping
The easiest way to get started is to use buildMappingsFor to create a mappings object for a named index given a JSON schema.
const mappings = buildMappingsFor("people", schema);
Example:
const schema = {
$schema: "http://json-schema.org/draft-07/schema#",
$id: "http://example.com/person.schema.json",
title: "Person",
description: "A person",
type: "object",
properties: {
name: {
description: "Name of the person",
type: "string"
},
age: {
description: "Age of person",
type: "number"
}
},
required: ["name"]
};
const { buildMappingsFor } = require("json-schema-to-es-mapping");
const mappings = buildMappingsFor("people", schema);
console.log({ mappings });
This will by default give the following mappings result:
{
"mappings": {
"people": {
"properties": {
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
}
}
}
}
}
The function buildMappingsFor uses the build function to return the properties map and simply wraps it in a mappings object for the named index.
Currently all Elastic Search core data types are supported (except for binary).
- string
- numeric
- boolean
- date
- object
- ranges (numeric, date) (soon)
- geo_point (soon)
- ip (soon)
Note: The most feature complete version can currently be found in the to-ts branch. This branch is almost complete. It has unit test coverage of most of the functionality, includes initial support for complex schema types (such as anyOf, a list of types) and the code has been converted to TypeScript.
Please help with the finishing touches so it can be released if you want or need these extra mappings and other features.
You can assist the numeric type mapper by supplying a numType for the field entry, such as numType: "double".
See ES number reference for the list of valid numType values (except for scaled_float).
- Numeric
- Date
To make a numeric field entry be mapped to an ES numeric range:
- Set range: true
- Set a minimum range value, either minimum or exclusiveMinimum
- Set a maximum range value, either maximum or exclusiveMaximum
If you leave out the range: true it will be resolved as a number, using the min and max values and the multipleOf (precision). These properties will in combination be used to determine the exact numeric type (byte, short, ... double) to be used in the Elastic Search numeric type mapping.
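To illustrate how bounds and precision can narrow down a numeric type, here is a rough sketch. The function guessNumericType is hypothetical and the library's own resolution logic may differ; the integer bounds are the standard ranges of the ES byte, short and integer types.

```javascript
// Illustrative only: pick an ES numeric type from JSON schema constraints.
// This is NOT the library's implementation.
function guessNumericType({ minimum, maximum, multipleOf } = {}) {
  // Without a whole-number step we cannot assume integers
  if (!Number.isInteger(multipleOf)) return "float";
  // Whole numbers, but unknown bounds: use a general purpose integer type
  if (minimum === undefined || maximum === undefined) return "integer";
  if (minimum >= -128 && maximum <= 127) return "byte";
  if (minimum >= -32768 && maximum <= 32767) return "short";
  if (minimum >= -(2 ** 31) && maximum <= 2 ** 31 - 1) return "integer";
  return "long";
}

console.log(guessNumericType({ minimum: 0, maximum: 100, multipleOf: 1 }));
```

The smallest type that can hold the declared range is chosen, which keeps the index compact when the schema states tight bounds.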
To make an entry detect as a date range, the same applies as for a number range, but the entry must also resolve to a date type (see the isDate(obj) function in types/util.js for details).
Now also resolves:
- Array items that are themselves object types
- References to object definitions (ie. $ref)
- Parent-child mapping
Support for Geo location mapping will likely be included in the near future.
Please let me know of any other features you'd like included for a more feature complete library!
Initial work to support these features has been started in the dev branch and should land soon (0.4.0).
For more fine-grained control, use the build
function directly.
const { build } = require("json-schema-to-es-mapping");
const { properties, results } = build(schema);
console.log({ properties, results });
Will output the following Elastic Search Mapping schema:
{
"name": {
"type": "text"
},
"age": {
"type": "float"
}
}
The results
will in this (simple) case give the same results as the mappings
:
{
name: { type: "keyword" },
age: { type: "float" }
}
You can use the Event driven approach with the onResult and other callback handlers to generate a more context-specific mapping for Elastic Search, given your requirements.
const received = [];
const onResult = result => {
console.log("received", result);
received.push(result);
};
// potentially use to call resolve callback of Promise
const onComplete = fullResult => {
console.log("ES mapping done :)", {
fullResult, // "internal" results
received // list built by onResult
});
};
// potentially use to call reject callback of Promise
const onError = errMsg => {
console.error("ES mapping error", errMsg);
throw errMsg;
};
// potentially use to call reject callback of Promise
const onThrow = err => {
throw err;
};
const config = { onResult, onComplete, onError, onThrow };
The onResult handler will populate the received array with the following:
[
{ parentName: "Person", key: "name", resultKey: "name", type: "text" },
{
parentName: "Person",
key: "age",
resultKey: "age",
type: "float"
}
];
You will also get notified on:
- successful completion of JSON schema mapping via onComplete callback
- aborted due to processing error via onError callback
- aborted due to throwing exception via onThrow callback
The Event driven approach is entirely optional, but can be used for a more "stream like" approach. This approach works well with async promises (ie. reject and resolve callbacks).
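The callback handlers can be adapted to a Promise along these lines. This is a sketch under the assumption that build accepts the handlers in its config argument and calls onComplete once when done; buildAsync is a hypothetical wrapper, and build is passed in as a parameter so the snippet stays independent of how you import it.

```javascript
// Sketch: wrap the callback-style handlers in a Promise.
// `build` is expected to be the build function of json-schema-to-es-mapping.
function buildAsync(build, schema, config = {}) {
  return new Promise((resolve, reject) => {
    const received = [];
    build(schema, {
      ...config,
      // collect each individual result as it arrives
      onResult: result => received.push(result),
      // resolve with both the full result and the collected list
      onComplete: fullResult => resolve({ fullResult, received }),
      // processing errors are reported as messages, so wrap them in an Error
      onError: errMsg => reject(new Error(String(errMsg))),
      onThrow: err => reject(err)
    });
  });
}

// Usage (assuming the package is installed):
// const { build } = require("json-schema-to-es-mapping");
// buildAsync(build, schema).then(({ received }) => console.log(received));
```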
On each result received you can then issue a command to the Elastic Search server (for example via the REST interface) to add a new mapping that reflects the result received.
PUT person/_mapping/_doc
{
"properties": {
"age": {
"type": "float"
}
}
}
Alternatively, only submit the ES index mappings after onComplete is triggered, to make sure the full JSON schema could be processed, so that you don't end up with partial schema mappings.
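Both strategies can be sketched as follows. The helpers toPutMappingBody and createBatchingHandlers are hypothetical, and submit stands in for whatever client call you use to send the request to Elastic Search; only the result fields shown in the examples above (resultKey, type, properties) are relied on.

```javascript
// Sketch: turn one onResult entry into the body of a put-mapping request.
function toPutMappingBody(result) {
  const { resultKey, type, properties } = result;
  // object results carry nested properties instead of a type
  const mapping = properties ? { properties } : { type };
  return { properties: { [resultKey]: mapping } };
}

// Collect request bodies and only hand them to `submit` on completion,
// so that no partial mapping is sent if processing fails midway.
function createBatchingHandlers(submit) {
  const bodies = [];
  return {
    onResult: result => bodies.push(toPutMappingBody(result)),
    onComplete: () => submit(bodies)
  };
}
```

The returned handlers can be spread into the config object, for example { ...createBatchingHandlers(submit), onError }.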
For a nested schema of the form:
{
$schema: "http://json-schema.org/draft-07/schema#",
$id: "http://example.com/person.schema.json",
title: "Person",
description: "A person",
type: "object",
properties: {
name: {
description: "Name of the person",
type: "string"
},
dog: {
type: "object",
typeName: "Animal",
properties: {
name: {
description: "Name of the dog",
type: "string",
required: true
},
age: {
description: "Age of dog",
type: "number"
}
}
}
},
required: ["name"]
};
buildMappingsFor will in this case generate an Elastic Search mapping as follows:
mappings: {
people: {
properties: {
name: {
type: "keyword"
},
dog: {
properties: {
name: {
type: "keyword"
},
age: {
type: "float"
}
}
}
}
}
}
Note that the dog object results in a nested mapping (see ElasticSearch resources below).
The results will in this case give:
{
name: { type: 'keyword' },
dog_name: { type: 'keyword' },
dog_age: { type: 'float' },
dog: {
name: { type: 'keyword' },
age: { type: 'float' }
}
}
Notice how the dog properties are provided both in flat and nested form. Depending on your requirements, you might want to store the Elastic Search data in a more flat form than in your general application domain model.
You can pass a custom function shouldSetResult(converter) which controls under which converter conditions the result should be set. You can also pass:
- a custom name separator nameSeparator
- a resultKey(converter) function, to customize how result keys (names) are generated
- a nestedKey(converter) function, to customize how nested result keys (names) are generated
Example:
const config = {
shouldSetResult: converter => {
return converter.type !== "object";
},
nameSeparator: "__" // example: dog__age
};
With this configuration the results discard the nested form and retain only the flattened field mappings:
{
name: { type: 'keyword' },
dog__name: { type: 'keyword' },
dog__age: { type: 'float' },
}
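If you already have a nested properties map and want the flat form, the transformation can be sketched like this. The function flattenProperties is an illustration written for this README section, not the code the library uses internally.

```javascript
// Illustrative flattening of a nested properties map (not the library's code).
function flattenProperties(properties, separator = "_", prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(properties)) {
    const name = prefix ? `${prefix}${separator}${key}` : key;
    if (value && value.properties) {
      // recurse into nested objects, carrying the parent name as prefix
      Object.assign(flat, flattenProperties(value.properties, separator, name));
    } else {
      flat[name] = value;
    }
  }
  return flat;
}

const nested = {
  name: { type: "keyword" },
  dog: { properties: { name: { type: "keyword" }, age: { type: "float" } } }
};
console.log(flattenProperties(nested, "__"));
```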
If you add an onResult
handler to receive results, it will look as follows:
results:
[
{
parentName: 'Person',
key: 'name',
resultKey: 'name',
type: 'keyword'
},
{
parentName: 'dog',
key: 'name',
resultKey: 'dog__name',
type: 'keyword'
},
{ parentName: 'dog',
key: 'age',
resultKey: 'dog__age',
type: 'float'
},
{ parentName: 'Person',
typeName: 'Animal',
key: 'dog',
resultKey: 'dog',
properties: {
name: { type: 'keyword' },
age: { type: 'float' }
}
}
]
Note the typeName in the result for the dog fields (more on this later).
The default configuration is as follows.
{
_meta_: {
types: {
string: "keyword",
number: "float",
object: "object",
array: "nested",
boolean: "boolean",
date: "date"
}
},
fields: {
name: {
type: "keyword"
},
content: {
type: "text"
},
text: {
type: "text"
},
title: {
type: "text"
},
caption: {
type: "text"
},
label: {
type: "text"
},
tag: {
type: "keyword",
index: "not_analyzed"
}
}
}
Note that some or all of these might benefit from being defined as multi fields that are indexed and analyzed both as text and keyword.
You can pass in a custom configuration object (last argument) to override or extend it ;)
Note that for convenience, we pass in some typical field mappings based on names. Please customize this further to your needs.
- Type mappers
- Rules
You can pass in custom Type mapper factories if you want to override how specific types are mapped.
Internally this is managed in the SchemaEntry constructor in entry.js:
this.defaults = {
types: {
string: toString,
number: toNumber,
boolean: toBoolean,
array: toArray,
object: toObject,
date: toDate,
dateRange: toDateRange,
numericRange: toNumericRange
},
typeOrder: [
"string",
"dateRange",
"numericRange",
"number",
"boolean",
"array",
"object",
"date"
]
};
this.types = {
...this.defaults.types,
...(config.types || {})
};
this.typeOrder = config.typeOrder || this.defaults.typeOrder;
To override, simply pass in a custom types object and/or a custom typeOrder array specifying the precedence order in which the types should be resolved.
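The role of typeOrder can be pictured as a first-match loop over the type mappers. The sketch below is an assumption about the general idea, not the actual SchemaEntry code; resolveEntry and the two sample mappers are hypothetical. It relies only on the convention, visible in the toObject example, that a mapper returns a falsy value when it does not apply.

```javascript
// Sketch of first-match resolution over an ordered list of type mappers.
// Each mapper returns a mapping object, or a falsy value if it does not apply.
function resolveEntry(entry, types, typeOrder) {
  for (const typeName of typeOrder) {
    const mapper = types[typeName];
    const result = mapper && mapper(entry);
    if (result) return result;
  }
  return undefined;
}

// Hypothetical mappers, for illustration only
const types = {
  string: entry => entry.type === "string" && { type: "keyword" },
  number: entry => entry.type === "number" && { type: "float" }
};
const typeOrder = ["string", "number"];

console.log(resolveEntry({ type: "number" }, types, typeOrder));
```

Because the first applicable mapper wins, more specific mappers (such as the range mappers) would need to come before the general ones in typeOrder.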
Create a toObject file locally in your project that contains your overrides:
const { types } = require("json-schema-to-es-mapping");
const { MappingObject, toObject, util } = types;
class MyMappingObject extends MappingObject {
// ...override
createMappingResult() {
return this.hasProperties
? this.buildObjectValueMapping()
: this.defaultObjectValueMapping;
}
buildObjectValueMapping() {
const { buildProperties } = this.config;
return buildProperties(this.objectValue, this.mappingConfig);
}
}
module.exports = function toObject(obj) {
return util.isObject(obj) && new MyMappingObject(obj).convert();
};
Import the toObject function and pass it in the types object of the config object passed to the build function.
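const { build } = require("json-schema-to-es-mapping");
// custom implementation
const toObject = require("./toObject");
const myConfig = {
  types: {
    // keyed by type name, matching the defaults.types map shown above
    object: toObject
  }
};
// will now use the custom toObject for mapping JSON schema object to ES object
build(schema, myConfig);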
Depending on your requirements, you can post-process the generated mapping to better suit your specific needs and strategies for handling nested/complex data relationships.
Core:
- String (text, keyword)
- Numeric (long, integer, short, byte, double, float, half_float, scaled_float)
- Date (date)
- Boolean (boolean)
- Binary (binary)
- Range (integer_range, float_range, long_range, double_range, date_range)
The default type mappings are as follows:
- boolean -> boolean
- object -> object
- array -> nested
- string -> keyword
- number -> integer
- date -> date
For array it will use the type of the first array item, provided it is a basic type and all array items have the same type.
{
"type": "array",
"items":{
"type": "integer"
}
}
If array item types are not "uniform" it will throw an error.
For the following array JSON schema entry the mapper will currently set the mapping type to string (by default). Please use the customization options outlined to define a more appropriate mapping strategy if needed.
{
"type": "array",
"items" : [{
"type": "string"
// ...
},
{
"type": "string"
// ...
},
]
}
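The uniformity rule for array items can be sketched as below. The function arrayItemType is hypothetical and only illustrates the rule as described here; it assumes items is either a single schema or a list of schemas, as in the two examples above.

```javascript
// Illustrative check (not the library's code): determine the single item
// type of an array entry, or throw if the items declare different types.
function arrayItemType(entry) {
  const items = Array.isArray(entry.items) ? entry.items : [entry.items];
  const typesFound = new Set(items.filter(Boolean).map(item => item.type));
  if (typesFound.size > 1) {
    throw new Error(`Array items are not uniform: ${[...typesFound].join(", ")}`);
  }
  // undefined when no item type is declared
  return [...typesFound][0];
}

console.log(arrayItemType({ type: "array", items: { type: "integer" } }));
```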
You can override the default type mappings by passing a types entry with type mappings in the _meta_ entry of config:
const config = {
_meta_: {
types: {
number: "long", // use "long" for numbers
string: "text" // use "text" for strings
}
}
};
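The effect of such an override can be pictured as a shallow merge over the defaults. In this sketch the defaults are copied from the default configuration shown earlier; the function effectiveTypes is hypothetical, and how the library merges internally is an assumption.

```javascript
// Defaults copied from the default configuration shown above
const defaultTypes = {
  string: "keyword",
  number: "float",
  object: "object",
  array: "nested",
  boolean: "boolean",
  date: "date"
};

// Sketch of a shallow merge: overrides win, untouched defaults remain.
function effectiveTypes(config = {}) {
  const overrides = (config._meta_ && config._meta_.types) || {};
  return { ...defaultTypes, ...overrides };
}

console.log(effectiveTypes({ _meta_: { types: { number: "long", string: "text" } } }));
```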
You can pass an extra configuration object with specific rules for ES mapping properties that will be merged into the resulting mapping.
const config = {
_meta_: {
types: {
number: "long", // use "long" for numbers
string: "text" // use "text" for strings
}
},
fields: {
created: {
// add extra indexing field meta data for Elastic search
format: "strict_date_optional_time||epoch_millis"
// ...
},
firstName: {
type: "keyword" // make sure firstName will be a keyword field (exact match) in ES mapping
}
}
};
const { build } = require("json-schema-to-es-mapping");
const mapping = build(schema, config);
Also note that you can pass in many of the functions used internally, so that the internal mechanics themselves can easily be customized as needed or used as building blocks.
- Elasticsearch: Nested datatype
- Elasticsearch: Nested Objects
- Elasticsearch data schema for nested objects
- Elasticsearch : Advanced search and nested objects
To override the default mappings for certain fields, you can pass in a fields mapping entry in the config object as follows:
const config = {
fields: {
timestamp: {
type: "date",
format: "dateOptionalTime"
}
// ... more custom field mappings
}
};
For a more scalable customization, pass an entryFor function which returns custom mappings depending on the entry being processed. It is passed an object with:
- key
- resultKey (ie. potentially nested key name)
- parentName, the name of the parent entry if it is a nested property
- schemaValue (entry from JSON schema being mapped)
You could, for example, use this to provide custom mappings for specific types of date fields.
const config = {
entryFor: ({ key }) => {
if (key === "date" || key === "timestamp") {
return {
type: "date",
format: "dateOptionalTime"
};
}
}
};
You can use resolve-type-maps to define mappings to be used across your application in various schema-like contexts:
- GraphQL schema
- Data storage (tables, collections etc)
- Validation
- Forms
- Data Display
- Indexing (including Elastic Search)
- Mocks and fake data
const fieldMap = {
name: {
matches: ['title', 'caption', 'label'],
elastic: {
type: 'string',
}
},
tag: {
matches: ['tags'],
elastic: {
type: 'keyword',
}
},
text: {
matches: ['description', 'content'],
elastic: {
type: 'text',
}
},
date: {
matches: ['date', 'timestamp'],
elastic: {
type: 'date',
format: 'dateOptionalTime'
}
}
}
const typeMap = {
Person: {
matches: ['User'],
fields: {
dog: {
// ...
elastic: {
type: 'nested',
// ...
}
},
// ...
}
}
}
Then pass an entryFor function in the config object to resolve the entry to be used for the ES mapping entry.
import { createTypeMapResolver } from "resolve-type-maps";
const map = {
typeMap,
fieldMap
};
const resolverConfig = {};
const functions = {
resolveResult: obj => obj.elastic
};
const resolver = createTypeMapResolver(
{ map, functions },
resolverConfig
);
const config = {
entryFor: ({ key, parentName, typeName }) => {
// ensure capitalized and camelized name
// (classify is a string helper you need to supply; it is not defined in this snippet)
const type = classify(typeName || parentName);
const name = key;
return resolver.resolve({ type, name });
}
};
Note that for typeName to be set, either set a className or typeName property on the object entry in the JSON schema (see dog example above) or alternatively provide a lookup function typeNameFor(name) on the config object passed in.
For inner workings, see TypeMapResolver.ts
The above configuration should look up the elastic mapping entry to use, based on the type/field combination in the typeMap first, and then fall back to the field name only in the fieldMap if not found. On a match, it will resolve by returning the entry named elastic in the matching object.
{
Person: {
matches: [/User/],
fields: {
dog: {
// ...
elastic: {
type: 'nested',
// ...
}
},
}
}
}
It should match a schema (or nested schema entry) named Person or User on the typeMap entry Person. For the nested dog entry it should then match on the entry dog under fields and return the entry for elastic, ie:
{
type: "nested"
}
If no match is made in the typeMap, it will follow a similar strategy by looking up a match in the fieldMap (as per the map entry passed in the config object when creating the resolver), matching only on the field name.
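The lookup order described above can be sketched in a few lines. This is not the resolve-type-maps implementation (see TypeMapResolver.ts for that); resolveElastic and matchesName are hypothetical helpers that only mirror the described behaviour: try the typeMap by type and field name first, then fall back to the fieldMap by field name.

```javascript
// A name matches if it equals the map key or any entry in `matches`
// (entries may be plain strings or regular expressions).
function matchesName(name, canonical, matches = []) {
  if (name === canonical) return true;
  return matches.some(m => (m instanceof RegExp ? m.test(name) : m === name));
}

// Sketch only: typeMap lookup first, fieldMap as fallback.
function resolveElastic({ type, name }, { typeMap = {}, fieldMap = {} }) {
  for (const [typeKey, typeEntry] of Object.entries(typeMap)) {
    if (!matchesName(type, typeKey, typeEntry.matches)) continue;
    const field = (typeEntry.fields || {})[name];
    if (field && field.elastic) return field.elastic;
  }
  for (const [fieldKey, fieldEntry] of Object.entries(fieldMap)) {
    if (matchesName(name, fieldKey, fieldEntry.matches)) return fieldEntry.elastic;
  }
  return undefined;
}

const maps = {
  typeMap: {
    Person: { matches: [/User/], fields: { dog: { elastic: { type: "nested" } } } }
  },
  fieldMap: {
    tag: { matches: ["tags"], elastic: { type: "keyword" } }
  }
};
console.log(resolveElastic({ type: "User", name: "dog" }, maps));
```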
Uses jest for unit testing.
Currently not well tested. Please help add more test coverage :)
- Convert project to TypeScript
- Add unit tests for ~80% test coverage
- Improve mappings for:
- Date range
2019 Kristian Mandrup (CTO@Tecla5)
MIT