Scribery/aushape

Consider having field value sub-fields

spbnick opened this issue · 2 comments

As Elasticsearch doesn't support arrays with values of different types, it will be a problem to store raw and interpreted values with different types in it, as currently planned.

We need to either have everything as strings, which wouldn't work for searching some quantitative field values, or have subfields for some interpreted fields.

We also need to try indexing them as Multi Field Types.

For example, each such field could have "keyword" and "long" subfields.
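
A minimal sketch of what such a mapping could look like, built as a plain Python dict so it can be dumped to JSON. It assumes the top-level field is indexed as "keyword" and carries a "long" multi-field with "ignore_malformed" set, so that non-numeric strings are still accepted. The field names ("pid", "comm") and the helper name are hypothetical, and the result has not been tried against a live cluster.

```python
import json


def multi_field_mapping(field_names):
    """Build a mapping where each field is a keyword with a "long" subfield.

    field_names is a hypothetical list of record field names.
    """
    properties = {}
    for name in field_names:
        properties[name] = {
            # The raw value is always indexed as an exact string.
            "type": "keyword",
            "fields": {
                # Numeric view of the same value, queried as "<name>.long".
                # ignore_malformed lets non-numeric strings through unindexed.
                "long": {"type": "long", "ignore_malformed": True},
            },
        }
    return {"properties": properties}


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping(["pid", "comm"]), indent=2))
```

With this shape the source JSON stays flat (one string per field), and the cost moves to the index, which stores each value twice.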

We need to weigh two options against each other. Having subfields in the source JSON increases the raw JSON size and makes default querying harder, but allows a more specific mapping. Having subfields only in Elasticsearch increases the index size.

Another good idea could be to have one subfield per value type. So, if the raw value is a string, then the JSON would have an "s" subfield, and the XML would have an "s" attribute. This way both the schemas and the mappings can be greatly simplified. We would still need to store the knowledge of which field of which record has which type, but at least the schemas and mappings would be simpler.
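
A rough sketch of the per-type subfield idea on the JSON side. Only the "s" subfield for strings comes from the description above; the "i" subfield for integers, the function name, and the sample field names are assumptions made for illustration.

```python
def typed_value(value):
    """Wrap a value in an object with one subfield named after its type."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("no subfield defined for bool")
    if isinstance(value, str):
        return {"s": value}  # "s" subfield, as proposed above
    if isinstance(value, int):
        return {"i": value}  # "i" is a hypothetical name for integers
    raise TypeError("no subfield defined for " + type(value).__name__)


# Hypothetical record: every field has the same shape regardless of name.
record = {
    "comm": typed_value("sshd"),
    "pid": typed_value(1234),
}
```

Since every field is an object with the same optional subfields, a single schema definition and a single mapping template could cover all of them.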