kitodo/kitodo-production

Long numbers are not written into the Elasticsearch Index

Describe the bug
If a metadata value is a long number, e.g. a catalog identifier like 991022551489706476, that number does not get indexed into the Elasticsearch index. I encountered the problem when I tried to import a child process and Kitodo did not find the parent process although the parent process exists. For such long numbers no value is indexed at all. The metadata section of the indexed document looks like this (the relevant field here is CatalogIDDigital):

    {
        "name": "Author",
        "content": "Martin Chemnitz"
    },
    {
        "name": "CatalogIDDigital"
    },
    {
        "name": "DocLanguage",
        "content": "lat"
    }

The cause is that these numbers have the Java type Long, and when constructing the index we currently only accept Strings and Integers:

    if (value instanceof String || value instanceof Integer) {
        json.put(prepareKey(key), value);
    } else if (value instanceof JSONObject) {

We should probably also keep Longs here by explicitly allowing them:

    Object value = xmlJSONObject.get(key);
    if (value instanceof String || value instanceof Integer || value instanceof Long) {
        json.put(prepareKey(key), value);
    }

With this change applied, CatalogIDDigital has a value in Elasticsearch:

    {
        "name": "CatalogIDDigital",
        "content": 9910211111112122
    },
    {
        "name": "TitleDocMain",
        "content": "... Pars Examinis decretorum Concilii Tridentini"
    },
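
The difference between the two type checks can be reproduced in isolation. The following is only a minimal sketch: it uses a plain `java.util.Map` in place of the `JSONObject` handling in Kitodo, and the class and method names (`LongFilterSketch`, `keepStringsAndIntegers`, `keepStringsIntegersAndLongs`) are hypothetical and not part of the code base. It also drops the whole entry, whereas in the real index the field name survives and only the content is missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LongFilterSketch {

    // Mirrors the current check: only String and Integer values are kept.
    public static Map<String, Object> keepStringsAndIntegers(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String || value instanceof Integer) {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    // Mirrors the proposed check: Long values are kept as well.
    public static Map<String, Object> keepStringsIntegersAndLongs(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String || value instanceof Integer || value instanceof Long) {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("Author", "Martin Chemnitz");
        // Too large for an int, so the value is boxed as a java.lang.Long.
        metadata.put("CatalogIDDigital", 991022551489706476L);
        metadata.put("DocLanguage", "lat");

        // prints "false": the Long is silently dropped
        System.out.println(keepStringsAndIntegers(metadata).containsKey("CatalogIDDigital"));
        // prints "true": the Long survives
        System.out.println(keepStringsIntegersAndLongs(metadata).containsKey("CatalogIDDigital"));
    }
}
```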

This fix enables me to find my parent process again, but I am not sure whether it would be better to convert the value to a string, or how to differentiate this from other cases where we actually need a Long. (We probably do not need Longs for identifiers, since we are not doing any calculations on them.)
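
The string-conversion alternative could look roughly like the sketch below. This is an assumption about one possible shape, not the fix that was eventually merged: the class `IdentifierAsStringSketch` and the helper `normalize` are hypothetical names, and the sketch converts every Integer and Long to text without distinguishing identifiers from other numeric fields.

```java
public class IdentifierAsStringSketch {

    // Hypothetical helper: numeric values are turned into their decimal
    // text form, every other value is returned unchanged.
    public static Object normalize(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return String.valueOf(value);
        }
        return value;
    }

    public static void main(String[] args) {
        Object id = normalize(991022551489706476L);
        System.out.println(id instanceof String); // prints "true"
        System.out.println(id);                   // prints "991022551489706476"
    }
}
```

Indexing the identifier as text means it is matched as a literal value, which fits the observation that no calculations are done on identifiers.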

To Reproduce
Steps to reproduce the behavior:

  1. Put a long number into a metadata field
  2. Save the record
  3. Check the Elasticsearch Index. The long number is missing from the metadata

Expected behavior
Long numbers should not be lost when indexing the document.

Release
3.6.1, Master

@solth I put this into the 3.6.2 milestone because I consider it a quite serious bug. Feel free to take it out if you are not planning to include it in the next bugfix release.

@BartChris I can understand that this is a serious issue. I plan to release 3.6.2 in the next few days, though, so it would be great if you could provide a fix for this issue based on the solution you described above. I think you are right that catalog IDs should be Strings instead of numbers (since those IDs could be anything and could also include characters in addition to digits).

Fixed by #5936