SearchHit.sourceAsString adds superfluous type metadata
tastyminerals opened this issue · 2 comments
I am struggling to figure out how to deserialize a valid JSON hit string into a SearchHit. We use the JSON string -> SearchHit pattern a lot in our tests, and the SearchHit from the classic elasticsearch library allowed us to do the following:
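import org.elasticsearch.common.bytes.BytesArray
import org.elasticsearch.search.SearchHit

// Builds a SearchHit whose _source is the raw JSON fixture, with no intermediate parsing.
def sourceFixtureAsSearchHit(fileName: String, docId: Int = 1): SearchHit = {
  val fixture = loadUTF8FixtureAsString(fileName)
  val source = new BytesArray(fixture)
  val hit = new SearchHit(docId)
  hit.sourceRef(source)
  hit
}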
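The elastic4s SearchHit doesn't provide sourceRef. Hence, we parse (via circe) a JSON string into what we treat as a Map[String, Any] (strictly, the code below yields a Map[String, Json], so the values are still circe Json objects):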
val sourceMap = parser.parse(fixture).getOrElse(Json.Null)
  .asObject.map(_.toMap)
  .getOrElse(Map.empty[String, Json])
and then store it in an elastic4s SearchHit(_source = sourceMap). This, however, produces a different JSON representation when serializing back via hit.sourceAsString. For example, the original JSON
{"document_id" : "0b85846f-2c7b-4cc8-b265-6c3fdf1da815"}
becomes
{
  "document_id": {
    "value": "0b85846f-2c7b-4cc8-b265-6c3fdf1da815",
    "array": false,
    "null": false,
    "boolean": false,
    "number": false,
    "string": true,
    "object": false
  }
}
This drastically increases the size of the resulting string: 214 lines -> ~3k. So this doesn't look like the correct way to create a SearchHit from a string. How does one deserialize a string into a SearchHit?
Apparently, a workaround is possible if you use .sourceAsMap instead of .sourceAsString and then convert the map to a new JSON string on your end. So the question is: why does .sourceAsString add all that additional type metadata? If it goes unnoticed, significantly more data gets indexed :(
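The map-to-string step of that workaround could look roughly like the sketch below. It uses only circe constructors; the fromPlain helper and the PlainToJson object are hypothetical names, and it assumes the map returned by .sourceAsMap holds plain values (strings, numbers, booleans, nested Scala maps and collections), which may not hold for every elastic4s version.

```scala
import io.circe.Json

object PlainToJson {
  // Hypothetical helper: turns a plain value tree into circe Json.
  // Assumes Scala collections; anything unrecognised falls back to its toString.
  def fromPlain(value: Any): Json = value match {
    case null          => Json.Null
    case j: Json       => j
    case s: String     => Json.fromString(s)
    case b: Boolean    => Json.fromBoolean(b)
    case i: Int        => Json.fromInt(i)
    case l: Long       => Json.fromLong(l)
    case d: Double     => Json.fromDoubleOrNull(d) // NaN and infinities become null
    case bd: BigDecimal => Json.fromBigDecimal(bd)
    // Map must be matched before Iterable, since a Map is also an Iterable.
    case m: scala.collection.Map[_, _] =>
      Json.fromFields(m.iterator.map { case (k, v) => k.toString -> fromPlain(v) }.toList)
    case xs: Iterable[_] => Json.fromValues(xs.map(fromPlain))
    case other         => Json.fromString(other.toString)
  }
}
```

With this in place, something like PlainToJson.fromPlain(hit.sourceAsMap).noSpaces would give the compact string, assuming hit is the manually built SearchHit.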
This has something to do with the Jackson serializer that elastic4s uses. Whenever a SearchHit is instantiated manually and _source is set, the downstream .sourceAsString generates JSON that includes the type elements. For now we avoid it only by calling .sourceAsMap on the manually instantiated SearchHit, converting the resulting map into a Json object, and then back to a String using circe.
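One possible reading of the output above is that Jackson serializes each circe Json value in the map as a bean, so keys such as "array", "null" and "string" would come from circe's isArray/isNull/isString accessors. If that is the cause, another option is to unwrap the circe values into plain Scala values before building the hit. A minimal sketch, assuming only circe's Json.fold and parser.parse (toPlain, sourceMap and PlainSource are hypothetical names, not part of elastic4s):

```scala
import io.circe.{Json, parser}

object PlainSource {
  // Hypothetical helper: unwraps a circe Json tree into plain Scala values
  // (null, Boolean, Long or Double, String, Vector, Map).
  def toPlain(json: Json): Any = json.fold[Any](
    null,
    b => b,
    n => n.toLong match {
      case Some(l) => l          // integral numbers stay Long
      case None    => n.toDouble // everything else becomes Double
    },
    s => s,
    arr => arr.map(toPlain),
    obj => obj.toMap.map { case (k, v) => k -> toPlain(v) }
  )

  // Parses a fixture string and returns its top-level fields as plain values.
  // Invalid JSON or a non-object top level yields an empty map.
  def sourceMap(fixture: String): Map[String, Any] =
    parser.parse(fixture).toOption
      .flatMap(_.asObject)
      .map(_.toMap.map { case (k, v) => k -> toPlain(v) })
      .getOrElse(Map.empty[String, Any])
}
```

The resulting map could then be passed as _source in place of the Map[String, Json]. Depending on the field's declared type in the elastic4s version in use, the values may need boxing or a cast, which this sketch does not attempt.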