rwynn/monstache

Bug: Setting mongodb field value to null does not index it in Elasticsearch

Closed this issue · 5 comments

Thank you so much for such a great project. I found a bug in the sync.

With the config below

gzip = true
resume = true
verbose = true
stats = true
index-stats = true
index-oplog-time = true
direct-read-split-max = -1
direct-read-concur = 1
direct-read-no-timeout = true
index-as-update = true
elasticsearch-max-conns = 1
[gtm-settings]
buffer-size = 2048
buffer-duration = "75ms"
max-await-time = "2s"

If you have a document in MongoDB and you update a field by setting it to null, the field and its value are omitted from the Elasticsearch _bulk update request.

If I compile the sample plugin below and run monstache with it,

package main

import (
	"github.com/rwynn/monstache/v6/monstachemap"
)

// Map is a pass-through mapper: it returns the input document unchanged.
func Map(input *monstachemap.MapperPluginInput) (output *monstachemap.MapperPluginOutput, err error) {
	doc := input.Document
	output = &monstachemap.MapperPluginOutput{Document: doc}
	return
}

the value is indexed as null correctly in Elasticsearch.

If I update the field to an empty string, it is indexed and works fine. I think something is removing the null fields?

Thank you

Hi @meabed ,

I wasn't able to replicate this issue with the code in the rel6 branch. What version of monstache are you using? Are you able to give minimal steps to replicate?

This is what I am seeing in the logs when I set things to null in MongoDB...

INFO 2023/12/22 14:12:29 Started monstache version 6.7.17
INFO 2023/12/22 14:12:29 Go version go1.20.3
INFO 2023/12/22 14:12:29 MongoDB go driver v1.13.1
INFO 2023/12/22 14:12:29 Elasticsearch go driver 7.0.31
INFO 2023/12/22 14:12:29 Successfully connected to MongoDB version 6.0.12
INFO 2023/12/22 14:12:29 Successfully connected to Elasticsearch version 8.11.3
INFO 2023/12/22 14:12:29 Listening for events
INFO 2023/12/22 14:12:29 Sending systemd READY=1
WARN 2023/12/22 14:12:29 Systemd notification not supported (i.e. NOTIFY_SOCKET is unset)
INFO 2023/12/22 14:12:29 Watching changes on the deployment
TRACE 2023/12/22 14:12:42 POST /_bulk HTTP/1.1
Host: localhost:9200
User-Agent: elastic/7.0.31 (linux-amd64)
Content-Length: 145
Accept: application/json
Authorization: Basic ZZZ
Content-Type: application/x-ndjson
Accept-Encoding: gzip

{"update":{"_index":"test.test","_id":"6585cbfa8952c6532bf0db6c"}}
{"doc":{"a":{"zed":null},"bar":null,"foo":1,"zed":null},"doc_as_upsert":true}

TRACE 2023/12/22 14:12:42 HTTP/1.1 200 OK
Content-Length: 244
Content-Type: application/json
X-Elastic-Product: Elasticsearch

{"errors":false,"took":37,"items":[{"update":{"_index":"test.test","_id":"6585cbfa8952c6532bf0db6c","_version":7315497415933427715,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":3,"status":200}}]}

Thank you @rwynn, I'm using the latest Docker build, 6.7.14.

I will make small repo to reproduce.

It happens when updating an existing document in Elasticsearch, not when indexing a new document.

I tried both an insert and an update using $set. In both cases the null was delivered to Elasticsearch for me.

I'll take a look at the repo though when you get a chance.

Hi @rwynn

Thank you so much. I have tested it and it's working as expected.

The issue I had came from my JavaScript transformation, which was removing the null and unset fields :)

Thank you for your support 🙏

After debugging, the cause was actually the lodash _.omit method in my transformation: it was removing these fields. FYI for anyone having the same issue.
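
For anyone wondering why this only showed up on existing documents: with index-as-update = true the _bulk action is a partial update, and Elasticsearch merges the "doc" object into the stored document, so a field missing from "doc" keeps its old value instead of becoming null. Below is a minimal Go sketch of that effect. The helpers dropNil and updateBody are hypothetical names made up for illustration; they are not part of monstache or lodash, and dropNil only stands in for a transform that strips null fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropNil is a hypothetical stand-in for a transform that strips null
// fields. It returns a copy of doc without nil-valued keys, recursing
// into nested maps.
func dropNil(doc map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		switch t := v.(type) {
		case nil:
			continue // the null field is dropped here
		case map[string]interface{}:
			out[k] = dropNil(t)
		default:
			out[k] = v
		}
	}
	return out
}

// updateBody is a hypothetical helper that builds the second line of a
// _bulk update action, in the same shape as the trace in this thread.
func updateBody(doc map[string]interface{}) string {
	body, err := json.Marshal(map[string]interface{}{
		"doc":           doc,
		"doc_as_upsert": true,
	})
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	doc := map[string]interface{}{
		"a":   map[string]interface{}{"zed": nil},
		"bar": nil,
		"foo": 1,
		"zed": nil,
	}
	// Nulls kept: the partial update overwrites the stored values with null.
	fmt.Println(updateBody(doc))
	// Nulls stripped: "bar" and "zed" are absent, so a partial update
	// leaves whatever was previously stored for them untouched.
	fmt.Println(updateBody(dropNil(doc)))
}
```

The first line printed has the same shape as the "doc" payload in the trace above; the second is what a null-stripping transform would send, which explains why a new document looks fine while an updated one keeps its stale value.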