Memory leak when using geo_point
mbianchihr opened this issue · 0 comments
PGSync version:
3.1.0
Postgres version:
16
Elasticsearch/OpenSearch version:
8.11.4
Redis version:
7.2.4
Python version:
3.11.6
Problem Description:
When marking fields as geo_point
memory leak.
[
{
"database": "db",
"index": "delivery",
"plugins": [
"Multisource"
],
"nodes": {
"table": "parcel",
"schema": "public",
"columns": [
"id",
"externalId",
"status",
"weight",
"statusChangeTime",
"statusNote",
"createdAt"
],
"transform": {
"mapping": {
"location": {
"type": "geo_point"
},
"proofLocation": {
"type": "geo_point"
}
}
},
...
}
}
]
Then in plugin I have following "rewrite" since the location fields are deep down and are remapped to top level.
from pgsync import plugin
class MultisourcePlugin(plugin.Plugin):
name = 'Multisource'
def transform(self, doc, **kwargs):
...
doc['location'] = None
doc['proofLocation'] = None
...
if doc['delivery'].get('location'):
doc['location'] = doc['delivery']['location']['coordinates']
if doc['delivery'].get('proofLocation'):
doc['proofLocation'] = doc['delivery']['proofLocation']['coordinates']
...
When not using this there barely any memory used, and redis is in both cases using around 6-10mb of ram. The real issue lies in memory usage that makes the EC2 fail and become unresponsive at around 200 000 rows with usage of 3.7GB of ram, with our current data set of 1.5 milion it would eat up to 32+GB of ram if estimating from current usage.
This is happening on initial sync, what I suppose is happening is data is not being cleared from memory because fields do not exist in the schema/top level.
However I would love to be pointed to correct place, and yes I sadly have to have these fields on top level instead of parcel.delivery.location
.