High Memory and CPU Overload on VPS During Bulk Updates with Nested Relations
PGSync version: 3.2.0
Postgres version: 15.1
Elasticsearch/OpenSearch version: 7.17.23
Redis version: 7.2.5
Python version: 3.9
Problem Description:
I recently added PGSync to my docker-compose.yml for Supabase with the following setup:
```yaml
pgsync-redis:
  container_name: pgsync-redis
  image: redis
  command: redis-server --requirepass ${PGSYNC_REDIS_PASSWORD}
  ports:
    - 6379:6379

pgsync:
  container_name: pgsync
  build: ./requirements/pgsync
  depends_on:
    db:
      condition: service_healthy
    pgsync-redis:
      condition: service_healthy
  environment:
    - PG_USER=supabase_admin
    - PG_PASSWORD=${POSTGRES_PASSWORD}
    - PG_HOST=${POSTGRES_HOST}
    - PG_PORT=${POSTGRES_PORT}
    - PG_DATABASE=${POSTGRES_DB}
    - REDIS_HOST=pgsync-redis
    - REDIS_PORT=6379
    - REDIS_AUTH=${PGSYNC_REDIS_PASSWORD}
    - ELASTICSEARCH_SCHEME=${ELASTICSEARCH_SCHEME}
    - ELASTICSEARCH_HOST=${ELASTICSEARCH_HOST}
    - ELASTICSEARCH_PORT=${ELASTICSEARCH_PORT}
    - ELASTICSEARCH_USER=${ELASTICSEARCH_USER}
    - ELASTICSEARCH_PASSWORD=${ELASTICSEARCH_PASSWORD}
    - ELASTICSEARCH=True
    - LOG_LEVEL=INFO
```
Everything works perfectly during initial synchronization with Elasticsearch, and updates to individual entries in my Postgres database are also synced correctly.
However, I ran into a severe issue with a Python script that updates data for around 10,000 movies daily at 10 AM. Since adding PGSync to my stack, this bulk update has been overloading my VPS's RAM, which shows up as extremely slow responses from my Supabase API and even from SSH sessions on the VPS. The overload persisted for more than 10 hours after the script had finished running.
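For context, the daily script boils down to roughly this shape (a simplified sketch, not the actual code; psycopg2, the column set, and the batch size are assumptions):

```python
import psycopg2
from psycopg2.extras import execute_batch

def chunked(items, size):
    """Yield successive slices of `items` of length `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def update_movies(conn, movies, batch_size=500):
    # Apply the ~10k updates in small committed batches rather than one
    # giant transaction, so WAL/trigger activity arrives in bursts.
    for chunk in chunked(movies, batch_size):
        with conn.cursor() as cur:
            execute_batch(
                cur,
                """
                UPDATE tmdb_movie
                   SET popularity   = %(popularity)s,
                       vote_average = %(vote_average)s,
                       vote_count   = %(vote_count)s
                 WHERE id = %(id)s
                """,
                chunk,
            )
        conn.commit()

# Usage (connection parameters are placeholders):
# conn = psycopg2.connect(host="db", dbname="postgres",
#                         user="supabase_admin", password="...")
# update_movies(conn, fetched_movies)  # fetched_movies: list of dicts from the TMDB API
```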
Here are the relevant details:
- Postgres contains two tables:
  - tmdb_movie (columns like id, original_title, etc.)
  - tmdb_movie_translation (linked to tmdb_movie via a movie_id foreign key)
- PGSync configuration:
```json
[
  {
    "database": "postgres",
    "index": "movies",
    "mapping": {
      "translations": {
        "type": "nested"
      }
    },
    "nodes": {
      "table": "tmdb_movie",
      "schema": "public",
      "columns": [
        "id",
        "adult",
        "backdrop_path",
        "budget",
        "homepage",
        "imdb_id",
        "original_language",
        "original_title",
        "popularity",
        "release_date",
        "revenue",
        "runtime",
        "status",
        "vote_average",
        "vote_count",
        "collection_id"
      ],
      "children": [
        {
          "table": "tmdb_movie_translation",
          "columns": [
            "title",
            "overview",
            "language_id"
          ],
          "label": "translations",
          "relationship": {
            "variant": "object",
            "type": "one_to_many"
          }
        }
      ]
    }
  }
]
```
The first synchronization goes smoothly and individual updates are synced correctly, but when the bulk update runs, the sheer number of trigger firings, combined with the nested relationship to tmdb_movie_translation, seems to cause a massive overload.
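For anyone debugging something similar, this is the kind of query that shows how many triggers actually fire per row on the two tables once PGSync has added its own (a sketch; connection parameters are placeholders):

```python
import psycopg2

# List all triggers on the synced tables and which operations they fire on.
QUERY = """
SELECT event_object_table AS table_name,
       trigger_name,
       string_agg(event_manipulation, ', ') AS fires_on
FROM information_schema.triggers
WHERE event_object_schema = 'public'
  AND event_object_table IN ('tmdb_movie', 'tmdb_movie_translation')
GROUP BY 1, 2
ORDER BY 1, 2;
"""

with psycopg2.connect(host="db", dbname="postgres",
                      user="supabase_admin", password="...") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for table_name, trigger_name, fires_on in cur.fetchall():
            print(f"{table_name}: {trigger_name} ({fires_on})")
```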
Observed behavior:
- The last PGSync logs repeated the following line in a loop:
  ```
  pgsync | Sync postgres:movies Xlog: [0] => Db: [4] => Redis: [0] => Elasticsearch: [958,552]...
  ```
- The overload continued even after stopping PGSync and restarting the database; the Supabase containers remained overloaded on CPU and RAM (see the replication-slot check sketched below).
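Since the load persisted even after PGSync was stopped, one thing worth checking is whether the logical replication slot PGSync creates is still holding back WAL. A minimal sketch (connection parameters are placeholders):

```python
import psycopg2

# Show each replication slot, whether it has an active consumer, and how
# much WAL it is retaining behind the current write position.
QUERY = """
SELECT slot_name,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
"""

with psycopg2.connect(host="db", dbname="postgres",
                      user="supabase_admin", password="...") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for slot_name, active, retained_wal in cur.fetchall():
            print(f"{slot_name}: active={active}, retained WAL={retained_wal}")
```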
Possible Related Issues:
I suspect this might be related to how PGSync handles the large number of triggers during bulk updates, especially with nested relationships: #88
Any insights or solutions would be greatly appreciated! 😁
Well, I've upgraded my VPS with more vCPUs and RAM and it seems to work now. I'll close this issue for now 😁