Server died

Question

Server died

riderx opened this issue 10 months ago · 2 comments

Bug report

I confirm this is a bug with Supabase, not with my own application.
I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Sometimes the requested query is too long to run or too big i'm not sure, and then the CPU of supabase goes crazy, the only way to make it go down is to reboot the instance, can we have a way to limit the response time or the usage of the wrapper to not kill the db itself?

Expected behavior

Never make the db died

Screenshots

If applicable, add screenshots to help explain your problem.

System information

OS: [e.g. macOS, Windows]
Browser (if applies) [e.g. chrome, safari]
Version of supabase-js: [e.g. 6.0.2]
Version of Node.js: [e.g. 10.10.0]

Additional context

Add any other context about the problem here.

Answer 1 · 2024-02-08T07:49:45.000Z

Can you please share the query that you were running? Which wrapper was this?

Answer 2 · 2024-02-08T14:06:03.000Z

@imor I work with @riderx and after having to disable clickhouse FDW we concluded it's a very, very wierd behaviour of the push down where clause support in clickhouse. Here is what I am talking about:

CREATE OR REPLACE FUNCTION public.get_total_stats_v4(userid uuid)
RETURNS TABLE(mau bigint, bandwidth double precision, storage double precision)
LANGUAGE plpgsql SECURITY DEFINER
AS $$
DECLARE
    anchor_start date;
    anchor_end date;
    usage_table_name text;
BEGIN
    SELECT subscription_anchor_start, subscription_anchor_end INTO anchor_start, anchor_end
    FROM stripe_info
    WHERE customer_id=(SELECT customer_id from users where id=userid);

    RETURN QUERY SELECT 
            COALESCE(MAX(clickhouse_app_usage.mau), 0)::bigint AS mau,
            COALESCE(round(convert_bytes_to_gb(SUM(clickhouse_app_usage.bandwidth))::numeric,2), 0)::float AS bandwidth,
            COALESCE(round(convert_bytes_to_gb(SUM(clickhouse_app_usage.storage_added - clickhouse_app_usage.storage_deleted))::numeric,2), 0)::float AS storage
        FROM clickhouse_app_usage
        WHERE app_id IN (SELECT app_id from apps where user_id=userid)
        AND date >= anchor_start
        AND date <= anchor_end
        LIMIT 1;
END;  
$$;

CREATE OR REPLACE FUNCTION public.get_total_stats_v5(userid uuid)
RETURNS TABLE(mau bigint, bandwidth double precision, storage double precision)
LANGUAGE plpgsql SECURITY DEFINER
AS $$
DECLARE
    anchor_start date;
    anchor_end date;
    app_ids text[];
    usage_table_name text;
BEGIN
    SELECT subscription_anchor_start, subscription_anchor_end INTO anchor_start, anchor_end
    FROM stripe_info
    WHERE customer_id=(SELECT customer_id from users where id=userid);

    -- Retrieve the app_ids into the variable
    SELECT array_agg(app_id) INTO app_ids FROM apps WHERE user_id=userid;

    -- Use the app_ids variable in the query
    RETURN QUERY SELECT 
            COALESCE(MAX(clickhouse_app_usage.mau), 0)::bigint AS mau,
            COALESCE(round(convert_bytes_to_gb(SUM(clickhouse_app_usage.bandwidth))::numeric,2), 0)::float AS bandwidth,
            COALESCE(round(convert_bytes_to_gb(SUM(clickhouse_app_usage.storage_added - clickhouse_app_usage.storage_deleted))::numeric,2), 0)::float AS storage
        FROM clickhouse_app_usage
        WHERE app_id = any(app_ids)
        AND date >= anchor_start
        AND date <= anchor_end
        LIMIT 1;
END;  
$$;

As you can see, in v4 I do WHERE app_id IN (SELECT app_id from apps where user_id=userid) but in v5 I do WHERE app_id = any(app_ids). This is a huge difference, it causes the behavior of the FDW to fundamentally change

The issue we were experiencing was caused by a very high cpu usage. This function was called really often and it was incredibly expensive.

As a recommendation I would suggest changing the docs to inform about this behavior or fix the issue