Stripe FDW randomly returns no data
vincenzon opened this issue · 8 comments
I am querying stripe FDW tables using an rpc call using supabase-js from my web app. About half the time, the query returns no results. It should return 2 rows, and does so about half the time. I can reproduce it reliably by just refreshing the page. It is not obvious when it will return results and when not. It may be a function of how quickly I refresh, but isn't always. I can refresh after waiting several minutes, get no results, refresh again and get results as well as vice-versa. There isn't any rhyme or reason to it as far as I can tell. When it returns no results, it does not have an error, just an empty result: {data: [], error: null}
.
Supabase support is telling me to use pg_cron and to build what is effectively a materialized view of the stripe FDW data within supabase. If that is the recommended workflow, I think it should be documented as such.
Here is the rpc call with the query, it joins the product and price tables. There are two products, and those were populated once a while ago and have not changed.
create or replace function public.get_subscription_prices()
returns table (
name text,
product_id text,
price_id text,
price real
)
language plpgsql security definer as
$$
#variable_conflict use_column
begin
set statement_timeout to 60000;
return QUERY
select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as price
from stripe.products p
left join stripe.prices q on p.id = q.product
where p.active and q.active;
end
$$;
What happens if you run the following query directly in the database:
select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as price
from stripe.products p
left join stripe.prices q on p.id = q.product
where p.active and q.active;
Does it also produce results intermittently? What about even simpler queries like:
select q.id, q.unit_amount, q.product, q.active from stripe.prices;
Or
select p.id, p.name, p.active from stripe.products p;
This looks like a bug in wrappers. If I run a query directly, e.g. select * from stripe.products
it successfully returns results. But if I wrap the same query in a function, it only returns the result the first time. The difference in these is that the StripeFdw
object is created afresh each time select * from stripe.products
is run but not when the query is run from inside the function.
I analyzed further and this bug is more involved than I thought. If the function with the query is called after select * from stripe.products
it fails with this error: ERROR: required option 'object' is not specified
.
To reproduce run the following:
create or replace function public.get_products()
returns table (
product_id text
)
language plpgsql as
$$
begin
return query
select p.id as product_id from stripe.products p;
end
$$;
select * from public.get_products();
select * from stripe.products;
select * from public.get_products();--ERROR: required option 'object' is not specified
The error occurs because the code assumes that a foreign table will only ever have one FdwState
(see these lines). But this is not true when we have a foreign table query wrapped in a function and a direct query at the same time. While the select statement query always creates a new FdwState
, the query in the function does not. This means the direct query is releasing the memory context in which the wrapped query's FdwState
was allocated, resulting in a use after free bug. To fix this we need to ensure that a memory context holding an FdwState
is not reset.
When 1st time calling the function, postgres will cache the execution plan to improve query performance. So for the following calls, the FdwState
won't be initialised again and execution plan is reused. This will cause unexpected results as the FdwState
is reused without resetting its internal states.
For short-term solution, we can use dynamic query in the function to prevent postgres from using execution plan cache and re-create FdwState
each time. For example,
create or replace function public.get_products()
returns table (
product_id text
)
language plpgsql as
$$
begin
return query
execute 'select p.id as product_id from stripe.products p'; -- use dynamic query here
end
$$;
For long-term solution, we might need to find a good way to reuse FdwState
during each function call and also don't interfere with other normal queries.
The dynamic query is indeed working. Thank you. For me this is closed, but I'll leave it to you to decide whether to keep it open to track the long-term issue.
Thanks for this, looks like it took some serious work.
Although we have a good workaround (using a dynamic query), reopening this to track the use-after-free bug that we still need to fix.