Can't use `response_body` from `http_get` virtual tables multiple times
rakoo opened this issue · 2 comments
First of all, thank you for this extension it's pretty sweet !
It looks like I can't reuse the body multiple times:
sqlite3> .mode table
sqlite3> select response_body, response_body from http_get('https://api.github.com/issues');
+--------------------------------------------------------------+---------------+
| response_body | response_body |
+--------------------------------------------------------------+---------------+
| {"message":"Not Found","documentation_url":"https://docs.git | |
| hub.com/rest/reference/issues#list-issues-assigned-to-the-au | |
| thenticated-user"} | |
+--------------------------------------------------------------+---------------+
Thanks for filing an issue!
You're right - referencing the response_body
will read the response body reader to completion, so referencing it twice means the 2nd time will return NULL
I'll see how hard it'll be to cache that response. In the meantime, a workaround could be wrapping the query in a CTE with AS MATERIALIZED
:
with responses as materialized (
select response_body
from http_get('https://api.github.com/issues')
)
select response_body, response_body
from responses;
/*
┌──────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────┐
│ response_body │ response_body │
├──────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────┤
│ {"message":"Not Found","documentation_url":"https://docs.git │ {"message":"Not Found","documentation_url":"https://docs.git │
│ hub.com/rest/reference/issues#list-issues-assigned-to-the-au │ hub.com/rest/reference/issues#list-issues-assigned-to-the-au │
│ thenticated-user"} │ thenticated-user"} │
└──────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────┘
*/
Hey there !
Yeah I was thinking about that workaround but it doesn't work for me: I'd like to use your extension in a recursive CTE, so it has to be in the "recursive statement" (https://sqlite.org/lang_with.html#recursive_common_table_expressions). My goal is to scrape an endpoint that offers paginated results with a next
url that points to the next page:
WITH t(nextPage, numResults) AS (
SELECT 'https://some.url', 0
UNION ALL
SELECT response_body->'next', json_array_length(response_body->'results') FROM http_get(t.nextPage)
)
I can't replace the second select with another CTE; I can't even replace it with
SELECT b->'next', json_array_length(b->'results') FROM (SELECT http_get_body(t.nextPage) as b)
because a subquery like that doesn't work, it doesn't recognize that t
is part of the recurvise CTE.