asg017/sqlite-http

Can't use `response_body` from `http_get` virtual tables multiple times

rakoo opened this issue · 2 comments

rakoo commented

First of all, thank you for this extension it's pretty sweet !

It looks like I can't reuse the body multiple times:

sqlite3> .mode table
sqlite3> select response_body, response_body from http_get('https://api.github.com/issues');

+--------------------------------------------------------------+---------------+
|                        response_body                         | response_body |
+--------------------------------------------------------------+---------------+
| {"message":"Not Found","documentation_url":"https://docs.git |               |
| hub.com/rest/reference/issues#list-issues-assigned-to-the-au |               |
| thenticated-user"}                                           |               |
+--------------------------------------------------------------+---------------+

Thanks for filing an issue!

You're right - referencing the response_body will read the response body reader to completion, so referencing it twice means the 2nd time will return NULL

I'll see how hard it'll be to cache that response. In the meantime, a workaround could be wrapping the query in a CTE with AS MATERIALIZED:

with responses as materialized (
  select response_body 
  from http_get('https://api.github.com/issues')
)
select response_body, response_body 
from responses;
/*
┌──────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────┐
│                        response_body                         │                        response_body                         │
├──────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────┤
│ {"message":"Not Found","documentation_url":"https://docs.git │ {"message":"Not Found","documentation_url":"https://docs.git │
│ hub.com/rest/reference/issues#list-issues-assigned-to-the-au │ hub.com/rest/reference/issues#list-issues-assigned-to-the-au │
│ thenticated-user"}                                           │ thenticated-user"}                                           │
└──────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────┘
*/
rakoo commented

Hey there !

Yeah I was thinking about that workaround but it doesn't work for me: I'd like to use your extension in a recursive CTE, so it has to be in the "recursive statement" (https://sqlite.org/lang_with.html#recursive_common_table_expressions). My goal is to scrape an endpoint that offers paginated results with a next url that points to the next page:

WITH t(nextPage, numResults) AS (
    SELECT 'https://some.url', 0
    UNION ALL
    SELECT response_body->'next', json_array_length(response_body->'results') FROM http_get(t.nextPage)
)

I can't replace the second select with another CTE; I can't even replace it with

SELECT b->'next', json_array_length(b->'results') FROM (SELECT http_get_body(t.nextPage) as b)

because a subquery like that doesn't work, it doesn't recognize that t is part of the recurvise CTE.