python-bonobo/bonobo-sqlalchemy

Unable to retrieve results past first page

timlukasiewicz opened this issue · 3 comments

I was trying to use bonobo-sqlalchemy to kickstart some BI processes by dumping one of our tables to CSV periodically.

I built a very simple bonobo graph and services dictionary:

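import bonobo
import bonobo_sqlalchemy
import sqlalchemy

# S3_BUCKET_URL and SQL_ENGINE_STRING are configuration values defined elsewhere.

def get_services(**options):
    return {
        'fs': bonobo.open_fs(S3_BUCKET_URL),
        'sqlalchemy.engine': sqlalchemy.create_engine(SQL_ENGINE_STRING)
    }

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        bonobo_sqlalchemy.Select('SELECT * FROM table1;'),
        bonobo.CsvWriter('output.csv')
    )
    # The graph must be returned so the runner can execute it.
    return graph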

The first "page" of results (1000 of roughly 7000 rows) was written to the CSV correctly, but as soon as bonobo-sqlalchemy fetched the second page of 1000 rows and started iterating over it, an exception was raised and the process came to a screeching halt:
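To make the paging behaviour concrete, here is a stdlib-only sketch of reading a query result in fixed-size pages and streaming it to CSV. It is not bonobo-sqlalchemy's implementation: it assumes an in-memory sqlite3 table named `table1` as a stand-in for the database behind `SQL_ENGINE_STRING`, an `io.StringIO` in place of the S3 filesystem, and a page size of 1000 chosen only to match the numbers above. The point to notice is that the column header (the "output fields") is declared once, on the first page, and not again for later pages.

```python
import csv
import io
import sqlite3

PAGE_SIZE = 1000  # assumed page size, matching the 1000-row pages described above


def iter_pages(conn, query, page_size=PAGE_SIZE):
    """Yield (column_names, rows) for each page of the query result."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    while True:
        rows = cursor.fetchmany(page_size)
        if not rows:
            break
        yield columns, rows


def dump_to_csv(conn, query, out):
    """Write every page to `out` as CSV; return the number of data rows written."""
    writer = csv.writer(out)
    header_written = False
    total = 0
    for columns, rows in iter_pages(conn, query):
        if not header_written:
            # Declare the output fields exactly once, on the first page only.
            writer.writerow(columns)
            header_written = True
        writer.writerows(rows)
        total += len(rows)
    return total


if __name__ == "__main__":
    # Hypothetical stand-in for table1 with about as many rows as in the report.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                     [(i, f"row{i}") for i in range(7000)])
    buf = io.StringIO()
    print(dump_to_csv(conn, "SELECT * FROM table1", buf))
```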

ERR.|0002|bonobo.execution.contexts.base: <NodeExecutionContext(+Select) in=1 out=1000 err=1>
│ Traceback (most recent call last):
│   File "C:\Users\Tim\AppData\Local\Programs\Python\Python37\lib\site-packages\bonobo\execution\contexts\node.py", line 102, in loop
│     self.step()
│   File "C:\Users\Tim\AppData\Local\Programs\Python\Python37\lib\site-packages\bonobo\execution\contexts\node.py", line 140, in step
│     result = next(results)
│   File "C:\Users\Tim\AppData\Local\Programs\Python\Python37\lib\site-packages\bonobo_sqlalchemy\readers.py", line 70, in __call__
│     context.set_output_fields(row.keys())
│   File "C:\Users\Tim\AppData\Local\Programs\Python\Python37\lib\site-packages\bonobo\execution\contexts\node.py", line 215, in set_output_fields
│     self.set_output_type(BagType(typename, fields))
│   File "C:\Users\Tim\AppData\Local\Programs\Python\Python37\lib\site-packages\bonobo\execution\contexts\node.py", line 201, in set_output_type
│     raise RuntimeError('Cannot override output type, already have %r.', self._output_type)
╰ RuntimeError  ('Cannot override output type, already have %r.', <class 'bonobo.execution.contexts.node.Bag'>)
INFO|0002|botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
 - Select in=1 out=1000 err=1 [done]
 - CsvWriter in=1000 out=1000 [done]
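The traceback shows the chain of calls: the reader calls `context.set_output_fields(row.keys())`, which calls `set_output_type`, and that raises `RuntimeError` if an output type is already set. The toy model below reproduces that interaction. The classes are hypothetical and greatly simplified, and it assumes (consistent with the failure appearing right after row 1000, but not verified against `readers.py`) that the reader declares its fields again at the start of each page.

```python
from collections import namedtuple


class ToyNodeContext:
    """Hypothetical, simplified stand-in for bonobo's NodeExecutionContext."""

    def __init__(self):
        self._output_type = None

    def set_output_type(self, output_type):
        # Mirrors the check seen in the traceback: a second call raises.
        if self._output_type is not None:
            raise RuntimeError(
                "Cannot override output type, already have %r." % (self._output_type,)
            )
        self._output_type = output_type

    def set_output_fields(self, fields):
        self.set_output_type(namedtuple("Row", fields))


def paged_select(context, pages):
    """Toy reader that (incorrectly) declares its output fields once per page."""
    for page in pages:
        for i, row in enumerate(page):
            if i == 0:
                context.set_output_fields(row.keys())
            yield tuple(row.values())


if __name__ == "__main__":
    pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
    ctx = ToyNodeContext()
    rows = []
    try:
        for value in paged_select(ctx, pages):
            rows.append(value)
    except RuntimeError as exc:
        print(len(rows), "rows before error:", exc)
```

As in the report, everything from the first page gets through and the error is raised on the first row of the second page.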

I had actually opened this issue to provide context for a PR with a fix I was going to submit, but it looks like you may have already resolved it in commit 521865f; the fix just hasn't made it into the pip package yet.

Can you try using bonobo 0.7 (develop)?

It should be a bit less strict about typing.
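One way a "less strict" check could look is sketched below. This class is hypothetical and is not the code in bonobo 0.7 or commit 521865f; it only illustrates a reasonable relaxation: accept a repeated declaration when the field names are identical, and still reject a genuine schema change in the middle of a stream.

```python
from collections import namedtuple


class LenientNodeContext:
    """Hypothetical context that tolerates re-declaring identical output fields."""

    def __init__(self):
        self._output_fields = None
        self._output_type = None

    def set_output_fields(self, fields):
        fields = tuple(fields)
        if self._output_fields is None:
            # First declaration: build the row type.
            self._output_fields = fields
            self._output_type = namedtuple("Row", fields)
        elif fields != self._output_fields:
            # Still refuse a genuinely different schema mid-stream.
            raise RuntimeError(
                "Cannot change output fields from %r to %r." % (self._output_fields, fields)
            )
        # Identical fields: nothing to do, so per-page re-declaration is harmless.

    def wrap(self, values):
        """Turn a tuple of values into a row of the declared type."""
        return self._output_type(*values)
```

With this design, a reader that re-declares the same columns at the start of every page keeps working, while a query that unexpectedly changes shape still fails loudly.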

I can confirm the issue does not appear in the develop/v0.7 branch.

Thanks!