[MISC] Check that `pandas` can be used to connect to multiple tables

Question

Closed this issue a month ago · 2 comments

Some additional detail:

With superduper we should connect like this:

db = superduper('parent_directory/*.csv')

This means that any output tables should be saved as 'parent_directory/<name-of-output-table>.csv'.
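To make the directory convention concrete, here is a minimal pandas-only sketch. It does not use superduperdb; the helper names `load_tables` and `save_table` and the sample data are hypothetical, chosen only to illustrate reading every CSV in a directory and writing an output table back as `<name>.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_tables(directory):
    """Read every CSV in `directory` into a dict keyed by file stem (hypothetical helper)."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(directory).glob("*.csv"))}


def save_table(directory, name, frame):
    """Write an output table next to the inputs as <name>.csv (hypothetical helper)."""
    path = Path(directory) / f"{name}.csv"
    frame.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as parent_directory:
    # Create one sample input table so the sketch is self-contained.
    pd.DataFrame({"id": [1, 2], "brand": ["Nike", "Adidas"]}).to_csv(
        Path(parent_directory) / "orders.csv", index=False
    )

    tables = load_tables(parent_directory)

    # Derive an output table and save it under the same parent directory.
    output = tables["orders"].assign(brand_upper=tables["orders"]["brand"].str.upper())
    save_table(parent_directory, "orders_upper", output)

    # Reloading the directory now picks up the output table as well.
    reloaded = load_tables(parent_directory)
    table_names = sorted(reloaded)
    upper_brands = reloaded["orders_upper"]["brand_upper"].tolist()
```

Because outputs land in the same directory as the inputs, a later glob over 'parent_directory/*.csv' would see them as ordinary tables.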

We will need BytesEncoding.base64 everywhere, and we should somehow save the output table after every computation.
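One likely reason for the base64 requirement is that CSV is a text format, so raw bytes have to be turned into text before they can be stored in a cell. The sketch below shows that round trip with the standard library only. It is not superduperdb's `BytesEncoding` implementation; `encode_bytes` and `decode_bytes` are hypothetical helpers.

```python
import base64
import csv
import io


def encode_bytes(raw: bytes) -> str:
    """Turn raw bytes into ASCII text that is safe inside a CSV cell."""
    return base64.b64encode(raw).decode("ascii")


def decode_bytes(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text.encode("ascii"))


# Bytes that would be awkward in a CSV cell: NUL, a non-UTF-8 byte, a comma, a newline.
payload = b"\x00\xffbinary,with\ncommas"

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "output"])
writer.writerow([1, encode_bytes(payload)])

# Read the CSV back and decode the stored cell.
buffer.seek(0)
rows = list(csv.reader(buffer))
restored = decode_bytes(rows[1][1])
```

The base64 alphabet contains no commas, quotes, or newlines, so the encoded cell needs no CSV escaping.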

We should restrict this so that it does not work in cluster mode.

Answer 1 · 2024-04-03T07:51:12.000Z

For example:

db = superduper(['customers.csv', 'orders.csv'])

table = Table('orders')

db.execute(table.filter(table.brand == 'Nike'))
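For comparison, this is roughly what the same filter looks like in plain pandas, followed by a join across the two tables. It is only a sketch of the underlying pandas operations, not of how superduperdb would execute the query. The in-memory frames stand in for customers.csv and orders.csv, and every column name other than `brand` is an assumption.

```python
import pandas as pd

# Stand-ins for customers.csv and orders.csv (column names are assumed).
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Ada", "Bo"]})
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 11, 10],
        "brand": ["Nike", "Adidas", "Nike"],
    }
)

# Equivalent of table.filter(table.brand == 'Nike'): boolean-mask row selection.
nike_orders = orders[orders["brand"] == "Nike"]

# Connecting multiple tables: left-join the filtered orders to their customers.
joined = nike_orders.merge(customers, on="customer_id", how="left")

order_ids = nike_orders["order_id"].tolist()
customer_names = joined["name"].tolist()
```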

Answer 2 · 2024-04-17T08:26:08.000Z

As part of [TEST-USE] Transfer learning #1967, I am trying this:

from superduperdb import superduper
db = superduper(['sample.xlsx'], metadata_store='mongomock://meta')

and I am getting this error:
ValueError: Couldn't auto-identify ['sample.xlsx'], please wrap explicitly using ``superduperdb.components.*``
Any input would be appreciated. Thank you.