[MISC] Check that `pandas` can be used to connect to multiple tables
Closed this issue · 2 comments
blythed commented
Some extra definition:
With `superduper` we should connect like this:
db = superduper('parent_directory/*.csv')
This means that if we need output tables, these should be saved as `'parent_directory/<name-of-output-table>.csv'`.
We will need `BytesEncoding.base64` everywhere, and we should somehow save the output table after every computation.
We should restrict this so that it does not work in cluster mode.
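To make the proposal above concrete, here is a minimal sketch in plain `pandas` of the two behaviours described: loading every CSV that matches a glob into a named table, and writing an output table back to the same directory with byte values stored as base64 text. The helper names `load_tables` and `save_output_table` are hypothetical and are not part of superduper; this only illustrates the idea, not how the library implements it.

```python
import base64
from pathlib import Path
from typing import Dict

import pandas as pd


def load_tables(pattern: str) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: load each CSV matching the glob, keyed by file stem."""
    parent = Path(pattern).parent
    file_glob = Path(pattern).name
    return {p.stem: pd.read_csv(p) for p in sorted(parent.glob(file_glob))}


def save_output_table(parent: Path, name: str, df: pd.DataFrame) -> Path:
    """Hypothetical helper: write an output table as <parent>/<name>.csv."""
    out = df.copy()
    for col in out.columns:
        # CSV is a text format, so columns holding raw bytes are
        # encoded as base64 strings before writing.
        if out[col].map(lambda v: isinstance(v, bytes)).all():
            out[col] = out[col].map(
                lambda v: base64.b64encode(v).decode("ascii")
            )
    path = parent / f"{name}.csv"
    out.to_csv(path, index=False)
    return path
```

Calling `save_output_table` after each computation would keep the file on disk in step with the in-memory table, which is the behaviour asked for above. Because the files live on the local filesystem, it is easy to see why this would not carry over to cluster mode.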
blythed commented
For example:
db = superduper(['customers.csv', 'orders.csv'])
table = Table('orders')
db.execute(table.filter(table.brand == 'Nike'))
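For comparison, the filter in the example above would correspond roughly to the following boolean-mask selection in plain `pandas`. The `orders` data here is made up purely for illustration, and this is only a sketch of the expected semantics, not a description of what `db.execute` does internally.

```python
import pandas as pd

# Toy stand-in for the contents of 'orders.csv' (illustrative data only).
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "brand": ["Nike", "Adidas", "Nike"],
    }
)

# Keep only the rows whose brand is 'Nike'.
nike_orders = orders[orders["brand"] == "Nike"]
```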
Lalith-Sagar-Devagudi commented
As part of [TEST-USE] Transfer learning #1967, I am trying this:
from superduperdb import superduper
db = superduper(['sample.xlsx'], metadata_store=f'mongomock://meta')
and I am getting this error:
ValueError: Couldn't auto-identify ['sample.xlsx'], please wrap explicitly using ``superduperdb.components.*``
Any inputs will be appreciated, thank you.