MLMI2-CSSI/foundry

Download dataset

BenGalewsky opened this issue · 3 comments

As a Foundry dataset user, I want to access a dataset as a pandas DataFrame so I can perform my analysis.

Description

Create a new DatasetCache class that will be instantiated along with the Foundry instance.

This class has three methods:

  1. flush
  2. download_dataset
  3. is_dataset_in_cache
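The three methods listed above could look roughly like the following minimal sketch. It assumes each dataset lives in its own subdirectory of the cache and is fetched from a single URL. The constructor signature, the `url` and `filename` parameters of `download_dataset`, and the directory layout are all hypothetical, not Foundry's actual API.

```python
import shutil
import urllib.request
from pathlib import Path


class DatasetCache:
    """Illustrative local dataset cache; names and parameters are hypothetical."""

    def __init__(self, cache_dir="./data"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _dataset_dir(self, dataset_name):
        # Assumption: each dataset gets its own subdirectory inside the cache.
        return self.cache_dir / dataset_name

    def is_dataset_in_cache(self, dataset_name):
        # A dataset counts as cached if its directory exists and is non-empty.
        d = self._dataset_dir(dataset_name)
        return d.is_dir() and any(d.iterdir())

    def download_dataset(self, dataset_name, url, filename):
        # Skip the network fetch entirely when the dataset is already cached.
        target_dir = self._dataset_dir(dataset_name)
        target = target_dir / filename
        if self.is_dataset_in_cache(dataset_name):
            return target
        target_dir.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
        return target

    def flush(self):
        # Delete every cached dataset, then recreate the empty cache directory.
        shutil.rmtree(self.cache_dir, ignore_errors=True)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
```

Because `urlretrieve` also accepts `file://` URLs, the sketch can be exercised locally without network access, e.g. `cache.download_dataset("demo", Path("source.json").resolve().as_uri(), "data.json")`.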

By default, the cache lives in ./data. This location can be overridden with the FOUNDRY_CACHE_PATH environment variable.
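Resolving the cache location described above might be sketched as follows. The helper name `resolve_cache_path` is hypothetical, and treating an empty FOUNDRY_CACHE_PATH the same as an unset one is an assumption, not something the issue specifies.

```python
import os
from pathlib import Path

DEFAULT_CACHE_PATH = "./data"


def resolve_cache_path(env=None):
    """Hypothetical helper: FOUNDRY_CACHE_PATH if set, otherwise ./data."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty variable falls back to the default location.
    return Path(env.get("FOUNDRY_CACHE_PATH") or DEFAULT_CACHE_PATH)
```

Passing the environment in as a mapping keeps the function easy to test without mutating `os.environ`.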

This story is for the user to be able to download a dataset.

Add a new method, get_as_dict, to the Dataset class.
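One possible shape for get_as_dict, assuming the dataset is a single local file with no splits, is sketched below. The `Dataset` constructor, the supported file types (JSON and CSV), and the column-oriented output for CSV are hypothetical choices for illustration, not Foundry's implementation.

```python
import csv
import json
from pathlib import Path


class Dataset:
    """Hypothetical single-file dataset handle (sketch, not Foundry's class)."""

    def __init__(self, path):
        self.path = Path(path)

    def get_as_dict(self):
        # Assumes the dataset is exactly one file with no train/test splits.
        if self.path.suffix == ".json":
            with self.path.open() as fh:
                return json.load(fh)
        if self.path.suffix == ".csv":
            with self.path.open(newline="") as fh:
                reader = csv.DictReader(fh)
                rows = list(reader)
                fieldnames = reader.fieldnames or []
            # Pivot rows into {column_name: [values, ...]}; CSV values stay strings.
            return {col: [row[col] for row in rows] for col in fieldnames}
        raise ValueError(f"Unsupported file type: {self.path.suffix}")
```

A column-oriented dict was chosen because `pandas.DataFrame(ds.get_as_dict())` turns it directly into a DataFrame, which matches the user story.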

Assumptions

  1. Dataset has only one file
  2. FoundryCache is implemented as part of this issue
  3. No splits
  4. Foundry example notebooks would need to be updated along with this implementation (also good for testing)

Acceptance Criteria

from foundry import Foundry

f = Foundry()
datasets = f.search("DOI.123/445")

# Exactly one dataset should match the DOI
assert len(datasets) == 1
res = datasets[0].get_as_dict()

Includes work on #411, as well as adding an as_list option to search().

Hi @blue442, was this part of that big PR you did? Should I mark this as complete?

@kjschmidt913 yes it was - mark away!