EpistasisLab/pmlb

AI Feynman datasets

I am trying to fetch a dataset from AI Feynman, but I receive the following error:

```python
from pmlb import fetch_data

name = "feynman_III_12_43"
dataset = fetch_data(name)
```

```
ValueError: Dataset not found in PMLB.
```

Hi @aminravanbakhsh
Which version of PMLB are you running?
I managed to fetch this dataset without problems. I'm using python==3.8.19 and pmlb==1.0.2a.

Two possible solutions:

  1. Install pmlb from source. Clone this repo and run `pip install .` from its root. That's how I installed it here. I'm using a conda environment specifically for building PMLB at its latest version.
  2. Download the dataset folder from this repo (https://github.com/EpistasisLab/pmlb/tree/master/datasets/feynman_III_12_43), put it into a local folder, and use `fetch_data(name, local_cache_dir='<path to the parent folder>')`. It should work as long as the folder and the .tsv.gz file inside it have the same name. I tried creating a local copy manually and it worked:
```python
from pmlb import fetch_data

name = "feynman_III_12_43_copy"
dataset = fetch_data(name, local_cache_dir="./datasets/")
dataset
```
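
For anyone trying the local-copy route, here is a minimal sketch of checking the folder layout before calling `fetch_data`. It assumes the layout described above, `<local_cache_dir>/<name>/<name>.tsv.gz`. The helper names (`local_dataset_file`, `has_local_copy`, `load_from_cache`) are made up for illustration and are not part of pmlb.

```python
from pathlib import Path


def local_dataset_file(name: str, local_cache_dir: str) -> Path:
    """Return the path where a local copy is expected.

    Assumed layout (from the description above): <cache>/<name>/<name>.tsv.gz
    """
    return Path(local_cache_dir) / name / f"{name}.tsv.gz"


def has_local_copy(name: str, local_cache_dir: str) -> bool:
    """True if the expected .tsv.gz file exists in the local folder."""
    return local_dataset_file(name, local_cache_dir).is_file()


def load_from_cache(name: str, local_cache_dir: str):
    """Load a dataset through pmlb, but only if the local copy is in place."""
    path = local_dataset_file(name, local_cache_dir)
    if not path.is_file():
        raise FileNotFoundError(f"Expected a local copy at {path}")
    # Imported lazily so the path helpers work even without pmlb installed.
    from pmlb import fetch_data
    return fetch_data(name, local_cache_dir=local_cache_dir)
```

With the example above, `load_from_cache("feynman_III_12_43_copy", "./datasets/")` would raise a `FileNotFoundError` naming the expected path if the folder or file name does not match, which may be easier to debug than the generic "Dataset not found" message.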

Hi @gAldeia
Thank you for your reply.
I am using:

pmlb==1.0.1.post3
Python 3.12.4

@aminravanbakhsh Did you try downloading the dataset locally and using `local_cache_dir` to load it? It seems that your version, 1.0.1.post3, was released on Sep 10, 2020, and the Feynman datasets were added just after July 2021. Installing pmlb locally by cloning the repo and running `pip install .` should also solve your problem.
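
To confirm which version is actually installed in the active environment, something like the following sketch could help. It only uses the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, not a pmlb function.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version of pmlb found in the current environment, or None.
print("pmlb version:", installed_version("pmlb"))
```

If this reports 1.0.1.post3, the installed package predates the Feynman datasets, which would be consistent with the error above.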

While this may be a workaround, ideally PMLB should be updated on PyPI to its latest version.

Right now I am trying to submit new datasets, and there is a GitHub Actions issue that is keeping me from actually doing it. If the local cache works, I think we can close this issue and open a new one to update the PyPI package to its latest version.