Fallback to arrow defaults when loading dataset with custom features that aren't registered locally
alex-hh opened this issue · 0 comments
Describe the bug
Datasets allows users to create and register custom features.
However, if a dataset that uses such features is then pushed to the Hub, anyone who calls load_dataset without first registering the same custom features as the dataset creator gets an error.
It would be nice to offer a fallback in this case.
Steps to reproduce the bug
load_dataset("alex-hh/custom-features-example")
(Dataset creation process. This must be run in a separate session, so that NewFeature isn't registered in the session in which the download is attempted:)
from dataclasses import dataclass, field
import pyarrow as pa
from datasets.features.features import register_feature
from datasets import Dataset, Features, Value, load_dataset
from datasets import Feature


@dataclass
class NewFeature(Feature):
    _type: str = field(default="NewFeature", init=False, repr=False)

    def __call__(self):
        return pa.int32()


def examples_generator():
    for i in range(5):
        yield {"feature": i}


ds = Dataset.from_generator(examples_generator, features=Features(feature=NewFeature()))
ds.push_to_hub("alex-hh/custom-features-example")
register_feature(NewFeature, "NewFeature")
Expected behavior
It would be nice, and would make the library more extensible, if there were a graceful fallback for cases where user-defined features are stored in the dataset but are not registered locally, for example falling back to the default features inferred from the underlying Arrow types.
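To make the request concrete, here is a minimal sketch of the kind of fallback I have in mind. It is not how datasets is implemented: it works on plain dicts rather than real Features objects, and resolve_features, KNOWN_FEATURE_TYPES and the dtype mapping are hypothetical names made up for illustration. It assumes that stored features are serialized as dicts carrying a "_type" key and that the Arrow storage type of each column is still available from the file schema.

```python
import warnings

# Hypothetical stand-in for the set of feature types registered in this session.
KNOWN_FEATURE_TYPES = {"Value", "ClassLabel", "Sequence"}


def resolve_features(stored_features, arrow_dtypes, known_types=KNOWN_FEATURE_TYPES):
    """Return feature specs, replacing unregistered types with a plain Value.

    stored_features: column name -> serialized feature dict (with a "_type" key).
    arrow_dtypes: column name -> Arrow storage dtype name, e.g. "int32".
    """
    resolved = {}
    for name, spec in stored_features.items():
        feature_type = spec.get("_type")
        if feature_type in known_types:
            # The type is registered locally, so keep the stored spec unchanged.
            resolved[name] = spec
            continue
        # Unknown type: warn and fall back to the Arrow storage type.
        warnings.warn(
            f"Feature type {feature_type!r} for column {name!r} is not registered; "
            f"falling back to Arrow storage type {arrow_dtypes[name]!r}."
        )
        resolved[name] = {"_type": "Value", "dtype": arrow_dtypes[name]}
    return resolved


# Mirrors the example dataset above: one column stored with the custom NewFeature type.
stored = {"feature": {"_type": "NewFeature"}}
storage = {"feature": "int32"}
print(resolve_features(stored, storage))
```

This only handles top-level columns. A real implementation would also need to recurse into nested features, and could offer an option to raise instead of warning.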
Environment info
datasets version: 3.0.2