Renumics/sliceguard

Error on loading audio data

bagustris opened this issue · 2 comments

The current code shows that sliceguard's from_huggingface function only supports image data:

def from_huggingface(dataset_identifier: str):
    # Simple utility method to support loading of huggingface datasets
    # Currently only supports image data. Use custom load function if you need something else.

However, the example on the following page states that sliceguard now supports audio data:

https://renumics.com/docs/use-cases/audio-classification

Following the example above, I got the RuntimeError below (Audio is not supported):

In [7]: from renumics import spotlight
   ...: from sliceguard import SliceGuard
   ...: from sliceguard.data import from_huggingface
   ...: from sklearn.metrics import accuracy_score
   ...: 
   ...: # Load an Example Dataset as DataFrame
   ...: df = from_huggingface("renumics/emodb")
   ...: 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[7], line 7
      4 from sklearn.metrics import accuracy_score
      6 # Load an Example Dataset as DataFrame
----> 7 df = from_huggingface("renumics/emodb")

File ~/miniconda3/envs/spotlight/lib/python3.9/site-packages/sliceguard/data.py:36, in from_huggingface(dataset_identifier)
     29 for fname, ftype in cur_split.features.items():
     30     if (
     31         not isinstance(ftype, Image)
     32         and not isinstance(ftype, ClassLabel)
     33         and not isinstance(ftype, Value)
     34         and not isinstance(ftype, Sequence)
     35     ):
---> 36         raise RuntimeError(
     37             f"Found unsupported datatype {ftype}. Use custom load function."
     38         )
     39     # Run transformations for specific data types if needed.
     40     if isinstance(ftype, ClassLabel):

RuntimeError: Found unsupported datatype Audio(sampling_rate=None, mono=True, decode=True, id=None). Use custom load function.

PR #58 may solve this issue.

@bagustris Sliceguard does support audio data; it is only the load function that does not yet. If you need a solution immediately, you can simply point to the WAV files in the data frame supplied to the find_issues function. You can base your code on this Example.
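
A minimal sketch of that workaround, assuming the audio files are already on disk as .wav files grouped into one folder per class. The helper names collect_wav_records and run_sliceguard, the folder layout, and the column names "audio" and "label" are hypothetical. The keyword arguments passed to find_issues (features, y, metric) are also an assumption modelled on the project's documented examples, so check them against the installed version.

```python
from pathlib import Path


def collect_wav_records(root):
    """Return one record per .wav file under root.

    Each record holds the file path and, as label, the name of the folder
    that contains the file (hypothetical layout: root/<label>/<file>.wav).
    """
    records = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        records.append({"audio": str(wav_path), "label": wav_path.parent.name})
    return records


def run_sliceguard(records):
    """Build a data frame of file paths and pass it to find_issues.

    Imports are local so that collect_wav_records works without the
    heavier dependencies installed.
    """
    import pandas as pd
    from sklearn.metrics import accuracy_score
    from sliceguard import SliceGuard

    df = pd.DataFrame(records)
    sg = SliceGuard()
    # Assumed call shape: the "audio" column contains paths to WAV files.
    return sg.find_issues(df, features=["audio"], y="label", metric=accuracy_score)
```

Calling run_sliceguard(collect_wav_records("path/to/wav/folders")) would then bypass from_huggingface entirely; the path here is a placeholder.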

And you are right, PR #58 aims to solve this issue. I will approve it as soon as it has passed review.

Let me know if you need any more support with your use case or encounter any more issues!

@bagustris I just merged PR #58 and released v0.0.31.
Hopefully, this solves your issue. If any problems remain, feel free to reopen this issue or comment here. Closing this for now.