Get audio data from external data sources and start iteration

Question


Tpcsy opened this issue 5 months ago · 7 comments

Describe the question.

I'm trying to use DALI to load and process audio data. I need to fetch batches of audio through an external source, but I'm running into some issues. This is my code:

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
from nvidia.dali.plugin.pytorch import DALIGenericIterator
import numpy as np
from os.path import join
import random

class ExternalInputIterator(object):
    def __init__(self, batch_size, all):
        self.all_audios = all
        self.batch_size = batch_size

    def __len__(self):
        return len(self.all_audios)

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        batch = []
        for _ in range(self.batch_size):
            idx = random.randint(0, len(self.all_audios) - 1)
            vidname = self.all_audios[idx]
            wavpath = join(vidname, "audio.wav")
            f = open(wavpath, "rb")
            b = np.frombuffer(f.read(), dtype=np.uint8)
            # print(b)
            batch.append(b)
        return batch

# path: list of clip directories, each containing audio.wav (defined elsewhere)
eii = ExternalInputIterator(batch_size=16, all=path)

@pipeline_def
def spectrogram_pipe(nfft, window_length, window_step, device="cpu"):
    encoder = fn.external_source(eii, num_outputs=1)
    audio, _ = fn.decoders.audio(encoder[0], sample_rate=16000)
    audio = audio.gpu()
    spectrogram = fn.spectrogram(
        audio, device=device, nfft=nfft, window_length=window_length, window_step=window_step, reflect_padding=False
    )
    return spectrogram

pipe = spectrogram_pipe(
    device="gpu",
    batch_size=16,
    num_threads=4,
    device_id=0,
    nfft=800,
    window_length=800,
    window_step=200,
)
train_data = DALIGenericIterator(
    pipe,
    ['mel'],
)
for i, data in enumerate(train_data):
    if i < 1:
        mel = data[0]['mel']
        print(mel.shape)
    else:
        break

When I try to run it, the following error occurs:
RuntimeError: The external source callback returned an unexpected batch size. Expected batch_size <= 16, actual: 475214
It seems DALI treated the first encoded audio array (shape (475214,)) as the whole batch. After making the following change in ExternalInputIterator, the code runs successfully:

f = open(wavpath, "rb")
b = np.frombuffer(f.read(), dtype=np.uint8)
# print(b)
batch.append([b])

But then I ran into a new issue: the iterator output has shape torch.Size([1, 401, 297]), which is wrong. I need a full batch, so it should be torch.Size([16, 401, 297]). The encoded audio files have different lengths, and I'm not sure how to modify the code. I'd appreciate any help.

Check for duplicates

I have searched the open bugs/issues and have found no duplicates for this bug report

Answer 1 · 2024-04-25T13:17:03.000Z

Hi @Tpcsy,

Thank you for reaching out.
Can you try:

    b = np.frombuffer(f.read(), dtype=np.uint8)
    # print(b)
    batch.append(b)
return [batch]

In your pipeline num_outputs is set (to 1), so the source has to return a list with one element per output, and that element is the whole batch. As the documentation states:

Depending on the value of num_outputs, the source can supply one or more data items. The data item can be a whole batch (default) or a single batch entry (when batch==False). If num_outputs is not set, the source is expected to return one item (a batch or a sample). If this value is specified (even if its value is 1), the data is expected to be a tuple, or list, where each element corresponds to the respective return value of the external_source.
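A minimal sketch of the iterator with this change applied, assuming (as in the question) that each entry in the directory list contains an audio.wav file. Besides returning [batch], it iterates over range(self.batch_size), since iterating over the int itself raises a TypeError, and renames the all parameter to the hypothetical all_dirs so the built-in all is not shadowed. The usage part writes dummy bytes (not valid WAV data) into temporary directories only to check the batch structure; decoding is still left to fn.decoders.audio in the pipeline.

```python
import os
import random
import tempfile
from os.path import join

import numpy as np


class ExternalInputIterator:
    """Source for fn.external_source(..., num_outputs=1).

    Because num_outputs is set, every call returns a list with one entry per
    output. The single entry is the batch: a list of batch_size 1-D uint8
    arrays, each holding one encoded file (lengths may differ).
    """

    def __init__(self, batch_size, all_dirs):
        self.all_audios = list(all_dirs)
        self.batch_size = batch_size

    def __len__(self):
        return len(self.all_audios)

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        batch = []
        for _ in range(self.batch_size):
            # Pick a random clip directory and read its WAV file as raw bytes.
            vidname = random.choice(self.all_audios)
            with open(join(vidname, "audio.wav"), "rb") as f:
                batch.append(np.frombuffer(f.read(), dtype=np.uint8))
        # Wrap the batch: element 0 of the returned list feeds output 0.
        return [batch]


# Usage with dummy files (not real audio; only the batch structure matters).
tmp = tempfile.mkdtemp()
dirs = []
for k in range(3):
    d = os.path.join(tmp, f"clip{k}")
    os.makedirs(d)
    with open(os.path.join(d, "audio.wav"), "wb") as f:
        f.write(bytes(range(10 + k)))
    dirs.append(d)

out = next(iter(ExternalInputIterator(batch_size=4, all_dirs=dirs)))
```

Returning the samples as a list of separate NumPy arrays, rather than one stacked array, is what allows the encoded files to have different byte lengths.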

Answer 2 · 2024-04-25T13:17:35.000Z

Thank you very much for your reply, I will try your suggestions.

Answer 3 · 2024-04-26T03:40:11.000Z

Hi @JanuszL
Thank you very much for your reply.
I tried the changes you suggested, and it resulted in the following error:
RuntimeError: [/opt/dali/dali/pipeline/data/tensor_list.cc:1012] Assert on "IsDenseTensor()" failed: The batch must be representable as a tensor - it must have uniform shape and be allocated in contiguous memory.
I'm not sure if this is related to the different sizes of my audio files.

Answer 4 · 2024-04-26T07:38:34.000Z

Hi @Tpcsy,

I'm not sure if this is related to the different sizes of my audio files.

DALI expects (similarly to PyTorch's default batching) all samples in a batch to have the same shape when the batch is returned as a single tensor, which is what DALIGenericIterator does. You can either trim/pad the samples to a common length, or use DALIRaggedIterator, which can return a nonuniform batch as a list of Torch tensors.
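The trim/pad option can be sketched in plain NumPy to show what a dense batch requires: every sample is cut or zero-padded to a common length so the batch stacks into one array of shape (batch_size, length). The helper name pad_or_trim is hypothetical, and it operates on decoded 1-D audio on the host, outside DALI. Inside a pipeline, the corresponding DALI operators would be fn.pad (which pads samples to the largest extent in the batch along the selected axes when no shape is given) or fn.slice for trimming.

```python
import numpy as np


def pad_or_trim(samples, length=None, fill_value=0.0):
    """Stack variable-length 1-D audio arrays into one dense 2-D array.

    If length is None, every sample is padded to the longest one in the
    batch; otherwise longer samples are trimmed and shorter ones padded to
    exactly `length`. Also returns the original (pre-trim) lengths.
    """
    lengths = np.array([len(s) for s in samples], dtype=np.int64)
    target = int(lengths.max()) if length is None else int(length)
    dense = np.full((len(samples), target), fill_value, dtype=np.float32)
    for row, s in enumerate(samples):
        n = min(len(s), target)
        dense[row, :n] = s[:n]
    return dense, lengths


batch = [np.ones(5, np.float32), np.ones(3, np.float32), np.ones(8, np.float32)]
padded, lengths = pad_or_trim(batch)      # pad to the longest sample
fixed, _ = pad_or_trim(batch, length=4)   # trim/pad to a fixed length
```

Keeping the original lengths alongside the padded batch makes it possible to mask out the padded spectrogram frames later, which a fixed trim length avoids at the cost of discarding audio.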

Answer 5 · 2024-04-26T08:44:54.000Z

Thank you very much for your suggestion. I tried DALIRaggedIterator as you mentioned and successfully retrieved a batch of data. Now I want to crop my tensors inside the pipeline. Due to the nature of my task, each audio clip needs a different cropping position, so I need to return string-type data from the external source to guide the cropping of each audio tensor. However, I couldn't find a pipeline operation that accepts and uses string data. Could you advise me on how to proceed? Thanks again for your help.

Answer 6 · 2024-04-26T08:48:14.000Z

I need to return string-type data from an external source to guide the cropping operation on my audio tensors

I'm not sure if I understand your idea correctly.
What you should do instead is return the start and size of each cropping window as numeric outputs of the external source and pass them to the slice operator (fn.slice).
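A minimal sketch of that suggestion, assuming the crop position of each file is already known in decoded samples (here supplied as hypothetical (encoded_bytes, start, size) tuples). Instead of strings, the source returns two extra numeric outputs per batch, so external_source would be called with num_outputs=3. The class CropInputIterator and the helper crop_reference are illustrative names; crop_reference only shows in NumPy which samples a window keeps. The commented pipeline lines are not executed here and assume DALI is installed; note that fn.slice treats anchor and shape as normalized coordinates by default, so absolute sample positions need normalized_anchor=False and normalized_shape=False.

```python
import numpy as np


class CropInputIterator:
    """Hypothetical external source with three outputs per batch:
    encoded audio bytes, crop start and crop size (both in samples)."""

    def __init__(self, batch_size, items):
        # items: list of (encoded_bytes, start, size); in the question's setup
        # these would come from audio.wav files plus a per-clip crop table.
        self.items = list(items)
        self.batch_size = batch_size
        self.i = 0

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        encoded, starts, sizes = [], [], []
        for _ in range(self.batch_size):
            data, start, size = self.items[self.i % len(self.items)]
            self.i += 1
            encoded.append(np.frombuffer(data, dtype=np.uint8))
            # One-element int32 arrays: one crop window per sample (axis 0).
            starts.append(np.array([start], dtype=np.int32))
            sizes.append(np.array([size], dtype=np.int32))
        # One batch per external_source output, in output order.
        return encoded, starts, sizes


def crop_reference(audio, start, size):
    """NumPy equivalent of slicing axis 0 with an absolute start and size."""
    begin = int(start[0])
    return audio[begin:begin + int(size[0])]


# Inside the pipeline (not run here; requires DALI):
# encoded, start, size = fn.external_source(CropInputIterator(16, items), num_outputs=3)
# audio, _ = fn.decoders.audio(encoded, sample_rate=16000)
# audio = fn.slice(audio, start, size, axes=[0],
#                  normalized_anchor=False, normalized_shape=False)

items = [(bytes(100), 10, 20), (bytes(50), 0, 50)]
enc, st, sz = next(iter(CropInputIterator(batch_size=3, items=items)))
audio = np.arange(100, dtype=np.float32)
window = crop_reference(audio, st[0], sz[0])
```

If a window can extend past the end of a clip, fn.slice's out_of_bounds_policy argument controls whether that raises an error, pads, or trims.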

Answer 7 · 2024-04-27T01:01:10.000Z

Thank you very much for your response. I have successfully resolved my issue, so I'll close it now. Thanks again for your advice.