hugofloresgarcia/vampnet

system error with onset mask width slider >0

tig3rmast3r opened this issue · 7 comments

i've managed to get it working on Windows 10, but if i try to generate audio with the onset mask slider higher than 0 i get this error:
soundfile.LibsndfileError: Error opening 'C:\Users\xxxx\AppData\Local\Temp\tmptlcxs7uq.wav': System error.

With the slider at 0 everything works and is amazing. i tested all presets and i only have to move that slider to 0 if it is not there already.

any clue ? thanks

just as a note, don't know if it is related: i had to manually install madmom by cloning it from its git repo (otherwise it gives an error on win10), and i also had to upgrade numpy to version 23 because with 22 i was getting errors on startup.

hmm, I feel like this could either be version problem with madmom or sndfile, since onset detection happens through madmom. what's the full call stack? is this libsndfile error happening inside madmom?

Here's the full stack printed from powershell, looks like "file not found"
Traceback (most recent call last):
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\blocks.py", line 1389, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\utils.py", line 703, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\vampnet\app.py", line 219, in vamp
return _vamp(data, return_mask=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\vampnet\app.py", line 136, in _vamp
mask, pmask.onset_mask(sig, z, interface, width=data[onset_mask_width])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\vampnet\vampnet\mask.py", line 201, in onset_mask
sig.write(f.name)
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\audiotools\core\audio_signal.py", line 602, in write
soundfile.write(str(audio_path), self.audio_data[0].numpy().T, self.sample_rate)
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 343, in write
with SoundFile(file, 'w', samplerate, channels,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'C:\Users\xxxxx\AppData\Local\Temp\tmpcfwf7po5.wav': System error.
INFO:httpx:HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 500 Internal Server Error"
INFO:httpx:HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"

Hmm, I don't have a windows machine to debug on atm, but it looks like it's failing to write the input audio file to a temp directory for onset processing:

File "C:\Users\xxxxx\vampnet\vampnet\mask.py", line 201, in onset_mask
sig.write(f.name)

The way f.name is created is here:

with tempfile.NamedTemporaryFile(suffix='.wav') as f:

this could be it: https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file

looks like we're trying to open the file twice: once when NamedTemporaryFile() is created, and another in sig.write.

This solution from stackoverflow could work, you could give it a try! I'm happy to accept a PR!

import os
import tempfile


class CustomNamedTemporaryFile:
    """
    This custom implementation is needed because of the following limitation of tempfile.NamedTemporaryFile:

    > Whether the name can be used to open the file a second time, while the named temporary file is still open,
    > varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
    """
    def __init__(self, mode='wb', delete=True):
        self._mode = mode
        self._delete = delete

    def __enter__(self):
        # Generate a random temporary file name
        file_name = os.path.join(tempfile.gettempdir(), os.urandom(24).hex())
        # Ensure the file is created
        open(file_name, "x").close()
        # Open the file in the given mode
        self._tempFile = open(file_name, self._mode)
        return self._tempFile

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._tempFile.close()
        if self._delete:
            os.remove(self._tempFile.name)
            

Hi there,
i've just modified this line

with tempfile.NamedTemporaryFile(suffix='.wav') as f:

with
with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
and it works!
it's probably going to grow the temp folder over time, so not that clever a solution, but given my near 0 python knowledge i'm ok with this for now :)
Thanks for the hint!
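
One way to keep the `delete=False` workaround without filling the temp folder is to close the temporary file before anything else opens it, then remove it by hand afterwards. The sketch below uses only the standard library; `closed_temp_path` is a hypothetical helper name, not something that exists in vampnet, and how it would be wired into `onset_mask` is an assumption.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def closed_temp_path(suffix=".wav"):
    """Yield the path of an empty, already-closed temp file and delete it on exit."""
    # delete=False so that closing the handle does not remove the file
    f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    # close our handle right away so another library can open the path on Windows
    f.close()
    try:
        yield f.name
    finally:
        # clean up ourselves; tolerate the file already being gone
        if os.path.exists(f.name):
            os.remove(f.name)
```

Inside `onset_mask`, the idea would be to run `sig.write(path)` and the onset detection within `with closed_temp_path() as path:`. This has not been tested against vampnet itself.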

I have a question, assuming i want to create my own mask, i would like to make that when i click generate instead of creating the mask file it will load a mask.wav file from vampnet\assets folder, would you be so kind to point me where i should act in the code more or less ?
is that even possible ? i mean, the mask is just the input audio with muted parts that will be inpainted or is a more complex operation ?

The mask is not the audio with muted parts (though we can represent the mask as that).

A better way to think of the mask is an array with 1s in the timesteps where we want to generate audio and 0s in the timesteps where we want conditioning. Note that the "width" of these time steps depends on the tokenizer's hop length.

You can get the tokenizer hop_length using interface.codec.hop_length, which will give you the tokenizer's hop size in samples.
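
To make the timestep arithmetic concrete, here is a minimal sketch that marks time ranges given in seconds as 1s in a per-timestep mask. It is plain Python rather than vampnet code: `time_range_mask` and the numbers in the example are hypothetical, and in practice `hop_length` would come from `interface.codec.hop_length` and `n_steps` from `z.shape[-1]`.

```python
import math


def time_range_mask(n_steps, sample_rate, hop_length, regenerate):
    """Return a list of n_steps ints: 1 where audio is generated, 0 where it is kept.

    `regenerate` is a list of (start_seconds, end_seconds) pairs.
    """
    mask = [0] * n_steps
    for start_s, end_s in regenerate:
        # seconds -> samples -> timesteps (one timestep covers hop_length samples)
        first = math.floor(start_s * sample_rate / hop_length)
        last = math.ceil(end_s * sample_rate / hop_length)
        # clamp to the valid range of timesteps
        for i in range(max(first, 0), min(last, n_steps)):
            mask[i] = 1
    return mask


# made-up numbers chosen for easy arithmetic: each timestep covers 0.125 s
example = time_range_mask(
    n_steps=6, sample_rate=1000, hop_length=125, regenerate=[(0.25, 0.5)]
)
print(example)  # [0, 0, 1, 1, 0, 0]
```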

you could try something like checking if most (or all) samples in a given chunk of hop_length samples are equal to 0.

this could be a good starting point, though it's not tested:

import torch
from audiotools import AudioSignal
from vampnet.interface import Interface


def audio_file_mask(
    sig: AudioSignal,
    z: torch.Tensor,
    interface: Interface,
):
    """
    Create a mask from an audio file.

    Muted sections (hops where every sample == 0) become 1s in the mask,
    and non-muted sections become 0s.
    """

    # get the number of samples in a hop
    hop_length = interface.codec.hop_length

    # get the number of timesteps in the z array
    n_steps = z.shape[-1]

    # create a mask, set muted sections to 1
    mask = torch.zeros_like(z)

    for i in range(n_steps):
        # get the start and end indices for the hop
        start = i * hop_length
        end = (i + 1) * hop_length

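        # if all samples in the hop are 0, then we have a muted section
        # checking the first channel only!
        hop = sig.samples[0, 0, start:end]
        # skip empty slices (hops past the end of the audio): torch.all of an
        # empty tensor is True, which would wrongly mark them as muted
        if hop.numel() > 0 and torch.all(hop == 0):
            mask[:, :, i] = 1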

    return mask


if __name__ == "__main__":
    sig = AudioSignal("mask.wav")

    # placeholder: construct a vampnet Interface here before running
    interface: Interface
    sig = interface.preprocess(sig)
    z = interface.encode(sig)

    mask = audio_file_mask(sig, z, interface=interface)