hidet-org/hidet

hidet for WhisperSpeech

mehdi-gital opened this issue · 1 comments

Hi team,
I'm trying to get hidet to optimize the two models loaded on lines 15 and 16 of:
https://github.com/collabora/WhisperSpeech/blob/main/whisperspeech/pipeline.py
(most importantly SADelARTransformer on line 16).
I'm working with a small GPU (an MX150). When I compile the models with hidet as the torch.compile backend, everything runs without errors, but it doesn't improve the run time at all. Is there anything I could be doing to fix this?

I'm testing through https://github.com/collabora/WhisperSpeech/blob/main/Inference%20example.ipynb, and the modified code I'm using is below.

Thanks

import time

import torch

# TSARTransformer, SADelARTransformer and Vocoder are the WhisperSpeech model
# classes, imported as in whisperspeech/pipeline.py.

class Pipeline:

    def __init__(self):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

        use_hidet = True
        if use_hidet:
            print('with hidet')
            print(device)
            # compile both models with hidet as the torch.compile backend
            self.t2s = torch.compile(TSARTransformer.load_model().to(device), backend='hidet')
            self.s2a = torch.compile(SADelARTransformer.load_model().to(device), backend='hidet')
        else:
            print('without hidet')
            self.t2s = TSARTransformer.load_model().to(device)
            self.s2a = SADelARTransformer.load_model().to(device)
        self.vocoder = Vocoder()

    def generate_atoks(self, text, speaker="8699"):
        text = text.replace("\n", " ")

        # time the text-to-semantic-tokens stage
        start = time.time()
        stoks = self.t2s.generate(text, cps=14)
        end = time.time()
        print('t2s', end - start)

        # time the semantic-to-acoustic-tokens stage
        start = time.time()
        atoks = self.s2a.generate(stoks, [speaker])
        end = time.time()
        print('s2a', end - start)

        return atoks

    def generate(self, text, speaker="8699"):
        return self.vocoder.decode(self.generate_atoks(text, speaker))

    def generate_to_file(self, fname, text, speaker="8699"):
        self.vocoder.decode_to_file(fname, self.generate_atoks(text, speaker))

    def generate_to_notebook(self, text, speaker="8699"):
        start = time.time()
        atokz = self.generate_atoks(text, speaker)
        end = time.time()
        print('generate_atoks(text, speaker) time', end - start)

        start = time.time()
        self.vocoder.decode_to_notebook(atokz)
        end = time.time()
        print('vocoder.decode_to_notebook(atokz)', end - start)
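
One caveat with these timings: CUDA kernels launch asynchronously, so time.time() can return before the GPU work has actually finished. Below is a minimal sketch of a synchronized timer (the timed helper is hypothetical, just for illustration):

import time
import torch

def timed(label, fn, *args, **kwargs):
    torch.cuda.synchronize()  # flush pending GPU work before starting the clock
    start = time.time()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()  # wait for all launched kernels to finish
    print(label, time.time() - start)
    return result

# usage: stoks = timed('t2s', self.t2s.generate, text, cps=14)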

Hi @mehdi-gital,

Thanks for trying out hidet, and for the script to reproduce the issue.

We are currently a little short of hands, so your help in investigating this performance problem would be much appreciated!

Some suggestions on how to find the bottleneck:

  1. Use Nsight Systems (NVIDIA's GPU profiler) to profile the model execution, for both the hidet backend and the default eager mode. See the Nsight Systems documentation for how to use the GUI. You can also use NVTX ranges to annotate the model execution (see the sketch after this list).
  2. The profiling trace gives you the execution timeline, which shows which kernels were launched and their execution times.
  3. Compare hidet's kernels with PyTorch's and figure out which operator makes hidet slower.
  4. Optimize that operator in hidet. If you share the information with us, we are happy to optimize the performance when we have more hands.
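
For step 1, here is a minimal sketch of NVTX annotation so the model stages show up as named ranges in the Nsight Systems timeline (the function and range names are placeholders):

import torch

def generate_annotated(pipeline, text, speaker="8699"):
    # each push/pop pair appears as a named range in the Nsight Systems timeline
    torch.cuda.nvtx.range_push("t2s.generate")
    stoks = pipeline.t2s.generate(text, cps=14)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("s2a.generate")
    atoks = pipeline.s2a.generate(stoks, [speaker])
    torch.cuda.nvtx.range_pop()
    return atoks

Then profile the script with something like `nsys profile -o trace python your_script.py` and open the resulting report in the Nsight Systems GUI.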

Let me know if you find any step confusing or hard to do, and I am happy to help.
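
One knob that might help in the meantime: by default hidet compiles kernels quickly without much schedule tuning, and you can ask it to search a larger schedule space. A minimal sketch based on hidet's documentation (longer compile time, but usually faster kernels):

import hidet
import torch

# search space 0 (the default) compiles fastest; 2 searches the largest
# space of kernel schedules and usually produces the fastest kernels
hidet.torch.dynamo_config.search_space(2)

model_opt = torch.compile(model, backend='hidet')

Note that the first run will take noticeably longer while the schedules are tuned.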