yl4579/StyleTTS2

Licensing issue

fakerybakery opened this issue · 10 comments

Hi,
This package uses the phonemizer library, which is GPL-licensed (it depends on espeak-ng, originally written by Jonathan Duddington, and nothing has been heard from him in years). That means all software that uses it must also be GPL-licensed. Might it be possible to switch to an alternative library (preferably deep_phonemizer or g2p_en)? Thanks!

Only modifications to the phonemizer library itself would have to be released under the GPL, and even then only the modified library, not the entire project, would be GPL-licensed.
Simply using it is no problem.

Hi, unfortunately the GPL requires all software that links to it to also be licensed under the GPL. It would be amazing to keep this software MIT-licensed, which is why I suggest switching to deep_phonemizer.

Wow, what a nightmare, I forgot about that change.
Well, it's not enforceable: you can just build a slim wrapper with a "very similar" API and make it interchangeable.
There are ways to call DLL functions from the Windows command line (the same goes for Linux). Does that suddenly make the console, and any tool that can call arbitrary DLL functions, a GPL violation?
It's not enforceable. Though I agree that, given that nasty "idea", the GPL should be avoided when possible.

yl4579 commented

I took it from VITS, which is also MIT and uses phonemizer. I didn't know it was actually GPL. How does VITS stay MIT while using phonemizer then?

Here is an example from a multi-billion-USD company: https://github.com/mozilla/TTS
They also use phonemizer, but release their code under the Mozilla Public License.

It's unpleasant to have such legal quirks lingering, but if it ever becomes an issue, I'm quite sure there are many ways to work around it.

  • For example, a wrapper library with exactly the same interface that people include themselves

Of course, if that could be avoided without much work, even better.

I have searched the whole repository: phonemizer is only used in Demo/*.ipynb and Colab/*.ipynb. The .ipynb files are documentation, not code, and the core source code of StyleTTS2 does NOT use any phonemizer.
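To repeat this check on a local checkout, a small script can list every `.py` file and notebook code cell that imports phonemizer. This is a hypothetical helper (`find_phonemizer_imports` is an illustrative name) and it only finds literal `import`/`from` statements; dynamic imports or dependencies pulled in by other packages would not show up.

```python
import json
import re
from pathlib import Path
from typing import Dict, List

# Matches lines like "import phonemizer" or "from phonemizer.backend import ...".
IMPORT_RE = re.compile(r"^[ \t]*(?:import|from)[ \t]+phonemizer\b.*$", re.MULTILINE)


def _source_of(path: Path) -> str:
    """Return the text of a .py file, or only the code cells of a notebook."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".ipynb":
        cells = json.loads(text).get("cells", [])
        return "\n".join(
            "".join(cell.get("source", []))  # source may be a list of lines
            for cell in cells
            if cell.get("cell_type") == "code"
        )
    return text


def find_phonemizer_imports(root: str) -> Dict[str, List[str]]:
    """Map each .py/.ipynb file under root to its phonemizer import lines."""
    base = Path(root)
    hits: Dict[str, List[str]] = {}
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in {".py", ".ipynb"}:
            continue
        lines = [m.group(0).strip() for m in IMPORT_RE.finditer(_source_of(path))]
        if lines:
            hits[path.relative_to(base).as_posix()] = lines
    return hits
```

Usage: `print(find_phonemizer_imports("StyleTTS2"))` from the directory containing the clone.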

For GPL software, we can use it as a standalone binary (like gcc) instead of linking the library or importing it. Someone can call the espeak-ng binary to get G2P functionality, or use speechbrain/soundchoice-g2p or vinai/xphonebert-base.

The conclusion is that StyleTTS2 using the MIT license is totally OK.

yl4579 commented

@clcarwin Thanks for your help, I can mark this as solved now.

Yes, thanks for the clarification!!

In case anyone's interested, here's an MIT-licensed Python package of StyleTTS2 (inference only) that uses Gruut as the phoneme converter. It's still not as good as phonemizers built on espeak, but I found it was the best MIT-licensed alternative.

@fakerybakery @clcarwin @cmp-nct @yl4579

Yes, it's true that phonemizer is only imported directly in the demo notebooks (Demo/*.ipynb and Colab/*.ipynb). However, anyone using this code in a practical setting is likely to create .py files for all the preprocessing steps, including converting text to phonemes. In that scenario, phonemizer would inevitably be imported from those .py files, which could bring GPL-related concerns into the project.

Regarding the second point, can I use a separate subprocess to call espeak-ng instead? Here's an example implementation I'm considering:

import subprocess
import sys

def text_to_phonemes(text: str) -> str:
    try:
        # Call the espeak-ng binary: -q suppresses audio output, -x writes phoneme
        # mnemonics (including stress marks) to stdout; --ipa would print IPA instead.
        # --punct is omitted: it makes espeak-ng speak the *names* of punctuation marks.
        result = subprocess.run(
            ['espeak-ng', '-q', '-x', text],
            capture_output=True, text=True, check=True
        )
        # Collapse the extra spaces and newlines in the output
        phonemes = " ".join(result.stdout.split())
        return phonemes
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        # FileNotFoundError means espeak-ng is not installed or not on PATH
        print(f"Error converting text to phonemes: {e}", file=sys.stderr)
        return ""

# Example usage
text = "Hello, how are you?"
phonemes = text_to_phonemes(text)
print(f"Phonemes: {phonemes}")
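As far as I can tell, espeak-ng's phoneme output does not contain the punctuation marks themselves, whereas phonemizer's `preserve_punctuation=True` option keeps them. A minimal sketch of the same idea on top of the subprocess approach: split the text around punctuation, phonemize each chunk with the binary, and reinsert the marks. The punctuation set and the function names `espeak_g2p` and `phonemize_keep_punct` are assumptions for illustration; `--ipa` is used so the output is IPA rather than espeak's ASCII mnemonics.

```python
import re
import subprocess
from typing import Callable

# Punctuation marks to keep (an assumed set; adjust as needed).
PUNCT_RE = re.compile(r"([.,;:!?])")


def espeak_g2p(chunk: str) -> str:
    """Phonemize one punctuation-free chunk with the espeak-ng binary (IPA output)."""
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", chunk],
        capture_output=True, text=True, check=True,
    )
    # Collapse the line breaks and repeated spaces espeak-ng may emit.
    return " ".join(result.stdout.split())


def phonemize_keep_punct(text: str, g2p: Callable[[str], str] = espeak_g2p) -> str:
    """Phonemize each chunk between punctuation marks and reinsert the marks."""
    out = ""
    for token in PUNCT_RE.split(text):
        token = token.strip()
        if not token:
            continue  # skip empty strings left over by re.split
        if PUNCT_RE.fullmatch(token):
            out += token  # attach the mark to the preceding phonemes
        else:
            out += (" " if out else "") + g2p(token)
    return out
```

Usage: `print(phonemize_keep_punct("Hello, how are you?"))` with espeak-ng on PATH. Passing `g2p` as a parameter keeps the GPL binary swappable for any other `str -> str` converter; note that this spawns one process per chunk, which may be slow for long texts.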

Would this approach be acceptable for avoiding GPL issues, as it uses espeak-ng as a standalone binary via subprocess? My concern is ensuring this solution aligns with license compliance while still providing the necessary functionality for phoneme conversion.

Any insights or suggestions would be greatly appreciated. Thanks in advance for your help!