Phonemize Japanese Language
MikuAuahDark opened this issue · 2 comments
Describe the bug
Phonemizer fails to phonemize Japanese text with the espeak backend.
Phonemizer version
phonemizer-3.2.1
available backends: espeak-ng-1.52, segments-2.2.1
uninstalled backends: espeak-mbrola, festival
System
Windows 11 22H2 patch 1265
To reproduce
import phonemizer
print(phonemizer.phonemize("ほたる", language="ja", backend="espeak"))
The message "Could not load the mbrola.dll file." is printed on the console, followed by RuntimeError: failed to load voice "ja"
Expected behavior
Runs without problems.
Additional context
Running espeak-ng directly from the command line works:
C:\Users\MikuAuahDark>espeak-ng -q -x --ipa -v ja "ほたる"
ho̞tˈäɽɯᵝ
Additional context
This LuaJIT script also returns the same output as the espeak-ng command line, so maybe phonemizer does something unusual during initialization?
local ffi = require("ffi")
local espeak = ffi.load("C:/Program Files/eSpeak NG/libespeak-ng.dll")
ffi.cdef[[
int espeak_Initialize(int output, int buflength, const char *path, int options);
const char *espeak_TextToPhonemes(const void **textptr, int textmode, int phonememode);
int espeak_SetVoiceByName(const char *name);
]]
local text = "ほたる"
-- output 2 = AUDIO_OUTPUT_SYNCHRONOUS
-- options 3 = bit 0 (allow espeakEVENT_PHONEME events) + bit 1 (those events give IPA phoneme names)
print("espeak_Initialize", espeak.espeak_Initialize(2, 0, nil, 3))
print("espeak_SetVoiceByName", espeak.espeak_SetVoiceByName("ja"))
local temp = ffi.new("const char*[1]")
temp[0] = text
while temp[0] ~= nil do
    -- textmode 1 = UTF-8 input; phonememode 2 (bit 1 set) = IPA output
    -- espeak_TextToPhonemes advances temp[0] past each clause and sets it to NULL at the end
    local result = espeak.espeak_TextToPhonemes(ffi.cast("const void**", temp), 1, 2)
    if result == nil then
        print("espeak_TextToPhonemes failed")
        break -- avoid calling ffi.string on a NULL pointer
    end
    print("espeak_TextToPhonemes", ffi.string(result))
end
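Since phonemizer itself is written in Python and talks to libespeak-ng through ctypes, a rough ctypes translation of the Lua script may make it easier to compare the two code paths. This is a sketch, not phonemizer's code: load_espeak and text_to_ipa are hypothetical names, the constants are the values the Lua script passes (as defined in eSpeak NG's speak_lib.h), and joining clauses with a space is an arbitrary choice.

```python
import ctypes

# Values the Lua script passes, as defined in eSpeak NG's speak_lib.h.
AUDIO_OUTPUT_SYNCHRONOUS = 2  # espeak_Initialize "output" argument
INIT_OPTIONS = 0x3            # bit 0: phoneme events, bit 1: IPA names in those events
ESPEAKCHARS_UTF8 = 1          # espeak_TextToPhonemes textmode: input is UTF-8
PHONEMES_IPA = 0x2            # espeak_TextToPhonemes phonememode bit 1: output IPA


def load_espeak(path):
    """Load libespeak-ng and declare the three functions used by the Lua script."""
    lib = ctypes.CDLL(path)
    lib.espeak_Initialize.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
    lib.espeak_Initialize.restype = ctypes.c_int
    lib.espeak_SetVoiceByName.argtypes = [ctypes.c_char_p]
    lib.espeak_SetVoiceByName.restype = ctypes.c_int
    lib.espeak_TextToPhonemes.argtypes = [
        ctypes.POINTER(ctypes.c_char_p), ctypes.c_int, ctypes.c_int]
    lib.espeak_TextToPhonemes.restype = ctypes.c_char_p  # NULL becomes None
    return lib


def text_to_ipa(lib, text, voice="ja"):
    """Phonemize `text` clause by clause, mirroring the Lua loop."""
    if lib.espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 0, None, INIT_OPTIONS) < 0:
        raise RuntimeError("espeak_Initialize failed")
    if lib.espeak_SetVoiceByName(voice.encode("utf-8")) != 0:  # 0 is EE_OK
        raise RuntimeError(f'failed to load voice "{voice}"')
    encoded = text.encode("utf-8")  # keep the buffer alive while eSpeak reads it
    text_ptr = ctypes.c_char_p(encoded)
    clauses = []
    # espeak_TextToPhonemes advances text_ptr past each clause and sets it to NULL at the end.
    while text_ptr.value is not None:
        result = lib.espeak_TextToPhonemes(ctypes.byref(text_ptr), ESPEAKCHARS_UTF8, PHONEMES_IPA)
        if result is None:
            break
        clauses.append(result.decode("utf-8"))
    return " ".join(clauses)
```

With the DLL path from the Lua script, usage would be text_to_ipa(load_espeak("C:/Program Files/eSpeak NG/libespeak-ng.dll"), "ほたる"). Calling espeak_SetVoiceByName("ja") directly, as the Lua script does, lets eSpeak pick the voice, which is one difference from how phonemizer resolves a language code.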
Alright, found the issue.
I had to add this check
if voice.identifier.startswith('mb'):
    continue
before each voice is inserted into the list of available languages in phonemizer/phonemizer/backend/espeak/wrapper.py (lines 239 to 241 at commit fd39cdc).
In my eSpeak NG installation, the MBROLA voices are listed first. Using LuaJIT, I printed the list of voices in the order espeak_ListVoices returns them:
...
mb\mb-it2 italian-mbrola-2 it
mb\mb-jp1 japanese-mbrola-1 ja
mb\mb-jp2 japanese-mbrola-2 ja
mb\mb-jp3 japanese-mbrola-3 ja
jpx\ja Japanese ja
art\jbo Lojban jbo
...
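A small Python sketch makes the effect of this ordering concrete. It assumes, as the error suggests, that a language code is resolved to the first listed voice whose language matches; with the list above, "ja" then resolves to mb\mb-jp1, an MBROLA voice, which would explain the attempt to load mbrola.dll. Voice and pick_voice are hypothetical stand-ins, not phonemizer's API, and is_mbrola is a slightly stricter variant of the startswith('mb') check above that matches only the "mb" voice directory.

```python
from typing import NamedTuple, Optional, Sequence


class Voice(NamedTuple):
    # Hypothetical stand-in for one entry returned by espeak_ListVoices.
    identifier: str
    name: str
    language: str


# The excerpt of the voice list shown above (Windows path separators).
VOICES = [
    Voice(r"mb\mb-it2", "italian-mbrola-2", "it"),
    Voice(r"mb\mb-jp1", "japanese-mbrola-1", "ja"),
    Voice(r"mb\mb-jp2", "japanese-mbrola-2", "ja"),
    Voice(r"mb\mb-jp3", "japanese-mbrola-3", "ja"),
    Voice(r"jpx\ja", "Japanese", "ja"),
    Voice(r"art\jbo", "Lojban", "jbo"),
]


def is_mbrola(voice: Voice) -> bool:
    # MBROLA voices live in the "mb" voice directory: "mb\..." on Windows, "mb/..." elsewhere.
    return voice.identifier.startswith(("mb\\", "mb/"))


def pick_voice(voices: Sequence[Voice], language: str,
               skip_mbrola: bool = True) -> Optional[Voice]:
    """Return the first voice for `language`, optionally skipping MBROLA voices."""
    for voice in voices:
        if voice.language != language:
            continue
        if skip_mbrola and is_mbrola(voice):
            continue
        return voice
    return None


print(pick_voice(VOICES, "ja", skip_mbrola=False).identifier)  # prints mb\mb-jp1
print(pick_voice(VOICES, "ja").identifier)                      # prints jpx\ja
```

Skipping MBROLA voices means a language that only has MBROLA voices (like "it" in this excerpt) resolves to nothing, so the filter trades MBROLA support for not picking a voice whose mbrola.dll may be missing.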