Kyubyong/g2p

For I'm, it's, I'll, you're, I've, I'd

begeekmyfriend opened this issue · 5 comments

>>> g2p('It\'s')
['IH1', 'T', ' ', 'EH1', 'S'] # Should be ['IH1', 'T', ' ', 'S']
>>> g2p('I\'m')
['AY1', ' ', 'AH0', 'M'] # Should be ['AY1', ' ', 'M']
's -> S or Z
'll -> L
've -> V
'd -> D
're -> R
't -> T
'm -> M
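
The suffix list above could be applied as a small post-processing step. The following is only a minimal sketch, not g2p's actual code: `CONTRACTION_PHONEMES`, `VOICELESS`, `suffix_phonemes` and `apply_contraction` are hypothetical names, and choosing S after a voiceless consonant and Z otherwise is an assumption based on ordinary English pronunciation.

```python
# Hypothetical lookup built from the suffix list above; not part of g2p.
CONTRACTION_PHONEMES = {
    "'ll": ["L"],
    "'ve": ["V"],
    "'d": ["D"],
    "'re": ["R"],
    "'t": ["T"],
    "'m": ["M"],
}

# Assumption: 's is pronounced S after these voiceless consonants, Z otherwise.
VOICELESS = {"P", "T", "K", "F", "TH"}


def suffix_phonemes(suffix, prev_phoneme=None):
    """Return the phonemes for a contraction suffix such as 's or 'll."""
    key = suffix.lower()
    if key == "'s":
        return ["S"] if prev_phoneme in VOICELESS else ["Z"]
    return list(CONTRACTION_PHONEMES[key])


def apply_contraction(base_phonemes, suffix):
    """Append the suffix phonemes to the phonemes of the base word."""
    prev = base_phonemes[-1] if base_phonemes else None
    return base_phonemes + suffix_phonemes(suffix, prev)


print(apply_contraction(["IH1", "T"], "'s"))  # phonemes of "It" plus 's
print(apply_contraction(["AY1"], "'m"))       # phonemes of "I" plus 'm
```

This sketch attaches the suffix directly to the base word, so it does not emit the `' '` separator shown in the expected outputs above.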

Sorry, I got something wrong. I hope it did not bother you too much.

But wait, there are still problems:

It wasn't a joke, said Severson,
IH1T WAA1ZEH1NTAY1 AH0 JHOW1K , SEH1D SEH1VER0SAH0N ,
They say/ 'yin yang'%.
DHEY1 SEY1 YIH1N YAE1NG .
I'm a man.
AY1AH0M AH0 MAE1N .
But hey%, thanks for bein/' in my corner%.
BAH1T HHEY1 , THAE1NGKS FAO1R BIY1N IH0N MAY1 KAO1RNER0 .
You'll get it.
YUW1EH1L GEH1T IH1T .
I'd like to write to you.
AY1DIY1 LAY1K TUW1 RAY1T TUW1 YUW1 .
It's OK.
IH1TEH1S OW1KEY1 .
I've got it.
AY1VIY1 GAA1T IH1T .

In the output above, wasn't, It's, I've and I'd are still wrong.

You're right. I've fixed it by changing the word tokenizer from nltk.word_tokenize to TweetTokenizer. Please try again. Thanks!

I'm glad to see it works correctly now. Sorry for my late response. That's very kind of you!

Hi, another small problem. The new TweetTokenizer does not distinguish sentence punctuation from the periods inside an abbreviation, as shown below. The original tokenizer seemed to handle this case.

>>> from g2p_en import G2p
>>> g2p = G2p()
>>> ''.join(g2p('8 p.m.'))
'EY1T PIY1 . EH1M .'
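
One possible direction, given only as a rough sketch, is a tokenizer pattern that keeps both contractions and dotted abbreviations as single tokens. `TOKEN_RE` and `tokenize` are hypothetical names, the pattern is an assumption about what the desired tokens are, and it is not the tokenizer that g2p or NLTK uses.

```python
import re

# Hypothetical pattern; alternatives are tried left to right at each position.
TOKEN_RE = re.compile(
    r"""
    (?:[A-Za-z]\.){2,}          # dotted abbreviations such as p.m. or U.S.
  | [A-Za-z]+(?:'[A-Za-z]+)*    # words, keeping internal apostrophes (I'm, wasn't)
  | \d+(?:\.\d+)?               # integers and decimals
  | [^\w\s]                     # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens, keeping contractions and abbreviations whole."""
    return TOKEN_RE.findall(text)


print(tokenize("8 p.m."))
print(tokenize("It's OK."))
```

A known limitation of this sketch: when a sentence ends with an abbreviation such as "p.m.", the final period is absorbed into the abbreviation token, so no separate sentence-final period is produced.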