For I'm, it's, I'll, you're, I've, I'd

Question

begeekmyfriend opened this issue 5 years ago · 5 comments

>>> g2p('It\'s')
['IH1', 'T', ' ', 'EH1', 'S'] # Should be ['IH1', 'T', ' ', 'S']
>>> g2p('I\'m')
['AY1', ' ', 'AH0', 'M'] # Should be ['AY1', ' ', 'M']

Suffix    Phoneme(s)
's        S, Z
'll       L
've       V
'd        D
're       R
't        T
'm        M
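
A minimal sketch of how the suffix list above could be applied, assuming the base word has already been converted to ARPAbet phonemes. The names `SUFFIX_PHONEMES`, `VOICELESS`, `suffix_phoneme` and `attach` are hypothetical and not part of g2p_en. The choice between S and Z for 's follows the usual voicing rule (S after a voiceless consonant, Z otherwise) and ignores the sibilant case.

```python
# Hypothetical lookup table built from the suffix list in this issue.
SUFFIX_PHONEMES = {
    "'s": ("S", "Z"),
    "'ll": ("L",),
    "'ve": ("V",),
    "'d": ("D",),
    "'re": ("R",),
    "'t": ("T",),
    "'m": ("M",),
}

# Assumption: voiceless ARPAbet consonants after which 's is realized as S.
VOICELESS = {"P", "T", "K", "F", "TH"}


def suffix_phoneme(base_phonemes, suffix):
    """Pick the phoneme for a contraction suffix given the base word's phonemes."""
    candidates = SUFFIX_PHONEMES[suffix.lower()]
    if len(candidates) == 1:
        return candidates[0]
    # Only 's has two candidates: strip the stress digit and check voicing.
    last = base_phonemes[-1].rstrip("012")
    return "S" if last in VOICELESS else "Z"


def attach(base_phonemes, suffix):
    """Return the base phonemes with the suffix phoneme appended."""
    return list(base_phonemes) + [suffix_phoneme(base_phonemes, suffix)]


print(attach(["IH1", "T"], "'s"))   # ['IH1', 'T', 'S']
print(attach(["AY1"], "'m"))        # ['AY1', 'M']
print(attach(["HH", "IY1"], "'s"))  # ['HH', 'IY1', 'Z']
```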

Answer 1 · 2019-11-28T07:26:04.000Z

Sorry, I got something wrong. I hope it did not bother you too much...

Answer 2 · 2019-11-28T07:40:33.000Z

But wait, there are still problems in it.

It wasn't a joke, said Severson,
IH1T WAA1ZEH1NTAY1 AH0 JHOW1K , SEH1D SEH1VER0SAH0N ,
They say/ 'yin yang'%.
DHEY1 SEY1 YIH1N YAE1NG .
I'm a man.
AY1AH0M AH0 MAE1N .
But hey%, thanks for bein/' in my corner%.
BAH1T HHEY1 , THAE1NGKS FAO1R BIY1N IH0N MAY1 KAO1RNER0 .
You'll get it.
YUW1EH1L GEH1T IH1T .
I'd like to write to you.
AY1DIY1 LAY1K TUW1 RAY1T TUW1 YUW1 .
It's OK.
IH1TEH1S OW1KEY1 .
I've got it.
AY1VIY1 GAA1T IH1T .

In the output above, wasn't, It's, I've and I'd are still wrong...

Answer 3 · 2019-12-31T01:18:56.000Z

You're right. I've corrected it by changing the word tokenizer from nltk.word_tokenize to TweetTokenizer. Please try again. Thanks!
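
To illustrate why the tokenizer choice matters here, the following is a stdlib-only sketch that contrasts two tokenization styles with plain regular expressions. It does not call NLTK and does not reproduce the exact behavior of nltk.word_tokenize or TweetTokenizer; the function names `split_at_apostrophe` and `keep_contractions` are hypothetical. The assumption is that when a suffix such as 's becomes its own token, it gets pronounced in isolation, which would match the wrong outputs reported earlier in this thread.

```python
import re


def split_at_apostrophe(text):
    """Split contractions into a word and a separate suffix token."""
    return re.findall(r"'\w+|\w+|[^\w\s]", text)


def keep_contractions(text):
    """Keep a word and its apostrophe suffix together as one token."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


print(split_at_apostrophe("It's OK."))  # ['It', "'s", 'OK', '.']
print(keep_contractions("It's OK."))    # ["It's", 'OK', '.']
```

With the second style, a contraction reaches the pronunciation lookup as a single word, so it can be matched as a whole instead of piece by piece.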

Answer 4 · 2020-01-03T08:21:14.000Z

I'm glad to see it works now. Sorry for my late response, and thank you for your kindness!

Answer 5 · 2020-01-03T09:03:30.000Z

Hi, another small problem. The new TweetTokenizer cannot distinguish sentence punctuation from the periods in an abbreviation, as shown below. The original tokenizer seemed to handle this case well.

>>> from g2p_en import G2p
>>> g2p = G2p()
>>> ''.join(g2p('8 p.m.'))
'EY1T PIY1 . EH1M .'
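
One possible direction, shown only as a stdlib sketch: a token pattern that tries dotted abbreviations such as "p.m." before falling back to words and punctuation. The pattern and the name `tokenize` are hypothetical and are not taken from g2p_en or NLTK. Note that a sentence-final period directly after such an abbreviation is absorbed into the abbreviation token.

```python
import re

# Order matters: dotted abbreviations first, then words (with optional
# apostrophe suffixes), then any single punctuation character.
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Tokenize text, keeping contractions and dotted abbreviations whole."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("8 p.m."))        # ['8', 'p.m.']
print(tokenize("I've got it."))  # ["I've", 'got', 'it', '.']
```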