Query: Singlish POS Tagger model

Question

lancetansg opened this issue 6 years ago · 5 comments

lancetansg commented 6 years ago

Hi @wanghm92

I would like to understand how singlish_posTagger.model is used to train the dependency parser.
Also, where can I obtain more of the Singlish dataset?

Answer 1 · 2019-03-15T05:57:52.000Z

1: Please go to https://github.com/jiesutd/NNHetSeq and use singlish_posTagger.model for the POS tagger model

2: For the Singlish treebank, please go to this branch: https://github.com/wanghm92/Sing_Par/tree/ud_tf0.12/Singlish/treebank

Answer 2 · 2019-03-15T06:23:13.000Z

Hi @wanghm92

If I want to use singlish_posTagger.model to tag an input sentence, how should I go about that? I would appreciate your help with this.
Understood. Just curious, where did you get the data from? Was it manually labeled?

Answer 3 · 2019-03-15T06:42:28.000Z

You may refer to https://github.com/jiesutd/NNHetSeq/blob/master/example/run_stack.sh as an example of running a tagger.

First you need to convert your data into something similar to this https://github.com/jiesutd/NNHetSeq/blob/master/example/pd/pd.dev.nn.sample

Basically, the format is one word per line, followed by trailing characters as in the sample, with sentences separated by empty lines.
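As a rough illustration of that layout, here is a minimal Python sketch that writes tokenized sentences one word per line with a blank line between sentences. The function name `to_one_word_per_line`, the `placeholder` column, and the example sentences are all hypothetical; the exact trailing columns the tagger expects are not spelled out here, so copy them from the linked pd.dev.nn.sample file rather than from this sketch.

```python
from typing import Iterable, List


def to_one_word_per_line(sentences: Iterable[List[str]], placeholder: str = "O") -> str:
    """Render tokenized sentences as one token per line, blank line between sentences.

    `placeholder` is an ASSUMPTION standing in for whatever trailing columns
    the tagger's input format really needs; check the sample file.
    """
    blocks = []
    for tokens in sentences:
        # Skip empty tokens so they cannot be mistaken for sentence breaks.
        lines = [f"{tok} {placeholder}" for tok in tokens if tok]
        if lines:
            blocks.append("\n".join(lines))
    # Sentences are separated by exactly one empty line.
    return "\n\n".join(blocks) + "\n"


# Made-up, pre-tokenized example sentences (not taken from the treebank).
sample = [["Wah", "so", "expensive", "leh"], ["Can", "or", "not", "?"]]
text = to_one_word_per_line(sample)
print(text)
```

The sketch assumes the input is already tokenized; tokenization itself is outside its scope.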

The tagger model is built on https://github.com/SUTDNLP/LibN3L, which requires this input format, as exemplified by https://github.com/SUTDNLP/NNNamedEntity

It can be a bit complicated to figure out how to use the legacy code bases.

An alternative is to re-implement the base POS tagger on a modern platform such as TensorFlow, PyTorch, or Keras. The network structure is simple and relatively clearly stated in the paper.

Or, you may not even need the tagger, since the treebank with automatically predicted POS tags is provided for reproducibility.
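If you go the treebank route, a small reader like the following may be enough to pull out the word and POS pairs. This is only a sketch under an assumption: that the treebank files are tab-separated CoNLL-style, with the word form in the second column and the POS tag in the fourth. Please check the actual files in the branch and adjust `form_col` and `pos_col` (both hypothetical parameter names) if the layout differs. The demo rows are made up and not taken from the treebank.

```python
from typing import Iterable, Iterator, List, Tuple


def read_tagged_sentences(
    lines: Iterable[str], form_col: int = 1, pos_col: int = 3
) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as lists of (word, pos) pairs from CoNLL-style lines.

    ASSUMPTION: tab-separated columns, blank line between sentences,
    optional comment lines starting with '#'.
    """
    sentence: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) <= max(form_col, pos_col):
            continue  # skip malformed rows instead of guessing
        sentence.append((cols[form_col], cols[pos_col]))
    if sentence:
        yield sentence


# Made-up rows for illustration only.
demo = (
    "1\tCan\tcan\tAUX\t_\t_\t0\troot\t_\t_\n"
    "2\tlah\tlah\tPART\t_\t_\t1\tdiscourse\t_\t_\n"
    "\n"
    "1\tOK\tok\tINTJ\t_\t_\t0\troot\t_\t_\n"
)
parsed = list(read_tagged_sentences(demo.splitlines()))
print(parsed)
```

For a real file, pass an open file handle instead of `demo.splitlines()`.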

Answer 4 · 2019-03-15T06:43:22.000Z

As described in the paper, the raw sentences were crawled from local forums. The treebank was manually labeled.

Answer 5 · 2019-03-15T06:49:20.000Z

Thanks for the clarification