wanghm92/Sing_Par

Query: Singlish POS Tagger model

lancetansg opened this issue · 5 comments

Hi @wanghm92

  1. I would like to understand how singlish_posTagger.model is used to train the dependency parser.

  2. Where can I obtain more of the Singlish dataset?

1: Please go to https://github.com/jiesutd/NNHetSeq and use singlish_posTagger.model as the POS tagger model.

2: Please go to this branch https://github.com/wanghm92/Sing_Par/tree/ud_tf0.12/Singlish/treebank

Hi @wanghm92

  1. If I want to use singlish_posTagger.model to tag an input sentence, how should I go about that? I'd appreciate your help with that.

  2. Understood. Just curious: where did you get the data from? Was it manually labeled?

You may refer to https://github.com/jiesutd/NNHetSeq/blob/master/example/run_stack.sh as an example of running a tagger.

First, you need to convert your data into a format similar to this: https://github.com/jiesutd/NNHetSeq/blob/master/example/pd/pd.dev.nn.sample

Basically, the format is one word per line (with trailing characters), and sentences are separated by empty lines.

The tagger model is built on https://github.com/SUTDNLP/LibN3L, which requires inputs in this format, as exemplified by https://github.com/SUTDNLP/NNNamedEntity
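A minimal sketch of writing and reading back this one-word-per-line layout in Python, assuming your sentences are already tokenized. The `extra` hook is hypothetical: the exact trailing fields the tagger expects are not spelled out here and should be copied from pd.dev.nn.sample, which this sketch does not reproduce.

```python
from typing import Callable, Iterable, List, Optional


def to_one_word_per_line(
    sentences: Iterable[List[str]],
    extra: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Render tokenized sentences as one word per line, blank line between sentences.

    `extra` is a hypothetical hook returning the trailing fields for a word;
    check the real field layout against pd.dev.nn.sample before using it.
    """
    blocks = []
    for tokens in sentences:
        if not tokens:
            continue  # skip empty sentences so we never emit double blank lines
        lines = []
        for word in tokens:
            fields = [word] + (extra(word) if extra else [])
            lines.append(" ".join(fields))
        blocks.append("\n".join(lines))
    # Sentences are separated by an empty line; the file ends with one too.
    return "\n\n".join(blocks) + "\n\n" if blocks else ""


def read_one_word_per_line(text: str) -> List[List[str]]:
    """Parse the same layout back into lists of words (first field of each line)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split()[0])
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences
```

For example, `to_one_word_per_line([["Can", "lah"]])` yields `"Can\nlah\n\n"`, which is the shape of a single sentence block.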

It may be a bit complicated to figure out how to use these legacy code bases.

An alternative is to re-implement the base POS tagger on a modern platform such as TensorFlow, PyTorch, or Keras. The network structure is simple and stated relatively clearly in the paper.

Or you may not even need the tagger, since the treebank with automatic POS tags is provided for reproducibility.
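If you only need the automatic POS tags from the released treebank, a small reader like the one below can extract them. This is a sketch that assumes the treebank files use the 10-column CoNLL-U (or CoNLL-X) layout; which column holds the automatic tags (index 3, UPOS, or index 4, XPOS) is an assumption you should check against the actual files.

```python
from typing import List, Tuple


def read_conllu_pos(text: str, column: int = 3) -> List[List[Tuple[str, str]]]:
    """Return (FORM, tag) pairs for each sentence in CoNLL-U-style text.

    `column` is a 0-based field index: 3 is UPOS, 4 is XPOS. Which one holds
    the automatic tags in the Sing_Par files is an assumption.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        fields = line.split("\t")
        token_id = fields[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in token_id or "." in token_id:
            continue
        current.append((fields[1], fields[column]))
    if current:
        sentences.append(current)
    return sentences
```

Reading a file is then just `read_conllu_pos(open(path, encoding="utf-8").read())`, where `path` points at one of the treebank files.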

As described in the paper, the raw sentences were crawled from local forums, and the treebank was manually labeled.

Thanks for the clarification.