stanfordnlp/CoreNLP

ChineseHeadFinder: dictionary key 'INTJ' repeated with different values

tanloong opened this issue · 3 comments

In ChineseHeadFinder.java, the key "INTJ" is duplicated with different values, at line 57 and line 101.

Is this duplication a bug or intended behavior? Sorry for the inconvenience if it is intended.

This is clearly a bug: the second put() clobbers the earlier entry, which was

    nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});

The new entry makes INTJ left-headed (except for punctuation). Do you have any insight into which is better?
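A minimal sketch of the clobbering, assuming a plain HashMap like the `nonTerminalInfo` field: `Map.put` silently replaces the old value and returns it, whereas `putIfAbsent` keeps the first value and can be used to flag the duplicate. The `putUnique` helper is hypothetical (not part of CoreNLP), and the `{"left", "INTJ"}` rule is a placeholder, not the actual line-101 entry.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {

  // Hypothetical guard (not in CoreNLP): refuses to overwrite an existing head rule.
  static void putUnique(Map<String, String[][]> map, String key, String[][] value) {
    String[][] previous = map.putIfAbsent(key, value); // keeps the first value if present
    if (previous != null) {
      throw new IllegalStateException("Duplicate head rule for " + key);
    }
  }

  public static void main(String[] args) {
    Map<String, String[][]> nonTerminalInfo = new HashMap<>();
    // Plain put(): the second call silently replaces the first rule.
    nonTerminalInfo.put("INTJ", new String[][]{{"right", "INTJ", "IJ", "SP"}});
    // Placeholder second rule, standing in for the entry at line 101.
    String[][] clobbered = nonTerminalInfo.put("INTJ", new String[][]{{"left", "INTJ"}});
    System.out.println("put() returned the old rule: " + (clobbered != null));
    System.out.println("surviving direction: " + nonTerminalInfo.get("INTJ")[0][0]);

    // putUnique(): the duplicate is reported instead of hidden.
    Map<String, String[][]> guarded = new HashMap<>();
    putUnique(guarded, "INTJ", new String[][]{{"right", "INTJ", "IJ", "SP"}});
    try {
      putUnique(guarded, "INTJ", new String[][]{{"left", "INTJ"}});
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Running `main` prints `true` for the returned old rule and `left` as the surviving direction, which is exactly the silent overwrite reported in this issue; the guarded map throws on the second insert instead.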

In CTB 5.1, every INTJ node covers a single word, such as

(INTJ (IJ 唉呀))

except for this one, which appears to be an annotation mistake, judging by how the punctuation is bracketed:

(IP 
  (INTJ (PU 「) (IJ 嘿咻))
  (PU !) 
  (PU 」)
  ...
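To see why the CTB 5.1 data makes the choice nearly moot, here is a simplified, Collins-style head search: for each label in a priority list, scan the children in the rule's direction and take the first match, otherwise fall back to the first child in that direction. This is a sketch under assumptions, not CoreNLP's head-finder code: the fallback behavior is assumed, and the left rule is only a placeholder for "left-headed except for punctuation", not the actual line-101 entry. For a one-child INTJ both rules pick the same child, and for the odd `(PU 「) (IJ 嘿咻)` node both pick IJ.

```java
import java.util.List;
import java.util.Set;

public class HeadDirectionSketch {

  // Simplified, hypothetical head rule: direction + priority labels (not CoreNLP's class).
  static final class Rule {
    final boolean fromRight;
    final List<String> priority;
    final boolean skipPunct;

    Rule(boolean fromRight, List<String> priority, boolean skipPunct) {
      this.fromRight = fromRight;
      this.priority = priority;
      this.skipPunct = skipPunct;
    }
  }

  static final Set<String> PUNCT = Set.of("PU");

  /** Returns the index of the head child among the given child labels. */
  static int findHead(List<String> children, Rule rule) {
    int n = children.size();
    // Try each priority label in turn, scanning children in the rule's direction.
    for (String wanted : rule.priority) {
      for (int k = 0; k < n; k++) {
        int i = rule.fromRight ? n - 1 - k : k;
        if (children.get(i).equals(wanted)) {
          return i;
        }
      }
    }
    // Assumed fallback: first child in the search direction, optionally skipping punctuation.
    for (int k = 0; k < n; k++) {
      int i = rule.fromRight ? n - 1 - k : k;
      if (!(rule.skipPunct && PUNCT.contains(children.get(i)))) {
        return i;
      }
    }
    // Every child was punctuation: take the first child in the search direction.
    return rule.fromRight ? n - 1 : 0;
  }

  public static void main(String[] args) {
    Rule oldRule = new Rule(true, List.of("INTJ", "IJ", "SP"), false); // the clobbered entry
    Rule leftRule = new Rule(false, List.of(), true); // placeholder "left, except punct"

    List<String> single = List.of("IJ");    // (INTJ (IJ 唉呀))
    List<String> odd = List.of("PU", "IJ"); // (INTJ (PU 「) (IJ 嘿咻))

    System.out.println(findHead(single, oldRule) + " " + findHead(single, leftRule)); // 0 0
    System.out.println(findHead(odd, oldRule) + " " + findHead(odd, leftRule));       // 1 1
  }
}
```

Under these assumptions the two rules would only diverge on multi-word INTJ nodes with several non-punctuation children, which CTB 5.1 does not appear to contain.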

I don't have CTB 9 lying around, but I will ask the people in charge of such things to put it on our cluster.

Thanks for the quick response!

I must admit that I have no prior knowledge of CTB (and I don't have CTB 9 either), so I can't tell which value is better 😔.