jcrfsuite

Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/

This is a Java interface for CRFsuite, a fast implementation of Conditional Random Fields (CRFs). It is built with SWIG and the class injection technique (the same technique used in snappy-java).

Jcrfsuite can be dropped into any Java web application and runs without problems under the JVM's class loader.
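
Libraries in the style of snappy-java ship their compiled native code inside the jar. As we understand it, the class injection part loads that native library through a class placed in a shared parent class loader, so that several web applications can use it without the JVM's "already loaded in another classloader" error. That injection step is not reproduced here. The sketch below shows only the simpler step such loaders roughly depend on: copying the bundled library out of the jar into a temporary file and passing its absolute path to `System.load`. The class `NativeLoaderSketch`, its methods, the resource directory and the library name are all hypothetical and are not Jcrfsuite's actual API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, NOT part of Jcrfsuite: illustrates only the
// "extract the bundled native library, then load it" step.
public class NativeLoaderSketch {

    // Copies a native library stream into a fresh temp directory and returns the file path.
    static Path extractToTemp(InputStream in, String libFileName) throws IOException {
        Path dir = Files.createTempDirectory("native-lib-sketch");
        // deleteOnExit runs in reverse registration order, so the directory is
        // registered first: the file is deleted before its (then empty) directory.
        dir.toFile().deleteOnExit();
        Path target = dir.resolve(libFileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        target.toFile().deleteOnExit();
        return target;
    }

    // resourceDir should start with "/" so it is resolved from the classpath root.
    // System.mapLibraryName picks the platform file name, e.g. "libcrfsuite.so"
    // on Linux for the (hypothetical) library name "crfsuite".
    static void loadBundled(String resourceDir, String libName) throws IOException {
        String fileName = System.mapLibraryName(libName);
        String resource = resourceDir + "/" + fileName;
        try (InputStream in = NativeLoaderSketch.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Native library not found on classpath: " + resource);
            }
            Path lib = extractToTemp(in, fileName);
            System.load(lib.toAbsolutePath().toString());
        }
    }
}
```

`System.load` needs an absolute file path, which is why the library cannot be loaded straight from inside the jar and has to be extracted first.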

Maven dependency

<dependency>
  <groupId>com.github.vinhkhuc</groupId>
  <artifactId>jcrfsuite</artifactId>
  <version>0.6</version>
</dependency>

License

Jcrfsuite is released under the Apache License 2.0. The original crfsuite is distributed under the BSD License.

Example on Twitter Part-of-Speech (POS) tagging

1) Training

To train a POS model from Twitter POS data, run

java -cp target/jcrfsuite-*.jar com.github.jcrfsuite.example.Train example/tweet-pos/train-oct27.txt twitter-pos.model
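
The training file is a sequence-labeling corpus. As far as we know, `train-oct27.txt` follows the usual Twitter POS layout: one token per line, its tag in a second tab-separated column, and a blank line between tweets. Assuming that layout, a minimal reader that groups the lines into per-tweet sequences (the unit a CRF is trained on) might look like this. `TaggedSequenceReader` and `TaggedToken` are hypothetical names; this is not the parser used by Jcrfsuite's `Train` example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for "token<TAB>tag" lines, blank line between sequences.
public class TaggedSequenceReader {

    // One token with its gold tag.
    static final class TaggedToken {
        final String token;
        final String tag;

        TaggedToken(String token, String tag) {
            this.token = token;
            this.tag = tag;
        }
    }

    static List<List<TaggedToken>> read(Reader source) throws IOException {
        List<List<TaggedToken>> sequences = new ArrayList<>();
        List<TaggedToken> current = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) {
                // A blank line ends the current tweet.
                if (!current.isEmpty()) {
                    sequences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // The tag is the last tab-separated column.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IOException("Expected token<TAB>tag, got: " + line);
            }
            current.add(new TaggedToken(line.substring(0, tab), line.substring(tab + 1)));
        }
        // The file may not end with a blank line.
        if (!current.isEmpty()) {
            sequences.add(current);
        }
        return sequences;
    }
}
```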

2) Tagging

To test the trained POS model against the test set, run

java -cp target/jcrfsuite-*.jar com.github.jcrfsuite.example.Tag twitter-pos.model example/tweet-pos/test-daily547.txt

The output should be as follows:

Gold	Predict	Probability
........................
N       N       0.99
P       P       1.00
Z       ^       0.59
$       $       0.97
N       N       1.00
P       P       0.98
A       N       0.80
$       $       1.00
N       N       0.99
U       U       1.00

Accuracy = 92.99%
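
The Accuracy line appears to be token-level accuracy: the share of tokens whose Predict tag equals the Gold tag. Assuming that definition (the `Tag` example's own code is not shown here), the sketch below recomputes it from whitespace-separated "Gold Predict Probability" rows like the ones printed above. The class name `TagAccuracy` and the row format are assumptions.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: recomputes token-level accuracy from output rows.
public class TagAccuracy {

    // Each row is "gold predicted probability". Pass data rows only:
    // a header row would be counted as a (wrong) token.
    static double accuracy(List<String> rows) {
        int total = 0;
        int correct = 0;
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // skip blank or malformed lines
            }
            total++;
            if (cols[0].equals(cols[1])) {
                correct++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }

    public static void main(String[] args) {
        // Only the ten sample rows shown above, not the full test set.
        List<String> rows = List.of(
            "N N 0.99", "P P 1.00", "Z ^ 0.59", "$ $ 0.97", "N N 1.00",
            "P P 0.98", "A N 0.80", "$ $ 1.00", "N N 0.99", "U U 1.00");
        System.out.println(String.format(Locale.ROOT, "Accuracy = %.2f%%", accuracy(rows)));
    }
}
```

On the ten sample rows, 8 of 10 tags match, so this prints `Accuracy = 80.00%`. The 92.99% above is computed over the whole of `test-daily547.txt`.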

Note that your accuracy may differ slightly from the output above.