Text-Norm

Code for the [EMNLP 2013 paper]

Java code for the EMNLP paper "A Log-Linear Model for Unsupervised Text Normalization".

Each line of the training file is one sentence in the following format, where every nonstandard token carries the suffix ":nonstd":
<s> standardToken1 standardToken2 ... nonstandardToken1:nonstd ... </s>

For example,
<s> hello , nice 2:nonstd meet u:nonstd ! </s>
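
As an illustration of the format above, here is a minimal sketch of how one training line could be read into tokens. It is not taken from the released code: the class `TrainingLineParser`, its nested `Token` type, and the `parse` method are hypothetical names. The sketch assumes that tokens are separated by whitespace, that the marker is the literal suffix `:nonstd`, and that a token consisting only of `:nonstd` is kept as a standard token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading one training line; not part of the released code. */
final class TrainingLineParser {
    // Assumption: nonstandard tokens are marked with exactly this suffix.
    static final String NONSTD_SUFFIX = ":nonstd";

    /** A token's surface text plus a flag saying whether it was marked nonstandard. */
    static final class Token {
        final String text;
        final boolean nonStandard;

        Token(String text, boolean nonStandard) {
            this.text = text;
            this.nonStandard = nonStandard;
        }
    }

    /** Parses a line of the form "<s> tok1 tok2:nonstd ... </s>". */
    static List<Token> parse(String line) {
        // Assumption: tokens are separated by one or more whitespace characters.
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2
                || !parts[0].equals("<s>")
                || !parts[parts.length - 1].equals("</s>")) {
            throw new IllegalArgumentException("Line must start with <s> and end with </s>: " + line);
        }

        List<Token> tokens = new ArrayList<>();
        // Skip the sentence boundary markers at both ends.
        for (int i = 1; i < parts.length - 1; i++) {
            String part = parts[i];
            boolean nonStd = part.endsWith(NONSTD_SUFFIX)
                    && part.length() > NONSTD_SUFFIX.length();
            String text = nonStd
                    ? part.substring(0, part.length() - NONSTD_SUFFIX.length())
                    : part;
            tokens.add(new Token(text, nonStd));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each token of the example line with its std/nonstd label.
        for (Token t : parse("<s> hello , nice 2:nonstd meet u:nonstd ! </s>")) {
            System.out.println(t.text + "\t" + (t.nonStandard ? "nonstd" : "std"));
        }
    }
}
```

For the example line, `parse` yields seven tokens, of which "2" and "u" are flagged as nonstandard. The suffix is stripped at parse time so that downstream code can work with the raw token text and a boolean flag.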

The SGD training code will be provided later.