PKULCWM/PKUSUMSUM

Why check only the first character of a word?


Hi, I'm wondering why ifWords_Eng in Tokenizer.java checks only the first character of a word:

    public boolean ifWords_Eng(String tmpWord)
    {
        if (tmpWord.charAt(0)>='A' && tmpWord.charAt(0)<='Z') return true;
        if (tmpWord.charAt(0)>='a' && tmpWord.charAt(0)<='z') return true;
        return false;
    }

Sorry for the late reply.
It's a naive filter for non-words. You could check all of the characters instead, but then words like "U.S.A." might be filtered out. In practice this usually doesn't affect the outcome.
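
To make the trade-off concrete, here is a minimal sketch comparing the two checks. It is not code from the repository: the class name WordFilterSketch, the method names, and the sample words are hypothetical, and the null/empty guard is an added assumption (the quoted method would throw on an empty string).

```java
public class WordFilterSketch {

    // Mirrors the quoted ifWords_Eng logic: only the first character is examined.
    // The null/empty guard is an addition for this sketch.
    public static boolean startsWithEnglishLetter(String word) {
        if (word == null || word.isEmpty()) return false;
        char c = word.charAt(0);
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Hypothetical stricter variant: every character must be an ASCII letter.
    public static boolean isAllEnglishLetters(String word) {
        if (word == null || word.isEmpty()) return false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!isLetter) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample tokens chosen for illustration only.
        String[] samples = {"summary", "U.S.A.", "state-of-the-art", "2016"};
        for (String s : samples) {
            System.out.println(s
                    + " first-char=" + startsWithEnglishLetter(s)
                    + " all-chars=" + isAllEnglishLetters(s));
        }
    }
}
```

With the strict variant, "U.S.A." and "state-of-the-art" are rejected because of the periods and hyphens, while the first-character check keeps them. Both checks reject "2016".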