Why check only the first character of a word?
xuehui1991 commented
Hey, I'm wondering why ifWords_Eng in Tokenizer.java checks only the first character of the word?
    public boolean ifWords_Eng(String tmpWord)
    {
        if (tmpWord.charAt(0)>='A' && tmpWord.charAt(0)<='Z') return true;
        if (tmpWord.charAt(0)>='a' && tmpWord.charAt(0)<='z') return true;
        return false;
    }
sodawater commented
Sorry for the late reply.
It's a naive filter for non-words. You could check all of the characters instead, but then words like "U.S.A." might be filtered out. Usually the choice won't affect the outcome...
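To make the trade-off concrete, here is a rough sketch that puts the two checks side by side. The class and method names (WordFilterSketch, startsWithAsciiLetter, allAsciiLetters) are made up for illustration and are not part of Tokenizer.java; the empty-string guard is also an addition, since charAt(0) would throw on an empty word.

```java
// Hypothetical helper class for illustration only; not from the repository.
final class WordFilterSketch {

    // True if c is an ASCII letter, the same range test used in ifWords_Eng.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Mirrors the first-character check from the thread,
    // with an added guard so empty or null input returns false.
    static boolean startsWithAsciiLetter(String word) {
        if (word == null || word.isEmpty()) return false;
        return isAsciiLetter(word.charAt(0));
    }

    // Stricter variant: every character must be an ASCII letter.
    static boolean allAsciiLetters(String word) {
        if (word == null || word.isEmpty()) return false;
        for (int i = 0; i < word.length(); i++) {
            if (!isAsciiLetter(word.charAt(i))) return false;
        }
        return true;
    }
}
```

With these helpers, "U.S.A." passes startsWithAsciiLetter but fails allAsciiLetters because of the periods, while a token such as "42" fails both.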