Project Description
This project is a spell checker for words.
It supports English and Chinese spelling detection.
Feature description
Support English word correction
- 1000X faster spelling correction and fuzzy search through the Symmetric Delete spelling correction algorithm
- Quickly determines whether a word is misspelled
- Returns the best match result
- Returns a list of candidate corrections, with support for limiting the size of the returned list
- Error messages support i18n
- Supports uppercase and lowercase, full-width and half-width formatting
- Supports a custom thesaurus
Support basic Chinese spelling check
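To illustrate the Symmetric Delete idea named in the feature list, here is a minimal sketch. It is not the library's implementation: the class `SymDeleteSketch` and all of its methods are hypothetical, it only generates single-character deletes, and it does not verify the true edit distance of the candidates it returns. Both the dictionary words and the input are reduced to their delete variants, and words sharing a variant become candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Symmetric Delete idea; NOT the library's code.
final class SymDeleteSketch {
    // word -> frequency (a higher count means a higher priority)
    private final Map<String, Integer> freq = new HashMap<>();
    // delete variant -> dictionary words that produce this variant
    private final Map<String, Set<String>> index = new HashMap<>();

    // The word itself plus every string made by deleting one character.
    static Set<String> deletes(String word) {
        Set<String> out = new HashSet<>();
        out.add(word);
        for (int i = 0; i < word.length(); i++) {
            out.add(word.substring(0, i) + word.substring(i + 1));
        }
        return out;
    }

    // Register a dictionary word under each of its delete variants.
    void add(String word, int count) {
        freq.put(word, count);
        for (String key : deletes(word)) {
            Set<String> bucket = index.get(key);
            if (bucket == null) {
                bucket = new HashSet<>();
                index.put(key, bucket);
            }
            bucket.add(word);
        }
    }

    boolean isCorrect(String word) {
        return freq.containsKey(word);
    }

    // Candidates share at least one delete variant with the input.
    // They are ranked by frequency (descending), then alphabetically.
    List<String> correctList(String word, int limit) {
        Set<String> candidates = new HashSet<>();
        for (String key : deletes(word)) {
            Set<String> bucket = index.get(key);
            if (bucket != null) {
                candidates.addAll(bucket);
            }
        }
        List<String> ranked = new ArrayList<>(candidates);
        Collections.sort(ranked, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byFreq = Integer.compare(freq.get(b), freq.get(a));
                return byFreq != 0 ? byFreq : a.compareTo(b);
            }
        });
        int size = Math.max(0, Math.min(limit, ranked.size()));
        return new ArrayList<>(ranked.subList(0, size));
    }
}
```

Because only deletes are generated, the index stays small compared with generating every insert, replace, and transpose for each lookup. A fuller implementation would typically support a larger edit distance and filter the candidates with a real edit-distance check.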
Change log
Quick start
JDK version
JDK 1.7+
Maven dependency
<dependency>
<groupId>com.github.houbb</groupId>
<artifactId>word-checker</artifactId>
<version>0.1.0</version>
</dependency>
Test Case
Given an input word, the best correction is returned automatically.
final String speling = "speling";
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
Core API introduction
The core API is provided by the EnWordCheckers tool class.
Function | Method | Parameters | Return Value | Remarks |
---|---|---|---|---|
Determine whether the spelling of the word is correct | isCorrect(string) | The word to be detected | boolean | |
Return the best corrected result | correct(string) | The word to be detected | String | If no correction is found, the input word itself is returned |
Return a list of matching corrections | correctList(string) | The word to be detected | List<String> | Returns a list of all matching corrections |
Return a list of matching corrections with a size limit | correctList(string, int limit) | The word to be detected, the maximum size of the returned list | List<String> | List size <= limit |
Test example
Is the spelling correct?
final String hello = "hello";
final String speling = "speling";
Assert.assertTrue(EnWordCheckers.isCorrect(hello));
Assert.assertFalse(EnWordCheckers.isCorrect(speling));
Return the best match result
final String hello = "hello";
final String speling = "speling";
Assert.assertEquals("hello", EnWordCheckers.correct(hello));
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
Return the corrected match list (default size)
final String word = "goo";
List<String> stringList = EnWordCheckers.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gobo, gook, goop]", stringList.toString());
Specify the size of the corrected match list
final String word = "goo";
final int limit = 2;
List<String> stringList = EnWordCheckers.correctList(word, limit);
Assert.assertEquals("[go, good]", stringList.toString());
Chinese spelling correction
Core api
To reduce the learning cost, the core API of ZhWordCheckers is consistent with that of the English spell checker.
Is the spelling correct?
final String right = "正确";
final String error = "万变不离其中";
Assert.assertTrue(ZhWordCheckers.isCorrect(right));
Assert.assertFalse(ZhWordCheckers.isCorrect(error));
Return the best match result
final String right = "正确";
final String error = "万变不离其中";
Assert.assertEquals("正确", ZhWordCheckers.correct(right));
Assert.assertEquals("万变不离其宗", ZhWordCheckers.correct(error));
Return the corrected match list (default size)
final String word = "万变不离其中";
List<String> stringList = ZhWordCheckers.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
Specify the size of the corrected match list
final String word = "万变不离其中";
final int limit = 1;
List<String> stringList = ZhWordCheckers.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
Formatting
User input comes in many forms, so this tool normalizes its format before checking.
Case
Uppercase letters are uniformly converted to lowercase.
final String word = "stRing";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
Full-width and half-width
Full-width characters are uniformly converted to half-width.
final String word = "ｓｔｒｉｎｇ";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
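The normalization described in this section can be sketched as follows. This is an illustrative helper, not the library's own formatter: the class name `WordNormalizeSketch` and the method `normalize` are hypothetical. It relies on the fact that the full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their half-width counterparts, and that the ideographic space (U+3000) corresponds to the ordinary space.

```java
// Hypothetical helper; the library's own formatting logic may differ.
final class WordNormalizeSketch {
    // Convert full-width characters to half-width, then lowercase each character.
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == 0x3000) {
                // Ideographic (full-width) space -> ordinary space
                c = ' ';
            } else if (c >= 0xFF01 && c <= 0xFF5E) {
                // Full-width ASCII variants -> half-width ASCII
                c = (char) (c - 0xFEE0);
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }
}
```

With this sketch, both `stRing` and its full-width spelling normalize to `string` before any dictionary lookup takes place.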
Custom English Thesaurus
File configuration
You can create the file resources/data/define_word_checker_en.txt in the project resource directory.
The content is as follows:
my-long-long-define-word,2
my-long-long-define-word-two
Each word is on its own line.
The first column of each line is the word and the second column is its number of occurrences, separated by a comma (,).
The larger the count, the higher the priority of the word when corrections are returned. The default count is 1.
The user-defined thesaurus takes priority over the built-in system thesaurus.
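As an illustration of the file format above, the following sketch parses such lines into a word-to-count map. It is not the library's loader: `EnDefineFileSketch` and `parse` are hypothetical names, and the sketch assumes that words themselves contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the "word,count" format; NOT the library's loader.
final class EnDefineFileSketch {
    // Assumes words contain no commas. A missing count defaults to 1.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                result.put(line, 1);
                continue;
            }
            String word = line.substring(0, comma).trim();
            String countText = line.substring(comma + 1).trim();
            int count = countText.isEmpty() ? 1 : Integer.parseInt(countText);
            result.put(word, count);
        }
        return result;
    }
}
```

Applied to the two sample lines above, this yields a count of 2 for `my-long-long-define-word` and the default count of 1 for `my-long-long-define-word-two`.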
Test code
Once the words are defined in the file, the spell checker recognizes them.
final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
Assert.assertTrue(EnWordCheckers.isCorrect(word2));
Custom Chinese Thesaurus
File configuration
You can create the file resources/data/define_word_checker_zh.txt in the project resource directory.
The content is as follows:
默守成规 墨守成规
The two entries are separated by a half-width (ASCII) space: the first is the misspelled form and the second is the correct form.
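The following sketch shows one way to read this format into a lookup table. It is only an illustration under stated assumptions: `ZhDefineFileSketch`, `parse`, and `correct` are hypothetical names, and lines without a space are simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the "wrong correct" format; NOT the library's loader.
final class ZhDefineFileSketch {
    // Each line holds the misspelled form, one ASCII space, then the correct form.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // skip blank or malformed lines
            }
            String wrong = line.substring(0, space);
            String right = line.substring(space + 1).trim();
            result.put(wrong, right);
        }
        return result;
    }

    // Return the correction if the phrase is a known misspelling, else the phrase itself.
    static String correct(Map<String, String> dict, String phrase) {
        return dict.containsKey(phrase) ? dict.get(phrase) : phrase;
    }
}
```

For the sample line above, `parse` maps 默守成规 to 墨守成规, and `correct` returns any phrase that is not in the table unchanged.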
Long text with mixed Chinese and English
Scenario
In real spell-checking scenarios, the best user experience is to let the user enter a long text, which may mix Chinese and English.
The functions described above are then applied to that text.
Core method
The WordCheckers tool class automatically handles long texts that mix Chinese and English.
Function | Method | Parameters | Return Value | Remarks |
---|---|---|---|---|
Determine whether the spelling of the text is correct | isCorrect(string) | The text to be detected | boolean | |
Return the best corrected result | correct(string) | The text to be detected | String | If no correction is found, the input itself is returned |
Return the corrections for each word in the text | correctMap(string) | The text to be detected | Map<String, List<String>> | Returns a list of all matching corrections for each word |
Return the corrections for each word in the text, with a size limit | correctMap(string, int limit) | The text to be detected, the maximum size of each returned list | Map<String, List<String>> | List size <= limit |
Is the spelling correct?
final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";
Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));
Return the best corrected result
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("selling 你好以毒攻毒", WordCheckers.correct(speling));
Return the correction list for each word
Each word is mapped to its list of corrections.
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());
Return the correction list for each word, with a size limit
Same as above, but with a maximum number of corrections returned per word.
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
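The map keys in the examples above suggest that the text is split into English words, single characters, and known Chinese phrases before each piece is checked. The sketch below shows one possible segmentation along those lines. It is an assumption, not the library's segmenter: `MixedTextSegmentSketch`, `segment`, and the `knownPhrases` parameter are hypothetical, and the sketch works on UTF-16 chars without handling supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical segmenter for mixed text; NOT the library's implementation.
final class MixedTextSegmentSketch {
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Runs of ASCII letters become one token. Any other position yields the
    // longest entry of knownPhrases starting there, or else a single character.
    static List<String> segment(String text, Set<String> knownPhrases) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1;
            if (isAsciiLetter(text.charAt(i))) {
                while (end < text.length() && isAsciiLetter(text.charAt(end))) {
                    end++;
                }
            } else {
                // Try the longest possible phrase first, down to two characters.
                for (int j = text.length(); j > i + 1; j--) {
                    if (knownPhrases.contains(text.substring(i, j))) {
                        end = j;
                        break;
                    }
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }
}
```

With 以毒功毒 as the only known phrase, the text `speling 你好以毒功毒` is split into `speling`, a single space, 你, 好, and 以毒功毒, which matches the set of keys shown in the examples. Each token could then be passed to the English or Chinese checker as appropriate.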
NLP Open Source Matrix
Future Road-Map
- Support English word segmentation to process entire English sentences
- Support spelling detection with Chinese word segmentation
- Introduce a Chinese error correction algorithm that handles homophones and visually similar characters
- Support mixed Chinese and English spelling detection
Technical Acknowledgements
Words provides raw English word data.