Project Description

Chinese documentation

This project provides spell checking for words.

It supports spelling detection for both English words and Chinese text.

Feature description

Support English word correction

1000X faster spelling correction and fuzzy search through the Symmetric Delete spelling correction algorithm
Quickly determine whether a word is misspelled
Return the best match
Return the list of candidate corrections, with a configurable list size
Error messages support i18n
Normalize uppercase/lowercase and full-width/half-width characters
Support custom dictionaries
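The idea behind Symmetric Delete (popularized by SymSpell) is to precompute, for every dictionary word, the strings obtained by deleting characters, and at lookup time to generate only deletes of the input: two words within a small edit distance share a delete variant. The class below is a minimal, hypothetical sketch of that idea limited to edit distance 1; it is not word-checker's implementation, and all names are illustrative.

```java
import java.util.*;

// Illustrative Symmetric Delete candidate lookup, max edit distance 1.
// Not word-checker's implementation; class and method names are hypothetical.
final class SymmetricDeleteSketch {
    // delete variant -> dictionary words that produce that variant
    private final Map<String, Set<String>> index = new HashMap<String, Set<String>>();

    SymmetricDeleteSketch(Collection<String> dictionary) {
        // Precompute: index every dictionary word under all of its delete variants.
        for (String word : dictionary) {
            for (String variant : deletes(word)) {
                Set<String> words = index.get(variant);
                if (words == null) {
                    words = new TreeSet<String>();
                    index.put(variant, words);
                }
                words.add(word);
            }
        }
    }

    // The term itself plus every string obtained by deleting exactly one character.
    static Set<String> deletes(String term) {
        Set<String> result = new HashSet<String>();
        result.add(term);
        for (int i = 0; i < term.length(); i++) {
            result.add(term.substring(0, i) + term.substring(i + 1));
        }
        return result;
    }

    // Lookup generates only deletes of the input (no inserts or substitutions),
    // which is where the speed-up comes from. Matches are candidates only: a full
    // implementation would verify the real edit distance and rank by frequency.
    Set<String> candidates(String input) {
        Set<String> result = new TreeSet<String>();
        for (String variant : deletes(input)) {
            Set<String> words = index.get(variant);
            if (words != null) {
                result.addAll(words);
            }
        }
        return result;
    }
}
```

For example, with a dictionary of spelling, selling and hello, the input speling shares the variant speling with spelling and seling with selling, so both come back as candidates while hello does not.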

Support basic Chinese spelling check

Change log

See the Change Log.

Quick start

JDK version

JDK 1.7+

Maven dependency

<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>word-checker</artifactId>
    <version>0.1.0</version>
</dependency>

Test Case

Given an input word, the best correction is returned automatically.

final String speling = "speling";
Assert.assertEquals("selling", EnWordCheckers.correct(speling));

Core API

The core API is provided by the EnWordCheckers utility class.

Function	Method	Parameters	Return value	Remarks
Check whether a word is spelled correctly	isCorrect(String)	The word to check	boolean	
Return the best correction	correct(String)	The word to check	String	Returns the word itself if no correction is found
Return all candidate corrections	correctList(String)	The word to check	`List<String>`	Returns a list of all matching corrections
Return candidate corrections, up to a limit	correctList(String, int limit)	The word to check; the maximum list size	`List<String>`	List size <= limit
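The table implies that correct() is the top entry of a ranked candidate list, that correctList(word, limit) truncates that list, and that the input is returned unchanged when nothing matches. Assuming candidates are ranked by descending frequency (consistent with the custom dictionary section, where higher counts rank higher), a hypothetical sketch of that relationship looks like this; CandidateRanking and the alphabetical tie-break are assumptions, not the library's code.

```java
import java.util.*;

// Hypothetical sketch of how correct() and correctList(word, limit) could relate,
// assuming candidates are ranked by descending frequency, ties broken alphabetically.
final class CandidateRanking {
    static List<String> rank(Map<String, Integer> candidateCounts, int limit) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(candidateCounts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() >= limit) {
                break; // list size <= limit
            }
            result.add(e.getKey());
        }
        return result;
    }

    // Mirrors the documented rule: return the word itself when no correction is found.
    static String best(String word, Map<String, Integer> candidateCounts) {
        List<String> top = rank(candidateCounts, 1);
        return top.isEmpty() ? word : top.get(0);
    }
}
```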

Test examples

See EnWordCheckerTest.java

Is the spelling correct?

final String hello = "hello";
final String speling = "speling";
Assert.assertTrue(EnWordCheckers.isCorrect(hello));
Assert.assertFalse(EnWordCheckers.isCorrect(speling));

Return the best match result

final String hello = "hello";
final String speling = "speling";
Assert.assertEquals("hello", EnWordCheckers.correct(hello));
Assert.assertEquals("selling", EnWordCheckers.correct(speling));

Default correction list

final String word = "goo";
List<String> stringList = EnWordCheckers.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gobo, gook, goop]", stringList.toString());

Limiting the size of the correction list

final String word = "goo";
final int limit = 2;
List<String> stringList = EnWordCheckers.correctList(word, limit);
Assert.assertEquals("[go, good]", stringList.toString());

Chinese spelling correction

Core API

To keep the learning cost low, ZhWordCheckers exposes the same core API as the English checker, EnWordCheckers.

Is the spelling correct?

final String right = "正确";
final String error = "万变不离其中";

Assert.assertTrue(ZhWordCheckers.isCorrect(right));
Assert.assertFalse(ZhWordCheckers.isCorrect(error));

Return the best match result

final String right = "正确";
final String error = "万变不离其中";

Assert.assertEquals("正确", ZhWordCheckers.correct(right));
Assert.assertEquals("万变不离其宗", ZhWordCheckers.correct(error));

Default correction list

final String word = "万变不离其中";

List<String> stringList = ZhWordCheckers.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());

Limiting the size of the correction list

final String word = "万变不离其中";
final int limit = 1;

List<String> stringList = ZhWordCheckers.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());

Formatting

User input comes in many forms, so the tool normalizes it before checking.

Case

Uppercase letters are normalized to lowercase.

final String word = "stRing";

Assert.assertTrue(EnWordCheckers.isCorrect(word));

Full-width half-width

Full-width characters are normalized to half-width.

final String word = "ｓｔｒｉｎｇ";

Assert.assertTrue(EnWordCheckers.isCorrect(word));
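Both normalization rules can be sketched in a few lines. The Unicode facts are standard: the full-width ASCII variants U+FF01 to U+FF5E map to U+0021 to U+007E by subtracting 0xFEE0, and the ideographic space U+3000 corresponds to the ASCII space. InputNormalizer is a hypothetical name, and the order of the two steps is an assumption rather than the library's documented behavior.

```java
import java.util.Locale;

// Hypothetical sketch of the two normalization rules: full-width -> half-width,
// then uppercase -> lowercase. Not the library's implementation.
final class InputNormalizer {
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\u3000') {
                sb.append(' ');                 // ideographic (full-width) space
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0)); // full-width ASCII variant -> ASCII
            } else {
                sb.append(c);
            }
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        return sb.toString().toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, both "stRing" and "ｓｔＲｉｎｇ" normalize to "string".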

Custom English Dictionary

File configuration

Create the file resources/data/define_word_checker_en.txt in your project's resource directory.

Example content:

my-long-long-define-word,2
my-long-long-define-word-two

Put each word on its own line.

On each line, the first column is the word and the optional second column is its number of occurrences, separated by a comma (,).

The higher the count, the higher the word ranks when corrections are returned. If the count is omitted, it defaults to 1.

Entries in the user-defined dictionary take priority over the built-in dictionary.
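The file format above (word, optional comma, optional count defaulting to 1) can be parsed as follows. This is a hypothetical sketch for illustration; EnDictionaryParser is not part of the library, and the real loader may handle whitespace, duplicates, or malformed counts differently.

```java
import java.util.*;

// Hypothetical parser for define_word_checker_en.txt-style lines,
// e.g. "my-long-long-define-word,2" or "my-long-long-define-word-two".
final class EnDictionaryParser {
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.indexOf(',');
            String word = comma < 0 ? line : line.substring(0, comma).trim();
            // The count column is optional and defaults to 1.
            int count = comma < 0 ? 1 : Integer.parseInt(line.substring(comma + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }
}
```

Applied to the example file, this yields my-long-long-define-word with count 2 and my-long-long-define-word-two with the default count 1.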

Test code

Once the words are defined, the spelling check recognizes them.

final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";

Assert.assertTrue(EnWordCheckers.isCorrect(word));
Assert.assertTrue(EnWordCheckers.isCorrect(word2));

Custom Chinese Dictionary

File configuration

Create the file resources/data/define_word_checker_zh.txt in your project's resource directory.

Example content:

默守成规 墨守成规

Separate the two entries with an ASCII (half-width) space: the first is the misspelling and the second is the correct form.
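A hypothetical sketch of reading that format and applying it: each line becomes a misspelling-to-correction pair, and the correction step here is a naive substring replacement. ZhDictionarySketch is an illustrative name; the library's own Chinese correction is not documented here and is likely more involved.

```java
import java.util.*;

// Hypothetical sketch: parse "wrong correct" lines (e.g. "默守成规 墨守成规")
// and apply them by naive substring replacement. Not the library's algorithm.
final class ZhDictionarySketch {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> corrections = new LinkedHashMap<String, String>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(" +"); // one or more ASCII spaces
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'wrong correct': " + raw);
            }
            corrections.put(parts[0], parts[1]);
        }
        return corrections;
    }

    static String correct(String text, Map<String, String> corrections) {
        String result = text;
        for (Map.Entry<String, String> e : corrections.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```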

Long text mixed in Chinese and English

Scenario

In real-world use, the best user experience comes from checking the long text a user actually types, which may mix Chinese and English.

The functions described above should then work on that text directly.

Core method

The WordCheckers utility class handles long texts that mix Chinese and English automatically.

Function	Method	Parameters	Return value	Remarks
Check whether a text is spelled correctly	isCorrect(String)	The text to check	boolean	
Return the best correction	correct(String)	The text to check	String	Returns the text unchanged if nothing can be corrected
Return corrections for each word	correctMap(String)	The text to check	`Map<String, List<String>>`	Maps each word to all of its matching corrections
Return corrections for each word, up to a limit	correctMap(String, int limit)	The text to check; the maximum list size per word	`Map<String, List<String>>`	Each list size <= limit
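Handling mixed text starts with splitting it into units that can be checked separately. As a rough, hypothetical sketch of that first step only, the class below groups consecutive Latin letters and consecutive Han characters into runs and emits every other character (such as a space) as its own token. It is not the library's segmenter: judging by the correctMap output in this section, the library splits 你好 into 你 and 好, which needs a real Chinese word segmenter.

```java
import java.util.*;

// Hypothetical script-based tokenizer for mixed Chinese/English text.
// Groups runs of Latin letters and runs of Han characters; other chars are single tokens.
final class MixedTextSplitter {
    private enum Kind { LATIN, HAN, OTHER }

    private static Kind kindOf(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        if (script == Character.UnicodeScript.LATIN) return Kind.LATIN;
        if (script == Character.UnicodeScript.HAN) return Kind.HAN;
        return Kind.OTHER;
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        Kind currentKind = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Kind kind = kindOf(cp);
            // Close the current run when the script changes, or for any OTHER character.
            if (current.length() > 0 && (kind != currentKind || kind == Kind.OTHER)) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            currentKind = kind;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For "speling 你好以毒功毒", this sketch produces the tokens speling, a single space, and 你好以毒功毒.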

Is the spelling correct?

final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";
Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));

Return the best corrected result

final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("selling 你好以毒攻毒", WordCheckers.correct(speling));

Corrections for each word

Each word is mapped to its list of corrections.

final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());

Corrections for each word, with a limit

Same as above, but with a maximum number of corrections per word.

final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";

Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
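The shape of the per-word maps above can be sketched as follows, assuming the text has already been split into tokens and that some per-token candidate source exists. CorrectMapSketch and its Corrector interface are hypothetical names. Each token maps to its candidates truncated to limit, and a token with no candidates maps to itself, as hello=[hello] does above.

```java
import java.util.*;

// Hypothetical sketch of correctMap(text, limit) over pre-split tokens.
final class CorrectMapSketch {
    // Hypothetical per-token candidate source (e.g. a dictionary lookup).
    interface Corrector {
        List<String> candidates(String token);
    }

    static Map<String, List<String>> correctMap(List<String> tokens, Corrector corrector, int limit) {
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        for (String token : tokens) {
            List<String> all = corrector.candidates(token);
            if (all.isEmpty()) {
                all = Collections.singletonList(token); // nothing to correct: map to itself
            }
            // Each list size <= limit.
            result.put(token, new ArrayList<String>(all.subList(0, Math.min(limit, all.size()))));
        }
        return result;
    }
}
```

A LinkedHashMap keeps keys in text order here, whereas the maps printed above are not in text order; in both cases a token that appears twice becomes a single key.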

NLP Open-Source Projects

nlp-hanzi-similar: Chinese character similarity

word-checker: spell checking

sensitive-word: sensitive-word filtering

Roadmap

Support English word segmentation to process entire English sentences
Support Chinese word segmentation for spelling detection
Introduce a Chinese error-correction algorithm that handles homophones and visually similar characters
Support mixed Chinese and English spelling detection

Technical Acknowledgements

Words provides the raw English word data.

zhaoxjmail/word-checker
