Capitalize proper nouns in English sentences.
The provided dictionary, proper_nouns.json,
was extracted from two corpora, Project Gutenberg and a Wikipedia dump, which together contain 8 billion words.
Gather all words from text files.
python gather_dictionary.py ${DICTIONARY_SAVEPATH} ${TEXT_1} ${TEXT_2} ...
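As a rough illustration of what a gathering step could look like, the sketch below counts, for each lowercased word, how often it occurs and how often it occurs capitalized. This is not the code in gather_dictionary.py: the function name gather_counts, the one-sentence-per-line input, and the rule of skipping the first word of each line are all assumptions made for the example.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z]+")

def gather_counts(lines):
    """Hypothetical helper: per lowercased word, count total and capitalized occurrences."""
    total = Counter()
    capitalized = Counter()
    for line in lines:
        words = WORD_RE.findall(line)
        # Assumption: one sentence per line, so the first word is skipped
        # because it is capitalized whether or not it is a proper noun.
        for word in words[1:]:
            key = word.lower()
            total[key] += 1
            if word[0].isupper():
                capitalized[key] += 1
    return total, capitalized

total, capitalized = gather_counts([
    "We met Alice in Paris.",
    "Then alice left the city.",
])
```

A word that is capitalized in most of its mid-sentence occurrences is a plausible proper-noun candidate; the ratio capitalized[key] / total[key] could serve as a score.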
Refinement phase.
- Limit the length of keys
- Limit the number of keys
- Filter low-scored keys
python refine_dictionary.py ${DIC_PATH} ${SAVE_PATH}
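The three refinement steps above can be sketched as follows. This is a minimal illustration, not the logic of refine_dictionary.py: the function name refine, the key-to-score dictionary layout, and the default thresholds are hypothetical.

```python
def refine(scores, max_key_len=30, max_keys=100000, min_score=0.9):
    """Hypothetical refinement over a dict mapping key -> score."""
    # Limit the length of keys and filter low-scored keys.
    kept = {
        key: score
        for key, score in scores.items()
        if len(key) <= max_key_len and score >= min_score
    }
    # Limit the number of keys, keeping the highest-scored ones.
    top = sorted(kept.items(), key=lambda item: item[1], reverse=True)[:max_keys]
    return dict(top)

refined = refine(
    {"paris": 0.99, "alice": 0.95, "the": 0.01, "x" * 40: 1.0},
    max_keys=1,
)
```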
Customize the following functions in capitalize.py
: extract_target
, envelop_target
These are used to handle text files in various formats.
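For example, suppose each line of the target file has the form "id<TAB>sentence" and only the sentence should be capitalized. The sketch below shows one way the two functions might be written for that format. The signatures are assumptions for illustration; check the definitions in capitalize.py for the actual arguments and return values.

```python
def extract_target(line):
    """Return the part of the line to capitalize (assumed: the text after the first tab)."""
    _, _, text = line.rstrip("\n").partition("\t")
    return text

def envelop_target(line, new_text):
    """Put the processed text back into the original line format (assumed signature)."""
    prefix, sep, _ = line.rstrip("\n").partition("\t")
    return prefix + sep + new_text + "\n"

line = "utt1\twe met alice\n"
target = extract_target(line)
restored = envelop_target(line, "we met Alice")
```

In this sketch a line without a tab yields an empty target, so a real implementation would need to decide how to treat such lines.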
Capitalize proper nouns in target text.
python capitalize.py ${DIC_PATH} ${TARGET_PATH} ${SAVE_PATH}
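Conceptually, this step looks up each word of the target text in the dictionary and replaces it with its capitalized form. The sketch below assumes a dictionary that maps a lowercased single word to its capitalized spelling; the real layout of proper_nouns.json and the matching logic in capitalize.py may differ, and capitalize_text is a hypothetical name.

```python
import re

def capitalize_text(text, proper_nouns):
    """Replace every word found in proper_nouns (lowercased key) by its stored form."""
    def fix(match):
        word = match.group(0)
        # Words missing from the dictionary are left unchanged.
        return proper_nouns.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", fix, text)

result = capitalize_text(
    "we met alice in paris.",
    {"alice": "Alice", "paris": "Paris"},
)
```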