HKSeg

This is an ongoing collaborative project.
† Mainly contributed by Minghao
‡ Mainly contributed by Randy

Dictionary

  • dict_celebrity †
    • Names of Hong Kong celebrities.
  • dict_institution †
  • dict_location_hk †
    • Includes locations related to Hong Kong. They don't have to be in Hong Kong physically.
  • dict_politics ‡
  • dict_covid19_vaccine †
  • dict_covid19 †
  • dict_general_en ‡ †
  • dict_general_zh ‡ †

Stopword

  • ad_words †
    • Summarized keywords related to advertising and promotional content that appears frequently in Hong Kong public media.
    • Keywords are in regular expression format.
  • stopwords_canto ‡ †
    • Stopwords in Cantonese.
  • stopwords_en ‡ †
    • Stopwords in English.
  • stopwords_simple †
    • A simplified list of Cantonese stopwords. Numbers are included.
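
As a rough illustration of how the stopword lists and the regex-format ad keywords might be applied together, here is a minimal sketch. It assumes each file holds one entry per line; the helper names (`load_lines`, `build_ad_pattern`, `filter_tokens`) and the sample entries are hypothetical and not taken from this repository.

```python
import re

def load_lines(text):
    """Split file content into non-empty, stripped lines (assumed: one entry per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def build_ad_pattern(ad_lines):
    """Combine regex-format ad keywords into one compiled pattern, or None if empty."""
    if not ad_lines:
        return None
    return re.compile("|".join(f"(?:{p})" for p in ad_lines))

def filter_tokens(tokens, stopwords, ad_pattern=None):
    """Drop stopwords and, if a pattern is given, tokens matching an ad keyword."""
    kept = []
    for tok in tokens:
        if tok in stopwords:
            continue
        if ad_pattern is not None and ad_pattern.search(tok):
            continue
        kept.append(tok)
    return kept

# Made-up sample entries standing in for stopwords_canto and ad_words.
stopwords = set(load_lines("嘅\n咗\n係\n"))
ad_pattern = build_ad_pattern(load_lines("優惠\\d*\n立即(?:購買|登記)\n"))

# Tokens as they might come out of a segmenter.
tokens = ["我", "係", "香港", "人", "立即購買", "優惠8", "嘅"]
print(filter_tokens(tokens, stopwords, ad_pattern))
```

In practice the sample strings would be replaced by the contents of the actual list files, read with `open(path, encoding="utf-8").read()`.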

jieba tools

  • jieba_formatter.py †
    • A script that works around an error when loading a user-defined dictionary with jieba (jieba3k 0.35.1).
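
This README does not say what the loading error is or how `jieba_formatter.py` handles it, so the following is only a sketch of what such a formatter could look like. It assumes the older jieba release expects every user-dictionary line as `word freq [tag]` separated by single spaces, and that entries without a frequency need a default. The function name `format_userdict_lines`, the default frequency, and the sample entries are all hypothetical.

```python
def format_userdict_lines(lines, default_freq=10):
    """Normalize raw entries to 'word freq [tag]' lines.

    Assumption: each raw line holds a word, optionally followed by an
    integer frequency and a part-of-speech tag, separated by whitespace.
    Words that themselves contain spaces are not handled by this sketch.
    """
    out = []
    seen = set()
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        word, rest = parts[0], parts[1:]
        if word in seen:
            continue  # keep only the first occurrence of a word
        seen.add(word)
        if rest and rest[0].isdigit():
            freq, tags = rest[0], rest[1:]
        else:
            freq, tags = str(default_freq), rest
        # Emit single-space-separated fields, with at most one tag.
        out.append(" ".join([word, freq] + tags[:1]))
    return out

# Made-up entries: missing frequency, extra spaces, tag only, blank, duplicate.
raw_entries = ["港鐵", "維多利亞港  50", "旺角 ns", "", "港鐵 5"]
print(format_userdict_lines(raw_entries))
```

The normalized lines could then be written to a file and passed to `jieba.load_userdict`.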