
Crawler for Cantonese pronunciation data on LSHK Jyutping Word List (香港語言學學會粵拼詞表)




See sanitized.txt for the final result.

File structure

  • lshk.py: The crawler
  • result.txt: Raw result output by the crawler
  • sanitize.py: Sanitizer for the result
  • sanitized.txt: Final result output by the sanitizer
  • sanitize_log.txt: Log output by the sanitizer


According to the original terms, the dictionary data is distributed under CC BY 4.0.

Python code in this repository is distributed under MIT license.


The link to the word list is now broken. If you are interested in a more up-to-date word list, see rime/rime-cantonese.