
PyThaiNLP 2.3 change log

Opened by bact.

Schedule

  • First development release: 16 March 2021
  • Beta release: 23 March 2021
  • Production release: 30 March 2021

Docs: https://pythainlp.github.io/docs/2.3/index.html

See 2.3 Milestone.

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise a tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagging

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update: ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add: ThaiNameTagger version selection, with support for ThaiNER 1.4

Transliteration

  • #485 Fix: romanize() failed on some inputs
  • #511 Add: Thai W2P (Thai Word-to-Phoneme converter)

Text summarization

  • #523 Add: mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add: pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directives, so that running on glibc and on musl produces the same output
  • #512 Add: emoji_to_thai() to convert emoji to Thai descriptions -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate the Euclidean distance between two characters based on their positions on a Thai keyboard layout -- thanks @ppirch for the development
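#483 adds a remove() method to the trie. The sketch below is not PyThaiNLP's code: it is a minimal standalone trie (the class and method names are chosen for illustration) showing why removal needs some care. A word's nodes can be shared with longer words, so after clearing the end-of-word flag, only the branch that no other word uses may be pruned.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class TrieNode:
    """One node: child links keyed by character, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.end = False


class Trie:
    """Illustrative dictionary trie with add, remove and membership test."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.end = True

    def __contains__(self, word: str) -> bool:
        node: Optional[TrieNode] = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word: str) -> None:
        # Record the path from the root so unused branches can be pruned later.
        path: List[Tuple[str, TrieNode]] = [("", self.root)]
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return  # word is not in the trie: nothing to remove
            path.append((ch, child))
            node = child
        if not node.end:
            return  # only a prefix of other words, not a word itself
        node.end = False
        # Walk back towards the root, deleting nodes that no longer lead to a word.
        for i in range(len(path) - 1, 0, -1):
            ch, child = path[i]
            if child.end or child.children:
                break  # still used by another word
            parent = path[i - 1][1]
            del parent.children[ch]


trie = Trie(["กา", "การ", "การบ้าน"])
trie.remove("การ")
print("การ" in trie, "การบ้าน" in trie)  # False True
```

Removing "การ" only clears its end-of-word flag, because its last node still leads to "การบ้าน"; removing "การบ้าน" afterwards would prune every node below "กา".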

wangchanberta

  • #540 Add: wangchanberta (pythainlp.wangchanberta)

Update Schedule