PyThaiNLP 5.0 Change Log
wannaphong opened this issue · 2 comments
wannaphong commented
Schedule
- First Beta release: 5 February 2024
- Production release: 10 February 2024
See 5.0 Milestone.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
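As a rough illustration of what an SPDX header looks like and how a script might check for one, here is a minimal sketch. The helper name `spdx_identifier`, the 10-line search window, and the use of `Apache-2.0` as the sample identifier are assumptions made for this example; see #876 for the headers actually adopted.

```python
import re

# Matches a comment line such as "# SPDX-License-Identifier: Apache-2.0".
SPDX_LINE = re.compile(r"^#\s*SPDX-License-Identifier:\s*(\S.*)$")


def spdx_identifier(source, max_lines=10):
    """Return the SPDX license expression found near the top of a file, or None."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical file contents used only for this example.
sample = "# -*- coding: utf-8 -*-\n# SPDX-License-Identifier: Apache-2.0\nimport os\n"
found = spdx_identifier(sample)
```

A check like this can run over every source file in CI to catch files that are missing the header.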
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
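Code that has to run on both 4.x and 5.0 may want to look up is_native_thai in its new location first and fall back to the old one. The sketch below shows one way to do that with the standard library; `import_first` is a hypothetical helper written for this example, not part of PyThaiNLP, and it returns None when no candidate module provides the name.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module (or its parent package) is not installed
        if hasattr(module, name):
            return getattr(module, name)
    return None


# New location first (5.0), old location second (4.x).
is_native_thai = import_first(
    "is_native_thai", ["pythainlp.morpheme", "pythainlp.util"]
)
```

When PyThaiNLP is known to be installed, a plain try/except ImportError around the two import statements does the same job.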
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
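The changelog does not state why tzdata is needed. One plausible reading is that Python's zoneinfo module falls back to the tzdata package when no system time zone database is available, which is the usual situation on Windows. The check below is a sketch under that assumption; `has_timezone` is a hypothetical helper, not a PyThaiNLP function.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def has_timezone(key):
    """Return True if the IANA time zone `key` can be loaded on this machine."""
    try:
        ZoneInfo(key)
    except (ZoneInfoNotFoundError, ValueError):
        return False
    return True


# False here would suggest that neither a system database nor tzdata is present.
bangkok_available = has_timezone("Asia/Bangkok")
```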
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el for entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
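Among the additions above, pythainlp.util.to_idn (#875) relates to internationalized domain names. The changelog does not show its signature, so the sketch below uses only Python's built-in idna codec to show the kind of conversion involved. The function names `to_ascii_domain` and `from_ascii_domain` are hypothetical and are not the library's API.

```python
def to_ascii_domain(domain):
    """Convert a Unicode domain name to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def from_ascii_domain(domain):
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# Each non-ASCII label gets an "xn--" prefix; ASCII labels such as "com" are kept.
encoded = to_ascii_domain("ไทย.com")
decoded = from_ascii_domain(encoded)
```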
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details and explanations of the code, models, and examples by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
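The re.split fix (#832) touches a common Python pitfall: the third positional parameter of re.split is maxsplit, not flags, so a flag passed positionally is silently treated as a split limit. The changelog does not show the original code, so the sketch below only demonstrates the general pitfall with made-up input.

```python
import re

text = "aXbxcXd"

# Buggy form: re.split(r"x", text, re.IGNORECASE)
# The flag lands in `maxsplit`; re.IGNORECASE has the integer value 2, so the
# call behaves like the line below: case-sensitive, at most two splits.
buggy = re.split(r"x", text, maxsplit=int(re.IGNORECASE))

# Fixed form: pass the flag by keyword.
fixed = re.split(r"x", text, flags=re.IGNORECASE)

print(buggy)  # ['aXb', 'cXd']
print(fixed)  # ['a', 'b', 'c', 'd']
```

The same applies to re.sub, whose fourth positional parameter is count rather than flags.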
Tag
- Add function for POS tagging with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
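Duplicate keys matter because a Python dict keeps only the last value for a repeated key and raises no error, so an earlier mapping entry can be lost without notice. The sketch below shows one way to detect duplicates before building a mapping. The keys and values are placeholders, not entries from the real phoneme tables, and `find_duplicate_keys` is a hypothetical helper.

```python
from collections import Counter


def find_duplicate_keys(pairs):
    """Return the sorted keys that appear more than once in (key, value) pairs."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


# Placeholder data: "k1" is listed twice with different values.
pairs = [("k1", "x"), ("k2", "y"), ("k1", "z")]

mapping = dict(pairs)  # the later entry wins silently: mapping["k1"] == "z"
duplicates = find_duplicate_keys(pairs)
```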
Corpus
- Add pythainlp.corpus.thai_orst_words(): Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles(): Thai word list (nouns and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words(): Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words(): Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
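To give a feel for what removing trailing repeated consonants means, here is a deliberately simplified sketch using a regular expression. It is not the algorithm from #862: the function name `collapse_trailing_consonants` is hypothetical, and this version would also shorten real words that legitimately end in a doubled consonant.

```python
import re

# A Thai consonant (ก to ฮ) followed by one or more copies of itself,
# at the end of a whitespace-delimited token.
TRAILING_REPEAT = re.compile(r"([ก-ฮ])\1+(?=\s|$)")


def collapse_trailing_consonants(text):
    """Collapse a run of the same trailing consonant to a single character."""
    return TRAILING_REPEAT.sub(r"\1", text)


print(collapse_trailing_consonants("เริ่ดดดดด"))  # เริ่ด
print(collapse_trailing_consonants("อะไรรรร"))  # อะไร
```

A practical implementation needs a word list to tell emphasis (as in the examples) from correct spellings, which this sketch does not attempt.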
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: v4.0.2...v5.0.0
Contributors
Thanks to all the contributors. (Image made with contributors-img)
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
wannaphong commented
- The next beta release of PyThaiNLP 4.1 will be released after the end of Hacktoberfest 2023.
wannaphong commented
PyThaiNLP has a major change to the tokenizer (see #856), so the next release of PyThaiNLP will be PyThaiNLP v5.0!