Primary language: Jupyter Notebook

TextProcessing

  1. Generate word-level features
  2. Generate sentence-level features

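---------NLP metrics----------

  • sum_nsyl: total number of syllables
  • mean_nsyl: mean number of syllables
  • median_bncSpkFreq: median frequency of the words in the sentence (BNC spoken English)
  • mean_idf: mean IDF of the words in the sentence
  • nterm: number of words
  • type_token_ratio: type-token ratio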
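
As a rough illustration of how several of these sentence-level metrics could be computed, here is a minimal sketch. It is not the repository's implementation: the helper names (`tokenize`, `count_syllables`, `idf_table`, `sentence_features`) are hypothetical, the syllable counter is a crude vowel-group heuristic, and IDF is assumed to be `log(N / df)` over a list of sentences. `median_bncSpkFreq` is left out because it needs an external BNC spoken-English frequency list.

```python
import math
import re
from statistics import mean


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def idf_table(sentences):
    """Map each word to log(N / df), where df counts the sentences containing it."""
    n = len(sentences)
    df = {}
    for s in sentences:
        for w in set(tokenize(s)):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / c) for w, c in df.items()}


def sentence_features(sentence, idf):
    """Compute a subset of the metrics listed above for one sentence."""
    tokens = tokenize(sentence)
    if not tokens:
        # An empty sentence has no meaningful averages; return zeros.
        return {"sum_nsyl": 0, "mean_nsyl": 0.0, "nterm": 0,
                "type_token_ratio": 0.0, "mean_idf": 0.0}
    nsyl = [count_syllables(t) for t in tokens]
    return {
        "sum_nsyl": sum(nsyl),
        "mean_nsyl": mean(nsyl),
        "nterm": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Words missing from the table are assumed to have IDF 0.
        "mean_idf": mean(idf.get(t, 0.0) for t in tokens),
    }


corpus = ["the cat sat on the mat", "the dog barked"]
features = sentence_features(corpus[0], idf_table(corpus))
```

Here `features["nterm"]` counts all tokens, while `type_token_ratio` divides the number of distinct tokens by that count, so repeated words lower the ratio.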

---------Readability metrics----------

Citation: https://github.com/mmautner/readability

  • ARI
  • ColemanLiauIndex
  • FleschKincaidGradeLevel
  • FleschReadingEase
  • GunningFogIndex
  • LIX
  • RIX
  • SMOGIndex
  • ...
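
For reference, three of the indices above follow well-known closed-form formulas over simple counts. The sketch below writes them as standalone functions and does not call the cited `readability` package; the function names and the count-based signatures are assumptions made for illustration, and the counts themselves (characters, words, sentences, syllables) are expected to come from whatever tokenizer is in use.

```python
def ari(n_chars, n_words, n_sentences):
    """Automated Readability Index from character, word and sentence counts."""
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * n_words / n_sentences - 84.6 * n_syllables / n_words


def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * n_words / n_sentences + 11.8 * n_syllables / n_words - 15.59


# Example: one sentence of 10 words, 40 characters and 12 syllables.
scores = {
    "ARI": ari(40, 10, 1),
    "FleschReadingEase": flesch_reading_ease(10, 1, 12),
    "FleschKincaidGradeLevel": flesch_kincaid_grade(10, 1, 12),
}
```

All three functions divide by the word or sentence count, so callers should skip empty texts before computing them.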