webpage_categorization

A Chinese Webpage Title Text Categorization Tool (short-text categorization)

Primary language: C


dependencies:

  • gcc>=4.9
  • All other libraries are bundled with the project: jieba (Chinese word segmentation), LIBSVM, and SQLite.

hints:

  • For best performance when categorizing short text, inputs of about 20 words are recommended.

  • This is just a practice project I completed as a postgraduate student. For God's sake, don't blame me for the messy code.

  • Mapping from output ID to category:

      1. economy 经济金融 (economy and finance)
      2. education 教育
      3. entertainment 娱乐八卦 (entertainment and celebrity gossip)
      4. sports 体育
      5. IT 科技 (science and technology)