/Chinese-Text-MultiClassification

A repo about the architecture and methods of chinese text multiClassification,which may be include classification algorithms and deep learning method.

Primary LanguagePython

Chinese Text MultiClassification

Description: This project was created for text classification,as we all know, text classication played an importent role in nlp tasks,so it was worth studying with different algorithms,and in this repo we will supply different methods to complete classification,include deep learning and traditional algorithms like svm, naive bayes. The main features of this project that we develop a completed framework of classification from data pre-processing,feature selection, model training and predict, and model update strategy,so that it can be easily migrated and used.

Table of Contents

Traditional Approach

TBD

Feature Selection && Extraction

TBD

Deep Learning Approach

Word Embedding

Text Classification

As the deep learning algorithms develops, we can build a noval text classification easily instead of doing much more feature engineering,bellow I will give some common methods or algorithms to build a classification system,which can achive the state-of-art preformance.

fastText

fastText is an useful library designed by Facebook team, which can be used for word representations and sentence classification.In this post I had create a vanillia demo based on pyfasttext a python package of fastText.

Dependent

pyfasttext a python binding for fastText, you can install it like,

pip install pyfasttext

If you have any install problems you push issues here

Enjoy Demo
bash run.sh

In this bash, fristly it will be download a toy classification data created by myself, second it will train a classification model based on those data, finally it will predict the samples from valid data and write the results into a new file in result.The data contains three labels, every label in each sample start with 'label' which is the default label format as raw fastText, you also can set your own label format by apis.