Program1

fastText, CNN, and LSTM models based on TensorFlow or PyTorch

Introduction

This program provides an English training set and an English test set for text classification. The data format is as follows:

label3 rick denzien rick denzien is a songwriter singer and studio technician from buffalo new york .
label14 talking to strange men talking to strange men is a 1987 novel by british writer ruth rendell .
label6 porsche 550 the porsche 550 was a sports car produced by porsche from 1953-1956 .
label11 mendoncella mendoncella is a genus of flowering plants from the orchid family orchidaceae .
label5 gino matrundola gino matrundola ( born july 21 1940 ) is a former politician in ontario canada .
······

In the examples above, "label3" is the label of the text "rick denzien rick denzien is a songwriter singer and studio technician from buffalo new york". You do not need to know the exact meaning of label3, because it does not affect the final result.
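As a minimal sketch of how the data could be read, the snippet below splits each sample into a label and its text. It assumes one sample per line with the label as the first whitespace-separated token; the helper names `parse_line`, `parse_samples`, and `build_label_index` are hypothetical and not part of the provided material. Because the meaning of a label does not matter, each label is simply mapped to an integer id.

```python
from typing import Dict, List, Tuple


def parse_line(line: str) -> Tuple[str, str]:
    """Split one sample into (label, text); the label is the first token."""
    label, _, text = line.strip().partition(" ")
    return label, text


def parse_samples(lines: List[str]) -> List[Tuple[str, str]]:
    """Parse every non-empty line (assumption: one sample per line)."""
    return [parse_line(line) for line in lines if line.strip()]


def build_label_index(samples: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map each distinct label string to an integer id (string sort order)."""
    labels = sorted({label for label, _ in samples})
    return {label: idx for idx, label in enumerate(labels)}


samples = parse_samples([
    "label6 porsche 550 the porsche 550 was a sports car produced by porsche from 1953-1956 .",
    "label11 mendoncella mendoncella is a genus of flowering plants from the orchid family orchidaceae .",
])
label_to_id = build_label_index(samples)
```

The integer ids produced here are arbitrary; any consistent mapping works as long as the same one is used for training and testing.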

Requirement

Algorithm Design

Develop three models, one each for fastText, CNN, and LSTM, based on TensorFlow or PyTorch.

Outcome Evaluation

The accuracy should be greater than 85%. The F1 score should not be less than 0.8.
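The two thresholds can be checked with a short script such as the sketch below. The requirement does not say how F1 is averaged across labels, so macro averaging (the unweighted mean of per-label F1) is an assumption here; the function names `accuracy`, `macro_f1`, and `meets_requirements` are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 (assumption: macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def meets_requirements(y_true: Sequence[str], y_pred: Sequence[str]) -> bool:
    """Accuracy must exceed 0.85 and F1 must be at least 0.8."""
    return accuracy(y_true, y_pred) > 0.85 and macro_f1(y_true, y_pred) >= 0.8


y_true = ["label3", "label3", "label6", "label6"]
y_pred = ["label3", "label6", "label6", "label6"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
ok = meets_requirements(y_true, y_pred)
```

If a different averaging scheme is intended (for example micro or weighted F1), only `macro_f1` needs to change.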

Code Submission

All code should be submitted to GitHub with a clear introduction.

Report

Basic Information

fastText

reference link

CNN