In this design project, we design a sequence labelling model for informal texts using the hidden Markov model (HMM) that we have learned in class. We hope that this sequence labelling system can serve as the first step towards building a more complex, intelligent sentiment analysis system for social media text. Specifically, we focus on building two NLP systems for Tweets: a sentiment analysis system and a phrase chunking system. The data for this project is provided in EN.zip, FR.zip, SG.zip, and CN.zip (the latter two will be available on 9 Nov 2018, after everyone has finished part 1). For each dataset, we provide a labelled training set train, an unlabelled development set dev.in, and a labelled development set dev.out. The labelled data has one token per line, with the token and its tag separated by a tab, and a single empty line separating sentences.
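The labelled-data format described above can be read with a few lines of Python. This is a minimal sketch, not the project's actual loader: the function name read_labelled and the sample tags (O, B-positive, B-neutral) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # a sentence is a list of (token, tag) pairs


def read_labelled(lines: Iterable[str]) -> List[Sentence]:
    """Parse 'token<TAB>tag' lines; an empty line ends the current sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last tab so the tag is always the final field.
        token, tag = line.rsplit("\t", 1)
        current.append((token, tag))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


# Hypothetical two-sentence sample in the described format.
sample = "I\tO\nlove\tB-positive\n\nParis\tB-neutral\n"
parsed = read_labelled(sample.splitlines())
print(parsed)
```

To read a real training file, pass an open file handle instead of the sample lines, for example read_labelled(open("train", encoding="utf-8")), assuming the file has been extracted from one of the zip archives.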
Implement the HMM and estimate the transition parameters a and the emission parameters b.
Implement the Viterbi algorithm.
Implement an HMM with second-order dependencies.
Implement a structured perceptron. More information is available in ML_Report.pdf.
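As a rough illustration of the first two tasks, the sketch below estimates a and b by maximum likelihood (relative frequency counts) and decodes with first-order Viterbi in log space. It makes several assumptions that are not taken from the project brief: a denotes transitions and b denotes emissions (the usual HMM convention), the boundary states are named `<START>` and `<STOP>`, unseen transitions and emissions receive a small floor probability instead of a proper unknown-word treatment, and the toy tags D, N, V are invented. The second-order HMM and the structured perceptron are not covered here; see ML_Report.pdf for those.

```python
import math
from collections import Counter, defaultdict

START, STOP = "<START>", "<STOP>"  # hypothetical names for the boundary states


def estimate(sentences):
    """MLE: a[u][v] = count(u -> v) / count(u), b[u][o] = count(u emits o) / count(u)."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in sentences:
        prev = START
        for token, tag in sent:
            trans[prev][tag] += 1
            emit[tag][token] += 1
            prev = tag
        trans[prev][STOP] += 1
    a = {u: {v: c / sum(cs.values()) for v, c in cs.items()} for u, cs in trans.items()}
    b = {u: {o: c / sum(cs.values()) for o, c in cs.items()} for u, cs in emit.items()}
    return a, b


def viterbi(tokens, a, b, floor=1e-10):
    """Return the most likely tag sequence for tokens under a first-order HMM."""
    if not tokens:
        return []
    tags = list(b)

    def lp(table, x, y):
        # Assumption: unseen events get a small floor probability.
        return math.log(table.get(x, {}).get(y, floor))

    # scores[i][v]: best log-probability of any tag path ending in v at position i.
    scores = [{v: lp(a, START, v) + lp(b, v, tokens[0]) for v in tags}]
    back = []  # back[i][v]: best previous tag for v at position i + 1
    for tok in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for v in tags:
            best_u = max(tags, key=lambda u: prev[u] + lp(a, u, v))
            cur[v] = prev[best_u] + lp(a, best_u, v) + lp(b, v, tok)
            ptr[v] = best_u
        scores.append(cur)
        back.append(ptr)
    # Include the transition into STOP when choosing the final tag.
    last = max(tags, key=lambda u: scores[-1][u] + lp(a, u, STOP))
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy training data with invented tags, only to exercise the code.
train = [
    [("the", "D"), ("dog", "N"), ("barks", "V")],
    [("the", "D"), ("cat", "N"), ("sleeps", "V")],
]
a, b = estimate(train)
tagged = viterbi(["the", "cat", "barks"], a, b)
print(tagged)  # ['D', 'N', 'V']
```

Working in log space avoids numerical underflow on long sentences, and the floor value is a placeholder that should be replaced by whatever unknown-word handling the project specifies.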