pmarkovic/comp_ling

My work for the Computational Linguistics course

Computational Linguistics class

Assignment 1

Empirically verified Zipf’s law on the following freely available corpora: the King James Bible, The Jungle Book, and the SETIMES Turkish-Bulgarian parallel newspaper text.
Reimplementation of the “Dissociated Press” system, which generates random text from an n-gram model over a corpus.
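
As a rough illustration of both tasks, the sketch below computes a rank-frequency table (the quantity a Zipf's law check inspects) and generates text from an n-gram model. The function names `rank_frequency`, `build_ngram_model` and `generate`, the toy sentence, and the restriction to n >= 2 are assumptions made for this example; they are not taken from the notebooks.

```python
import random
from collections import Counter, defaultdict


def rank_frequency(tokens):
    """Return (rank, count) pairs, most frequent word first."""
    counts = Counter(tokens)
    return [(rank, count)
            for rank, (_, count) in enumerate(counts.most_common(), start=1)]


def build_ngram_model(tokens, n=2):
    """Map each (n-1)-token context to the tokens that followed it (n >= 2)."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, seed, length=20, rng=None):
    """Extend `seed` by sampling observed followers of the current context.

    `seed` must have n-1 tokens. Generation stops early at a context
    that never occurred in the corpus.
    """
    rng = rng or random.Random(0)
    out = list(seed)
    k = len(seed)
    while len(out) < length:
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break
        # Sampling from the raw follower list reproduces corpus frequencies.
        out.append(rng.choice(followers))
    return out


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat and the cat ran".split()
print(rank_frequency(tokens)[:3])
model = build_ngram_model(tokens, n=2)
print(" ".join(generate(model, seed=("the",), length=8)))
```

Under Zipf's law, plotting log(count) against log(rank) should give a roughly straight line with a slope near -1; on a real corpus the pairs returned by `rank_frequency` could be passed to a log-log plot to check this.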

Assignment 2

Implementation, from scratch, of a bigram part-of-speech (POS) tagger based on a hidden Markov model (HMM) and the Viterbi algorithm.
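
A minimal sketch of Viterbi decoding for a bigram HMM is shown below. It assumes the probabilities are already estimated and stored in plain dictionaries; the names `start_p`, `trans_p` and `emit_p` and the toy numbers are hypothetical, and the sketch omits smoothing and unknown-word handling, which a real tagger would need.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a non-empty `words` list."""
    def log(p):
        # Log-space avoids underflow; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log-probability of any tag path ending in t at position i.
    best = [{t: log(start_p.get(t, 0)) + log(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the previous tag that maximises path score plus transition.
            prev, score = max(
                ((p, best[i - 1][p] + log(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score + log(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model, purely illustrative.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.7, "NOUN": 0.3},
    "VERB": {"DET": 0.6, "NOUN": 0.4},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.9, "runs": 0.1},
}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
```

The dynamic program runs in O(n * |tags|^2) time for a sentence of n words, since each cell considers every possible previous tag.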

Assignment 3

Implementation of the Cocke-Kasami-Younger (CKY) algorithm for bottom-up CFG parsing, applied to both the word problem (recognition) and the parsing problem for English.
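
The following sketch shows a CKY recognizer, which covers only the word problem: deciding whether a sentence is in the language of the grammar. It assumes the grammar is already in Chomsky normal form and is given as two hypothetical dictionaries, `lexical` (word to nonterminals) and `binary` (pair of nonterminals to parent nonterminals); the toy grammar is made up for illustration.

```python
from collections import defaultdict


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `start` derives `words` under a CNF grammar."""
    n = len(words)
    # chart[(i, j)]: set of nonterminals that derive words[i:j].
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexical.get(word, ()))

    # Fill longer spans from shorter ones, bottom-up.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= binary.get((left, right), set())
    return start in chart[(0, n)]


# Toy CNF grammar, purely illustrative.
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}
print(cky_recognize("the dog chased the cat".split(), lexical, binary))
print(cky_recognize("dog the chased".split(), lexical, binary))
```

Solving the parsing problem would additionally require storing backpointers in each chart cell so that parse trees can be read off afterwards; that part is left out here.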

Assignment 4

Implementation of the IBM Model 1 word aligner for statistical machine translation, trained on 100,000 English-French sentence pairs. Additionally, compared the results with a simple baseline and the fast_align implementation.
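
The EM training loop at the heart of IBM Model 1 can be sketched as follows. This is an assumption-laden illustration: the function name `train_ibm1`, the translation direction (estimating P(f | e) for French word f given English word e), the `"NULL"` token and the three toy sentence pairs are all choices made for this example and may differ from the notebook.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of P(f | e), with EM.

    `pairs` is a list of (french_tokens, english_tokens) tuples.
    """
    french_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(french_vocab))

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for fs, es in pairs:
            es = ["NULL"] + es  # allow French words to align to nothing
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy parallel corpus, purely illustrative.
pairs = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["maison"], ["house"]),
]
t = train_ibm1(pairs)
print(t[("maison", "house")] > t[("maison", "the")])
```

Once training has converged, an alignment can be read off by linking each French word to the English word (or `NULL`) with the highest `t[(f, e)]` in the same sentence pair.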