Assignments for Coursera Natural Language Processing by Michael Collins, Columbia University

---- H1: Hidden Markov Models ----

Instructions: see h1/h1.pdf

hmm.py
    Hmm_ex, extending Hmm, computes and stores:
    * e(x|y)
    * q(y_i | y_i-1, y_i-2)
    * count(x)
    * rare words
    * all tags
    * all words

    SimpleTagger does simple tagging as instructed in Part 1.
    ViterbiTagger does Viterbi tagging as instructed in Part 2.

p1.py
    Part 1
p2.py
    Part 2
p3.py
    Part 3. Not as good as required: my F1-Score is 35.009, while the goal is 39.519.

util.py
    Helper methods, including:
    * rare-word handling (applying different rules)
    * a test-data iterator

---- H2: Probabilistic Context-Free Grammar (PCFG) ----

Instructions: see h2/h2.pdf

pcfg.py
    PCFG, extending Count, computes and stores:
    * q(X -> Y1 Y2)
    * q(X -> w)

    CKYTagger implements the CKY algorithm.

p1.py
    Part 1
p2.py
    Part 2
p3.py
    Part 3

Expected development total F1-Scores are 0.79 for Part 2 and 0.83 for Part 3.

---- H3: IBM Model 1 & 2 ----

Instructions: see h3/h3.pdf

ibmmodel.py
    Count computes and stores:
    * t(f|e)

    IBMModel1 implements the EM and alignment algorithms.

p1.py
    Part 1

The expected development F-Scores are 0.420 and 0.449, and a basic intersection alignment should give 0.485 for the last part.

---- H4: GLM ----

glm.py

Part 1
```
Found 1337 GENEs. Expected 642 GENEs; Correct: 280.

        precision    recall       F1-Score
GENE:   0.209424     0.436137     0.282971
```

Part 2
```
Found 775 GENEs. Expected 642 GENEs; Correct: 390.

        precision    recall       F1-Score
GENE:   0.503226     0.607477     0.550459
```

Part 3
```
Found 571 GENEs. Expected 642 GENEs; Correct: 366.

        precision    recall       F1-Score
GENE:   0.640981     0.570093     0.603462
```
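For reference, the trigram Viterbi decoding used in H1 (Part 2) can be sketched as follows. This is not the repo's ViterbiTagger; it is a minimal standalone version, assuming `q` and `e` are plain dicts keyed by `(w, u, v)` tag trigrams and `(word, tag)` pairs, with `*` and `STOP` as the boundary symbols from the assignment:

```python
def viterbi_trigram(sentence, tags, q, e):
    """Trigram-HMM Viterbi decoder (sketch).

    q[(w, u, v)]   = q(v | w, u), trigram transition probability
    e[(word, tag)] = e(word | tag), emission probability
    Assumes the best tag sequence has nonzero probability under q and e.
    """
    n = len(sentence)
    # pi[(k, u, v)] = max probability of any tag sequence ending in (u, v) at position k
    pi = {(0, '*', '*'): 1.0}
    bp = {}
    tagset = lambda k: ['*'] if k <= 0 else tags
    for k in range(1, n + 1):
        word = sentence[k - 1]
        for u in tagset(k - 1):
            for v in tagset(k):
                best, best_w = 0.0, None
                for w in tagset(k - 2):
                    p = (pi.get((k - 1, w, u), 0.0)
                         * q.get((w, u, v), 0.0)
                         * e.get((word, v), 0.0))
                    if p > best:
                        best, best_w = p, w
                pi[(k, u, v)] = best
                bp[(k, u, v)] = best_w
    # terminate with the STOP transition and pick the best final tag pair
    best, best_uv = 0.0, None
    for u in tagset(n - 1):
        for v in tagset(n):
            p = pi.get((n, u, v), 0.0) * q.get((u, v, 'STOP'), 0.0)
            if p > best:
                best, best_uv = p, (u, v)
    # follow backpointers from the final pair to recover the sequence
    result = list(best_uv)
    for k in range(n, 2, -1):
        result.insert(0, bp[(k, result[0], result[1])])
    return result[-n:]  # drop the leading '*' padding for short sentences
```

In practice the assignment works in log space to avoid underflow on long sentences; the multiplication above is kept for readability.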
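The CKY algorithm implemented by CKYTagger in H2 can likewise be sketched. This is an illustrative version only, assuming a grammar in Chomsky normal form with hypothetical dict layouts: `binary_rules[X]` is a list of `(Y, Z, prob)` for rules X -> Y Z, and `unary_rules[(X, word)]` gives q(X -> word):

```python
import math
from collections import defaultdict

def cky(words, binary_rules, unary_rules, start='S'):
    """CKY decoder for a CNF PCFG (sketch).

    Returns (log-probability, parse tree) for the start symbol over the
    whole sentence, or None if no parse exists.
    """
    n = len(words)
    # pi[(i, j)][X] = (best log-prob, tree) for X spanning words[i..j] inclusive
    pi = defaultdict(dict)
    for i, w in enumerate(words):
        for (X, word), p in unary_rules.items():
            if word == w:
                pi[(i, i)][X] = (math.log(p), (X, w))
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            for X, rules in binary_rules.items():
                best = None
                for Y, Z, p in rules:
                    for s in range(i, j):  # split point between Y and Z
                        if Y in pi[(i, s)] and Z in pi[(s + 1, j)]:
                            lp = (math.log(p)
                                  + pi[(i, s)][Y][0]
                                  + pi[(s + 1, j)][Z][0])
                            if best is None or lp > best[0]:
                                best = (lp, (X, pi[(i, s)][Y][1],
                                                pi[(s + 1, j)][Z][1]))
                if best is not None:
                    pi[(i, j)][X] = best
    return pi[(0, n - 1)].get(start)
```

The assignment additionally replaces rare words with a `_RARE_` token before lookup, which this sketch omits.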
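The EM estimation of t(f|e) for IBM Model 1 in H3 can be sketched as below. This is not the repo's IBMModel1 class; it is a minimal single-direction version with uniform initialisation (the assignment initialises with 1/n(e) instead), and the alignment helper picks, for each foreign word, its most probable English word including the prepended NULL:

```python
from collections import defaultdict

def ibm1_em(pairs, iterations=10, null='_NULL_'):
    """EM for IBM Model 1, estimating t(f | e) (sketch).

    pairs: list of (english_sentence, foreign_sentence), each a word list.
    A NULL word is prepended to every English sentence, as in the assignment.
    """
    pairs = [([null] + e, f) for e, f in pairs]
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count_fe = defaultdict(float)  # expected counts c(e, f)
        count_e = defaultdict(float)   # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normaliser over alignments of f
                for e in es:
                    delta = t[(f, e)] / z
                    count_fe[(f, e)] += delta
                    count_e[e] += delta
        # M-step: renormalise expected counts
        t = defaultdict(float, {(f, e): c / count_e[e]
                                for (f, e), c in count_fe.items()})
    return t

def align(es, fs, t, null='_NULL_'):
    """Align each foreign word to its most probable English position (0 = NULL)."""
    es = [null] + es
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]
```

Model 2 adds alignment distortion parameters q(j | i, l, m) on top of this; the intersection heuristic for the last part runs the model in both translation directions and keeps only alignment points found by both.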