/Coursera_NLP_MC

Coursera Natural Language Processing by Michael Collins Columbia University

Assignments for the Coursera Natural Language Processing course by Michael Collins, Columbia University
----

H1: Hidden Markov Models
----
For instructions, refer to h1/h1.pdf.

hmm.py
    Hmm_ex, extending Hmm, calculates and stores:
        * e(x|y)
        * q(y_i|y_i-1, y_i-2)
        * count(x)
        * rare_word
        * all tags
        * all words
    SimpleTagger does simple tagging as instructed by Part 1
    ViterbiTagger does Viterbi tagging as instructed by Part 2
p1.py
    Part 1
p2.py
    Part 2
p3.py
    Part 3
    Result falls short of the target: the F1-Score is 35.009 and the goal F1-Score is 39.519.
util.py
    Helper methods, including:
        * rare-word handling (applying different rules)
        * a test data iterator
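
Rare-word handling with interchangeable rules might look like the sketch below. The threshold of 5, the class names such as `_RARE_` and `_NUMERIC_`, and the function names are assumptions for illustration; the actual rules live in util.py and h1/h1.pdf.

```python
from collections import Counter
import re

def rare_class(word):
    """One possible rule set: map a rare word to a coarse class (names are assumed)."""
    if re.search(r"\d", word):
        return "_NUMERIC_"
    if word.isupper():
        return "_ALL_CAPS_"
    if word[-1:].isupper():
        return "_LAST_CAP_"
    return "_RARE_"

def replace_rare(sentences, threshold=5, classify=lambda w: "_RARE_"):
    """Replace every word seen fewer than `threshold` times using `classify`."""
    counts = Counter(w for sent in sentences for w in sent)
    return [[w if counts[w] >= threshold else classify(w) for w in sent]
            for sent in sentences]

corpus = [["the", "BRCA1"], ["the", "DNA"], ["the", "geneX"]]
print(replace_rare(corpus, threshold=2))
print(replace_rare(corpus, threshold=2, classify=rare_class))
```

Passing the rule as a function keeps the counting logic shared between the single-class and multi-class variants.
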

----

H2: Probabilistic Context-Free Grammar (PCFG)
----
For instructions, refer to h2/h2.pdf.

pcfg.py
    PCFG, extending Count, calculates and stores:
        * q(X->Y1Y2)
        * q(X->w)
    CKYTagger implements the CKY algorithm
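
For orientation, here is a minimal sketch of CKY over the two rule types above, q(X->Y1Y2) and q(X->w). It is not the CKYTagger from pcfg.py: the function name `cky`, the dictionary layout of the rule tables, the root symbol `S`, and the nested-list tree format are all assumptions for the example.

```python
def cky(words, binary, unary, root="S"):
    """Return (probability, tree) of the best parse.

    binary: {(X, Y, Z): q(X -> Y Z)}, unary: {(X, w): q(X -> w)} (assumed layout).
    """
    n = len(words)
    pi, bp = {}, {}
    # Base case: spans of length one are covered by lexical rules X -> w.
    for i, w in enumerate(words):
        for (X, word), p in unary.items():
            if word == w:
                pi[(i, i, X)] = p
    # Recursive case: combine two adjacent sub-spans with a binary rule.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            for (X, Y, Z), p in binary.items():
                for s in range(i, j):
                    score = p * pi.get((i, s, Y), 0.0) * pi.get((s + 1, j, Z), 0.0)
                    if score > pi.get((i, j, X), 0.0):
                        pi[(i, j, X)] = score
                        bp[(i, j, X)] = (s, Y, Z)

    def build(i, j, X):
        # Follow back-pointers to rebuild the tree as nested lists.
        if i == j:
            return [X, words[i]]
        s, Y, Z = bp[(i, j, X)]
        return [X, build(i, s, Y), build(s + 1, j, Z)]

    best = pi.get((0, n - 1, root), 0.0)
    return best, (build(0, n - 1, root) if best > 0.0 else None)

# Toy grammar, invented for the example.
binary = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0}
unary = {("DT", "the"): 1.0, ("NN", "dog"): 0.5, ("VP", "barks"): 1.0}
prob, tree = cky(["the", "dog", "barks"], binary, unary)
print(prob, tree)
```

Iterating over every binary rule for every span is simple but slow; indexing rules by left-hand side is a common speed-up on the real grammar.
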
p1.py
    Part 1
p2.py
    Part 2
    The expected total F1-Scores on the development set are 0.79 for Part 2 and 0.83 for Part 3.
p3.py
    Part 3

----
H3: IBM Model 1 & 2
----
For instructions, refer to h3/h3.pdf.

ibmmodel.py
    Count stores:
        * t(f|e)
    IBMModel1 implements the EM and alignment algorithms
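
The EM estimation of t(f|e) and the alignment step can be sketched as follows. This is an illustrative version under stated assumptions, not the IBMModel1 class itself: the function names, the `NULL` source word, the uniform initialisation over co-occurring words, and the toy corpus are all invented for the example.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=5):
    """pairs: list of (english_tokens, foreign_tokens). Returns t[(f, e)] = t(f|e)."""
    # Assumption: each English sentence is prefixed with a NULL word.
    pairs = [(["NULL"] + e_sent, f_sent) for e_sent, f_sent in pairs]
    # Initialise t(f|e) uniformly over the foreign words that co-occur with e.
    cooc = defaultdict(set)
    for e_sent, f_sent in pairs:
        for e in e_sent:
            cooc[e].update(f_sent)
    t = {(f, e): 1.0 / len(fs) for e, fs in cooc.items() for f in fs}
    for _ in range(iterations):
        count_fe, count_e = defaultdict(float), defaultdict(float)
        # E-step: distribute each foreign word over the English words in its sentence.
        for e_sent, f_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    delta = t[(f, e)] / z
                    count_fe[(f, e)] += delta
                    count_e[e] += delta
        # M-step: renormalise the expected counts.
        t = {(f, e): c / count_e[e] for (f, e), c in count_fe.items()}
    return t

def align(t, e_sent, f_sent):
    """For each foreign word, pick the English position (0 = NULL) maximising t(f|e)."""
    e_sent = ["NULL"] + e_sent
    return [max(range(len(e_sent)), key=lambda j: t.get((f, e_sent[j]), 0.0))
            for f in f_sent]

corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]
t = train_ibm_model1(corpus, iterations=10)
print(align(t, ["the", "house"], ["das", "Haus"]))
```

Model 1 ignores word positions, so the alignment reduces to an independent argmax per foreign word.
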


p1.py
    Part 1

The expected development F-Scores are 0.420 and 0.449 for the earlier parts, and a basic intersection alignment should give 0.485 for the last part.

----
H4: Global Linear Models (GLM)
----

glm.py
Part 1

```
Found 1337 GENEs. Expected 642 GENEs; Correct: 280.
 
	 precision 	recall 		F1-Score
GENE:	 0.209424	0.436137	0.282971
```

Part 2

```
Found 775 GENEs. Expected 642 GENEs; Correct: 390.
 
	 precision 	recall 		F1-Score
GENE:	 0.503226	0.607477	0.550459
```

Part 3

```
Found 571 GENEs. Expected 642 GENEs; Correct: 366.
 
	 precision 	recall 		F1-Score
GENE:	 0.640981	0.570093	0.603462
```
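
The precision, recall, and F1-Score figures in the outputs above follow directly from the Found, Expected, and Correct counts. A small sketch of that arithmetic (the helper name `prf` is made up for this example):

```python
def prf(found, expected, correct):
    """Precision, recall, and F1 from entity counts."""
    precision = correct / found      # share of predicted GENEs that are right
    recall = correct / expected      # share of gold GENEs that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# (found, correct) pairs taken from the three outputs above; expected is 642 each time.
for part, (found, correct) in enumerate([(1337, 280), (775, 390), (571, 366)], start=1):
    p, r, f = prf(found, 642, correct)
    print(f"Part {part}: precision {p:.6f} recall {r:.6f} F1 {f:.6f}")
```

Part 1 predicts far more GENEs than the 642 expected, which drags precision down; Part 3 predicts fewer than expected and gives up some recall for higher precision.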