Language: Python 3.5
This project implements IBM Model 1, which learns lexical translation probabilities for the words in a corpus using the EM (Expectation-Maximization) algorithm.
- The data is in data1.json, which contains 5 French sentences and the corresponding English translation of each sentence. The data is stored as a dictionary.
- Using the data in the dictionary, all the unique French and English words are extracted into 2 lists.
- The EM algorithm is run until convergence is achieved (usually within 10-20 iterations).
- The alignment of each sentence pair is then printed after the EM algorithm completes.
- To run the code, simply run model1.py.
Anaconda needs to be installed before running the code.
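
The steps listed above (vocabulary extraction, EM training, alignment printing) can be sketched roughly as follows. This is a minimal illustration and not the contents of model1.py. The toy corpus, the function names `train_model1` and `align`, the convergence tolerance, and the choice to estimate t(e|f) (English word given French word) without a NULL word are all assumptions made for the example. The real layout of data1.json is not shown in this README, so the corpus is written inline.

```python
from collections import defaultdict

# Hypothetical toy corpus (assumption): the real data1.json layout is not shown.
corpus = [
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("maison bleue", "blue house"),
]
pairs = [(f.split(), e.split()) for f, e in corpus]

# Unique French and English words, kept as two sorted lists.
french_vocab = sorted({w for f_sent, _ in pairs for w in f_sent})
english_vocab = sorted({w for _, e_sent in pairs for w in e_sent})


def train_model1(pairs, english_vocab, max_iters=50, tol=1e-6):
    """Estimate t(e|f) with EM, starting from a uniform distribution."""
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    iteration = 0
    for iteration in range(1, max_iters + 1):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f being linked
        # E-step: spread each English word over the French words
        # of the same sentence pair, in proportion to t(e|f).
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per French word.
        max_change = 0.0
        for (e, f), c in count.items():
            new_value = c / total[f]
            max_change = max(max_change, abs(new_value - t[(e, f)]))
            t[(e, f)] = new_value
        # Stop once no probability moved by more than the tolerance.
        if max_change < tol:
            break
    return t, iteration


def align(f_sent, e_sent, t):
    """Link each English word to its most probable French word."""
    return [(e, max(f_sent, key=lambda f: t[(e, f)])) for e in e_sent]


t, iterations_run = train_model1(pairs, english_vocab)
for f_sent, e_sent in pairs:
    print(" ".join(f_sent), "|", " ".join(e_sent), "->", align(f_sent, e_sent, t))
```

The sketch avoids f-strings so that it stays compatible with Python 3.5. The number of iterations needed depends on the tolerance and the data, so `max_iters` only acts as an upper bound here.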