##Decoder
####CIS526, Machine Translation, HW2
Sean Welleck
This project decodes a source sentence by searching for the target sentence with the highest probability.
The project contains three decoders:
- Monotone Decoder
- Stack Decoder
- Greedy Decoder
It also includes functions to combine two decodings.
Run `python decode > output.txt` to decode using the default input files and write the translations to `output.txt`.

Run `python combine -x filename1 -y filename2 > combined.txt` to combine two decoded files by choosing the higher-scoring sentence from each pair, writing the result to `combined.txt`.
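
The combining rule can be sketched as follows. This is a minimal illustration, not the project's `combine()` code: `combine_decodings` and `toy_score` are hypothetical names, and a real scorer would presumably also need the source sentence and the translation and language models.

```python
def combine_decodings(first, second, score):
    """Pick, sentence by sentence, whichever translation scores higher.

    `score` is a hypothetical callable mapping a sentence to a number
    (higher is better). Ties keep the sentence from `first`.
    """
    if len(first) != len(second):
        raise ValueError("decodings must contain the same number of sentences")
    return [a if score(a) >= score(b) else b for a, b in zip(first, second)]


# Toy scorer for illustration only: fewer words scores higher.
def toy_score(sentence):
    return -len(sentence.split())


combined = combine_decodings(
    ["the cat sat", "a very long translation here"],
    ["the cat sat down", "short one"],
    toy_score,
)
```

Here `combined` keeps the first sentence from the first decoding and the second sentence from the second one.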
#####Algorithm
1. Decode with the monotone decoder.
2. Decode with the greedy decoder, using the decodings from (1) as the initial seed decoding.
3. Save decodings from (2).
4. Decode with the stack decoder.
5. Decode with the greedy decoder, using the decodings from (4) as the initial seed decoding.
6. Combine decodings from (5) and (2).
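
The six steps above could be chained roughly like this. It is a sketch under assumptions: `run_pipeline` and its four callable parameters are hypothetical stand-ins, not the actual methods of the `Decoder` class.

```python
def run_pipeline(sentences, monotone, stack, greedy, combine):
    """Chain the decoders in the order listed above.

    All four callables are hypothetical stand-ins:
      monotone(sentences) and stack(sentences) return a list of decodings,
      greedy(sentences, seed) tries to improve a seed decoding,
      combine(a, b) keeps the better sentence from each pair.
    """
    monotone_out = monotone(sentences)                       # step 1
    greedy_from_monotone = greedy(sentences, monotone_out)   # steps 2 and 3
    stack_out = stack(sentences)                             # step 4
    greedy_from_stack = greedy(sentences, stack_out)         # step 5
    return combine(greedy_from_stack, greedy_from_monotone)  # step 6
```

Passing the decoders in as arguments keeps the sketch independent of how each one is implemented.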
#####Other
- Uses a combination of histogram pruning and threshold pruning.
- Considers at most 40 translations per phrase.
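
As an illustration of how the two pruning strategies can work together on one hypothesis stack, here is a minimal sketch. The function name, the `(logprob, state)` representation, and the default values of `max_size` and `threshold` are assumptions for the example, not the project's settings.

```python
import heapq


def prune_stack(hypotheses, max_size=100, threshold=5.0):
    """Apply threshold pruning, then histogram pruning, to one stack.

    `hypotheses` is a list of (logprob, state) pairs; higher logprob is better.
    `max_size` and `threshold` are illustrative values only.
    """
    if not hypotheses:
        return []
    best = max(logprob for logprob, _ in hypotheses)
    # Threshold pruning: drop anything too far below the best hypothesis.
    survivors = [h for h in hypotheses if h[0] >= best - threshold]
    # Histogram pruning: keep at most max_size of the remaining hypotheses.
    return heapq.nlargest(max_size, survivors, key=lambda h: h[0])


stack = [(-1.0, "a"), (-2.5, "b"), (-9.0, "c"), (-3.0, "d")]
pruned = prune_stack(stack, max_size=2, threshold=5.0)
```

In this example the threshold removes `"c"`, and the size limit then removes `"d"`, leaving the two best hypotheses in descending order of score.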
#####decoder.py
Contains the decoder implementations in a single `Decoder` class.
Contains the top-level user functions `decode()` and `combine()`.
#####evaluator.py
Contains an `Evaluator` class that adapts the grading function.
The evaluator is used to choose between two sentence translations while combining decodings.
It can also serve as an alternative scoring function for the greedy decoder, but due to performance I ended up just using my original, simpler scoring function.
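
Because the greedy decoder starts from a seed decoding and can be paired with different scoring functions, its general shape can be sketched as hill climbing with a pluggable `score`. This is a generic illustration, not the project's implementation: `greedy_improve`, `swap_neighbors`, and the toy scorer are hypothetical, and the real decoder may use other moves.

```python
def greedy_improve(seed, score, neighbors, max_iters=100):
    """Hill-climb from `seed`, moving to the best-scoring neighbor each round."""
    current, current_score = seed, score(seed)
    for _ in range(max_iters):
        best, best_score = current, current_score
        for candidate in neighbors(current):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is current:  # no neighbor improved the score
            break
        current, current_score = best, best_score
    return current


def swap_neighbors(phrases):
    """Yield every decoding obtained by swapping two adjacent phrases."""
    for i in range(len(phrases) - 1):
        yield phrases[:i] + [phrases[i + 1], phrases[i]] + phrases[i + 2:]


# Toy scorer for illustration only: penalize out-of-order pairs.
def toy_score(phrases):
    n = len(phrases)
    return -sum(phrases[i] > phrases[j] for i in range(n) for j in range(i + 1, n))


result = greedy_improve(["c", "a", "b"], toy_score, swap_neighbors)
```

Swapping in a different scoring function only means passing a different `score` callable.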