------------------------------------------------------------------------------------------------------------------------
6320.501: Natural Language Processing
------------------------------------------------------------------------------------------------------------------------
Author Name: Dhwani Raval
Project Name: Brill's Tagger
Version: 0.1
Programming Language: Python 3.6

------------------------------------------------------------------------------------------------------------------------
Problem Description
------------------------------------------------------------------------------------------------------------------------
Transformation-based POS Tagging: Implement Brill’s transformation-based POS tagging algorithm using ONLY the
previous word’s tag to extract the best five (5) transformation rules to:
1. Transform “NN” to “VB”
2. Transform “VB” to “NN”

Using the learnt rules, manually fill out the missing POS tags (for the word “control”) in the following sentence:

The_DT president_NN wants_VBZ to_TO control_??? the_DT board_NN 's_POS control_???

------------------------------------------------------------------------------------------------------------------------
About Files
------------------------------------------------------------------------------------------------------------------------
The project contains the following files:
1. sourcecode/Tagger.py: The python file for the given problem description
2. resources/POSTaggedTrainingSet.txt: A training set that has been tagged with POS tags from the Penn Treebank POS
   tagset
3. output/tuple: A text file created during program execution
4. output/unigram: Text files related to unigrams created during program execution
5. output/tags: Text files related to correct and current tags for words created during program execution
6. output/logs: Log files created during each iteration for top 10 rules
7. output/top10.txt: Top 10 transformation rules
8. readme: A text file containing information about the project

------------------------------------------------------------------------------------------------------------------------
Brill's Tagging Description
------------------------------------------------------------------------------------------------------------------------
Brill's Tagging is used for Part-of-Speech (POS) tagging. It is inductive in nature and is based on Transformation
Based Learning (TBL). The basic idea is to iteratively assign the best tag to a word using the learned
transformations based upon a set of predefined rules. The goal of this approach is to minimize the error rate in
every step.

------------------------------------------------------------------------------------------------------------------------
Running Instructions
------------------------------------------------------------------------------------------------------------------------
1. Download the project and unzip it in the desired location
2. In IDE, import the project and run Tagger.py
3. In cmd, navigate to the location where Brill_Tagging is unzipped and use the following instruction:

   python sourcecode\Tagger.py
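
------------------------------------------------------------------------------------------------------------------------
Illustrative Sketch
------------------------------------------------------------------------------------------------------------------------
The rule-learning step described in this readme can be sketched roughly as follows. This is a minimal illustration
under stated assumptions, not the code in sourcecode/Tagger.py: the function names (most_frequent_tags, score_rules,
best_rule, apply_rule) and the toy sentence are hypothetical, the initial tagging is assumed to be the most frequent
tag per word, and a rule is assumed to be scored as (tags it fixes) minus (tags it breaks).

```python
from collections import Counter, defaultdict


def most_frequent_tags(tagged_words):
    """Map each word to the tag it carries most often (assumed initial tagging)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def score_rules(correct, current, from_tag, to_tag):
    """Score each rule 'change from_tag to to_tag if the previous tag is Z'.

    Returns a Counter keyed by Z; score = tags fixed minus tags broken.
    """
    scores = Counter()
    for i in range(1, len(current)):
        if current[i] != from_tag:
            continue
        prev_tag = current[i - 1]
        if correct[i] == to_tag:
            scores[prev_tag] += 1      # the rule would fix this position
        elif correct[i] == from_tag:
            scores[prev_tag] -= 1      # the rule would break a correct tag
    return scores


def best_rule(correct, current, from_tag, to_tag):
    """Return (from_tag, to_tag, prev_tag, score) for the best rule, or None."""
    scores = score_rules(correct, current, from_tag, to_tag)
    if not scores:
        return None
    prev_tag, score = max(scores.items(), key=lambda item: item[1])
    return (from_tag, to_tag, prev_tag, score) if score > 0 else None


def apply_rule(current, from_tag, to_tag, prev_tag):
    """Apply one rule, looking at the tags as they were before this pass."""
    before = list(current)
    return [
        to_tag if i > 0 and tag == from_tag and before[i - 1] == prev_tag else tag
        for i, tag in enumerate(before)
    ]


# Hypothetical toy data, NOT taken from resources/POSTaggedTrainingSet.txt.
toy = ("the_DT race_NN is_VBZ long_JJ we_PRP like_VBP the_DT race_NN "
       "they_PRP expect_VBP to_TO race_VB")
tagged = [tuple(token.rsplit("_", 1)) for token in toy.split()]
words = [word for word, _ in tagged]
correct = [tag for _, tag in tagged]

baseline = most_frequent_tags(tagged)
current = [baseline[word] for word in words]   # "race" starts as NN everywhere

rule = best_rule(correct, current, "NN", "VB")
print(rule)                                    # ('NN', 'VB', 'TO', 1)
current = apply_rule(current, *rule[:3])
print(current == correct)                      # True
```

Calling score_rules(...).most_common(5) would list the five highest-scoring previous-tag contexts for a given tag
pair. Repeating the score, pick, and apply steps until no rule has a positive score gives the iterative,
error-reducing loop described under Brill's Tagging Description.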