Author: Alex Bowe
Email: alex@alexbowe.com
A program that deploys a Hadoop MapReduce job to extract noun phrases from text. It POS-tags the text with NLTK, chunks it using a grammar (from S. N. Kim et al., "Evaluating n-gram based evaluation metrics for automatic keyphrase extraction"), and ranks the resulting phrases using TF-IDF.
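To illustrate the ranking step, here is a minimal single-machine sketch of TF-IDF over phrases that have already been extracted. It is not the code in this repository: the function name rank_phrases, the input layout (a dict of document id to phrase list), and the exact weighting (relative term frequency times the natural log of inverse document frequency) are all assumptions made for the example, and it is written for Python 3.

```python
import math
from collections import Counter


def rank_phrases(docs):
    """Rank candidate phrases per document by TF-IDF.

    docs: dict mapping a document id to a list of phrases (hypothetical layout).
    Returns a dict mapping each document id to a list of (phrase, score)
    pairs, highest score first.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each phrase occur?
    df = Counter()
    for phrases in docs.values():
        df.update(set(phrases))

    ranked = {}
    for doc_id, phrases in docs.items():
        tf = Counter(phrases)
        total = len(phrases)
        # Assumed weighting: (count / total) * ln(N / df)
        scores = {
            phrase: (count / total) * math.log(n_docs / df[phrase])
            for phrase, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked[doc_id] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked


if __name__ == "__main__":
    example = {
        "a": ["hadoop job", "hadoop job", "noun phrase"],
        "b": ["noun phrase", "evaluation script"],
    }
    for doc_id, phrases in rank_phrases(example).items():
        print(doc_id, phrases)
```

A phrase that appears in every document scores zero under this weighting, which is why "noun phrase" falls to the bottom of both lists in the example.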
To clone this repository:
$ git clone http://github.com/alexbowe/keyphrase.git
This will create a directory keyphrase in your working directory. Note that this won't allow you to submit changes to the master repository.
You must have Hadoop and Dumbo installed. To run the job, type:
./run.sh
This will copy the contents of the text folder to HDFS and format the results to work with our evaluation script.
While debugging, you may want to run Dumbo in local mode:
./run.sh -l
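When debugging, it can also help to exercise map and reduce logic as plain Python generators before involving Dumbo at all. The sketch below is an assumption-laden illustration, not this repository's job: the mapper and reducer count how many input lines contain each token (a stand-in for a document-frequency step), and run_locally is a hypothetical helper that imitates the map, sort, and reduce phases in-process. The main function shows the usual Dumbo entry point, dumbo.run(mapper, reducer), and is only meant to be called when the script is launched through Dumbo.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: input key (ignored here); value: one line of text.
    # Emit each distinct token once per line.
    for token in set(value.lower().split()):
        yield token, 1


def reducer(key, values):
    # Sum the per-line counts for one token.
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper: imitate map, shuffle/sort, and reduce in-process."""
    pairs = [kv for i, line in enumerate(lines) for kv in mapper(i, line)]
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort
    result = {}
    for token, group in groupby(pairs, key=itemgetter(0)):
        for out_key, out_value in reducer(token, (v for _, v in group)):
            result[out_key] = out_value
    return result


def main():
    # Imported here so the functions above can be tested without Dumbo.
    import dumbo
    dumbo.run(mapper, reducer)
```

Calling run_locally(["Hadoop job", "hadoop hadoop cluster"]) returns a dict in which "hadoop" maps to 2 and both "job" and "cluster" map to 1.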
PROVIDED:
NOT PROVIDED:
Anyone can use my work however they wish.
NLTK is distributed under the Apache License Version 2.0. PyYAML is distributed under the MIT License.