Location Extractor from Text Documents

Setup & Requirements

pip3 install -r requirements.txt

Running the project

python3 driver.py

Introduction

This project attempts to detect words that represent locations in natural-language text documents using a supervised learning algorithm.

For example, in the following sentence, the extractor should positively classify WASHINGTON, Russia, and China as locations. It should also discard "New York" (shown in ~~strikethrough~~), which looks like a location but, in the context of the sentence, is part of the name "The New York Times".

WASHINGTON — Smartphone users in Russia can no longer download the LinkedIn app on iPhone or Android devices, following a similar move in China to block The ~~New York~~ Times app on iPhones.
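
A supervised extractor for the sentence above typically works in two steps: it first proposes candidate mentions, then turns each candidate into features for a classifier. The following is only a minimal sketch of that idea, not the project's actual implementation. The function names `candidate_spans` and `features`, the preposition list, and the feature set are all assumptions made for illustration.

```python
import re

# Hypothetical cue words; the project's real feature set is not documented here.
LOCATION_PREPOSITIONS = {"in", "at", "from", "to", "near"}

# A candidate is a run of capitalized words separated by single spaces.
CANDIDATE_RE = re.compile(r"\b[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*")


def candidate_spans(sentence):
    """Return (candidate_text, previous_word) pairs for a sentence."""
    spans = []
    for match in CANDIDATE_RE.finditer(sentence):
        preceding = sentence[:match.start()].split()
        prev_word = preceding[-1].strip(".,;:").lower() if preceding else ""
        spans.append((match.group(), prev_word))
    return spans


def features(span):
    """Map one candidate to a feature dict a classifier could consume."""
    text, prev_word = span
    return {
        "all_caps": text.isupper(),
        "after_location_preposition": prev_word in LOCATION_PREPOSITIONS,
        "num_words": len(text.split()),
    }


sentence = (
    "Smartphone users in Russia can no longer download the LinkedIn app, "
    "following a similar move in China to block The New York Times app."
)
for span in candidate_spans(sentence):
    print(span[0], features(span))
```

In this sketch, "Russia" and "China" both follow the preposition "in", while "The New York Times" does not, which is the kind of contextual signal a trained classifier could use to accept the former and reject the latter. The feature dicts would still need labeled examples and a learning algorithm to produce actual predictions.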

Dataset Used

The Kaggle News DataSet, which contains well-formed sentences. The documents contain only plain text in English.

Stages

Stage 1: Information extraction from natural text.
Stage 2: Crawling and extracting structured data from Web pages. (To be done)
Stage 3: Entity matching. (To be done)
Stage 4: Integrating and performing analysis. (To be done)

Project Website

https://sites.google.com/view/data-science-project/home

calvincodes/location-extractor-from-text-documents