`rnlp`

Relational NLP Preprocessing: A Python package and tool for converting text into a set of relational facts.

Kaushik Roy (@kkroy36) and Alexander L. Hayes (@batflyer)

Installation

Stable builds on PyPI

pip install rnlp

Development builds on GitHub

pip install git+https://github.com/starling-lab/rnlp.git

Some nltk resources must also be downloaded:

import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')

Quick-Start

rnlp can be used either as a command line interface (CLI) tool or as an imported Python package.

CLI

$ python -m rnlp -f example_files/doi.txt
Reading corpus from file(s)...
Creating background file...
100%|████████| 18/18 [00:00<00:00, 38it/s]

Imported

from rnlp.corpus import declaration
import rnlp

doi = declaration()
rnlp.converter(doi)

Text is converted into relational facts that encode relations:

between sentences and the surrounding block of n sentences.
between words and the surrounding sentence.
between words within the surrounding sentence.

---

The relationships currently encoded are:

earlySentenceInBlock - sentence occurs within the first third of the block length
earlyWordInSentence - word occurs within the first third of the sentence length
lateSentenceInBlock - sentence occurs after two-thirds of the block length
midWayWordInSentence - word occurs between a third and two-thirds of the sentence length
nextSentenceInBlock - sentence that follows a sentence in a block
nextWordInSentence - word that follows a word in a sentence in a block
sentenceInBlock - sentence occurs in a block
wordInSentence - word occurs in a sentence
wordString - the string contained in the word
partOfSpeech - the part of speech of the word
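
The positional relations above can be illustrated with a minimal sketch. This is not rnlp's implementation: the helper names (third_of, word_facts), the identifier scheme, the fact syntax, the argument order, and the inclusive boundaries at one third and two thirds are all assumptions made for illustration.

```python
def third_of(position, length):
    """Classify a 1-based position as 'early', 'midway', or 'late'.

    Assumption: the boundaries at one third and two thirds are inclusive.
    """
    if position <= length / 3:
        return "early"
    if position <= 2 * length / 3:
        return "midway"
    return "late"


def word_facts(sentence_id, words):
    """Build illustrative facts for the words of one sentence.

    The identifier scheme and the fact syntax are hypothetical.
    """
    facts = []
    previous_id = None
    for position, word in enumerate(words, start=1):
        word_id = f"{sentence_id}_w{position}"
        facts.append(f"wordInSentence({word_id},{sentence_id}).")
        facts.append(f'wordString({word_id},"{word}").')
        # Only the positional predicates listed above are emitted.
        region = third_of(position, len(words))
        if region == "early":
            facts.append(f"earlyWordInSentence({word_id},{sentence_id}).")
        elif region == "midway":
            facts.append(f"midWayWordInSentence({word_id},{sentence_id}).")
        # Link each word to the one that precedes it.
        if previous_id is not None:
            facts.append(f"nextWordInSentence({previous_id},{word_id}).")
        previous_id = word_id
    return facts


if __name__ == "__main__":
    for fact in word_facts("s1", ["Thank", "you", "very", "much"]):
        print(fact)
```

The same thirds-based split would apply to sentences within a block. The facts rnlp actually writes may differ in naming and arity.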

---

The repository contains a toy corpus (example_files/) and an image of a BoostSRL tree that predicts whether a word in a sentence is the word "you".

The tree says that if the word string of word 'b' is "you", then 'b' is the word "you" with high probability, which is true by construction. The False branch is more interesting: if word 'b' is an early word in sentence 'a', word 'anon12035' is also an early word in sentence 'a', and the word string of 'anon12035' is "Thank", then 'b' has a decent chance of being the word "you". In other words, the model learned that "you" often occurs in the same sentence as "Thank" when "Thank" appears early in that sentence.
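
The logic of the two branches described above can be restated as a small function. This is only a reading aid, not BoostSRL output or API: the function name, the data structures, and the returned labels are hypothetical, and no probabilities are given because the text does not state them.

```python
def tree_branch(word_id, word_string, early_words):
    """Report which described branch of the tree applies to a word.

    word_string: dict mapping word ids to their strings (one sentence).
    early_words: set of word ids that are early words in that sentence.
    Both structures are hypothetical stand-ins for the relational facts.
    """
    # True branch: the word string itself is "you".
    if word_string[word_id] == "you":
        return "true branch: high probability"
    # False branch: the word is early, and some early word in the same
    # sentence has the string "Thank".
    if word_id in early_words and any(
        word_string[other] == "Thank" for other in early_words
    ):
        return "false branch: decent chance"
    # Anything else is not covered by the description in the text.
    return "not described"


if __name__ == "__main__":
    strings = {"w1": "Thank", "w2": "u", "w3": "very", "w4": "much"}
    early = {"w1", "w2"}
    for word in strings:
        print(word, tree_branch(word, strings, early))
```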
