Tiny package for information extraction

Yargy

Yargy is an Earley parser written in pure Python that uses Russian morphology for fact extraction.

Install

Yargy supports Python 2.7+ and 3.4+, including PyPy.

$ pip install yargy

Usage

from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline


Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)

match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)

The output will look something like this:

Person(
    position='управляющий директор',
    name=Name(
        first='Иван',
        last='Ульянов'
    )
)

For more examples and details on grammar syntax, predicates, and pipelines, see the Yargy documentation.

License

The source code of Yargy is distributed under the MIT license, which allows modification and commercial use.

Support