
Finds linguistic patterns effortlessly - spaCy

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

PatternOmatic 0.2.*

#AI · #EvolutionaryComputation · #NLP


Discover spaCy linguistic patterns that match a given set of string samples

Requirements

Basic usage

From sources

Clone the official repository

git clone git@github.com:revuel/PatternOmatic.git

Play with the Makefile

  • make venv to activate the project's virtual environment*
  • make libs to install dependencies
  • make test to run the unit tests
  • make coverage to run code coverage
  • make run to run PatternOmatic's script with example parameters

* You must have created a virtual environment first

From package

Install package

pip install PatternOmatic

Play with the CLI

# Show help 
patternomatic.py -h

# Usage example 1: Basic
patternomatic.py -s Hello world -s Goodbye world

# Usage example 2: Using a different language
python -m spacy download es_core_news_sm
patternomatic.py -s Me llamo Miguel -s Se llama PatternOmatic -l es_core_news_sm

Play with the library

""" 
PatternOmatic library client example.
Find linguistic patterns to be used by the spaCy Rule Based Matcher

"""
from PatternOmatic.api import find_patterns, Config

if __name__ == '__main__':

    my_samples = ['I am a cat!', 'You are a dog!', 'She is an owl!']

    # Optionally, let it evolve a little bit more!
    config = Config()
    config.max_generations = 150
    config.max_runs = 3

    patterns_found, _ = find_patterns(my_samples)

    print(f'Patterns found: {patterns_found}')


Features

Generic

✅ No OS dependencies, no storage or database required!

✅ Lightweight package with just a few direct pip dependencies

✅ Easy to use and highly configurable, so searches can be tuned

✅ Includes a basic logging mechanism

✅ Includes basic reporting in JSON and CSV formats; the report file path is configurable

✅ Configuration file example provided (config.ini)

✅ The default configuration is used if no configuration file is provided

✅ Provides rollback actions against several possible misconfiguration scenarios
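As a rough illustration of the "default configuration" and misconfiguration handling mentioned above, the sketch below reads an INI file with Python's standard configparser and falls back to defaults when the file, the section, or a valid value is missing. It is not PatternOmatic's actual loader: the section name GE, the default values, and the load_settings helper are assumptions made for this example. Only the option names max_runs and max_generations are borrowed from the library example in this README.

```python
import configparser

# Hypothetical defaults; the real option names and values live in the project's config.ini.
DEFAULTS = {"max_runs": 4, "max_generations": 150}


def load_settings(path, section="GE"):
    """Return settings read from an INI file, falling back to DEFAULTS per option."""
    settings = dict(DEFAULTS)
    parser = configparser.ConfigParser()
    try:
        # read() returns the list of files it could parse; a missing file yields [].
        found = parser.read(path)
    except configparser.Error:
        # Malformed file (for example, no section header): use the defaults.
        return settings
    if not found or not parser.has_section(section):
        return settings
    for key, default in DEFAULTS.items():
        try:
            value = parser.getint(section, key, fallback=default)
        except ValueError:
            # Non-numeric value: keep the default instead of failing.
            value = default
        # Treat non-positive values as misconfiguration as well.
        settings[key] = value if value > 0 else default
    return settings


if __name__ == "__main__":
    print(load_settings("config.ini"))
```

Falling back option by option, rather than rejecting the whole file, keeps a single bad value from discarding the rest of a user's configuration.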

Evolutionary

✅ Basic Evolutionary (Grammatical Evolution) parameters available and configurable

✅ Supports two different Evolutionary Fitness functions

✅ Supports binary tournament as the selection type

✅ Supports random one-point crossover as the recombination type

✅ Supports "µ + λ" as the replacement type

✅ Supports "µ ∪ λ" with elitism as the replacement type

✅ Supports "µ ∪ λ" without elitism as the replacement type

✅ Typical evolutionary performance metrics included:

  • Success Rate (SR)
  • Mean Best Fitness (MBF)
  • Average Evaluations to Solution (AES)
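To make the evolutionary vocabulary above concrete, here is a minimal, generic sketch of binary tournament selection, random one-point crossover, "µ + λ" replacement, and the SR, MBF and AES metrics. It is not PatternOmatic's implementation. It evolves plain bit lists on a toy OneMax problem instead of performing Grammatical Evolution over a grammar, and every function name, the mutation step, and all parameter values are assumptions chosen for illustration. The "µ ∪ λ" variants are not shown.

```python
import random


def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def one_point_crossover(mum, dad, rng):
    """Swap the tails of two equal-length genomes at a random cut point."""
    cut = rng.randint(1, len(mum) - 1)
    return mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def mu_plus_lambda(parents, offspring, fitness):
    """Keep the best len(parents) individuals among parents and offspring."""
    merged = sorted(parents + offspring, key=fitness, reverse=True)
    return merged[:len(parents)]


def run_once(rng, genome_len=12, pop_size=10, max_generations=50):
    """One run on toy OneMax; returns (best_fitness, evaluations or None)."""
    fitness = sum  # OneMax: fitness is the number of ones in the genome
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    evaluations = pop_size  # count one evaluation per newly created individual
    for _ in range(max_generations):
        if max(map(fitness, population)) == genome_len:
            break
        offspring = []
        while len(offspring) < pop_size:
            mum = binary_tournament(population, fitness, rng)
            dad = binary_tournament(population, fitness, rng)
            for child in one_point_crossover(mum, dad, rng):
                offspring.append(mutate(child, 0.05, rng))
        evaluations += len(offspring)
        population = mu_plus_lambda(population, offspring, fitness)
    best = max(map(fitness, population))
    return best, (evaluations if best == genome_len else None)


def summarize(results, target):
    """Compute (SR, MBF, AES) from a list of (best_fitness, evaluations) pairs."""
    solved = [evals for best, evals in results if best == target]
    sr = len(solved) / len(results)                        # Success Rate
    mbf = sum(best for best, _ in results) / len(results)  # Mean Best Fitness
    aes = sum(solved) / len(solved) if solved else None    # over successful runs only
    return sr, mbf, aes


if __name__ == "__main__":
    results = [run_once(random.Random(seed)) for seed in range(5)]
    print(summarize(results, target=12))
```

Note that AES is averaged over successful runs only, which is why it is undefined (None here) when no run reaches the target.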

Linguistic

✅ Compatible with any spaCy Language Model

✅ Supports all spaCy's Rule Based Matcher standard Token attributes

✅ Supports the following spaCy's Rule Based Matcher non-standard Token attributes (via underscore)

  • ent_id
  • ent_iob
  • ent_kb_id
  • has_vector
  • is_bracket
  • is_currency
  • is_left_punct
  • is_oov
  • is_quote
  • is_right_punct
  • lang
  • norm
  • prefix
  • sentiment
  • string
  • suffix
  • text_with_ws
  • whitespace

✅ Supports skipping boolean Token attributes

✅ Supports spaCy's Rule Based Matcher Extended Pattern Syntax

✅ Supports spaCy's Rule Based Matcher Grammar Operators and Quantifiers

✅ Supports Token Wildcard

✅ Supports defining the number of attributes per token within searched patterns

✅ Supports usage of non-repeated token attribute values
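For readers unfamiliar with the pattern features listed above, the sketch below shows what each one looks like as plain Python data in spaCy's Rule Based Matcher format, where a pattern is a list with one dict per token. These are hand-written illustrations, not output produced by PatternOmatic. The underscore entry assumes that is_currency has been registered as a custom token extension, which this README does not show.

```python
import json

# Hand-written examples of spaCy Matcher token patterns (not PatternOmatic output).
patterns = {
    # One dict per token; keys are standard token attributes.
    "standard_attributes": [{"LOWER": "hello"}, {"POS": "NOUN"}],
    # An empty dict is a wildcard that matches any single token.
    "wildcard": [{"LOWER": "hello"}, {}],
    # "OP" sets a quantifier: "?" makes a token optional, "+" means one or more.
    "operators": [{"POS": "ADJ", "OP": "?"}, {"POS": "NOUN", "OP": "+"}],
    # Extended syntax: a dict of predicates instead of a plain value.
    "extended_syntax": [{"LEMMA": {"IN": ["be", "become"]}}, {"LENGTH": {">=": 3}}],
    # Custom extension attributes are nested under the "_" key
    # (assumes an "is_currency" extension has been registered on Token).
    "underscore": [{"_": {"is_currency": True}}],
}

if __name__ == "__main__":
    # Patterns are JSON-serializable, so they can be stored or shared as text.
    print(json.dumps(patterns, indent=2))
```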


Author: Miguel Revuelta Espinosa (revuel), a humble AI enthusiast