Bedrock is a high-level text pre-processing API written in Python that can run on NLTK or spaCy as its backend.

Primary language: Python. License: MIT.

Bedrock

You have discovered bedrock

Bedrock is a high-level text pre-processing API written in Python that runs on spaCy as its backend. It lets you quickly get the text-processing groundwork done: it does the menial work, so you don't have to.

Use this library if you find the following highlights useful:

  • Fast prototyping
  • Switching between different backends
  • Working in batches rather than writing loops
  • Support for DataFrame and CAS xmi inputs/outputs

Install bedrock in a jiffy:

pip install bedrock
bedrock download de

From zero to bedrock hero in 10 seconds

Now you can run:

from bedrock.pipeline import Pipeline
Pipeline(language='de').parse_text("Hello world").get_docs()

Congrats! 🎉

Engines and Languages

Currently bedrock supports spaCy as its backend engine.

It supports the following languages, listed with their corresponding download arguments:

  • English ('en' or 'english')
  • German ('de', 'german' or 'deutsch')
  • French ('fr' or 'french')
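
If your own code accepts language names from users or config files, it can help to map the download arguments listed above to one canonical code before passing them on. The sketch below is a hypothetical helper and not part of bedrock: the names `LANGUAGE_ALIASES` and `normalize_language` are made up for illustration, and the case-insensitive matching is an assumption of this sketch, not documented bedrock behaviour.

```python
# Hypothetical helper, NOT part of bedrock: maps the download arguments
# listed in this README to a canonical two-letter language code.
LANGUAGE_ALIASES = {
    "en": "en", "english": "en",
    "de": "de", "german": "de", "deutsch": "de",
    "fr": "fr", "french": "fr",
}


def normalize_language(name: str) -> str:
    """Return the two-letter code for a supported language name or alias."""
    # Lowercasing and stripping is a convenience of this sketch (assumption).
    key = name.strip().lower()
    try:
        return LANGUAGE_ALIASES[key]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_ALIASES))
        raise ValueError(
            f"Unsupported language {name!r}; expected one of: {supported}"
        ) from None


print(normalize_language("Deutsch"))  # prints: de
```

The resulting code can then be used wherever a language argument is expected, for example as the `language` argument of `Pipeline` shown in the quick-start.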

Installation and usage

Package installation

pip install bedrock

Install support for all languages:

bedrock download all

Install support only for English:

bedrock download en

Install support for German:

bedrock download de

Install support for French:

bedrock download fr
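
When you need a specific subset of languages, for instance in a setup script, the download commands above can be scripted. The following is a minimal sketch that assumes only the `bedrock download <language>` command shown in this README. The function names and the `dry_run` flag are hypothetical and belong to this example, not to bedrock.

```python
import subprocess


def build_download_commands(languages):
    """Build one `bedrock download <language>` command per language."""
    return [["bedrock", "download", lang] for lang in languages]


def download_languages(languages, dry_run=True):
    """Print the download commands, or run them when dry_run is False."""
    commands = build_download_commands(languages)
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Requires bedrock to be installed; raises if the command fails.
            subprocess.run(cmd, check=True)
    return commands


download_languages(["en", "de"])  # dry run: prints the two commands
```

Calling `download_languages(["en", "de"], dry_run=False)` would execute the commands, which requires the `bedrock` command to be on your PATH.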

Import modules from the package in your code:

from bedrock.pipeline import Pipeline                        # Processing texts
from bedrock.annotator.annotator import Annotator            # Annotator interface
from bedrock.annotator.dictionary_annotator import DictionaryAnnotator # Prebuilt dictionary annotator
from bedrock.annotator.regex_annotator import RegexAnnotator # Prebuilt regex annotator