Saucer: A Python repository from drvenabili

Saucer is a simple wrapper for the spaCy NLP framework (v2.1.3) with S3 support for model storage.
It was created mainly to fit spaCy and its en_core_web_lg model into AWS Lambda.
Saucer is designed to "live" within Lambda and relies heavily on S3 to load the model files.
Note that when Saucer runs locally, a slow connection to S3 can greatly affect performance.
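Because the model files come from S3, a Lambda container benefits from keeping them on local disk between invocations: Lambda's writable location is /tmp, and its contents survive across invocations on a warm container. The sketch below illustrates that generic caching pattern and is not Saucer's implementation. `ensure_local_copy` and `fake_fetch` are hypothetical names, and the `fetch` callable stands in for an S3 download such as boto3's `download_file(bucket, key, filename)`.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def ensure_local_copy(
    keys: Iterable[str],
    prefix: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None],
) -> List[Path]:
    """Mirror the given S3 keys into cache_dir, fetching only missing files.

    Files already on disk (e.g. left in /tmp by an earlier invocation on a
    warm Lambda container) are reused instead of being downloaded again.
    """
    base = prefix.rstrip("/") + "/" if prefix else ""
    local_paths = []
    for key in keys:
        # Keep the key's layout relative to the prefix inside the cache dir.
        relative = key[len(base):] if key.startswith(base) else key
        target = cache_dir / relative
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            fetch(key, target)  # network access happens only on a cache miss
        local_paths.append(target)
    return local_paths


if __name__ == "__main__":
    downloaded = []

    def fake_fetch(key: str, target: Path) -> None:
        # Hypothetical stand-in for an S3 download, e.g. with boto3:
        # s3.download_file("s3_bucket", key, str(target))
        downloaded.append(key)
        target.write_text("contents of " + key)

    keys = [
        "s3_prefix/en_core_web_lg-2.1.0/meta.json",
        "s3_prefix/en_core_web_lg-2.1.0/vocab/strings.json",
    ]
    with tempfile.TemporaryDirectory() as tmp:  # Lambda would use /tmp
        ensure_local_copy(keys, "s3_prefix", Path(tmp), fake_fetch)  # cold start
        ensure_local_copy(keys, "s3_prefix", Path(tmp), fake_fetch)  # warm start
    print(len(downloaded))  # 2: the second call found both files on disk
```

Injecting `fetch` keeps the caching logic testable without network access or AWS credentials.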

The example below is quite similar to the native spaCy example.
In addition to the new init_s3_connection(), note that set_data_path() and load() are called without arguments and both use default locations.

import spacy

# Initialize the connection to S3; the prefix must contain the 'en_core_web_xx-2.1.0' directory
# Currently supported models: sm, md and lg
spacy.init_s3_connection('s3_bucket', 's3_prefix', 'en_core_web_lg-2.1.0')
spacy.util.set_data_path()
nlp = spacy.load()

# Process whole documents
text = (u"When Sebastian Thrun started working on self-driving cars at "
        u"Google in 2007, few people outside of the company took him "
        u"seriously. “I can tell you very senior CEOs of major American "
        u"car companies would shake my hand and turn away because I wasn’t "
        u"worth talking to,” said Thrun, now the co-founder and CEO of "
        u"online higher education startup Udacity, in an interview with "
        u"Recode earlier this week.")
doc = nlp(text)

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

# Determine semantic similarities
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
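In spaCy 2.x, `Doc.similarity()` defaults to the cosine similarity of the two documents' vectors, and a document's vector defaults to the average of its token vectors. The sketch below reproduces that arithmetic in plain Python with made-up toy vectors and naive whitespace tokenization; `average_vector`, `cosine` and `doc_similarity` are illustrative names, not spaCy functions, and scores from the real model will differ.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of the tokens that have one; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("none of the tokens has a vector")
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def doc_similarity(text1: str, text2: str, vectors: Dict[str, Vector]) -> float:
    # Naive whitespace tokenization; spaCy uses its own tokenizer instead.
    v1 = average_vector(text1.lower().split(), vectors)
    v2 = average_vector(text2.lower().split(), vectors)
    return cosine(v1, v2)


if __name__ == "__main__":
    # Made-up 3-dimensional toy vectors; real word vectors are much larger.
    toy_vectors = {
        "fries": [1.0, 0.0, 0.0],
        "gross": [0.0, 1.0, 0.0],
        "disgusting": [0.0, 1.0, 0.0],
        "super": [0.0, 0.0, 1.0],
    }
    score = doc_similarity("my fries were super gross", "such disgusting fries", toy_vectors)
    print(round(score, 4))  # → 0.8165
```

This averaging is also why similarity scores are more meaningful with the md and lg models, which ship with word vectors, than with sm, which does not.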

To use:

1. Download the desired en_core_web model, version 2.1.0 (sm, md or lg).
2. Extract the model archive.
3. Upload the entire en_core_web_xx-2.1.0/en_core_web_xx/en_core_web_xx-2.1.0 directory to S3, where xx is the model you selected in step 1.
4. Specify the bucket, prefix and model with spacy.init_s3_connection, as in the example above.
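The extract-and-upload steps above can be sketched in Python. `inner_model_dir` and `plan_upload` are hypothetical helpers, not part of Saucer: they assume only the archive layout and the sm/md/lg restriction stated in this README, build S3 keys so that the prefix contains the en_core_web_xx-2.1.0 directory, and leave the actual transfer to the caller (for example boto3's `upload_file`).

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical check mirroring the README: only sm, md and lg 2.1.0 models.
MODEL_RE = re.compile(r"en_core_web_(sm|md|lg)-2\.1\.0")


def inner_model_dir(extract_root: Path, model: str) -> Path:
    """Return en_core_web_xx-2.1.0/en_core_web_xx/en_core_web_xx-2.1.0 under extract_root."""
    if MODEL_RE.fullmatch(model) is None:
        raise ValueError(f"unsupported model: {model!r}")
    package = model.rsplit("-", 1)[0]  # e.g. 'en_core_web_lg'
    return extract_root / model / package / model


def plan_upload(model_dir: Path, prefix: str) -> List[Tuple[Path, str]]:
    """Pair each file under model_dir with an S3 key under <prefix>/<model_dir name>/."""
    clean = prefix.strip("/")
    base = f"{clean}/{model_dir.name}" if clean else model_dir.name
    pairs = []
    for path in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(model_dir).as_posix()  # S3 keys use '/'
        pairs.append((path, f"{base}/{relative}"))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake extracted archive with two files, standing in for the real model.
        model_dir = inner_model_dir(Path(tmp), "en_core_web_lg-2.1.0")
        (model_dir / "vocab").mkdir(parents=True)
        (model_dir / "meta.json").write_text("{}")
        (model_dir / "vocab" / "strings.json").write_text("[]")
        for local_path, key in plan_upload(model_dir, "s3_prefix"):
            # With boto3 this could be: s3.upload_file(str(local_path), "s3_bucket", key)
            print(key)
```

Run as-is, it prints `s3_prefix/en_core_web_lg-2.1.0/meta.json` and `s3_prefix/en_core_web_lg-2.1.0/vocab/strings.json`; the same bucket and prefix are then what you pass to spacy.init_s3_connection.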

spaCy vs Saucer in Lambda - Benchmark
