Lambda Layer for the Python NLTK package

A Lambda layer that makes the popular NLTK Python package available to AWS Lambda functions. This project is set up to build for the python3.7 runtime and to download the punkt and stopwords data into the NLTK_DATA directory. See Configuration below for how to customize this to your needs.

Usage

Setup & Build

Clone or download the repository and run the following commands:

$ ./bin/bootstrap
$ ./bin/build

./bin/build creates a python-nltk-layer.zip file inside the share folder.
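Before uploading, it can be worth a quick look inside the archive. The sketch below lists the top-level entries of a zip file using only the standard library. The expected layout is an assumption, not something this README states: Lambda extracts layers under /opt and the Python runtimes look for packages in /opt/python, so a `python` directory is likely, and the NLTK_DATA=/opt/nltk_data setting described later suggests an `nltk_data` directory at the archive root. The function name `top_level_entries` is hypothetical.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the set of top-level file or directory names in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Each archive member looks like "python/nltk/__init__.py";
        # keep only the first path component.
        return {name.split("/", 1)[0] for name in archive.namelist() if name}


# Example (assumes the build has already been run from the repository root):
#   print(sorted(top_level_entries("share/python-nltk-layer.zip")))
```

If the printed entries do not include the directories you expect, check the Dockerfile before publishing the layer.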

Configuration

  1. Change the Python runtime version in the Dockerfile to the one your project needs. For example, to build for Python 3.8, search for all occurrences of 3.7 in the Dockerfile and replace them with 3.8.
  2. Add or update the instructions for downloading the NLTK data you need. For example, if you need the NLTK brown corpus instead of stopwords, change the corresponding download line in the Dockerfile to RUN python -W ignore -m nltk.downloader brown -d /build/nltk_data

Deploy

You can create a layer in your AWS account in one of two ways:

  1. Upload the zip file directly in the AWS Console. [Screenshot: uploading the layer zip file in the AWS Console]

    Or

  2. Assuming you have the AWS CLI set up, run ./bin/deploy to publish the Lambda layer.
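For readers who prefer to publish from Python, the following is a rough sketch using boto3's `publish_layer_version` call. It is not what ./bin/deploy does (that script is not shown here), and the layer name, the function names, and the default runtime are placeholder assumptions. It also assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import Path


def publish_layer_kwargs(zip_path, layer_name="python-nltk-layer", runtime="python3.7"):
    """Build keyword arguments for boto3's publish_layer_version.

    layer_name and runtime are illustrative defaults; use the runtime
    you actually built the layer for.
    """
    return {
        "LayerName": layer_name,
        "Content": {"ZipFile": Path(zip_path).read_bytes()},
        "CompatibleRuntimes": [runtime],
    }


def publish_layer(zip_path, **overrides):
    """Publish the zip as a new layer version and return its ARN."""
    import boto3  # third-party; imported lazily so the helper above works without it

    client = boto3.client("lambda")
    response = client.publish_layer_version(**publish_layer_kwargs(zip_path, **overrides))
    return response["LayerVersionArn"]


# Example (performs a real AWS call):
#   arn = publish_layer("share/python-nltk-layer.zip")
```

Keeping the argument construction separate from the AWS call makes it possible to inspect what would be sent without touching your account.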

Lambda Instructions

  1. Configure your Lambda function to use the layer you published above.
  2. Because the NLTK data is set up manually, you need to set the NLTK_DATA=/opt/nltk_data environment variable for your Lambda function. [Screenshot: setting the NLTK_DATA environment variable on the Lambda function]
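With the layer attached and NLTK_DATA set, a function can import NLTK as usual. The handler below is a minimal sketch, not code from this project: the `text` event key and the function names are hypothetical, and it assumes the layer ships the default punkt and stopwords data and an NLTK version whose `word_tokenize` relies on punkt.

```python
import os

# Fallback that mirrors the NLTK_DATA value this README asks you to set.
DEFAULT_NLTK_DATA = "/opt/nltk_data"


def nltk_data_dir():
    """Return the directory NLTK data is expected in, based on NLTK_DATA."""
    return os.environ.get("NLTK_DATA", DEFAULT_NLTK_DATA)


def handler(event, context):
    """Tokenize event["text"] (a hypothetical key) and drop English stopwords."""
    # Imported inside the handler so this module can still be loaded
    # where NLTK is absent, e.g. when testing nltk_data_dir() locally.
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    text = event.get("text", "")
    stop_words = set(stopwords.words("english"))
    tokens = [t for t in word_tokenize(text) if t.lower() not in stop_words]
    return {"nltk_data": nltk_data_dir(), "tokens": tokens}
```

If the function raises a LookupError for a resource, the usual causes are a missing NLTK_DATA variable or data that was not downloaded into the layer at build time.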

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/customink/lambda-python-nltk-layer. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

This project is available as open source under the terms of the MIT License.