
Pipeline for ingesting and leveraging text data using AWS services

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

AWS NLP Data Pipeline

Ingest real-time streaming text data with automatic appending of NLP metadata

[Figures: Architecture; Kibana Dashboard]

Overview

This project implements a mostly serverless data engineering architecture that ingests real-time streaming data and automatically appends NLP metadata using managed AWS services. It can serve as a baseline for more complex ingestion pipelines that power NLP services.
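
The central idea, attaching NLP metadata to each incoming record, can be sketched roughly as follows. This is an illustration only, not the project's code: `enrich_record`, the `nlp` key, the `text` field name, and the toy `word_count_analyzer` are all hypothetical, and in the real pipeline the analysis would presumably come from a managed AWS service rather than a local function.

```python
from typing import Any, Callable, Dict

# An analyzer takes raw text and returns a dictionary of NLP metadata.
Analyzer = Callable[[str], Dict[str, Any]]


def word_count_analyzer(text: str) -> Dict[str, Any]:
    """Toy stand-in for a managed NLP service (assumption for illustration)."""
    tokens = text.split()
    return {"token_count": len(tokens), "char_count": len(text)}


def enrich_record(
    record: Dict[str, Any],
    analyze: Analyzer,
    text_field: str = "text",  # hypothetical field name
) -> Dict[str, Any]:
    """Return a copy of the record with metadata stored under the 'nlp' key."""
    enriched = dict(record)  # shallow copy so the input record is not mutated
    enriched["nlp"] = analyze(record[text_field])
    return enriched
```

Passing the analyzer as a parameter keeps the enrichment step independent of any particular NLP backend, which also makes it easy to test locally.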

The following AWS services are leveraged:

Deployment

This project uses GitHub Actions for its CI/CD pipeline. If you fork the repository, you can deploy through your own Actions workflows by adding the following Secrets to your repository:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION_ID
  • IP_ADDRESS
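
Before triggering a deployment, it can help to confirm that none of these values is missing. The sketch below assumes the Secrets are exposed to the job as environment variables with the same names, which depends on how the workflow is configured; `missing_secrets` is a hypothetical helper, not part of this repository.

```python
import os
from typing import List, Mapping, Optional

# Secret names listed in the Deployment section above.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_ID",
    "IP_ADDRESS",
)


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required names that are unset or empty in the given mapping.

    Defaults to the process environment (assumption: the workflow exports
    each Secret as an environment variable of the same name).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

A deployment step could call `missing_secrets()` and stop early with a clear message if the returned list is not empty.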

Example

A dataset for demonstration purposes has been provided. Use the following script to send example data to the Ingest Lambda for processing.

python stream.py
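
The contents of `stream.py` are not shown here, so the following is only a rough sketch of what sending records to a Lambda function might look like. It assumes the Ingest Lambda is invoked directly through boto3 (it may instead sit behind another service), and the names `make_lambda_sender`, `stream_records`, and `delay_seconds` are hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]


def make_lambda_sender(function_name: str) -> Callable[[Record], None]:
    """Build a sender that invokes the named Lambda asynchronously.

    Assumption: the function is invoked directly; the function name must be
    supplied by the caller because it is not given in this README.
    """
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    client = boto3.client("lambda")

    def send(record: Record) -> None:
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous, fire-and-forget invocation
            Payload=json.dumps(record).encode("utf-8"),
        )

    return send


def stream_records(
    records: Iterable[Record],
    send: Callable[[Record], None],
    delay_seconds: float = 0.0,
) -> int:
    """Send each record in order, optionally pausing between sends.

    Returns the number of records sent.
    """
    sent = 0
    for record in records:
        send(record)
        sent += 1
        if delay_seconds:
            time.sleep(delay_seconds)  # crude pacing to imitate a live stream
    return sent
```

Because the sender is passed in as a callable, the loop can be exercised locally with something like `stream_records(rows, print)` before pointing it at AWS.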