RHS Database Pipeline

A small project to kickstart my experimental data pipeline.

Step-by-step process:

  1. Extracted the data using Selenium WebDriver
  2. Cleaned the data using a pandas DataFrame
  3. Uploaded the SQL database to Amazon RDS and the images to an S3 data lake
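
As a rough illustration of steps 2 and 3, here is a minimal sketch of cleaning scraped records with pandas and writing them to a SQL table. It is not the project's actual code. The column names (`botanical_name`, `common_name`), the `plants` table name, the `clean_plants` helper and the sample rows are all hypothetical. An in-memory SQLite database stands in for Amazon RDS so the example runs locally, and the Selenium scraping and S3 image upload are left out.

```python
import sqlite3

import pandas as pd


def clean_plants(raw_rows):
    """Tidy scraped plant records (column names are hypothetical)."""
    df = pd.DataFrame(raw_rows)
    # Trim stray whitespace left over from the scraped HTML.
    df["botanical_name"] = df["botanical_name"].str.strip()
    df["common_name"] = df["common_name"].str.strip()
    # Drop rows without a botanical name, then drop duplicate plants.
    df = df[df["botanical_name"].str.len() > 0]
    df = df.drop_duplicates(subset=["botanical_name"])
    return df.reset_index(drop=True)


# Made-up sample rows standing in for the scraper's output.
raw_rows = [
    {"botanical_name": "  Acer campestre ", "common_name": "field maple"},
    {"botanical_name": "Acer campestre", "common_name": "field maple"},
    {"botanical_name": "", "common_name": "unknown"},
    {"botanical_name": "Quercus robur", "common_name": " English oak"},
]
cleaned = clean_plants(raw_rows)

# Local stand-in for the RDS upload: write the table to in-memory SQLite.
conn = sqlite3.connect(":memory:")
cleaned.to_sql("plants", conn, index=False)
row_count = conn.execute("SELECT COUNT(*) FROM plants").fetchone()[0]
```

For a real RDS instance, the same `to_sql` call would typically take a SQLAlchemy engine pointing at the database endpoint instead of the SQLite connection.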

Motivation behind this project:

To collect as much information as possible on local plants in the UK and enable new commercial applications for horticulturists and gardeners.

Credit and data ownership:

To the Royal Horticultural Society for making the plant database publicly available.