
Lahinch surf predictions with Hopsworks

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

CJSurf

A serverless analytical ML system that predicts surf (wave) heights at Lahinch Beach, Ireland:

[Image: Lahinch]

Operated using only free serverless services:

  1. Hopsworks: features, models, and assets are stored on https://app.hopsworks.ai
  2. GitHub Actions: two feature pipelines and a batch prediction pipeline run a total of five times per day.
  3. GitHub Pages: the latest predictions are published on the GitHub Pages site.
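As a quick check on the "five times per day" figure: assuming one workflow fires every 6 hours and another every 24 hours (the 6 hr and 24 hr schedules listed under Files), the daily run count adds up as follows. The cron strings are illustrative only and are not copied from the repository's workflow files.

```python
# Hypothetical schedules: (illustrative cron string, interval in hours).
# Check .github/workflows/*.yml for the real cron expressions.
SCHEDULES = {
    "every-6-hours": ("0 */6 * * *", 6),
    "daily": ("0 0 * * *", 24),
}


def runs_per_day(interval_hours: int) -> int:
    """Number of times a workflow fires per day for a fixed hourly interval."""
    if interval_hours <= 0 or 24 % interval_hours != 0:
        raise ValueError("interval must be a positive divisor of 24")
    return 24 // interval_hours


total = sum(runs_per_day(hours) for _, hours in SCHEDULES.values())
print(total)  # 4 + 1 = 5
```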

The model training notebook was run manually in Colab. It can be run again at any time to include the training data collected since the last training run.

Architecture

CJSurf is written entirely in Python.

[Figure: CJSurf architecture diagram]

Details

Requirements: create accounts on app.hopsworks.ai, github.com, and streamlit.io.

Files:

  1. GitHub Actions workflows: .github/workflows/*.yml. They run the notebooks below on 6-hour and 24-hour schedules using bash scripts.
  2. Streamlit UI: streamlit-image.py. This Python program downloads the prediction image from Hopsworks and displays it. Set the HOPSWORKS_API_KEY environment variable in your Streamlit application; you can create the API key in app.hopsworks.ai.
  3. Notebooks:
  • surf-report-feature-pipeline.ipynb: downloads the latest surf report for today and writes it to the lahinch feature group. Run it manually first with BACKFILL=True to fill the feature group with historical surf reports from 2004, loaded from a CSV file.
  • swell-predictions-feature-pipeline.ipynb: downloads the latest swell predictions and writes them to the swells_exploded feature group. Run it manually first with BACKFILL=True to fill the feature group with historical swell predictions from 2004, loaded from a CSV file.
  • training-pipeline.ipynb: trains a k-nearest neighbors model using scikit-learn. The training data comes from the lahinch_surf feature view, which performs a point-in-time correct join of features from the lahinch and swells_exploded feature groups.
  • batch-prediction-pipeline.ipynb: gets the latest feature values from the lahinch_surf feature view and predicts the surf height every 2 hours for the next 238 hours. It writes the predictions to the wave_predictions feature group and generates a PNG image of the predictions, which is uploaded to Hopsworks. Streamlit downloads and displays this PNG as the surf forecast.
  4. Scripts: these are run by the GitHub Actions workflows. They use nbconvert to convert the notebooks to Python programs, which are then run.
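The BACKFILL switch in the two feature pipelines amounts to choosing between a one-off bulk load and a single fresh row. The sketch below is a minimal, dependency-free illustration of that choice, assuming a CSV with hypothetical `date` and `height` columns; the function name is hypothetical, and the real notebooks write the selected rows to a Hopsworks feature group, which is not shown.

```python
import csv
import io
from datetime import date
from typing import Iterable


def rows_to_write(backfill: bool, history_csv: Iterable[str], today_row: dict) -> list:
    """Choose which rows a feature pipeline run writes to its feature group.

    backfill=True:  every historical row from the CSV (the first, manual run).
    backfill=False: only the row downloaded today (the scheduled runs).
    """
    if backfill:
        # csv.DictReader yields one dict per CSV row; values are strings.
        return list(csv.DictReader(history_csv))
    return [today_row]


# Hypothetical CSV layout; the repository's real columns may differ.
history = io.StringIO("date,height\n2004-01-01,1.5\n2004-01-02,2.0\n")
today = {"date": date.today().isoformat(), "height": "1.2"}

print(len(rows_to_write(True, history, today)))          # 2
print(rows_to_write(False, io.StringIO(""), today) == [today])  # True
```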
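A point-in-time correct join gives each surf observation only the swell features that were already known at the observation's timestamp, so no future information leaks into the training data. Hopsworks performs this join when the lahinch_surf feature view builds training data; the sketch below re-implements the idea in plain Python for illustration only. The field names are hypothetical, and it assumes a single location, so rows are matched on time alone rather than on an entity key plus time.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, features):
    """Attach to each label row the latest feature row with time <= the label's time.

    labels:   dicts with a "time" key (e.g. surf height observations)
    features: dicts with a "time" key (e.g. swell predictions)
    Labels with no earlier feature row are dropped (inner-join behaviour).
    """
    features = sorted(features, key=lambda f: f["time"])
    feature_times = [f["time"] for f in features]
    joined = []
    for label in labels:
        # Position of the last feature row whose time is <= the label's time.
        idx = bisect_right(feature_times, label["time"]) - 1
        if idx < 0:
            continue  # no feature value was known yet at this time
        row = {k: v for k, v in features[idx].items() if k != "time"}
        row["feature_time"] = features[idx]["time"]
        row.update(label)  # keeps the label's own "time" and target
        joined.append(row)
    return joined


swells = [
    {"time": datetime(2023, 1, 1, 0), "swell_height": 2.0},
    {"time": datetime(2023, 1, 1, 6), "swell_height": 3.1},
]
surf = [
    {"time": datetime(2022, 12, 31, 23), "wave_height": 0.9},  # before any swell row
    {"time": datetime(2023, 1, 1, 5), "wave_height": 1.2},
    {"time": datetime(2023, 1, 1, 7), "wave_height": 1.8},
]
for row in point_in_time_join(surf, swells):
    print(row["time"], row["swell_height"], row["wave_height"])
```

The 05:00 observation picks up the 00:00 swell row rather than the 06:00 one, even though 06:00 is closer, because the 06:00 value did not exist yet at 05:00.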
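training-pipeline.ipynb fits a k-nearest neighbors model with scikit-learn. To show what such a model computes, here is a dependency-free sketch of k-NN regression, assuming wave height is treated as a continuous target: the prediction is the mean target of the k training points closest to the query by Euclidean distance. The features, values, and k below are made up; the notebook's actual features and settings are not stated here. In scikit-learn the corresponding estimator is `sklearn.neighbors.KNeighborsRegressor`.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """k-NN regression: mean target of the k training points nearest to query.

    train_x: list of feature vectors (lists of floats)
    train_y: list of targets, e.g. observed wave heights
    query:   feature vector to predict for
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Made-up features: [swell height (m), swell period (s)] -> wave height (m).
train_x = [[1.0, 8.0], [2.0, 10.0], [3.0, 12.0], [4.0, 14.0]]
train_y = [0.5, 1.0, 1.5, 2.0]
print(knn_predict(train_x, train_y, [2.1, 10.2], k=2))  # 1.25
```

Because k-NN works on raw distances, features on larger numeric scales (here the period in seconds) dominate unless they are standardized first, for example with scikit-learn's `StandardScaler`.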
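The batch prediction pipeline forecasts the surf height every 2 hours over the next 238 hours. A minimal sketch of the target timestamps, assuming the horizon includes both the start time and the 238-hour mark (giving 120 timestamps, roughly 10 days); whether the notebook counts the current time as the first step is not stated, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def forecast_times(start: datetime, horizon_hours: int = 238, step_hours: int = 2) -> list:
    """Timestamps from start to start + horizon_hours inclusive, every step_hours."""
    return [start + timedelta(hours=h) for h in range(0, horizon_hours + 1, step_hours)]


times = forecast_times(datetime(2023, 1, 1, 0))
print(len(times))  # 120
print(times[-1])   # 2023-01-10 22:00:00
```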
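Each script converts a notebook to a plain Python program with nbconvert and then runs it. A Python equivalent of that bash step might look like the sketch below. `jupyter nbconvert --to script` is the standard nbconvert command and writes `<name>.py` next to a Python notebook, but the exact flags used in the repository's scripts, and the helper names here, are assumptions.

```python
import subprocess
import sys
from pathlib import Path


def notebook_commands(notebook: str) -> list:
    """Commands that convert a notebook to a .py script and then execute it."""
    script = str(Path(notebook).with_suffix(".py"))
    return [
        ["jupyter", "nbconvert", "--to", "script", notebook],
        [sys.executable, script],
    ]


def run_notebook(notebook: str) -> None:
    """Run the commands in order; check=True raises on the first failure."""
    for cmd in notebook_commands(notebook):
        subprocess.run(cmd, check=True)
```

For example, `run_notebook("batch-prediction-pipeline.ipynb")` would convert and then run the batch prediction notebook, provided Jupyter and the notebook's dependencies are installed on the runner.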

Data Sources

Buoy for Predictions

Surf Height Observations at Lahinch Beach