/pyspark-horse-race-predict

Using PySpark for TensorFlow model inference on a GCP Dataproc cluster. Demo for the PyCon Hong Kong Fall 2020 presentation.

Primary Language: Jupyter Notebook

Integrating data pipeline with TensorFlow Model Deployment using PySpark

  • Presentation for PyCon HK 2020 Fall Session (Cantonese Track)
  • Speaker: Winnie Yeung
  • Slide Deck

Problem Description

How can we predict the winning horse in each Jockey Club horse race?
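To make the framing concrete, here is a minimal sketch of the last step of such a prediction: turning per-horse scores into one predicted winner per race. It assumes the model emits a single score per horse, with a higher score meaning a more likely winner. The function name `pick_winners` and the `(race_id, horse_id, score)` tuple layout are hypothetical and are not taken from this repo.

```python
def pick_winners(scored_rows):
    """Return {race_id: horse_id} for the highest-scored horse in each race.

    scored_rows is an iterable of (race_id, horse_id, score) tuples.
    This layout is an assumption made for illustration only.
    """
    best = {}  # race_id -> (horse_id, score) of the best horse seen so far
    for race_id, horse_id, score in scored_rows:
        # Strict ">" means the first horse seen wins a tie.
        if race_id not in best or score > best[race_id][1]:
            best[race_id] = (horse_id, score)
    return {race_id: horse_id for race_id, (horse_id, _) in best.items()}


if __name__ == "__main__":
    demo = [("R1", "A", 0.2), ("R1", "B", 0.7), ("R2", "C", 0.4)]
    print(pick_winners(demo))
```

Grouping by race matters because scores are only comparable among horses running in the same race.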

Tech Stack

  • GCP Dataproc, PySpark 2.4.7, Pandas, TensorFlow 2.0, Java
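One common way to combine these pieces is to load the saved model once per Spark partition and score rows in batches. The sketch below illustrates that pattern under stated assumptions and is not necessarily how this repo's notebooks do it. The column names (`race_id`, `horse_id`, `features`), the Parquet input, the single-score model output, and the helper names are all hypothetical.

```python
from itertools import islice


def predict_partition(rows, load_predictor, batch_size=256):
    """Score one partition of rows, loading the model only once.

    rows: iterator of mapping-like rows (dicts or Spark Row objects) with
          'race_id', 'horse_id' and 'features' fields (assumed names).
    load_predictor: zero-argument callable returning a function that maps
          a list of feature vectors to a list of scores.
    """
    predict = load_predictor()  # one model load per partition
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        scores = predict([row["features"] for row in batch])
        for row, score in zip(batch, scores):
            yield (row["race_id"], row["horse_id"], float(score))


def load_keras_predictor(model_path):
    """Wrap a saved Keras model; assumes it outputs one score per input row."""
    import numpy as np  # imported lazily so only the executors need them
    import tensorflow as tf

    model = tf.keras.models.load_model(model_path)

    def predict(batch):
        features = np.asarray(batch, dtype="float32")
        return model.predict(features).ravel().tolist()

    return predict


def run_inference(spark, input_path, model_path):
    """Read features, score them per partition, return a scored DataFrame."""
    df = spark.read.parquet(input_path)  # assumed input format
    scored = df.rdd.mapPartitions(
        lambda rows: predict_partition(
            rows, lambda: load_keras_predictor(model_path)
        )
    )
    return spark.createDataFrame(scored, ["race_id", "horse_id", "score"])
```

Loading the model inside the partition function avoids shipping a large model object with every task, and batching keeps the number of `predict` calls small. For this to work, the model path has to be readable from every worker node.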

Running jobs

  • Run an individual script: pyspark < script.py
  • Submit a job on GCP Dataproc (from the /shells/ directory): nohup ./submit_inference_job.sh &

Credits

Useful links: