- Presentation for PyCon HK 2020 Fall Session (Cantonese Track)
- Speaker: Winnie Yeung
- Slide Deck
How can we predict the winning horse of each Hong Kong Jockey Club race?
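The question above treats prediction as a per-race ranking problem: score every runner, then pick the top-scoring horse in each race. Below is a minimal sketch of that final selection step. It assumes a model has already produced a win probability for each horse. The `(race_id, horse_id, probability)` tuple layout, the `pick_winners` helper, and the toy numbers are hypothetical and are not taken from this project. In a PySpark pipeline this step would be a group-by over race. Plain Python is used here so the example stays self-contained.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical schema: (race_id, horse_id, predicted win probability)
Row = Tuple[str, str, float]


def pick_winners(rows: Iterable[Row]) -> Dict[str, str]:
    """Return {race_id: horse_id} for the highest-scoring horse in each race."""
    best: Dict[str, Tuple[str, float]] = {}
    for race_id, horse_id, prob in rows:
        # Replace the current pick only on a strictly higher score,
        # so the first horse seen wins any tie.
        if race_id not in best or prob > best[race_id][1]:
            best[race_id] = (horse_id, prob)
    return {race_id: horse for race_id, (horse, _) in best.items()}


if __name__ == "__main__":
    # Made-up scores for illustration only
    predictions = [
        ("R1", "H01", 0.12),
        ("R1", "H02", 0.41),
        ("R1", "H03", 0.27),
        ("R2", "H04", 0.55),
        ("R2", "H05", 0.30),
    ]
    print(pick_winners(predictions))  # {'R1': 'H02', 'R2': 'H04'}
```

Taking the arg-max per race guarantees exactly one predicted winner per race, even when several horses have similar scores.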
- GCP Dataproc, PySpark 2.4.7, Pandas, TensorFlow 2.0, Java
- Run an individual script by piping it into the PySpark shell: `pyspark < script.py`
- Submit the job to GCP Dataproc from `/shells/` (`nohup` with `&` runs it in the background and keeps it running after logout): `nohup ./submit_inference_job.sh &`
- Lantana Camara's Hong Kong horse racing dataset on Kaggle: https://www.kaggle.com/lantanacamara/hong-kong-horse-racing
- Cullen Sun's TensorFlow model design for this dataset: https://www.kaggle.com/cullensun/deep-learning-model-for-hong-kong-horse-racing/
- Jockey Club web-scraping package: https://github.com/jaloo555/HK-Horse-Racing-Data-Scraper
- Databricks guide to deep learning model inference: https://docs.databricks.com/applications/machine-learning/model-inference/dl-model-inference.html
- Medium posts on PySpark pipeline data transformation: