This repo contains work from my internship at Nvidia as a Data Science Intern.
⚡️ Worked on the data engineering aspect of a multi-step time series forecasting problem using a GPU-accelerated stack on Nvidia GPU Cloud. Check out the work: Distributed Data Science using NVTabular on Spark & Dask
⚡️ Improved distributed model training speed by up to 10% by migrating the data loading pipeline from Petastorm to KerasSequenceLoader. See: RAPIDS-NVTabular-Horovod-Spark-Databricks
Docker image: hub.docker.com/r/tauhait/databricks_nvtabular