/Kaggle-NYC-Taxi-Trip-Duration

Python / Kaggle / Regression - Predict the duration of NYC taxi trips from location data using regression algorithms.


Predict New York City taxi trip duration - Kaggle Competition

In this Kaggle competition, the goal was to build a model that predicts the duration of New York City taxi trips. This Jupyter Notebook walks through an end-to-end data science project. We first analyze the data and identify potential biases in it, then process and clean it. Next, we compute new features through feature engineering and add further variables from open data. Finally, we train regression models and optimize their hyperparameters. Our best result was a Log RMSE of 0.48757 with a LightGBM model.
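For readers unfamiliar with the metric, the Log RMSE score can be read as the root mean squared error computed on log-transformed durations (often called RMSLE). The snippet below is a minimal sketch of that computation, not code taken from the notebook; the function name `rmsle` and the use of `log(1 + x)` are assumptions made for illustration.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between two equal-length sequences.

    Illustrative helper (not from the notebook). Durations are assumed
    to be non-negative numbers of seconds.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # log1p(x) = log(1 + x), which stays defined when a duration is 0
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical durations in seconds, for illustration only
actual = [600, 1200, 300]
predicted = [660, 1000, 330]
score = rmsle(actual, predicted)
```

Because the error is measured on a log scale, predicting 660 seconds for a 600-second trip is penalized about as much as predicting 6,600 seconds for a 6,000-second trip.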

Competition link: https://www.kaggle.com/c/nyc-taxi-trip-duration/overview

Image: example of the feature engineering we performed, creating New York City neighborhoods with the K-means algorithm.
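The neighborhood feature can be sketched as follows. This is a hedged illustration rather than the notebook's actual code: it assumes scikit-learn's `KMeans` applied to (latitude, longitude) pairs, and the helper name `assign_neighborhoods`, the synthetic coordinates, and the cluster count are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def assign_neighborhoods(coords, n_clusters, seed=0):
    """Cluster (latitude, longitude) rows and return one cluster label per row.

    Illustrative helper (not from the notebook). The label can be used as
    a categorical "neighborhood" feature for the regression model.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(coords)
    return labels, model


# Synthetic pickup points around two made-up centers, for illustration only
rng = np.random.default_rng(0)
area_a = rng.normal(loc=[40.755, -73.985], scale=0.005, size=(200, 2))
area_b = rng.normal(loc=[40.645, -73.785], scale=0.005, size=(200, 2))
coords = np.vstack([area_a, area_b])

labels, model = assign_neighborhoods(coords, n_clusters=2)

# New points (for example dropoff locations) can reuse the fitted clusters
new_labels = model.predict(np.array([[40.756, -73.986], [40.646, -73.784]]))
```

Fitting the clusters once and reusing them through `model.predict` keeps pickup and dropoff neighborhoods on the same label scheme, which is one reasonable design choice among several.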