In this Kaggle competition, the goal was to build a model that predicts the duration of New York City taxi trips. In this Jupyter Notebook, we run an end-to-end data science project. We first analyze the data and identify potential biases in it, then process and clean it. We also compute new features through feature engineering and add further variables from open data. Finally, we train regression algorithms and optimize their hyperparameters. We reached a Log RMSE of 0.48757 with a LightGBM model.
Competition link: https://www.kaggle.com/c/nyc-taxi-trip-duration/overview
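For readers unfamiliar with the score quoted above, here is a minimal sketch of how a Log RMSE can be computed. It assumes the metric is the root mean squared logarithmic error (RMSLE) on trip durations in seconds. The function name `rmsle` and the sample values are illustrative and are not taken from the notebook.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between two arrays of durations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), which keeps zero-length durations well defined
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))


# Illustrative values only (seconds); not competition data
actual = [600, 1200, 300]
predicted = [650, 1000, 320]
score = rmsle(actual, predicted)
```

Because the error is measured on a log scale, a prediction that is off by 10% costs about the same for a short trip as for a long one.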
Example of the feature engineering we performed: creating New York City neighborhoods with the K-means algorithm
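The neighborhood idea can be sketched as follows, assuming the clusters are built from latitude and longitude pairs with scikit-learn's `KMeans`. The synthetic coordinates, the three cluster centers, and the choice of `n_clusters=3` are assumptions made for illustration. The notebook's actual inputs and cluster count may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical (latitude, longitude) centers, for illustration only
centers = np.array([[40.75, -73.99], [40.65, -73.78], [40.77, -73.87]])

# Synthetic pickup points scattered around each center
coords = np.vstack(
    [c + rng.normal(scale=0.005, size=(200, 2)) for c in centers]
)

# Each point receives a cluster label that can serve as a "neighborhood" feature
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
neighborhood = kmeans.fit_predict(coords)
```

The resulting integer labels can then be used as a categorical feature for a regression model, for both pickup and dropoff locations.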