This repository is the submission for the AI challenge for S.E.A hosted by Grab. The selected challenge is Safety. The challenge can be found via this website.
by: Satsawat Natakarnkitkul (Net)
Email: n.satsawat@gmail.com
Country: Thailand
Motivation: This challenge is interesting in many ways, and because I use Grab nearly every day, it is also the one with the most impact on Grab users.
- The notebook `Grab AI Challenge_Safety_Data Exploration.ipynb` is used mainly for data understanding and EDA of the sensor data provided by Grab. You do not need to run this notebook, but it provides some explanation of the telemetry data from the sensors.
- The notebook `Grab AI Challenge_ML model comparison.ipynb` trains and tests several ML techniques to arrive at the final model, and also tries out feature engineering and other data transformations.
- The notebook `Grab AI Challenge_GridSearchCV and Feature Engineering.ipynb` shows the grid search for the XGBoost algorithm; with the current model and feature engineering, it achieved an AUC score of 0.5945.
- This folder contains the final model object to be used for prediction.
- This folder contains the final python source code for manipulating, creating new features and predicting the data set.
- This folder contains the image embedded onto EDA and other notebooks.
- This folder contains the bookingID, predicted class, and probability of the prediction; this is the output of running the py script.
The model predicts the safety of a trip. The assumption is that prediction happens offline rather than in real time (online), so all the data for each booking ID is already available. The transformation aggregates the records of each booking ID into a single observation, which is then fed into the model for prediction.
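The per-booking aggregation described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in `Safety_Prediction.py`: the function name `aggregate_by_booking`, the column names other than bookingID, and the choice of mean, max, and standard deviation as summary statistics are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_booking(rows, value_columns):
    """Collapse many telemetry rows per bookingID into one feature row each."""
    # Group the raw rows by their booking ID.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["bookingID"]].append(row)

    # Summarise every numeric column of each group with a few statistics.
    features = {}
    for booking_id, group in grouped.items():
        feats = {"n_records": len(group)}
        for col in value_columns:
            values = [r[col] for r in group]
            feats[f"{col}_mean"] = mean(values)
            feats[f"{col}_max"] = max(values)
            feats[f"{col}_std"] = pstdev(values)  # population std; 0.0 for one row
        features[booking_id] = feats
    return features


# Hypothetical telemetry rows; the sensor column names are assumed.
rows = [
    {"bookingID": 1, "speed": 10.0, "acceleration_x": 0.5},
    {"bookingID": 1, "speed": 14.0, "acceleration_x": -0.5},
    {"bookingID": 2, "speed": 3.0, "acceleration_x": 0.1},
]
features = aggregate_by_booking(rows, ["speed", "acceleration_x"])
```

Each booking ID ends up as exactly one row of features, which is the shape a tabular model such as XGBoost expects.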
To run the prediction, please use `Safety_Prediction.py` in the `code` directory.
- The script in the `code` folder reads the feature data files within the `data/safety/features` folder.
  - If the data path changes, please adjust `DATA_DIR` to point to the correct folder.
- The script automatically runs the feature transformation and engineering.
- The script loads the XGBoost model object from the `model` directory to make the prediction.
- The script saves the prediction with bookingID to the `../output/all_prediction.csv` file.
- If `LABEL_IND = True` in the script, it attempts to evaluate the prediction against the true label.
  - The true label file should be in the `data/safety/labels` folder with the proper bookingID and label columns.
  - If the `labels` folder changes, please adjust `LABEL_DIR` in the script accordingly.
  - If the evaluation is not needed, you can turn it off by setting `LABEL_IND = False` in the script.
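The last two steps above (saving the prediction and the optional evaluation) could look roughly like the sketch below. It is an illustration under stated assumptions rather than the script itself: the helper names `roc_auc` and `write_predictions`, the CSV column names, the 0.5 class threshold, and the sample IDs, probabilities, and labels are all hypothetical. Only `LABEL_IND` and the use of AUC as the metric come from this README. The sketch writes to an in-memory buffer instead of `../output/all_prediction.csv` so it runs without any files.

```python
import csv
import io

LABEL_IND = True  # mirrors the flag described above


def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def write_predictions(handle, booking_ids, probabilities, threshold=0.5):
    """Write bookingID, predicted class, and probability as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(["bookingID", "prediction", "probability"])  # assumed column names
    for booking_id, prob in zip(booking_ids, probabilities):
        writer.writerow([booking_id, int(prob >= threshold), prob])


# Hypothetical model output and true labels, keyed by bookingID.
booking_ids = [101, 102, 103, 104]
probabilities = [0.9, 0.2, 0.6, 0.4]
true_labels = {101: 1, 102: 0, 103: 0, 104: 1}

buffer = io.StringIO()  # stand-in for the output CSV file
write_predictions(buffer, booking_ids, probabilities)

auc = None
if LABEL_IND:
    # Align labels with predictions by bookingID before scoring.
    labels = [true_labels[b] for b in booking_ids]
    auc = roc_auc(labels, probabilities)
```

The pairwise AUC here is quadratic in the number of bookings and is meant only to make the metric concrete; for real data a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.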