This repository contains the results I delivered to a spatial data science company for a Junior Data Scientist role (3-9 Nov 2020).
The first step is ETL: NYC taxi data and census block group geometries were loaded into PostGIS.
The next step is to explore the data and build a baseline model that predicts the number of taxi pickups from the ACS dataset.
The two tasks were handled in separate Jupyter notebooks.
A summary report is available as a PDF.
All data files are stored in the "./data" directory:
- NYC taxi data (Jan, Apr, July 2015) .zip
- ACS demographic and socio-economic data by census block group .csv
- NYC census block group geometries .json
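
Once the taxi pickups and the block group geometries are both in PostGIS, the pickups can be counted per block group with a spatial join. The following is a minimal sketch of such a query, not the one used in the notebooks. The table names (`taxi_pickups`, `block_groups`), the column names (`geoid`, `geom`, `pickup_longitude`, `pickup_latitude`), and the SRID 4326 are assumptions made for illustration.

```python
def pickup_count_query(pickups_table: str = "taxi_pickups",
                       block_groups_table: str = "block_groups") -> str:
    """Build a PostGIS query that counts taxi pickups inside each block group.

    All table and column names are hypothetical placeholders. The table
    names are interpolated directly, so pass trusted identifiers only.
    """
    return f"""
        SELECT bg.geoid,
               COUNT(p.pickup_longitude) AS pickup_count
        FROM {block_groups_table} AS bg
        LEFT JOIN {pickups_table} AS p
          ON ST_Contains(
               bg.geom,
               -- Assumption: geometries are stored in WGS 84 (SRID 4326).
               ST_SetSRID(ST_MakePoint(p.pickup_longitude, p.pickup_latitude), 4326)
             )
        GROUP BY bg.geoid
        ORDER BY pickup_count DESC;
    """


query = pickup_count_query()
print(query)
```

The `LEFT JOIN` keeps block groups with no pickups, which then get a count of zero. The resulting string could be passed to `pandas.read_sql` together with a database connection.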
Two Docker containers, one for the PostGIS database and one for Jupyter Notebook, are created with Docker Compose.
docker-compose up
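
After the containers start, the database may need a few seconds before it accepts connections. The helper below is a small sketch for checking this from Python before running the ETL notebook. The function name `wait_for_port` is hypothetical, and the host and port in the usage comment assume that the PostGIS container publishes PostgreSQL's default port 5432 on localhost, which this repository's compose file may or may not do.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False after `timeout`
    seconds without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (assumed host and port, adjust to the compose file):
# if not wait_for_port("localhost", 5432):
#     raise RuntimeError("PostGIS is not reachable")
```

An open port only shows that the server process is listening. It does not guarantee that the database has finished initializing.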
Libraries for geospatial data processing (e.g. geopandas, GeoAlchemy2) were used in addition to the general pydata stack (pandas, numpy, sklearn, matplotlib, seaborn).
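
To illustrate what a baseline for the pickup-count model can look like with this stack, here is a minimal sketch using scikit-learn. It is not the model from the notebooks. The data is synthetic and stands in for the ACS features and pickup counts, and the feature count, coefficients, and noise level are arbitrary assumptions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, NOT the real ACS or taxi tables.
rng = np.random.default_rng(0)
n_block_groups = 500
X = rng.normal(size=(n_block_groups, 3))  # three hypothetical ACS features
y = 50 + X @ np.array([20.0, -5.0, 10.0]) + rng.normal(scale=5.0, size=n_block_groups)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Reference point: always predict the mean pickup count of the training set.
mean_model = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Simple baseline: ordinary least squares on the features.
linear_model = LinearRegression().fit(X_train, y_train)

mae_mean = mean_absolute_error(y_test, mean_model.predict(X_test))
mae_linear = mean_absolute_error(y_test, linear_model.predict(X_test))
print(f"MAE, mean predictor: {mae_mean:.2f}")
print(f"MAE, linear model:   {mae_linear:.2f}")
```

Comparing against the mean predictor shows how much of the error reduction actually comes from the features, which gives later models a clear bar to beat.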
[] Expand exploratory data analysis
[] Improve modeling