Data science project that analyzes the New York subway dataset.
The project is divided into three parts:
- Data collection (Python, HTTP requests, pandas);
- Data analysis (pandasql, matplotlib, numpy);
- Data processing (MapReduce).
- Install Anaconda Scientific Python
- Clone repo https://github.com/AlanPrado/FDSI2_subway_data
- Open a terminal in the project directory and run:
jupyter notebook analyzing-subway-data-ndfdsi-checkpoint.ipynb
You can see the final result here
│   analyzing-subway-data-ndfdsi.html   # HTML export that shows the project execution
│   analyzing-subway-data-ndfdsi.ipynb  # Jupyter notebook with all the source code and instructions
├───data                                # This directory is generated with all raw data
└───output                              # This directory contains all processed data
        mapper_result.txt
        reducer_result.txt
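
The `mapper_result.txt` and `reducer_result.txt` files listed above come from the MapReduce part of the project. The notebook holds the actual implementation. The snippet below is only a minimal sketch of the general mapper/reducer pattern, not the project's code. The column names `UNIT` and `ENTRIESn_hourly` and the sample rows are assumptions made for illustration.

```python
import csv
import io
from itertools import groupby


def mapper(lines):
    """Emit one (key, value) pair per CSV data row."""
    # Column names are assumed for this sketch, not taken from the project.
    for row in csv.DictReader(lines):
        yield row["UNIT"], float(row["ENTRIESn_hourly"])


def reducer(pairs):
    """Sum the values of each key; pairs must already be sorted by key."""
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(value for _, value in group)


# Made-up sample input standing in for a raw data file.
sample = io.StringIO(
    "UNIT,ENTRIESn_hourly\n"
    "R001,10\n"
    "R002,5\n"
    "R001,7.5\n"
)

# Sorting stands in for the shuffle step between the map and reduce phases.
mapped = sorted(mapper(sample))
totals = dict(reducer(mapped))
print(totals)
```

Sorting the mapper output before reducing matters because `groupby` only groups adjacent items with the same key.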
Code and documentation copyright 2016-2017. Code released under the MIT License.
- Alan Thiago do Prado (aprado.cnsp@gmail.com)