Those packages are needed for the data handling : tidyverse
, lubridate
, and rio
.
The data can be downloaded on the official CitiBike data repository : https://www.citibikenyc.com/system-data This app is designed to analyse the data of a whole year, and it was based on 2016 data that can be found here : https://s3.amazonaws.com/tripdata/index.html
Run the src/0_Data_downloading.R
script for downloading data, after setting the parameters to the year and months that you want to explore.
The csv
files take a large space as each row contains all of the informations about the starting and the ending citibike station.
The first part of the data preparation is then to reduce those files
Run the script src/1_Data_minifying.R
that will :
- Remove trips longer than 1h (less that 2% of the trips)
- Convert all time to a unique format (for 2016, January to September are MMDDYYYY, but October to December are YYYYMMDD).
- Rename column names (from
start station id
,start station name
etc. to more formatted names likestartID
,startName
etc.) - Extract station informations (
ID
,Name
,Lat
,Long
) to other csv (one for each month of data) - Remove those informations (keeping
startID
andendID
) from the trip datasets - Save all the newly formatted files into
data/2016mm-citibike-tripdata_mini.csv
files
The last script, src/3_Data_compressing.R
, will read all the minified trip files and gather those inside a single .Rdata
compressed file, so that it can be directly loaded into R
.
It will also filter the trips according to these steps :
- Remove trips where stations are incorrectly located (
NULL island
, or too far away from the main area) - Remove trips from/where inexistant stations
- Remove short (<2 min) and long (> 1h) trips
It will then create new columns for easier exploration :
Date
: contains the Date, without timestampMonth
: a factor of the english name of each monthWDay
: a factor of the english name of each day of the week, starting on MondayDay
: the number of the dayDHour
: The decimal hour (eg. 5h45 is 5.75)Hour
: The plain hour (eg. 5h45 is 5)userGender
: a factor of the user's gender :0
is unknown,1
is male,2
is female
Once all of this is done, you can run the Shiny App with shiny::runApp()
You'll need those packages :
shiny
, scales
, tidyverse
, leaflet
(Development version), leaflet.extras
(GitHub version)
The running app can also be used here : http://shiny.parisgeo.cnrs.fr/CitiBikeEDB/
Those scripts are under GPLv3
licence, and the shiny app is licensed under the GNU-AGPLv3
licence.