NYC-taxi-dataset

The New York City taxi dataset is a good example of a large dataset with many immediate analytics applications. We use it in several tutorials to demonstrate data processing and modeling with R and Microsoft R. The original data for yellow cabs (one CSV file per month) can be downloaded directly from the above link. Here is an example. A data dictionary is also available here.

To run the R code shown in the examples, you will need the RevoScaleR R package, which is not available on CRAN. Instead, you need to use Microsoft R Server or its lightweight, non-commercial counterpart, Microsoft R Client.
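
As a quick sanity check before running the tutorials, the sketch below tests whether RevoScaleR is installed and, if it is, previews one monthly CSV without loading the whole file into memory. The file name `yellow_tripdata_sample.csv` is a hypothetical placeholder for whichever monthly file you downloaded; it is not a name taken from this repository.

```r
# Hypothetical local path; replace with the monthly CSV you downloaded.
csv_path <- "yellow_tripdata_sample.csv"

# TRUE only if the RevoScaleR package is installed in this R session's library.
has_revoscaler <- requireNamespace("RevoScaleR", quietly = TRUE)

if (!has_revoscaler) {
  message("RevoScaleR not found; install Microsoft R Server or Microsoft R Client.")
} else if (!file.exists(csv_path)) {
  message("CSV not found at: ", csv_path)
} else {
  library(RevoScaleR)
  # Point a text data source at the CSV; nothing is read yet.
  taxi_text <- RxTextData(csv_path)
  # Show column names and types, plus the first five rows.
  rxGetInfo(taxi_text, getVarInfo = TRUE, numRows = 5)
}
```

If the first message appears, the tutorials' `rx*` functions will not be available until one of the Microsoft R distributions is installed.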