andersthuesen.github.io

TODO

  • Clean data
  • Generate overall statistics
    • Number of observations
    • Number of columns
  • Show something timeseries-ish

What is the idea?

Using the TLC Trip Record dataset, we explore spatiotemporal patterns in yellow taxi trips from 2022. Through data analysis and visualization of taxi habits, we tell the story of New Yorkers' everyday life in the magazine genre.

Our dataset consists of yellow taxi trip records from 2022. We obtain it by concatenating the trip records for each month of the year, which amounts to 39,656,098 trips in total. For each trip, 19 attributes are recorded. These include the timestamp and zone ID of both the pickup and the dropoff, the trip distance, the number of passengers, the payment type, the tip amount, the fare amount, and the total amount.
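The concatenation step could be sketched roughly as below. This is a minimal illustration and not our actual loading code. The file naming pattern `yellow_tripdata_YYYY-MM.parquet` is an assumption, and `load_month` is a hypothetical callable standing in for whatever reader is used to turn one monthly file into trip records.

```python
from itertools import chain


def monthly_file_names(year):
    """Return one file name per month (naming pattern is an assumption)."""
    return [f"yellow_tripdata_{year}-{month:02d}.parquet" for month in range(1, 13)]


def concatenate_months(file_names, load_month):
    """Lazily chain the trip records of all monthly files into one stream.

    `load_month` is a hypothetical function that takes a file name and
    returns an iterable of trip records for that month.
    """
    return chain.from_iterable(load_month(name) for name in file_names)
```

Because the months are chained lazily, only one month needs to be read at a time, which may help when the full year does not fit in memory.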

This is a lot of data, which in itself is a good thing, but we have already run into problems with allocating space for it. We have also discovered many outliers and NaNs in the data; removing them will reduce the number of usable trips.
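One possible way to filter out such records is sketched below. The field names follow the attributes listed earlier, but the thresholds (`max_distance`, `max_fare`) are purely illustrative assumptions, not values we have settled on, and each trip is assumed to be a plain dictionary.

```python
import math

# Fields that must be present and valid for a trip to be kept (an assumption).
REQUIRED_FIELDS = ("trip_distance", "fare_amount", "passenger_count")


def is_missing(value):
    """True for None and for floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_clean(trip, max_distance=100.0, max_fare=500.0):
    """Keep a trip only if required fields are present and within plausible bounds.

    The default bounds are illustrative guesses, not tuned values.
    """
    if any(is_missing(trip.get(field)) for field in REQUIRED_FIELDS):
        return False
    return (
        0 < trip["trip_distance"] <= max_distance
        and 0 < trip["fare_amount"] <= max_fare
        and trip["passenger_count"] > 0
    )


def clean_trips(trips):
    """Lazily yield only the trips that pass `is_clean`."""
    return (trip for trip in trips if is_clean(trip))
```

Filtering lazily like this avoids holding both the raw and the cleaned data in memory at the same time.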

Here it could be interesting to investigate where New Yorkers most often travel from and to at specific times of the day, whether there are weekly, monthly, or seasonal patterns, which areas are most generous in terms of tips, and where people most often pay by cash.
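As a rough sketch of how some of these questions could be turned into aggregations, the functions below count pickups per hour and zone, average the tip per pickup zone, and compute the share of cash payments per pickup zone. The column names (`tpep_pickup_datetime`, `PULocationID`, `tip_amount`, `payment_type`) and the payment code `2` for cash are assumptions based on our reading of the TLC data dictionary, and trips are assumed to be dictionaries with the pickup time already parsed into a `datetime`.

```python
from collections import Counter, defaultdict

CASH = 2  # assumed payment_type code for cash payments


def pickups_by_hour_and_zone(trips):
    """Count trips per (pickup hour, pickup zone) pair."""
    counts = Counter()
    for trip in trips:
        counts[(trip["tpep_pickup_datetime"].hour, trip["PULocationID"])] += 1
    return counts


def mean_tip_by_zone(trips):
    """Average recorded tip amount per pickup zone."""
    totals = defaultdict(float)
    n_trips = Counter()
    for trip in trips:
        zone = trip["PULocationID"]
        totals[zone] += trip["tip_amount"]
        n_trips[zone] += 1
    return {zone: totals[zone] / n_trips[zone] for zone in totals}


def cash_share_by_zone(trips):
    """Fraction of trips paid by cash per pickup zone."""
    cash = Counter()
    n_trips = Counter()
    for trip in trips:
        zone = trip["PULocationID"]
        n_trips[zone] += 1
        if trip["payment_type"] == CASH:
            cash[zone] += 1
    return {zone: cash[zone] / n_trips[zone] for zone in n_trips}
```

Weekly or monthly patterns could be obtained the same way by grouping on `weekday()` or `month` of the pickup time instead of `hour`. One caveat when comparing tips: if tips given in cash are not recorded in `tip_amount`, zones with many cash payments would appear less generous than they are.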