As a regular MBTA rider during my time at Harvard, relying mostly on the Red Line from Harvard station for quick trips to MIT, I came to appreciate how much accurate transit predictions matter. While the MBTA publishes its own schedules and predictions, there is an opportunity to build independent real-time predictions. This project covers a full-stack machine learning engineering workflow, from data engineering and machine learning to DevOps, MLOps, and web development. Its goal is a real-time streaming dashboard that predicts whether trains on a given line will arrive on time or be delayed.
The primary objective of the MBTA Data Streaming and Prediction Project is to apply data science and engineering to improve transit prediction accuracy. By combining real-time data streaming with predictive modeling, we aim to build a system that gives users useful insight into transit timings. The project has several key components:
- Real-Time Data Streaming: Gathering real-time data from the MBTA transit system, including train schedules, delays, and historical performance.
- Data Engineering: Processing and transforming the collected data to prepare it for predictive modeling.
- Predictive Modeling: Developing machine learning models that analyze historical and real-time data to predict whether a transit line will be on time or delayed.
- Real-Time Prediction: Implementing a streaming pipeline that continuously updates predictions as new data arrives.
- Dashboard Development: Creating a user-friendly web dashboard that displays real-time transit predictions and provides insights into transit line performance.
- DevOps and MLOps: Establishing a DevOps workflow that automates deployment and monitoring of the streaming pipeline and models.
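As a rough illustration of the gathering and labeling steps, the sketch below builds a query against the public MBTA V3 API `predictions` endpoint and turns a scheduled/predicted arrival pair into an on-time or delayed label. It assumes the stop ID `place-harvsq` for Harvard station, reads the `arrival_time` and `departure_time` attributes of a JSON:API prediction record, and uses a hypothetical 120-second cutoff (`DELAY_THRESHOLD_S`) to define "delayed"; the project's actual data sources, fields, and threshold may differ. No network request is made, so the record in the demo is a hand-written stand-in.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# Public MBTA V3 API endpoint for real-time predictions.
MBTA_PREDICTIONS_URL = "https://api-v3.mbta.com/predictions"

# Hypothetical cutoff: arrivals more than this many seconds late count as "delayed".
DELAY_THRESHOLD_S = 120


def predictions_query(stop_id: str = "place-harvsq", route: str = "Red") -> str:
    """Build a predictions URL filtered by stop and route (the stop ID is an assumption)."""
    params = {"filter[stop]": stop_id, "filter[route]": route}
    return f"{MBTA_PREDICTIONS_URL}?{urlencode(params)}"


def predicted_time(record: dict) -> Optional[str]:
    """Return the predicted arrival time, falling back to the departure time.

    At a trip's first stop the arrival time can be null, so departure is used instead.
    """
    attrs = record["attributes"]
    return attrs.get("arrival_time") or attrs.get("departure_time")


def label_delay(scheduled_iso: str, predicted_iso: str,
                threshold_s: float = DELAY_THRESHOLD_S) -> str:
    """Label an arrival 'delayed' if it is later than scheduled by more than threshold_s."""
    scheduled = datetime.fromisoformat(scheduled_iso)
    predicted = datetime.fromisoformat(predicted_iso)
    lateness_s = (predicted - scheduled).total_seconds()
    return "delayed" if lateness_s > threshold_s else "on_time"


if __name__ == "__main__":
    print(predictions_query())
    # Hand-written stand-in for one record in the API response's "data" array.
    record = {"attributes": {"arrival_time": "2024-03-01T08:03:00-05:00",
                             "departure_time": "2024-03-01T08:03:45-05:00"}}
    print(label_delay("2024-03-01T08:00:00-05:00", predicted_time(record)))
```

Labeling against a fixed threshold turns the task into binary classification, which matches the on-time versus delayed framing of the dashboard.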
The project's objectives are to:
- Develop accurate predictive models for MBTA transit line timings, integrating both historical and real-time data.
- Implement a data streaming pipeline that continuously updates predictions as new data becomes available.
- Create an interactive web dashboard that enables users to monitor real-time transit predictions.
- Apply DevOps and MLOps practices to ensure automated deployment, scaling, and monitoring of the entire system.
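The continuous-update pipeline described above can be sketched without any Kafka dependency. In the minimal sketch below, a plain Python iterable stands in for a Kafka topic of prediction events. Each new event for a trip overwrites the previous one (the same "latest value per key" idea as a table built from a stream in KSQL), and after every event the tracker recomputes the share of tracked trips on each route that are delayed, the kind of figure a dashboard could display. The event schema (`PredictionEvent` with `trip_id`, `route`, `delay_s`) and the 120-second threshold are hypothetical, and the sketch covers only the real-time side, not historical data or trained models.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator

# Hypothetical cutoff: trips more than this many seconds late count as delayed.
DELAY_THRESHOLD_S = 120


@dataclass
class PredictionEvent:
    """Hypothetical event schema for one prediction update."""
    trip_id: str
    route: str
    delay_s: float  # predicted minus scheduled arrival, in seconds


class RouteDelayTracker:
    """Keeps the latest event per trip and reports the delayed share per route."""

    def __init__(self, threshold_s: float = DELAY_THRESHOLD_S) -> None:
        self.threshold_s = threshold_s
        self.latest: Dict[str, PredictionEvent] = {}

    def update(self, event: PredictionEvent) -> None:
        # Upsert: a newer event for the same trip replaces the older one.
        self.latest[event.trip_id] = event

    def delayed_share(self) -> Dict[str, float]:
        totals: Dict[str, int] = defaultdict(int)
        delayed: Dict[str, int] = defaultdict(int)
        for ev in self.latest.values():
            totals[ev.route] += 1
            if ev.delay_s > self.threshold_s:
                delayed[ev.route] += 1
        return {route: delayed[route] / totals[route] for route in totals}


def run(events: Iterable[PredictionEvent],
        tracker: RouteDelayTracker) -> Iterator[Dict[str, float]]:
    # In a deployed pipeline `events` would come from a Kafka consumer;
    # any iterable works here, so the logic can be tested offline.
    for event in events:
        tracker.update(event)
        yield tracker.delayed_share()


if __name__ == "__main__":
    stream = [PredictionEvent("t1", "Red", 30),
              PredictionEvent("t2", "Red", 300),
              PredictionEvent("t1", "Red", 200)]
    for snapshot in run(stream, RouteDelayTracker()):
        print(snapshot)
```

Because `run` is a generator, each yielded snapshot reflects every event seen so far: in the demo, trip `t1` is first 30 s late and later 200 s late, so the Red Line's delayed share goes 0.0, 0.5, 1.0.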
The project uses the following technologies:
- Python
- JavaScript
- Kafka
- KSQL
- Machine Learning Libraries
- Flask
- Docker
- Mapbox
- Postgres
- FastAPI
- GitHub Actions (for CI/CD)
The MBTA Data Streaming and Prediction Project aims to provide commuters with reliable and accurate transit predictions, enhancing their daily travel experience. By leveraging real-time data and predictive modeling, we hope to contribute to improved transit planning and decision-making for MBTA users.
To get started with this project, follow the instructions in the project documentation (Coming Soon) to set up the required environment, run the data streaming pipeline, deploy the predictive models, and access the real-time dashboard.
This Project is for Personal Hobby Use Only
This project is developed solely for personal interest and is not intended to be used as a consumer product or as a decision-making tool. It may contain inaccuracies or errors, and there are no guarantees or warranties associated with its use. Users are encouraged to use their discretion and verify any information provided by this project independently.
Not Affiliated with the MBTA
I want to make it explicitly clear that I am not affiliated with the Massachusetts Bay Transportation Authority (MBTA) or any other official transportation organization. This project is independent and unofficial.
Please exercise caution and use any information or functionality provided by this project responsibly and in accordance with the laws and regulations governing your location.