This repository contains two projects that use MongoDB in data science work: an exploratory analysis of NYC taxi trips and a machine learning model that predicts car prices. Both show how MongoDB can serve as the data store for analysis and modeling workflows.
Project 1: NYC Taxi Trips Analysis
Objective:
Perform an in-depth analysis of the NYC Taxi Trips dataset to uncover insights and trends related to taxi operations, passenger behavior, and geographical patterns.
Key Features:
- Data Exploration: Loading and exploring large-scale datasets with MongoDB.
- Data Aggregation: Utilizing MongoDB's aggregation framework to perform complex queries and data summarization.
- Visualization: Plotting the findings, such as trip distributions, revenue patterns, and peak hours.
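As a rough illustration of the loading and aggregation steps, the sketch below inserts records into a collection in batches with `insert_many` and builds an aggregation pipeline that groups trips by pickup hour. The collection is passed in as any pymongo-like object. The field names `pickup_datetime`, `fare_amount`, and `total_amount`, and the `nyc_taxi.trips` namespace in the usage comment, are assumptions modeled on common NYC taxi schemas, not names taken from this repository's notebooks.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def load_in_batches(collection: Any, records: Iterable[Dict[str, Any]], batch_size: int = 10_000) -> int:
    """Insert records into a pymongo-like collection in fixed-size batches.

    Returns the number of records submitted.
    """
    iterator = iter(records)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # ordered=False lets the server keep inserting the rest of a batch
        # if one document fails (errors are still reported at the end)
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total


def hourly_trips_pipeline() -> List[Dict[str, Any]]:
    """Build a pipeline: trip count, revenue, and mean fare per pickup hour.

    Field names are hypothetical; rename them to match the real dataset.
    """
    return [
        # Skip documents whose pickup time is missing or not a BSON date
        {"$match": {"pickup_datetime": {"$type": "date"}}},
        # Group on the hour of day (0-23, UTC) of the pickup timestamp
        {"$group": {
            "_id": {"$hour": "$pickup_datetime"},
            "trips": {"$sum": 1},
            "revenue": {"$sum": "$total_amount"},
            "avg_fare": {"$avg": "$fare_amount"},
        }},
        # Order results from hour 0 to hour 23
        {"$sort": {"_id": 1}},
    ]


def hourly_trip_stats(collection: Any) -> List[Dict[str, Any]]:
    """Run the hourly pipeline and return the result documents as a list."""
    return list(collection.aggregate(hourly_trips_pipeline()))


# Example usage (assumes pymongo, a local server, and a hypothetical nyc_taxi.trips collection):
# from pymongo import MongoClient
# trips = MongoClient("mongodb://localhost:27017")["nyc_taxi"]["trips"]
# for row in hourly_trip_stats(trips):
#     print(row["_id"], row["trips"], row["revenue"])
```

Doing the grouping inside MongoDB means only 24 summary documents cross the network, which keeps the notebook side light even when the trips collection is large.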
Outcome:
Identified peak demand times and revenue patterns in NYC taxi operations, and showed how geographical factors affect taxi service.
Project 2: Car Price Prediction
Objective:
Build a machine learning model that predicts car prices from features such as make, model, year, and mileage, using MongoDB as the backend database.
Key Features:
- Data Preprocessing: Efficiently storing, retrieving, and preprocessing car-related data using MongoDB.
- Model Building: Implementing regression models to predict car prices with high accuracy.
- Model Evaluation: Evaluating model performance using metrics like Mean Absolute Error (MAE) and R-squared.
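The retrieval and evaluation steps can be sketched in plain Python. The helper below pulls only the needed fields with a `find` projection and skips documents where any of them is missing or null; the two metric functions compute MAE and R-squared directly from their definitions (scikit-learn's `mean_absolute_error` and `r2_score` compute the same quantities). The field names `make`, `model`, `year`, `mileage`, and `price` are assumptions based on the features listed above, not the repository's actual schema, and the regression model itself is left to whichever library the notebooks use.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical field names; adjust to match the real car dataset
FEATURES = ["make", "model", "year", "mileage"]
TARGET = "price"


def fetch_car_records(collection: Any) -> List[Dict[str, Any]]:
    """Fetch feature and target fields from a pymongo-like collection,
    keeping only documents where every field exists and is non-null."""
    fields = FEATURES + [TARGET]
    projection = {name: 1 for name in fields}
    projection["_id"] = 0  # leave MongoDB's internal id out of the results
    # {"$ne": None} excludes documents where the field is missing or null
    query = {name: {"$ne": None} for name in fields}
    return list(collection.find(query, projection))


def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|)."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    _check_lengths(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot
```

For example, with true prices `[10.0, 20.0, 30.0]` and predictions `[12.0, 18.0, 33.0]`, the MAE is 7/3 and R-squared is 1 - 17/200 = 0.915.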
Outcome:
Produced a model that estimates car prices with reasonable accuracy, as measured by MAE and R-squared, showing how MongoDB fits into a machine learning pipeline.
Prerequisites:
- Python 3.x
- MongoDB installed locally or accessible via the cloud
- The Python packages listed in requirements.txt
Installation:
- Clone the repository:
  git clone https://github.com/MisbahullahSheriff/mongodb-with-data-science.git
  cd mongodb-with-data-science
- Install the required dependencies:
  pip install -r requirements.txt
- Ensure MongoDB is running and accessible.
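One way to confirm the server is reachable before opening the notebooks is to send MongoDB's `ping` admin command through pymongo. This is a minimal sketch; the connection URI `mongodb://localhost:27017` in the usage comment is an assumption for a default local install, and a cloud deployment would use its own connection string.

```python
from typing import Any


def mongodb_is_reachable(client: Any) -> bool:
    """Return True if the server behind a pymongo-like client answers 'ping'."""
    try:
        reply = client.admin.command("ping")
    except Exception:  # e.g. pymongo.errors.ServerSelectionTimeoutError when no server answers
        return False
    # A healthy server replies with {"ok": 1.0}
    return reply.get("ok") == 1


# Example usage (requires pymongo and a running server):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=2000)
# print("MongoDB reachable:", mongodb_is_reachable(client))
```

Setting a short `serverSelectionTimeoutMS` makes the check fail within a couple of seconds instead of waiting for pymongo's default 30-second timeout.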
Usage:
- Navigate to the respective project directories for detailed instructions on running the code.
- Use the provided Jupyter notebooks for step-by-step execution.