/Machine-Learning-Project

This project predicts the Total Energy Yield (TEY) of a gas turbine using regression models.

Gas Turbine Emissions Analysis Project

Overview

This project conducts a comprehensive analysis of a gas turbine's operational data to predict the total energy yield and understand the emission patterns of CO and NOx. The dataset, sourced from the UCI Machine Learning Repository, provides a detailed compilation of sensor readings that reflect the turbine's performance and environmental impact.

Table of Contents

Introduction

The goal of this project is to utilize machine learning techniques to forecast the energy output of a gas turbine based on various operational parameters. Through this analysis, we aim to identify key factors that influence energy efficiency and emissions, which are critical for optimizing turbine performance and reducing environmental footprints.

Data Collection

The analysis is based on data collected in 2015 from a gas turbine located in Turkey. The dataset contains 7384 instances, each with 11 features that capture the atmospheric conditions, machine conditions, and emission levels at different timestamps.
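As a minimal sketch of how such a file could be loaded and sanity-checked, the snippet below reads a CSV and verifies that the 11 features are present. The column abbreviations are the ones published with the UCI dataset and are an assumption here, since this README does not list them; `load_turbine_data` is a hypothetical helper, and the two sample rows are made-up values used only so the example runs without the real file.

```python
import io

import pandas as pd

# Column names as published with the UCI dataset (assumed, not stated in this README).
EXPECTED_COLUMNS = [
    "AT", "AP", "AH", "AFDP", "GTEP", "TIT", "TAT", "TEY", "CDP", "CO", "NOX",
]


def load_turbine_data(source):
    """Read a turbine CSV and check that all expected columns are present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Tiny made-up sample standing in for the real file (illustrative values only).
sample_csv = io.StringIO(
    "AT,AP,AH,AFDP,GTEP,TIT,TAT,TEY,CDP,CO,NOX\n"
    "4.5,1018.7,83.6,3.5,23.9,1086.2,549.8,134.6,11.9,0.3,81.9\n"
    "4.2,1018.3,84.2,3.5,23.9,1086.1,550.0,134.7,11.9,0.4,82.3\n"
)
df = load_turbine_data(sample_csv)
print(df.shape)         # (2, 11)
print(df.isna().sum())  # per-column count of missing values
```

With the real data, the same function would be called with a file path instead of the in-memory sample, and the shape check should then report 7384 rows.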

Data Preprocessing and Cleaning

Initial data inspection revealed a well-maintained dataset with no missing values. The features were then examined to understand their data types and distributions. Outliers were detected in the Turbine Inlet Temperature (TIT) and were treated so that they would not skew the analysis.
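The README does not say which outlier rule or treatment was applied to TIT. One common option, shown here purely as an assumption, is the 1.5 x IQR rule with clipping to the resulting bounds. The function names and the sample readings are hypothetical.

```python
import numpy as np


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the k * IQR rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values to the IQR fences and report which entries were outliers."""
    values = np.asarray(values, dtype=float)
    low, high = iqr_bounds(values, k)
    mask = (values < low) | (values > high)
    return np.clip(values, low, high), mask


# Made-up TIT readings with one unusually low value (illustrative only).
tit = np.array([1085.0, 1086.0, 1087.0, 1088.0, 1089.0, 1090.0, 1020.0])
clipped, mask = clip_outliers(tit)
print(int(mask.sum()))  # number of readings flagged as outliers
print(clipped)
```

Clipping keeps every row in the dataset; dropping the flagged rows instead is an equally reasonable choice when outliers are rare.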

Exploratory Data Analysis (EDA)

EDA was conducted to reveal underlying patterns and relationships in the data. I created univariate plots to understand the distribution of individual variables and bivariate plots to explore the correlations between the variables. The heatmaps confirmed strong correlations between Total Energy Yield (TEY) and features like Compressor Discharge Pressure (CDP), Turbine Inlet Temperature (TIT), and others.
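The correlation step can be sketched as follows. The data here is synthetic and only mimics the qualitative pattern described above (CDP and TIT strongly related to TEY, ambient humidity AH not); the coefficients and noise levels are invented for illustration and are not taken from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for four of the features (illustrative values only).
cdp = rng.normal(12.0, 1.0, n)
tit = 1000.0 + 7.0 * cdp + rng.normal(0.0, 2.0, n)
ah = rng.normal(70.0, 10.0, n)  # generated independently of the others
tey = 11.0 * cdp + rng.normal(0.0, 1.5, n)

names = ["CDP", "TIT", "AH", "TEY"]
data = np.column_stack([cdp, tit, ah, tey])

# Pearson correlation matrix; columns are variables, so rowvar=False.
corr = np.corrcoef(data, rowvar=False)

# Rank the features by the absolute value of their correlation with TEY.
tey_idx = names.index("TEY")
ranking = sorted(
    ((name, corr[i, tey_idx]) for i, name in enumerate(names) if name != "TEY"),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

The `corr` matrix is what a heatmap function such as `seaborn.heatmap` would take as input.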

Predictive Modeling

I built and compared three machine learning models: Linear Regression, Random Forest, and Decision Tree. Because the target (TEY) is continuous, this is a regression problem, which guided the choice of models. I performed hyperparameter tuning to optimize each model's performance and evaluated all three on the same data to compare their predictive accuracy.
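A comparison of this kind could look roughly like the sketch below, which tunes each of the three model types with scikit-learn's `GridSearchCV` and scores the tuned model on a held-out split. The synthetic data, the parameter grids, the 3-fold cross-validation, and the R² scoring are all assumptions chosen to keep the example small; they are not the settings used in the notebooks.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the turbine features and TEY.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each entry pairs an estimator with an illustrative hyperparameter grid.
candidates = {
    "linear_regression": (
        LinearRegression(),
        {"fit_intercept": [True, False]},
    ),
    "decision_tree": (
        DecisionTreeRegressor(random_state=42),
        {"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5]},
    ),
    "random_forest": (
        RandomForestRegressor(n_estimators=50, random_state=42),
        {"max_depth": [5, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        # score() uses the refit best estimator and the r2 scorer set above.
        "test_r2": search.score(X_test, y_test),
    }

for name, info in results.items():
    print(name, info["best_params"], round(info["test_r2"], 3))
```

Because the synthetic target is linear in the features, the ranking printed here says nothing about which model performs best on the real turbine data.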

Results and Discussion

After hyperparameter tuning, the Decision Tree performed best of the three models. The strong correlations between several features and the target variable, revealed during the EDA phase, contributed to the predictive success of the models.
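For a regression task, "accuracy" is usually reported through error metrics. The README does not state which ones were used, so the sketch below assumes three common choices (MAE, RMSE, and R²) and computes them directly from their definitions. `regression_report` is a hypothetical helper and the numbers are made up.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 from true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "mae": float(np.mean(np.abs(residuals))),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Made-up TEY values and predictions (illustrative only).
y_true = [130.0, 134.0, 138.0, 142.0]
y_pred = [131.0, 133.0, 139.0, 141.0]
report = regression_report(y_true, y_pred)
print(report)
```

In this toy case every prediction is off by exactly 1, so MAE and RMSE are both 1 and R² is 1 - 4/80 = 0.95.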

Conclusions

The project demonstrated the application of machine learning to predicting the performance of a gas turbine. The insights gained could help in making informed decisions about operational efficiency and emission control. For future work, I would consider additional ensemble methods and more advanced regression techniques, such as XGBoost, to potentially improve predictive accuracy.

Getting Started

To replicate the analysis, clone the repository, install the required Python packages, and run the Jupyter notebooks in order: Phase1_Group91.ipynb for data preprocessing and EDA, followed by Phase2_Group91.ipynb for predictive modeling.

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Authors

  • Souvik Ghosh - Initial work and documentation