This project focuses on predicting player churn in online games using machine learning techniques. By analyzing key player metrics and behaviors, the project aims to provide actionable insights for improving player retention. The project includes a complete pipeline from data processing to model training, along with a Power BI dashboard for visual representation.
- Installation
- Usage
- Project Structure
- Project Workflow
- Features
- Files in the Repository
- Data
- Models
- Results
- Visualizations
- Contributors
- Acknowledgments
- Contact Information
- Clone the repository to your local machine.
- Ensure you have Python installed (preferably version 3.7 or above).
- Install the required Python packages using the `requirements.txt` file.
- Open the Jupyter Notebook `PlayerChurnForecasting.ipynb` to run the code.
- To view the Power BI dashboard, download the `player_churn_powerbi_dashboard.pbix` file and open it in Power BI Desktop.
- Run the Jupyter Notebook to process the data and build predictive models.
- Open the Power BI dashboard to visualize player churn metrics and insights.
- Utilize the findings from the research paper for further analysis or academic purposes.
- `dataset.csv`: Raw dataset containing player metrics.
- `final_dataset.csv`: Processed dataset after feature engineering and cleaning.
- `PlayerChurnForecasting.ipynb`: Jupyter Notebook containing the entire workflow from data processing to model building.
- `player_churn_powerbi_dashboard.pbix`: Power BI file with visualizations of player churn metrics.
- `Research Paper.pdf`: Published research paper detailing the methodologies and findings of the project.
The workflow of the Player Churn Prediction project is as follows:
- Data Collection:
- The initial raw data was collected from various gaming platforms, capturing key player metrics and behaviors.
- Data Preprocessing:
- Loading Data:
```python
import pandas as pd

# Load the raw player metrics
raw_data = pd.read_csv('dataset.csv')
```
- Data Cleaning:
- Handle missing values, remove duplicates, and normalize data types.
```python
# Drop rows with missing values, then remove exact duplicate rows
cleaned_data = raw_data.dropna().drop_duplicates()
```
- Feature Engineering:
- Create new features that enhance the predictive power of the dataset.
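```python
import numpy as np

# Log-transform session duration to reduce skew; log1p(x) equals log(x + 1), so zeros are safe
cleaned_data['session_duration_log'] = np.log1p(cleaned_data['session_duration'])
```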
- Splitting Data:
- Split the data into training and testing datasets.
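```python
from sklearn.model_selection import train_test_split

# Separate the features from the churn label
X = cleaned_data.drop('churn', axis=1)
y = cleaned_data['churn']

# Hold out 30% of the players for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```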
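The preprocessing steps above can be collected into reusable functions. This is a minimal sketch, assuming the raw data has a `session_duration` column and a binary `churn` label (both names come from the snippets above). The explicit type coercion with `pd.to_numeric` and the `stratify=y` option are additions not shown in the original notebook; stratifying is one common way to keep the churn rate similar in the training and test sets. The helper names `preprocess` and `stratified_split` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw player data and add the log-scaled session feature."""
    data = raw.drop_duplicates().copy()
    # Normalize types: parse numbers, turning unparseable strings into NaN
    data["session_duration"] = pd.to_numeric(data["session_duration"], errors="coerce")
    data["churn"] = pd.to_numeric(data["churn"], errors="coerce")
    # Drop rows that were missing or could not be parsed
    data = data.dropna().copy()
    data["churn"] = data["churn"].astype(int)
    # log1p(x) equals log(x + 1) and stays accurate for small x
    data["session_duration_log"] = np.log1p(data["session_duration"])
    return data


def stratified_split(data: pd.DataFrame, test_size: float = 0.3, seed: int = 42):
    """Split into train/test sets while preserving the churn ratio."""
    X = data.drop("churn", axis=1)
    y = data["churn"]
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)


# Toy data (hypothetical): one duplicate row and one unparseable duration
toy_raw = pd.DataFrame({
    "session_duration": ["10", "25", "bad", "40", "40", "5", "60", "15", "30", "20", "35", "50"],
    "churn": [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
})
clean = preprocess(toy_raw)
X_tr, X_te, y_tr, y_te = stratified_split(clean)
```

On the toy frame, the duplicate row and the `"bad"` row are dropped, leaving 10 players split 7/3.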
- Model Building:
- Model Selection:
- Choose appropriate machine learning models for prediction.
- Here, Random Forest and Logistic Regression were selected.
- Model Training:
```python
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest with 100 trees on the training split
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
```
- Model Evaluation:
- Evaluate the model performance using accuracy, precision, recall, and F1-score.
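```python
from sklearn.metrics import accuracy_score, classification_report

# Predict churn for the held-out players and report accuracy plus per-class precision, recall, and F1
y_pred = rf_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
```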
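Logistic Regression is named as the second model, but the training snippet only fits the Random Forest. The sketch below fits both and reports the same four metrics for each. It is illustrative only: `make_classification` generates a synthetic stand-in for the churn data, the helper `compare_models` is hypothetical, and wrapping Logistic Regression in a `StandardScaler` pipeline is an assumption (scaling usually helps it converge), not necessarily what the notebook does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X_train, X_test, y_train, y_test):
    """Fit each model on the training split and score it on the test split."""
    models = {
        "Logistic Regression (baseline)": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Churn (label 1) is treated as the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return results


# Demo on synthetic data standing in for the processed player features
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=42)
splits = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)
results = compare_models(*splits)

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because `compare_models` takes the four split arrays as arguments, it can be called on the notebook's own `X_train`, `X_test`, `y_train`, `y_test` without overwriting them.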
- Data Visualization:
- Power BI Dashboard:
- Use Power BI to visualize key metrics and player churn insights.
- Load the `final_dataset.csv` file into Power BI and create visuals.
- Dashboard Integration:
- The dashboard integrates insights from the predictive model and highlights areas for improving player retention.
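One way to bring the model's output into the dashboard is to append a predicted churn probability to each player row before saving the CSV that Power BI imports. This is a hedged sketch, not the project's confirmed export step: the helper `add_predictions`, the `churn_probability` and `predicted_churn` column names, the 0.5 threshold, and the output filename are all hypothetical, and the data is synthetic.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def add_predictions(features: pd.DataFrame, model, threshold: float = 0.5) -> pd.DataFrame:
    """Return a copy of `features` with churn scores a dashboard can plot."""
    out = features.copy()
    # With labels 0/1, column 1 of predict_proba is the probability of churn (label 1)
    out["churn_probability"] = model.predict_proba(features)[:, 1]
    out["predicted_churn"] = (out["churn_probability"] >= threshold).astype(int)
    return out


# Synthetic stand-in for the processed player features
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
demo_features = pd.DataFrame(X_demo, columns=[f"feature_{i}" for i in range(4)])
demo_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(demo_features, y_demo)

scored = add_predictions(demo_features, demo_model)
# Write a CSV that Power BI Desktop can load (hypothetical filename)
scored.to_csv("player_churn_scored.csv", index=False)
```

Keeping the raw probability alongside the thresholded label lets the dashboard show both at-risk counts and a risk distribution.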
- Documentation:
- The findings, methodology, and results were documented in a research paper, which was published and added to the repository.
- The paper provides detailed analysis and insights derived from the project, offering valuable contributions to the field.
- Predictive Modeling: Utilizing machine learning algorithms like Random Forest and Logistic Regression to predict player churn.
- Data Processing: Feature engineering and data cleaning techniques to enhance prediction accuracy.
- Visualization: Power BI dashboard providing a clear visual representation of churn metrics and insights.
- `dataset.csv`: The raw data used for the project.
- `final_dataset.csv`: Cleaned and processed data ready for modeling.
- `PlayerChurnForecasting.ipynb`: The main code file containing the data analysis, modeling, and prediction steps.
- `player_churn_powerbi_dashboard.pbix`: The Power BI dashboard file to visualize and analyze the results.
- `Research Paper.pdf`: A detailed document discussing the project’s findings and contributions.
- The `dataset.csv` file contains raw player data, including metrics such as session duration, in-game purchases, and player demographics.
- The `final_dataset.csv` file includes the processed data with engineered features ready for model training.
- Random Forest: Used for predicting player churn with high accuracy.
- Logistic Regression: Applied as a baseline model for churn prediction.
- Evaluation Metrics: Models were evaluated using accuracy, precision, recall, and F1-score.
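For reference, these are the standard definitions of the four metrics in terms of true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$), treating churn as the positive class:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

For retention work, recall is the share of players who actually churned that the model flagged, while precision is the share of flagged players who actually churned.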
- The models provided strong predictive performance, with Random Forest achieving the highest accuracy.
- Key findings from the analysis revealed that certain player behaviors and demographics significantly impact churn likelihood.
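Findings about which behaviors matter are often examined through the Random Forest's impurity-based feature importances, which measure how much each input contributes to the trees' splits. The sketch below shows how to read them; it is not necessarily how the research paper reached its conclusions. The feature names are hypothetical placeholders echoing metrics listed in the Data section, and the data is synthetic. Impurity-based importances can favor features with many distinct values, so scikit-learn's `permutation_importance` is a common cross-check.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the values themselves are synthetic
feature_names = ["session_duration", "in_game_purchases", "player_age", "sessions_per_week"]
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=7
)
X_demo = pd.DataFrame(X_demo, columns=feature_names)

demo_model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Importances are normalized to sum to 1; higher means more influence on the splits
importances = pd.Series(demo_model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```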
- Power BI Dashboard: The dashboard visualizes player churn metrics, allowing for easy interpretation of the data.
- Kalpana Kale
- Ruchita Patre
- Prof. Rahesha Mulla
- Thanks to the team members who built the project: Kalpana Kale and Ruchita Patre.
- Special thanks to the contributors of the datasets and tools used in this project.
- Appreciation to the academic and professional communities that provided valuable feedback on the research paper.
For any questions, feedback, or collaboration opportunities, feel free to reach out to me at pancham8675@gmail.com.