Welcome! This repository contains a selection of data science and machine learning projects I’ve worked on, covering predictive modeling, natural language processing, deep learning, and time series forecasting. Each project includes an overview, methodology, key results, and insights, along with the code and relevant data files.
Projects:
- Waiter Tips Analysis & Prediction
- Language Detection
- Text Emotions Classification
- Telco Customer Churn Prediction
- Future Sales Prediction
- Electricity Price Prediction
- Online Payments Fraud Detection
- Classification with Neural Networks
- Stress Detection
Skills demonstrated:
- Data Preprocessing & EDA: Feature engineering, handling missing values, visualizations.
- Model Building: Supervised & unsupervised learning, time series forecasting, neural networks.
- Natural Language Processing (NLP): Sentiment analysis, text classification, language detection.
- Deep Learning: Neural networks, LSTM, transformer models.
- Model Evaluation & Optimization: Cross-validation, grid search, and performance metrics.
Waiter Tips Analysis & Prediction
- Goal: Analyze and predict the tips waitstaff receive based on factors such as party size and meal time.
- Techniques: Linear Regression, EDA with visualizations.
- Results: Identified the factors most strongly associated with tip amounts.
- Code: Link to Project
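A minimal sketch of the linear-regression idea behind this project, assuming a single predictor (party size). The `fit_line` helper and the data rows are hypothetical and written in plain Python for illustration; they are not taken from the project code.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: party size and tip in dollars.
party_size = [1, 2, 2, 3, 4, 4, 6]
tip = [1.5, 2.0, 3.0, 3.5, 4.0, 5.0, 7.0]

intercept, slope = fit_line(party_size, tip)
predicted_tip = intercept + slope * 5  # estimated tip for a party of five
```

The slope can be read as the estimated change in tip per additional guest, which is the kind of factor-level insight the project summary describes.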
Language Detection
- Goal: Detect the language of given text samples.
- Techniques: NLP, Naive Bayes, Logistic Regression.
- Results: Achieved high classification accuracy across multiple languages.
- Code: Link to Project
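To make the Naive Bayes approach concrete, here is a small sketch that scores text by character bigrams with add-one smoothing. The training sentences, the two-language setup, and the function names are assumptions for illustration only; the project itself may use different features and a library implementation.

```python
import math
from collections import Counter

def char_bigrams(text):
    """Split text into overlapping two-character chunks."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def train(samples):
    """Count bigrams per language from (text, language) pairs."""
    bigram_counts = {}
    doc_counts = Counter()
    for text, lang in samples:
        bigram_counts.setdefault(lang, Counter()).update(char_bigrams(text))
        doc_counts[lang] += 1
    vocab = set().union(*bigram_counts.values())
    return bigram_counts, doc_counts, vocab

def predict(text, model):
    """Return the language with the highest Naive Bayes log score."""
    bigram_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for lang, counts in bigram_counts.items():
        total = sum(counts.values())
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(doc_counts[lang] / total_docs)
        for bigram in char_bigrams(text):
            score += math.log((counts[bigram] + 1) / (total + len(vocab)))
        scores[lang] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus (accents omitted to keep the example ASCII-only).
samples = [
    ("the quick brown fox jumps over the lazy dog", "english"),
    ("this is a simple sentence in english", "english"),
    ("el rapido zorro marron salta sobre el perro perezoso", "spanish"),
    ("esta es una frase sencilla en espanol", "spanish"),
]
model = train(samples)
guess_en = predict("where is the dog", model)
guess_es = predict("el perro es marron", model)
```

Character n-grams are a common choice for this task because they capture spelling patterns without needing a word list for each language.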
Text Emotions Classification
- Goal: Classify text data into emotions (e.g., joy, anger, sadness).
- Techniques: Natural Language Processing (NLP) with TF-IDF, Naive Bayes, and SVM.
- Results: High accuracy on multi-class emotion classification.
- Code: Link to Project
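The TF-IDF weighting mentioned above can be sketched in a few lines. This version uses the plain, unsmoothed formula (term frequency times `log(N / document frequency)`) and whitespace tokenization, both chosen for simplicity; libraries such as Scikit-Learn use a smoothed variant by default, so the numbers will differ. The example sentences are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical example sentences.
docs = ["i am so happy today", "i am so angry today", "i feel sad"]
weights = tf_idf(docs)
```

Words that appear in every document (here, "i") get a weight of zero, while emotion-bearing words that are rare across the corpus (such as "happy") get the highest weights. Those weighted vectors are what a classifier like Naive Bayes or an SVM would then consume.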
Telco Customer Churn Prediction
- Goal: Predict customer churn for a telecommunications company.
- Techniques: Logistic Regression, Random Forest, and XGBoost.
- Results: Identified the main drivers of churn and built predictive models to flag customers likely to leave.
- Code: Link to Project
Future Sales Prediction
- Goal: Forecast future sales for a retail company using historical sales data.
- Techniques: Time series forecasting with ARIMA, Prophet, and LSTM models.
- Results: Accurate forecasting of weekly sales, aiding inventory and resource management.
- Code: Link to Project
Electricity Price Prediction
- Goal: Predict future electricity prices using time series data.
- Techniques: Time series analysis, LSTM and GRU models for sequential data.
- Results: Successfully forecasted short-term price fluctuations.
- Code: Link to Project
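Sequence models such as LSTM and GRU are usually trained on fixed-length windows of past values. The sketch below shows one way to build those windows and split them chronologically; the `make_windows` helper, the window length, and the prices are hypothetical and not taken from the project.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

# Hypothetical hourly prices.
prices = [30.0, 32.5, 31.0, 35.0, 40.0, 38.5, 36.0]
X, y = make_windows(prices, window=3)

# Split by time, not at random, so the test set is strictly later than the training set.
split = int(len(X) * 0.75)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Keeping the split chronological avoids leaking future prices into training, which would make a short-term forecast look better than it is.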
Online Payments Fraud Detection
- Goal: Develop a model that detects fraudulent online transactions while keeping false positives low.
- Techniques: Logistic Regression, Random Forest, Isolation Forest, and SMOTE for handling imbalanced data.
- Results: Achieved high recall while keeping false positives low, the balance that matters most in fraud detection.
- Code: Link to Project
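Because fraud data is heavily imbalanced, plain accuracy says little; recall and the false positive rate are more informative. The following sketch computes both from a confusion matrix. The labels and predictions are invented for illustration and are not results from the project.

```python
def fraud_metrics(y_true, y_pred):
    """Compute recall, precision, and false positive rate (label 1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Share of actual fraud cases that were caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of fraud alerts that were real fraud.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of legitimate transactions wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical labels: 7 legitimate transactions, 3 fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
metrics = fraud_metrics(y_true, y_pred)
```

In this toy case a model that predicted "legitimate" for everything would be 70% accurate yet have zero recall, which is why the project summary reports recall and false positives rather than accuracy.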
Classification with Neural Networks
- Goal: Explore neural networks for classification on structured datasets.
- Techniques: Multilayer Perceptron (MLP) networks, hyperparameter tuning.
- Results: Improved performance over traditional models on the datasets tested.
- Code: Link to Project
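As a small illustration of what a hidden layer adds, the sketch below runs the forward pass of a tiny MLP (two ReLU hidden units, one linear output) on the XOR problem, which no single linear model can separate. The weights are hand-picked for the example rather than trained, and nothing here comes from the project code.

```python
def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, x)

def mlp_forward(x1, x2):
    """Forward pass of a 2-2-1 MLP with hand-set weights that computes XOR."""
    # Hidden layer: both units sum the inputs, with different biases.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a linear combination of the hidden activations.
    return 1.0 * h1 - 2.0 * h2

outputs = {(a, b): mlp_forward(a, b) for a in (0, 1) for b in (0, 1)}
```

In practice the weights are learned by gradient descent, and choices such as the number of hidden units, learning rate, and activation function are the hyperparameters that tuning explores.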
Stress Detection
- Goal: Detect stress levels based on physiological data.
- Techniques: Data preprocessing, feature engineering, ensemble models.
- Results: Identified patterns in the physiological data that indicate stress, a starting point for real-time applications.
- Code: Link to Project
These projects highlight my proficiency in data science and machine learning with practical applications across various domains. I enjoy the challenge of translating data into actionable insights and building predictive models that make a tangible impact.
Tools and technologies:
- Programming: Python (Pandas, NumPy, Scikit-Learn), Jupyter Notebooks
- Visualization: Matplotlib, Seaborn, Plotly
- Machine Learning: Scikit-Learn, TensorFlow, Keras
- NLP: NLTK, TF-IDF, Word Embeddings
- Time Series Analysis: ARIMA, Prophet, LSTM