Data analysis and data science projects from the Yandex Bootcamp
Project name | Description | Skills | Libraries |
---|---|---|---|
Car price | Factors influencing the prices of cars listed on a web page | Data cleaning, handling missing values | pandas, numpy, matplotlib |
Mobile tariff analysis | Revenue comparison between mobile plans | EDA, hypothesis testing | pandas, numpy, scipy, matplotlib |
Games ratings analysis | Video game ratings analysis and regional consumer profiles based on trends | Data cleaning, merging, visualization, and analysis using hypothesis tests | pandas, scipy, seaborn, matplotlib |
Phone plan recommendation | Phone plan recommendation based on customer behavior using machine learning models | Machine learning, model evaluation | pandas, sklearn |
Bank customer churn prediction | Prediction of bank customer churn based on customer behavior, to support customer retention strategies | Supervised learning models (logistic regression, decision trees, and random forest) with class imbalance handling (up- and down-sampling), model evaluation | pandas, sklearn |
Oil well prospection | Data-driven selection of profitable regions for drilling oil wells | Prediction using linear regression, bootstrap resampling to generate a sampling distribution for risk assessment | pandas, scipy, sklearn |
Gold extraction model | Linear regression models that predict the amount of gold extracted at different stages of the purification process, with the aim of optimizing production | Linear regression, data visualization, custom evaluation metrics | pandas, sklearn, seaborn |
Insurance benefits prediction | Prediction of insurance benefits using masked data for personal data protection | Linear regression on masked data using linear algebra | pandas, numpy, math, sklearn |
Car price predictive models | Prediction of second-hand car selling prices using regression models | Regression models, comparison of model quality in terms of fitting time and RMSE | pandas, sklearn, LightGBM, CatBoost |
Cab orders prediction | Prediction of cab orders at an airport using time series | Time series, regression models | pandas, sklearn, statsmodels |
Movie reviews sentiment analysis | Automatic classification of movie reviews using machine learning sentiment analysis | Text preprocessing, regular expressions, TF-IDF, BERT embeddings, ML | sklearn, spaCy, NLTK, transformers |
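
The "linear regression on masked data" idea from the insurance benefits project can be illustrated with a minimal sketch. It is not the project's actual code: the synthetic data, the helper names `fit_linear_regression` and `predict`, and the choice of a random invertible matrix `P` as the mask are all assumptions made for illustration. The sketch checks numerically that multiplying the features by an invertible matrix leaves the least-squares predictions unchanged.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept (hypothetical helper)."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w


def predict(X, w):
    """Apply fitted weights, intercept first (hypothetical helper)."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return X1 @ w


# Synthetic stand-in for the customer features and target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.3, 0.0]) + 0.7 + rng.normal(scale=0.1, size=200)

# Mask the features with a random square matrix and confirm it is invertible.
P = rng.normal(size=(4, 4))
assert np.linalg.matrix_rank(P) == 4
X_masked = X @ P

# Fit on the original and on the masked features, then compare predictions.
pred_plain = predict(X, fit_linear_regression(X, y))
pred_masked = predict(X_masked, fit_linear_regression(X_masked, y))
max_diff = np.max(np.abs(pred_plain - pred_masked))
print(f"largest prediction difference: {max_diff:.2e}")
```

The predictions agree up to floating-point error because `X` and `X @ P` span the same column space when `P` is invertible, so the model quality is preserved while the raw feature values are hidden.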