Data Science course projects
These projects were completed during the "Data Scientist" training program at the Yandex-Practicum school of data analysis. Below is a list of the projects with a brief description and the libraries used in each.
Link: https://practicum.yandex.ru/data-scientist/
Project name | Description | Libraries used |
---|---|---|
Age detection (CV) | A machine learning model that estimates a person's age from a photo. | pandas, tensorflow, matplotlib |
Credits | A study of borrower reliability. | pandas, numpy, pymystem3 |
Real estate | Exploratory analysis of apartment prices in St. Petersburg and neighboring settlements, based on several years of data. | pandas, numpy, matplotlib |
Telecom | Analysis of 2018 customer behavior on two tariff plans, "Smart" and "Ultra", based on a sample of 500 users of the Megaline company. | pandas, numpy, seaborn, scipy |
Tariff recommendation | Comparison of binary classification models for choosing the optimal tariff for customers of the mobile operator "Megaline". | pandas, numpy, matplotlib, sklearn, pylab |
Client churn | A binary classification model that predicts customer churn at Beta-Bank. | pandas, numpy, matplotlib, sklearn, plotly |
Study of oil producing regions | Oil sample data from three regions, with measurements of oil quality and reserve volume, were analyzed. A machine learning model was built to identify the region where production would bring the greatest profit. Potential profits and risks were assessed using the bootstrap technique. | pandas, numpy, sklearn |
Linear algebra | The task is to protect the customer data of the insurance company "Though the Flood". A data transformation method was developed that makes it difficult to recover personal information, and its correctness is justified. | pandas, numpy, sklearn |
Car prices | The used-car sales service "Not beaten, not beautiful" is developing an app to attract new customers, in which users can quickly find out the market value of their car. | pandas, numpy, matplotlib, seaborn, sklearn, lightgbm, catboost |
Taxi order forecast | Several machine learning models are compared for predicting the number of taxi orders in the next hour. | pandas, numpy, matplotlib, statsmodels, seaborn, sklearn, lightgbm, catboost, xgboost |
Toxic comments | Text classification models that detect toxic comments are compared. | pandas, numpy, matplotlib, sklearn, torch, transformers, nltk |
Air transportation (SQL) | Analysis of passenger demand for flights to cities that host the largest cultural festivals, using an airline database as the data source. | pandas, numpy, matplotlib, scipy |
Console games research | Analysis of worldwide computer game sales, user and expert ratings, genres, and platforms, based on historical data from open sources up to 2016. | pandas, numpy, seaborn, scipy, plotly |
Gold recovery process | Analysis of data on metal concentrations at different stages of mining and ore refining. | pandas, numpy, matplotlib, sklearn, plotly |
Final project | The task is to predict customer churn for the telecom operator "Notasingledisconnec.com". | pandas, numpy, matplotlib, seaborn, scipy, plotly, sklearn |
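
The bootstrap step mentioned in the "Study of oil producing regions" project can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `bootstrap_profit` and all of its parameters (`sample_size`, `n_select`, `revenue_per_unit`, `budget`, `n_boot`) are hypothetical. The idea is to resample sites with replacement, pick the sites with the highest predicted reserves in each resample, and compute the profit from their actual reserves.

```python
import numpy as np


def bootstrap_profit(predicted, actual, *, sample_size, n_select,
                     revenue_per_unit, budget, n_boot=1000, seed=0):
    """Estimate the profit distribution by bootstrap (illustrative sketch).

    All parameter names and the profit formula are assumptions made
    for this example; they are not taken from the original project.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    rng = np.random.default_rng(seed)

    profits = np.empty(n_boot)
    for i in range(n_boot):
        # Draw a random sample of sites with replacement.
        idx = rng.integers(0, len(predicted), size=sample_size)
        # Keep the n_select sites with the highest predicted reserves.
        top = idx[np.argsort(predicted[idx])[-n_select:]]
        # Profit is computed from the actual reserves of those sites.
        profits[i] = actual[top].sum() * revenue_per_unit - budget

    lower, upper = np.quantile(profits, [0.025, 0.975])
    return {
        "mean_profit": float(profits.mean()),
        "ci_95": (float(lower), float(upper)),
        # Share of resamples that ended with a loss.
        "loss_risk": float((profits < 0).mean()),
    }
```

Running the function once per region and comparing `mean_profit` and `loss_risk` gives one way to choose between regions. The 95% interval here is a simple percentile interval.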