
This is a repository I created to showcase my skills, share projects, and track my progress in Data Analytics and Data Science-related topics.


Aleksandr Nikitin - Data Analysis Portfolio

About

Hi, I'm Alex! I have a technical background and hold a Master of Science (M.S.) degree in Geoscience from Saint Petersburg Mining University, with a major in Seismic Data Processing and Analysis. I have 12 years of experience in geoscience, where I held roles such as Data Processing Engineer, Head of Department, and CTO of a technology startup. After a year-long transition from geoscience to data analysis, I have been working as a Data Analyst at a fintech company since June 2021.
I took 7th place in the open Data Analyst 2021 competition held by Career Factory.

My article on Habr about matching two open datasets with the help of machine learning ---> ENG | RUS

My Data Analytics blog on Medium

My CV in pdf


This repository showcases my skills, serves as a platform to share my projects, and helps me track my progress in Data Analytics and Data Science-related topics.

Table of contents

Portfolio Projects

This section lists my data analytics projects, with a brief description of the technology stack used to solve each case.

Video Games Sales Analysis

Code: video_games_sales.ipynb
Description: The dataset contains 16,715 records as of 2016: a list of video games with sales by region, year of release, platform, and critic and user scores. The project includes the following steps: data loading, data cleaning and preprocessing, filling in missing values, EDA (exploratory data analysis), analyzing region-based user profiles, measuring statistical factors, and hypothesis testing.
Skills: data cleaning, data analysis, descriptive statistics, central limit theorem, hypothesis testing, data visualization.
Technology: Python, Pandas, Numpy, Scipy Stats, Seaborn, Matplotlib.
Results: Review of the global and regional video games markets, data-based business recommendations.

A Mobile Game Data Analysis

Code: final_project.ipynb
Presentation: my_project_slides.pdf
Description: The final project for a 5-month Data Analysis course. Setup: you work at a mobile game development company, and a Product Manager gives you the following tasks: calculate and visualize retention, make a decision based on A/B test data, and suggest a set of metrics to evaluate the results of the last monthly campaign.
Skills: data cleaning, detecting data anomalies, python coding, data visualization, descriptive statistics, dealing with outliers, A/B tests, Shapiro–Wilk test, Levene's test, data transforms, Mann–Whitney U test, proportions z-test, bootstrapping, defining metrics.
Technology: Python, Pandas, Numpy, Scipy Stats, Seaborn, Matplotlib, Statsmodels Stats, Bootstrap.
Results: Python functions to calculate and plot user retention; hypothesis testing that detected a statistically significant result, with a recommendation to push the tested in-app changes to production; a set of metrics to evaluate the success of the promotional campaign.
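To illustrate the retention part of this project, here is a minimal standard-library sketch of classic day-N retention: the share of users in each registration cohort who were active exactly N days after registering. It is not the code from final_project.ipynb; the function name `day_n_retention`, the input shapes (a dict of registration dates and a list of `(user_id, date)` activity pairs), and the toy data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def day_n_retention(registrations, sessions, n):
    """Classic day-N retention per registration cohort (hypothetical helper).

    registrations: dict mapping user_id -> registration date
    sessions: iterable of (user_id, activity date) pairs
    n: number of days after registration to check
    Returns a dict mapping cohort date -> share of that cohort active on day n.
    """
    # Users who were active exactly n days after their own registration date
    retained_users = set()
    for user_id, day in sessions:
        reg_date = registrations.get(user_id)
        if reg_date is not None and (day - reg_date).days == n:
            retained_users.add(user_id)

    # Count cohort sizes and retained users per registration date
    cohort_size = defaultdict(int)
    cohort_retained = defaultdict(int)
    for user_id, reg_date in registrations.items():
        cohort_size[reg_date] += 1
        if user_id in retained_users:
            cohort_retained[reg_date] += 1

    return {cohort: cohort_retained[cohort] / size
            for cohort, size in sorted(cohort_size.items())}


# Hypothetical toy data: three users, two registration cohorts
registrations = {1: date(2020, 9, 1), 2: date(2020, 9, 1), 3: date(2020, 9, 2)}
sessions = [(1, date(2020, 9, 2)), (2, date(2020, 9, 5)), (3, date(2020, 9, 3))]
day1 = day_n_retention(registrations, sessions, n=1)
# day1: 0.5 for the Sept 1 cohort, 1.0 for the Sept 2 cohort
```

A retention curve is the same calculation repeated for n = 1, 2, 3, ..., which can then be plotted per cohort or as a heatmap.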

A Landing Page Design Experiment

Code: notebook.ipynb
Presentation: slides.pdf
Description: My solution to an open Data Analyst competition held by Career Factory (Карьерный Цех). It took 7th place out of roughly 100 submitted solutions.
Skills: evaluating A/B test design, data cleaning, data anomaly detection, checking the splitting system, calculating conversion rate, calculating bounce rate, log-scale transformations, Shapiro–Wilk test of normality, A/B tests (proportions z-test, Mann–Whitney rank test), plotting results, drawing conclusions, and giving recommendations for follow-up actions.
Technology: Python, Pandas, Numpy, Scipy Stats, Seaborn, Matplotlib, Statsmodels Stats.
Results: Analysis of the A/B test design, a conclusion on whether to roll out the new landing page design to production, and recommendations for improvement.
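The conversion-rate comparison in this project relies on a proportions z-test. Below is a minimal standard-library sketch of the textbook two-sided, two-proportion z-test with a pooled proportion. The function name and the conversion counts are hypothetical and not taken from the competition data; in practice, `statsmodels.stats.proportion.proportions_ztest` provides a ready-made version.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference of two proportions (pooled variance).

    Returns (z statistic, p-value) for H0: p_a == p_b.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one conversion rate, estimated from pooled data
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100/1000 conversions on the old page, 130/1000 on the new one
z, p_value = two_proportion_ztest(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")  # z = 2.10, p = 0.035
```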

Online Advertising Campaign Analysis

Code: ya_ad_int_solution.ipynb
Presentation: ya_ad_int_slides_upd.pdf
Description: My submission to the Yandex Advertising Analytics internship program.
Skills: data cleaning, CTR, CPC, CPA and CR calculation, comparing metrics with competitors, visualizing results, drawing conclusions.
Technology: Python, Pandas, Numpy, Seaborn, Matplotlib.
Results: A slide deck with an analysis of the online advertising campaign and recommendations for improvement by service category.
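The four metrics named in this project have standard definitions, sketched below as a minimal helper. The function name and the campaign numbers are hypothetical, and CR is computed per click here; some teams define conversion rate per impression or per session instead, so the notebook may differ.

```python
def ad_metrics(impressions, clicks, conversions, cost):
    """Basic online-advertising metrics; a ratio is None when its denominator is 0."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "CTR": ratio(clicks, impressions),   # click-through rate
        "CPC": ratio(cost, clicks),          # cost per click
        "CR": ratio(conversions, clicks),    # conversion rate (per click)
        "CPA": ratio(cost, conversions),     # cost per action / acquisition
    }


# Hypothetical campaign numbers
print(ad_metrics(impressions=10_000, clicks=250, conversions=10, cost=500.0))
# {'CTR': 0.025, 'CPC': 2.0, 'CR': 0.04, 'CPA': 50.0}
```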

Airbnb Listings Analytics

Tableau Public: dashboard
Dashboard canvas: dashboard_canvas.pdf
Description: A Tableau Public dashboard consisting of: a calculated occupancy rate for rental properties; an analytical chart for choosing the best property by occupancy rate, review score, and price per night; a table of top listings ranked by calculated potential annual revenue; KPIs for average price, average occupancy rate, and the number of unique listings; and filters by neighborhood, occupancy rate, and number of reviews over the last twelve months.
Skills: customer interviews, requirements capture, analytical dashboard design, product delivery.
Technology: Tableau.
Results: An analytical dashboard that supports the daily operations of an apartment rental company.

Study Projects

This section links to my GitHub repositories with the code and Jupyter notebooks I created while taking online courses or just having fun with data and code.

advanced ab testing course

Description: Advanced A/B testing course by karpov.courses. This self-paced course covers Basics of Statistics, Hypothesis testing, Experimental design, Design testing, Confidence intervals, Improving test sensitivity, Metric selection, CUPED, Stratification, Multiple testing, Traffic splitting, Analysis of ratio metrics (Linearization and the Delta Method), and a Complete A/B testing pipeline, all with extensive coding practice in Python.
Repository: Check out the repository with my solutions to the Advanced A/B testing course tasks and challenges ---> go to repo.
Status: Completed in June 2023 (see the Certificates section below).
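CUPED, one of the variance-reduction topics listed above, adjusts the experiment metric Y with a pre-experiment covariate X: Y_cuped = Y - theta * (X - mean(X)), where theta = cov(X, Y) / var(X). The sketch below is a minimal standard-library version of that textbook definition, not code from the course repository; the function name and sample values are hypothetical. In a real A/B test, theta is usually estimated on the pooled data of both groups before the groups are compared.

```python
from statistics import mean, pvariance


def cuped_adjust(y, x):
    """Return CUPED-adjusted values of metric y using pre-experiment covariate x."""
    mean_x, mean_y = mean(x), mean(y)
    # theta = cov(x, y) / var(x); the 1/(n-1) factors cancel in the ratio
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_xy / var_x
    # Subtracting theta * (x - mean(x)) keeps the mean of y
    # but removes the part of its variance explained by x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Hypothetical per-user revenue before (x) and during (y) the experiment
x = [10.0, 12.0, 9.0, 15.0, 11.0]
y = [11.0, 13.5, 9.5, 16.0, 12.5]
y_cuped = cuped_adjust(y, x)
print(pvariance(y), pvariance(y_cuped))  # the variance drops after the adjustment
```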

algorithms training by yandex

Description: Algorithms Training 3.0 by Yandex. Official course page.
This 1-month coding journey through algorithms and data structures covers stacks, queues, dynamic programming, graphs, DFS, BFS, and more.
Repository: Check out the repository with my notes and solutions for Algorithms Training 3.0 by Yandex, based on the course lectures, tasks, and materials ---> go to repo.
Status: Completed in April 2023 (see the Certificates section below).
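As a small illustration of two topics from this course (queues and BFS), here is a minimal sketch of breadth-first search that computes shortest path lengths, in edges, from a start vertex in an unweighted graph. It is not taken from the course repository; the function name and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, start):
    """Shortest distances (in edges) from start to every reachable vertex.

    graph: dict mapping a vertex to a list of its neighbors (adjacency list)
    """
    dist = {start: 0}
    queue = deque([start])  # FIFO queue guarantees vertices are visited by distance
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # the first visit is the shortest one
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical directed graph
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(bfs_distances(graph, 1))  # {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}
```

Swapping the `deque` for a stack (`list.append` / `list.pop`) turns the same traversal into an iterative DFS, although the recorded values then no longer mean shortest distances.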

data structures and algorithms in python

Description: Data Structures & Algorithms in Python on Udacity by Google. Official course page.
This 1-month course introduces common data structures and algorithms in Python. It reviews frequently asked technical interview questions and teaches how to structure your answers.
Repository: Check out the repository with my notes and useful links on Data Structures & Algorithms in Python, based on the course lectures, tasks, and materials ---> go to repo.
Status: Completed in March 2023 (see the Certificates section below).

ab testing course by google

Description: A/B Testing Course by Google. Official course page.
This 1-month course covers how to choose and characterize metrics to evaluate your experiments, how to design an experiment with enough statistical power, and how to analyze the results and draw valid conclusions.
Repository: Check out the repository with my notes and useful links on A/B testing, based on the course lectures, tasks, and materials ---> go to repo.
Status: Completed in February 2023 (see the Certificates section below).
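Designing an experiment "with enough statistical power" usually starts with a sample size estimate. Below is a minimal sketch using one common normal-approximation formula for comparing two conversion rates, n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2 per group. This is not code from the course; the function name and the baseline and target rates are hypothetical, and calculators that use a pooled variance or a continuity correction will give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Users needed per group to detect a change from p1 to p2 (two-sided test).

    Normal approximation with unpooled variances:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)  # round up: you cannot recruit a fraction of a user


# Hypothetical: baseline conversion of 10%, and we want to detect a lift to 12%
print(sample_size_per_group(0.10, 0.12))  # 3839
```

Because the effect size enters the denominator squared, halving the detectable lift roughly quadruples the required sample.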

sklearn ml course

Description: Machine Learning in Python with scikit-learn by France Université Numérique. Official course page.
This 3-month course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.
Repository: Check out the repository with Jupyter notebooks containing the course lectures and solutions to the tasks ---> go to repo.
Status: Completed in May 2022 (see the Certificates section below).

Kaggle 30 Days of ML

Description: 30 Days of Machine Learning by Kaggle. The course quickly covers the essential skills needed to get your hands dirty with data and learn how to build machine learning models.
Repository: Check out the repository with Jupyter notebooks containing solutions to the course tasks ---> go to repo.
Status: Completed in August 2021 (see the Certificates section below).

Data Analyst Specialization

Description: A 5-month specialization by karpov.courses. It includes Python, API, Git, Airflow, SQL, Statistics, A/B testing, Visualization, Product development, and Product Analytics modules.
Repository: Check out the repository with 37 data analysis mini-projects ---> go to repo.
Status: Completed in July 2021 (see the Certificates section below).

Data Analysis Course Tinkoff-MSU

Description: A 3-month course by Tinkoff Education, created for students of the Faculty of Mechanics and Mathematics at Moscow State University. Topics include Introduction to Data Analysis, SQL, Data Visualization in Python, A/B tests, Data Interpretation, Models, Logistic regression, Mobile Analytics, Random Forest, and more.
Repository: Check out the repository with my code and solutions for the course homework and projects ---> go to repo.
Status: Completed in May 2021 (see the Certificates section below).

Learning SQL

Description: SQL queries for tasks from Codecademy, sql-ex.ru, Stepik, the SQL module of Yandex Praktikum, and others.
Repository: Check out the repository with 400+ SQL queries ---> go to repo.
Status: Some of the courses are still in progress.

Python Developer Track

Description: The Python Developer track from JetBrains Academy: 25 projects, 154 hours, 300 topics.
Repository: Check out the repository with 11 completed projects, including Hangman, Tic-Tac-Toe, and Rock-Paper-Scissors games, a matrix calculator, a self-written regex engine, a to-do list, and more ---> go to repo.
Status: Completed 11 projects and studied 116 topics from the track to practice my Python skills. I plan to return to the track later.

Computer Science Career Path

Description: The 20-week Computer Science Career Path from Codecademy. It covers the following topics: command line, Git, Python 3, OOP, linear data structures, complex data structures, asymptotic notation, recursion, sorting algorithms, search algorithms, and graph search algorithms.
Repository: Although I have completed the career path, the repository is still under development and lists only 9 projects so far, including a word statistics calculator, an English noun pluralizer, an English verb conjugator, a censor engine, and more ---> go to repo.
Status: Completed in July 2020 (see the Certificates section below).

Google Python Class

Description: A free Python class by Google for people with a little programming experience. Topics covered: strings, lists, sorting, dicts, files, regular expressions, utilities, urllib.
Repository: Contains 10 cool projects, including a mimicking random-text generator, a baby-name popularity counter (based on data from the US Social Security Administration), and more ---> go to repo.
Status: Completed in November 2020.

Side Projects

Description: Side projects and various code snippets I'm having fun with.
Repository: a pull-up ladder calculator, a motivational bad-habit tracker, my solutions to coding problems from the Tinkoff Fintech Junior / Tinkoff Internship admission tests, internship applications, the Google Sheets course by Yandex Praktikum, and more ---> go to repo.
Status:

Tableau Vizzes

Description: My Tableau Public account ---> go to Tableau..
Status:

Certificates

I believe the best way to showcase skills is to do the work and share it, but sometimes certificates come as an indirect result :) So here is a list of the ones I have (in reverse chronological order, with the completion date in brackets):

Contacts