Hi, I'm Alex! I have a technical background and hold a Master of Science (M.S.) degree in Geoscience from Saint Petersburg Mining University, with a major in Seismic Data Processing and Analysis. I have 12 years of experience in geoscience, where I held roles such as Data Processing Engineer, Head of Department, and CTO at a technology startup. After a year of transitioning from geoscience to data analysis, I have been working as a Data Analyst at a fintech company since June 2021.
Placed 7th in the open Data Analyst 2021 competition held by Career Factory.
My article on Habr on matching two open datasets with the help of Machine Learning ---> ENG | RUS
My Data Analytics blog on Medium
My CV in PDF
This repository showcases my skills, serves as a platform to share my projects, and helps me track my progress in Data Analytics and Data Science-related topics.
- About
- Portfolio Projects
- Study Projects
- Advanced A/B Testing Course
- Algorithms Training 3.0 by Yandex
- Data Structures & Algorithms in Python
- A/B Testing Course by Google
- sklearn ML course
- Kaggle 30 Days of ML
- Data Analysis Course
- Data Analysis Course Tinkoff-MSU
- Learning SQL
- Python Developer Track
- Computer Science Career Path
- Google Python Class
- Side Projects
- Tableau Vizzes
- Certificates
- Contacts
In this section I list my data analytics projects and briefly describe the technology stack used to solve each case.
Code: video_games_sales.ipynb
Description: The dataset contains 16715 records as of 2016. It is a list of video games with sales (by region), year of release, platform, and critic and user scores. The project includes the following steps: data loading, data cleaning and preprocessing, filling missing values, EDA (exploratory data analysis), analyzing region-based user profiles, measuring statistical factors, hypothesis testing.
Skills: data cleaning, data analysis, descriptive statistics, central limit theorem, hypothesis testing, data visualization.
Technology: Python, Pandas, NumPy, SciPy Stats, Seaborn, Matplotlib.
Results: Review of the global and regional video games markets, data-based business recommendations.
Code: final_project.ipynb
Presentation: my_project_slides.pdf
Description: The final project for a 5-month Data Analysis Course. Setup: you're employed at a mobile games development company. A Product Manager gives you the following tasks: to find and visualize retention, to make a decision based on the A/B test data, and to suggest a number of metrics to evaluate the results of the last monthly campaign.
Skills: data cleaning, detecting data anomalies, Python coding, data visualization, descriptive statistics, dealing with outliers, A/B tests, Shapiro–Wilk test, Levene's test, data transforms, Mann–Whitney U test, proportions z-test, bootstrapping, defining metrics.
Technology: Python, Pandas, NumPy, SciPy Stats, Seaborn, Matplotlib, Statsmodels Stats, Bootstrap.
Results: Python functions to calculate and plot user retention, hypothesis testing, detection of a statistically significant result with a recommendation to push the tested in-app changes into production, and a set of metrics to evaluate the success of the promotion campaign.
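The bootstrapping step listed in the skills above can be illustrated with a minimal sketch. This is not the code from final_project.ipynb: the function name `bootstrap_mean_diff_ci`, its parameters and the sample data are hypothetical, and it uses only the Python standard library to build a percentile bootstrap confidence interval for the difference in means between two groups.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(control, test, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for mean(test) - mean(control).

    Hypothetical helper for illustration only; not taken from the notebook.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        control_sample = rng.choices(control, k=len(control))
        test_sample = rng.choices(test, k=len(test))
        diffs.append(mean(test_sample) - mean(control_sample))
    diffs.sort()
    # Cut alpha/2 of the resampled differences from each tail.
    low_idx = int((alpha / 2) * n_resamples)
    high_idx = n_resamples - 1 - low_idx
    return diffs[low_idx], diffs[high_idx]


if __name__ == "__main__":
    # Made-up numbers, e.g. revenue per user in two A/B groups.
    control_group = [float(x) for x in range(50)]
    test_group = [float(x) + 5.0 for x in range(50)]
    low, high = bootstrap_mean_diff_ci(control_group, test_group)
    print(f"95% CI for the difference in means: [{low:.2f}, {high:.2f}]")
```

If the interval excludes zero, the difference is treated as statistically significant at the chosen `alpha`; the percentile method is only one of several ways to build a bootstrap interval.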
Code: notebook.ipynb
Presentation: slides.pdf
Description: My solution to an open Data Analyst competition held by Career Factory (Карьерный Цех). The solution took 7th place in the competition (≈100 solutions were submitted by participants).
Skills: evaluating A/B test design, data cleaning, data anomaly detection, checking the splitting system, calculating conversion rate, calculating bounce rate, log-scale transformations, Shapiro–Wilk test of distribution normality, A/B tests (proportions z-test, Mann–Whitney rank test), plotting results, drawing conclusions and giving recommendations for follow-up actions.
Technology: Python, Pandas, NumPy, SciPy Stats, Seaborn, Matplotlib, Statsmodels Stats.
Results: A/B test design analysis, a conclusion on rolling out the new landing page design to production, recommendations on how to improve.
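For readers unfamiliar with the proportions z-test used to compare conversion rates, here is a minimal standard-library sketch. It is not the code from notebook.ipynb (which lists Statsmodels in its stack): the function name `two_proportion_ztest` and the sample counts are hypothetical, and the test uses the pooled-variance, two-sided form.

```python
from math import erfc, sqrt


def two_proportion_ztest(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates.

    Hypothetical helper for illustration only; returns (z statistic, p-value).
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis of equal rates.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: old landing page (A) vs. new landing page (B).
    z_stat, p_val = two_proportion_ztest(100, 1000, 150, 1000)
    print(f"z = {z_stat:.3f}, p-value = {p_val:.5f}")
```

The normal approximation behind this test is usually considered reasonable only when each group has enough conversions and non-conversions; for small samples an exact test may be a better choice.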
Code: ya_ad_int_solution.ipynb
Presentation: ya_ad_int_slides_upd.pdf
Description: My submission to the Yandex Advertising Analytics internship program.
Skills: data cleaning, CTR, CPC, CPA and CR calculation, comparing metrics with competitors, visualizing results, drawing conclusions.
Technology: Python, Pandas, NumPy, Seaborn, Matplotlib.
Results: a slide deck with an online advertising campaign analysis and recommendations on how to improve, based on the service category.
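The four advertising metrics named in the skills above follow standard definitions, sketched below. This is not the code from ya_ad_int_solution.ipynb: the function name `ad_metrics`, its parameters and the sample numbers are hypothetical, and CR is assumed here to mean conversions per click (some teams define it per impression or per session instead).

```python
def ad_metrics(impressions, clicks, conversions, cost):
    """Compute CTR, CPC, CPA and CR for one campaign.

    Hypothetical helper for illustration only. A metric is None when its
    denominator is zero, so an empty campaign does not raise an error.
    """

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "ctr": ratio(clicks, impressions),  # click-through rate
        "cpc": ratio(cost, clicks),         # cost per click
        "cpa": ratio(cost, conversions),    # cost per acquisition
        "cr": ratio(conversions, clicks),   # conversion rate (assumed per click)
    }


if __name__ == "__main__":
    # Made-up campaign totals.
    for name, value in ad_metrics(10_000, 200, 10, 500.0).items():
        print(name, value)
```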
Tableau Public: dashboard
Dashboard canvas: dashboard_canvas.pdf
Description: A Tableau Public dashboard consisting of: a calculated rental property occupancy rate; an analytical chart to choose the best property by occupancy rate, review score and price per night; a ranked table of top listings by calculated potential annual revenue; KPIs for average price, average occupancy rate and the number of unique listings; filters by neighborhood, occupancy rate and the number of reviews over the last twelve months.
Skills: interview with a customer, requirements capture, designing an analytical dashboard, product delivery.
Technology: Tableau.
Results: created an analytical dashboard to support the daily activities of a company in the apartment rental business.
In this section I provide links to my GitHub repositories containing the code and Jupyter notebooks I created while taking online courses or just having fun with data and code.
Description: Advanced A/B testing course by karpov.courses.
This self-paced course covers topics such as Basics of Statistics, Hypothesis testing, Experimental design, Design testing, Confidence intervals, Improving test sensitivity, Metric selection, CUPED, Stratification, Multiple testing, Traffic splitting, Analysis of ratio metrics (Linearization and Delta Method) and a Complete A/B testing pipeline, all with extensive coding practice in Python.
Repository: Check the repository with my solutions to the Advanced A/B testing course tasks and challenges ---> go to repo.
Status: Completed in June 2023 (please check the certificates section below).
Description: Algorithms Training 3.0 by Yandex. The course official page.
This 1-month algorithms and data structures coding journey covers topics such as stacks, queues, dynamic programming, graphs, DFS, BFS, etc.
Repository: Check the repository with my notes and solutions for Algorithms Training 3.0 by Yandex, based on the course lectures, tasks and materials ---> go to repo.
Status: Completed in April 2023 (please check the certificates section below).
Description: Data Structures & Algorithms in Python on Udacity by Google. The course official page.
This 1-month course introduces common data structures and algorithms in Python. It gives an overview of frequently asked technical interview questions and teaches how to structure your responses.
Repository: Check the repository with my notes and useful links on Data Structures & Algorithms in Python, based on the course lectures, tasks and materials ---> go to repo.
Status: Completed in March 2023 (please check the certificates section below).
Description: A/B Testing Course by Google. The course official page.
This 1-month course covers how to choose and characterize metrics to evaluate your experiments, how to design an experiment with enough statistical power, and how to analyze the results and draw valid conclusions.
Repository: Check the repository with my notes and useful links on A/B testing, based on the course lectures, tasks and materials ---> go to repo.
Status: Completed in February 2023 (please check the certificates section below).
Description: Machine Learning in Python with scikit-learn by France Université Numérique. The course official page.
This 3-month course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step and didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.
Repository: Check the repository with Jupyter notebooks containing the course lectures and solutions to the tasks ---> go to repo.
Status: Completed in May 2022 (please check the certificates section below).
Description: 30 Days of Machine Learning by Kaggle. The course rapidly covers the most essential skills needed to get your hands dirty with data and quickly learn how to build machine learning models.
Repository: Check the repository with Jupyter notebooks containing my solutions to the course tasks ---> go to repo.
Status: Completed in August 2021 (please check the certificates section below).
Description: This is a 5-month specialization by karpov.courses. The specialization includes Python, API, Git, Airflow, SQL, Statistics, A/B testing, Visualization, Product development and Product Analytics modules.
Repository: Check the repository with 37 data analysis mini-projects ---> go to repo.
Status: Completed in July 2021 (please check the certificates section below).
Description: This is a 3-month course by Tinkoff Education. The course was created for students of the Moscow State University Faculty of Mechanics and Mathematics and includes the following topics: Introduction to Data Analysis, SQL, Data Visualization in Python, A/B tests, Data Interpretation, Models, Logistic regression, Mobile Analytics, Random Forest, etc.
Repository: Check the repository with my code and solutions for the course homework and projects ---> go to repo.
Status: Completed in May 2021 (please check the certificates section below).
Description: SQL queries for tasks from Codecademy, sql-ex.ru, Stepik, the SQL module on Yandex Praktikum, etc.
Repository: Check the repository with 400+ SQL queries ---> go to repo.
Status: Some of the courses are still in progress.
Description: The Python Developer track from JetBrains Academy: 25 projects, 154 hours, 300 topics.
Repository: Check the repository with 11 completed projects, including Hangman, Tic-Tac-Toe and Rock-Paper-Scissors games, a Matrix calculator, a hand-coded Regex engine, a To-Do list, etc. ---> go to repo.
Status: Completed 11 projects and studied 116 topics from the track to practice my Python skills. I will return to the track later.
Description: The 20-week Computer Science Career Path from Codecademy. The career path includes the following topics: command line commands, Git, Python 3, OOP, linear data structures, complex data structures, asymptotic notation, recursion, sorting algorithms, search algorithms, graph search algorithms.
Repository: Although the career path has already been completed, the repository is still under development and lists only 9 projects, including a word statistics calculator, an English noun pluralizer, an English verb conjugator, a censor engine, etc. ---> go to repo.
Status: Completed in July 2020 (please check the certificates section below).
Description: This is a free class by Google for people with a little bit of programming experience who want to learn Python. Topics covered: strings, lists, sorting, dicts, files, regular expressions, utilities, urllib.
Repository: Contains 10 cool projects, including a mimicking random text generator, a baby-names popularity counter (based on data from the US Social Security Administration), etc. ---> go to repo.
Status: Completed in November 2020.
Description: Side projects and various code snippets I'm having fun with.
Repository: a pull-ups ladder calculator, a motivational bad habits tracker, my solutions to coding problems from the Tinkoff Fintech Junior / Tinkoff Internship admission tests, internship applications, the Google Sheets Course by Yandex Praktikum, etc. ---> go to repo.
Status: ∞
Description: My Tableau Public account ---> go to Tableau.
Status: ∞
I believe that the best way to showcase skills is by doing the work and sharing it, but sometimes certificates come as a by-product :) So here is a list of the ones I have (in reverse chronological order, with the date of completion in brackets):
- Advanced A/B Testing Course (Jun 2023) (karpov.courses)
- Algorithms Training 3.0 by Yandex (Apr 2023) (Yandex)
- Data Structures & Algorithms in Python (Mar 2023) (Udacity - Google)
- A/B Testing (Feb 2023) (Udacity - Google)
- Teamlead 101 (Jul 2022) (Stratoplan Management School)
- sklearn ML course (May 2022) (France Université Numérique)
- Intermediate Machine Learning (Aug 2021) (Kaggle)
- Intro to Machine Learning (Aug 2021) (Kaggle)
- Data Analyst Specialization (Jul 2021) (karpov.courses)
- Jira and Confluence basics (Jun 2021) (GeekBrains)
- Databases for Developers: SQL Foundations (Jun 2021) (Oracle)
- Data Analysis Course Tinkoff-MSU (May 2021) (Tinkoff Education)
- Data Analyst Professional Development Training (Mar 2021) (Yandex Praktikum & University 20.35)
- Data Literacy Certificate (Mar 2021) (Qlik, Accenture, Data Yoga)
- New Features in Python 3.9 course (Jan 2021) (RealPython)
- Fintech Trends (Dec 2020) (Tinkoff Education)
- Data Science Math Skills (Oct 2020) (Coursera - Duke University)
- Computer Science Career Path (Jul 2020) (Codecademy)
- Learn the Command Line Course (Jul 2020) (Codecademy)
- Learn Git Course (Jun 2020) (Codecademy)
- Learn Python 3 Course (Jun 2020) (Codecademy)
- English for Career Development (Feb 2018) (Coursera - University of Pennsylvania)
- Learning How to Learn (Feb 2018) (Coursera - University of California San Diego)
- Fundamentals of Project Planning and Management (Oct 2015) (Coursera - University of Virginia)
- Introduction to Linux (Dec 2014) (Stepik - Bioinformatics Institute)
- IELTS Academic (Overall Band Score 7.0 - Proficient English User (C1)) (Apr 2014)
- LinkedIn: @nktnlx
- Telegram: @nktnlx
- Twitter: @nktn_lx
- E-mail: nktn.lx@gmail.com