In this repository, you will find the source code for various projects I have completed or am still working on. Most of the projects are accompanied by a Medium blog post at medium.com/@tuandoan.nguyen. I have published almost exclusively in the Towards Data Science publication through Medium's Partner Program, so please check out these articles as a way to support me and my future projects. Alternatively, you can find my blog posts on my personal website here.
My interests lie at the intersection of statistical techniques, data visualization, and sports (especially football). All the code is written in Python or R. I don't have a strong preference for either language, nor do I make a concerted effort to stick to one. The choice mostly depends on how well each language supports the functionality a project needs (for example, scraping in Python and data processing with dplyr piping in R).
A collection of projects that explore the intricate statistical aspects of the Beautiful Game.
- Empirical Bayes and penalty taking ability - Using Bayesian statistics to make meaningful comparisons between players across Europe.
- Poisson process and match prediction - Here we learn about the Poisson process and how a random model outperforms football experts with its predictions.
- The mathematics of football betting strategies - With the Poisson model and some additional help from mathematical research, can we beat the bookies?
- Fisher vs Neyman-Pearson debate and Paul the Octopus - We go over the theory (or many theories) of hypothesis testing and see how it applies to the psychic ability of Paul the Octopus.
- Bayes theorem and a probabilistic argument for God
- Dating with probability theory - Here we explore what probability theory has to say about the optimal strategy to find the love of your life.
- Bayes theorem and why it matters to my workout routine - A lightweight introduction to Bayes' theorem and how it helps convince me to hit the gym.
- NetworkX and Basemap - Here is a comprehensive tutorial on how we can visualize geographical data with powerful Python tools.
- Tkinter and Python - Building your own firework shows with Tkinter (and some math chops).
- Data visualization with Matplotlib and Seaborn - Learn how to construct publication-worthy visualizations with the Matplotlib and Seaborn packages.
- End-to-end Machine Learning project with R - Here is a full data science project that covers data collection, cleaning, visualization, machine learning and validation.
- Unsupervised Learning - Clustering method with R - An introduction to an array of unsupervised learning algorithms: hierarchical clustering, k-means, and factor analysis.